2. Analysis of Time-Series Data

Set up working environment

Here are the packages I will be using:

# R Markdown
library(knitr)

# Projects
library(here)

# Statistics
library(psych)

# Tidyverse
library(readxl)
library(tidyverse)
library(broom)
library(glue)

I will import and take a look at the data.

tbl.2 <- read_excel(here('data', 'sde.xlsx'))

tbl.2[1:10, ] %>%
  kable

ID	GENDER	PSDE1	PSDE2	PSDE3	PSDE4	FSDE1	FSDE2	FSDE3	FSDE4	A0	A1
2	1	6.00	5.25	6.00	6.25	2.25	1.25	1.25	0.00	2.200	-0.675
6	1	3.50	3.25	2.25	2.25	2.50	-1.25	-0.75	-2.00	1.575	-1.300
8	1	6.75	7.00	5.75	4.75	2.25	2.75	1.00	-1.25	3.025	-1.225
9	1	4.50	5.00	3.25	3.00	1.00	1.25	-0.25	-1.25	1.425	-0.825
14	1	4.00	5.50	5.50	5.00	-1.25	1.50	2.00	1.50	-0.375	0.875
15	1	4.75	4.25	5.50	4.75	2.50	1.50	2.00	1.75	2.200	-0.175
23	1	6.00	6.50	6.75	6.00	2.25	2.25	2.25	2.75	2.150	0.150
24	0	6.50	6.25	5.75	5.75	2.00	2.75	2.25	2.25	2.275	0.025
27	1	7.00	6.50	5.25	6.75	2.25	2.75	2.50	2.25	2.475	-0.025
28	1	4.50	4.25	6.00	6.75	1.50	1.50	2.00	2.50	1.350	0.350

describe(tbl.2)[c('n', 'mean', 'sd', 'min', 'max')] %>%
  kable

	n	mean	sd	min	max
ID	73	115.6301370	69.6035413	2.00	231.000
GENDER	73	0.7808219	0.4165525	0.00	1.000
PSDE1	73	5.6780822	1.1322908	1.75	7.000
PSDE2	73	5.3390411	1.1639625	2.00	7.000
PSDE3	73	5.3184932	1.1080986	2.25	7.000
PSDE4	73	5.4212329	1.0679003	2.25	7.000
FSDE1	73	1.8869863	0.9913174	-1.25	3.000
FSDE2	73	1.6438356	1.1996222	-1.75	3.000
FSDE3	73	1.5821918	1.0712911	-1.50	3.000
FSDE4	73	1.5376712	1.2095333	-2.00	3.000
A0	73	1.8291096	0.9882470	-1.05	3.425
A1	73	-0.1109589	0.4293484	-1.30	1.075

No missing data, sufficient variation—looks good!

Finally, I will capture the reliabilities and time codes, as well as the categories for gender.

dbl.2.p <- c(.864, .915, .903, .886)
dbl.2.f <- c(.910, .927, .928, .947)
int.2.t <- 0:3

tbl.2 <- tbl.2 %>%
  mutate(gen = factor(GENDER, labels = c("female", "male")))

A. First-Difference Model

To implement the first-difference approach, I relied on Equation 2 in Finkel (1995):

ΔY = Δβ₀ + β₁ΔX + β₂ΔZ + Δε,

where Δ signifies the change in a variable between time periods and Z is an additional independent variable.

This equation models the change in the dependent variable between time t and time t+1 as a function of the change in the intercept and change in the independent variables.
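Differencing helps because any stable, person-specific influence on Y cancels out of ΔY. The sketch below uses simulated data; all names and parameter values are hypothetical and do not come from sde.xlsx. X is correlated with an unobserved stable trait u. A cross-sectional regression of Y on X at one wave absorbs the trait's effect into the slope. The first-difference regression recovers the true slope of 0.5, and its intercept estimates the change in intercept (Δβ₀ = 0.3).

```r
# Simulated two-wave panel (hypothetical values, not the SDE data)
set.seed(2020)
n  <- 500
u  <- rnorm(n)                       # stable, unobserved person effect
x1 <- u + rnorm(n)                   # X at wave 1, correlated with u
x2 <- u + rnorm(n)                   # X at wave 2, correlated with u
y1 <- 0.0 + 0.5 * x1 + 2 * u + rnorm(n, sd = 0.5)  # beta0 = 0.0 at wave 1
y2 <- 0.3 + 0.5 * x2 + 2 * u + rnorm(n, sd = 0.5)  # beta0 = 0.3 at wave 2

# Cross-sectional regression at wave 2: slope is biased because x2 carries u
static_fit <- lm(y2 ~ x2)

# First-difference regression: u cancels, leaving dY = dbeta0 + beta1 * dX + de
dy <- y2 - y1
dx <- x2 - x1
fd_fit <- lm(dy ~ dx)

round(coef(static_fit), 2)   # slope well above 0.5
round(coef(fd_fit), 2)       # intercept near 0.3, slope near 0.5
```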

Substituting our variables into Equation 2 (with no additional variable Z), we have:

ΔFSDE = Δβ₀ + β₁ΔPSDE + Δε.

To estimate this model, I will first create three change scores each for FSDE and PSDE (ΔFSDE and ΔPSDE) by subtracting each wave's score from the score at the following wave.

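tbl.2.a <- tbl.2 %>%
  # keep PSDE1-PSDE4 and FSDE1-FSDE4, in that column order
  select(contains("SDE")) %>%
  # subtract each column from the column to its right (wave t minus wave t-1)
  {
    .[2:length(.)] - .[1:(length(.) - 1)]
  } %>%
  # drop the "FSDE1" result (FSDE1 - PSDE4), which mixes two different variables
  select(-FSDE1) %>%
  # rename, e.g., PSDE2 -> PSDE2_1 (= PSDE2 - PSDE1)
  set_names(paste(names(.), rep(1:3, 2), sep = "_")) %>%
  as_tibble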

tbl.2.a %>%
  head %>%
  kable

PSDE2_1	PSDE3_2	PSDE4_3	FSDE2_1	FSDE3_2	FSDE4_3
-0.75	0.75	0.25	-1.00	0.00	-1.25
-0.25	-1.00	0.00	-3.75	0.50	-1.25
0.25	-1.25	-1.00	0.50	-1.75	-2.25
0.50	-1.75	-0.25	0.25	-1.50	-1.00
1.50	0.00	-0.50	2.75	0.50	-0.50
-0.50	1.25	-0.75	-1.00	0.50	-0.25

Next I will estimate the three models using OLS.

# in:  data = tibble of change scores
#      t = latter time
# out: first-difference linear regression model
fun.2.a.1 <- function(data, t) {
  glue("FSDE{t}_{t-1} ~ PSDE{t}_{t-1}") %>%
    as.character %>%
    as.formula %>%
    lm(data)
}

2:4 %>%
  map_df(~ fun.2.a.1(tbl.2.a, .) %>% tidy()) %>%
  kable

term	estimate	std.error	statistic	p.value
(Intercept)	0.0390433	0.1143646	0.3413929	0.7338161
PSDE2_1	0.8323296	0.1214906	6.8509778	0.0000000
(Intercept)	-0.0510947	0.0661938	-0.7718960	0.4427377
PSDE3_2	0.5133903	0.0659122	7.7889982	0.0000000
(Intercept)	-0.1049785	0.0922747	-1.1376735	0.2590816
PSDE4_3	0.5884570	0.1318578	4.4628139	0.0000297

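Change in PSDE significantly predicts change in FSDE from wave 1 to 2 (b1 ≈ .832, t ≈ 6.85, p < .001), from wave 2 to 3 (b1 ≈ .513, t ≈ 7.79, p < .001), and from wave 3 to 4 (b1 ≈ .588, t ≈ 4.46, p < .001). The slope is largest for the change from wave 1 to 2, whereas the effect is estimated most precisely (smallest standard error, largest t) for the change from wave 2 to 3.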

Now let’s look at the reliabilities of the change scores.

# in: data = tibble of wave scores
#     rel = vector of reliabilities, one per wave
#     v = "P" for PSDE or "F" for FSDE
#     t = time
# out: reliability of the difference score (wave t minus wave t - 1)
fun.2.a.2 <- function(data, rel, v, t) {
  x <- data[[paste0(v, "SDE", t)]]
  y <- data[[paste0(v, "SDE", t - 1)]]
  s2x <- var(x)
  s2y <- var(y)
  ax <- rel[t]
  ay <- rel[t - 1]
  sxy <- cov(x, y)
  # var(X - Y) = var(X) + var(Y) - 2 cov(X, Y), so the covariance is
  # subtracted in the denominator as well as in the numerator
  ((s2x * ax) + (s2y * ay) - (2 * sxy)) / (s2x + s2y - (2 * sxy))
}

map(2:4, ~ fun.2.a.2(tbl.2, dbl.2.p, "P", .)) %>%
  flatten_dbl %>%
  set_names(c("PSDE2_1", "PSDE3_2", "PSDE4_3")) %>%
  enframe("variable", "reliability") %>%
  bind_rows(
    map(2:4, ~ fun.2.a.2(tbl.2, dbl.2.f, "F", .)) %>%
      flatten_dbl %>%
      set_names(c("FSDE2_1", "FSDE3_2", "FSDE4_3")) %>%
      enframe("variable", "reliability")
  ) %>%
  kable(digits = 2)

variable	reliability
PSDE2_1	0.63
PSDE3_2	0.77
PSDE4_3	0.49
FSDE2_1	0.86
FSDE3_2	0.68
FSDE4_3	0.79

The original reliabilities ranged from .864 for PSDE1 to .947 for FSDE4. The reliabilities of the change scores are noticeably lower, ranging from .49 for PSDE4_3 to .86 for FSDE2_1.
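The classical formula shows directly why change scores lose reliability. It assumes measurement errors are uncorrelated across waves. Under that assumption, the reliable part of var(X - Y) is var(X)·rel(X) + var(Y)·rel(Y) - 2cov(X, Y), and the total is var(X) + var(Y) - 2cov(X, Y). The hypothetical helper rel_diff() below is not part of the original analysis. It evaluates the formula for two standardized waves that each have a reliability of .90. As the correlation between the waves rises toward their own reliability, the reliability of their difference falls toward zero.

```r
# Reliability of a difference score D = X - Y (classical test theory,
# assuming measurement errors are uncorrelated across the two waves)
rel_diff <- function(var_x, var_y, rel_x, rel_y, cov_xy) {
  true_var <- var_x * rel_x + var_y * rel_y - 2 * cov_xy  # true-score variance of D
  obs_var  <- var_x + var_y - 2 * cov_xy                   # observed variance of D
  true_var / obs_var
}

# Two standardized waves (variance 1), each with reliability .90
r_between <- c(0.0, 0.5, 0.7, 0.85, 0.9)   # correlation between the waves
round(sapply(r_between, function(r) rel_diff(1, 1, 0.9, 0.9, r)), 3)
# → 0.900 0.800 0.667 0.333 0.000
```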

To understand the relationships among the four variables implicitly embedded in each first-difference model (FSDEₜ, FSDEₜ₋₁, PSDEₜ, and PSDEₜ₋₁), we could use an approach similar to the one presented by Edwards (2002): a regression equation in which the difference-scored predictor is replaced by its component parts. However, because the dependent variable is also a difference score, we would need a multivariate regression method, such as canonical correlation analysis (CCA) or multivariate multiple regression (MMR) (Dwyer, 1983; Thompson, 1991).
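Part of this logic can be sketched in R on simulated data. The names P_prev, P_now, F_prev, and F_now are hypothetical stand-ins for PSDEₜ₋₁, PSDEₜ, FSDEₜ₋₁, and FSDEₜ, and the values are made up. The sketch does two things:

1. It tests the constraint the first-difference model imposes (that PSDEₜ and PSDEₜ₋₁ have equal and opposite effects) against a model that frees the two coefficients.
2. It fits a multivariate multiple regression with lm() and a cbind() response, which is base R's way to estimate an MMR.

stats::cancor() would be the canonical-correlation alternative.

```r
# Simulated wide data for one pair of waves (hypothetical; not the SDE data)
set.seed(1995)
n <- 73
d <- data.frame(P_prev = rnorm(n, 5.5, 1.1))
d$P_now  <- 0.7 * d$P_prev + rnorm(n, 1.6, 0.8)
d$F_prev <- 0.6 * d$P_prev + rnorm(n, -1.5, 0.7)
d$F_now  <- 0.5 * d$P_now + 0.3 * d$F_prev + rnorm(n, -1.5, 0.5)

# Constrained first-difference model: one slope for (P_now - P_prev)
fd <- lm(I(F_now - F_prev) ~ I(P_now - P_prev), data = d)

# Edwards-style decomposition: P_now and P_prev get separate coefficients
unconstrained <- lm(I(F_now - F_prev) ~ P_now + P_prev, data = d)
anova(fd, unconstrained)   # F test of the constraint b(P_now) = -b(P_prev)

# Multivariate multiple regression: both components of the outcome at once
mmr <- lm(cbind(F_now, F_prev) ~ P_now + P_prev, data = d)
coef(mmr)
```

Because OLS is linear in the outcome, the unconstrained difference-score coefficients are exactly the differences between the two MMR columns. The MMR keeps the two components separate, so you can inspect how each one relates to each predictor.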

B. Static-Score Model

To develop a static-score model for the data, I relied on Equation 2.5 from Finkel (1995):

Yₜ = β₀ + β₁Xₜ + β₂Yₜ₋₁ + εₜ

Whereas the first-difference approach modeled change in Y as a function of change in X, the static-score approach models Yₜ as a function of Xₜ and the prior value Yₜ₋₁. As before, we can estimate three models, one for each of waves two through four, each controlling for FSDE at the preceding wave.
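Subtracting Yₜ₋₁ from both sides of the static-score equation gives Yₜ - Yₜ₋₁ = β₀ + β₁Xₜ + (β₂ - 1)Yₜ₋₁ + εₜ. The static-score model can therefore also be read as a model of change that controls for the prior level. β₁ is the same whether the outcome is Yₜ or the change Yₜ - Yₜ₋₁; only the coefficient on Yₜ₋₁ shifts, by exactly 1. The sketch below checks this identity on simulated data with hypothetical names and values.

```r
set.seed(7)
n  <- 73
y1 <- rnorm(n, 1.9, 1)                      # prior FSDE-like score (hypothetical)
x2 <- 0.5 * y1 + rnorm(n, 5, 1)             # concurrent PSDE-like score (hypothetical)
y2 <- -2.5 + 0.7 * x2 + 0.3 * y1 + rnorm(n, 0, 0.6)

level_fit  <- lm(y2 ~ x2 + y1)              # static-score model for Y_t
change_fit <- lm(I(y2 - y1) ~ x2 + y1)      # same predictors, change as the outcome

round(coef(level_fit) - coef(change_fit), 10)   # 0 for intercept and x2, 1 for y1
```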

map(2:4, ~ glue("FSDE{.} ~ PSDE{.} + FSDE{. - 1}")) %>%
  map(lm, data = tbl.2) %>%
  map_df(tidy) %>%
  kable

term	estimate	std.error	statistic	p.value
(Intercept)	-2.7326594	0.3994077	-6.841779	0.0000000
PSDE2	0.7720880	0.0816718	9.453548	0.0000000
FSDE1	0.1347573	0.0958955	1.405252	0.1643690
(Intercept)	-1.6897887	0.3182402	-5.309791	0.0000012
PSDE3	0.4790255	0.0677060	7.075081	0.0000000
FSDE2	0.4406076	0.0625405	7.045160	0.0000000
(Intercept)	-2.4939910	0.5087453	-4.902239	0.0000059
PSDE4	0.6442929	0.1134432	5.679430	0.0000003
FSDE3	0.3405407	0.1130842	3.011391	0.0036160

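First, the lagged dependent variable (Yₜ₋₁) has significant effects at time three (b ≈ .441, p < .001) and time four (b ≈ .341, p ≈ .004), but not at time two (b ≈ .135, p ≈ .164). That suggests some degree of stability (autoregressive persistence) in FSDE, at least from the second wave onward.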

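Second, PSDE positively and significantly predicts FSDE at all three waves (b ≈ .772, .479, and .644, all p < .001), even after controlling for prior FSDE. The association between perceptions and feelings is therefore not simply a carry-over of earlier feelings.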

C. Hierarchical Linear Model

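In HLM, we model each individual's change with an intercept and a slope estimated from their scores across the time periods. Each individual has their own intercept (A0, their predicted FSDE at the first wave, with time coded 0 to 3) and slope (A1, their rate of change in FSDE per wave). These person-level parameters can then be predicted from person-level variables.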
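To check what A0 and A1 contain, the sketch below refits a straight line of FSDE on time for the first three participants in the data preview (IDs 2, 6, and 8). Time is coded 0 to 3, as in int.2.t. Under that coding, the per-person OLS intercepts and slopes reproduce their A0 and A1 values. This suggests that A0 and A1 are individual growth parameters estimated separately for each person.

```r
# FSDE1-FSDE4 for IDs 2, 6, and 8, copied from the data preview above
fsde <- rbind(
  c(2.25,  1.25,  1.25,  0.00),
  c(2.50, -1.25, -0.75, -2.00),
  c(2.25,  2.75,  1.00, -1.25)
)
wave <- 0:3   # same time coding as int.2.t

# Level 1: regress each person's FSDE scores on time, keep (intercept, slope)
growth <- t(apply(fsde, 1, function(y) coef(lm(y ~ wave))))
colnames(growth) <- c("A0", "A1")
growth
# A0: 2.200 1.575 3.025; A1: -0.675 -1.300 -1.225 (matches the preview table)
```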

In the present case, we can model the effect of PSDE on each individual's initial level of FSDE (A0) and rate of change in FSDE (A1). I chose PSDE1 as the predictor because it is the baseline value of PSDE in the dataset. (Note: I also tested PSDE2-4, and the effects on A0 and A1 diminish over time. I think that makes sense.)

First, let’s look at the effect of PSDE1 on A0.

tbl.2 %>%
  lm(A0 ~ PSDE1, .) %>%
  tidy %>%
  kable

term	estimate	std.error	statistic	p.value
(Intercept)	-2.0970079	0.3658623	-5.731686	2e-07
PSDE1	0.6914513	0.0632065	10.939555	0e+00

The effect is positive and significant (b ≈ .691, t ≈ 10.94, p < .001). The stronger an individual's perceived self-development opportunities at the first wave, the higher their initial level of feelings about self-development opportunities (A0). Nothing terribly surprising there.

Next, let’s look at the effect of PSDE1 on A1.

tbl.2 %>%
  lm(A1 ~ PSDE1, .) %>%
  tidy %>%
  kable

term	estimate	std.error	statistic	p.value
(Intercept)	0.5530551	0.2477851	2.231995	0.0287721
PSDE1	-0.1169434	0.0428075	-2.731845	0.0079404

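The effect is negative and significant (b ≈ -.117, t ≈ -2.73, p ≈ .008). Because A1 is a rate of change, a negative coefficient means that the stronger an individual's initial perceived self-development opportunities, the more negative the change in their feelings about those opportunities. Given the intercept of about .553, the predicted slope is positive only for PSDE1 scores below roughly 4.7 (.553 / .117) and negative above that. Participants' PSDE1 averages about 5.68, so most are predicted to have feelings that drift slightly downward over the study. That is, people who start out perceiving plenty of opportunities tend to become somewhat less positive about them over time, while those who start low tend to become more positive.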
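The second-stage regressions of A0 and A1 on PSDE1 can also be written as a single equation with a time × PSDE1 interaction. Every participant here is measured at the same four waves with no missing data. In that case the two routes give identical point estimates, and the sketch below checks this on simulated data with hypothetical names and values. Pooled OLS standard errors ignore the repeated measurements within persons, so only the point estimates should be compared. Proper inference in the one-equation form would need a full multilevel model, for example nlme::lme().

```r
set.seed(73)
n     <- 73
wave  <- 0:3
psde1 <- rnorm(n, 5.7, 1.1)                        # hypothetical baseline predictor
a0 <- -2.1 + 0.7 * psde1 + rnorm(n, 0, 0.5)        # person intercepts
a1 <-  0.55 - 0.12 * psde1 + rnorm(n, 0, 0.3)      # person slopes

# Long format: one row per person-wave, ordered by person, then wave
long <- data.frame(id = rep(seq_len(n), each = 4),
                   wave = rep(wave, times = n),
                   psde1 = rep(psde1, each = 4))
long$fsde <- a0[long$id] + a1[long$id] * long$wave + rnorm(nrow(long), 0, 0.4)

# Two-stage route: per-person OLS growth parameters, then regress them on psde1
y_wide <- matrix(long$fsde, nrow = n, byrow = TRUE)
growth <- t(apply(y_wide, 1, function(y) coef(lm(y ~ wave))))
stage_A0 <- coef(lm(growth[, 1] ~ psde1))
stage_A1 <- coef(lm(growth[, 2] ~ psde1))

# One-equation route: cross-level interaction in a pooled regression
pooled <- coef(lm(fsde ~ wave * psde1, data = long))

rbind(two_stage = c(stage_A0, stage_A1),
      pooled    = pooled[c("(Intercept)", "psde1", "wave", "wave:psde1")])
```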

Finally, let’s see whether there are gender differences in A0 and A1.

tbl.2 %>%
  lm(A0 ~ GENDER, .) %>%
  tidy %>%
  kable

term	estimate	std.error	statistic	p.value
(Intercept)	2.2500000	0.2422887	9.286443	0.0000000
GENDER	-0.5390351	0.2741935	-1.965893	0.0532222

tbl.2 %>%
  lm(A1 ~ GENDER, .) %>%
  tidy %>%
  kable

term	estimate	std.error	statistic	p.value
(Intercept)	-0.1406250	0.1080169	-1.3018797	0.1971656
GENDER	0.0379934	0.1222407	0.3108084	0.7568569

The effect of gender on A0 is negative and nearly, but not quite, significant (b ≈ -.539, t ≈ -1.97, p ≈ .053). If I were to interpret the result anyway, I would say that female participants (coded 0) had more positive initial feelings about self-development opportunities than male participants (coded 1).

There was no significant gender difference in the rate of change in feelings about self-development opportunities (b ≈ .038, p ≈ .757).
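Regressing A0 or A1 on the 0/1 GENDER variable is equivalent to a pooled-variance two-sample t test. The intercept is the mean for the group coded 0 (female), the slope is the male-minus-female difference, and the t and p values match. The sketch below checks this on simulated data. It uses the same group sizes as the sample: 16 coded 0 and 57 coded 1, as implied by n = 73 and a GENDER mean of .78. The outcome values are hypothetical. Fitting with the gen factor created earlier (lm(A0 ~ gen, ...)) would give the same estimates, with the slope labeled genmale.

```r
set.seed(8)
# Hypothetical data: 0 = female, 1 = male, as in GENDER
gender <- rep(c(0, 1), times = c(16, 57))
a0 <- 2.2 - 0.5 * gender + rnorm(length(gender), 0, 1)

fit <- lm(a0 ~ gender)                         # dummy-coded regression
tt  <- t.test(a0 ~ gender, var.equal = TRUE)   # pooled-variance t test

coef(fit)                   # intercept = female mean, slope = male - female
tapply(a0, gender, mean)    # group means for comparison
```

The t statistic from t.test() has the opposite sign, because it is computed as the group-0 mean minus the group-1 mean.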

D. Conclusion

Overall, perceptions of and feelings about self-development opportunities are positively related at every wave. Changes in the two are also positively related: as the first-difference model showed, a change in perceived self-development opportunities was associated with a change in the same direction in feelings toward self-development opportunities in all three intervals.

FSDE showed some stability from one wave to the next (significantly so for waves three and four). Even when controlling for the lagged dependent variable, however, the static-score model showed a significant relationship between perceptions and feelings at all time periods.

Finally, the hierarchical linear model allowed us to examine individual rates of change. Using HLM, I found a significant negative relationship between initial perceived self-development opportunities (PSDE1) and the rate of change in feelings toward self-development opportunities (A1). The higher a participant scored on perceptions at the initial assessment, the more negative their rate of change in feelings over the course of the study.

References

Dwyer. (1983). Multivariate Regression and Multivariate ANOVA. In Statistical Models for the Social and Behavioral Sciences.

Edwards, J. R. (2002). Alternatives to Difference Scores. In Advances in Measurement and Data Analysis.

Finkel, S. E. (1995). Modeling Change with Panel Data. In Causal Analysis with Panel Data.

Thompson, B. (1991). A Primer on the Logic and Use of Canonical Correlation Analysis. Measurement and Evaluation in Counseling and Development.

HW #8: The Study of Change

Daniel Lewis

3/1/20