1. Comparison of Approaches

The first-difference model improves on the cross-sectional model by reducing omitted-variable bias: if an omitted variable is constant over time, it cancels out when each score is subtracted from the next to form the change score. However, the model assumes that the lagged dependent variable has no influence on the change in the dependent variable, which is usually not a viable assumption.
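
The cancellation can be written out explicitly. As a sketch, assume a two-wave model in which $U_i$ is an unobserved, time-invariant characteristic of person $i$ with a constant effect $\gamma$ (both symbols are introduced here for illustration and do not come from the source):

```latex
\begin{aligned}
Y_{i,t}   &= \beta_{0,t}   + \beta_1 X_{i,t}   + \gamma U_i + \varepsilon_{i,t} \\
Y_{i,t-1} &= \beta_{0,t-1} + \beta_1 X_{i,t-1} + \gamma U_i + \varepsilon_{i,t-1} \\
\Delta Y_i &= \Delta\beta_0 + \beta_1 \Delta X_i + \Delta\varepsilon_i
\end{aligned}
```

Subtracting the second line from the first removes $\gamma U_i$ entirely, which is why a stable omitted variable cannot bias the estimate of $\beta_1$ in the differenced equation.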

The static-score model improves on the first-difference model by including the lagged dependent variable, which often influences the dependent variable or the change score. Further, including the lagged dependent variable controls for regression-to-the-mean effects.
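
Regression to the mean can be illustrated with a small simulation. The sketch below uses made-up data (not the SDE data): a perfectly stable trait is measured twice with independent error, so any observed "change" is pure noise. The variable names and the seed are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
n <- 1000
true_score <- rnorm(n)            # stable trait: no real change between waves
y1 <- true_score + rnorm(n)       # wave 1 = trait + measurement error
y2 <- true_score + rnorm(n)       # wave 2 = same trait + new error
change <- y2 - y1

# Correlation between the baseline score and the change score
baseline_change_cor <- cor(y1, change)
print(round(baseline_change_cor, 2))
```

Even though nothing truly changes, the baseline score correlates negatively with the change score: people who score high at wave 1 tend to score lower at wave 2. Conditioning on the lagged score is one way to absorb this artifact.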

The hierarchical linear model improves on both of the previous models in two ways. First, by partitioning variance into a between-subjects effect and a within-subjects effect, it allows researchers to understand effects and make predictions at the level of the subject. Second, by aggregating over more than just two time points, it may produce better estimates of change.

2. Analysis of Time-Series Data

Set up working environment

Here are the packages I will be using:

# R Markdown
library(knitr)

# Projects
library(here)

# Statistics
library(psych)

# Tidyverse
library(readxl)
library(tidyverse)
library(broom)
library(glue)

I will import and take a look at the data.

tbl.2 <- read_excel(here('data', 'sde.xlsx'))

tbl.2[1:10, ] %>%
  kable

ID  GENDER  PSDE1  PSDE2  PSDE3  PSDE4  FSDE1  FSDE2  FSDE3  FSDE4      A0      A1
2        1   6.00   5.25   6.00   6.25   2.25   1.25   1.25   0.00   2.200  -0.675
6        1   3.50   3.25   2.25   2.25   2.50  -1.25  -0.75  -2.00   1.575  -1.300
8        1   6.75   7.00   5.75   4.75   2.25   2.75   1.00  -1.25   3.025  -1.225
9        1   4.50   5.00   3.25   3.00   1.00   1.25  -0.25  -1.25   1.425  -0.825
14       1   4.00   5.50   5.50   5.00  -1.25   1.50   2.00   1.50  -0.375   0.875
15       1   4.75   4.25   5.50   4.75   2.50   1.50   2.00   1.75   2.200  -0.175
23       1   6.00   6.50   6.75   6.00   2.25   2.25   2.25   2.75   2.150   0.150
24       0   6.50   6.25   5.75   5.75   2.00   2.75   2.25   2.25   2.275   0.025
27       1   7.00   6.50   5.25   6.75   2.25   2.75   2.50   2.25   2.475  -0.025
28       1   4.50   4.25   6.00   6.75   1.50   1.50   2.00   2.50   1.350   0.350

describe(tbl.2)[c('n', 'mean', 'sd', 'min', 'max')] %>%
  kable

variable   n         mean          sd    min      max
ID        73  115.6301370  69.6035413   2.00  231.000
GENDER    73    0.7808219   0.4165525   0.00    1.000
PSDE1     73    5.6780822   1.1322908   1.75    7.000
PSDE2     73    5.3390411   1.1639625   2.00    7.000
PSDE3     73    5.3184932   1.1080986   2.25    7.000
PSDE4     73    5.4212329   1.0679003   2.25    7.000
FSDE1     73    1.8869863   0.9913174  -1.25    3.000
FSDE2     73    1.6438356   1.1996222  -1.75    3.000
FSDE3     73    1.5821918   1.0712911  -1.50    3.000
FSDE4     73    1.5376712   1.2095333  -2.00    3.000
A0        73    1.8291096   0.9882470  -1.05    3.425
A1        73   -0.1109589   0.4293484  -1.30    1.075

No missing data, sufficient variation—looks good!

Finally, I will capture the reliabilities and time codes, as well as the categories for gender.

dbl.2.p <- c(.864, .915, .903, .886)
dbl.2.f <- c(.910, .927, .928, .947)
int.2.t <- 0:3

tbl.2 <- tbl.2 %>%
  mutate(gen = factor(GENDER, labels = c("female", "male")))

A. First-Difference Model

To implement the first-difference approach, I relied on Equation 2 in Finkel (1995):

ΔY = Δβ₀ + β₁ΔX + β₂ΔZ + Δε,

where Δ signifies the change in a variable between time periods and Z is an additional independent variable.

This equation models the change in the dependent variable between time t and time t+1 as a function of the change in the intercept and change in the independent variables.

Inputting our variables into Equation 2, we have:

ΔFSDE = Δβ₀ + β₁ΔPSDE + Δε.

To estimate this model, I will first create three change-score variables each for FSDE and PSDE by subtracting each wave's score from the score at the following wave.

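# Subtract each column from its right-hand neighbor (columns 2-8 minus columns 1-7)
tbl.2.a <- tbl.2 %>%
  select(contains("SDE")) %>%
  {
    .[2:length(.)] - .[1:(length(.) - 1)]
  } %>%
  # Drop the meaningless cross-variable difference (FSDE1 - PSDE4)
  select(-FSDE1) %>%
  # Label each change score with its later and earlier wave, e.g. PSDE2_1
  set_names(paste(names(.), rep(1:3, 2), sep = "_")) %>%
  as_tibble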

tbl.2.a %>%
  head %>%
  kable

PSDE2_1  PSDE3_2  PSDE4_3  FSDE2_1  FSDE3_2  FSDE4_3
  -0.75     0.75     0.25    -1.00     0.00    -1.25
  -0.25    -1.00     0.00    -3.75     0.50    -1.25
   0.25    -1.25    -1.00     0.50    -1.75    -2.25
   0.50    -1.75    -0.25     0.25    -1.50    -1.00
   1.50     0.00    -0.50     2.75     0.50    -0.50
  -0.50     1.25    -0.75    -1.00     0.50    -0.25
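
As a cross-check on the change scores, the same differences can be computed one variable at a time in base R. This is a minimal sketch, not the code used for the analysis: `change_scores` is a hypothetical helper, and the input holds only the first two participants from the data preview.

```r
# First two participants from the data preview (PSDE and FSDE, waves 1-4).
wide <- data.frame(
  PSDE1 = c(6.00, 3.50), PSDE2 = c(5.25, 3.25),
  PSDE3 = c(6.00, 2.25), PSDE4 = c(6.25, 2.25),
  FSDE1 = c(2.25, 2.50), FSDE2 = c(1.25, -1.25),
  FSDE3 = c(1.25, -0.75), FSDE4 = c(0.00, -2.00)
)

# Hypothetical helper: wave t minus wave t - 1 for one variable prefix.
change_scores <- function(data, prefix, waves = 2:4) {
  out <- lapply(waves, function(t) {
    data[[paste0(prefix, t)]] - data[[paste0(prefix, t - 1)]]
  })
  names(out) <- paste0(prefix, waves, "_", waves - 1)
  as.data.frame(out)
}

deltas <- cbind(change_scores(wide, "PSDE"), change_scores(wide, "FSDE"))
print(deltas)
```

Working per prefix avoids ever forming a difference across the PSDE/FSDE boundary, so no column has to be dropped afterwards.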

Next I will estimate the three models using OLS.

# in:  t = latter time
# out: first-difference linear regression model
fun.2.a.1 <- function(data, t) {
  glue("FSDE{t}_{t-1} ~ PSDE{t}_{t-1}") %>%
    as.character %>%
    as.formula %>%
    lm(data)
}

2:4 %>%
  map_df(~ fun.2.a.1(tbl.2.a, .) %>% tidy()) %>%
  kable

term           estimate  std.error   statistic    p.value
(Intercept)   0.0390433  0.1143646   0.3413929  0.7338161
PSDE2_1       0.8323296  0.1214906   6.8509778  0.0000000
(Intercept)  -0.0510947  0.0661938  -0.7718960  0.4427377
PSDE3_2       0.5133903  0.0659122   7.7889982  0.0000000
(Intercept)  -0.1049785  0.0922747  -1.1376735  0.2590816
PSDE4_3       0.5884570  0.1318578   4.4628139  0.0000297

Change in PSDE significantly predicts change in FSDE from time 1 to time 2 (b1 ≈ .832, t ≈ 6.85, p < .001), from time 2 to time 3 (b1 ≈ .513, t ≈ 7.79, p < .001), and from time 3 to time 4 (b1 ≈ .588, t ≈ 4.46, p < .001). The coefficient is largest for the first interval, although the t statistic is largest for the second.

Now let’s look at the reliabilities of the change scores.
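
For reference, the standard expression for the reliability of a difference score $D = X - Y$ is sketched below. It assumes that the measurement errors of $X$ and $Y$ are uncorrelated; the notation ($\sigma^2$ for variances, $\rho_{XX'}$ and $\rho_{YY'}$ for the component reliabilities, $\sigma_{XY}$ for the covariance) is chosen here for illustration.

```latex
\rho_{DD'} =
  \frac{\sigma_X^2 \, \rho_{XX'} + \sigma_Y^2 \, \rho_{YY'} - 2\,\sigma_{XY}}
       {\sigma_X^2 + \sigma_Y^2 - 2\,\sigma_{XY}}
```

The denominator is the observed variance of the difference and the numerator is its true-score variance, so the reliability of the difference falls as the two components become more strongly correlated.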

# in: rel = vector of reliabilities
#     v = "P" for PSDE or "F" for FSDE
#     t = time
# out: reliability of difference at time t
fun.2.a.2 <- function(data, rel, v, t) {
  x <- data[paste0(v, "SDE", t)]
  y <- data[paste0(v, "SDE", t - 1)]
  s2x <- var(x)
  s2y <- var(y)
  ax <- rel[t]
  ay <- rel[t - 1]
  sxy <- cov(x, y)
  # Denominator is the variance of the difference: s2x + s2y - 2 * sxy
  ((s2x * ax) + (s2y * ay) - (2 * sxy)) / (s2x + s2y - (2 * sxy))
}

map(2:4, ~ fun.2.a.2(tbl.2, dbl.2.p, "P", .)) %>%
  flatten_dbl %>%
  set_names(c("PSDE2_1", "PSDE3_2", "PSDE4_3")) %>%
  enframe("variable", "reliability") %>%
  bind_rows(
    map(2:4, ~ fun.2.a.2(tbl.2, dbl.2.f, "F", .)) %>%
      flatten_dbl %>%
      set_names(c("FSDE2_1", "FSDE3_2", "FSDE4_3")) %>%
      enframe("variable", "reliability")
  ) %>%
  mutate(reliability = round(reliability, 3)) %>%
  kable

variable  reliability
PSDE2_1         0.630
PSDE3_2         0.771
PSDE4_3         0.487
FSDE2_1         0.858
FSDE3_2         0.679
FSDE4_3         0.791

The original reliabilities ranged from .864 for PSDE1 to .947 for FSDE4. The change-score reliabilities are lower, ranging from .487 for PSDE4_3 to .858 for FSDE2_1.

To understand the relationships among the four variables implicitly embedded in the first-difference model, we could use an approach similar to the one presented by Edwards (2002): a regression equation with the difference-scored variables pulled apart into their component parts. However, because the dependent variable is also a difference score, we would need a multivariate method, such as canonical correlation analysis (CCA) or multivariate multiple regression (MMR) (Dwyer, 1983; Thompson, 1991).

B. Static-Score Model

To develop a static-score model for the data, I relied on Equation 2.5 from Finkel (1995):

Yₜ = β₀ + β₁Xₜ + β₂Yₜ₋₁ + εₜ

Whereas the first-difference approach modeled change in Y as a function of change in X, the static-score approach models Y as a function of X and the prior value of Y. As before, we can get estimates for three models, representing change from period one to two, two to three, and three to four.
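
The link between the two specifications can be made explicit by subtracting $Y_{t-1}$ from both sides of the static-score equation. This is a short derivation using the same symbols as the equation above:

```latex
\begin{aligned}
Y_t &= \beta_0 + \beta_1 X_t + \beta_2 Y_{t-1} + \varepsilon_t \\
Y_t - Y_{t-1} &= \beta_0 + \beta_1 X_t + (\beta_2 - 1)\, Y_{t-1} + \varepsilon_t
\end{aligned}
```

Modeling the change score without $Y_{t-1}$ on the right-hand side therefore amounts to fixing $\beta_2 = 1$, whereas the static-score model estimates $\beta_2$ from the data.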

map(2:4, ~ glue("FSDE{.} ~ PSDE{.} + FSDE{. - 1}")) %>%
  map(lm, data = tbl.2) %>%
  map_df(tidy) %>%
  kable

term           estimate  std.error  statistic    p.value
(Intercept)  -2.7326594  0.3994077  -6.841779  0.0000000
PSDE2         0.7720880  0.0816718   9.453548  0.0000000
FSDE1         0.1347573  0.0958955   1.405252  0.1643690
(Intercept)  -1.6897887  0.3182402  -5.309791  0.0000012
PSDE3         0.4790255  0.0677060   7.075081  0.0000000
FSDE2         0.4406076  0.0625405   7.045160  0.0000000
(Intercept)  -2.4939910  0.5087453  -4.902239  0.0000059
PSDE4         0.6442929  0.1134432   5.679430  0.0000003
FSDE3         0.3405407  0.1130842   3.011391  0.0036160

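First, the lagged dependent variable (Yₜ₋₁) has a significant effect at time three (b ≈ .441, p < .001) and time four (b ≈ .341, p = .004), but not at time two (b ≈ .135, p = .164). That suggests some degree of serial autocorrelation in FSDE.

Second, PSDE positively and significantly predicts FSDE at all three time periods, even after controlling for the prior level of FSDE. The two variables are therefore related beyond what the stability of FSDE alone would explain.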

C. Hierarchical Linear Model

In HLM, we model change for an individual with intercept and slope parameters using variance across time periods. Each individual will have their own intercept and slope, representing their personal rate of change.
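
Written as a two-level model, this idea can be sketched as follows. The symbols $\gamma$, $u$, $e$, and the person-level predictor $W_i$ are illustrative notation introduced here; $A_{0i}$ and $A_{1i}$ correspond to the A0 and A1 columns in the data.

```latex
\begin{aligned}
\text{Level 1 (within person):} \quad & Y_{it} = A_{0i} + A_{1i}\, t + e_{it} \\
\text{Level 2 (between persons):} \quad & A_{0i} = \gamma_{00} + \gamma_{01} W_i + u_{0i} \\
                                        & A_{1i} = \gamma_{10} + \gamma_{11} W_i + u_{1i}
\end{aligned}
```

Level 1 describes each person's trajectory over the time codes, and Level 2 asks whether a person-level characteristic predicts where that trajectory starts and how steeply it moves.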

In the present case, we can model the effect of PSDE on an individual's intercept (A0) and rate of change (A1) in FSDE. I chose PSDE1 as the predictor because it is the baseline value of PSDE in the dataset. (Note: I also tested PSDE2 through PSDE4, and the effects on A0 and A1 diminish over time, which I think makes sense.)
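
The A0 and A1 values in the data preview are consistent with a separate OLS regression of FSDE on time (coded 0 to 3) for each person. That is an inference from the preview rows rather than something the dataset documents, so the sketch below should be read as a plausibility check using only the first two participants.

```r
# FSDE scores for the first two participants in the preview table (IDs 2 and 6).
fsde <- rbind(
  "2" = c(2.25, 1.25, 1.25, 0.00),
  "6" = c(2.50, -1.25, -0.75, -2.00)
)
time <- 0:3  # same wave coding as int.2.t

# Fit FSDE ~ time separately for each person and keep intercept and slope.
growth <- t(apply(fsde, 1, function(y) coef(lm(y ~ time))))
colnames(growth) <- c("A0", "A1")
print(round(growth, 3))
```

For these two participants the fitted intercepts and slopes reproduce the A0 and A1 values shown in the preview, which supports reading A0 as the person's level at the first wave and A1 as the per-wave rate of change.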

First, let’s look at the effect of PSDE1 on A0.

tbl.2 %>%
  lm(A0 ~ PSDE1, .) %>%
  tidy %>%
  kable

term           estimate  std.error  statistic  p.value
(Intercept)  -2.0970079  0.3658623  -5.731686    2e-07
PSDE1         0.6914513  0.0632065  10.939555    0e+00

The effect is positive and significant. We can interpret that to mean that the stronger an individual’s sense of perceived self-development opportunities, the higher one’s baseline level of feelings about self-development opportunities. Nothing terribly surprising there.

Next, let’s look at the effect of PSDE1 on A1.

tbl.2 %>%
  lm(A1 ~ PSDE1, .) %>%
  tidy %>%
  kable

term           estimate  std.error  statistic    p.value
(Intercept)   0.5530551  0.2477851   2.231995  0.0287721
PSDE1        -0.1169434  0.0428075  -2.731845  0.0079404

The effect is negative and significant. Because A1 is a signed slope, this means that the stronger an individual's initial sense of perceived self-development opportunities, the more negative the trajectory of their feelings about those opportunities. That is, people who start out perceiving plenty of opportunities tend to show a decline in their feelings over the study, whereas people who start out with low perceptions tend to show an increase.
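
One way to read these coefficients is to compute the predicted slope at a few values of PSDE1. The sketch below copies the estimates from the table above and the minimum, mean, and maximum of PSDE1 from the descriptive statistics; the variable names are arbitrary.

```r
# Coefficients copied from the A1 ~ PSDE1 table above.
b0 <- 0.5530551
b1 <- -0.1169434

# Predicted slope (A1) at the observed minimum, mean, and maximum of PSDE1.
psde1 <- c(min = 1.75, mean = 5.6780822, max = 7.00)
predicted_a1 <- b0 + b1 * psde1
print(round(predicted_a1, 3))

# PSDE1 value at which the predicted slope is zero.
zero_point <- -b0 / b1
print(round(zero_point, 2))
```

The predicted slope is positive at the low end of PSDE1 and negative at the high end, and at the mean of PSDE1 it matches the mean of A1 reported in the descriptive statistics.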

Finally, let’s see whether there are gender differences in A0 and A1.

tbl.2 %>%
  lm(A0 ~ GENDER, .) %>%
  tidy %>%
  kable

term           estimate  std.error  statistic    p.value
(Intercept)   2.2500000  0.2422887   9.286443  0.0000000
GENDER       -0.5390351  0.2741935  -1.965893  0.0532222

tbl.2 %>%
  lm(A1 ~ GENDER, .) %>%
  tidy %>%
  kable

term           estimate  std.error   statistic    p.value
(Intercept)  -0.1406250  0.1080169  -1.3018797  0.1971656
GENDER        0.0379934  0.1222407   0.3108084  0.7568569

The effect of gender on A0 is negative and falls just short of significance (b ≈ -0.539, p = .053). If I were to interpret the result anyway, I would say that female participants had more positive baseline feelings about self-development opportunities than male participants.

There was no significant gender difference on change in feelings about self-development opportunities.

D. Conclusion

Overall, there’s a positive relationship between perceptions and feelings about self-development opportunities that is stable over time. There’s also a positive relationship between the change in perceptions and the change in feelings. As the first-difference model showed, a change in perceived self-development opportunities was associated with a change in feelings toward self-development opportunities in all time periods.

There did appear to be some stability in the variables themselves, but even when controlling for the lagged dependent variable, there was a significant relationship between perceptions and feelings at all time periods, as shown by the static-score model.

Finally, the hierarchical linear model allowed us to examine the rates of change. Using HLM, I found a significant negative relationship between initial perceived self-development opportunities (PSDE1) and the individual rate of change in feelings toward self-development opportunities (A1). The higher a participant scored on perceptions at the initial assessment, the more their feelings tended to decline over the course of the study.

References

Dwyer. (1983). Multivariate Regression and Multivariate ANOVA. In Statistical Models for the Social and Behavioral Sciences.

Edwards, J. R. (2002). Alternatives to Difference Scores. In Advances in measurement and data analysis.

Finkel, S. E. (1995). Modeling Change with Panel Data. In Causal analysis with panel data.

Thompson, B. (1991). A Primer on the Logic and Use of Canonical Correlation Analysis. Measurement and Evaluation in Counseling and Development.