The first-difference model improves on the cross-sectional model by reducing omitted variable bias: if the omitted variable does not change over time, it cancels out when it is subtracted from itself in the change score. But the model assumes that the lagged dependent variable has no relationship to the dependent variable, which is usually not a viable assumption.
The static-score model improves on the first-difference model by including the lagged dependent variable, which often influences the dependent variable or the change score. Including the lagged dependent variable also controls for regression-to-the-mean effects.
The hierarchical linear model improves on both of the previous models in two ways. First, by partitioning variance into a between-subjects effect and a within-subjects effect, it allows researchers to understand effects and make predictions at the level of the subject. Second, by aggregating over more than just two time points, it may produce better estimates of change.
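The omitted-variable argument for the first-difference model can be illustrated with a small simulation. This is only a sketch on made-up data: the variable names (`u`, `x1`, `x2`, `y1`, `y2`), the true effect of 0.5, and the sample size are all assumptions chosen for illustration and have nothing to do with the dataset analyzed below.

```r
set.seed(1)
n <- 5000
beta <- 0.5                      # true effect of x on y (assumed)

u  <- rnorm(n)                   # time-invariant omitted variable
x1 <- 0.8 * u + rnorm(n)         # predictor at wave 1, correlated with u
x2 <- 0.8 * u + rnorm(n)         # predictor at wave 2, correlated with u
y1 <- beta * x1 + u + rnorm(n)   # outcome at wave 1
y2 <- beta * x2 + u + rnorm(n)   # outcome at wave 2

# Cross-sectional model at wave 2: u is omitted, so the slope is biased upward
b_cross <- unname(coef(lm(y2 ~ x2))["x2"])

# First-difference model: u cancels out of y2 - y1
dy <- y2 - y1
dx <- x2 - x1
b_fd <- unname(coef(lm(dy ~ dx))["dx"])

round(c(cross_sectional = b_cross, first_difference = b_fd), 3)
```

Under these assumptions the cross-sectional slope should land well above 0.5, while the first-difference slope should land close to it.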
Here are the packages I will be using:
# R Markdown
library(knitr)
# Projects
library(here)
# Statistics
library(psych)
# Tidyverse
library(readxl)
library(tidyverse)
library(broom)
library(glue)
I will import and take a look at the data.
tbl.2 <- read_excel(here('data', 'sde.xlsx'))
tbl.2[1:10, ] %>%
kable
ID | GENDER | PSDE1 | PSDE2 | PSDE3 | PSDE4 | FSDE1 | FSDE2 | FSDE3 | FSDE4 | A0 | A1 |
---|---|---|---|---|---|---|---|---|---|---|---|
2 | 1 | 6.00 | 5.25 | 6.00 | 6.25 | 2.25 | 1.25 | 1.25 | 0.00 | 2.200 | -0.675 |
6 | 1 | 3.50 | 3.25 | 2.25 | 2.25 | 2.50 | -1.25 | -0.75 | -2.00 | 1.575 | -1.300 |
8 | 1 | 6.75 | 7.00 | 5.75 | 4.75 | 2.25 | 2.75 | 1.00 | -1.25 | 3.025 | -1.225 |
9 | 1 | 4.50 | 5.00 | 3.25 | 3.00 | 1.00 | 1.25 | -0.25 | -1.25 | 1.425 | -0.825 |
14 | 1 | 4.00 | 5.50 | 5.50 | 5.00 | -1.25 | 1.50 | 2.00 | 1.50 | -0.375 | 0.875 |
15 | 1 | 4.75 | 4.25 | 5.50 | 4.75 | 2.50 | 1.50 | 2.00 | 1.75 | 2.200 | -0.175 |
23 | 1 | 6.00 | 6.50 | 6.75 | 6.00 | 2.25 | 2.25 | 2.25 | 2.75 | 2.150 | 0.150 |
24 | 0 | 6.50 | 6.25 | 5.75 | 5.75 | 2.00 | 2.75 | 2.25 | 2.25 | 2.275 | 0.025 |
27 | 1 | 7.00 | 6.50 | 5.25 | 6.75 | 2.25 | 2.75 | 2.50 | 2.25 | 2.475 | -0.025 |
28 | 1 | 4.50 | 4.25 | 6.00 | 6.75 | 1.50 | 1.50 | 2.00 | 2.50 | 1.350 | 0.350 |
describe(tbl.2)[c('n', 'mean', 'sd', 'min', 'max')] %>%
kable
variable | n | mean | sd | min | max |
---|---|---|---|---|---|
ID | 73 | 115.6301370 | 69.6035413 | 2.00 | 231.000 |
GENDER | 73 | 0.7808219 | 0.4165525 | 0.00 | 1.000 |
PSDE1 | 73 | 5.6780822 | 1.1322908 | 1.75 | 7.000 |
PSDE2 | 73 | 5.3390411 | 1.1639625 | 2.00 | 7.000 |
PSDE3 | 73 | 5.3184932 | 1.1080986 | 2.25 | 7.000 |
PSDE4 | 73 | 5.4212329 | 1.0679003 | 2.25 | 7.000 |
FSDE1 | 73 | 1.8869863 | 0.9913174 | -1.25 | 3.000 |
FSDE2 | 73 | 1.6438356 | 1.1996222 | -1.75 | 3.000 |
FSDE3 | 73 | 1.5821918 | 1.0712911 | -1.50 | 3.000 |
FSDE4 | 73 | 1.5376712 | 1.2095333 | -2.00 | 3.000 |
A0 | 73 | 1.8291096 | 0.9882470 | -1.05 | 3.425 |
A1 | 73 | -0.1109589 | 0.4293484 | -1.30 | 1.075 |
No missing data, sufficient variation—looks good!
Finally, I will capture the reliabilities and time codes, as well as the categories for gender.
dbl.2.p <- c(.864, .915, .903, .886)
dbl.2.f <- c(.910, .927, .928, .947)
int.2.t <- 0:3
tbl.2 <- tbl.2 %>%
mutate(gen = factor(GENDER, labels = c("female", "male")))
To implement the first-difference approach, I relied on Equation 2 in Finkel (1995):
ΔY = Δβ₀ + β₁ΔX + β₂ΔZ + Δε,
where Δ signifies the change in a variable between time periods and Z is an additional independent variable.
This equation models the change in the dependent variable between time t and time t+1 as a function of the change in the intercept and change in the independent variables.
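To see where Equation 2 comes from, it can help to write out the level equations for two adjacent waves and subtract one from the other. In the sketch below, the time-invariant unobserved term U_i is notation introduced here for illustration, not taken from Finkel (1995), and the slopes β₁ and β₂ are assumed to be constant across waves:

```latex
\begin{aligned}
Y_{i,t+1} &= \beta_{0,t+1} + \beta_1 X_{i,t+1} + \beta_2 Z_{i,t+1} + U_i + \varepsilon_{i,t+1} \\
Y_{i,t}   &= \beta_{0,t}   + \beta_1 X_{i,t}   + \beta_2 Z_{i,t}   + U_i + \varepsilon_{i,t} \\
\Delta Y_i &= \Delta\beta_0 + \beta_1 \Delta X_i + \beta_2 \Delta Z_i + \Delta\varepsilon_i
\end{aligned}
```

Because U_i appears in both level equations with the same value, it drops out of the third line, which has the same form as Equation 2.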
Inputting our variables into Equation 2, we have:
ΔFSDE = Δβ₀ + β₁ΔPSDE + Δε.
To estimate this model, I will first create three new variables each for ΔFSDE and ΔPSDE by subtracting each wave’s score from the score at the following wave.
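tbl.2.a <- tbl.2 %>%
  select(contains("SDE")) %>%
  {
    # each column minus the column to its left (wave t minus wave t - 1)
    .[2:length(.)] - .[1:(length(.) - 1)]
  } %>%
  # drop the meaningless FSDE1 - PSDE4 column
  select(-FSDE1) %>%
  set_names(paste(names(.), rep(1:3, 2), sep = "_")) %>%
  as_tibble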
tbl.2.a %>%
head %>%
kable
PSDE2_1 | PSDE3_2 | PSDE4_3 | FSDE2_1 | FSDE3_2 | FSDE4_3 |
---|---|---|---|---|---|
-0.75 | 0.75 | 0.25 | -1.00 | 0.00 | -1.25 |
-0.25 | -1.00 | 0.00 | -3.75 | 0.50 | -1.25 |
0.25 | -1.25 | -1.00 | 0.50 | -1.75 | -2.25 |
0.50 | -1.75 | -0.25 | 0.25 | -1.50 | -1.00 |
1.50 | 0.00 | -0.50 | 2.75 | 0.50 | -0.50 |
-0.50 | 1.25 | -0.75 | -1.00 | 0.50 | -0.25 |
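The column-shift subtraction used above is compact but easy to misread, so here is a minimal base-R sketch of the same idea for the PSDE columns only. The three rows are copied from the data preview shown earlier; the object names `psde` and `d_psde` are hypothetical.

```r
# First three rows of the PSDE columns, as displayed in the data preview
psde <- data.frame(
  PSDE1 = c(6.00, 3.50, 6.75),
  PSDE2 = c(5.25, 3.25, 7.00),
  PSDE3 = c(6.00, 2.25, 5.75),
  PSDE4 = c(6.25, 2.25, 4.75)
)

# Waves 2-4 minus waves 1-3: each later wave minus the wave before it
d_psde <- psde[, 2:4] - psde[, 1:3]
names(d_psde) <- c("PSDE2_1", "PSDE3_2", "PSDE4_3")
d_psde
```

The resulting rows should agree with the first three rows of the PSDE difference columns in the table above.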
Next I will estimate the three models using OLS.
# in: t = latter time
# out: first-difference linear regression model
fun.2.a.1 <- function(data, t) {
  glue("FSDE{t}_{t-1} ~ PSDE{t}_{t-1}") %>%
    as.character %>%
    as.formula %>%
    lm(data)
}
2:4 %>%
map_df(~ fun.2.a.1(tbl.2.a, .) %>% tidy()) %>%
kable
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 0.0390433 | 0.1143646 | 0.3413929 | 0.7338161 |
PSDE2_1 | 0.8323296 | 0.1214906 | 6.8509778 | 0.0000000 |
(Intercept) | -0.0510947 | 0.0661938 | -0.7718960 | 0.4427377 |
PSDE3_2 | 0.5133903 | 0.0659122 | 7.7889982 | 0.0000000 |
(Intercept) | -0.1049785 | 0.0922747 | -1.1376735 | 0.2590816 |
PSDE4_3 | 0.5884570 | 0.1318578 | 4.4628139 | 0.0000297 |
Change in PSDE significantly predicts change in FSDE from time 1 to time 2 (b1 ≈ .832, t ≈ 6.85, p < .001), from time 2 to time 3 (b1 ≈ .513, t ≈ 7.79, p < .001), and from time 3 to time 4 (b1 ≈ .588, t ≈ 4.46, p < .001). The coefficient is largest for the first interval, while the t statistic is largest for the second.
Now let’s look at the reliabilities of the change scores.
# in: data = data frame holding the wave-level scores
# rel = vector of reliabilities, one per wave
# v = "P" for PSDE or "F" for FSDE
# t = later wave (2, 3, or 4)
# out: reliability of the difference between waves t and t - 1
fun.2.a.2 <- function(data, rel, v, t) {
  x <- data[[paste0(v, "SDE", t)]]
  y <- data[[paste0(v, "SDE", t - 1)]]
  s2x <- var(x)
  s2y <- var(y)
  ax <- rel[t]
  ay <- rel[t - 1]
  sxy <- cov(x, y)
  # the denominator is the variance of the difference score, var(x - y)
  ((s2x * ax) + (s2y * ay) - (2 * sxy)) / (s2x + s2y - (2 * sxy))
}
map(2:4, ~ fun.2.a.2(tbl.2, dbl.2.p, "P", .)) %>%
  flatten_dbl %>%
  set_names(c("PSDE2_1", "PSDE3_2", "PSDE4_3")) %>%
  enframe("variable", "reliability") %>%
  bind_rows(
    map(2:4, ~ fun.2.a.2(tbl.2, dbl.2.f, "F", .)) %>%
      flatten_dbl %>%
      set_names(c("FSDE2_1", "FSDE3_2", "FSDE4_3")) %>%
      enframe("variable", "reliability")
  ) %>%
  kable(digits = 3)
variable | reliability |
---|---|
PSDE2_1 | 0.630 |
PSDE3_2 | 0.771 |
PSDE4_3 | 0.487 |
FSDE2_1 | 0.858 |
FSDE3_2 | 0.679 |
FSDE4_3 | 0.791 |
The original reliabilities ranged from .864 for PSDE1 to .947 for FSDE4. The reliabilities of the change scores are lower throughout, ranging from .487 for PSDE4_3 to .858 for FSDE2_1.
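For reference, the standard classical-test-theory expression for the reliability of a difference score D = X − Y, which is what a function like `fun.2.a.2` is meant to compute, can be written as follows. The symbols map onto the code as σ²_X = `s2x`, σ²_Y = `s2y`, ρ_XX' = `ax`, ρ_YY' = `ay`, and σ_XY = `sxy`; the equal-variance special case on the second line is added here only as a quick sanity check.

```latex
\rho_{DD'} = \frac{\sigma_X^2\,\rho_{XX'} + \sigma_Y^2\,\rho_{YY'} - 2\,\sigma_{XY}}
                  {\sigma_X^2 + \sigma_Y^2 - 2\,\sigma_{XY}}
\qquad\text{and, if } \sigma_X^2 = \sigma_Y^2 \text{ and } \rho_{XX'} = \rho_{YY'} = \rho:\qquad
\rho_{DD'} = \frac{\rho - r_{XY}}{1 - r_{XY}}
```

As a worked example of the special case, two waves that each have reliability .90 and correlate .50 with each other give a difference-score reliability of (.90 − .50) / (1 − .50) = .80. The more strongly the two waves correlate, the less reliable their difference becomes.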
To understand the relationships among the four variables implicitly embedded in the first-difference model, we could use an approach similar to the one presented by Edwards (2002): a regression equation with the difference-scored variables pulled apart into their component parts. However, because the dependent variable is also a difference score, we would need a multivariate regression method, such as canonical correlation analysis (CCA) or multivariate multiple regression (MMR) (Dwyer, 1983; Thompson, 1991).
To develop a static-score model for the data, I relied on Equation 2.5 from Finkel (1995):
Yₜ = β₀ + β₁Xₜ + β₂Yₜ₋₁ + εₜ
Whereas the first-difference approach modeled change in Y as a function of change in X, the static-score approach models Y as a function of X and the prior value of Y. As before, we can get estimates for three models, representing change from period one to two, two to three, and three to four.
map(2:4, ~ glue("FSDE{.} ~ PSDE{.} + FSDE{. - 1}")) %>%
map(lm, data = tbl.2) %>%
map_df(tidy) %>%
kable
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -2.7326594 | 0.3994077 | -6.841779 | 0.0000000 |
PSDE2 | 0.7720880 | 0.0816718 | 9.453548 | 0.0000000 |
FSDE1 | 0.1347573 | 0.0958955 | 1.405252 | 0.1643690 |
(Intercept) | -1.6897887 | 0.3182402 | -5.309791 | 0.0000012 |
PSDE3 | 0.4790255 | 0.0677060 | 7.075081 | 0.0000000 |
FSDE2 | 0.4406076 | 0.0625405 | 7.045160 | 0.0000000 |
(Intercept) | -2.4939910 | 0.5087453 | -4.902239 | 0.0000059 |
PSDE4 | 0.6442929 | 0.1134432 | 5.679430 | 0.0000003 |
FSDE3 | 0.3405407 | 0.1130842 | 3.011391 | 0.0036160 |
First, the lagged dependent variable (Yₜ₋₁) has a significant effect at time three and time four, but not at time two (p ≈ .164). That suggests some degree of serial autocorrelation in FSDE.
Second, PSDE positively and significantly predicts FSDE at all time periods, even after controlling for the prior value of FSDE.
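One way to connect the static-score model back to change scores is to subtract Yₜ₋₁ from both sides of the static-score equation. This is a standard algebraic rearrangement, shown here as an illustration rather than as something taken from Finkel (1995):

```latex
Y_t - Y_{t-1} = \beta_0 + \beta_1 X_t + (\beta_2 - 1)\,Y_{t-1} + \varepsilon_t
```

Read this way, a coefficient β₂ below 1 means that change depends negatively on the prior level, which is the regression-to-the-mean pattern that the lagged term absorbs. With the time-three estimate above (β₂ ≈ .441), for example, the implied coefficient on FSDE2 in the change equation is about .441 − 1 = −.559.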
In HLM, we model change for an individual with intercept and slope parameters using variance across time periods. Each individual will have their own intercept and slope, representing their personal rate of change.
In the present case, we can model the effect of PSDE on an individual’s rate of change in FSDE, using the intercept (A0) and slope (A1) columns in the dataset. I chose PSDE1 as the predictor because it is the baseline value of PSDE in the dataset. (Note: I tested PSDE2-4, and the effects on A0 and A1 diminish over time. I think that makes sense.)
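The A0 and A1 columns appear to be consistent with person-specific OLS intercepts and slopes of FSDE regressed on the time codes 0 to 3. The sketch below checks this for the first two participants in the data preview (IDs 2 and 6). How A0 and A1 were actually produced is not stated here, so treat this as a consistency check under that assumption; the object names `fsde` and `growth` are hypothetical, and a full HLM would estimate these quantities jointly rather than person by person.

```r
# FSDE1-FSDE4 for the first two participants in the data preview (IDs 2 and 6)
fsde <- rbind(
  c(2.25,  1.25,  1.25,  0.00),
  c(2.50, -1.25, -0.75, -2.00)
)
time <- 0:3  # same time codes as int.2.t

# Fit FSDE ~ time separately for each person; keep the intercept and slope
growth <- t(apply(fsde, 1, function(y) coef(lm(y ~ time))))
colnames(growth) <- c("A0", "A1")
growth
```

For these two rows the fitted intercepts and slopes should reproduce the A0 and A1 values shown in the preview (2.200 and -0.675 for ID 2, 1.575 and -1.300 for ID 6).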
First, let’s look at the effect of PSDE1 on A0.
tbl.2 %>%
lm(A0 ~ PSDE1, .) %>%
tidy %>%
kable
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -2.0970079 | 0.3658623 | -5.731686 | 2e-07 |
PSDE1 | 0.6914513 | 0.0632065 | 10.939555 | 0e+00 |
The effect is positive and significant. We can interpret that to mean that the stronger an individual’s sense of perceived self-development opportunities, the higher one’s baseline level of feelings about self-development opportunities. Nothing terribly surprising there.
Next, let’s look at the effect of PSDE1 on A1.
tbl.2 %>%
lm(A1 ~ PSDE1, .) %>%
tidy %>%
kable
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 0.5530551 | 0.2477851 | 2.231995 | 0.0287721 |
PSDE1 | -0.1169434 | 0.0428075 | -2.731845 | 0.0079404 |
The effect is negative and significant. This means that the stronger an individual’s sense of perceived self-development opportunities at time 1, the lower the slope of their feelings about self-development opportunities. Because the predicted slope is already slightly negative at the mean of PSDE1 (0.553 − 0.117 × 5.68 ≈ −0.11), people who start out perceiving plenty of opportunities tend to show a steeper decline in their feelings over the study.
Finally, let’s see whether there are gender differences in A0 and A1.
tbl.2 %>%
lm(A0 ~ GENDER, .) %>%
tidy %>%
kable
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 2.2500000 | 0.2422887 | 9.286443 | 0.0000000 |
GENDER | -0.5390351 | 0.2741935 | -1.965893 | 0.0532222 |
tbl.2 %>%
lm(A1 ~ GENDER, .) %>%
tidy %>%
kable
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -0.1406250 | 0.1080169 | -1.3018797 | 0.1971656 |
GENDER | 0.0379934 | 0.1222407 | 0.3108084 | 0.7568569 |
The effect of gender on A0 is negative and falls just short of conventional significance (p ≈ .053). If I were to interpret the result anyway, I would say that female participants (coded 0) had more positive baseline feelings about self-development opportunities than male participants (coded 1).
There was no significant gender difference on change in feelings about self-development opportunities.
Overall, there’s a positive relationship between perceptions and feelings about self-development opportunities that is stable over time. There’s also a positive relationship between the change in perceptions and the change in feelings. As the first-difference model showed, a change in perceived self-development opportunities was associated with a change in feelings toward self-development opportunities in all time periods.
There did appear to be some stability in the variables themselves, but even when controlling for the lagged dependent variable, there was a significant relationship between perceptions and feelings at all time periods, as shown by the static-score model.
Finally, the hierarchical linear model allowed us to examine the rates of change. Using HLM, I found a significant negative relationship between the level of initial perceived self-development opportunities (PSDE1) and the rate of change of feelings toward self-development opportunities (A1). The higher a participant scored on perceptions at the initial assessment, the more negative the trajectory of their feelings over the course of the study.
Dwyer. (1983). Multivariate Regression and Multivariate ANOVA. In Statistical Models for the Social and Behavioral Sciences.
Edwards, J. R. (2002). Alternatives to Difference Scores. In Advances in measurement and data analysis.
Finkel, S. E. (1995). Modeling Change with Panel Data. In Causal analysis with panel data.
Thompson, B. (1991). A Primer on the Logic and Use of Canonical Correlation Analysis. Measurement and Evaluation in Counseling and Development.