I now have all the data collected (unless I have to throw some out and replace it), so let’s look at some descriptive statistics!
First, import packages.
# import packages
library(corrr)
library(tidyverse)
library(flextable)
library(psych)
library(lavaan)
library(semTools)

Import my R objects from previous documents.
formatAsTable <- readRDS("format.rds")
adhd.data <- readRDS(file.path("..", "data", "quality-data.rds"))

The manipulations should have been randomized 50/50 by Qualtrics such that we end up with about 100 per cell, but let's check.
adhd.data %>%
count(interdep, disclose) %>%
formatAsTable

| interdep | disclose | n   |
|----------|----------|-----|
| FALSE    | FALSE    | 107 |
| FALSE    | TRUE     | 115 |
| TRUE     | FALSE    | 115 |
| TRUE     | TRUE     | 107 |
Sweet, that looks just about right.
And I’ve already looked at the manipulation checks in a previous document, but I’ll port that analysis in here.
adhd.data <- adhd.data %>%
mutate(across(where(is.factor),
~ as.integer(.),
.names = "{.col}.int"))
adhd.data %>%
count(interdep, intcheck) %>%
arrange(interdep, intcheck) %>%
formatAsTable %>%
bg(bg = "#074005",
i = ~ ifelse(interdep == 0,
as.integer(intcheck) < 3,
as.integer(intcheck) > 3))

| interdep | intcheck                   | n   |
|----------|----------------------------|-----|
| FALSE    | Strongly disagree          | 133 |
| FALSE    | Somewhat disagree          | 35  |
| FALSE    | Neither agree nor disagree | 24  |
| FALSE    | Somewhat agree             | 20  |
| FALSE    | Strongly agree             | 10  |
| TRUE     | Strongly disagree          | 3   |
| TRUE     | Somewhat disagree          | 8   |
| TRUE     | Neither agree nor disagree | 21  |
| TRUE     | Somewhat agree             | 77  |
| TRUE     | Strongly agree             | 113 |
adhd.data %>%
count(disclose, discheck) %>%
arrange(disclose, discheck) %>%
formatAsTable %>%
bg(bg = "#074005",
i = ~ ifelse(disclose == 0,
as.integer(discheck) < 3,
as.integer(discheck) > 3))

| disclose | discheck                   | n   |
|----------|----------------------------|-----|
| FALSE    | Strongly disagree          | 197 |
| FALSE    | Somewhat disagree          | 10  |
| FALSE    | Neither agree nor disagree | 4   |
| FALSE    | Somewhat agree             | 7   |
| FALSE    | Strongly agree             | 4   |
| TRUE     | Somewhat disagree          | 2   |
| TRUE     | Neither agree nor disagree | 4   |
| TRUE     | Somewhat agree             | 17  |
| TRUE     | Strongly agree             | 199 |
# correlation matrix with variances on the diagonal
adhd.data %>%
select(c(interdep, intcheck.int,
disclose, discheck.int)) %>%
correlate(diagonal = map_dbl(., ~ var(.))) %>%
shave %>%
fashion(leading_zeros = F) %>%
formatAsTable

| term         | interdep | intcheck.int | disclose | discheck.int |
|--------------|----------|--------------|----------|--------------|
| interdep     | .25      |              |          |              |
| intcheck.int | .76      | 2.65         |          |              |
| disclose     | -.04     | .04          | .25      |              |
| discheck.int | -.05     | .07          | .94      | 3.69         |
Clearly, both manipulations were effective, but disclosure was much more effective than interdep. To be honest, I'm not sure why. My two best guesses are that

1. interdep came first and the participants forgot what they had read by the time they got to the manipulation check, or
2. either the vignette or the manipulation check was somehow confusing.
I do think interdependence is a more difficult concept to grok than disclosure. Disclosure is just “Alex told me”. Interdependence requires making sense of your relationship with another person. All in all, I think it’s good enough.
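If I ever want a formal test of each check rather than just eyeballing the correlations above, a quick Welch t-test on the .int versions of the check items would do the job. This is only a sketch, not part of the planned analyses, and the output isn't shown here.

# sketch: formal tests of the manipulation checks (output not shown)
t.test(intcheck.int ~ interdep, data = adhd.data)
t.test(discheck.int ~ disclose, data = adhd.data)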
adhd.data %>%
select(matches("\\w{3}\\d\\.int")) %>%
describe(fast = T) %>%
as_tibble(rownames = "var") %>%
select(-vars) %>%
mutate(across(c(n, min, max, range), as.integer)) %>%
formatAsTable

| var      | n   | mean | sd   | min | max | range | se   |
|----------|-----|------|------|-----|-----|-------|------|
| aff1.int | 444 | 3.42 | 1.22 | 1   | 5   | 4     | 0.06 |
| aff2.int | 444 | 3.65 | 1.04 | 1   | 5   | 4     | 0.05 |
| aff3.int | 444 | 3.53 | 1.07 | 1   | 5   | 4     | 0.05 |
| aff4.int | 444 | 3.56 | 1.01 | 1   | 5   | 4     | 0.05 |
| cog1.int | 444 | 3.78 | 0.90 | 1   | 5   | 4     | 0.04 |
| cog2.int | 444 | 3.66 | 1.10 | 1   | 5   | 4     | 0.05 |
| cog3.int | 444 | 3.27 | 1.12 | 1   | 5   | 4     | 0.05 |
| cog4.int | 444 | 3.26 | 1.13 | 1   | 5   | 4     | 0.05 |
| lik1.int | 444 | 3.79 | 0.84 | 1   | 5   | 4     | 0.04 |
| lik2.int | 444 | 3.72 | 0.89 | 1   | 5   | 4     | 0.04 |
| lik3.int | 444 | 2.16 | 1.06 | 1   | 5   | 4     | 0.05 |
| lik4.int | 444 | 1.71 | 0.88 | 1   | 5   | 4     | 0.04 |
Most of the means are hovering around the middle of the scale, which isn’t bad to see. And I can tell at a glance that most people said they liked Alex, which is kind of interesting. Standard deviations are somewhat low; hopefully that doesn’t become a problem when computing regressions.
What are the reliabilities of the measures? Using the lavaan package, I will do a confirmatory factor analysis of the twelve items. I'm planning to report McDonald's \(\omega\) in addition to Cronbach's \(\alpha\) because it generally performs better than \(\alpha\), especially when item distributions are skewed.
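For reference, with a single (congeneric) factor of unit variance, loadings \(\lambda_i\), and residual variances \(\theta_{ii}\), McDonald's \(\omega\) is just the share of total score variance attributable to the common factor:

\[
\omega = \frac{\left(\sum_i \lambda_i\right)^2}{\left(\sum_i \lambda_i\right)^2 + \sum_i \theta_{ii}}
\]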
cfa.model <- 'aff =~ aff1 + aff2 + aff3 + aff4
cog =~ cog1 + cog2 + cog3 + cog4
lik =~ lik1 + lik2 + lik3 + lik4'
cfa.fit <- cfa(cfa.model, adhd.data, effect.coding = T)
cfa.fit %>%
reliability %>%
as_tibble(rownames = "stat") %>%
formatAsTable

| stat   | aff  | cog  | lik   |
|--------|------|------|-------|
| alpha  | 0.91 | 0.44 | -0.49 |
| omega  | 0.88 | 0.55 | 0.07  |
| omega2 | 0.88 | 0.55 | 0.07  |
| omega3 | 0.88 | 0.60 | 0.06  |
| avevar | 0.71 | 0.41 | 0.50  |
The reliabilities for cog and lik are low because some of their items are reverse-coded. I will reverse-score those items and try again.
adhd.data <- adhd.data %>%
mutate(across(c(cog4, lik3, lik4),
fct_rev,
.names = "{.col}.r"),
lik4.r = fct_collapse(lik4.r,
agree = c("Strongly agree",
"Somewhat agree")))
cfa.model <- 'aff =~ aff1 + aff2 + aff3 + aff4
cog =~ cog1 + cog2 + cog3 + cog4.r
lik =~ lik1 + lik2 + lik3.r + lik4.r'
cfa.fit <- cfa(cfa.model, adhd.data, effect.coding = T)
cfa.fit %>%
reliability %>%
as_tibble(rownames = "stat") %>%
formatAsTable

| stat   | aff  | cog  | lik  |
|--------|------|------|------|
| alpha  | 0.91 | 0.66 | 0.80 |
| omega  | 0.88 | 0.61 | 0.72 |
| omega2 | 0.88 | 0.61 | 0.72 |
| omega3 | 0.87 | 0.56 | 0.70 |
| avevar | 0.71 | 0.40 | 0.51 |
Ouch, cog was not a terribly reliable measure.
Anyway, as long as I have the CFA model, I might as well look at the loadings and fit statistics.
# loadings
cfa.fit %>%
parameterEstimates %>%
filter(op == "=~") %>%
formatAsTable

| lhs | op | rhs    | est  | se   | z     | pvalue | ci.lower | ci.upper |
|-----|----|--------|------|------|-------|--------|----------|----------|
| aff | =~ | aff1   | 0.94 | 0.02 | 47.15 | 0.00   | 0.90     | 0.98     |
| aff | =~ | aff2   | 0.99 | 0.01 | 65.95 | 0.00   | 0.96     | 1.02     |
| aff | =~ | aff3   | 1.06 | 0.01 | 71.92 | 0.00   | 1.03     | 1.09     |
| aff | =~ | aff4   | 1.01 | 0.01 | 71.29 | 0.00   | 0.99     | 1.04     |
| cog | =~ | cog1   | 1.67 | 0.06 | 26.25 | 0.00   | 1.55     | 1.80     |
| cog | =~ | cog2   | 1.16 | 0.05 | 23.05 | 0.00   | 1.06     | 1.25     |
| cog | =~ | cog3   | 0.98 | 0.06 | 16.39 | 0.00   | 0.87     | 1.10     |
| cog | =~ | cog4.r | 0.19 | 0.08 | 2.28  | 0.02   | 0.03     | 0.35     |
| lik | =~ | lik1   | 1.29 | 0.04 | 36.86 | 0.00   | 1.22     | 1.36     |
| lik | =~ | lik2   | 1.08 | 0.03 | 35.47 | 0.00   | 1.02     | 1.14     |
| lik | =~ | lik3.r | 0.65 | 0.04 | 15.50 | 0.00   | 0.57     | 0.74     |
| lik | =~ | lik4.r | 0.97 | 0.04 | 27.67 | 0.00   | 0.90     | 1.04     |
# fit statistics
m <- c("chisq", "df", "pvalue", "rmsea", "tli")
cfa.fit %>%
fitMeasures(fit.measures = m) %>%
round(3) %>%
as_tibble(rownames = 'stat') %>%
formatAsTable

| stat   | value  |
|--------|--------|
| chisq  | 290.53 |
| df     | 51.00  |
| pvalue | 0.00   |
| rmsea  | 0.10   |
| tli    | 0.98   |
Interesting. It does look like cog4 deviated from the other cog items. Still loaded significantly, though.
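If I wanted to see that deviation on a common metric, a standardized solution would put all twelve loadings on the same scale; a quick sketch (output not shown):

# sketch: standardized loadings, which make cog4.r's weak loading easier to compare
cfa.fit %>%
standardizedSolution %>%
filter(op == "=~") %>%
formatAsTable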
I’m curious how this would all look if we dropped the potentially problematic observations I previously identified.
adhd.data.hq <- adhd.data %>%
# exclude low-effort participants
filter(!if_any(c(random_clicker, nonpart, ftl)))
cfa.fit.hq <- cfa.model %>%
cfa(adhd.data.hq, effect.coding = T)
cfa.fit.hq %>%
reliability %>%
as_tibble(rownames = "stat") %>%
formatAsTable

| stat   | aff  | cog  | lik  |
|--------|------|------|------|
| alpha  | 0.91 | 0.67 | 0.80 |
| omega  | 0.88 | 0.61 | 0.73 |
| omega2 | 0.88 | 0.61 | 0.73 |
| omega3 | 0.88 | 0.57 | 0.71 |
| avevar | 0.71 | 0.41 | 0.51 |
cfa.fit.hq %>%
parameterEstimates %>%
filter(op == "=~") %>%
formatAsTable

| lhs | op | rhs    | est  | se   | z     | pvalue | ci.lower | ci.upper |
|-----|----|--------|------|------|-------|--------|----------|----------|
| aff | =~ | aff1   | 0.93 | 0.02 | 44.16 | 0.00   | 0.89     | 0.97     |
| aff | =~ | aff2   | 0.99 | 0.02 | 64.73 | 0.00   | 0.96     | 1.02     |
| aff | =~ | aff3   | 1.06 | 0.02 | 69.58 | 0.00   | 1.03     | 1.09     |
| aff | =~ | aff4   | 1.01 | 0.01 | 69.31 | 0.00   | 0.98     | 1.04     |
| cog | =~ | cog1   | 1.67 | 0.07 | 25.64 | 0.00   | 1.54     | 1.80     |
| cog | =~ | cog2   | 1.16 | 0.05 | 21.70 | 0.00   | 1.05     | 1.26     |
| cog | =~ | cog3   | 0.98 | 0.06 | 15.37 | 0.00   | 0.85     | 1.10     |
| cog | =~ | cog4.r | 0.20 | 0.08 | 2.39  | 0.02   | 0.04     | 0.36     |
| lik | =~ | lik1   | 1.30 | 0.04 | 35.51 | 0.00   | 1.23     | 1.37     |
| lik | =~ | lik2   | 1.08 | 0.03 | 32.75 | 0.00   | 1.01     | 1.14     |
| lik | =~ | lik3.r | 0.63 | 0.05 | 13.82 | 0.00   | 0.54     | 0.72     |
| lik | =~ | lik4.r | 0.99 | 0.04 | 25.89 | 0.00   | 0.92     | 1.07     |
Looks like cog4 had some marginal improvement that shows up in both the loading and the reliability.
I will run it one more time without cog4 and we’ll see if that improves things.
cfa.fit.cog <- cfa.model %>%
str_remove(fixed(" + cog4.r")) %>%
cfa(adhd.data.hq, effect.coding = T)
cfa.fit.cog %>%
reliability %>%
as_tibble(rownames = "stat") %>%
formatAsTable

| stat   | aff  | cog  | lik  |
|--------|------|------|------|
| alpha  | 0.91 | 0.75 | 0.80 |
| omega  | 0.88 | 0.70 | 0.73 |
| omega2 | 0.88 | 0.70 | 0.73 |
| omega3 | 0.88 | 0.69 | 0.71 |
| avevar | 0.71 | 0.54 | 0.51 |
Nice. The McDonald’s \(\omega\) for cog increased to 0.7!
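Before saving everything, it would also be easy to re-check the fit of this reduced model the same way as before, reusing the m vector defined above; a quick sketch (output not shown):

# sketch: fit statistics for the model without cog4.r
cfa.fit.cog %>%
fitMeasures(fit.measures = m) %>%
round(3) %>%
as_tibble(rownames = 'stat') %>%
formatAsTable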
Save data:
adhd.data %>%
saveRDS(file.path("..", "data", "adhd-data.rds"))
adhd.data %>%
write_csv(file.path("..", "data", "adhd-data.csv"))
adhd.data.hq %>%
saveRDS(file.path("..", "data", "hq-data.rds"))
adhd.data.hq %>%
write_csv(file.path("..", "data", "hq-data.csv"))

Output document:
options(knitr.duplicate.label = "allow")
rmarkdown::render("descriptives.Rmd",
output_dir = file.path("..", "github", "thesis"))