I now have all the data collected (unless I have to throw some out and replace it), so let’s look at some descriptive statistics!

First, import packages.

# import packages
library(corrr)      # correlation matrices
library(tidyverse)  # data wrangling
library(flextable)  # formatted tables
library(psych)      # descriptive statistics
library(lavaan)     # confirmatory factor analysis
library(semTools)   # reliability estimates

Import my R objects from previous documents.

formatAsTable <- readRDS("format.rds")
adhd.data <- readRDS(file.path("..", "data", "quality-data.rds"))

Basic descriptives

Manipulated variables

The manipulations should have been randomized 50/50 by Qualtrics such that we end up with about 100 per cell, but let’s check.

adhd.data %>%
  count(interdep, disclose) %>%
  formatAsTable

Sweet, that looks just about right.

And I’ve already looked at the manipulation checks in a previous document, but I’ll port that analysis in here.

# add integer-coded copies of every factor column (e.g., intcheck -> intcheck.int)
adhd.data <- adhd.data %>%
  mutate(across(where(is.factor),
                ~ as.integer(.),
                .names = "{.col}.int"))

# cross-tab interdep with its check; highlight responses on the expected side of the midpoint
adhd.data %>%
  count(interdep, intcheck) %>%
  arrange(interdep, intcheck) %>%
  formatAsTable %>%
  bg(bg = "#074005",
     i = ~ ifelse(interdep == 0,
                  as.integer(intcheck) < 3,
                  as.integer(intcheck) > 3))
# same cross-tab for disclose and its check
adhd.data %>%
  count(disclose, discheck) %>%
  arrange(disclose, discheck) %>%
  formatAsTable %>%
  bg(bg = "#074005",
     i = ~ ifelse(disclose == 0,
                  as.integer(discheck) < 3,
                  as.integer(discheck) > 3))
# correlation matrix with variances on the diagonal
adhd.data %>%
  select(c(interdep, intcheck.int,
           disclose, discheck.int)) %>%
  correlate(diagonal = map_dbl(., ~ var(.))) %>%
  shave %>%
  fashion(leading_zeros = F) %>%
  formatAsTable

Clearly, both manipulations were effective, but disclosure was much more effective than interdep. To be honest, I’m not sure why. My two best guesses are that

  1. interdep came first and participants had forgotten what they read by the time they reached the manipulation check, or

  2. either the vignette or the manipulation check was somehow confusing.

I do think interdependence is a more difficult concept to grok than disclosure. Disclosure is just “Alex told me”. Interdependence requires making sense of your relationship with another person. All in all, I think it’s good enough.
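
To put a rough number on "much more effective," a quick association test per manipulation would work. This is only a sketch; it reuses the condition and check columns from the tables above.

# Hedged sketch: chi-square association between each condition and its check;
# a stronger association means a more effective manipulation
chisq.test(xtabs(~ interdep + intcheck, adhd.data))
chisq.test(xtabs(~ disclose + discheck, adhd.data))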

Measured variables

# quick descriptives for the integer-coded item columns (e.g., aff1.int)
adhd.data %>%
  select(matches("\\w{3}\\d\\.int")) %>%
  describe(fast = T) %>%
  as_tibble(rownames = "var") %>%
  select(-vars) %>%
  mutate(across(c(n, min, max, range), as.integer)) %>%
  formatAsTable

Most of the means are hovering around the middle of the scale, which isn’t bad to see. And I can tell at a glance that most people said they liked Alex, which is kind of interesting. Standard deviations are somewhat low; hopefully that doesn’t become a problem when computing regressions.
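
It's also easy to eyeball the spread behind those numbers. This is just a sketch using the integer-coded item columns created above.

# Hedged sketch: response distributions of the integer-coded items,
# to see the limited spread behind the low standard deviations
adhd.data %>%
  select(matches("\\w{3}\\d\\.int")) %>%
  pivot_longer(everything(), names_to = "item", values_to = "response") %>%
  ggplot(aes(response)) +
  geom_bar() +
  facet_wrap(~ item)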

Factor Model

What are the reliabilities of the measures? Using the lavaan package, I will run a confirmatory factor analysis of the twelve items. I'm planning to report McDonald's \(\omega\) in addition to Cronbach's \(\alpha\), because \(\omega\) generally performs better, especially when the items are skewed.
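
For reference, McDonald's \(\omega\) for a single factor (assuming the factor variance is fixed to 1 and there are no correlated residuals) is built from the loadings \(\lambda_i\) and residual variances \(\theta_{ii}\):

\[
\omega = \frac{\left(\sum_i \lambda_i\right)^2}{\left(\sum_i \lambda_i\right)^2 + \sum_i \theta_{ii}}
\]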

cfa.model <- 'aff =~ aff1 + aff2 + aff3 + aff4
              cog =~ cog1 + cog2 + cog3 + cog4
              lik =~ lik1 + lik2 + lik3 + lik4'

cfa.fit <- cfa(cfa.model, adhd.data, effect.coding = T)

cfa.fit %>%
  reliability %>%
  as_tibble(rownames = "stat") %>%
  formatAsTable

The reliabilities for cog and lik are low because some of their items are reverse-coded. I'll reverse-code those items and try again.

# reverse-code cog4, lik3, and lik4, then collapse lik4.r's two "agree" levels
adhd.data <- adhd.data %>%
  mutate(across(c(cog4, lik3, lik4),
                fct_rev,
                .names = "{.col}.r"),
         lik4.r = fct_collapse(lik4.r,
                               agree = c("Strongly agree",
                                         "Somewhat agree")))

cfa.model <- 'aff =~ aff1 + aff2 + aff3 + aff4
              cog =~ cog1 + cog2 + cog3 + cog4.r
              lik =~ lik1 + lik2 + lik3.r + lik4.r'

cfa.fit <- cfa(cfa.model, adhd.data, effect.coding = T)

cfa.fit %>%
  reliability %>%
  as_tibble(rownames = "stat") %>%
  formatAsTable

Ouch, cog was not a terribly reliable measure.

Anyway, as long as I have the CFA model, I might as well look at the loadings and fit statistics.

# loadings
cfa.fit %>%
  parameterEstimates %>%
  filter(op == "=~") %>%
  formatAsTable
# fit statistics (enframe keeps the measure names; as_tibble on a plain vector drops them)
m <- c("chisq", "df", "pvalue", "rmsea", "tli")
cfa.fit %>%
  fitMeasures(fit.measures = m) %>%
  round(3) %>%
  enframe(name = "stat") %>%
  formatAsTable

Interesting. It does look like cog4 deviated from the other cog items. Still loaded significantly, though.
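
For one more hedged look at that, lavaan's standardizedSolution() puts the loadings on a common scale, which makes it easier to judge how far cog4 strays from its siblings.

# standardized loadings (sketch; same fitted model as above)
cfa.fit %>%
  standardizedSolution %>%
  filter(op == "=~") %>%
  formatAsTable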

I’m curious how this would all look if we dropped the potentially problematic observations I previously identified.

adhd.data.hq <- adhd.data %>%
  # exclude low-effort participants
  filter(!if_any(c(random_clicker, nonpart, ftl)))

cfa.fit.hq <- cfa.model %>%
  cfa(adhd.data.hq, effect.coding = T)

cfa.fit.hq %>%
  reliability %>%
  as_tibble(rownames = "stat") %>%
  formatAsTable
cfa.fit.hq %>%
  parameterEstimates %>%
  filter(op == "=~") %>%
  formatAsTable

Looks like cog4 had some marginal improvement that shows up in both the loading and the reliability.

I will run it one more time without cog4 to see if that improves things.

cfa.fit.cog <- cfa.model %>%
  str_remove(fixed(" + cog4.r")) %>%
  cfa(adhd.data.hq, effect.coding = T)

cfa.fit.cog %>%
  reliability %>%
  as_tibble(rownames = "stat") %>%
  formatAsTable

Nice. McDonald’s \(\omega\) for cog increased to 0.7!
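
Out of curiosity, the two high-quality-subset models can also be lined up on the same fit measures. This is only a descriptive sketch reusing m from above (the models use different item sets, so it's not a formal nested comparison).

# Hedged sketch: same fit measures for the full vs. cog4-dropped models, side by side
list(full = cfa.fit.hq, no.cog4 = cfa.fit.cog) %>%
  map(~ round(unclass(fitMeasures(., fit.measures = m)), 3)) %>%
  map(~ as_tibble(as.list(.))) %>%
  bind_rows(.id = "model") %>%
  formatAsTable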


Save data:

adhd.data %>%
  saveRDS(file.path("..", "data", "adhd-data.rds"))

adhd.data %>%
  write_csv(file.path("..", "data", "adhd-data.csv"))

adhd.data.hq %>%
  saveRDS(file.path("..", "data", "hq-data.rds"))

adhd.data.hq %>%
  write_csv(file.path("..", "data", "hq-data.csv"))

Output document:

options(knitr.duplicate.label = "allow")
rmarkdown::render("descriptives.Rmd",
                  output_dir = file.path("..", "github", "thesis"))