I now have all the data collected (unless I have to throw some out and replace it), so let’s look at some descriptive statistics!

First, import packages.

# import packages
library(corrr)      # correlation matrices
library(tidyverse)  # data wrangling
library(flextable)  # formatted tables
library(psych)      # descriptive statistics
library(lavaan)     # confirmatory factor analysis
library(semTools)   # reliability estimates

Import my R objects from previous documents.

formatAsTable <- readRDS("format.rds")
adhd.data <- readRDS(file.path("..", "data", "quality-data.rds"))

Basic descriptives

Manipulated variables

The manipulations should have been randomized 50/50 by Qualtrics such that we end up with about 100 per cell, but let’s check.

adhd.data %>%
  count(interdep, disclose) %>%
  formatAsTable

Sweet, that looks just about right.

And I’ve already looked at the manipulation checks in a previous document, but I’ll port that analysis in here.

# add integer-coded copies of every factor column (e.g., intcheck -> intcheck.int)
adhd.data <- adhd.data %>%
  mutate(across(where(is.factor),
                ~ as.integer(.),
                .names = "{.col}.int"))

# cross-tab interdep with its check; highlight responses on the expected side of the midpoint
adhd.data %>%
  count(interdep, intcheck) %>%
  arrange(interdep, intcheck) %>%
  formatAsTable %>%
  bg(bg = "#074005",
     i = ~ ifelse(interdep == 0,
                  as.integer(intcheck) < 3,
                  as.integer(intcheck) > 3))
# same cross-tab for disclose and its check
adhd.data %>%
  count(disclose, discheck) %>%
  arrange(disclose, discheck) %>%
  formatAsTable %>%
  bg(bg = "#074005",
     i = ~ ifelse(disclose == 0,
                  as.integer(discheck) < 3,
                  as.integer(discheck) > 3))
# correlation matrix with variances on the diagonal
adhd.data %>%
  select(c(interdep, intcheck.int,
           disclose, discheck.int)) %>%
  correlate(diagonal = map_dbl(., ~ var(.))) %>%
  shave %>%
  fashion(leading_zeros = F) %>%
  formatAsTable

Clearly, both manipulations were effective, but disclosure was much more effective than interdep. To be honest, I’m not sure why. My two best guesses are that

  1. interdep came first and participants had forgotten what they read by the time they reached the manipulation check, or

  2. either the vignette or the manipulation check was somehow confusing.

I do think interdependence is a more difficult concept to grok than disclosure. Disclosure is just “Alex told me”. Interdependence requires making sense of your relationship with another person. All in all, I think it’s good enough.
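
To put a rough number on "much more effective," a quick association test per manipulation would work. This is only a sketch; it reuses the condition and check columns from the tables above.

# Hedged sketch: chi-square association between each condition and its check;
# a stronger association means a more effective manipulation
chisq.test(xtabs(~ interdep + intcheck, adhd.data))
chisq.test(xtabs(~ disclose + discheck, adhd.data))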

Measured variables

# quick descriptives for the integer-coded item columns (e.g., aff1.int)
adhd.data %>%
  select(matches("\\w{3}\\d\\.int")) %>%
  describe(fast = T) %>%
  as_tibble(rownames = "var") %>%
  select(-vars) %>%
  mutate(across(c(n, min, max, range), as.integer)) %>%
  formatAsTable

Most of the means are hovering around the middle of the scale, which isn’t bad to see. And I can tell at a glance that most people said they liked Alex, which is kind of interesting. Standard deviations are somewhat low; hopefully that doesn’t become a problem when computing regressions.
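
It's also easy to eyeball the spread behind those numbers. This is just a sketch using the integer-coded item columns created above.

# Hedged sketch: response distributions of the integer-coded items,
# to see the limited spread behind the low standard deviations
adhd.data %>%
  select(matches("\\w{3}\\d\\.int")) %>%
  pivot_longer(everything(), names_to = "item", values_to = "response") %>%
  ggplot(aes(response)) +
  geom_bar() +
  facet_wrap(~ item)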

Factor Model

What are the reliabilities of the measures? Using the lavaan package, I will run a confirmatory factor analysis of the twelve items. I'm planning to report McDonald's \(\omega\) in addition to Cronbach's \(\alpha\), because \(\omega\) generally performs better, especially when the items are skewed.
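
For reference, McDonald's \(\omega\) for a single factor (assuming the factor variance is fixed to 1 and there are no correlated residuals) is built from the loadings \(\lambda_i\) and residual variances \(\theta_{ii}\):

\[
\omega = \frac{\left(\sum_i \lambda_i\right)^2}{\left(\sum_i \lambda_i\right)^2 + \sum_i \theta_{ii}}
\]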

cfa.model <- 'aff =~ aff1 + aff2 + aff3 + aff4
              cog =~ cog1 + cog2 + cog3 + cog4
              lik =~ lik1 + lik2 + lik3 + lik4'

cfa.fit <- cfa(cfa.model, adhd.data, effect.coding = T)

cfa.fit %>%
  reliability %>%
  as_tibble(rownames = "stat") %>%
  formatAsTable

The reliabilities for cog and lik are low because some of their items are reverse-coded. I'll reverse-code those items and try again.

# reverse-code cog4, lik3, and lik4, then collapse lik4.r's two "agree" levels
adhd.data <- adhd.data %>%
  mutate(across(c(cog4, lik3, lik4),
                fct_rev,
                .names = "{.col}.r"),
         lik4.r = fct_collapse(lik4.r,
                               agree = c("Strongly agree",
                                         "Somewhat agree")))

cfa.model <- 'aff =~ aff1 + aff2 + aff3 + aff4
              cog =~ cog1 + cog2 + cog3 + cog4.r
              lik =~ lik1 + lik2 + lik3.r + lik4.r'

cfa.fit <- cfa(cfa.model, adhd.data, effect.coding = T)

cfa.fit %>%
  reliability %>%
  as_tibble(rownames = "stat") %>%
  formatAsTable

Ouch, cog was not a terribly reliable measure.

Anyway, as long as I have the CFA model, I might as well look at the loadings and fit statistics.

# loadings
cfa.fit %>%
  parameterEstimates %>%
  filter(op == "=~") %>%
  formatAsTable
# fit statistics (enframe keeps the measure names; as_tibble on a plain vector drops them)
m <- c("chisq", "df", "pvalue", "rmsea", "tli")
cfa.fit %>%
  fitMeasures(fit.measures = m) %>%
  round(3) %>%
  enframe(name = "stat") %>%
  formatAsTable

Interesting. It does look like cog4 deviated from the other cog items. Still loaded significantly, though.
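
For one more hedged look at that, lavaan's standardizedSolution() puts the loadings on a common scale, which makes it easier to judge how far cog4 strays from its siblings.

# standardized loadings (sketch; same fitted model as above)
cfa.fit %>%
  standardizedSolution %>%
  filter(op == "=~") %>%
  formatAsTable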

I’m curious how this would all look if we dropped the potentially problematic observations I previously identified.

adhd.data.hq <- adhd.data %>%
  # exclude low-effort participants
  filter(!if_any(c(random_clicker, nonpart, ftl)))

cfa.fit.hq <- cfa.model %>%
  cfa(adhd.data.hq, effect.coding = T)

cfa.fit.hq %>%
  reliability %>%
  as_tibble(rownames = "stat") %>%
  formatAsTable
cfa.fit.hq %>%
  parameterEstimates %>%
  filter(op == "=~") %>%
  formatAsTable

Looks like cog4 had some marginal improvement that shows up in both the loading and the reliability.

I will run it one more time without cog4 to see if that improves things.

cfa.fit.cog <- cfa.model %>%
  str_remove(fixed(" + cog4.r")) %>%
  cfa(adhd.data.hq, effect.coding = T)

cfa.fit.cog %>%
  reliability %>%
  as_tibble(rownames = "stat") %>%
  formatAsTable

Nice. McDonald’s \(\omega\) for cog increased to 0.7!
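
Out of curiosity, the two high-quality-subset models can also be lined up on the same fit measures. This is only a descriptive sketch reusing m from above (the models use different item sets, so it's not a formal nested comparison).

# Hedged sketch: same fit measures for the full vs. cog4-dropped models, side by side
list(full = cfa.fit.hq, no.cog4 = cfa.fit.cog) %>%
  map(~ round(unclass(fitMeasures(., fit.measures = m)), 3)) %>%
  map(~ as_tibble(as.list(.))) %>%
  bind_rows(.id = "model") %>%
  formatAsTable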


Save data:

adhd.data %>%
  saveRDS(file.path("..", "data", "adhd-data.rds"))

adhd.data %>%
  write_csv(file.path("..", "data", "adhd-data.csv"))

adhd.data.hq %>%
  saveRDS(file.path("..", "data", "hq-data.rds"))

adhd.data.hq %>%
  write_csv(file.path("..", "data", "hq-data.csv"))

Output document:

options(knitr.duplicate.label = "allow")
rmarkdown::render("descriptives.Rmd",
                  output_dir = file.path("..", "github", "thesis"))