In this first document, I am going to create mock data, produce descriptive statistics, and analyze the reliabilities of the measures.

Before anything else, I need to import the packages I will be using.

# import packages
library(tidyverse)

library(flextable)

library(psych)
library(lavaan)
library(semTools)

Creating mock data

The first step I will take is to initialize a fake dataset that I expect to look like our actual dataset once we have it. I’ll start with the manipulations and checks.

# set sample size
n <- 400

# create a dataframe with first 5 variables
mydata <- tibble(
  id = 1:n,
  interdep = rbinom(n, 1, .5),
  disclose = rbinom(n, 1, .5),
  intcheck = sample.int(4, n, replace = TRUE, prob = c(.4, .4, .1, .1)),
  discheck = sample.int(4, n, replace = TRUE, prob = c(.4, .4, .1, .1))
)
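As a quick sanity check on the sampling call, here is a minimal base R sketch of what the probability vector does. The seed and the name `check` are my own choices for illustration and are not part of the analysis.

```r
# illustrative only: the seed and variable names are assumptions
set.seed(123)
n <- 400

# draw n responses from 1:4; options 1 and 2 are each four times
# as likely as options 3 and 4
check <- sample.int(4, n, replace = TRUE, prob = c(.4, .4, .1, .1))

# observed proportions should land near .4, .4, .1, .1
props <- prop.table(table(factor(check, levels = 1:4)))
round(props, 2)
```

Because the checks are drawn independently of the conditions, only about 40% of the mock participants should end up with the "correct" answer on each check.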

Now, I will add the measures (items).

# define a function to clamp values to the scale range [-2, 2]
bound <- function(x) pmin(pmax(x, -2), 2)

# define a function to simulate four correlated Likert items (-2 to 2)
# clustered around a mode given on the 1-5 scale
likert <- function(n, mode) {
  weights <- map_dbl(1:5, ~ (2 ** (mode - abs(mode - .))))
  seed <- sample(-2:2, n, T,
             map_dbl(weights, ~ (. / sum(weights))))
  map_dfc(
    1:4,
    ~ bound(seed + sample(-1:1, n, T))
  )
}

# add a column of fake data for each item
mydata <- mydata %>%
  full_join(
    map_dfc(c(5, 1, 4), ~ likert(n, .)) %>%
      add_column(id = 1:n),
    "id"
    )

# name columns for items
names(mydata)[6:17] <- paste0(rep(c("aff", "cog", "lik"), each = 4), 1:4)
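To make the weighting inside `likert` more concrete, here is a small sketch that restates the same weight formula in vectorized base R and prints the resulting sampling probabilities for the three modes used above. The helper name `mode_weights` is hypothetical and only exists for this illustration.

```r
# weight for each scale position 1:5, peaking at `mode`;
# each step away from the mode halves the weight
# (hypothetical helper, same formula as in likert())
mode_weights <- function(mode) {
  w <- 2^(mode - abs(mode - 1:5))
  w / sum(w)
}

# sampling probabilities for the seed value under each mode
probs <- sapply(c(aff = 5, cog = 1, lik = 4), mode_weights)
rownames(probs) <- -2:2
round(probs, 3)
```

These are the probabilities for the shared seed only. Each of the four items then adds its own noise of -1, 0, or 1 and is clamped back to the range, so the item distributions are somewhat flatter than this table suggests.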

Let’s see how that looks. I’m going to create a simple function to format a table.

# turn dataframe into html table
formatAsTable <- function(data) {
  data %>%
    flextable %>%
    color(color = "white", part = "all") %>%
    autofit
}

I have a feeling this function will be useful in future documents, so I will save it to an .rds file to load again later.

formatAsTable %>%
  saveRDS("format.rds")
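For reference, reading a saved function back works the same way as reading any other R object. The sketch below uses a throwaway function and a temporary file, both of which are placeholders for illustration rather than the actual `format.rds`.

```r
# placeholder function standing in for formatAsTable
square <- function(x) x^2

# save to a temporary .rds file (path is an assumption for this demo)
path <- tempfile(fileext = ".rds")
saveRDS(square, path)

# in a later document, restore the function and call it as usual
restored <- readRDS(path)
restored(3)
```

One caveat I am aware of: the restored function still needs its packages loaded, so a later document would need `library(flextable)` before calling the real helper.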

Now let’s look at the data.

mydata %>%
  head %>%
  formatAsTable

We are in business now.

Before moving on, I will export the mock data to a file so I can access it from other pages.

write_csv(mydata, file.path("..", "github", "thesis", "mock.csv"))

Descriptive statistics

I’m now going to examine some descriptive statistics for the data, including the means and standard deviations of each variable. The manipulation check variables are categorical and cannot be interpreted as continuous, even conceptually, so I will exclude them for the moment.

# subset without id or checks
cont.data <- mydata[-c(1, 4, 5)]

cont.data %>%
  describe(fast = T) %>%
  mutate(vars = names(cont.data)) %>%
  formatAsTable

Manipulation checks

Now we can quickly check the frequencies for the manipulation checks.

The correct answer is 1 in the experimental condition (coded 1) and 2 in the control condition (coded 0); responses of 3 or 4 are always incorrect. I will add a new variable that represents whether the participants responded correctly.

# correct when check is 1 for iv = 1 or check is 2 for iv = 0
isCorrect <- function(iv, check) iv == 2 - check

mydata <- mydata %>%
  mutate(
    intcorrect = isCorrect(interdep, intcheck),
    discorrect = isCorrect(disclose, discheck)
  )
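To confirm that the scoring rule does what I intend, this short sketch applies `isCorrect` to every combination of condition and response. The `grid` data frame is made up for the demonstration.

```r
# scoring rule copied from above
isCorrect <- function(iv, check) iv == -check + 2

# every combination of condition (0/1) and check response (1-4)
grid <- expand.grid(iv = 0:1, check = 1:4)
grid$correct <- isCorrect(grid$iv, grid$check)

# only two combinations should count as correct
grid[grid$correct, ]
```

The two rows that remain are the experimental condition with a response of 1 and the control condition with a response of 2.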

I’ll just quickly throw the code in a function to make the tables of correct answers by manipulation.

correctCount <- function(...) {
  mydata %>%
    count(...) %>%
    filter(if_any(ends_with('correct'))) %>%
    select(!ends_with('correct')) %>%
    add_column(id = 1:2)
}

Finally, we can look at the tables of correct answers.

correctCount(interdep, intcorrect) %>%
  inner_join(
    correctCount(disclose, discorrect), 'id'
    ) %>%
  select(!id) %>%
  rename(n.int = n.x, n.dis = n.y) %>%
  formatAsTable

How many people got both answers correct?

total <- sum(mydata$intcorrect & mydata$discorrect)

62 participants aced the test.
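That number is about what the mock setup should produce. Because the checks are sampled independently of the conditions, the expected count can be worked out by hand; the sketch below does the arithmetic, assuming the probabilities used when the data were created (.5 per condition, .4 for each of the first two responses).

```r
n <- 400

# P(correct on one check) =
#   P(experimental) * P(answer 1) + P(control) * P(answer 2)
p_one <- 0.5 * 0.4 + 0.5 * 0.4

# the two checks are independent in the mock data
p_both <- p_one^2

# expected count and binomial standard deviation
expected <- n * p_both
sd_both <- sqrt(n * p_both * (1 - p_both))
c(expected = expected, sd = round(sd_both, 2))
```

A count within a standard deviation or so of the expected value is unremarkable; the real data should look very different if the manipulations work.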

Factor model

What are the reliabilities of the measures? Using the lavaan package, I will run a confirmatory factor analysis of the twelve items. I’m planning to report McDonald’s \(\omega\) in addition to Cronbach’s \(\alpha\) because \(\omega\) does not assume that all items load equally on their factor, as \(\alpha\) does, and it is preferable especially where there is skew.

cfa.model <- 'aff =~ aff1 + aff2 + aff3 + aff4
              cog =~ cog1 + cog2 + cog3 + cog4
              lik =~ lik1 + lik2 + lik3 + lik4'

cfa.fit <- cfa(cfa.model, mydata, effect.coding = T)

cfa.fit %>%
  reliability %>%
  as_tibble(rownames = "stat") %>%
  formatAsTable
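As a point of comparison for the table, Cronbach’s \(\alpha\) can also be computed by hand from the item variances. The sketch below uses the textbook formula on made-up data; `cronbach_alpha` and `toy` are hypothetical names, and this is not necessarily how semTools arrives at its value.

```r
# textbook formula: alpha = k / (k - 1) * (1 - sum(item variances) / variance of total)
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_vars <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# toy data: four items sharing one common factor plus independent noise
set.seed(42)
common <- rnorm(200)
toy <- sapply(1:4, function(i) common + rnorm(200))

a <- cronbach_alpha(toy)
round(a, 2)
```

Running the same function on one scale’s four item columns should give a value close to the alpha row of the table, which is a handy cross-check.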

As long as I have the CFA model, I might as well look at the loadings and fit statistics.

# loadings
cfa.fit %>%
  parameterEstimates %>%
  filter(op == "=~") %>%
  formatAsTable

# fit statistics
m <- c("chisq", "df", "pvalue", "rmsea", "tli")
cfa.fit %>%
  fitMeasures(fit.measures = m) %>%
  round(3) %>%
  as_tibble(rownames = 'stat') %>%
  formatAsTable

Intercorrelations

How do the variables relate to each other?

cor.model <- cfa.model %>%
  paste( "intfac =~ interdep",
         "disfac =~ disclose",
         sep = "\n")

cor.fit <- cfa(cor.model, mydata, effect.coding = T)

cor.fit %>%
  lavInspect("cor.lv") %>%
  round(3) %>%
  as_tibble(rownames = 'var') %>%
  formatAsTable
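The extended model string is easier to follow when printed. This sketch rebuilds it with base R only and prints the result; it does not fit anything.

```r
# measurement model from the factor analysis section
cfa.model <- 'aff =~ aff1 + aff2 + aff3 + aff4
              cog =~ cog1 + cog2 + cog3 + cog4
              lik =~ lik1 + lik2 + lik3 + lik4'

# append one single-indicator factor per manipulation
cor.model <- paste(cfa.model,
                   "intfac =~ interdep",
                   "disfac =~ disclose",
                   sep = "\n")

cat(cor.model)
```

Wrapping each manipulation in a single-indicator factor is just a way to get it into the latent correlation matrix. As far as I understand lavaan’s defaults in `cfa`, the residual variance of a lone indicator is fixed to zero, so `intfac` and `disfac` stand in directly for the observed variables.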

Output document:

rmarkdown::render("prework.Rmd", output_dir = file.path("..", "github", "thesis"))