I collected data on a number of categorical variables not directly connected to my hypotheses. I’d like to use those variables for a few types of exploratory analysis here.

First, I want to collect some basic descriptive statistics on the additional variables to cite in the Participants section of the thesis.

Second, I want to know if any of the additional variables correlate with my three measured variables.

Third, I want to know if any of them should be used as controls in my hypothesis tests.

The extra variables I have are:

  • gender: male or female
  • ethnic: standard racial categories used in the United States
  • age (self-explanatory)
  • employ: standard economic measure of employment status
  • educat: the highest degree earned
  • party: political stance on left-right spectrum
  • adhd: who the participant knows with ADHD
  • work: work history with white-, blue-, and pink-collar occupations

For all of the above, participants could click ‘Other’ and write in their own answer. A small but non-negligible number of participants did so.
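
As a rough sketch of how such write-in answers could be tallied, the toy example below normalizes free-text responses before counting them. The column name `employ_text` and its values are hypothetical and are not taken from the real data set.

```r
library(dplyr)

# Toy data: `employ_text` is a hypothetical write-in column, not a real one.
toy <- tibble(employ_text = c(NA, "", "Freelance", "freelance ", NA))

write_ins <- toy %>%
  # Normalize case and surrounding whitespace so near-duplicates collapse.
  mutate(employ_text = tolower(trimws(employ_text))) %>%
  # Keep only participants who actually wrote something.
  filter(!is.na(employ_text), employ_text != "") %>%
  count(employ_text, sort = TRUE)

write_ins
```

Normalizing first matters because free-text answers that differ only in capitalization or trailing spaces would otherwise be counted as separate categories.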

As usual, I will load my packages and import the data.

library(corrr)
library(tidyverse)

library(flextable)

adhd.data <- readRDS(file.path("..", "data", "hq-data.rds"))
formatAsTable <- readRDS("format.rds")

Descriptive Statistics

extra <- c("gender", "ethnic", "age", "employ",
           "educat", "party", "adhd", "work")

adhd.data %>%
  select(all_of(extra)) %>%
  head() %>%
  formatAsTable

Gender

adhd.data %>%
  count(gender) %>%
  formatAsTable

Race

adhd.data %>%
  select(contains("ethnic_") & !ethnic_5_text) %>%
  map_int(sum) %>%
  enframe("var", "count") %>%
  arrange(desc(count)) %>%
  formatAsTable
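
The chunk above appears to treat each `ethnic_` column as a tick-box indicator, so summing a column counts the participants who selected that option. Here is a minimal sketch of that idea on toy data; the column names `ethnic_a` and `ethnic_b` are hypothetical stand-ins, and I am assuming the real indicators are logical.

```r
library(dplyr)
library(purrr)
library(tibble)

# Toy data: one logical column per tick box, plus a free-text column.
# `ethnic_a` and `ethnic_b` are hypothetical names.
toy <- tibble(
  ethnic_a = c(TRUE, FALSE, TRUE),
  ethnic_b = c(FALSE, FALSE, TRUE),
  ethnic_5_text = c(NA, "write-in", NA)
)

counts <- toy %>%
  select(contains("ethnic_") & !ethnic_5_text) %>%  # drop the text column
  map_int(sum) %>%               # TRUE counts as 1, so each sum is a tally
  enframe("var", "count") %>%
  arrange(desc(count))

counts
```

Because a participant can tick more than one box, these counts can add up to more than the number of participants.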

Age

plotTheme <- function() {
  theme(plot.background = element_rect(fill = '#3b434f'),
        panel.background = element_rect(fill = '#3b434f'),
        text = element_text(color = "wheat"),
        panel.grid = element_line(color = "wheat"),
        axis.text = element_text(color = "wheat"))
}

adhd.data %>%
  ggplot(aes(age)) +
  geom_histogram(bins = 30, fill = "antiquewhite4") +
  plotTheme()

Employment Status

adhd.data %>%
  count(employ) %>%
  arrange(desc(n)) %>%
  formatAsTable

Political Ideology

adhd.data %>%
  count(party) %>%
  arrange(desc(n)) %>%
  formatAsTable

ADHD Relationships

adhd.data %>%
  select(contains("adhd_") & !contains("text")) %>%
  map_int(sum) %>%
  enframe("var", "count") %>%
  arrange(desc(count)) %>%
  formatAsTable

somebody.vars <- c("friend", "family", "acquaintance",
                   "coworker", "classmate") %>%
  paste0("adhd_", .)

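adhd.data <- adhd.data %>%
  rowwise() %>%
  # c_across() gathers the selected columns for the current row
  mutate(adhd_somebody = any(c_across(all_of(somebody.vars))),
         adhd_simple = factor(ifelse(adhd_myself, "myself",
                                     ifelse(adhd_somebody, "somebody",
                                            "nobody")))) %>%
  ungroup()  # drop the rowwise grouping before further summaries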

adhd.data %>%
  count(adhd_simple) %>%
  arrange(desc(n)) %>%
  formatAsTable
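
To make the collapsing rule concrete, here is a small sketch on toy data. It assumes the `adhd_` columns are logical; the rows are invented for illustration and only two of the "somebody" columns are included.

```r
library(dplyr)

# Toy data: invented rows, logical indicators assumed.
toy <- tibble(
  adhd_myself = c(TRUE, FALSE, FALSE, TRUE),
  adhd_friend = c(FALSE, TRUE, FALSE, TRUE),
  adhd_family = c(FALSE, FALSE, FALSE, FALSE)
)

others <- c("adhd_friend", "adhd_family")

toy <- toy %>%
  rowwise() %>%
  mutate(adhd_somebody = any(c_across(all_of(others))),
         # "myself" is checked first, so it wins over "somebody".
         adhd_simple = ifelse(adhd_myself, "myself",
                              ifelse(adhd_somebody, "somebody", "nobody"))) %>%
  ungroup()

toy$adhd_simple
```

The last row shows the priority order: a participant who has ADHD and also knows someone with ADHD is classified as "myself".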

Work History

adhd.data %>%
  select(contains("work_") & !contains("text")) %>%
  map_int(sum) %>%
  enframe("var", "count") %>%
  arrange(desc(count)) %>%
  formatAsTable

Education

adhd.data %>%
  count(educat) %>%
  arrange(desc(n)) %>%
  formatAsTable

Intercorrelations
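
One possible way to correlate a categorical variable with a numeric measure is to expand it into 0/1 indicator columns first. The sketch below uses base R on toy data; `score` is a hypothetical stand-in for one of the measured variables, and the values are invented.

```r
# Toy data: `score` is a hypothetical measured variable.
toy <- data.frame(
  score = c(1, 2, 3, 4, 5, 6),
  party = factor(c("left", "left", "center", "center", "right", "right"))
)

# One 0/1 indicator column per level; "- 1" removes the intercept so that
# every level keeps its own column.
dummies <- model.matrix(~ party - 1, data = toy)

# Point-biserial correlation of the numeric variable with each indicator.
party_cor <- cor(toy$score, dummies)
party_cor
```

Each entry describes how membership in one category relates to the numeric variable, so this is a screening tool rather than a single overall test of association.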

Controls


Output document:

options(knitr.duplicate.label = "allow")
rmarkdown::render("exploratory.Rmd",
                  output_dir = file.path("..", "github", "thesis"))