Load packages
# Tidyverse #
library(readxl)
library(rlang)
library(broom)
library(corrr)
library(glue)
library(magrittr)
library(tidyverse)
# Markdown #
library(rmarkdown)
library(tinytex)
library(knitr)
library(printr)
library(kableExtra)
library(citr)
library(pander)
# Other #
library(psych)
library(here)According to Neter et al. (1989), piecewise linear regression is useful when there is a linear relationship between the independent and dependent variables, but that linear relationship varies depending on the subset of the range of the independent variable. So, for example, somebody who is lonely will seek out other people to be with, but someone above a certain critical point of loneliness will stay isolated instead. The effect of loneliness changes after a certain point.
The advantage of piecewise linear regression is that it can account for a change in a theoretical effect. It is also relatively simple to interpret, because it is still essentially a linear relationship: a unit change in \(X\) creates a \(\beta_1\) change in \(Y\). A disadvantage is that few effects in nature should have dramatic changes in their effects at a single point.
Polynomial regression makes the opposite tradeoffs. It is more difficult to interpret a quadratic (let alone a higher-order) term than a linear term, but it is more theoretically justifiable because it is continuously changing and has a more gradual change in slope at its inflection.
According to Pedhazur & Schmelkin (1991, p. 452), theoretical considerations come first. Does the theory suggest a piecewise effect? Or does it suggest a polynomial model would better represent the true effect?
First, I imported the data and made the variable names lowercase.
tbl.2 <- read_xlsx(here('data', 'nonlin.xlsx'))
colnames(tbl.2) <- tolower(colnames(tbl.2))Then I looked at the raw data and descriptive statistics.
ks <- function(x, c) {
  kable(x, caption = c, escape = F, digits = 3) %>%
  kable_styling(full_width = F, position = 'left',
                bootstrap_options = c("striped", "hover", "condensed"))
  }
ks(tbl.2[1:10,], 'Job Satisfaction Data (first 10 observations)')| comd | coms | payd | pays | socd | socs | supd | sups | tvld | tvls | vard | vars | 
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.000 | 6.000 | -1.333 | 3.333 | 0.000 | 5.000 | 0.333 | 3.000 | 2.000 | 5.333 | -1.000 | 6.000 | 
| 0.000 | 6.333 | -1.667 | 6.333 | 0.000 | 6.667 | 1.333 | 4.333 | 0.000 | 6.667 | 1.000 | 4.667 | 
| 0.000 | 4.000 | 0.000 | 5.667 | -0.333 | 4.000 | 1.000 | 3.333 | 0.000 | 4.000 | -1.000 | 3.000 | 
| 0.000 | 7.000 | -1.000 | 6.000 | 0.000 | 4.667 | 0.000 | 5.000 | 0.000 | 6.000 | -1.000 | 5.000 | 
| 0.333 | 5.000 | 0.000 | 7.000 | 1.000 | 6.000 | 1.000 | 6.000 | 1.000 | 6.000 | 0.000 | 7.000 | 
| 1.000 | 3.000 | -2.000 | 5.000 | -1.000 | 5.000 | -0.333 | 5.000 | 1.000 | 3.667 | -1.667 | 5.000 | 
| 2.000 | 2.000 | -2.000 | 2.667 | -0.667 | 4.000 | 0.000 | 5.333 | -2.000 | 3.667 | -0.333 | 5.000 | 
| 0.333 | 3.000 | -0.333 | 7.000 | 0.000 | 7.000 | 0.000 | 6.667 | 0.000 | 7.000 | 0.000 | 7.000 | 
| 1.000 | 3.000 | -1.000 | 7.000 | 0.000 | 6.333 | 0.000 | 6.000 | 2.000 | 3.000 | -1.333 | 5.000 | 
| 2.000 | 3.000 | -0.667 | 3.333 | 0.000 | 4.000 | -0.333 | 4.000 | 1.333 | 3.000 | 0.000 | 4.000 | 
describe(tbl.2) %>%
  ks('Descriptive Statistics')| vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| comd | 1 | 185 | 0.492 | 0.883 | 0.000 | 0.434 | 0.494 | -2.000 | 3.000 | 5.000 | 0.579 | 0.190 | 0.065 | 
| coms | 2 | 185 | 4.536 | 1.469 | 4.333 | 4.538 | 1.977 | 1.000 | 7.000 | 6.000 | 0.011 | -0.853 | 0.108 | 
| payd | 3 | 185 | -1.314 | 0.932 | -1.333 | -1.280 | 0.988 | -3.000 | 0.333 | 3.333 | -0.070 | -1.061 | 0.069 | 
| pays | 4 | 185 | 4.948 | 1.514 | 5.000 | 5.051 | 1.483 | 1.000 | 7.000 | 6.000 | -0.545 | -0.439 | 0.111 | 
| socd | 5 | 185 | -0.189 | 0.772 | 0.000 | -0.228 | 0.000 | -2.333 | 2.667 | 5.000 | 0.468 | 1.872 | 0.057 | 
| socs | 6 | 185 | 4.637 | 1.196 | 4.333 | 4.628 | 0.988 | 1.000 | 7.000 | 6.000 | 0.012 | 0.015 | 0.088 | 
| supd | 7 | 185 | 0.213 | 0.869 | 0.000 | 0.204 | 0.741 | -2.333 | 3.000 | 5.333 | 0.176 | 0.904 | 0.064 | 
| sups | 8 | 185 | 4.546 | 1.218 | 4.333 | 4.537 | 1.483 | 1.000 | 7.000 | 6.000 | -0.027 | -0.396 | 0.090 | 
| tvld | 9 | 185 | 0.044 | 1.142 | 0.000 | 0.030 | 1.483 | -3.000 | 3.000 | 6.000 | 0.084 | -0.184 | 0.084 | 
| tvls | 10 | 185 | 4.626 | 1.369 | 4.667 | 4.643 | 1.483 | 1.000 | 7.000 | 6.000 | -0.193 | -0.406 | 0.101 | 
| vard | 11 | 185 | -0.721 | 0.859 | -0.667 | -0.660 | 0.988 | -3.000 | 1.667 | 4.667 | -0.435 | 0.264 | 0.063 | 
| vars | 12 | 185 | 5.257 | 1.239 | 5.333 | 5.334 | 0.988 | 1.333 | 7.000 | 5.667 | -0.445 | -0.256 | 0.091 | 
It looks like we have 12 variables, or six pairs of discrepancy and satisfaction variables. The discrepancy variables range from about -3 to about 3 and the satisfaction variables range from about 1 to 7, so everything matches the description in the assignment.
Next, I created dummy-coded variables for \(X_2\), such that the variable would be 0 when the discrepancy variable was less than 0 and 1 otherwise. I also multiplied \(X_2\) by the discrepancy variable, \(X_1\), to create the product term in Equation 10.24 (Neter et al., 1989, p. 370).
vec.2 <- tbl.2 %>%
  colnames() %>%
  str_sub(1, 3) %>%
  unique()
vec.2.d <- vec.2 %>%
  paste('d', sep = '')
tbl.2.a <- tbl.2 %>%
  mutate_at(vec.2.d, list(f = ~ ifelse(. >= 0, 1, 0), p = ~ (. >= 0) * .)) %>%
  rename_all(~ str_remove(., 'd_'))Once I had the dummy-coded variables and product terms ready, I created six piecewise functions and estimated their parameters.
# in:  att = 3-character name of attribute
# out: piecewise linear function
fun.2 <- function(att, data) {
  f <- paste(att, 's ~ ', att, 'd + ', att, 'p', sep = '') %>%
    as.formula()
  f %>%
    lm(data)
}
tbl.2.b <- vec.2 %>%
  map_df(~ fun.2(., tbl.2.a) %>% tidy())
tbl.2.b %>%
  ks('Regression Output for Piecewise Functions')| term | estimate | std.error | statistic | p.value | 
|---|---|---|---|---|
| (Intercept) | 5.110 | 0.122 | 42.014 | 0.000 | 
| comd | 0.513 | 0.343 | 1.494 | 0.137 | 
| comp | -1.465 | 0.385 | -3.808 | 0.000 | 
| (Intercept) | 5.967 | 0.172 | 34.625 | 0.000 | 
| payd | 0.775 | 0.107 | 7.254 | 0.000 | 
| payp | -0.675 | 4.062 | -0.166 | 0.868 | 
| (Intercept) | 5.034 | 0.103 | 48.741 | 0.000 | 
| socd | 0.723 | 0.155 | 4.667 | 0.000 | 
| socp | -1.678 | 0.262 | -6.394 | 0.000 | 
| (Intercept) | 4.906 | 0.112 | 43.733 | 0.000 | 
| supd | 0.036 | 0.192 | 0.185 | 0.853 | 
| supp | -0.889 | 0.267 | -3.327 | 0.001 | 
| (Intercept) | 5.373 | 0.124 | 43.353 | 0.000 | 
| tvld | 0.762 | 0.137 | 5.558 | 0.000 | 
| tvlp | -1.793 | 0.218 | -8.225 | 0.000 | 
| (Intercept) | 5.767 | 0.121 | 47.699 | 0.000 | 
| vard | 0.628 | 0.108 | 5.813 | 0.000 | 
| varp | -1.400 | 0.433 | -3.232 | 0.001 | 
The (Intercept), d, and p terms correspond to \(\beta_0\), \(\beta_1\), and \(\beta_2\), respectively. So, for example, the piecewise model for commute time has a \(\hat{\beta_1}\) of 0.513.
Are the functions symmetric about 0? That is equivalent to the hypothesis \[\beta_2 = -2\beta_1\], because \(\beta_2\) is the additional slope added to \(\beta_1\) when \(X_1 > 0\). When it is \(-2\beta_1\), the slopes for the two pieces of the function will be \(\beta_1\) and \(\beta_1 + \beta_2 = \beta_1 - 2\beta_1 = -\beta_1\).
To test this hypothesis, I computed \(\beta_2 - 2\beta_1\) for each model.
tbl.2.c <- tbl.2.b %>%
  mutate(att = vec.2 %>% rep(each = 3))
tbl.2.c %>%
  filter(term != '(Intercept)') %>%
  group_by(att) %>%
  summarise(diff = (function(x) x[2] - 2 * x[1])(estimate)) %>%
  ks('Tests of Symmetry Hypothesis')## `summarise()` ungrouping output (override with `.groups` argument)| att | diff | 
|---|---|
| com | -2.491 | 
| pay | -2.225 | 
| soc | -3.124 | 
| sup | -0.960 | 
| tvl | -3.318 | 
| var | -2.656 | 
The results show that only closeness of supervision might be symmetrical. Without doing an inference test, I don’t whether it is significantly different from 0, but I feel confident that all the others are very different from 0.
To visualize the results, I plotted each model.
fun.2.b <- function(att, data) {
  str.1 <- paste(att, 'd', sep = '')
  str.2 <- paste(att, 's', sep = '')
  str.3 <- paste(att, 'p', sep = '')
  dbl.1 <- data[[str.1]]
  dbl.2 <- c(min(dbl.1), 0, max(dbl.1))
  tbl.1 <- tibble(!!str.1 := dbl.2,
                  !!str.3 := c(0, 0, max(dbl.1)))
  fit.1 <- fun.2(att, data)
  dbl.3 <- predict(fit.1, tbl.1)
  
  tbl.2.a %>%
    ggplot(aes_string(x = str.1, y = str.2)) +
    geom_segment(aes(x = dbl.2[1], y = dbl.3[1],
                     xend = dbl.2[2], yend = dbl.3[2])) +
    geom_segment(aes(x = dbl.2[2], y = dbl.3[2],
                     xend = dbl.2[3], yend = dbl.3[3])) +
    geom_point(size = 1) +
    labs(x = str.1, y = str.2, title = paste("Piecewise Plot of", att)) +
    xlim(-3, 3)
}
vec.2 %>%
  map(~ fun.2.b(., tbl.2.a))Interesting. The model for closeness of supervision (sup) looks like the least symmetrical attribute, perhaps only after pay, which has hilariously little data above payd = 0. The plot of travel looks the most symmetrical. I’m not sure what I did wrong.
I reran the same analysis, this time giving parameters to both \(X_1 \times X_2\) and \(X_2\), as described in Equation 10.27 (Neter et al., 1989, p. 373).
# in:  att = 3-character name of attribute
# out: discontinuous piecewise linear function
fun.2.c <- function(att, data) {
  paste(att, 's ~ ', att, 'd + ', att, 'p +', att, 'f', sep = '') %>%
    as.formula() %>%
    lm(data)
}
tbl.2.f <- vec.2 %>%
  map_df(~ fun.2.c(., tbl.2.a) %>% tidy())
tbl.2.f %>%
  ks('Regression Output for Discontinuous Piecewise Functions')| term | estimate | std.error | statistic | p.value | 
|---|---|---|---|---|
| (Intercept) | 4.029 | 0.605 | 6.659 | 0.000 | 
| comd | -0.428 | 0.618 | -0.692 | 0.490 | 
| comp | -0.552 | 0.630 | -0.877 | 0.382 | 
| comf | 1.126 | 0.617 | 1.824 | 0.070 | 
| (Intercept) | 5.683 | 0.255 | 22.279 | 0.000 | 
| payd | 0.631 | 0.143 | 4.403 | 0.000 | 
| payp | -1.237 | 4.065 | -0.304 | 0.761 | 
| payf | 0.519 | 0.345 | 1.504 | 0.134 | 
| (Intercept) | 5.084 | 0.315 | 16.147 | 0.000 | 
| socd | 0.766 | 0.300 | 2.553 | 0.012 | 
| socp | -1.717 | 0.351 | -4.888 | 0.000 | 
| socf | -0.056 | 0.333 | -0.169 | 0.866 | 
| (Intercept) | 4.741 | 0.342 | 13.865 | 0.000 | 
| supd | -0.104 | 0.335 | -0.312 | 0.756 | 
| supp | -0.764 | 0.363 | -2.104 | 0.037 | 
| supf | 0.185 | 0.362 | 0.511 | 0.610 | 
| (Intercept) | 4.872 | 0.356 | 13.694 | 0.000 | 
| tvld | 0.434 | 0.258 | 1.683 | 0.094 | 
| tvlp | -1.507 | 0.289 | -5.217 | 0.000 | 
| tvlf | 0.569 | 0.379 | 1.501 | 0.135 | 
| (Intercept) | 5.015 | 0.215 | 23.338 | 0.000 | 
| vard | 0.153 | 0.154 | 0.989 | 0.324 | 
| varp | -1.194 | 0.418 | -2.856 | 0.005 | 
| varf | 1.061 | 0.255 | 4.160 | 0.000 | 
In this output, the terms that end in f correspond to \(\beta_3\) in Equation 10.27. And when \(\beta_3\) is not 0, there is a discontinuity where the discrepancy variable is 0. In the results table above, the only case where a \(\beta_3\) appears to be approximately 0 is socf, corresponding to span of control. For all the others, there is a jump at 0. And because they are all positive, I believe that means that the there’s a jump up from less-than-adequate to adequate.
This analysis tells us that, with the exception of span of control, all the attributes under consideration are discontinuous. And the fact that the discontinuity is one-sided suggests that participants were less dissatisfied with more-than-adequate attributes than with less-than-adequate attributes.
Finally, before moving on to polynomial regression, let’s take a look at the discontinuous model plots.
fun.2.b <- function(att, results, data) {
  res <- filter(results, att == att)[["estimate"]]
  str.1 <- paste(att, 'd', sep = '')
  str.2 <- paste(att, 's', sep = '')
  str.3 <- paste(att, 'p', sep = '')
  str.4 <- paste(att, 'f', sep = '')
  dbl.1 <- data[[str.1]]
  dbl.2 <- c(min(dbl.1), 0, max(dbl.1))
  tbl.1 <- tibble(!!str.1 := dbl.2,
                  !!str.3 := c(0, 0, max(dbl.1)),
                  !!str.4 := c(0, 1, 1))
  fit.1 <- fun.2.c(att, data)
  dbl.3 <- predict(fit.1, tbl.1)
  
  tbl.2.a %>%
    ggplot(aes_string(x = str.1, y = str.2)) +
    # Left piece
    geom_segment(aes(x = dbl.2[1], y = dbl.3[1],
                     xend = 0, yend = dbl.3[2] - res[4])) +
    # Jump
    geom_segment(aes(x = 0, y = dbl.3[2] - res[4],
                     xend = 0, yend = dbl.3[2])) +
    # Right piece
    geom_segment(aes(x = 0, y = dbl.3[2],
                     xend = dbl.2[3], yend = dbl.3[3])) +
    geom_point(size = 1) +
    labs(x = str.1, y = str.2, title = paste("Discontinuity Plot of", att))
}
vec.2 %>%
  map(~ fun.2.b(., tbl.2.c, tbl.2.a))Across the board, they all have large jumps. That is a sign that the constrained piecewise models were definitely inappropriate.
I followed the procedure in (Pedhazur & Schmelkin, 1991) for hierarchical polynomial regression. I tested linear, quadratic, cubic, and quartic models and compared their improvements in fit using the \(R^2\) difference test in Equation 18.15 (Pedhazur & Schmelkin, 1991, p. 432).
First, I had to do some data management. I collapsed all the satisfaction variables into one variable called ‘sat’ and all the discrepancy variables into one variable called ‘dis’, with a grouping variable for the attributes called ‘att’.
tbl.3 <- tbl.2 %>%
  select_at(vars(ends_with('s'))) %>%
  stack() %>%
  as_tibble() %>%
  rename(sat = values, att = ind) %>%
  mutate(att = str_sub(att, 1, -2)) %>%
  add_column(dis = tbl.2 %>%
               select_at(vars(ends_with('d'))) %>%
               stack() %>% as_tibble() %>%
               pull("values")) %>%
  select(-att, everything())
tbl.3[1:10,] %>%
  ks('Rearranged Data (first 10 obs)')| sat | dis | att | 
|---|---|---|
| 6.000 | 0.000 | com | 
| 6.333 | 0.000 | com | 
| 4.000 | 0.000 | com | 
| 7.000 | 0.000 | com | 
| 5.000 | 0.333 | com | 
| 3.000 | 1.000 | com | 
| 2.000 | 2.000 | com | 
| 3.000 | 0.333 | com | 
| 3.000 | 1.000 | com | 
| 3.000 | 2.000 | com | 
Second, I regressed sat on dis for each attribute. Note that I used raw polynomials rather than orthogonal polynomials. Instead of showing the output for all 24 (6 attributes &$ 4 orders) polynomial models, I printed just the quadratic model for ‘com’ as an example.
fun.3.a <- function(ord) {
  f <- glue('fit{ord}')
  t <- glue('tdy{ord}')
  m <- glue('mod{ord}')
  exprs(!!{{f}} := map(data, ~ lm(sat ~ poly(dis, !!ord, raw = TRUE), .)),
        !!{{t}} := map(!!sym(f), tidy),
        !!{{m}} := map(!!sym(f), glance))
}
lst.3.a <- 1:4 %>%
  map(fun.3.a) %>%
  flatten()
tbl.3.a <- tbl.3 %>%
  nest(-att) %>%
  mutate(!!!lst.3.a)
tbl.3.a %>%
  select_at(vars(matches("fit2"))) %>%
    map(~ .[[1]] %>% tidy()) %>%
  unname() %>%
  pander()| term | estimate | std.error | statistic | p.value | 
|---|---|---|---|---|
| (Intercept) | 4.963 | 0.1125 | 44.1 | 4.323e-99 | 
| poly(dis, 2, raw = TRUE)1 | -0.3653 | 0.1732 | -2.109 | 0.03634 | 
| poly(dis, 2, raw = TRUE)2 | -0.2429 | 0.09002 | -2.698 | 0.007633 | 
Third, I tried to compare model fit in order to determine the highest-order model with significant improvement on its next-lowest-order model.
fun.3.b <- function(att, ord, tbl) {
  results <- tbl %>%
    filter(att == !!att) %>%
    select_at(vars(matches(paste("fit", seq(ord), sep = '', collapse = '|')))) %>%
    map(~ .[[1]]) %>%
    unname() %>%
    do.call(what = anova)
}
lst.3.b <- vec.2 %>%
  map(~ map(.x = 1:4, .f = fun.3.b, att = ., tbl = tbl.3.a)) %>%
  set_names(vec.2)But, as you can see, I did it wrong. I accidentally compared all 4 models for each attribute to each other all at once.
Fourth, I tried again.
fun.3.b.2 <- function(att, ord, tbl) {
  stopifnot(att %in% vec.2)
  stopifnot(ord >= 1 && ord <= 4)
  results <- tbl %>%
    filter(att == !!att) %>%
    select_at(vars(matches(paste("fit", max(ord-1, 1):ord,
                                 sep = '', collapse = '|')))) %>%
    map(~ .[[1]]) %>%
    unname() %>%
    do.call(what = anova)
  results
}
fun.3.c <- function(att, tbl) {
  helper <- function() {
    assign('r', fun.3.b.2(att, i, tbl), parent.frame())
    assign('p', as.vector(na.omit(r$`Pr(>F)`)), parent.frame())
  }
  i <- 1
  helper()
  p <- 0
  prev <- r
  while (i <= 4 && p < .05) {
    i = i + 1
    prev <- r
    helper()
  }
  if (p < .05) {
    r
  } else {
    prev
  }
}
lst.3.c <- vec.2 %>%
  map(fun.3.c, tbl = tbl.3.a)
chr.3 <- lst.3.c %>%
  map2_chr(vec.2, ~ glue("Attr: {.y} || Order: {184 - .x[names(.x)[match(ifelse('Res.Df' %in% names(.x), 'Res.Df', 'Df'), names(.x))]][[1]][[2]]}"))
lst.3.c %>%
  set_names(chr.3) %>%
  pander()Attr: com || Order: 2:
| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) | 
|---|---|---|---|---|---|
| 183 | 321 | NA | NA | NA | NA | 
| 182 | 308.6 | 1 | 12.34 | 7.279 | 0.007633 | 
Attr: pay || Order: 2:
| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) | 
|---|---|---|---|---|---|
| 183 | 326.3 | NA | NA | NA | NA | 
| 182 | 318.5 | 1 | 7.735 | 4.42 | 0.0369 | 
Attr: soc || Order: 2:
| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) | 
|---|---|---|---|---|---|
| 183 | 263.2 | NA | NA | NA | NA | 
| 182 | 214.3 | 1 | 48.88 | 41.52 | 1.015e-09 | 
Attr: sup || Order: 2:
| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) | 
|---|---|---|---|---|---|
| 183 | 235.3 | NA | NA | NA | NA | 
| 182 | 227.1 | 1 | 8.195 | 6.567 | 0.0112 | 
Attr: tvl || Order: 2:
| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) | 
|---|---|---|---|---|---|
| 183 | 336.6 | NA | NA | NA | NA | 
| 182 | 253.9 | 1 | 82.73 | 59.31 | 8.331e-13 | 
Attr: var || Order: 1:
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| poly(dis, 1, raw = TRUE) | 1 | 31.42 | 31.42 | 22.9 | 3.505e-06 | 
| Residuals | 183 | 251.1 | 1.372 | NA | NA | 
Excellent, it worked this time.
To find support for the hypothesis that satisfaction is greatest when discrepancy is 0, we would need to see that a quadratic model best fit the data. According to the results above, that is the case for commute time, pay, span of control, closeness of supervision, and travel.
However, for variety, a quadratic model failed to improve on the fit of the linear model.
Finally, let’s look at some plots. First, quadratic plots for all six attributes.
map(vec.2, ~ ggplot(filter(tbl.3, att == .), aes(x = dis, y = sat)) +
      geom_point() +
      geom_smooth(method = lm, formula = y ~ poly(x, 2, raw = TRUE)) +
      xlim(-3, 3) +
      ggtitle(glue("Quadratic Plot for {.}"))) %>%
  set_names(vec.2)Frankly, it’s difficult to see why these quadratic models performed signficantly better than the linear models. In at least some cases (e.g., span of control, travel), it looks like it may be due to a small number of high-leverage outliers. But perhaps the density of points matter, since these plots show multiple overlapping points.
Span of control and travel appear to best support the hypothesis. They are both concave with vertices approximately at zero. More and less span of control or travel are worse than just the adequate amount.
The others, while clearly parabolic, do not have both of these features. Commute time and closeness of supervision veer down and right, suggesting that more is worse, but less is not.
Pay is convex and veering up to the right, but it has so little variation that we can’t even test our hypothesis. All we can say is more pay is better and hardly anyone is satisfied with what they have.
Also, the plots for pay and variety are so similar, that it’s a wonder the quadratic model for pay represented a signficant improvement over the linear model. The curvature in the pay plot is very, very slight. If we go back to the ANOVA tables above, we can see that pay did in fact have the largest \(p\)-value of the quadratic models (\(F \approx 59.31\), \(p \approx .04\)).
For pay and variety, let’s overlay the linear and quadratic plots and see how they compare.
map(c('pay','var'), ~ ggplot(filter(tbl.3, att == .), aes(x = dis, y = sat)) +
      geom_point() +
      geom_smooth(method = lm, formula = y ~ poly(x, 2, raw = TRUE)) +
      geom_smooth(method = lm) +
      xlim(-3, 3) +
      ggtitle(glue("Overlay Plot for {.}"))) %>%
  set_names(c('pay','var'))## `geom_smooth()` using formula 'y ~ x'## `geom_smooth()` using formula 'y ~ x'There is some slight difference for pay. With enough power, it’s possible to understand why the quadratic model would outperform the linear model.
The two curves are sitting almost precisely upon each other for variety.
There is not a clear story to tell here. The first analysis with constrained piecewise functions mostly supported our hypothesis that more and less of any attribute are worse than adequate, with perhaps the exceptions of pay and closeness of supervision.
But loosening the constraints, either by allowing for a discontinuity or by shifting to polynomial regression, showed that only span of control and closeness of supervision fit the hypothesized pattern.
Neter, J., Wasserman, W., & Kutner, M. H. (1989). Applied linear regression models (2nd ed.). Irwin.
Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Lawrence Erlbaum Associates.