Set up working environment

Load packages

# Tidyverse #
library(readxl)
library(rlang)
library(broom)
library(corrr)
library(glue)
library(magrittr)
library(tidyverse)

# Markdown #
library(rmarkdown)
library(tinytex)
library(knitr)
library(printr)
library(kableExtra)
library(citr)
library(pander)

# Other #
library(psych)
library(here)

1. Piecewise vs. Polynomial Regression

According to Neter et al. (1989), piecewise linear regression is useful when there is a linear relationship between the independent and dependent variables, but that linear relationship varies depending on the subset of the range of the independent variable. So, for example, somebody who is lonely will seek out other people to be with, but someone above a certain critical point of loneliness will stay isolated instead. The effect of loneliness changes after a certain point.

The advantage of piecewise linear regression is that it can account for a change in a theoretical effect. It is also relatively simple to interpret, because it is still essentially a linear relationship: a unit change in \(X\) creates a \(\beta_1\) change in \(Y\). A disadvantage is that few effects in nature should have dramatic changes in their effects at a single point.

Polynomial regression makes the opposite tradeoffs. A quadratic (let alone a higher-order) term is more difficult to interpret than a linear term, but it is often more theoretically justifiable because the slope changes continuously and gradually rather than abruptly at a single point.
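To make the contrast concrete, here is a minimal sketch of the two specifications in R, using simulated data (x, y, and knot are made up for illustration and are not part of the assignment data).

# Simulated example data (illustration only)
set.seed(1)
x <- runif(100, -3, 3)
y <- 4 - abs(x) + rnorm(100, sd = 0.5)
knot <- 0

# Piecewise: the slope is b1 below the knot and b1 + b2 above it
fit.piecewise <- lm(y ~ x + I((x - knot) * (x >= knot)))

# Quadratic polynomial: the slope changes continuously across the range of x
fit.quadratic <- lm(y ~ poly(x, 2, raw = TRUE))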

According to Pedhazur & Schmelkin (1991, p. 452), theoretical considerations come first. Does the theory suggest a piecewise effect? Or does it suggest a polynomial model would better represent the true effect?

2. Piecewise Regression

Setup

First, I imported the data and made the variable names lowercase.

tbl.2 <- read_xlsx(here('data', 'nonlin.xlsx'))
colnames(tbl.2) <- tolower(colnames(tbl.2))

Then I looked at the raw data and descriptive statistics.

ks <- function(x, c) {
  kable(x, caption = c, escape = F, digits = 3) %>%
  kable_styling(full_width = F, position = 'left',
                bootstrap_options = c("striped", "hover", "condensed"))
  }

ks(tbl.2[1:10,], 'Job Satisfaction Data (first 10 observations)')
Job Satisfaction Data (first 10 observations)
comd coms payd pays socd socs supd sups tvld tvls vard vars
0.000 6.000 -1.333 3.333 0.000 5.000 0.333 3.000 2.000 5.333 -1.000 6.000
0.000 6.333 -1.667 6.333 0.000 6.667 1.333 4.333 0.000 6.667 1.000 4.667
0.000 4.000 0.000 5.667 -0.333 4.000 1.000 3.333 0.000 4.000 -1.000 3.000
0.000 7.000 -1.000 6.000 0.000 4.667 0.000 5.000 0.000 6.000 -1.000 5.000
0.333 5.000 0.000 7.000 1.000 6.000 1.000 6.000 1.000 6.000 0.000 7.000
1.000 3.000 -2.000 5.000 -1.000 5.000 -0.333 5.000 1.000 3.667 -1.667 5.000
2.000 2.000 -2.000 2.667 -0.667 4.000 0.000 5.333 -2.000 3.667 -0.333 5.000
0.333 3.000 -0.333 7.000 0.000 7.000 0.000 6.667 0.000 7.000 0.000 7.000
1.000 3.000 -1.000 7.000 0.000 6.333 0.000 6.000 2.000 3.000 -1.333 5.000
2.000 3.000 -0.667 3.333 0.000 4.000 -0.333 4.000 1.333 3.000 0.000 4.000
describe(tbl.2) %>%
  ks('Descriptive Statistics')
Descriptive Statistics
vars n mean sd median trimmed mad min max range skew kurtosis se
comd 1 185 0.492 0.883 0.000 0.434 0.494 -2.000 3.000 5.000 0.579 0.190 0.065
coms 2 185 4.536 1.469 4.333 4.538 1.977 1.000 7.000 6.000 0.011 -0.853 0.108
payd 3 185 -1.314 0.932 -1.333 -1.280 0.988 -3.000 0.333 3.333 -0.070 -1.061 0.069
pays 4 185 4.948 1.514 5.000 5.051 1.483 1.000 7.000 6.000 -0.545 -0.439 0.111
socd 5 185 -0.189 0.772 0.000 -0.228 0.000 -2.333 2.667 5.000 0.468 1.872 0.057
socs 6 185 4.637 1.196 4.333 4.628 0.988 1.000 7.000 6.000 0.012 0.015 0.088
supd 7 185 0.213 0.869 0.000 0.204 0.741 -2.333 3.000 5.333 0.176 0.904 0.064
sups 8 185 4.546 1.218 4.333 4.537 1.483 1.000 7.000 6.000 -0.027 -0.396 0.090
tvld 9 185 0.044 1.142 0.000 0.030 1.483 -3.000 3.000 6.000 0.084 -0.184 0.084
tvls 10 185 4.626 1.369 4.667 4.643 1.483 1.000 7.000 6.000 -0.193 -0.406 0.101
vard 11 185 -0.721 0.859 -0.667 -0.660 0.988 -3.000 1.667 4.667 -0.435 0.264 0.063
vars 12 185 5.257 1.239 5.333 5.334 0.988 1.333 7.000 5.667 -0.445 -0.256 0.091

It looks like we have 12 variables, or six pairs of discrepancy and satisfaction variables. The discrepancy variables range from about -3 to about 3 and the satisfaction variables range from about 1 to 7, so everything matches the description in the assignment.

Next, I created dummy-coded variables for \(X_2\), such that the variable would be 0 when the discrepancy variable was less than 0 and 1 otherwise. I also multiplied \(X_2\) by the discrepancy variable, \(X_1\), to create the product term in Equation 10.24 (Neter et al., 1989, p. 370).
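With the critical point at 0, the model being estimated should therefore have the form

\[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1 X_2 + \varepsilon,\]

where \(X_2 = 1\) when \(X_1 \ge 0\) and \(X_2 = 0\) otherwise, so the slope is \(\beta_1\) for negative discrepancies and \(\beta_1 + \beta_2\) for non-negative ones.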

# unique three-letter attribute prefixes (com, pay, soc, sup, tvl, var)
vec.2 <- tbl.2 %>%
  colnames() %>%
  str_sub(1, 3) %>%
  unique()

# names of the discrepancy columns (comd, payd, ...)
vec.2.d <- vec.2 %>%
  paste('d', sep = '')

tbl.2.a <- tbl.2 %>%
  # f = dummy variable (1 when the discrepancy is >= 0); p = dummy * discrepancy
  mutate_at(vec.2.d, list(f = ~ ifelse(. >= 0, 1, 0), p = ~ (. >= 0) * .)) %>%
  # drop the 'd_' infix so the new columns are named comf, comp, payf, payp, ...
  rename_all(~ str_remove(., 'd_'))

Constrained Piecewise Regression

Once I had the dummy-coded variables and product terms ready, I created six piecewise functions and estimated their parameters.

# in:  att  = 3-character name of attribute
#      data = data frame containing the attribute's d, p, and s columns
# out: fitted (constrained) piecewise linear model
fun.2 <- function(att, data) {
  f <- paste(att, 's ~ ', att, 'd + ', att, 'p', sep = '') %>%
    as.formula()
  f %>%
    lm(data)
}

tbl.2.b <- vec.2 %>%
  map_df(~ fun.2(., tbl.2.a) %>% tidy())

tbl.2.b %>%
  ks('Regression Output for Piecewise Functions')
Regression Output for Piecewise Functions
term estimate std.error statistic p.value
(Intercept) 5.110 0.122 42.014 0.000
comd 0.513 0.343 1.494 0.137
comp -1.465 0.385 -3.808 0.000
(Intercept) 5.967 0.172 34.625 0.000
payd 0.775 0.107 7.254 0.000
payp -0.675 4.062 -0.166 0.868
(Intercept) 5.034 0.103 48.741 0.000
socd 0.723 0.155 4.667 0.000
socp -1.678 0.262 -6.394 0.000
(Intercept) 4.906 0.112 43.733 0.000
supd 0.036 0.192 0.185 0.853
supp -0.889 0.267 -3.327 0.001
(Intercept) 5.373 0.124 43.353 0.000
tvld 0.762 0.137 5.558 0.000
tvlp -1.793 0.218 -8.225 0.000
(Intercept) 5.767 0.121 47.699 0.000
vard 0.628 0.108 5.813 0.000
varp -1.400 0.433 -3.232 0.001

The (Intercept), d, and p terms correspond to \(\beta_0\), \(\beta_1\), and \(\beta_2\), respectively. So, for example, the piecewise model for commute time has a \(\hat{\beta_1}\) of 0.513.
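For example, writing out the fitted commute-time model gives \(\hat{Y} = 5.110 + 0.513 X_1 - 1.465 X_1 X_2\): the estimated slope is \(0.513\) when the discrepancy is negative and \(0.513 - 1.465 = -0.952\) when it is zero or positive.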

Are the functions symmetric about 0? That is equivalent to the hypothesis \(\beta_2 = -2\beta_1\), because \(\beta_2\) is the additional slope added to \(\beta_1\) when \(X_1 \ge 0\). When it is \(-2\beta_1\), the slopes for the two pieces of the function will be \(\beta_1\) and \(\beta_1 + \beta_2 = \beta_1 - 2\beta_1 = -\beta_1\).

To test this hypothesis, I computed \(\beta_2 - 2\beta_1\) for each model (which, as it turns out below, is not quite the right quantity).

tbl.2.c <- tbl.2.b %>%
  mutate(att = vec.2 %>% rep(each = 3))

tbl.2.c %>%
  filter(term != '(Intercept)') %>%
  group_by(att) %>%
  summarise(diff = (function(x) x[2] - 2 * x[1])(estimate)) %>%
  ks('Tests of Symmetry Hypothesis')
## `summarise()` ungrouping output (override with `.groups` argument)
Tests of Symmetry Hypothesis
att diff
com -2.491
pay -2.225
soc -3.124
sup -0.960
tvl -3.318
var -2.656

The results show that only closeness of supervision might be symmetrical. Without doing an inference test, I don't know whether it is significantly different from 0, but I feel confident that all the others are very different from 0.

To visualize the results, I plotted each model.

# in:  att  = 3-character name of attribute
#      data = data frame containing the attribute's d, p, and s columns
# out: plot of the constrained piecewise fit over the raw data
fun.2.b <- function(att, data) {
  str.1 <- paste(att, 'd', sep = '')
  str.2 <- paste(att, 's', sep = '')
  str.3 <- paste(att, 'p', sep = '')
  dbl.1 <- data[[str.1]]
  dbl.2 <- c(min(dbl.1), 0, max(dbl.1))
  tbl.1 <- tibble(!!str.1 := dbl.2,
                  !!str.3 := c(0, 0, max(dbl.1)))
  fit.1 <- fun.2(att, data)
  dbl.3 <- predict(fit.1, tbl.1)
  
  tbl.2.a %>%
    ggplot(aes_string(x = str.1, y = str.2)) +
    geom_segment(aes(x = dbl.2[1], y = dbl.3[1],
                     xend = dbl.2[2], yend = dbl.3[2])) +
    geom_segment(aes(x = dbl.2[2], y = dbl.3[2],
                     xend = dbl.2[3], yend = dbl.3[3])) +
    geom_point(size = 1) +
    labs(x = str.1, y = str.2, title = paste("Piecewise Plot of", att)) +
    xlim(-3, 3)
}

vec.2 %>%
  map(~ fun.2.b(., tbl.2.a))

Interesting. The plot for closeness of supervision (sup) looks like one of the least symmetrical attributes, perhaps second only to pay, which has hilariously little data above payd = 0, while the plot of travel looks the most symmetrical. The apparent contradiction with the table above comes from the quantity I computed: under the hypothesis \(\beta_2 = -2\beta_1\), the quantity that should be near zero is \(\beta_2 + 2\beta_1\), not \(\beta_2 - 2\beta_1\). Recomputing \(\beta_2 + 2\beta_1\) from the coefficients above gives values much closer to zero for variety, span of control, and travel, which matches what the plots show.
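One way to test the symmetry hypothesis formally would be a general linear hypothesis test on each fit, for example with car::linearHypothesis (a sketch only; the car package is not loaded above):

# Sketch: formal test of H0: 2*beta_1 + beta_2 = 0 for the travel model
library(car)  # assumed to be installed; not part of the setup chunk
fit.tvl <- fun.2('tvl', tbl.2.a)
linearHypothesis(fit.tvl, '2*tvld + tvlp = 0')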

Discontinuous Piecewise Regression

I reran the same analysis, this time estimating coefficients for both \(X_1 \times X_2\) and \(X_2\) itself, as described in Equation 10.27 (Neter et al., 1989, p. 373).
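With the knot still at 0, that model is

\[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1 X_2 + \beta_3 X_2 + \varepsilon,\]

so in addition to the change in slope, the fitted line is allowed to jump by \(\beta_3\) at \(X_1 = 0\).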

# in:  att  = 3-character name of attribute
#      data = data frame containing the attribute's d, p, f, and s columns
# out: fitted discontinuous piecewise linear model
fun.2.c <- function(att, data) {
  paste(att, 's ~ ', att, 'd + ', att, 'p +', att, 'f', sep = '') %>%
    as.formula() %>%
    lm(data)
}

tbl.2.f <- vec.2 %>%
  map_df(~ fun.2.c(., tbl.2.a) %>% tidy())

tbl.2.f %>%
  ks('Regression Output for Discontinuous Piecewise Functions')
Regression Output for Discontinuous Piecewise Functions
term estimate std.error statistic p.value
(Intercept) 4.029 0.605 6.659 0.000
comd -0.428 0.618 -0.692 0.490
comp -0.552 0.630 -0.877 0.382
comf 1.126 0.617 1.824 0.070
(Intercept) 5.683 0.255 22.279 0.000
payd 0.631 0.143 4.403 0.000
payp -1.237 4.065 -0.304 0.761
payf 0.519 0.345 1.504 0.134
(Intercept) 5.084 0.315 16.147 0.000
socd 0.766 0.300 2.553 0.012
socp -1.717 0.351 -4.888 0.000
socf -0.056 0.333 -0.169 0.866
(Intercept) 4.741 0.342 13.865 0.000
supd -0.104 0.335 -0.312 0.756
supp -0.764 0.363 -2.104 0.037
supf 0.185 0.362 0.511 0.610
(Intercept) 4.872 0.356 13.694 0.000
tvld 0.434 0.258 1.683 0.094
tvlp -1.507 0.289 -5.217 0.000
tvlf 0.569 0.379 1.501 0.135
(Intercept) 5.015 0.215 23.338 0.000
vard 0.153 0.154 0.989 0.324
varp -1.194 0.418 -2.856 0.005
varf 1.061 0.255 4.160 0.000

In this output, the terms that end in f correspond to \(\beta_3\) in Equation 10.27. When \(\beta_3\) is not 0, there is a discontinuity where the discrepancy variable is 0. In the results table above, the only \(\beta_3\) whose point estimate is approximately 0 is socf, corresponding to span of control (although, going by the p-values, only varf is clearly different from 0). For the others, the estimates suggest a jump at 0, and because they are all positive, the jump is upward, from less-than-adequate to adequate.
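For example, using the variety estimates, the predicted satisfaction just below zero discrepancy is the intercept, \(5.015\), while at zero discrepancy (where the dummy switches on) it is \(5.015 + 1.061 = 6.076\), a jump of about one scale point.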

This analysis suggests that, with the exception of span of control, all the attributes under consideration show some discontinuity at zero discrepancy. And because the jumps are all in the same (positive) direction, participants appear to have been less dissatisfied with more-than-adequate attributes than with less-than-adequate ones.

Finally, before moving on to polynomial regression, let’s take a look at the discontinuous model plots.

# in:  att  = 3-character name of attribute
#      data = data frame containing the attribute's d, p, f, and s columns
# out: plot of the discontinuous piecewise fit over the raw data
fun.2.b <- function(att, data) {
  str.1 <- paste(att, 'd', sep = '')
  str.2 <- paste(att, 's', sep = '')
  str.3 <- paste(att, 'p', sep = '')
  str.4 <- paste(att, 'f', sep = '')
  dbl.1 <- data[[str.1]]
  dbl.2 <- c(min(dbl.1), 0, max(dbl.1))
  tbl.1 <- tibble(!!str.1 := dbl.2,
                  !!str.3 := c(0, 0, max(dbl.1)),
                  !!str.4 := c(0, 1, 1))
  fit.1 <- fun.2.c(att, data)
  dbl.3 <- predict(fit.1, tbl.1)
  # size of the jump at 0: the coefficient on this attribute's dummy variable
  jump <- coef(fit.1)[[str.4]]
  
  tbl.2.a %>%
    ggplot(aes_string(x = str.1, y = str.2)) +
    # Left piece
    geom_segment(aes(x = dbl.2[1], y = dbl.3[1],
                     xend = 0, yend = dbl.3[2] - jump)) +
    # Jump
    geom_segment(aes(x = 0, y = dbl.3[2] - jump,
                     xend = 0, yend = dbl.3[2])) +
    # Right piece
    geom_segment(aes(x = 0, y = dbl.3[2],
                     xend = dbl.2[3], yend = dbl.3[3])) +
    geom_point(size = 1) +
    labs(x = str.1, y = str.2, title = paste("Discontinuity Plot of", att))
}

vec.2 %>%
  map(~ fun.2.b(., tbl.2.a))

The plots mirror the \(\beta_3\) estimates above: the jumps are pronounced for commute time and variety, more modest for pay, closeness of supervision, and travel, and essentially absent for span of control. Wherever the jump is sizable, the constrained piecewise model from the previous section looks inappropriate.

3. Polynomial Regression

Hierarchical Polynomial Regression

I followed the procedure in Pedhazur & Schmelkin (1991) for hierarchical polynomial regression. I tested linear, quadratic, cubic, and quartic models and compared their improvements in fit using the \(R^2\) difference test in Equation 18.15 (Pedhazur & Schmelkin, 1991, p. 432).
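For reference, the increment-in-\(R^2\) test (as I understand Equation 18.15) compares a higher-order (full) model \(f\) with the next-lowest-order (reduced) model \(r\):

\[F = \frac{(R^2_f - R^2_r) / (k_f - k_r)}{(1 - R^2_f) / (N - k_f - 1)},\]

where \(k\) is the number of predictors in each model and \(N\) is the sample size. This is numerically the same comparison that anova() performs on two nested lm fits, which is what I rely on below.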

First, I had to do some data management. I collapsed all the satisfaction variables into one variable called ‘sat’ and all the discrepancy variables into one variable called ‘dis’, with a grouping variable for the attributes called ‘att’.

tbl.3 <- tbl.2 %>%
  select_at(vars(ends_with('s'))) %>%
  stack() %>%
  as_tibble() %>%
  rename(sat = values, att = ind) %>%
  mutate(att = str_sub(att, 1, -2)) %>%
  add_column(dis = tbl.2 %>%
               select_at(vars(ends_with('d'))) %>%
               stack() %>% as_tibble() %>%
               pull("values")) %>%
  select(-att, everything())
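For what it's worth, the same reshaping could probably be done in one step with tidyr::pivot_longer (a sketch; the row order differs from tbl.3, but the content is the same):

# Sketch: split each column name into a 3-letter attribute and a final
# letter (d or s), and spread d and s back out as separate columns
tbl.3.alt <- tbl.2 %>%
  pivot_longer(everything(),
               names_to = c('att', '.value'),
               names_pattern = '(...)(.)') %>%
  rename(dis = d, sat = s)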

tbl.3[1:10,] %>%
  ks('Rearranged Data (first 10 obs)')
Rearranged Data (first 10 obs)
sat dis att
6.000 0.000 com
6.333 0.000 com
4.000 0.000 com
7.000 0.000 com
5.000 0.333 com
3.000 1.000 com
2.000 2.000 com
3.000 0.333 com
3.000 1.000 com
3.000 2.000 com

Second, I regressed sat on dis for each attribute. Note that I used raw polynomials rather than orthogonal polynomials. Instead of showing the output for all 24 (6 attributes × 4 orders) polynomial models, I printed just the quadratic model for ‘com’ as an example.
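As a quick aside before fitting everything: raw and orthogonal polynomials give identical fitted values and overall fit, and only the individual coefficient estimates (and their tests, apart from the highest-order term) differ. A sketch using the ‘com’ subset:

# Sketch: same fitted values from raw and orthogonal quadratic polynomials
dat.com <- filter(tbl.3, att == 'com')
fit.raw  <- lm(sat ~ poly(dis, 2, raw = TRUE), dat.com)
fit.orth <- lm(sat ~ poly(dis, 2), dat.com)
all.equal(unname(fitted(fit.raw)), unname(fitted(fit.orth)))  # should be TRUE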

fun.3.a <- function(ord) {
  f <- glue('fit{ord}')
  t <- glue('tdy{ord}')
  m <- glue('mod{ord}')
  exprs(!!{{f}} := map(data, ~ lm(sat ~ poly(dis, !!ord, raw = TRUE), .)),
        !!{{t}} := map(!!sym(f), tidy),
        !!{{m}} := map(!!sym(f), glance))
}

lst.3.a <- 1:4 %>%
  map(fun.3.a) %>%
  flatten()

tbl.3.a <- tbl.3 %>%
  nest(-att) %>%
  mutate(!!!lst.3.a)

tbl.3.a %>%
  select_at(vars(matches("fit2"))) %>%
    map(~ .[[1]] %>% tidy()) %>%
  unname() %>%
  pander()
  • term estimate std.error statistic p.value
    (Intercept) 4.963 0.1125 44.1 4.323e-99
    poly(dis, 2, raw = TRUE)1 -0.3653 0.1732 -2.109 0.03634
    poly(dis, 2, raw = TRUE)2 -0.2429 0.09002 -2.698 0.007633

Third, I tried to compare model fit in order to determine the highest-order model with significant improvement on its next-lowest-order model.

fun.3.b <- function(att, ord, tbl) {
  results <- tbl %>%
    filter(att == !!att) %>%
    select_at(vars(matches(paste("fit", seq(ord), sep = '', collapse = '|')))) %>%
    map(~ .[[1]]) %>%
    unname() %>%
    do.call(what = anova)
}

lst.3.b <- vec.2 %>%
  map(~ map(.x = 1:4, .f = fun.3.b, att = ., tbl = tbl.3.a)) %>%
  set_names(vec.2)

But, as you can see from the code, I did it wrong: I accidentally compared all four models for each attribute to each other at once, rather than comparing each model only to the next-lowest order.

Fourth, I tried again.

fun.3.b.2 <- function(att, ord, tbl) {
  stopifnot(att %in% vec.2)
  stopifnot(ord >= 1 && ord <= 4)
  results <- tbl %>%
    filter(att == !!att) %>%
    select_at(vars(matches(paste("fit", max(ord-1, 1):ord,
                                 sep = '', collapse = '|')))) %>%
    map(~ .[[1]]) %>%
    unname() %>%
    do.call(what = anova)
  results
}

fun.3.c <- function(att, tbl) {
  helper <- function() {
    assign('r', fun.3.b.2(att, i, tbl), parent.frame())
    assign('p', as.vector(na.omit(r$`Pr(>F)`)), parent.frame())
  }
  i <- 1
  helper()
  p <- 0
  prev <- r
  while (i <= 4 && p < .05) {
    i = i + 1
    prev <- r
    helper()
  }
  if (p < .05) {
    r
  } else {
    prev
  }
}

lst.3.c <- vec.2 %>%
  map(fun.3.c, tbl = tbl.3.a)

chr.3 <- lst.3.c %>%
  map2_chr(vec.2, ~ glue("Attr: {.y} || Order: {184 - .x[names(.x)[match(ifelse('Res.Df' %in% names(.x), 'Res.Df', 'Df'), names(.x))]][[1]][[2]]}"))

lst.3.c %>%
  set_names(chr.3) %>%
  pander()
  • Attr: com || Order: 2:

    Analysis of Variance Table
    Res.Df RSS Df Sum of Sq F Pr(>F)
    183 321 NA NA NA NA
    182 308.6 1 12.34 7.279 0.007633
    • Attr: pay || Order: 2:

      Analysis of Variance Table
      Res.Df RSS Df Sum of Sq F Pr(>F)
      183 326.3 NA NA NA NA
      182 318.5 1 7.735 4.42 0.0369
    • Attr: soc || Order: 2:

      Analysis of Variance Table
      Res.Df RSS Df Sum of Sq F Pr(>F)
      183 263.2 NA NA NA NA
      182 214.3 1 48.88 41.52 1.015e-09
    • Attr: sup || Order: 2:

      Analysis of Variance Table
      Res.Df RSS Df Sum of Sq F Pr(>F)
      183 235.3 NA NA NA NA
      182 227.1 1 8.195 6.567 0.0112
    • Attr: tvl || Order: 2:

      Analysis of Variance Table
      Res.Df RSS Df Sum of Sq F Pr(>F)
      183 336.6 NA NA NA NA
      182 253.9 1 82.73 59.31 8.331e-13
    • Attr: var || Order: 1:

      Analysis of Variance Table
        Df Sum Sq Mean Sq F value Pr(>F)
      poly(dis, 1, raw = TRUE) 1 31.42 31.42 22.9 3.505e-06
      Residuals 183 251.1 1.372 NA NA

Excellent, it worked this time.

To find support for the hypothesis that satisfaction is greatest when discrepancy is 0, we would need to see that a quadratic model best fit the data. According to the results above, that is the case for commute time, pay, span of control, closeness of supervision, and travel.

However, for variety, a quadratic model failed to improve on the fit of the linear model.
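Strictly speaking, a quadratic fit is necessary but not sufficient: the fitted parabola also needs to be concave (a negative quadratic coefficient) with its vertex near zero. For a raw-polynomial fit \(\hat{Y} = b_0 + b_1 X + b_2 X^2\), the vertex is at \(X = -b_1 / (2 b_2)\); for the commute-time quadratic shown earlier, that works out to \(-(-0.365) / (2 \times -0.243) \approx -0.75\), noticeably to the left of zero. The plots below make the same point visually.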

Plotting Polynomial Models

Finally, let’s look at some plots. First, quadratic plots for all six attributes.

map(vec.2, ~ ggplot(filter(tbl.3, att == .), aes(x = dis, y = sat)) +
      geom_point() +
      geom_smooth(method = lm, formula = y ~ poly(x, 2, raw = TRUE)) +
      xlim(-3, 3) +
      ggtitle(glue("Quadratic Plot for {.}"))) %>%
  set_names(vec.2)

Frankly, it's difficult to see why these quadratic models performed significantly better than the linear models. In at least some cases (e.g., span of control, travel), it looks like it may be due to a small number of high-leverage outliers. But perhaps the density of points matters, since these plots show many overlapping points.

Span of control and travel appear to best support the hypothesis. They are both concave with vertices approximately at zero. More and less span of control or travel are worse than just the adequate amount.

The others, while clearly parabolic, do not have both of these features. Commute time and closeness of supervision veer down and right, suggesting that more is worse, but less is not.

Pay is convex and veering up to the right, but it has so little variation that we can’t even test our hypothesis. All we can say is more pay is better and hardly anyone is satisfied with what they have.

Also, the plots for pay and variety are so similar that it's a wonder the quadratic model for pay represented a significant improvement over the linear model. The curvature in the pay plot is very slight. If we go back to the ANOVA tables above, we can see that pay did in fact have the largest \(p\)-value of the quadratic models (\(F \approx 4.42\), \(p \approx .037\)).

For pay and variety, let’s overlay the linear and quadratic plots and see how they compare.

map(c('pay','var'), ~ ggplot(filter(tbl.3, att == .), aes(x = dis, y = sat)) +
      geom_point() +
      geom_smooth(method = lm, formula = y ~ poly(x, 2, raw = TRUE)) +
      geom_smooth(method = lm) +
      xlim(-3, 3) +
      ggtitle(glue("Overlay Plot for {.}"))) %>%
  set_names(c('pay','var'))
## `geom_smooth()` using formula 'y ~ x'

## `geom_smooth()` using formula 'y ~ x'

There is some slight difference for pay; with a large enough sample, it is easy to see how even that small curvature could make the quadratic model outperform the linear one.

For variety, the two curves sit almost exactly on top of each other.

4. Conclusion

There is not a clear story to tell here. The first analysis with constrained piecewise functions mostly supported our hypothesis that more and less of any attribute are worse than adequate, with perhaps the exceptions of pay and closeness of supervision.

But loosening the constraints, either by allowing for a discontinuity or by shifting to polynomial regression, showed that only span of control and closeness of supervision fit the hypothesized pattern.

References

Neter, J., Wasserman, W., & Kutner, M. H. (1989). Applied linear regression models (2nd ed.). Irwin.

Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Lawrence Erlbaum Associates.