Set up working environment

As usual, I load the R packages I will be using first.

# Tidyverse
library(readxl)
library(tidyverse)

# R Markdown
library(knitr)

# Project
library(here)

# Statistics
library(psych)

Next, I will read in the data. I will also glance at the data to see what I’m working with.

tbl.2 <- read_excel(here('data', 'DIF.xlsx'))

tbl.2[1:10, ] %>%
  kable

ALTO	ALTI	PAYO	PAYI	JOBSAT	IDENT
-0.3333333	2.0000000	1.0000000	0.6666667	3.666667	4.166667
1.0000000	1.0000000	1.6666667	2.0000000	3.333333	2.166667
-1.3333333	0.3333333	-0.6666667	1.3333333	3.000000	1.833333
2.0000000	2.0000000	1.0000000	1.6666667	3.666667	2.166667
0.6666667	1.0000000	1.0000000	1.0000000	4.000000	3.666667
1.3333333	2.0000000	0.3333333	2.0000000	4.000000	4.333333
0.0000000	2.0000000	0.0000000	2.0000000	3.000000	3.666667
-1.3333333	-0.6666667	0.3333333	2.0000000	3.666667	2.333333
-0.3333333	1.6666667	-0.3333333	2.0000000	5.000000	3.833333
0.3333333	1.3333333	0.0000000	1.6666667	3.000000	3.000000

describe(tbl.2)[c('n', 'mean', 'sd',
                  'min', 'max')] %>%
  kable

	n	mean	sd	min	max
ALTO	970	0.5020619	0.9185661	-2.0000000	2
ALTI	970	1.0805842	0.6897499	-2.0000000	2
PAYO	969	0.4791882	0.8725597	-2.0000000	2
PAYI	969	1.2707258	0.6024815	-0.6666667	2
JOBSAT	967	3.6856256	0.9173250	1.0000000	5
IDENT	965	3.5260794	0.8160464	1.0000000	5

It looks like there may be a small number of missing data. While ALTO and ALTI have data for 970 observations each, the other four variables have fewer than 970 observations. Although I am sure there is an appropriate way to handle these missing data (Newman, 2014), I do not have enough information about how the data were collected to be sure of what to do. So let’s take a quick look at the observations with NA values and see if any corrections are needed.

tbl.2 %>%
  filter_all(any_vars(is.na(.))) %>%
  kable

ALTO	ALTI	PAYO	PAYI	JOBSAT	IDENT
0.6666667	0.6666667	0.6666667	2.000000	4.666667	NA
1.0000000	2.0000000	0.6666667	1.000000	NA	NA
-1.0000000	0.0000000	-1.0000000	1.000000	2.666667	NA
-1.5000000	0.0000000	NA	2.000000	2.666667	1.400000
0.0000000	-0.5000000	0.5000000	NA	4.000000	3.333333
0.0000000	2.0000000	-0.3333333	2.000000	5.000000	NA
0.6666667	1.0000000	0.0000000	1.333333	NA	NA
1.3333333	1.3333333	1.0000000	1.000000	NA	4.166667

It looks like the main problem here is that if I take the difference of PAYO and PAYI, I will lose two data points. It is also a problem that two observations have no data on either outcome variable, essentially causing a loss of eight data points for independent variables. R will drop those observations when computing correlations and regressions, and I don’t think there is anything to be done about that, so I will simply move on.

Returning to the table of descriptive statistics above, the mean organizational scores on both altruism and pay are lower than their respective mean individual scores. This suggests that individuals tended to report valuing altruism and pay more than their organizations. (The standard deviations for individual scores were lower than for organizations, but I’m not sure what the substantive interpretation of that is.)

Interestingly, nobody reported valuing pay less than -0.67. That fact makes me wonder if perhaps respondents were more worried about future negotiations with employers than about presenting socially desirable responses to researchers.

a. Construct difference scores and compute correlations

My understanding of algebraic and squared difference scores is that, for two variables, \(X\) and \(Y\), their algebraic difference is \(X - Y\) and their squared difference is \((X-Y)^2\) (Edwards, 2002).

tbl.2.a <- tbl.2 %>%
  mutate(
    alt.df = ALTO - ALTI,
    alt.sq = alt.df ^ 2,
    pay.df = PAYO - PAYI,
    pay.sq = pay.df ^2
  )

tbl.2.a[1:10, ] %>%
  kable

ALTO	ALTI	PAYO	PAYI	JOBSAT	IDENT	alt.df	alt.sq	pay.df	pay.sq
-0.3333333	2.0000000	1.0000000	0.6666667	3.666667	4.166667	-2.3333333	5.4444444	0.3333333	0.1111111
1.0000000	1.0000000	1.6666667	2.0000000	3.333333	2.166667	0.0000000	0.0000000	-0.3333333	0.1111111
-1.3333333	0.3333333	-0.6666667	1.3333333	3.000000	1.833333	-1.6666667	2.7777778	-2.0000000	4.0000000
2.0000000	2.0000000	1.0000000	1.6666667	3.666667	2.166667	0.0000000	0.0000000	-0.6666667	0.4444444
0.6666667	1.0000000	1.0000000	1.0000000	4.000000	3.666667	-0.3333333	0.1111111	0.0000000	0.0000000
1.3333333	2.0000000	0.3333333	2.0000000	4.000000	4.333333	-0.6666667	0.4444444	-1.6666667	2.7777778
0.0000000	2.0000000	0.0000000	2.0000000	3.000000	3.666667	-2.0000000	4.0000000	-2.0000000	4.0000000
-1.3333333	-0.6666667	0.3333333	2.0000000	3.666667	2.333333	-0.6666667	0.4444444	-1.6666667	2.7777778
-0.3333333	1.6666667	-0.3333333	2.0000000	5.000000	3.833333	-2.0000000	4.0000000	-2.3333333	5.4444444
0.3333333	1.3333333	0.0000000	1.6666667	3.000000	3.000000	-1.0000000	1.0000000	-1.6666667	2.7777778

As I would expect, the squared differences are all positive (or zero), and are larger or smaller in magnitude than the algebraic differences according to whether the magnitudes of the algebraic differences are greater or less than one.

Now, I will compute the correlations.

vec.2.a.1 <- tbl.2.a[7:10] %>%
  map_dbl(~ cor(., tbl.2.a$JOBSAT, use = "pairwise.complete.obs"))

vec.2.a.2 <- tbl.2.a[7:10] %>%
  map_dbl(~ cor(., tbl.2.a$IDENT, use = "pairwise.complete.obs"))

tibble(var = names(vec.2.a.1), job = vec.2.a.1, ide = vec.2.a.2) %>%
  kable

var	job	ide
alt.df	0.2921289	0.1359600
alt.sq	-0.3870969	-0.3013596
pay.df	0.1393014	0.1258576
pay.sq	-0.2877371	-0.2360829

Overall, the correlations are fairly weak.

References

Edwards, J. R. (2002). Alternatives to Difference Scores. In F. Drasgow & N. Schmitt (Eds.), Measuring and analyzing behavior in organizations: Advances in measurement and data analysis. Jossey-Bass.

Newman, D. A. (2014). Missing Data: Five Practical Guidelines. Organizational Research Methods, 17(4), 372–411. https://doi.org/10.1177/1094428114548590

HW #5: Difference Scores and Profile Similarity Indices

Daniel Lewis

2/12/2020

Set up working environment

a. Construct difference scores and compute correlations

References