Calculate union wage premiums by state and year
Use this code to calculate a union wage premium (the difference between median union and median nonunion wages) by state using the EPI CPS microdata extracts.
Some utility notes:
- Unlike the regression-based national union wage premium on the State of Working America Data Library, this simple code does not control for confounding variables such as education, job type, or demographics. That means that while it may accurately reflect that union jobs often carry a wage premium, it does not account for the fact that low-wage jobs, for instance, tend to be non-unionized. The actual wage premium across comparable positions may be different. Exercise caution when making specific claims with these data.
- Sample sizes vary widely across years, states, and union vs. nonunion groups. Keep a close eye on this, and remember that some states may not have large enough sample sizes to produce strong data. Though there is no hard and fast rule for how large a sample needs to be, smaller samples will produce noisier, less reliable data. It may not be possible to produce usable union wage premium data for your state.
Please reach out to ecohn@epi.org with any questions. Now let’s get coding!
The following chunk of code loads the R libraries necessary for this exercise. You may need to install them to run this code.
# Load necessary libraries
library(tidyverse)
library(epiextractr)
library(epidatatools)
library(labelled)
library(realtalk)
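If you are not sure which of these packages are already installed, a check along the following lines may help. This is a minimal sketch: `pkgs` and `missing_pkgs` are names made up for illustration, and the r-universe repository URL in the commented-out install call is an assumption, so confirm it against each package's own installation instructions before using it.

```r
# Packages loaded by this script
pkgs <- c("tidyverse", "epiextractr", "epidatatools", "labelled", "realtalk")

# Which of them are not installed yet?
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
print(missing_pkgs)

# Uncomment to install anything missing.
# Assumption: the EPI packages are served from this r-universe repository;
# check each package's README if the install fails.
# install.packages(
#   missing_pkgs,
#   repos = c("https://economic.r-universe.dev", "https://cloud.r-project.org")
# )
```

An empty result (`character(0)`) means every package is already available.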
Import and clean data
Note: Don’t forget to update years to match your setup before running the script.
Running this chunk loads the BLS Current Population Survey Outgoing Rotation Group (ORG) data required to calculate union wage premiums.
# Import CPS ORG data
# Note: load as many years as necessary to get sufficient sample sizes or the desired time series.
cps_org <- load_org(2020:2024, "year", "age", "statefips", "wage", "union", "orgwgt", "a_earnhour", "cow1") %>%
  # Age and labor force restrictions (exclude self-employed and self-incorporated), non-imputed wages.
  filter(age >= 16, cow1 <= 5, a_earnhour != 1, !is.na(wage))
Method 1: Point-in-time comparisons
This method produces union wage premiums for all fifty states, pooling five years of data to get sufficient sample sizes.
Note: Some of the sample sizes are still quite small even with five years of data (for example, South Carolina’s union-represented sample). Consider expanding the number of years pooled, but keep in mind that this will also alter what you can say about the results.
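One way to see how thin the smallest cells are is to count observations by state and union status before computing any medians. The sketch below runs on a small made-up data frame (`toy_org`, hypothetical values, with `union` coded 1 for union-represented by assumption); with real data you would apply the same `count()` call to `cps_org`.

```r
library(dplyr)

# Toy stand-in for cps_org (hypothetical values, not CPS data)
toy_org <- tibble(
  statefips = c(36, 36, 36, 45, 45, 45, 45),
  union     = c(1, 0, 0, 1, 0, 0, 0)
)

# Count observations per state and union status, smallest cells first
sample_sizes <- toy_org |>
  count(statefips, union, name = "n") |>
  arrange(n)

print(sample_sizes)
```

Sorting by `n` puts the state and union combinations most likely to be unreliable at the top of the output.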
Create wage data
This code chunk uses EPI methodology to correct for wage clumping by creating a weighted average of wages around the median. The result is one median wage per state and union status.
# Note: divide orgwgt by the number of months in your pool (here, 5 years x 12 months = 60).
wage_single <- cps_org |>
  mutate(union = to_factor(union)) |>
  summarise(
    wage_median = averaged_median(
      x = wage,
      w = orgwgt/60,
      quantiles_n = 9L,
      quantiles_w = c(1:4, 5, 4:1)),
    n = n(),
    .by = c(union, statefips))
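The chunk above stops at median wages, so the premium itself still has to be computed as the union median minus the nonunion median. A minimal sketch of that step follows, using a made-up data frame shaped like `wage_single`. The wages, sample sizes, and the `"union"`/`"nonunion"` labels are hypothetical; the labels that `to_factor()` produces from the real data may differ, so adjust the column names accordingly. The percent column is an optional way to present the same difference.

```r
library(dplyr)
library(tidyr)

# Toy stand-in shaped like wage_single (hypothetical wages and sample sizes)
toy_wage_single <- tibble(
  statefips   = c(36, 36, 45, 45),
  union       = c("union", "nonunion", "union", "nonunion"),  # assumed labels
  wage_median = c(30, 25, 24, 20),
  n           = c(900, 4000, 60, 2500)
)

# One row per state, with union and nonunion medians and sample sizes side by side
premium_by_state <- toy_wage_single |>
  pivot_wider(names_from = union, values_from = c(wage_median, n)) |>
  mutate(
    premium_dollars = wage_median_union - wage_median_nonunion,
    premium_percent = 100 * premium_dollars / wage_median_nonunion
  )

print(premium_by_state)
```

Keeping `n_union` and `n_nonunion` next to each premium makes it easier to flag states whose estimate rests on a small sample.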
Method 2: Time series
This method produces inflation-adjusted union wage premiums for one state over time.
Note: Because you can’t combine years to pool data for this method, check sample sizes before proceeding. Do not use for states that have insufficient sample sizes.
You can check sample sizes by running the code through line X and checking the n column of wage_series.
Set up inflation adjustment
For more information on inflation adjusting wages, see Inflation adjusting with Realtalk.
# Calculate real wage over time: load CPI data from realtalk
cpi_data <- realtalk::c_cpi_u_annual

# Set base year to 2024
cpi2024 <- cpi_data$c_cpi_u[cpi_data$year == 2024]
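To make the adjustment concrete: a nominal wage is converted to base-year dollars by multiplying it by the ratio of the base-year index to that year's index. The sketch below uses invented index values (`toy_cpi`), not actual C-CPI-U figures, purely to show the arithmetic.

```r
# Hypothetical index values for illustration only (not actual C-CPI-U data)
toy_cpi <- data.frame(year = c(2020, 2024), c_cpi_u = c(100, 125))

# Index value in the base year
base_cpi <- toy_cpi$c_cpi_u[toy_cpi$year == 2024]

# A nominal 2020 wage of $20 expressed in 2024 dollars
nominal_wage_2020 <- 20
real_wage_2020 <- nominal_wage_2020 * (base_cpi / toy_cpi$c_cpi_u[toy_cpi$year == 2020])

print(real_wage_2020)  # 25
```

With prices 25 percent higher in the base year, a $20 wage from the earlier year is worth $25 in base-year dollars.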
Create wage data
This code chunk calculates median wages and adjusts them for inflation.
# Note: change statefips to whichever state you prefer (36 is New York).
wage_series <- cps_org |>
  filter(statefips == 36) |>
  mutate(union = to_factor(union)) |>
  summarise(
    wage_median = averaged_median(
      x = wage,
      w = orgwgt/12,
      quantiles_n = 9L,
      quantiles_w = c(1:4, 5, 4:1)),
    n = n(),
    .by = c(year, union)) |>
  # Merge annual CPI data to data frame by year
  left_join(cpi_data, by = 'year') |>
  # Inflation adjust wages
  mutate(real_wage = wage_median * (cpi2024/c_cpi_u)) |>
  select(year, union, real_wage)
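To turn the real median wages into a premium for each year, the union and nonunion rows can be placed side by side and subtracted. The following is a sketch on a made-up data frame shaped like `wage_series`. The wage values and the `"union"`/`"nonunion"` labels are hypothetical, and the real factor labels may differ.

```r
library(dplyr)
library(tidyr)

# Toy stand-in shaped like wage_series (hypothetical real wages)
toy_wage_series <- tibble(
  year      = c(2023, 2023, 2024, 2024),
  union     = c("union", "nonunion", "union", "nonunion"),  # assumed labels
  real_wage = c(32, 26, 33, 27.5)
)

# One row per year: real union median minus real nonunion median
premium_by_year <- toy_wage_series |>
  pivot_wider(names_from = union, values_from = real_wage) |>
  mutate(premium_dollars = union - nonunion) |>
  arrange(year)

print(premium_by_year)
```

Because both medians are already in base-year dollars, the differences are comparable across years.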