Downloading BLS data via blsR

Author

Jori Kandra

Loading blsR and setting API key

blsR is the R package written and maintained by the Bureau of Labor Statistics to provide users with BLS data, including Current Employment Statistics (CES), Worker characteristics data (CPS), inflation & prices (CPI), and Job Openings Layoff and Turnover Survey (JOLTS), among others.

BLS provides this data through the use of an Application Programming Interface (API), which allows R to directly import data from BLS using the series IDs which are extensively documented by BLS. This process automates the manual retrieval of data using the BLS series report and delivers the tidied data.

The following chunk of code loads the R libraries necessary for this exercise. You may need to install them to run this code.

# download blsR
#install.library("blsR")

# import relevant libraries
library(tidyverse)
library(blsR)
library(here)

In order to access BLS data you must register for a unique API key. Save your key as an environmental object to be used later.

# set key for BLS api
#note: each user must register for unique BLS API key here: https://www.bls.gov/developers/home.htm
bls_key <- Sys.getenv("your-key-goes-here")

Nominal average hourly earnings of production and non-supervisory employees

You can import a single table using the blsR::get_n_series_table() function:

## use blsR to pull in nominal wages
nominal_wages <- get_n_series_table(series = "CEU0500000008", api_key = bls_key, 
                                    start_year = 1965, end_year = 2023, 
                                    tidy = TRUE, annualaverage = TRUE) %>% 
  # filter for annual data
  filter(month == 13)

Real average hourly earnings of production and non-supervsory employees

You can also import multiple series at once. For example, we can import AHE and CPI in order to calculate real wages.

# set cpi codes
cpi_codes <- c("CUUR0000SA0",
               "CEU0500000008")

# set cpi base
#note: used to set base year for inflation-adjustment
cpi_base <- get_n_series_table(series_ids = "CUUR0000SA0", start_year = 2023, end_year = 2023, 
                               api_key = bls_key, tidy = TRUE, annualaverage = TRUE) %>% 
  # filter annual 2023 data and pull CPI value as base
  filter(month == 13) %>% pull(CUUR0000SA0)

# use blsR to pull CPI data
cpi_output <- get_n_series_table(series_ids = cpi_codes, start_year = 1947, end_year = 2023, 
                                 api_key = bls_key, tidy = TRUE, annualaverage = TRUE) %>% 
  # filter annual data
  filter(month == 13) %>%
  # rename for easier handling
  rename(ahe = CEU0500000008, cpi = CUUR0000SA0) %>% 
  # calculate real wages
  mutate(ahe_real = ahe * (cpi_base/cpi)) %>% 
  # export to delimited file
  write.csv(here("ahe_real.csv"))

Pulling more than 50 series

The BLS API limits each call to 50 series, which can be limiting when you are trying to pull in large datasets. Here is a trick I use in the code that runs our Jobs and Unemployment page to workaround this limitation:

# read in bls series codes
#note: this is a random selection of codes used in our Jobs and Unemployment figures
bls_codes <- read.csv(here("bls_codes.csv"))
# remove any blanks
bls_codes <- bls_codes$series_id[bls_codes$series_id != ""]


# use map to iteratively call blsR api at max number of series id
#note: BLS restricts to max 50 series in a single call
jobs_day_df <- map(split(bls_codes, ceiling(seq_along(bls_codes) / 50)), # split codes into groups of 50
             # call blsR using series ids sliced into groups of 50
             ~ get_n_series_table(series_ids = .x, start_year = 1939, end_year = 2023, 
                                  api_key = bls_key, tidy = TRUE)) %>% 
  # map returns list, flatten by joining data
  reduce(., function(df1, df2) full_join(df1, df2, by = c("year", "month"))) %>%
  # define date
  mutate(date = as.POSIXct(paste(year,month,1, sep = "-")),
         date = as.Date(date)) %>% 
  write_csv(here("jobs_day_example.csv"))