Getting started with swadlr

The swadlr package provides access to the EPI State of Working America Data Library (SWADL), a comprehensive resource for data on wages, employment, and the labor market in the United States.

library(swadlr)

Exploring available data

Before fetching data, you can explore what’s available in the SWADL API using swadl_id_names().

Topics

Topics are broad categories that group related indicators:

swadl_id_names("topics")

Indicators

Indicators are specific data series. You can list all indicators or filter by topic:

# List all indicators
swadl_id_names("indicators")

# List indicators for a specific topic
swadl_id_names("indicators", topic = "wages")

Measures

Measures are specific ways of presenting indicator data. For example, wage data might be available in nominal dollars, real (inflation-adjusted) dollars, or as a percentage:

# List all measures
swadl_id_names("measures")

# List measures for a specific indicator
swadl_id_names("measures", indicator = "hourly_wage_percentiles")

Dimensions

Dimensions allow subsetting data by demographic or other categories (e.g., gender, race, education). Each dimension has multiple values:

# List all dimensions and their values
swadl_id_names("dimensions")

# List dimensions for a specific indicator
swadl_id_names("dimensions", indicator = "hourly_wage_percentiles")

Geographies

The package supports national, regional, divisional, and state-level data:

# List all geographies
swadl_id_names("geographies")

# Filter to just states
geographies <- swadl_id_names("geographies")
geographies[geographies$level == "state", ]

Getting indicator information

Before fetching data, use swadl_indicator() to get detailed information about an indicator, including available measures, dimensions, date ranges, and geographic availability:

info <- swadl_indicator("hourly_wage_percentiles")
print(info)

You can also access specific components of the info object:

# Available measures
info$measures

# Availability by date interval, measure, and geography
info$availability

Fetching time series data

The main function for fetching data is get_swadl(). It returns a tibble with columns for date, value, geography, and any dimensions you request.

Basic usage

Fetch the median hourly wage over time:

wages <- get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = list("wage_percentile" = "wage_p50")
)
wages

Dimension syntax

The dimension argument supports several formats:

Overall (aggregate data):

Use "overall" to get aggregate data without demographic breakdown:

get_swadl(
  indicator = "labor_force_emp",
  measure = "percent_emp",
  dimension = "overall"
)

Single dimension (all values):

Pass a dimension ID to get all values for that dimension:

# All wage percentiles
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = "wage_percentile"
)

Single dimension (specific value):

Use a named list to filter to specific dimension values:

# Only the 90th percentile
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = list("wage_percentile" = "wage_p90")
)

Multiple dimensions (cross-tabulated):

Combine dimensions using a list. Named elements filter to specific values, while unnamed elements include all values:

# Employment rate for males, by all age groups
get_swadl(
  indicator = "labor_force_emp",
  measure = "percent_emp",
  date_interval = "month",
  dimension = list("gender" = "gender_male", "age_group")
)
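The mix of named and unnamed elements is ordinary base R list behavior. The sketch below only inspects the list itself to show how the two kinds of element can be told apart; it makes no claim about how swadlr parses the argument internally, and the variable names are illustrative.

```r
# A partially named list: one named element, one unnamed element
dims <- list("gender" = "gender_male", "age_group")

# names() returns "" for elements that were given no name
dim_names <- names(dims)

# Named elements: the name is the dimension ID, the value is the filter
filtered <- dims[dim_names != ""]

# Unnamed elements: the value itself is the dimension ID (all values requested)
unfiltered <- unlist(dims[dim_names == ""], use.names = FALSE)

print(dim_names)
print(filtered)
print(unfiltered)
```

Note that a list with no named elements at all has `names()` equal to `NULL` rather than a vector of empty strings, so code that inspects names needs to handle that case separately.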

Date intervals

Most indicators support both annual and monthly data. Use the date_interval argument:

# Annual data (default)
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  date_interval = "year",
  dimension = list("wage_percentile" = "wage_p50")
)

# Monthly data
get_swadl(
  indicator = "labor_force_emp",
  measure = "percent_emp",
  date_interval = "month",
  dimension = "overall"
)

Geographic levels

Fetch data for different geographic levels:

# National data (default)
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  geography = "national",
  dimension = list("wage_percentile" = "wage_p50")
)

# State data (by name)
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  geography = "California",
  dimension = list("wage_percentile" = "wage_p50")
)

# State data (by abbreviation)
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  geography = "NY",
  dimension = list("wage_percentile" = "wage_p50")
)

# Census region
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  geography = "Midwest",
  dimension = list("wage_percentile" = "wage_p50")
)
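The state examples above use a full name in one call and a postal abbreviation in another. If your own data carries only one form, base R's built-in state.name and state.abb vectors can translate between them. This is plain base R and independent of swadlr; the vectors cover the 50 states only, so the District of Columbia is not included.

```r
# state.name and state.abb are parallel vectors from the datasets package,
# which is attached by default in every R session
abbr <- state.abb[match("California", state.name)]
name <- state.name[match("NY", state.abb)]

print(abbr)
print(name)
```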

Date filtering

Filter to specific dates or date ranges:

# Single date
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = list("wage_percentile" = "wage_p50"),
  date = "2023-01-01"
)

# Date range
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = list("wage_percentile" = "wage_p50"),
  date = c("2010-01-01", "2023-01-01")
)
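If a longer series is already in memory, the same kind of range can be applied locally rather than with a new request. The sketch below uses a mock data frame in place of a real get_swadl() result and assumes the date column is of class Date; the values are made up, and whether the API treats range endpoints as inclusive is not stated here.

```r
# Mock annual series standing in for a get_swadl() result (values are made up)
wages <- data.frame(
  date = as.Date(paste0(2008:2024, "-01-01")),
  value = seq(16, 24, length.out = 17)
)

# Keep rows whose date falls inside the bounds, endpoints included
bounds <- as.Date(c("2010-01-01", "2023-01-01"))
in_range <- wages[wages$date >= bounds[1] & wages$date <= bounds[2], ]

nrow(in_range)
range(in_range$date)
```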

Example: Wage percentiles over time

Here’s a complete example that fetches all wage percentiles and creates a summary:

# Fetch all wage percentiles
wages <- get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = "wage_percentile",
  date = c("2000-01-01", "2023-01-01")
)

# View the data
head(wages)

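# Summary by percentile
# Sort by date first so x[1] is the earliest value and x[length(x)] the latest
wages <- wages[order(wages$date), ]
aggregate(value ~ wage_percentile, data = wages, FUN = function(x) {
  c(start = x[1], end = x[length(x)], change = x[length(x)] - x[1])
})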
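One detail of the aggregate() call above: because FUN returns a vector of three values, base R stores them in a single matrix column named value rather than in three separate columns. The sketch below reproduces this on a small mock data frame (made-up values, standing in for the real result) and shows one common way to flatten it.

```r
# Mock data standing in for the get_swadl() result (values are made up)
wages <- data.frame(
  date = rep(as.Date(c("2000-01-01", "2023-01-01")), times = 2),
  wage_percentile = rep(c("wage_p10", "wage_p50"), each = 2),
  value = c(6, 13, 12, 23)
)

# Sort by date so x[1] is the earliest value and x[length(x)] the latest
wages <- wages[order(wages$date), ]

summary_tbl <- aggregate(value ~ wage_percentile, data = wages, FUN = function(x) {
  c(start = x[1], end = x[length(x)], change = x[length(x)] - x[1])
})

# `value` is a single matrix column; expand it into ordinary columns
flat <- do.call(data.frame, summary_tbl)
names(flat)
```

After flattening, the columns are named value.start, value.end, and value.change, which makes them easier to pass to plotting or export functions.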

Example: State-level employment

Check which geographic levels have data for an indicator, then fetch employment rates for a single state:

# Get info to see which geographic levels have data
info <- swadl_indicator("labor_force_emp")
info$availability

# Fetch data for California
ca_emp <- get_swadl(
  indicator = "labor_force_emp",
  measure = "percent_emp",
  date_interval = "year",
  geography = "California",
  dimension = "overall"
)
ca_emp
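To cover several states, one option is to repeat the single-state request in a loop and stack the results. The sketch below is a minimal version of that pattern; fetch_states() and stub_fetch() are hypothetical helpers, not part of swadlr, and the stub returns made-up values so the example runs without network access.

```r
# Hypothetical helper: call `fetch_one` once per state and stack the results.
# `fetch_one` is any function that takes a state name and returns a data frame.
fetch_states <- function(states, fetch_one) {
  results <- lapply(states, function(s) {
    # Skip states that raise an error instead of aborting the whole loop
    tryCatch(fetch_one(s), error = function(e) NULL)
  })
  # Drop failed states, then row-bind what remains
  do.call(rbind, Filter(Negate(is.null), results))
}

# Stub standing in for a real request (values are made up)
stub_fetch <- function(state) {
  if (state == "Nowhere") stop("no data")
  data.frame(geography = state, date = as.Date("2023-01-01"), value = 0.6)
}

all_emp <- fetch_states(c("California", "Nowhere", "Texas"), stub_fetch)
all_emp
```

With the real package, fetch_one could wrap the California call above with geography = s in place of the fixed state name. How get_swadl() behaves for a state without data is an assumption here, so check info$availability before relying on the error handling.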

Cache management

The package caches metadata (topics, indicators, measures, dimensions, sources) within your R session to minimize API calls. If you need to refresh this cache:

clear_swadlr_cache()