Overview

This page contains information and R code to re-create the figures from my talk and workshop.

You may download a pdf of my slides here.

Packages

You’ll need the following packages to re-create the figures

  • sf
  • spData
  • ggplot2
  • palmerpenguins
  • tidycensus
  • tidyverse

Ggplot Figures

For these figures, you’ll need to load the ggplot2 and palmerpenguins packages:

require(ggplot2)
require(palmerpenguins)

Scatterplot 1

ggplot(
  penguins,
  aes(
    x = body_mass_g,
    y = bill_length_mm)) +
  xlab("Body Mass (g)") +
  ylab("Bill Length (mm)") +
  geom_point()

Scatterplot 2

ggplot(
  penguins,
  aes(
    x = body_mass_g,
    y = bill_length_mm,
    colour = species)) +
  xlab("Body Mass (g)") +
  ylab("Bill Length (mm)") +
  geom_point()

Scatterplot 3

ggplot(
  na.omit(penguins),
  aes(
    x = body_mass_g,
    y = bill_length_mm,
    colour = species,
    shape = sex)) +
  xlab("Body Mass (g)") +
  ylab("Bill Length (mm)") +
  # Use the cex argument to make the points larger
  geom_point(cex = 2)

Maps: CONUS - spData

You’ll need to load the spData package for the us states border vector data.

Loading the Data

require(spData)

This package contains a sf object called us_states

head(us_states)
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -114.8136 ymin: 24.55868 xmax: -71.78699 ymax: 42.04964
## Geodetic CRS:  NAD83
##   GEOID        NAME   REGION             AREA total_pop_10 total_pop_15
## 1    01     Alabama    South 133709.27 [km^2]      4712651      4830620
## 2    04     Arizona     West 295281.25 [km^2]      6246816      6641928
## 3    08    Colorado     West 269573.06 [km^2]      4887061      5278906
## 4    09 Connecticut Norteast  12976.59 [km^2]      3545837      3593222
## 5    12     Florida    South 151052.01 [km^2]     18511620     19645772
## 6    13     Georgia    South 152725.21 [km^2]      9468815     10006693
##                         geometry
## 1 MULTIPOLYGON (((-88.20006 3...
## 2 MULTIPOLYGON (((-114.7196 3...
## 3 MULTIPOLYGON (((-109.0501 4...
## 4 MULTIPOLYGON (((-73.48731 4...
## 5 MULTIPOLYGON (((-81.81169 2...
## 6 MULTIPOLYGON (((-85.60516 3...

Subsetting CONUS

To restrict to CONUS (CONtinental United States), a.k.a the lower 48, you can use the subset() function on the us_states object.

conus = subset(us_states, !(NAME %in% c("Alaska", "Hawaii")))
head(conus)
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -114.8136 ymin: 24.55868 xmax: -71.78699 ymax: 42.04964
## Geodetic CRS:  NAD83
##   GEOID        NAME   REGION             AREA total_pop_10 total_pop_15
## 1    01     Alabama    South 133709.27 [km^2]      4712651      4830620
## 2    04     Arizona     West 295281.25 [km^2]      6246816      6641928
## 3    08    Colorado     West 269573.06 [km^2]      4887061      5278906
## 4    09 Connecticut Norteast  12976.59 [km^2]      3545837      3593222
## 5    12     Florida    South 151052.01 [km^2]     18511620     19645772
## 6    13     Georgia    South 152725.21 [km^2]      9468815     10006693
##                         geometry
## 1 MULTIPOLYGON (((-88.20006 3...
## 2 MULTIPOLYGON (((-114.7196 3...
## 3 MULTIPOLYGON (((-109.0501 4...
## 4 MULTIPOLYGON (((-73.48731 4...
## 5 MULTIPOLYGON (((-81.81169 2...
## 6 MULTIPOLYGON (((-85.60516 3...

Keep Selected Columns

To make things simple, we’ll use only a subset of the data columns:

conus_2 = conus[, c("NAME", "REGION", "total_pop_15")]
names(conus_2) = c("State", "Region", "Population", "geometry")
head(conus_2)

Map 1: Borders

Simple map of state borders, in a geographic coordinate system:

ggplot(conus_2) + geom_sf()

Map 2: Reprojected

A nice projection for CONUS is the EPSG:5070. It’s an Albers Equal Area Conic projection (don’t worry if you don’t know what that means).

We can use coord_sf() to easily set the coordinate system of our map.

If you use coord_sf() you don’t have to reproject your sf objects, so it’s very convenient!

ggplot(conus_2) + geom_sf() +
  coord_sf(crs = sf::st_crs(5070))

Map 3: Color by Region

ggplot(conus_2) + 
  geom_sf(aes(colour = Region)) +
  coord_sf(crs = sf::st_crs(5070))

Map 5: Population Choropleth 1

This time we use both the colour and fill aesthetics.

ggplot(conus_2) + 
  geom_sf(
    aes(
      colour = Region,
      fill = Region),
    # The alpha argument sets the transparency for the fill color
    alpha = 0.5) +
  coord_sf(crs = sf::st_crs(5070))

Map 5a: Population Choropleth 1

A basic choropleth using the default fill colors:

ggplot(conus_2) + geom_sf() +
  geom_sf(
    aes(
      fill = Population),
    lwd = 0.5) +
  coord_sf(crs = sf::st_crs(5070))

Map 5b: Population Choropleth 2

The viridis color scale c is easier to see.

  • Divide the population by 1 million to make the legend numbers cleaner.
  • You can use the name argument to set the title of the legend.
ggplot(conus_2) +
  geom_sf(
    aes(
      fill = Population * 1e-6),
    lwd = 0.5) +
  coord_sf(crs = sf::st_crs(5070)) +
  scale_fill_viridis_c(name = "Population\n(millions)")

Map 5c: Too Much!

This map is very difficult to read, but it illustrates the simultaneous use of both fill and colour aesthetics.

ggplot(conus_2) + geom_sf() +
  geom_sf(
    aes(
      colour = Region,
      fill = Population * 1e-6),
    lwd = 0.8) +
  coord_sf(crs = sf::st_crs(5070)) +
  scale_fill_viridis_c(name = "Population\n(millions)")

Census - tidycensus

The American Community Survey (ACS) contains data on hundreds of demographic and socioeconomic variables.

To illustrate the basic functionality of the tidycensus package we’ll concentrate on one variable: Median Household Income (ACS variable B19013_001).

Installing tidycensus

To use tidycensus, you need to first set up an API key with the US Census bureau. Don’t worry, it’s easier than it sounds!

You can visit this page to request your API code. It should only take a few minutes.

Once you have your API key, you can use the following code to install it:

census_api_key("your API key goes here", install = TRUE)

By using the install = TRUE argument, you won’t have to run this command every time you restart R.

Hampshire County Median Income

The ACS variable code for median household income is “B19013_001”.

To retrieve a sf object with median household income for census tracts in Hampshire County Massachusetts I can use the following code:

hampshire = get_acs(
  state = "MA",
  county = "Hampshire",
  geography = "tract", 
  variables = "B19013_001",
  # This retrieves a spatial version of the data
  geometry = TRUE,
  year = 2020)

Hampshire County Choropleth

To plot a choropleth, we can use what we learned above about plotting sf objects with ggplot().

ggplot(hampshire) + 
  geom_sf(aes(fill = estimate)) +
  scale_fill_viridis_b(name = "Median\nHousehold\nIncome")

It looks like there’s one little census tract with very high income. We can examine a histogram. I personally prefer the base-R histograms to the ggplot version.

hist(
  hampshire$estimate,
  main = "Histogram of Median Income\nHampshire County MA",
  xlab = "Median Income in Dollars")

MA County Median Income

I can use similar code to make a choropleth of median income at the county level for Massachusetts:

MA = get_acs(
  state = "MA",
  geography = "county", 
  variables = "B19013_001",
  geometry = TRUE,
  year = 2020)

And plotting the choropleth:

ggplot(MA) + 
  geom_sf(aes(fill = estimate)) +
  scale_fill_viridis_b(name = "Median\nHousehold\nIncome")