This page contains information and R code to re-create the figures from my talk and workshop.
You may download a pdf of my slides here.
You’ll need the following packages to re-create the figures
sf
spData
ggplot2
palmerpenguins
tidycensus
tidyverse
For these figures, you’ll need to load the ggplot2
and
palmerpenguins
packages:
require(ggplot2)
require(palmerpenguins)
ggplot(
penguins,aes(
x = body_mass_g,
y = bill_length_mm)) +
xlab("Body Mass (g)") +
ylab("Bill Length (mm)") +
geom_point()
ggplot(
penguins,aes(
x = body_mass_g,
y = bill_length_mm,
colour = species)) +
xlab("Body Mass (g)") +
ylab("Bill Length (mm)") +
geom_point()
ggplot(
na.omit(penguins),
aes(
x = body_mass_g,
y = bill_length_mm,
colour = species,
shape = sex)) +
xlab("Body Mass (g)") +
ylab("Bill Length (mm)") +
# Use the cex argument to make the points larger
geom_point(cex = 2)
spData
You’ll need to load the spData
package for the us states
border vector data.
require(spData)
This package contains a sf
object called
us_states
head(us_states)
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -114.8136 ymin: 24.55868 xmax: -71.78699 ymax: 42.04964
## Geodetic CRS: NAD83
## GEOID NAME REGION AREA total_pop_10 total_pop_15
## 1 01 Alabama South 133709.27 [km^2] 4712651 4830620
## 2 04 Arizona West 295281.25 [km^2] 6246816 6641928
## 3 08 Colorado West 269573.06 [km^2] 4887061 5278906
## 4 09 Connecticut Norteast 12976.59 [km^2] 3545837 3593222
## 5 12 Florida South 151052.01 [km^2] 18511620 19645772
## 6 13 Georgia South 152725.21 [km^2] 9468815 10006693
## geometry
## 1 MULTIPOLYGON (((-88.20006 3...
## 2 MULTIPOLYGON (((-114.7196 3...
## 3 MULTIPOLYGON (((-109.0501 4...
## 4 MULTIPOLYGON (((-73.48731 4...
## 5 MULTIPOLYGON (((-81.81169 2...
## 6 MULTIPOLYGON (((-85.60516 3...
To restrict to CONUS (CONtinental United States), a.k.a the lower
48, you can use the subset()
function on the
us_states
object.
= subset(us_states, !(NAME %in% c("Alaska", "Hawaii")))
conus head(conus)
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -114.8136 ymin: 24.55868 xmax: -71.78699 ymax: 42.04964
## Geodetic CRS: NAD83
## GEOID NAME REGION AREA total_pop_10 total_pop_15
## 1 01 Alabama South 133709.27 [km^2] 4712651 4830620
## 2 04 Arizona West 295281.25 [km^2] 6246816 6641928
## 3 08 Colorado West 269573.06 [km^2] 4887061 5278906
## 4 09 Connecticut Norteast 12976.59 [km^2] 3545837 3593222
## 5 12 Florida South 151052.01 [km^2] 18511620 19645772
## 6 13 Georgia South 152725.21 [km^2] 9468815 10006693
## geometry
## 1 MULTIPOLYGON (((-88.20006 3...
## 2 MULTIPOLYGON (((-114.7196 3...
## 3 MULTIPOLYGON (((-109.0501 4...
## 4 MULTIPOLYGON (((-73.48731 4...
## 5 MULTIPOLYGON (((-81.81169 2...
## 6 MULTIPOLYGON (((-85.60516 3...
To make things simple, we’ll use only a subset of the data columns:
= conus[, c("NAME", "REGION", "total_pop_15")]
conus_2 names(conus_2) = c("State", "Region", "Population", "geometry")
head(conus_2)
Simple map of state borders, in a geographic coordinate system:
ggplot(conus_2) + geom_sf()
A nice projection for CONUS is the EPSG:5070. It’s an Albers Equal Area Conic projection (don’t worry if you don’t know what that means).
We can use coord_sf()
to easily set the coordinate
system of our map.
If you use coord_sf()
you don’t have to reproject your
sf
objects, so it’s very convenient!
ggplot(conus_2) + geom_sf() +
coord_sf(crs = sf::st_crs(5070))
ggplot(conus_2) +
geom_sf(aes(colour = Region)) +
coord_sf(crs = sf::st_crs(5070))
This time we use both the colour
and fill
aesthetics.
ggplot(conus_2) +
geom_sf(
aes(
colour = Region,
fill = Region),
# The alpha argument sets the transparency for the fill color
alpha = 0.5) +
coord_sf(crs = sf::st_crs(5070))
A basic choropleth using the default fill colors:
ggplot(conus_2) + geom_sf() +
geom_sf(
aes(
fill = Population),
lwd = 0.5) +
coord_sf(crs = sf::st_crs(5070))
The viridis color scale c is easier to see.
name
argument to set the title of the
legend.ggplot(conus_2) +
geom_sf(
aes(
fill = Population * 1e-6),
lwd = 0.5) +
coord_sf(crs = sf::st_crs(5070)) +
scale_fill_viridis_c(name = "Population\n(millions)")
This map is very difficult to read, but it illustrates the simultaneous use of both fill and colour aesthetics.
ggplot(conus_2) + geom_sf() +
geom_sf(
aes(
colour = Region,
fill = Population * 1e-6),
lwd = 0.8) +
coord_sf(crs = sf::st_crs(5070)) +
scale_fill_viridis_c(name = "Population\n(millions)")
tidycensus
The American Community Survey (ACS) contains data on hundreds of demographic and socioeconomic variables.
To illustrate the basic functionality of the tidycensus
package we’ll concentrate on one variable: Median Household Income
(ACS variable B19013_001).
tidycensus
To use tidycensus
, you need to first set up an API key
with the US Census bureau. Don’t worry, it’s easier than it sounds!
You can visit this page to request your API code. It should only take a few minutes.
Once you have your API key, you can use the following code to install it:
census_api_key("your API key goes here", install = TRUE)
By using the install = TRUE
argument, you won’t have to
run this command every time you restart R.
The ACS variable code for median household income is “B19013_001”.
To retrieve a sf
object with median household income for
census tracts in Hampshire County Massachusetts I can use the following
code:
= get_acs(
hampshire state = "MA",
county = "Hampshire",
geography = "tract",
variables = "B19013_001",
# This retrieves a spatial version of the data
geometry = TRUE,
year = 2020)
To plot a choropleth, we can use what we learned above about plotting
sf
objects with ggplot()
.
ggplot(hampshire) +
geom_sf(aes(fill = estimate)) +
scale_fill_viridis_b(name = "Median\nHousehold\nIncome")
It looks like there’s one little census tract with very high income. We can examine a histogram. I personally prefer the base-R histograms to the ggplot version.
hist(
$estimate,
hampshiremain = "Histogram of Median Income\nHampshire County MA",
xlab = "Median Income in Dollars")
I can use similar code to make a choropleth of median income at the county level for Massachusetts:
= get_acs(
MA state = "MA",
geography = "county",
variables = "B19013_001",
geometry = TRUE,
year = 2020)
And plotting the choropleth:
ggplot(MA) +
geom_sf(aes(fill = estimate)) +
scale_fill_viridis_b(name = "Median\nHousehold\nIncome")