This page contains information and R code to re-create the figures from my talk and workshop.
You may download a pdf of my slides here.
You’ll need the following packages to re-create the figures
sfspDataggplot2palmerpenguinstidycensustidyverseFor these figures, you’ll need to load the ggplot2 and
palmerpenguins packages:
require(ggplot2)
require(palmerpenguins)ggplot(
penguins,
aes(
x = body_mass_g,
y = bill_length_mm)) +
xlab("Body Mass (g)") +
ylab("Bill Length (mm)") +
geom_point()ggplot(
penguins,
aes(
x = body_mass_g,
y = bill_length_mm,
colour = species)) +
xlab("Body Mass (g)") +
ylab("Bill Length (mm)") +
geom_point()ggplot(
na.omit(penguins),
aes(
x = body_mass_g,
y = bill_length_mm,
colour = species,
shape = sex)) +
xlab("Body Mass (g)") +
ylab("Bill Length (mm)") +
# Use the cex argument to make the points larger
geom_point(cex = 2)spDataYou’ll need to load the spData package for the us states
border vector data.
require(spData)This package contains a sf object called
us_states
head(us_states)## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -114.8136 ymin: 24.55868 xmax: -71.78699 ymax: 42.04964
## Geodetic CRS: NAD83
## GEOID NAME REGION AREA total_pop_10 total_pop_15
## 1 01 Alabama South 133709.27 [km^2] 4712651 4830620
## 2 04 Arizona West 295281.25 [km^2] 6246816 6641928
## 3 08 Colorado West 269573.06 [km^2] 4887061 5278906
## 4 09 Connecticut Norteast 12976.59 [km^2] 3545837 3593222
## 5 12 Florida South 151052.01 [km^2] 18511620 19645772
## 6 13 Georgia South 152725.21 [km^2] 9468815 10006693
## geometry
## 1 MULTIPOLYGON (((-88.20006 3...
## 2 MULTIPOLYGON (((-114.7196 3...
## 3 MULTIPOLYGON (((-109.0501 4...
## 4 MULTIPOLYGON (((-73.48731 4...
## 5 MULTIPOLYGON (((-81.81169 2...
## 6 MULTIPOLYGON (((-85.60516 3...
To restrict to CONUS (CONtinental United States), a.k.a the lower
48, you can use the subset() function on the
us_states object.
conus = subset(us_states, !(NAME %in% c("Alaska", "Hawaii")))
head(conus)## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -114.8136 ymin: 24.55868 xmax: -71.78699 ymax: 42.04964
## Geodetic CRS: NAD83
## GEOID NAME REGION AREA total_pop_10 total_pop_15
## 1 01 Alabama South 133709.27 [km^2] 4712651 4830620
## 2 04 Arizona West 295281.25 [km^2] 6246816 6641928
## 3 08 Colorado West 269573.06 [km^2] 4887061 5278906
## 4 09 Connecticut Norteast 12976.59 [km^2] 3545837 3593222
## 5 12 Florida South 151052.01 [km^2] 18511620 19645772
## 6 13 Georgia South 152725.21 [km^2] 9468815 10006693
## geometry
## 1 MULTIPOLYGON (((-88.20006 3...
## 2 MULTIPOLYGON (((-114.7196 3...
## 3 MULTIPOLYGON (((-109.0501 4...
## 4 MULTIPOLYGON (((-73.48731 4...
## 5 MULTIPOLYGON (((-81.81169 2...
## 6 MULTIPOLYGON (((-85.60516 3...
To make things simple, we’ll use only a subset of the data columns:
conus_2 = conus[, c("NAME", "REGION", "total_pop_15")]
names(conus_2) = c("State", "Region", "Population", "geometry")
head(conus_2)Simple map of state borders, in a geographic coordinate system:
ggplot(conus_2) + geom_sf()A nice projection for CONUS is the EPSG:5070. It’s an Albers Equal Area Conic projection (don’t worry if you don’t know what that means).
We can use coord_sf() to easily set the coordinate
system of our map.
If you use coord_sf() you don’t have to reproject your
sf objects, so it’s very convenient!
ggplot(conus_2) + geom_sf() +
coord_sf(crs = sf::st_crs(5070))ggplot(conus_2) +
geom_sf(aes(colour = Region)) +
coord_sf(crs = sf::st_crs(5070))This time we use both the colour and fill
aesthetics.
ggplot(conus_2) +
geom_sf(
aes(
colour = Region,
fill = Region),
# The alpha argument sets the transparency for the fill color
alpha = 0.5) +
coord_sf(crs = sf::st_crs(5070))A basic choropleth using the default fill colors:
ggplot(conus_2) + geom_sf() +
geom_sf(
aes(
fill = Population),
lwd = 0.5) +
coord_sf(crs = sf::st_crs(5070))The viridis color scale c is easier to see.
name argument to set the title of the
legend.ggplot(conus_2) +
geom_sf(
aes(
fill = Population * 1e-6),
lwd = 0.5) +
coord_sf(crs = sf::st_crs(5070)) +
scale_fill_viridis_c(name = "Population\n(millions)")This map is very difficult to read, but it illustrates the simultaneous use of both fill and colour aesthetics.
ggplot(conus_2) + geom_sf() +
geom_sf(
aes(
colour = Region,
fill = Population * 1e-6),
lwd = 0.8) +
coord_sf(crs = sf::st_crs(5070)) +
scale_fill_viridis_c(name = "Population\n(millions)")tidycensusThe American Community Survey (ACS) contains data on hundreds of demographic and socioeconomic variables.
To illustrate the basic functionality of the tidycensus
package we’ll concentrate on one variable: Median Household Income
(ACS variable B19013_001).
tidycensusTo use tidycensus, you need to first set up an API key
with the US Census bureau. Don’t worry, it’s easier than it sounds!
You can visit this page to request your API code. It should only take a few minutes.
Once you have your API key, you can use the following code to install it:
census_api_key("your API key goes here", install = TRUE)By using the install = TRUE argument, you won’t have to
run this command every time you restart R.
The ACS variable code for median household income is “B19013_001”.
To retrieve a sf object with median household income for
census tracts in Hampshire County Massachusetts I can use the following
code:
hampshire = get_acs(
state = "MA",
county = "Hampshire",
geography = "tract",
variables = "B19013_001",
# This retrieves a spatial version of the data
geometry = TRUE,
year = 2020)To plot a choropleth, we can use what we learned above about plotting
sf objects with ggplot().
ggplot(hampshire) +
geom_sf(aes(fill = estimate)) +
scale_fill_viridis_b(name = "Median\nHousehold\nIncome")It looks like there’s one little census tract with very high income. We can examine a histogram. I personally prefer the base-R histograms to the ggplot version.
hist(
hampshire$estimate,
main = "Histogram of Median Income\nHampshire County MA",
xlab = "Median Income in Dollars")I can use similar code to make a choropleth of median income at the county level for Massachusetts:
MA = get_acs(
state = "MA",
geography = "county",
variables = "B19013_001",
geometry = TRUE,
year = 2020)And plotting the choropleth:
ggplot(MA) +
geom_sf(aes(fill = estimate)) +
scale_fill_viridis_b(name = "Median\nHousehold\nIncome")