Load packages.

For this exercise, you’ll need the following packages:

  • sf

Download and Read Data

First, start a fresh R session so that you have an empty environment. Alternatively, you can run the following code: rm(list = ls()), although restarting R is the preferred option.

Download the following files and save them to your data subdirectory:

  • ca_border.GPKG
  • ca_counties.GPKG
  • nv_border.GPKG
  • nv_counties.GPKG
  • dvnp.GPKG


Note that these files are in the GeoPackage format. This format has the same functionality as ESRI shapefiles, but it also has a key advantage:

  • It is a single-file format, i.e. there are not multiple files with the same name, but different extensions. Unlike a “shapefile”, all of the data and metadata are stored in a single file.

This makes it much easier to keep your data organized. It also makes it easier to change filenames when needed: you only need to change the name of one file!

Read data with read_sf

The syntax for reading data with read_sf() in the sf package is very similar to that for reading with readOGR().

For example, if I want to read the California borders vector data, I would simply type:

ca_border = read_sf(here("data", "ca_border.GPKG"))

Following my code example example, read in the CA county data, the NV county and state outline data, and the Death Valley National Park data (dvnp.GPKG). To follow along with this activity you should use the following variable names for your sf objects:

  • ca_cnty
  • nv_border
  • nv_cnty
  • dvnp

Viewing the Coordinate Reference System (CRS)

We used proj4string() to get and set the CRS when we worked with sp objects. Similar functionality is provided by the st_crs() function in sf

For example, to view the complete CRS for the California county borders object, I can use:

st_crs(ca_cnty)
## Coordinate Reference System:
##   User input: NAD83 / California zone 4 (ftUS) 
##   wkt:
## PROJCRS["NAD83 / California zone 4 (ftUS)",
##     BASEGEOGCRS["NAD83",
##         DATUM["North American Datum 1983",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4269]],
##     CONVERSION["SPCS83 California zone 4 (US Survey feet)",
##         METHOD["Lambert Conic Conformal (2SP)",
##             ID["EPSG",9802]],
##         PARAMETER["Latitude of false origin",35.3333333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8821]],
##         PARAMETER["Longitude of false origin",-119,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8822]],
##         PARAMETER["Latitude of 1st standard parallel",37.25,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8823]],
##         PARAMETER["Latitude of 2nd standard parallel",36,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8824]],
##         PARAMETER["Easting at false origin",6561666.667,
##             LENGTHUNIT["US survey foot",0.304800609601219],
##             ID["EPSG",8826]],
##         PARAMETER["Northing at false origin",1640416.667,
##             LENGTHUNIT["US survey foot",0.304800609601219],
##             ID["EPSG",8827]]],
##     CS[Cartesian,2],
##         AXIS["easting (X)",east,
##             ORDER[1],
##             LENGTHUNIT["US survey foot",0.304800609601219]],
##         AXIS["northing (Y)",north,
##             ORDER[2],
##             LENGTHUNIT["US survey foot",0.304800609601219]],
##     USAGE[
##         SCOPE["Engineering survey, topographic mapping."],
##         AREA["United States (USA) - California - counties Fresno; Inyo; Kings; Monterey; San Benito; Tulare."],
##         BBOX[35.78,-122.01,37.58,-115.62]],
##     ID["EPSG",2228]]

You should check the CRS of the Nevada counties sf object.

  • Are they the same?

Reprojecting in sf

Reprojecting sf objects is easy using the st_transform() function. The syntax is very similar to spTransform():

Suppose I wanted to transform the CA counties data into UTM 11 (the coordinate system of the nevada counties.) I could use the following syntax:

ca_cnty = st_transform(ca_cnty, st_crs(nv_cnty))

You should now make sure that all 5 sf objects are in the UTM 11N projection.

ca_cnty = st_transform(ca_cnty, st_crs(nv_cnty))
ca_border = st_transform(ca_border, st_crs(nv_cnty))
nv_border = st_transform(nv_border, st_crs(nv_cnty))
dvnp = st_transform(dvnp, st_crs(nv_cnty))

Plotting sf objects

Using base R

You can plot sf using base R graphics.

par(mar = c(0, 0, 0, 0))
plot(st_geometry(nv_cnty))
plot(st_geometry(dvnp), add = T, col = adjustcolor("steelblue", 0.3))

NOTE: the main difference is that you have to use the st_geometry() function inside plot()

If you try to plot the object directly, you get a set of choropleth maps, one for each column in the dataset:

plot(nv_cnty)
## Warning: plotting the first 10 out of 55 attributes; use max.plot = 55 to plot all

This can be useful, but may not be what you want.

If you want to plot only a single attribute, you can subset using square brackets:

plot(nv_cnty["POP1999"])

You can plot several attributes using a vector of column names

plot(nv_cnty[c("POP1990", "POP1999")])

Using ggplot2()

A better option is to plot sf objects using ggplot():

require(ggplot2)
ggplot(nv_cnty) +
  geom_sf()

If I want just the boundaries (no borders) an can tell it to fill with a transparent color:

ggplot(nv_cnty) +
  geom_sf(fill = "transparent")

Note how much simpler that was and also that it makes a nicer-looking map.

An alternative formulation is to put the data object inside of geom_sf() rather than ggplot():

ggplot() +
  geom_sf(data = nv_cnty, fill = adjustcolor("steelblue", 0.3))

An adavantage of this formulation is that you can plot several sf objects without having to do a union operation first:

ggplot() +
  geom_sf(data = nv_cnty) +
  geom_sf(data = ca_cnty) 

Try plotting all of these layers in a single plot:

  • Nevada counties
  • California counties
  • Nevada border with a transparent fill, and thicker lines
    • use the lwd argument
  • California border with a transparent fill and thicker lines

Next, elaborate on your plot with the following modifications: - Nevada counties: use a semi-transparent fill color, and thinner border lines - California counties: use a different semi-transparent fill color and thin border lines

My version looked like this:

Choropleth maps with ggplot()

ggplot() makes it easy to color polygons based on a gradient of values. For example, I can make a choropleth of the nevada population in 1999 using the fill aesthetic:

names(nv_cnty)
##  [1] "AREA"       "PERIMETER"  "NV_COUNTY_" "NV_COUNT_1" "NAME"       "STATE_NAME" "STATE_FIPS" "CNTY_FIPS" 
##  [9] "FIPS"       "POP1990"    "POP1999"    "POP90_SQMI" "HOUSEHOLDS" "MALES"      "FEMALES"    "WHITE"     
## [17] "BLACK"      "AMERI_ES"   "ASIAN_PI"   "OTHER"      "HISPANIC"   "AGE_UNDER5" "AGE_5_17"   "AGE_18_29" 
## [25] "AGE_30_49"  "AGE_50_64"  "AGE_65_UP"  "NEVERMARRY" "MARRIED"    "SEPARATED"  "WIDOWED"    "DIVORCED"  
## [33] "HSEHLD_1_M" "HSEHLD_1_F" "MARHH_CHD"  "MARHH_NO_C" "MHH_CHILD"  "FHH_CHILD"  "HSE_UNITS"  "VACANT"    
## [41] "OWNER_OCC"  "RENTER_OCC" "MEDIAN_VAL" "MEDIANRENT" "UNITS_1DET" "UNITS_1ATT" "UNITS2"     "UNITS3_9"  
## [49] "UNITS10_49" "UNITS50_UP" "MOBILEHOME" "NO_FARMS87" "AVG_SIZE87" "CROP_ACR87" "AVG_SALE87" "geom"
ggplot() +
  geom_sf(data = nv_cnty, mapping = aes(fill = POP1999))

This isn’t very informative because most counties have low populations, except for the county that contains Las Vegas.

I could take the logarithm of the population directly in the fill aesthetic:

ggplot() +
  geom_sf(data = nv_cnty, mapping = aes(fill = log(POP1999)))

That gives a nicer looking map, but it’s hard to interpret a log scale.

I can also do the log transformation using scale_fill_gradient() with the trans argument.

ggplot() +
  geom_sf(data = nv_cnty, mapping = aes(fill = POP1999)) +
  scale_fill_gradient(trans = "log")

This method makes the legend more readable (but the breaks aren’t very nice…).

Check out the help entry for scale_fill_gradient() to see how you can set the break labels.

Here’s a choropleth that I made with some additional modifications:

Things to Try

If you make it through all of the exercises above, here are some ideas to try out:

  • Plot the Death Valley National Park on top of your California/Nevada map. Try making it semitransparent or opaque to see which looks best.
  • Try a different color scale for the choropleth fill
  • Check out the theme() function for some fine control over different plot elements.
  • Change the order in which you add geom objects to your plots to see what happens.
  • Tinker with arguments like cex, size, fill, lwd, lty.
  • Check out the gray() function