sf
For this exercise, you’ll need the following packages:
sf
First, start a fresh R session so that you have an empty environment. Alternatively, you can run the following code: rm(list = ls())
, although restarting R is the preferred option.
Download the following files and save them to your data
subdirectory:
ca_border.GPKG
ca_counties.GPKG
nv_border.GPKG
nv_counties.GPKG
dvnp.GPKG
Note that these files are in the GeoPackage format. This format has the same functionality as ESRI shapefiles, but it also has a key advantage:
This makes it much easier to keep your data organized. It also makes it easier to change filenames when needed: you only need to change the name of one file!
read_sf
The syntax for reading data with read_sf()
in the sf
package is very similar to that for reading with readOGR()
.
For example, if I want to read the California borders vector data, I would simply type:
ca_border = read_sf(here("data", "ca_border.GPKG"))
Following my code example example, read in the CA county data, the NV county and state outline data, and the Death Valley National Park data (dvnp.GPKG). To follow along with this activity you should use the following variable names for your sf
objects:
ca_cnty
nv_border
nv_cnty
dvnp
We used proj4string()
to get and set the CRS when we worked with sp
objects. Similar functionality is provided by the st_crs()
function in sf
For example, to view the complete CRS for the California county borders object, I can use:
st_crs(ca_cnty)
## Coordinate Reference System:
## User input: NAD83 / California zone 4 (ftUS)
## wkt:
## PROJCRS["NAD83 / California zone 4 (ftUS)",
## BASEGEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]],
## CONVERSION["SPCS83 California zone 4 (US Survey feet)",
## METHOD["Lambert Conic Conformal (2SP)",
## ID["EPSG",9802]],
## PARAMETER["Latitude of false origin",35.3333333333333,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8821]],
## PARAMETER["Longitude of false origin",-119,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8822]],
## PARAMETER["Latitude of 1st standard parallel",37.25,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8823]],
## PARAMETER["Latitude of 2nd standard parallel",36,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8824]],
## PARAMETER["Easting at false origin",6561666.667,
## LENGTHUNIT["US survey foot",0.304800609601219],
## ID["EPSG",8826]],
## PARAMETER["Northing at false origin",1640416.667,
## LENGTHUNIT["US survey foot",0.304800609601219],
## ID["EPSG",8827]]],
## CS[Cartesian,2],
## AXIS["easting (X)",east,
## ORDER[1],
## LENGTHUNIT["US survey foot",0.304800609601219]],
## AXIS["northing (Y)",north,
## ORDER[2],
## LENGTHUNIT["US survey foot",0.304800609601219]],
## USAGE[
## SCOPE["Engineering survey, topographic mapping."],
## AREA["United States (USA) - California - counties Fresno; Inyo; Kings; Monterey; San Benito; Tulare."],
## BBOX[35.78,-122.01,37.58,-115.62]],
## ID["EPSG",2228]]
You should check the CRS of the Nevada counties sf
object.
sf
Reprojecting sf
objects is easy using the st_transform()
function. The syntax is very similar to spTransform()
:
Suppose I wanted to transform the CA counties data into UTM 11 (the coordinate system of the nevada counties.) I could use the following syntax:
ca_cnty = st_transform(ca_cnty, st_crs(nv_cnty))
You should now make sure that all 5 sf
objects are in the UTM 11N projection.
ca_cnty = st_transform(ca_cnty, st_crs(nv_cnty))
ca_border = st_transform(ca_border, st_crs(nv_cnty))
nv_border = st_transform(nv_border, st_crs(nv_cnty))
dvnp = st_transform(dvnp, st_crs(nv_cnty))
sf
objectsYou can plot sf
using base R graphics.
par(mar = c(0, 0, 0, 0))
plot(st_geometry(nv_cnty))
plot(st_geometry(dvnp), add = T, col = adjustcolor("steelblue", 0.3))
NOTE: the main difference is that you have to use the st_geometry()
function inside plot()
If you try to plot the object directly, you get a set of choropleth maps, one for each column in the dataset:
plot(nv_cnty)
## Warning: plotting the first 10 out of 55 attributes; use max.plot = 55 to plot all
This can be useful, but may not be what you want.
If you want to plot only a single attribute, you can subset using square brackets:
plot(nv_cnty["POP1999"])
You can plot several attributes using a vector of column names
plot(nv_cnty[c("POP1990", "POP1999")])
ggplot2()
A better option is to plot sf
objects using ggplot()
:
require(ggplot2)
ggplot(nv_cnty) +
geom_sf()
If I want just the boundaries (no borders) an can tell it to fill with a transparent color:
ggplot(nv_cnty) +
geom_sf(fill = "transparent")
Note how much simpler that was and also that it makes a nicer-looking map.
An alternative formulation is to put the data object inside of geom_sf()
rather than ggplot()
:
ggplot() +
geom_sf(data = nv_cnty, fill = adjustcolor("steelblue", 0.3))
An adavantage of this formulation is that you can plot several sf
objects without having to do a union operation first:
ggplot() +
geom_sf(data = nv_cnty) +
geom_sf(data = ca_cnty)
Try plotting all of these layers in a single plot:
Next, elaborate on your plot with the following modifications: - Nevada counties: use a semi-transparent fill color, and thinner border lines - California counties: use a different semi-transparent fill color and thin border lines
My version looked like this:
ggplot()
ggplot()
makes it easy to color polygons based on a gradient of values. For example, I can make a choropleth of the nevada population in 1999 using the fill aesthetic:
names(nv_cnty)
## [1] "AREA" "PERIMETER" "NV_COUNTY_" "NV_COUNT_1" "NAME" "STATE_NAME" "STATE_FIPS" "CNTY_FIPS"
## [9] "FIPS" "POP1990" "POP1999" "POP90_SQMI" "HOUSEHOLDS" "MALES" "FEMALES" "WHITE"
## [17] "BLACK" "AMERI_ES" "ASIAN_PI" "OTHER" "HISPANIC" "AGE_UNDER5" "AGE_5_17" "AGE_18_29"
## [25] "AGE_30_49" "AGE_50_64" "AGE_65_UP" "NEVERMARRY" "MARRIED" "SEPARATED" "WIDOWED" "DIVORCED"
## [33] "HSEHLD_1_M" "HSEHLD_1_F" "MARHH_CHD" "MARHH_NO_C" "MHH_CHILD" "FHH_CHILD" "HSE_UNITS" "VACANT"
## [41] "OWNER_OCC" "RENTER_OCC" "MEDIAN_VAL" "MEDIANRENT" "UNITS_1DET" "UNITS_1ATT" "UNITS2" "UNITS3_9"
## [49] "UNITS10_49" "UNITS50_UP" "MOBILEHOME" "NO_FARMS87" "AVG_SIZE87" "CROP_ACR87" "AVG_SALE87" "geom"
ggplot() +
geom_sf(data = nv_cnty, mapping = aes(fill = POP1999))
This isn’t very informative because most counties have low populations, except for the county that contains Las Vegas.
I could take the logarithm of the population directly in the fill aesthetic:
ggplot() +
geom_sf(data = nv_cnty, mapping = aes(fill = log(POP1999)))
That gives a nicer looking map, but it’s hard to interpret a log scale.
I can also do the log transformation using scale_fill_gradient()
with the trans
argument.
ggplot() +
geom_sf(data = nv_cnty, mapping = aes(fill = POP1999)) +
scale_fill_gradient(trans = "log")
This method makes the legend more readable (but the breaks aren’t very nice…).
Check out the help entry for scale_fill_gradient()
to see how you can set the break labels.
Here’s a choropleth that I made with some additional modifications:
If you make it through all of the exercises above, here are some ideas to try out:
theme()
function for some fine control over different plot elements.
cex
, size
, fill
, lwd
, lty
.gray()
function