The US Census Bureau website contains a wealth of data, much of it spatial.
The TIGER/Line Shapefiles collection of vector data covers all sorts of geographies, for example state, county, and census tract boundaries.
Navigate to the TIGER page, explore, and download the 2018 us state border shapefile.
Unzip the files into a subdirectory of your data directory.
rgdal
In the sp
spatial data paradigm, the main package for reading vector data is rgdal
. The main function for reading shapefiles and other vector-format spatial data is readOGR()
.
For example, I can use the following code to read the 2018 US state border data (which I’ve saved in a subdirectory called “data”):
require(rgdal)
states = rgdal::readOGR(
dsn = here("data", "tl_2018_us_state"))
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\michaelnelso\git\spatial_data_r\data\tl_2018_us_state", layer: "tl_2018_us_state"
## with 56 features
## It has 14 fields
## Integer64 fields read as strings: ALAND AWATER
To verify, I can plot it:
par(mar = c(0, 0,0, 0))
plot(states)
Note that it looks like the data are in a GCS, and the data set contains not only the 50 states, but several other entities.
It may be useful, if perhaps tedious, to use our data frame subsetting skills to keep only the state borders of interest.
As you may know, “shapefiles” are but one of many file formats in which vector data may be stored.
Shapefiles aren’t actually a single file, but rather a collection of individual files that have similar filenames. You should use your computer’s file browser to look in the directory where you stored your US state border data to verify this.
A few things to note about reading shapefiles with readOGR()
:
layer
argument.A fundamental vector spatial data structure in R is the SpatialPolygonsDataFrame
.
This data structure associates vector data entities with rows in a 2-dimensional table.
data.frame
data structure.
data.frame
objects.For the other vector data types (lines and points) there are corresponding Spatial_____DataFrame data structures.
When you use the readOGR()
function to read a polygon shapefile (or vector data in another format), it reads the data into a SpatialPolygonsDataFrame.
Later in this lab, you’ll be working with the borders of Yellowstone National Park, which overlaps three states: Idaho, Montana, and Wyoming.
To subset these three states, I can use the subset()
function along with a logical test to retrieve only the three states of interest.
First, note that the 2-letter state abbreviations are stored in the STUSPS
column (for the TIGER data, you may need to use a different column if you use state border from a different source):
states$STUSPS
## [1] "WV" "FL" "IL" "MN" "MD" "RI" "ID" "NH" "NC" "VT" "CT" "DE" "NM" "CA" "NJ"
## [16] "WI" "OR" "NE" "PA" "WA" "LA" "GA" "AL" "UT" "OH" "TX" "CO" "SC" "OK" "TN"
## [31] "WY" "HI" "ND" "KY" "VI" "MP" "GU" "ME" "NY" "NV" "AK" "AS" "MI" "AR" "MS"
## [46] "MO" "MT" "KS" "IN" "PR" "SD" "MA" "VA" "DC" "IA" "AZ"
Next, perform the subsetting:
yellowstone_states =
subset(states, STUSPS %in% c("ID", "MT", "WY"))
par(mar = c(0, 0, 3, 0))
plot(yellowstone_states, main = "Look at this GCS!")
To complete this exercise, you’ll need the following vector data sets:
I found the Wyoming Geospatial Hub website useful, as well as the United States Geographic Survey and the US Census bureau for locating spatial data sets. If you find useful data sources, you should consider adding them to the course Wiki in Moodle!
This exercise is a chance to practice basic spatial operations on vector data in R.
The packages you’ll need are: rgdal
, raster
, and rgeos
.
The here
package will also come in handy for locating your data files.
rgdal
contains functions for reading vector data into Rraster
contains lots of utility functions that work on both raster and vector data.rgeos
contains functions to perform common spatial operations like unions, intersections, buffers, etc.Note that the package sp
is automatically loaded when when you load these packages, so you must have it installed as well.
You can use library()
or require()
to load a package (I prefer require()
)
Among the functions you’ll need are:
here()
to help locate your data files relative to your project directory.readOGR()
for reading vector spatial data sets.projection()
(from the raster package) and proj4string()
(from sp
)spTransform()
For performing transformations between coordinate systems, a.k.a projecting and reprojecting.After you find your vector data, you need to get it into R!
Using a combination of here()
and readOGR()
I was able to read in the Yellowstone border data as:
require(rgdal)
require(here)
yellowstone = readOGR(
here("data", "ab67")
)
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\michaelnelso\git\spatial_data_r\data\ab67", layer: "ab67"
## with 1 features
## It has 13 fields
## Integer64 fields read as strings: AB67_ AB67_ID
Some things to note:
here()
inside my call to readOGR
.yellowstone
, you’ll be able to follow along with the walkthrough.Let’s examine the yellowstone
object. We’d like to know what type of object it is; we can use the class()
function for that:
class(yellowstone)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
It’s our friend the SpatialPolygonsDataFrame.
What’s it’s coordinate system?
# Using proj4string from the sp package
proj4string(yellowstone)
# Using projection from the raster package
projection(yellowstone)
## [1] "+proj=lcc +lat_0=44.25 +lon_0=-109.5 +lat_1=49 +lat_2=45 +x_0=600000 +y_0=0 +datum=NAD83 +units=m +no_defs"
## [1] "+proj=lcc +lat_0=44.25 +lon_0=-109.5 +lat_1=49 +lat_2=45 +x_0=600000 +y_0=0 +datum=NAD83 +units=m +no_defs"
You should always inspect the coordinate system when you load a spatial dataset.
Unlike Arc, R will not reproject on the fly. Spatial data sets must be in a common coordinate system for plotting and spatial operations to work.
Now, let’s create a simple plot:
par(mar = c(0, 0, 0, 0))
plot(yellowstone)
To make our plot slightly nicer, we’ll define a custom color to fill the polygon. I’ve taken R’s “steelblue” color and modified it with adjustcolor()
to have an alpha value (opacity) of 0.4 (code not shown). Check out the help for adjustcolor()
, it’s fairly easy to understand.
par(mar = c(0, 0, 0, 0))
# make sure you have defined the variable yst_col
plot(yellowstone, col = yst_col)
You can use the above code as a template to read your US States and Wyoming counties data into R.
If you use variable names wy_counties
for Wyoming counties and yellowstone_states
for the Yellowstone state borders, you’ll be able to follow along with my code.
Review the materials from earlier in the lab if you need a hint.
Now that you’ve got all of your data, and you can import into R, you’ll need to make sure that all of your spatial data objects are in the same coordinate system.
I decided to use the native coordinate system of the Yellowstone borders file.
Once you decide on a coordinate system to use, it can be convenient to do transformations when you first read in your data.
For example, I read my states data using the following code:
states = spTransform(
readOGR(
here("data", "tl_2018_us_state")
),
proj4string(yellowstone))
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\michaelnelso\git\spatial_data_r\data\tl_2018_us_state", layer: "tl_2018_us_state"
## with 56 features
## It has 14 fields
## Integer64 fields read as strings: ALAND AWATER
## Warning in proj4string(yellowstone): CRS object has comment, which is lost in output; in tests, see
## https://cran.r-project.org/web/packages/sp/vignettes/CRS_warnings.html
Notes:
spTransform()
expects a Spatial object as its first argument and a coordinate reference system (which can be specified in several ways) as the second argument. You should check out the help entry for more info.projection(yellowstone)
.You should repeat this process with the Wyoming counties borders data.
Now that you’ve got your data into R and all in the same coordinate system, you’re ready to do some plotting and spatial operations!
First, a reality check. Let’s plot the Yellowstone states, then add the Yellowstone borders.
par(mar = c(0, 0, 0, 0))
plot(yellowstone_states)
plot(yellowstone, col = yst_col, add = TRUE)
Not too bad!
Now, suppose you wanted to make the states more colorful.
You can use the col =
argument to specify fill colors for polygons. Since there are three of them, your col
argument needs to be a vector of three colors, which can be specified in several ways.
Here’s an ugly example I created using col = 2:4
(code not shown):
I’m sure you can find some better colors to use!
Finally, we’re ready to do some spatial operations on our data!
Among the functions you’ll use are (in no particular order):
gUnion()
: union with dissolvegUnaryUnion()
: Dissolve polygons within a SPDFunion()
: union without dissolveextent()
: Coordinates of the bounding box of a SPDFerase()
: Erase one SPDF from anothergIntersection()
intersection with dissolveintersect()
intersection without dissolvegBuffer()
buffer the edgesThe way that coordinate systems are specified in Spatial objects in R is in the process of being updated from “Proj4” to “Proj6”. This is related to the general replacement of the older sp
universe of spatial analysis in R with the newer sf
paradigm.
The rgeos
versions of the union and intersection functions are more sensitive to slight differences in how the coordinate systems are specified and therefore you might get warnings or errors when using gIntersection()
or gUnion()
. We will deal with these errors if or when they come up in class.
Review the spatial operations slides near the end of slide deck 1.
Experiment with plotting your states, WY counties, and Yellowstone polygon data.
For example, I can plot all of the Wyoming counties with Yellowstone on top using:
par(mar = c(0, 0, 0, 0))
plot(wy_counties)
plot(yellowstone, add = T, col = yst_col)
Challenge: try updating this plot to zoom in on the region surrounding the park. There are numerous ways you might approach this problem, one possible method:
xlim
and ylim
arguments to plot.# Buffer by 20 kilometers
yst_buf_20k = gBuffer(yellowstone, width = 20000)
# Get the coordinates of the bounding box
yst_buf_20k_ext = extent(yst_buf_20k)
# Show the coordinates
yst_buf_20k_ext
## class : Extent
## xmin : 449724.6
## xmax : 593531
## ymin : -32703.64
## ymax : 116362.2
Here’s how my plot looks with the new extent:
par(mar = c(0, 0, 0, 0))
plot(
wy_counties,
xlim = c(yst_buf_20k_ext[1], yst_buf_20k_ext[2]),
ylim = c(yst_buf_20k_ext[3], yst_buf_20k_ext[4]))
plot(yellowstone, add = T, col = yst_col)
Experiment with the spatial operations functions listed above and see if you can re-create my lecture slide figures.
Here are a couple of examples you can use as templates to get started:
Dissolving polygons within a SPDF
par(mar = c(0, 0, 0, 0))
plot(gUnaryUnion(yellowstone_states))
Erase without dissolve
par(mar = c(0, 0, 0, 0))
plot(erase(yellowstone_states, yellowstone), col = 2)
Compile your answers to the following into a document and submit on Moodle.
You may compile your answers using a word processing application like Word or Google Docs, but export your final product as a .pdf for submission.
You may also do your work in an R Markdown document. For your submission, you should knit your document into an html file.
Using the template code I provided above for subsetting the Yellowstone states, make a new subset that consists of the six New England states: Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont.
Choose one of the maps you’ve created for the intersection or union questions.
Use your internet sleuthing skills to locate the proj4string entry for the Massachusetts State Plane projection.
Re-project your Yellowstone states SPDF into the Massachusetts State Plane projection and plot it.
It should look kind of weird: