Vector Data 1

TIGER Data

The US Census Bureau website contains a wealth of data, much of it spatial.

The TIGER/Line Shapefiles collection of vector data covers all sorts of geographies, for example state, county, and census tract boundaries.

Navigate to the TIGER page, explore, and download the 2018 us state border shapefile.

Unzip the files into a subdirectory of your data directory.

Reading Shapefiles with rgdal

In the sp spatial data paradigm, the main package for reading vector data is rgdal. The main function for reading shapefiles and other vector-format spatial data is readOGR().

For example, I can use the following code to read the 2018 US state border data (which I’ve saved in a subdirectory called “data”):

require(rgdal)
states = rgdal::readOGR(
  dsn = here("data", "tl_2018_us_state"))
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\michaelnelso\git\spatial_data_r\data\tl_2018_us_state", layer: "tl_2018_us_state"
## with 56 features
## It has 14 fields
## Integer64 fields read as strings:  ALAND AWATER

To verify, I can plot it:

par(mar = c(0, 0,0, 0))
plot(states)

Note that it looks like the data are in a GCS, and the data set contains not only the 50 states, but several other entities.

It may be useful, if perhaps tedious, to use our data frame subsetting skills to keep only the state borders of interest.

“Shapefiles”

As you may know, “shapefiles” are but one of many file formats in which vector data may be stored.

Shapefiles aren’t actually a single file, but rather a collection of individual files that have similar filenames. You should use your computer’s file browser to look in the directory where you stored your US state border data to verify this.

A few things to note about reading shapefiles with readOGR():

  • To read the state border data, I supplied the (relative) path to the directory that contains the collection of files. - I didn’t include the filename of the file with the .shp extension within the directory.
  • If your shapefiles all have the same base filenames as their containing directory, you only need to supply the path to the directory.
    • It’s a best practice to use this naming scheme, when possible.
    • If your shapefiles names don’t match the directory or if you have multiple sets of shapefiles in the same directory, you might have to use more complicated syntax, including the layer argument.
    • It’s a best practice to keep only one set of shapefile files per directory.

Spatial Polygons DataFrames

A fundamental vector spatial data structure in R is the SpatialPolygonsDataFrame.

This data structure associates vector data entities with rows in a 2-dimensional table.

  • The vector element associated with a row can be a single polygon, or a set of polygons (multipolygon type).
  • The 2-dimensional table behaves like R’s familiar data.frame data structure.
    • An advantage of this behavior is that we can use all of the familiar subsetting tools for data.frame objects.
    • This table is analogous to an Attribute Table in ArcGIS.

For the other vector data types (lines and points) there are corresponding Spatial_____DataFrame data structures.

When you use the readOGR() function to read a polygon shapefile (or vector data in another format), it reads the data into a SpatialPolygonsDataFrame.

Subsetting a SPDF

Later in this lab, you’ll be working with the borders of Yellowstone National Park, which overlaps three states: Idaho, Montana, and Wyoming.

To subset these three states, I can use the subset() function along with a logical test to retrieve only the three states of interest.

First, note that the 2-letter state abbreviations are stored in the STUSPS column (for the TIGER data, you may need to use a different column if you use state border from a different source):

states$STUSPS
##  [1] "WV" "FL" "IL" "MN" "MD" "RI" "ID" "NH" "NC" "VT" "CT" "DE" "NM" "CA" "NJ"
## [16] "WI" "OR" "NE" "PA" "WA" "LA" "GA" "AL" "UT" "OH" "TX" "CO" "SC" "OK" "TN"
## [31] "WY" "HI" "ND" "KY" "VI" "MP" "GU" "ME" "NY" "NV" "AK" "AS" "MI" "AR" "MS"
## [46] "MO" "MT" "KS" "IN" "PR" "SD" "MA" "VA" "DC" "IA" "AZ"

Next, perform the subsetting:

yellowstone_states = 
  subset(states, STUSPS %in% c("ID", "MT", "WY"))
par(mar = c(0, 0, 3, 0))
plot(yellowstone_states, main = "Look at this GCS!")

Yellowstone Vector Data 1: Reading Data

Data Sources

To complete this exercise, you’ll need the following vector data sets:

  • Yellowstone National Park boundary
  • Wyoming county borders
  • State borders for Idaho, Montana, and Wyoming.

I found the Wyoming Geospatial Hub website useful, as well as the United States Geographic Survey and the US Census bureau for locating spatial data sets. If you find useful data sources, you should consider adding them to the course Wiki in Moodle!

Packages

This exercise is a chance to practice basic spatial operations on vector data in R.

The packages you’ll need are: rgdal, raster, and rgeos.

The here package will also come in handy for locating your data files.

  • rgdal contains functions for reading vector data into R
  • raster contains lots of utility functions that work on both raster and vector data.
  • rgeos contains functions to perform common spatial operations like unions, intersections, buffers, etc.

Note that the package sp is automatically loaded when when you load these packages, so you must have it installed as well.

You can use library() or require() to load a package (I prefer require())

Functions

Among the functions you’ll need are:

  • here() to help locate your data files relative to your project directory.
  • readOGR() for reading vector spatial data sets.
  • projection() (from the raster package) and proj4string() (from sp)
  • spTransform() For performing transformations between coordinate systems, a.k.a projecting and reprojecting.

Reading The Yellowstone Data

After you find your vector data, you need to get it into R!

Using a combination of here() and readOGR() I was able to read in the Yellowstone border data as:

require(rgdal)
require(here)
yellowstone = readOGR(
  here("data", "ab67")
)
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\michaelnelso\git\spatial_data_r\data\ab67", layer: "ab67"
## with 1 features
## It has 13 fields
## Integer64 fields read as strings:  AB67_ AB67_ID

Some things to note:

  • I nested my call to here() inside my call to readOGR.
  • The Yellowstone data directory has a terrible name: “ab67”.
  • This name gives us no clue as to what the data files are! It’s not self-documenting.
  • Depending on where you obtained your data, your file/directory name will probably be different.
  • That’s OK! As long as you save it to a variable called yellowstone, you’ll be able to follow along with the walkthrough.

Inspect Yellowstone

Let’s examine the yellowstone object. We’d like to know what type of object it is; we can use the class() function for that:

class(yellowstone)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

It’s our friend the SpatialPolygonsDataFrame.

What’s it’s coordinate system?

  • We have a couple of options for finding out:
# Using proj4string from the sp package
proj4string(yellowstone)

# Using projection from the raster package
projection(yellowstone)
## [1] "+proj=lcc +lat_0=44.25 +lon_0=-109.5 +lat_1=49 +lat_2=45 +x_0=600000 +y_0=0 +datum=NAD83 +units=m +no_defs"
## [1] "+proj=lcc +lat_0=44.25 +lon_0=-109.5 +lat_1=49 +lat_2=45 +x_0=600000 +y_0=0 +datum=NAD83 +units=m +no_defs"

You should always inspect the coordinate system when you load a spatial dataset.

Unlike Arc, R will not reproject on the fly. Spatial data sets must be in a common coordinate system for plotting and spatial operations to work.

Plotting Yellowstone

Now, let’s create a simple plot:

par(mar = c(0, 0, 0, 0))
plot(yellowstone)

To make our plot slightly nicer, we’ll define a custom color to fill the polygon. I’ve taken R’s “steelblue” color and modified it with adjustcolor() to have an alpha value (opacity) of 0.4 (code not shown). Check out the help for adjustcolor(), it’s fairly easy to understand.

par(mar = c(0, 0, 0, 0))
# make sure you have defined the variable yst_col
plot(yellowstone, col = yst_col)

State And County Data

You can use the above code as a template to read your US States and Wyoming counties data into R.

If you use variable names wy_counties for Wyoming counties and yellowstone_states for the Yellowstone state borders, you’ll be able to follow along with my code.

Review the materials from earlier in the lab if you need a hint.

A Common Coordinate System

Now that you’ve got all of your data, and you can import into R, you’ll need to make sure that all of your spatial data objects are in the same coordinate system.

I decided to use the native coordinate system of the Yellowstone borders file.

Once you decide on a coordinate system to use, it can be convenient to do transformations when you first read in your data.

For example, I read my states data using the following code:

states = spTransform(
  readOGR(
    here("data", "tl_2018_us_state")
  ),
  proj4string(yellowstone))
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\michaelnelso\git\spatial_data_r\data\tl_2018_us_state", layer: "tl_2018_us_state"
## with 56 features
## It has 14 fields
## Integer64 fields read as strings:  ALAND AWATER
## Warning in proj4string(yellowstone): CRS object has comment, which is lost in output; in tests, see
## https://cran.r-project.org/web/packages/sp/vignettes/CRS_warnings.html

Notes:

  • spTransform() expects a Spatial object as its first argument and a coordinate reference system (which can be specified in several ways) as the second argument. You should check out the help entry for more info.
  • I could have also used projection(yellowstone).
  • I’ll have to do some subsetting to retrieve the states of interest (Idaho, Wyoming, and Montana).

You should repeat this process with the Wyoming counties borders data.

Plotting States and Yellowstone 1

Now that you’ve got your data into R and all in the same coordinate system, you’re ready to do some plotting and spatial operations!

First, a reality check. Let’s plot the Yellowstone states, then add the Yellowstone borders.

par(mar = c(0, 0, 0, 0))
plot(yellowstone_states)
plot(yellowstone, col = yst_col, add = TRUE)

Not too bad!

Now, suppose you wanted to make the states more colorful.
You can use the col = argument to specify fill colors for polygons. Since there are three of them, your col argument needs to be a vector of three colors, which can be specified in several ways.

Here’s an ugly example I created using col = 2:4 (code not shown):

I’m sure you can find some better colors to use!

Yellowstone Vector Data 2: Spatial Operations

Finally, we’re ready to do some spatial operations on our data!

Functions

Among the functions you’ll use are (in no particular order):

  • gUnion(): union with dissolve
  • gUnaryUnion(): Dissolve polygons within a SPDF
  • union(): union without dissolve
  • extent(): Coordinates of the bounding box of a SPDF
  • erase(): Erase one SPDF from another
  • gIntersection() intersection with dissolve
  • intersect() intersection without dissolve
  • gBuffer() buffer the edges

The way that coordinate systems are specified in Spatial objects in R is in the process of being updated from “Proj4” to “Proj6”. This is related to the general replacement of the older sp universe of spatial analysis in R with the newer sf paradigm.

The rgeos versions of the union and intersection functions are more sensitive to slight differences in how the coordinate systems are specified and therefore you might get warnings or errors when using gIntersection() or gUnion(). We will deal with these errors if or when they come up in class.

Re-Creating the Slide Figures 1

Review the spatial operations slides near the end of slide deck 1.

Experiment with plotting your states, WY counties, and Yellowstone polygon data.

For example, I can plot all of the Wyoming counties with Yellowstone on top using:

par(mar = c(0, 0, 0, 0))
plot(wy_counties)
plot(yellowstone, add = T, col = yst_col)

Challenge: try updating this plot to zoom in on the region surrounding the park. There are numerous ways you might approach this problem, one possible method:

  1. Buffer the park by a desired amount
  2. Calculate the coordinates of the bounding box of the buffered park
  3. Use these coordinates as the xlim and ylim arguments to plot.
# Buffer by 20 kilometers
yst_buf_20k = gBuffer(yellowstone, width = 20000)
# Get the coordinates of the bounding box
yst_buf_20k_ext = extent(yst_buf_20k)
# Show the coordinates
yst_buf_20k_ext
## class      : Extent 
## xmin       : 449724.6 
## xmax       : 593531 
## ymin       : -32703.64 
## ymax       : 116362.2

Here’s how my plot looks with the new extent:

par(mar = c(0, 0, 0, 0))
plot(
  wy_counties,
  xlim = c(yst_buf_20k_ext[1], yst_buf_20k_ext[2]),
  ylim = c(yst_buf_20k_ext[3], yst_buf_20k_ext[4]))
plot(yellowstone, add = T, col = yst_col)

Re-Creating the Slide Figures 2

Experiment with the spatial operations functions listed above and see if you can re-create my lecture slide figures.

Here are a couple of examples you can use as templates to get started:

Dissolving polygons within a SPDF

par(mar = c(0, 0, 0, 0))
plot(gUnaryUnion(yellowstone_states))

Erase without dissolve

par(mar = c(0, 0, 0, 0))
plot(erase(yellowstone_states, yellowstone), col = 2)

Report Questions

Compile your answers to the following into a document and submit on Moodle.

You may compile your answers using a word processing application like Word or Google Docs, but export your final product as a .pdf for submission.

You may also do your work in an R Markdown document. For your submission, you should knit your document into an html file.

Q1: New England States

Using the template code I provided above for subsetting the Yellowstone states, make a new subset that consists of the six New England states: Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont.

  • Q1 (3 pts.): Include a plot of the 6 New England states in your report.

Q2-3: Unions

  • Q2 (2 pts.): Include a map showing a union operation without dissolve.
  • Q3 (2 pts.): Include a map showing the same union operation with the remaining polygons dissolved.

Q4-5: Intersections

  • Q4 (2 pts.): Include a map showing an intersection operation without dissolve.
  • Q5 (2 pts.): Include a map showing the same intersection operation with the remaining polygons dissolved.

Q6: Projections

Choose one of the maps you’ve created for the intersection or union questions.

Use your internet sleuthing skills to locate the proj4string entry for the Massachusetts State Plane projection.

Re-project your Yellowstone states SPDF into the Massachusetts State Plane projection and plot it.

It should look kind of weird:

  • Q6 (2 pts.): Include a map showing an the three Yellowstone states reprojected into the MA state plane projection.