For this in-class exercise, we’ll practice working with raster data again. Specifically, we’ll learn:
You will need the following datasets, which you can download from the course GitHub site:
temperature.zip
A raster of average annual temperatures for the Contiguous United States (CONUS)wy_countes.zip
A vector data set of Wyoming county borders.Download and unzip them to subdirectories of your data
folder.
temperature = raster(here::here("data", "temperature", "temperature.tif"))
wy_cnty = spTransform(
readOGR(here("data", "wy_counties")),
projection(temperature))
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\michaelnelso\git\spatial_data_r\data\wy_counties", layer: "DOR_-_Counties"
## with 23 features
## It has 8 fields
projection(wy_cnty)
## [1] "+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs"
As usual, once you´ve loaded your data, you should plot it as a quick reality check to verify projections etc.
plot(temperature)
plot(wy_cnty, add = T)
For convenience, you should crop the temperature raster to the extent of the Wyoming counties dataset.
temp_wy = crop(temperature, wy_cnty)
par(mar = c(0.1, 0.1, 0.1, 1))
plot(temp_wy, axes = F); plot(wy_cnty, add = T)
extract()
functionYou can use the function extract()
to retrieve the values of raster cells that fall within the boundaries of polygons.
We’ll most often use extract()
in conjunction with one raster
and one Spatial*DataFrame
object. It also works with sf
vector data objects. The raster
object will be the first argument, and the Spatial*DataFrame
object the second.
You have several options for the output type of the extract()
function:
mean()
or sd()
) the function will return a list whose elements are vectors of the raw cell values. The order of the list elements will be the same as the order in which the features appear in the Spatial*DataFrame
object.matrix
of values: if you specify an aggregate function, extract()
can output a matrix
vector containing the values of the statistic correspondingsp=TRUE
argument along with an aggregate function to return a version of the Spatial*DataFrame
with an extra column containing the values of the aggreagate statistic.We’ll examine examples of each type of output.
matrix
outputLet’s start with the simplest output: a matrix
with a single column of aggregate statistics.
I’ll calculate the mean of the cells in the temperature raster for each of the Wyoming counties:
wy_counties_temp_mean = raster::extract(temp_wy, wy_cnty, fun = mean)
class(wy_counties_temp_mean)
## [1] "matrix" "array"
Spatial*DataFrame
outputNext, let’s do the same calculation, but append the vector to a column of the original SpatialPolygonsDataFrame
:
wy_counties_temp_spdf = raster::extract(
temp_wy, wy_cnty, fun = mean, sp = TRUE)
class(wy_counties_temp_spdf)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
names(wy_counties_temp_spdf)
## [1] "OBJECTID" "Shape_Leng" "Shape_Area" "COUNTYNAME"
## [5] "YEARCOLLEC" "TAXYEAR" "TYPENAME" "RuleID"
## [9] "temperature"
Note the new data column called temperature
.
You may have noticed that extract()
takes a little time to run. On larger datasets it may take minutes or longer to run.
If you plan to calculate several aggregate statistics, you can speed up your computation time by first extracting the raw values of the raster cells as a list, then use the sapply()
function to quickly calculate the aggregate statistics.
wy_counties_temp_list = raster::extract(
temp_wy, wy_cnty)
sapply(wy_counties_temp_list, mean)
## [1] 6.736080 7.734480 6.217723 5.408657 4.857212 5.281912 8.495762
## [8] 8.077408 6.294061 4.745921 3.028894 3.532247 7.210908 8.021009
## [15] 6.073369 6.377311 7.428445 6.952354 4.832284 6.958162 7.500885
## [22] 1.643057 1.800145
Note that if your raster has missing data in any cells, you’ll need to specify how the aggregate statistic function should handle them.
If you want to ignore them, just supply sapply()
with the na.rm = TRUE
argument (it will pass on the na.rm
argument to the aggregating function via ...
).
sapply(wy_counties_temp_list, mean, na.rm = T)
## [1] 6.736080 7.734480 6.217723 5.408657 4.857212 5.281912 8.495762
## [8] 8.077408 6.294061 4.745921 3.028894 3.532247 7.210908 8.021009
## [15] 6.073369 6.377311 7.428445 6.952354 4.832284 6.958162 7.500885
## [22] 1.643057 1.800145
Alternatively, you could write a custom function that deals with NA values in a user-defined way.
We can use the class()
and str()
functions to examine the output object to verify that it is a list:
class(wy_counties_temp_list)
## [1] "list"
str(wy_counties_temp_list, 1)
## List of 23
## $ : num [1:40637] 5.95 5.99 5.7 6.14 6.12 ...
## $ : num [1:19389] 5.95 5.91 5.96 5.96 6.01 ...
## $ : num [1:20350] 1 1.34 1.37 1.31 1.28 ...
## $ : num [1:76077] 2.68 2.63 2.59 2.57 2.58 ...
## $ : num [1:31147] 3.43 3.46 3.51 3.36 3.36 ...
## $ : num [1:57872] 6.8 6.85 6.83 6.83 6.86 ...
## $ : num [1:16456] 6.9 6.87 6.87 6.9 6.9 ...
## $ : num [1:15526] 8.05 8.01 7.97 7.91 7.91 ...
## $ : num [1:25162] 7.58 7.59 7.56 7.56 7.54 ...
## $ : num [1:70340] -2 -1.49 -1.67 -1.61 -0.27 ...
## $ : num [1:55499] -0.06 0.06 1.08 1.31 1.63 ...
## $ : num [1:30269] 2.76 3.01 2.92 1.69 1.64 ...
## $ : num [1:32265] 7.64 7.64 7.61 7.65 7.7 ...
## $ : num [1:19983] 8.34 8.38 8.38 8.46 8.5 ...
## $ : num [1:15538] 6.26 6.33 6.39 6.41 6.28 ...
## $ : num [1:32649] 7.04 7.13 7.28 7.28 7.28 ...
## $ : num [1:37924] 7.77 7.74 7.76 7.91 7.91 ...
## $ : num [1:22854] 7.13 7.11 7.06 6.94 6.82 ...
## $ : num [1:14860] 4.62 4.75 4.73 4.87 4.76 ...
## $ : num [1:17546] 7.26 7.26 7.25 7.25 7.25 ...
## $ : num [1:18569] 6.51 6.36 6.39 6.3 6.13 ...
## $ : num [1:32910] 2.15 2.15 2.15 2.14 2.14 ...
## $ : num [1:37079] 0.6 0.56 0.5 0.47 0.44 ...
It’s easy to append columns to our original SpatialPolygonsDataFrame
with the output from sapply()
:
# Mean of annual temperature.
wy_cnty$mean_temp = sapply(wy_counties_temp_list, mean, na.rm = TRUE)
# Standard deviation of annual temperature:
wy_cnty$sd_temp = sapply(wy_counties_temp_list, sd, na.rm = TRUE)
spplot()
FunctionWhen we calculate aggregate statistics for polygons, it’s often desirable to create a choropleth map.
The spplot()
is an easy way to create such a map.
Use the zcol
argument to specify which column the choropleth should be based upon:
spplot(
wy_cnty, zcol = "mean_temp")
spplot(wy_cnty, zcol = "sd_temp")
Note that spplot()
does not use base-R plotting methods, so our usualy techniques for multi-panel figures won’t work.
You can check the help entry for more details, and some suggestions for plot customizations.
Let’s practice with some data on Iowa counties.
Navigate to the site for PRISM climate data:
https://prism.oregonstate.edu/
and locate the Normals data page.
Download the 30 year normals for precipitation and average temperature. You should download the .bil versions of the data.
You may choose to download either the annual average rasters, or the average for a specific month.
When you read the raster files, note the file extension.
Navigate to the Iowa Spatial Hub
https://open-iowa.opendata.arcgis.com/
and find the Iowa counties shapefile dataset. Save it to your data subdirectory.
Note that the name of the unzipped directory and the shapefiles don’t match. This means you will have to use slightly more complex syntax to read the files into R.
Specifically you’ll need to use the following arguments to readOGR()
dsn = here("data", "Iowa_County_Boundaries-shp")
layer = "IowaCounties"
You should probably crop your rasters to the extent of Iowa, perhaps you may even want to buffer the Iowa borders.
Make sure you check the units of the coordinate system. The units will inform the amount by which you buffer
Once you’re done, you can plot your data:
Now you can use extract()
to calculate aggregate statistics for the Iowa counties!
Try calculating mean temperature and precipitation values. You can plot histograms as a self-check:
Now try plotting a scatterplot of mean annual temperature and precipitation:
plot(
mean_prec ~ mean_temp, data = ia_cnty@data,
main = "IA Counties\n Mean Annual Precipitation and Temperature",
xlab = "Mean annual temperature", ylab = "Mean annual precipitation"
)