In this lab, you’ll get some guided practice building and diagnosing spatially-aware regression models.
You’ll need the data from the Fletcher and Fortin text for chapter 6.
You’ll also need:
icorrelogram_updated.R
and place it in your Fletcher_Fortin-2018-Supporting_Files
directory.NY8_utm18.zip
)Work through the book code sections:
You will need to use the chapter 6 verified code (which you can find in Moodle).
There are several bits of code which will not work as printed in the book - you can find corrected versions in the verified code in Moodle.
Think about these questions as you work through the code:
ggplot()
The procedure for plotting a raster with ggplot()
is a little tedious, but the results look nice.
To plot the elevation raster, I can use the following:
require(sp)
require(ggplot2)
# You can use $ to subset a brick with named layers
elev_spdf = rasterToPoints(layers$elev)
# rasterToPoints returns a matrix, but ggplot()
# expects a data.frame
elev_df = data.frame(elev_spdf)
head(elev_df)
## x y elev
## 1 48465 580800 0.000
## 2 48665 580800 0.000
## 3 48865 580800 0.000
## 4 49065 580800 0.000
## 5 49265 580800 0.000
## 6 49465 580800 1.777
# The geom_raster() method does the work by 'tiling' squares.
# It can be a little bit slow.
ggplot(elev_df) +
geom_raster(aes(x = x, y = y, fill = elev))
It doesn’t look great because ggplot()
doesn’t know it is spatial!
sf
objectIf we first coerce the varied thrush data to a sf object, we can then use it to trick ggplot()
into plotting the way we want:
require(sf)
head(point.data)
## SURVEYID TRANSECT POINT VATH EASTING NORTHING elev slope aspect
## 1 1 452511619 3 0 59332.20 173289.0 0.763 0.0004484913 1.925163
## 2 2 452511619 5 0 59142.22 173151.8 0.867 0.0004273647 1.938810
## 3 3 452511619 6 0 58834.36 173185.7 0.949 0.0003995906 1.603648
## 4 4 452511619 9 0 58754.24 172876.0 1.140 0.0004042006 1.449894
## 5 5 452511625 1 0 59037.42 181450.2 1.174 0.0001162163 5.954632
## 6 6 452511625 4 0 59336.71 181389.1 1.209 0.0002622559 6.027802
## elevs slopes aspects
## 1 -1.5389891 1.8025487 -0.8591999
## 2 -1.2475156 1.6372378 -0.8510115
## 3 -1.0176998 1.4199119 -1.0521146
## 4 -0.4823973 1.4559846 -1.1443703
## 5 -0.3871077 -0.7974223 1.5585550
## 6 -0.2890157 0.3453015 1.6024583
point_sf = st_as_sf(point.data, coords = c("EASTING", "NORTHING"))
# We happen to know the coordinate systems are the same
st_crs(point_sf) = crs(layers)
ggplot() +
geom_raster(data = elev_df, mapping = aes(x = x, y = y, fill = elev)) +
geom_sf(data = point_sf)
Now, I can change the color gradient to make it easier to see the contrasts
ggplot() +
geom_raster(data = elev_df, mapping = aes(x = x, y = y, fill = elev)) +
geom_sf(data = point_sf) +
scale_fill_viridis_c()
The viridis color schemes are designed to be accessible for colorblindness. They also display well in grayscale.
Review the lecture slides (Deck 5c) on the example spatial regression of Leukemia incidence in the Syracuse, NY area.
Re-create the code for the 4 models in the slides:
Once you have working models, modify them to add an additional predictor: the pollution exposure. It’s in column PEXPOSURE
.
Copy the code for the rmse() function from the lecture slides (deck 5c) and run it on each of your Syracuse leukemia models.
Choose your favorite leukemia model and modify it to include all of the District 8 data (not just the Syracuse City region).