Concepts

In this lab, you’ll get some guided practice building and diagnosing spatially-aware regression models.

The data

You’ll need the data from the Fletcher and Fortin text for chapter 6.

You’ll also need:

  • Chapter 6 verified code and updated icorrelogram code.
    • Download icorrelogram_updated.R and place it in your Fletcher_Fortin-2018-Supporting_Files directory.
    • Both source scripts are on Moodle.
  • NY District 8 shapefiles (on GitHub, filename is NY8_utm18.zip)

F + F Walkthrough

Work through the book code sections:

  • 6.3.3
  • 6.3.4.1
  • 6.3.4.2
  • 6.3.4.3
  • 6.3.4.4 (optional)
  • 6.3.4.5
  • 6.3.4.6 (optional)

You will need to use the chapter 6 verified code (which you can find in Moodle).

There are several bits of code which will not work as printed in the book - you can find corrected versions in the verified code in Moodle.

  • Watch out for code errors in the data import section, there are several issues with the code used to import the elevation raster. When in doubt, check the verified code.

Questions

Think about these questions as you work through the code:

  • Why was the quantity 1.96 used with the standard error to calculate the upper and lower confidence levels for the elevation2 model?
  • Which of the candidate models would you choose, and why?
  • For the spatially-aware models: which part of the model is accounting for the autocorrelation?
    • Think about the model structure: response, covariates, errors.

Plotting a raster with ggplot()

The procedure for plotting a raster with ggplot() is a little tedious, but the results look nice.

To plot the elevation raster, I can use the following:

require(sp)
require(ggplot2)

# You can use $ to subset a brick with named layers
elev_spdf = rasterToPoints(layers$elev)

# rasterToPoints returns a matrix, but ggplot()
# expects a data.frame
elev_df = data.frame(elev_spdf)
head(elev_df)
##       x      y  elev
## 1 48465 580800 0.000
## 2 48665 580800 0.000
## 3 48865 580800 0.000
## 4 49065 580800 0.000
## 5 49265 580800 0.000
## 6 49465 580800 1.777
# The geom_raster() method does the work by 'tiling' squares.
# It can be a little bit slow.
ggplot(elev_df) +
  geom_raster(aes(x = x, y = y, fill = elev))

It doesn’t look great because ggplot() doesn’t know it is spatial!

Data Frame to sf object

If we first coerce the varied thrush data to a sf object, we can then use it to trick ggplot() into plotting the way we want:

require(sf)

head(point.data)
##   SURVEYID  TRANSECT POINT VATH  EASTING NORTHING  elev        slope   aspect
## 1        1 452511619     3    0 59332.20 173289.0 0.763 0.0004484913 1.925163
## 2        2 452511619     5    0 59142.22 173151.8 0.867 0.0004273647 1.938810
## 3        3 452511619     6    0 58834.36 173185.7 0.949 0.0003995906 1.603648
## 4        4 452511619     9    0 58754.24 172876.0 1.140 0.0004042006 1.449894
## 5        5 452511625     1    0 59037.42 181450.2 1.174 0.0001162163 5.954632
## 6        6 452511625     4    0 59336.71 181389.1 1.209 0.0002622559 6.027802
##        elevs     slopes    aspects
## 1 -1.5389891  1.8025487 -0.8591999
## 2 -1.2475156  1.6372378 -0.8510115
## 3 -1.0176998  1.4199119 -1.0521146
## 4 -0.4823973  1.4559846 -1.1443703
## 5 -0.3871077 -0.7974223  1.5585550
## 6 -0.2890157  0.3453015  1.6024583
point_sf = st_as_sf(point.data, coords = c("EASTING", "NORTHING"))

# We happen to know the coordinate systems are the same
st_crs(point_sf) = crs(layers)

ggplot() +
  geom_raster(data = elev_df, mapping = aes(x = x, y = y, fill = elev)) +
  geom_sf(data = point_sf)

Now, I can change the color gradient to make it easier to see the contrasts

ggplot() +
  geom_raster(data = elev_df, mapping = aes(x = x, y = y, fill = elev)) +
  geom_sf(data = point_sf) +
  scale_fill_viridis_c()

Viridis Colors

The viridis color schemes are designed to be accessible for colorblindness. They also display well in grayscale.

Syracuse Leukemia

Review the lecture slides (Deck 5c) on the example spatial regression of Leukemia incidence in the Syracuse, NY area.

Recreate the models

Re-create the code for the 4 models in the slides:

  • Aspatial lm() model
  • Eigenvector Filter model
  • SAR
  • Lag

Add pollution covariate

Once you have working models, modify them to add an additional predictor: the pollution exposure. It’s in column PEXPOSURE.

Model comparison

Copy the code for the rmse() function from the lecture slides (deck 5c) and run it on each of your Syracuse leukemia models.

  • Which model performed the best?
  • Consider model complexity, residual spatial autocorrelation, and model fit (as measured by RMSE).

Model of District 8

Choose your favorite leukemia model and modify it to include all of the District 8 data (not just the Syracuse City region).

Report

Questions

Modeling Questions

  • Q1 (2 pts.): Briefly define the term standard error and explain why we multiplied this quantity by 1.96 to derive upper and lower confidence envelopes.
  • Q2 (1 pt.): report the RMSE for each of the varied thrush models.
  • Q3 (3 pts.): Given the RMSE, residual spatial autocorrelation, and model complexity, which varied thrush model would you choose and why?
  • Q4 (1 pt.): Print the model summaries for each of your Syracuse leukemia models.
  • Q5 (1 pt.): Report the RMSE for all four models
  • Q6 (3 pts.): Which of the four models would you choose, and why?
  • Q7 (2 pts.): According to your model, do you conclude that there is a relationship between pollution exposure and leukemia incidence?
  • Q8 (2 pts.): Show the model summary for your District 8 leukemia model.
  • Q9 (2 pts.): Do your conclusions change when you model the entire district? Are your results regarding pollution exposure surprising?
  • Q10 (2 pts.): Create a 4-panel plot showing histograms of the model residuals for each of the Syracuse leukemia models.
  • Q11 (2 pts.): Do the residuals seem normally-distributed? What are the implications for our models?

Plotting exercises

  • Q12 (2 pts.): Using the above code as a template, plot the slope layer of the raster brick. Your plot must:
    • Have the correct aspect ratio and be in the right projection (check out the sf plotting trick above).
    • Show the points of thrush observation locations.
    • Use a different color gradient than viridis c.
    • Not use the default color gradient.
  • Q13 (2 pts.): Create the same plot as in the previous question, but this time do not show the bird observation locations.
    • There are at least two ways to do this that require minimal code modification - If you’re making many changes to your code, you’re probably overthinking it!