In this lab you will work through an example of kriging. In the process you'll practice:
- ggplot() and sf
- gstat

You should work through the example from the slides using the CA ozone data. You will then apply the same procedure to the Air Quality Index data (don't worry, it's just a different column in the data frame!)
You’ll be working with the same data set that I used in the lecture slides kriging example (deck 4b).
You’ll need the CA_ozone_2017.GPKG
file and the county borders for California. Links to both files are available in the Data Files section of the Course Materials part of the course GitHub page.
As always, save yourself some headaches: when you first import the data, make sure that your two spatial data sets are in a common projection.
You should use the projection of the CA Counties.
I encourage you to re-type the code from the lecture notes in order to practice, however you can use the reference code below for guidance as needed.
require(rgdal)
ca_cnty = readOGR(here("data", "ca_counties.GPKG"))
## OGR data source with driver: GPKG
## Source: "C:\Users\michaelnelso\git\spatial_data_r\data\ca_counties.GPKG", layer: "ca_counties"
## with 58 features
## It has 17 fields
ca_ozone = spTransform(readOGR(here("data", "CA_ozone_2017.GPKG")), proj4string(ca_cnty))
## OGR data source with driver: GPKG
## Source: "C:\Users\michaelnelso\git\spatial_data_r\data\CA_ozone_2017.GPKG", layer: "ozone"
## with 80 features
## It has 3 fields
# This saves some headaches later:
# It copies all of the coordinate reference info, including comments.
crs(ca_ozone) = crs(ca_cnty)
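If you want to confirm that the two layers really do share a projection before moving on, here's a quick check (just a sketch; identicalCRS() comes from the sp package, which should already be loaded alongside rgdal):

# Should return TRUE now that both layers use the CA counties CRS:
identicalCRS(ca_ozone, ca_cnty)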
First, make sf objects from the sp objects:
ca_sf = st_as_sf(ca_cnty)
oz_sf = st_as_sf(ca_ozone)
Note the use of the color aesthetic. I set it equal to ‘ozone’ so that it would scale the colors to that column.
ggplot() +
geom_sf(data = ca_sf, lwd = 0.1, fill = gray(0.5, 0.5)) +
geom_sf(data = oz_sf, mapping = aes(colour = ozone), cex = 2) +
scale_color_gradientn(colours = heat.colors(10), name = "Ozone") +
theme_minimal() +
ggtitle("Annual ozone levels: 2017")
Things to note (and tinker with) in the ggplot code:
- gray(0.5, 0.5) (see the quick demo below)
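The two arguments to gray() are the gray level and the alpha (opacity); the alpha is what makes the county fill semi-transparent so the ozone points show up on top. A quick console demo (the hex string below is what I'd expect, but confirm on your own machine):

# gray(level, alpha): level = 0.5 gives a mid-gray,
# alpha = 0.5 makes it 50% transparent.
gray(0.5, 0.5)
## expected: "#80808080"

Try a few different alpha values and re-run the map to see the effect.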
# Create a template raster
temp_rast = raster(
ca_cnty, nrow = 200, ncol = 180)
# Use the crs() trick to avoid any potential headaches
crs(temp_rast) = crs(ca_cnty)
# Convert the raster to a SpatialPointsDataFrame
temp_grid_sp = as(temp_rast, "SpatialPoints")
# Crop to the outline of California
temp_grid_sp = crop(temp_grid_sp, ca_cnty)
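If you want to sanity-check the prediction grid before moving on, a quick base-R plot works (just a sketch; nothing here is required for the lab):

# The grid points should fill the outline of California:
plot(temp_grid_sp, pch = ".", col = "gray")
plot(ca_cnty, add = TRUE)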
# Create gstat object
oz_gs = gstat(
formula = ozone ~ 1,
locations = ca_ozone)
# Build empirical variogram:
vgm_emp = variogram(oz_gs)
plot(vgm_emp, pch = 16, cex = 1.2)
# Parameterize initial spherical and exponential variograms.
# These contain the starting parameters for optimizations.
vgm_mod_exp = vgm(
model = "Exp",
nugget = 1e-5,
range = 2e5)
vgm_mod_sph = vgm(
model = "Sph",
nugget = 1e-5,
range = 2e5)
# Use fit.variogram() to optimize the variogram parameters...
vgm_fit_sph = fit.variogram(
vgm_emp, vgm_mod_sph)
vgm_fit_exp = fit.variogram(
vgm_emp, vgm_mod_exp)
# and plot them!
par(mfrow = c(2, 1))
plot(vgm_emp, vgm_fit_exp, main = "Exponential Variogram")
plot(vgm_emp, vgm_fit_sph, main = "Spherical Variogram")
Here’s a way to plot variograms in ggplot. Credit goes to the answer from user aosmith on this Stack Overflow entry.
preds_exp = variogramLine(
vgm_fit_exp,
maxdist = max(vgm_emp$dist),
n = 200)
preds_sph = variogramLine(
vgm_fit_sph,
maxdist = max(vgm_emp$dist),
n = 200)
g_exp = ggplot(vgm_emp, aes(x = dist, y = gamma)) +
geom_point() +
geom_line(data = preds_exp) +
ggtitle("Mike's awesome exponential variogram", "feat. ggplot2")
g_sph = ggplot(vgm_emp, aes(x = dist, y = gamma)) +
geom_point() +
geom_line(data = preds_sph) +
ggtitle("Mike's awesome spherical variogram", "feat. ggplot2")
g_exp
g_sph
# For arranging ggplot objects into a multi-panel figure:
require(cowplot)
plot_grid(g_exp, g_sph, nrow = 2)
Once everything is set up, kriging is easy:
oz_krig_exp = gstat::krige(
ozone ~ 1,
locations = ca_ozone,
newdata = temp_grid_sp,
model = vgm_fit_exp)
oz_krig_sph = gstat::krige(
ozone ~ 1,
locations = ca_ozone,
newdata = temp_grid_sp,
model = vgm_fit_sph)
The ggplot() code for plotting the interpolation is:
NOTE: this is slightly streamlined from the lecture note code
# CA county borders:
g_ca = geom_sf(
data = ca_sf,
lwd = 0.1, fill = "transparent")
# Fill gradients for the interpolated values and the sample variance
# Modify the name and limits for AQI
g_fill_oz_pred = scale_fill_gradientn(
colours = heat.colors(10),
name = "Ozone", limits = c(0, 0.05))
# Modify the name and limits for AQI
g_fill_oz_var = scale_fill_gradientn(
colours = rainbow(10),
name = "Sample\nVariance",
limits = c(0, 5e-5))
# For plotting the raster layers for interpolated values and sample variances:
g_r_pred = geom_raster(mapping = aes(x = x, y = y, fill = var1.pred))
g_r_var = geom_raster(mapping = aes(x = x, y = y, fill = var1.var))
# Ggplot objects
g_k_exp_oz = ggplot(as.data.frame(oz_krig_exp)) +
g_r_pred + g_fill_oz_pred + g_ca +
ggtitle("Kriged Ozone Values", "Exponential Variogram")
g_k_exp_var = ggplot(as.data.frame(oz_krig_exp)) +
g_r_var + g_fill_oz_var + g_ca +
ggtitle("Sample Variance", "Exponential Variogram")
g_k_sph_oz = ggplot(as.data.frame(oz_krig_sph)) +
g_r_pred + g_fill_oz_pred + g_ca +
ggtitle("Kriged Ozone Values", "Spherical Variogram")
g_k_sph_var = ggplot(as.data.frame(oz_krig_sph)) +
g_r_var + g_fill_oz_var + g_ca +
ggtitle("Sample Variance", "Spherical Variogram")
I used plot_grid()
to plot all the interpolations at once:
cowplot::plot_grid(
g_k_sph_oz, g_k_sph_var, g_k_exp_oz, g_k_exp_var,
nrow = 2)
You should follow along with the code for kriging the ozone pollution data. Now, modify it to work with the air quality index (AQI) data.
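The substantive change is the column used in the gstat formula; for example, the gstat object and empirical variogram might look like this (a sketch; the object names are mine, and you'll likely need different starting variogram parameters and color-scale limits for the AQI range):

aqi_gs = gstat(formula = AQI ~ 1, locations = ca_ozone)
vgm_emp_aqi = variogram(aqi_gs)
plot(vgm_emp_aqi, pch = 16, cex = 1.2)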
Here are some plots of my implementation. You may use these to test your work.
Here’s how my basic map came out:
My AQI variograms looked like:
My Kriged interpolation models looked like:
## [using ordinary kriging]
## [using ordinary kriging]
Above we worked with variograms and Kriging. Recall that these are part of a geostatistical approach to spatial analysis.
We can also examine the dataset from a spatial statistics point of view using Moran’s I and correlograms.
We know that Moran’s I is a spatial version of a Pearson correlation coefficient for pairs of points within a distance class, d, and that a correlogram is a plot of Moran’s I against d.
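For reference, Moran's I for a distance class d can be written as

$$ I(d) = \frac{n}{\sum_{i \neq j} w_{ij}} \cdot \frac{\sum_{i \neq j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i}(x_i - \bar{x})^2} $$

where w_ij is 1 when points i and j are separated by a distance in class d (and 0 otherwise), or a row-standardized weight when style = 'W' is used in the spdep code below.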
Before working on the CA Ozone data, you should work your way through the book sections 5.3.1 to 5.3.3. You should use the verified book code for chapter 5 in Moodle as a reference.
The authors provide code to plot the cactus matrix data using base R.
You may also want to practice plotting sf objects with ggplot using the cactus data.
Here is some code I used to convert the matrix data into an sf object for easy plotting:
require(sf)
require(ggplot2)
cactus_matrix = read.csv(file.path(data_dir, "cactus_matrix.csv"))
cactus_coords <- cbind(cactus_matrix$x, cactus_matrix$y)
# note the names of the x- and y-coordinate columns
head(cactus_matrix, 3)
## x y Height
## 1 0 0 35
## 2 0 2 65
## 3 0 4 75
# use the st_as_sf() function to convert a data frame with coordinate columns into an sf object.
cactus_matrix_sf =
st_as_sf(
cactus_matrix,
# provide the names of the x- and y-coordinate columns:
coords = c("x", "y"))
# Note there's no coordinate reference (see the book's comment on this).
# st_as_sf() knew to use the Height column as the data corresponding
# to the coordinates in the x and y columns
head(cactus_matrix_sf)
## Simple feature collection with 6 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 0 ymin: 0 xmax: 0 ymax: 10
## CRS: NA
## Height geometry
## 1 35 POINT (0 0)
## 2 65 POINT (0 2)
## 3 75 POINT (0 4)
## 4 60 POINT (0 6)
## 5 80 POINT (0 8)
## 6 20 POINT (0 10)
Now I can plot using ggplot() and geom_sf(). See if you can re-create a plot similar to mine before you peek at my code.
gg_cactus = ggplot(cactus_matrix_sf) +
geom_sf(aes(colour = Height), cex = 5) +
# I used the topo.colors color gradient:
scale_colour_gradientn(colours = topo.colors(10)) +
ggtitle("Mike's plot of cactus height matrix")
gg_cactus
spdep
For reference, here's my adaptation of the book code to build a correlogram using spdep:
require(spdep)
distmat = as.matrix(dist(cactus_matrix[, 1:2]))
maxdist = (2/3) * max(distmat)
# Set up an empty data frame to hold the spdep correlogram results
correlog.sp <- data.frame(
dist=seq(5, maxdist, by=5),
Morans.i=NA,
Null.lcl=NA,
Null.ucl=NA,
Pvalue=NA)
#inspect (the Moran's I columns should still be all NA at this point)
head(correlog.sp)
## dist Morans.i Null.lcl Null.ucl Pvalue
## 1 5 NA NA NA NA
## 2 10 NA NA NA NA
## 3 15 NA NA NA NA
## 4 20 NA NA NA NA
## 5 25 NA NA NA NA
## 6 30 NA NA NA NA
#then do a for loop to calculate Moran's I for lag distances
for (i in 1:nrow(correlog.sp))
{
d.start <- correlog.sp[i,"dist"]-5
d.end <- correlog.sp[i,"dist"]
neigh <- dnearneigh(
x=cactus_coords,
d1=d.start,
d2=d.end,
longlat=F)
wts <- nb2listw(
neighbours=neigh,
style='W',
zero.policy=T)
mor.i <- moran.mc(
x=cactus_matrix$Height,
listw=wts,
nsim=99,
alternative="greater",
zero.policy=T)
#summarize results from spdep
correlog.sp[i, "dist"] <- (d.end+d.start)/2
#mean dist
correlog.sp[i, "Morans.i"] <- mor.i$statistic
#observed Moran's I
correlog.sp[i, "Null.lcl"] <-
quantile(
mor.i$res,
probs = 0.025,
na.rm = TRUE) #lower null envelope
correlog.sp[i, "Null.ucl"] <-
quantile(
mor.i$res,
probs = 0.975,
na.rm = TRUE) #upper null envelope
correlog.sp[i, "Pvalue"] <- mor.i$p.value #p-value for Moran's I at that distance category
}
#plot
plot(
y=correlog.sp$Morans.i,
x=correlog.sp$dist,
xlab="Lag Distance(m)",
ylab="Moran's I",
ylim=c(-0.3,0.3))
#ylim limits the y-axis to -0.3 to 0.3
abline(h=0) #0 reference
lines(correlog.sp$dist, correlog.sp$Null.lcl,col = "red") #add the null lcl to the plot
lines(correlog.sp$dist, correlog.sp$Null.ucl,col = "red") #add the null ucl to the plot
And a ggplot version of the correlogram:
ggplot(data.frame(correlog.sp)) +
geom_smooth(aes(x = dist, y = Morans.i), se = FALSE) +
geom_point(aes(x = dist, y = Morans.i)) +
geom_ribbon(aes(
x = dist,
ymin = Null.lcl,
ymax = Null.ucl),
fill = adjustcolor("steelblue", 0.2)) +
ggtitle("Mike's spdep correlogram", "cactus vegetation matrix")
A natural question we could ask with the California ozone dataset is whether the Air Quality Index is a good predictor for ozone pollution.
Since SpatialPointsDataFrame objects can be used just like a data.frame in many circumstances, we can build a simple linear model:
fit_1 = lm(ozone ~ AQI, data = ca_ozone)
summary(fit_1)
##
## Call:
## lm(formula = ozone ~ AQI, data = ca_ozone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0092823 -0.0019994 0.0000795 0.0019795 0.0117655
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.492e-02 1.877e-03 7.946 1.20e-11 ***
## AQI 3.969e-04 3.878e-05 10.235 4.46e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.003988 on 78 degrees of freedom
## Multiple R-squared: 0.5732, Adjusted R-squared: 0.5677
## F-statistic: 104.8 on 1 and 78 DF, p-value: 4.461e-16
plot(fit_1, which = 1)
plot(fit_1, which = 2)
There may be some issues with this model, but for the moment we are going to proceed anyway.
Whenever we build a regression model, we need to check whether spatial autocorrelation may be a problem. We might encounter spatial autocorrelation in the predictors, the response, or the model residuals.
In the regression world, we’re often most concerned about spatial autocorrelation in the model residuals.
Using the code you’ve written above for inspiration, create and plot a variogram of the model residuals.
To make things easier, remember that you can extract a vector of model residuals using the residuals()
function. Since a SPDF behaves like a data.frame
, you can easily add a column of model residuals to your ca_ozone
object.
ca_ozone$resids = residuals(fit_1)
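From here, the residual variogram follows the same gstat workflow as the ozone example above; a minimal sketch (the object names are mine, and the starting parameters are guesses you may need to tune):

resid_gs = gstat(formula = resids ~ 1, locations = ca_ozone)
resid_vgm_emp = variogram(resid_gs)
resid_vgm_fit = fit.variogram(resid_vgm_emp, vgm(model = "Sph", nugget = 1e-5, range = 2e5))
plot(resid_vgm_emp, resid_vgm_fit, main = "Residual variogram (spherical)")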
My spherical variogram (fit with gstat) looked like:
The variogram gives us lots of info, but it doesn’t really tell us about the significance of spatial autocorrelation.
To characterize autocorrelation, and significance, let’s use a spatial statistics approach with Moran’s I and correlograms.
The spdep package provides a nice way to calculate neighborhoods and Moran's I statistics.
First you should use the code above as a template to create a neighborhood and test for significant spatial autocorrelation in the ozone residuals.
Here's some hint code to get started:
# Calculate the maximum distance for ozone points in CA
distmat_ca = dist(as.matrix(coordinates(ca_ozone)))
# Since CA is an elongated shape, we'll use 1/2 the max distance
maxdist_ca = 0.5 * max(distmat_ca)
# What's the minimum distance?
# This will be helpful for neighborhood building.
mindist_ca = min(distmat_ca)
# make a sequence of 10 distance classes between the min and max distances:
n_dist_class = 10
ca_nb_dists = seq(mindist_ca, maxdist_ca, length.out = n_dist_class)
# dnearneigh() accepts an sp or sf object as its x argument
ca_nbh_1 = dnearneigh(
x = ca_ozone,
#d1 is minimum distance, d2 is max distance
d1 = ca_nb_dists[1],
d2 = ca_nb_dists[2],
longlat=F)
# Make a weights object
wts_ca = nb2listw(
neighbours=ca_nbh_1,
#W = row-standardized weights
style='W',
zero.policy=T)
#Moran's I test with normal approximation versus Monte Carlo permutation test
#Monte Carlo
mor_mc_ca = moran.mc(
ca_ozone$resids,
listw =wts_ca,
nsim=999,
zero.policy=T)
#normal approximation
mor_norm_ca = moran.test(
ca_ozone$resids,
listw =wts_ca,
randomisation=F,
zero.policy=T)
#inspect
mor_mc_ca
##
## Monte-Carlo simulation of Moran I
##
## data: ca_ozone$resids
## weights: wts_ca
## number of simulations + 1: 1000
##
## statistic = 0.17122, observed rank = 992, p-value = 0.008
## alternative hypothesis: greater
mor_norm_ca
##
## Moran I test under normality
##
## data: ca_ozone$resids
## weights: wts_ca n reduced by no-neighbour observations
##
##
## Moran I statistic standard deviate = 2.3351, p-value = 0.009769
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.171224571 -0.013698630 0.006271484
Do you conclude there’s significant spatial autocorrelation?
We can modify the book code to create a correlogram. It will take a bit of coding effort, so here's a code template you can fill in.
The template shows how to make a correlogram for AQI; you need to modify it to create correlograms for ozone and the model residuals.
# make a sequence of 10 distance classes between the min and max distances:
n_dist_class = 10
ca_nb_dists = seq(mindist_ca, maxdist_ca, length.out = n_dist_class)
# Create data frames for storing output
ca_correlog_aqi =
data.frame(
dist=ca_nb_dists[-n_dist_class],
Morans.i=NA,
Null.lcl=NA,
Null.ucl=NA,
Pvalue=NA)
# then use a for loop to calculate Moran's I for lag distances
for (i in 1:(length(ca_nb_dists) - 1))
{
d.start = ca_nb_dists[i]
d.end = ca_nb_dists[i + 1]
# dnearneigh() accepts an sp or sf object as its x argument
neigh_i = dnearneigh(x = ca_ozone, d1 = d.start, d2 = d.end)
# Make a weights object
wts_i = nb2listw(
neighbours = neigh_i,
#W = row-standardized weights
style='W',zero.policy=T)
mor.i_aqi <- moran.mc(ca_ozone$AQI, listw =wts_i,nsim=999, zero.policy=T)
#summarize results from spdep
ca_correlog_aqi[i, "Morans.i"] = mor.i_aqi$statistic
ca_correlog_aqi[i, "Null.lcl"] = quantile(mor.i_aqi$res, 0.025, na.rm = TRUE)
ca_correlog_aqi[i, "Null.ucl"] = quantile(mor.i_aqi$res, probs = 0.975, na.rm = TRUE)
ca_correlog_aqi[i, "Pvalue"] = mor.i_aqi$p.value
}
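To plot the filled-in data frame, you can reuse the ggplot correlogram pattern from the cactus example above; a sketch:

ggplot(ca_correlog_aqi) +
geom_smooth(aes(x = dist, y = Morans.i), se = FALSE) +
geom_point(aes(x = dist, y = Morans.i)) +
geom_ribbon(
aes(x = dist, ymin = Null.lcl, ymax = Null.ucl),
fill = adjustcolor("steelblue", 0.2)) +
ggtitle("AQI correlogram", "CA ozone monitoring sites")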
Here are my correlograms, using the ggplot method. You may use base plotting or ggplot for yours.
Repeat the Kriging exercise using the AQI data (rather than the ozone as in the example).
Your variogram should use a Matérn model.
heat.colors()
.Modify the supplied code to create correlograms for ozone and the model residuals. You can use either base or ggplot to create your figures. Use the example code for the cactus correlograms as a starting point.
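For the Matérn variogram model, gstat's vgm() accepts model = "Mat" along with a smoothness parameter kappa. Here's a sketch (the starting values are guesses, and vgm_emp_aqi stands in for whatever you named your AQI empirical variogram):

# Matérn variogram model with smoothness parameter kappa:
vgm_mod_mat = vgm(model = "Mat", nugget = 1e-5, range = 2e5, kappa = 0.5)
# fit.variogram() optimizes the sill, range, and nugget; in recent gstat
# versions, fit.kappa = TRUE also searches over a set of kappa values.
vgm_fit_mat = fit.variogram(vgm_emp_aqi, vgm_mod_mat, fit.kappa = TRUE)
plot(vgm_emp_aqi, vgm_fit_mat, main = "Matérn variogram")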