We’re going to get some more experience with rasters, and we’ll see some of the limitations of the raster data format.
First things first, let’s load the raster package:
require(raster)
If that worked, you’re ready to proceed!
To get started, we’re going to work through the first R example in F&F chapter 2. Work your way through section 2.3.3, beginning on page 28. This will cover several important raster data concepts that will be relevant going forward.
Many functions that work with rasters in R require template rasters.
These are like normal rasters, except they don’t need to contain any data.
Try building an empty template raster with the same dimensions as the toy raster in the text.
You can re-use much of the book’s code; you just don’t need to populate the raster with randomly-generated data.
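A minimal sketch of what that might look like, assuming (as in the rast_templ() defaults further below) an extent of 1 to 6 on both axes:
# an empty 6 by 6 template raster: it needs dimensions and an
# extent, but no data values
templ_6 = raster(ncol = 6, nrow = 6, xmn = 1, xmx = 6, ymn = 1, ymx = 6)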
Next, create another toy raster called rast_6 using the same code you used for the text (F&F) example, but this time use a larger value for lambda, at least 10.0.
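The book example fills the raster with Poisson-distributed random values, so a sketch might look like:
# 6 by 6 toy raster filled with Poisson random values, lambda = 10
rast_6 = raster(ncol = 6, nrow = 6, xmn = 1, xmx = 6, ymn = 1, ymx = 6)
values(rast_6) = rpois(ncell(rast_6), lambda = 10)
plot(rast_6)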
You learned about the aggregate() and disaggregate() functions earlier. We’re going to add another function to our arsenal: resample(). Check out the help entry for resample()
aggregate() and disaggregate() can only create rasters whose dimensions are whole-number multiples (or fractions) of the original raster’s dimensions.
resample() can create new rasters of any dimension. It just needs a template raster with the desired extent and dimension.
Try building a square template raster with 7 rows and 7 columns, using the same extent as rast_6. Your template raster does not need to contain any data, so you can skip the random number generation part.
Now use resample() and your 7 by 7 template to build a new version of your data raster that is 7 by 7 (rather than 6 by 6). Plot both of these rasters and think about why they look different. Was information gained or lost during the transformation from size 6 to size 7?
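Here’s a sketch of one way to set this up (assuming the rast_6 object from above; resample() uses bilinear interpolation by default):
# 7 by 7 template with the same extent as rast_6; no data values needed
templ_7 = raster(ncol = 7, nrow = 7, xmn = 1, xmx = 6, ymn = 1, ymx = 6)
# resample the 6 by 6 data raster onto the 7 by 7 grid
rast_7 = resample(rast_6, templ_7)
par(mfrow = c(1, 2))
plot(rast_6)
plot(rast_7)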
Building template rasters is tedious, so here’s a function to generate them for us:
rast_templ = function(nrows, ncols, xmn = 1, xmx = 6, ymn = 1, ymx = 6)
{
  # build an empty raster with the requested dimensions and extent;
  # the default extent matches our toy rasters
  return(raster(
    ncol = ncols, nrow = nrows,
    xmx = xmx, xmn = xmn,
    ymx = ymx, ymn = ymn))
}
Use resample() with template rasters of various dimensions and plot the results. Include some rasters whose number of columns differs from the number of rows: with our square extent, these will have rectangular (rather than square) cells.
Note that in ArcGIS, we usually work with rasters that have square cells, but raster cells can also be tall or squat rectangles.
Try to build an intuitive sense of what happens when you resample with dimensions that are smaller or larger than the original raster. What happens when you resample with dimensions that are or aren’t whole number multiples of the original?
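A sketch of the kind of experimentation I have in mind, using the rast_templ() helper from above (the dimensions here are arbitrary examples):
par(mfrow = c(2, 2))
plot(rast_6, main = "original: 6 x 6")
plot(resample(rast_6, rast_templ(3, 3)), main = "coarser: 3 x 3")
plot(resample(rast_6, rast_templ(12, 12)), main = "finer, whole-number multiple: 12 x 12")
plot(resample(rast_6, rast_templ(5, 9)), main = "rectangular cells: 5 x 9")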
Now, let’s try to recover our original rasters from some resampled rasters of different dimension.
Experiment with recovering a 6 by 6 raster from transformed versions of our original data raster. Remember you can use text() to plot the cell values on top of a raster.
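For example, here’s a sketch that resamples the 7 by 7 version from earlier back down to 6 by 6 (whether the original cell values survive the round trip is the question to think about):
rast_recovered = resample(rast_7, rast_templ(6, 6))
par(mfrow = c(1, 2))
plot(rast_6, main = "original")
text(rast_6)
plot(rast_recovered, main = "recovered")
text(rast_recovered, digits = 2)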
Read through section 2.3.4.1, following along with the code so that you can recreate the buffer analysis. We will expand on this code later, so be sure you understand this section well.
Note: The authors provided scripts of the code they used in the text as part of the supporting materials.
Feel free to take a look at their code, especially if you run into difficulties.
- I strongly recommend retyping the code from the book yourself, however.
- From my own experience, retyping is a more effective way to understand and remember what the code is doing.
- You can find their corrected version of the Chapter 2 code in Moodle.
Warning: since we converted the author-supplied NLCD data to NetCDF format, we have to use a slightly different syntax to read it.
If you’ve set up your directory structure as instructed, you can use this syntax:
# here() (from the here package) and file.path() build the path
# in a platform-independent way
dat_dir = here("data", "Fletcher_Fortin-2018-Supporting_Files", "data")
nlcd = raster(file.path(dat_dir, "nlcd2011SE.nc"))
In sections 2.3.4.2 and beyond, the authors provide some statistical ways to detect a characteristic scale, but for now we’ll concentrate on a more visual and informal method using scatterplots of forest cover percent at different buffer widths.
I used the following approach in my informal buffer analysis:
- Decide upon an informal definition of characteristic scale based on the correlation coefficient between each buffer distance and the next larger buffer distance.
- Calculate the forest percent at increasing buffer widths: 100 m, 500 m, 1 km, 1.5 km, 2 km, 2.5 km, 3 km, and 3.5 km.
- Create a pairplot, with correlation coefficients, of the forest percent cover.
- Examine the pairplot and correlation coefficients to determine the characteristic scale.
First, I used the following code to build a data frame to use for the analysis:
empty_vec = rep(NA, length = nrow(sites))
cover_data = data.frame(
f100m = empty_vec,
f500m = empty_vec,
f1000m = empty_vec,
f1500m = empty_vec,
f2000m = empty_vec,
f2500m = empty_vec,
f3000m = empty_vec,
f3500m = empty_vec
)
Next, I used the following loop to calculate the forest percentages:
for(i in 1:nrow(sites)) {
  cover_data$f100m[i] = BufferCover(sites, 100, forest, grainarea)
  cover_data$f500m[i] = BufferCover(sites, 500, forest, grainarea)
  cover_data$f1000m[i] = BufferCover(sites, 1000, forest, grainarea)
  cover_data$f1500m[i] = BufferCover(sites, 1500, forest, grainarea)
  cover_data$f2000m[i] = BufferCover(sites, 2000, forest, grainarea)
  cover_data$f2500m[i] = BufferCover(sites, 2500, forest, grainarea)
  cover_data$f3000m[i] = BufferCover(sites, 3000, forest, grainarea)
  cover_data$f3500m[i] = BufferCover(sites, 3500, forest, grainarea)
  print(i)
}
I then created a pairplot using pairs.panels() from the psych package; a sketch of the call follows. Note you may need to install this package if you haven’t used it before.
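A minimal sketch, assuming the cover_data data frame built above:
# install.packages("psych") # run once if you haven't installed it
require(psych)
# pairplot of all eight buffer-width columns, with correlation
# coefficients shown above the diagonal
pairs.panels(cover_data)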
I can read the relevant correlation coefficients just above the diagonal in the plot matrix. Here’s the resulting pairplot:
Based on my analysis, I would conclude that a characteristic scale of forest cover at the skink sampling locations is somewhere between 500 and 1000 meters.
The F&F book data includes a portion of the National Land Cover Database (NLCD). See the code in section 2.3.4.1 for an example of how to load it using raster().
Check to see how many rows, columns, and total cells are in the raster. Now try plotting it using plot() or image(). Take a mental note of how long it takes to plot with each of these methods.
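A quick sketch of the checks (nrow(), ncol(), and ncell() are all raster-package functions):
nrow(nlcd)  # number of rows
ncol(nlcd)  # number of columns
ncell(nlcd) # total number of cells
plot(nlcd)  # compare how long this takes...
image(nlcd) # ...to this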
Find some vector data of the borders of US states from a reputable source. I found a good source here: https://www.arcgis.com/home/item.html?id=f7f805eb65eb4ab787a0a3e1116ca7e5
You could also use the state border data you retrieved from the US Census site.
After you download your data, you can read it using readOGR() in the rgdal package, like you did for the skink sampling point data.
Warning: this function can be very finicky about how you spell your path and file names.
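A sketch of what the call might look like; the folder and layer names here are hypothetical, so substitute the names of the files you actually downloaded:
require(rgdal)
# dsn is the directory containing the shapefile; layer is the
# shapefile's name without the .shp extension (hypothetical names)
states = readOGR(dsn = here("data", "us_states"), layer = "us_states")
plot(states)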
Once you have the state borders loaded, plot them!
Next, re-plot the nlcd data. Try to overlay a plot of the state borders using the add = TRUE argument in plot().
Did it work? If not, can you figure out what happened?
Always remember the importance of projections.
Unlike ArcGIS, R makes no attempt to reproject your data on-the-fly for plotting or other uses.
You always have to manually reproject all of your data sources into the same coordinate system so that they play well together.
This can seem annoying at first because it’s extra work for us, but on the other hand it forces us to be much more intentional about our coordinate systems.
If your NLCD raster did not plot where you expected it to on your state borders (or if you can’t see it at all), you should make sure your state border data is in the same projection as the NLCD data.
If it isn’t, try using spTransform() to make it so.
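A sketch, assuming your border data is in an object called states:
# reproject the state borders into the NLCD raster's coordinate
# system; crs() retrieves the projection information from the raster
states_proj = spTransform(states, crs(nlcd))
plot(nlcd)
plot(states_proj, add = TRUE)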
Please take a look at this semi-technical description of the concept of scope in computer programming: https://press.rebus.community/programmingfundamentals/chapter/scope/
Every language has its own scoping rules. There are spirited, even ugly, debates about best practices for scoping, and which languages have the best scoping scheme.
R tends to lean more heavily on scoping that could be characterized as global. This can be super convenient, but it can also cause unexpected and hard-to-diagnose bugs.
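Here’s a tiny illustration of how global scoping can bite:
n_sites = 10
double_sites = function()
{
  # n_sites is not an argument: R silently finds it in the
  # global environment, so this works... for now
  return(n_sites * 2)
}
double_sites() # returns 20
n_sites = 3
double_sites() # now returns 6, even though neither the function
               # nor its (empty) argument list changed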
Please read this excellent article on variable scoping in R: https://www.r-bloggers.com/dont-run-afoul-of-scoping-rules-in-r/
We’ll talk more about scope in class after you’ve had a chance to puzzle over the concept for a while on your own.
For now, consider the BufferCover function definition provided by the authors. Note: you may need to make your browser window very wide for the function code to look nice on your screen.
#-----------------------------------------#
#Function that puts all the steps together
#requires:
# points: one set of x,y coordinates
# size: the buffer size (radius), in m
# landcover: a binary land-cover map
# grain: the resolution of the map
#-----------------------------------------#
BufferCover <- function(coords, size, landcover, grain){
  bufferarea.i <- pi*size^2/10000 #ha; size must be in m
  coords.i <- SpatialPoints(cbind(coords[i, 1],coords[i, 2])) #create spatial points from coordinates
  buffer.i <- gBuffer(coords.i, width=size) #buffer from rgeos
  crop.i <- crop(landcover, buffer.i) #crop with raster function
  crop.NA <- setValues(crop.i, NA) #empty raster for the rasterization
  buffer.r <- rasterize(buffer.i, crop.NA) #rasterize buffer
  land.buffer <- mask(x=crop.i, mask=buffer.r) #mask by putting NA outside the boundary
  coveramount <- cellStats(land.buffer, 'sum')*grain #calculate area
  percentcover <- 100*(coveramount/bufferarea.i) #convert to %
  return(percentcover)
}
Think about the scope of all the variables the function uses.
Consider both the function arguments and variables that are created inside the function body.
Challenge: try to identify at least one variable which must be in R’s global environment for BufferCover to run correctly.
Compile your answers to the following into a document and submit on Moodle.
Specifically address what happens when you make rasters with coarser and finer grains from the original. You should reference the figures you included in the first question.
What happened when you tried to recover your original 6 by 6 data raster?
We often need to resize or resample raster data. It’s common to have two rasters that cover your area of interest, but that have different grain, or are in different coordinate systems.
Parts of this lab used simulated data, so it might seem a little artificial.
You can make a SpatialPolygons bounding box object from many types of spatial objects using the as and extent functions (consult Dr. Google and your neighbors for guidance). Create a plot that shows US state outlines for the southeastern quarter of the Lower 48 States (you’ll need to choose appropriate plotting limits using xlim and ylim). On top of the state boundaries, plot a rectangle showing the bounding box of the NLCD data in the F&F supporting data sets. Your bounding box rectangle should have a black border and a translucent fill so that you can see the state borders underneath.
Hint: check out the adjustcolor() function for an easy way to make a solid color translucent.
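Here’s a sketch of the key pieces, assuming the states and nlcd objects from earlier; the xlim/ylim values are placeholders you’ll need to tune:
# bounding box of the NLCD data as a SpatialPolygons object
nlcd_bbox = as(extent(nlcd), "SpatialPolygons")
# give the bounding box the NLCD projection, then reproject it to
# match the state border data before plotting
crs(nlcd_bbox) = crs(nlcd)
nlcd_bbox = spTransform(nlcd_bbox, crs(states))
plot(states, xlim = c(-95, -75), ylim = c(24, 40)) # placeholder limits
plot(nlcd_bbox, add = TRUE, border = "black",
     col = adjustcolor("steelblue", alpha.f = 0.3)) # translucent fill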
Look at the code for the BufferCover() function and make sure you can identify the four function arguments.
The BufferCover() function stores its output value in a variable called percentcover. Assume that there is no variable called percentcover in R’s global environment before you call BufferCover(). After you call BufferCover(), will R’s global environment contain a variable called percentcover? Make sure you explain why or why not.
The BufferCover function relies on at least one global variable to run correctly. Explain how this reliance means BufferCover might cause problems if you tried to re-use this function elsewhere.
You defined the land cover types 41 - 43 as ‘forest’ (i.e. ‘habitat’) and all of the other types as ‘nonforest’ (‘nonhabitat’) in your walkthrough of book section 2.3.4.1.
Look up the NLCD codes and try an alternate classification scheme. Repeat the buffer analysis using your scheme and see if you find similar conclusions about a characteristic scale.
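As a sketch of one possible alternate scheme (here treating woody wetlands, NLCD code 90, as habitat along with the forest codes; adapt this to whatever scheme you choose), note that the %in% operator works directly on rasters:
# alternate binary habitat map: forest classes 41-43 plus woody wetlands (90)
forest2 = nlcd %in% c(41, 42, 43, 90)
# re-run the buffer loop with forest2 in place of forest, then
# rebuild the pairplot and compare the characteristic scales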