Raster exercises

We’re going to get some more experience with rasters, and we’ll see some of the limitations of the raster data format.

Load raster

First things first, let’s load the raster package:

require(raster)

If that worked, you’re ready to proceed!

Simulated raster example

To get started, we’re going to work through the first R example in F&F chapter 2. Work your way through section 2.3.3, beginning on page 28. This will cover several important raster data concepts that will be relevant going forward.

Template rasters

Many functions that work with rasters in R require template rasters.

These are like normal rasters, except they don’t need to contain any data.

Try building an empty template raster of the same dimension as the toy raster in the text.

You can re-use much of the code. You don’t need to populate the raster with randomly-generated data.
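
For example, here’s a minimal sketch (assuming the 6 by 6 toy raster from the text, with x and y extents running from 1 to 6):

# An empty 6 by 6 template raster: same dimensions and extent as the
# toy raster, but no cell values
rast_template = raster(
  ncol = 6, nrow = 6,
  xmn = 1, xmx = 6,
  ymn = 1, ymx = 6)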

Data raster

Next, create another toy raster called rast_6 using the same code as the text (F&F) example, but this time use a larger value for lambda, at least 10.0.
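
A sketch of what this might look like (I’m assuming the rpois() approach from the text; lambda = 10 is just one acceptable value):

rast_6 = raster(ncol = 6, nrow = 6, xmn = 1, xmx = 6, ymn = 1, ymx = 6)

# Fill the cells with Poisson-distributed random counts, as in the text,
# but with a larger lambda
values(rast_6) = rpois(ncell(rast_6), lambda = 10)
plot(rast_6)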

Changing the raster grain

You learned about the aggregate() and disaggregate() functions earlier. We’re going to add another function to our arsenal: resample(). Check out the help entry for resample().

aggregate() and disaggregate() can only create rasters whose dimensions are whole-number multiples or divisors of the original raster’s dimensions.

resample() can create new rasters of any dimension. It just needs a template raster with the desired extent and dimension.

Try building a template square raster with 7 rows and 7 columns, using the same extent as rast_6. Your template raster does not need to contain any data, so you can skip the random number generation part.

Now use resample() and your 7 by 7 template to build a new version of your data raster that is 7 by 7 (rather than 6 by 6). Plot both of these rasters and think about why they look different. Was information gained or lost during the transformation from size 6 to size 7?

Building template rasters is tedious, so here’s a function to generate them for us:

# Build an empty template raster with the given numbers of rows and columns.
# The default extent matches the 6 by 6 toy raster from the text.
rast_templ = function(nrows, ncols, xmn = 1, xmx = 6, ymn = 1, ymx = 6)
{
  return(raster(
    ncol = ncols, nrow = nrows, 
    xmx = xmx, xmn = xmn, 
    ymx = ymx, ymn = ymn))
}
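
For example, you might use it for the 7 by 7 resampling like this (a sketch; method = "bilinear" is resample()’s default interpolation):

# 7 by 7 template with the same (default) extent as rast_6
templ_7 = rast_templ(nrows = 7, ncols = 7)

# Resample the 6 by 6 data raster onto the 7 by 7 grid
rast_7 = resample(rast_6, templ_7, method = "bilinear")

plot(rast_6)
plot(rast_7)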

Experiment with transformations

Use resample() with template rasters of various dimensions and plot the results. Include some templates whose number of columns differs from the number of rows; with a square extent, these produce rasters with rectangular cells.

Note that in ArcGIS, we usually work with rasters that have square cells, but raster cells can also be tall or squat rectangles.

Try to build an intuitive sense of what happens when you resample with dimensions that are smaller or larger than the original raster. What happens when you resample with dimensions that are or aren’t whole number multiples of the original?

Recovering our original data raster

Now, let’s try to recover our original rasters from some resampled rasters of different dimension.

Experiment with recovering a 6 by 6 raster from transformed versions of our original data raster. Remember you can use text() to plot the cell values on top of a raster.
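
One possible starting point (a sketch, assuming the rast_templ() helper and the 7 by 7 raster from above):

# Resample the 7 by 7 raster back onto a 6 by 6 grid
rast_recovered = resample(rast_7, rast_templ(6, 6))

# Plot the recovered raster with its cell values overlaid
plot(rast_recovered)
text(rast_recovered, digits = 1)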

F&F Chapter 2 R Examples

Read through section 2.3.4.1, following along with the code so that you can recreate the buffer analysis. We will expand on this code later, so be sure you understand this section well.

Note: The authors provided scripts of the code they used in the text as part of the supporting materials.
Feel free to take a look at their code, especially if you run into difficulties.

  • I strongly recommend retyping the code from the book yourself, however.
  • From my own experience, retyping is a more effective way to understand and remember what the code is doing.
  • You can find their corrected version of the Chapter 2 code in Moodle.

Warning: since we converted the author-supplied NLCD data to NetCDF format, we have to use a slightly different syntax to read the NLCD data.
If you’ve set up your directory structure as instructed, you can use this syntax:

library(here)

dat_dir = here("data", "Fletcher_Fortin-2018-Supporting_Files", "data")
nlcd = raster(file.path(dat_dir, "nlcd2011SE.nc"))

Determining a ‘characteristic scale’

In sections 2.3.4.2 and beyond, the authors provide some statistical ways to detect a characteristic scale, but for now we’ll concentrate on a more informal, visual method using scatterplots of forest cover percent at different buffer widths.

I used the following approach in my informal buffer analysis:

  1. Decided on an informal definition of characteristic scale based on the correlation coefficient between a buffer distance and the next larger buffer distance:

    • “The smallest distance that has a correlation coefficient of 0.9 between itself and the next greater distance step of 500 m.”

  2. Calculated the forest percent at increasing buffer widths: 100 m, 500 m, 1 km, 1.5 km, 2 km, 2.5 km, 3 km, and 3.5 km.

  3. Created a pairplot, with correlation coefficients, of the forest percent cover.

  4. Examined the pairplot and correlation coefficients to determine the characteristic scale.

Characteristic Scale Loop

First, I used the following code to build a data frame to use for the analysis:

empty_vec = rep(NA, length.out = nrow(sites))

cover_data = data.frame(
  f100m = empty_vec,
  f500m = empty_vec,
  f1000m = empty_vec,
  f1500m = empty_vec,
  f2000m = empty_vec,
  f2500m = empty_vec,
  f3000m = empty_vec,
  f3500m = empty_vec
)

Next, I used the following loop to calculate the forest percentages:

for(i in 1:nrow(sites)) {
  cover_data$f100m[i]  = BufferCover(sites, 100, forest, grainarea)
  cover_data$f500m[i]  = BufferCover(sites, 500, forest, grainarea)
  cover_data$f1000m[i] = BufferCover(sites, 1000, forest, grainarea)
  cover_data$f1500m[i] = BufferCover(sites, 1500, forest, grainarea)
  cover_data$f2000m[i] = BufferCover(sites, 2000, forest, grainarea)
  cover_data$f2500m[i] = BufferCover(sites, 2500, forest, grainarea)
  cover_data$f3000m[i] = BufferCover(sites, 3000, forest, grainarea)
  cover_data$f3500m[i] = BufferCover(sites, 3500, forest, grainarea)
  print(i)
}

I then created a pairplot using pairs.panels() from the psych package. Note: you may need to install this package if you haven’t used it before.
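
The call might look something like this (a sketch; I may have used different options):

library(psych)

# Scatterplot matrix with correlation coefficients in the upper panels
pairs.panels(cover_data)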

I can read the relevant correlation coefficients just above the diagonal in the resulting plot matrix.

Based on my analysis, I would conclude that a characteristic scale of forest cover at the skink sampling locations is somewhere between 500 and 1000 meters.

Plotting spatial data exercise

The F&F book data includes a portion of the National Land Cover Database (NLCD). See the code in section 2.3.4.1 for an example of how to load it using raster().

Check to see how many rows, columns, and total cells are in the raster. Now try plotting it using plot() or image(). Take a mental note of how long it takes to plot with each of these methods.
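
For example (a sketch, assuming the nlcd raster loaded earlier):

# Raster dimensions
nrow(nlcd)
ncol(nlcd)
ncell(nlcd)

# Compare how long each plotting method takes
plot(nlcd)
image(nlcd)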

Find some vector data of the borders of US states from a reputable source. I found a good source here: https://www.arcgis.com/home/item.html?id=f7f805eb65eb4ab787a0a3e1116ca7e5

You could also use the state border data you retrieved from the US Census site.

After you download your data, you can read it using readOGR() in the rgdal package, like you did for the skink sampling point data.

Warning: this function can be very finicky about how you spell your path and file names.

Once you have the state borders loaded, plot them!
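
A sketch of the read-and-plot step (the folder and layer names here are hypothetical; substitute the ones from your download):

library(rgdal)

# dsn is the folder containing the shapefile; layer is the file name
# without the .shp extension
states = readOGR(dsn = "data/us_states", layer = "us_states")
plot(states)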

Next, re-plot the nlcd data. Try to overlay a plot of the state borders using the add = TRUE argument in plot().

Did it work? If not, can you figure out what happened?

Click to show/hide hint

Always remember the importance of projections.

Unlike ArcGIS, R makes no attempt to reproject your data on-the-fly for plotting or other uses.

You always have to manually reproject all of your data sources into the same coordinate system so that they play well together.

This can seem annoying at first because it’s extra work for us, but on the other hand it forces us to be much more intentional about our coordinate systems.

If your NLCD raster did not plot where you expected it to on your state borders (or if you can’t see it at all), you should make sure your state border data is in the same projection as the NLCD data.

If it isn’t, try using spTransform() to make it so.
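
A minimal sketch of the fix (assuming your border data is in an object called states):

# Reproject the state borders into the NLCD raster's coordinate system
states_proj = spTransform(states, CRS(projection(nlcd)))

plot(nlcd)
plot(states_proj, add = TRUE)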

Programming Exercise: Scope

Definitions and Rules in R

Please take a look at this semi-technical description of the concept of scope in computer programming: https://press.rebus.community/programmingfundamentals/chapter/scope/

Every language has its own scoping rules. There are spirited, even ugly, debates about best practices for scoping, and which languages have the best scoping scheme.

Compared to many languages, R leans heavily on scoping that could be characterized as global: a function can silently read variables from the global environment. This can be super convenient, but it can also cause unexpected and hard-to-diagnose bugs.

Please read this excellent article on variable scoping in R: https://www.r-bloggers.com/dont-run-afoul-of-scoping-rules-in-r/

We’ll talk more about scope in class after you’ve had a chance to puzzle over the concept for a while on your own.
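
As a warm-up, here’s a small self-contained example (not from the book) showing both directions of this behavior:

y = 10

scope_demo = function(x)
{
  # y is neither an argument nor a local variable, so R looks it up in
  # the enclosing environment: here, the global environment
  return(x + y)
}

scope_demo(1)  # returns 11

make_local = function()
{
  z = 5  # z exists only while the function runs
  return(z)
}

make_local()
exists("z")  # FALSE: assignments inside a function don't create globals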

For now, consider the BufferCover function definition provided by the authors. Note: you may need to make your browser window very wide for the function code to look nice on your screen.

#-----------------------------------------#
#Function that puts all the steps together
#requires:
#  points: one set of x,y coordinates
#  size: the buffer size (radius), in m
#  landcover: a binary land-cover map
#  grain: the resolution of the map
#-----------------------------------------#

BufferCover <- function(coords, size, landcover, grain){
  
  bufferarea.i <- pi*size^2/10000                             #ha; size must be in m
  coords.i <- SpatialPoints(cbind(coords[i, 1],coords[i, 2])) #create spatial points from coordinates
  buffer.i <- gBuffer(coords.i, width=size)                   #buffer from rgeos
  crop.i <- crop(landcover, buffer.i)                         #crop with raster function
  crop.NA <- setValues(crop.i, NA)                            #empty raster for the rasterization
  buffer.r <- rasterize(buffer.i, crop.NA)                    #rasterize buffer
  land.buffer <- mask(x=crop.i, mask=buffer.r)                #mask by putting NA outside the boundary
  coveramount<-cellStats(land.buffer, 'sum')*grain            #calculate area
  percentcover<-100*(coveramount/bufferarea.i)                #convert to %
  
  return(percentcover)
}

Think about the scope of all the variables the function uses.

Consider both the function arguments and variables that are created inside the function body.

Challenge: try to identify at least one variable which must be in R’s global environment for BufferCover to run correctly.

Report Questions

Compile your answers to the following into a document and submit on Moodle.

Q1: Original and Resampled Rasters

  • Q1 (3 pts.): Paste the following figures into your report:
  1. A plot of the original 6 by 6 raster.
  2. Plots of resampled rasters of various dimensions. Include as many figures as you need to illustrate your answers to the following questions.

Q2: Resampling Intuition

  • Q2 (4 pts.): Explain your intuitive understanding of what happens when you resample raster data to dimensions that are not whole number multiples of the original.

Specifically address what happens when you make rasters with coarser and finer grains from the original. You should reference the figures you included in the first question.

Q3: What Happened To My Original Raster Data?

What happened when you tried to recover your original 6 by 6 data raster?

  • Q3 (3 pts.): Explain what happened when you tried to recover your original 6 by 6 raster. Make sure you explain why your end product does or does not look like your starting raster. Include at least one figure to illustrate your answer.

Q4: Resampling Strategies

We often need to resize or resample raster data. It’s common to have two rasters that cover your area of interest, but that have different grain, or are in different coordinate systems.

  • Q4 (2 pts.): Propose one or more strategies to minimize the distortion introduced when you resample or reproject raster data.

Q5: Why use simulated data?

Parts of this lab used simulated data, so it might seem a little artificial.

  • Q5 (2 pts.): Identify at least one good reason for using simulated data when you’re learning new concepts or techniques.

Q6: Bounding Box

You can make a SpatialPolygons bounding box object from many types of spatial objects using the as and extent functions (consult Dr. Google and your neighbors for guidance). Create a plot that shows US state outlines for the southeastern quarter of the Lower 48 States (you’ll need to choose appropriate plotting limits using xlim and ylim). On top of the state boundaries, plot a rectangle showing the bounding box of the NLCD data in the F&F supporting data sets. Your bounding box rectangle should have a black border and a translucent fill so that you can see the state borders underneath.

Hint: check out the adjustcolor() function for an easy way to make a solid color translucent.
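
A sketch of the bounding box mechanics (the color and alpha value are just examples):

# Convert the NLCD extent to a SpatialPolygons bounding box
nlcd_bbox = as(extent(nlcd), "SpatialPolygons")

# The new polygon has no CRS; assign the NLCD projection so you can
# reproject it to match your state border data if needed
proj4string(nlcd_bbox) = CRS(projection(nlcd))

# adjustcolor() makes a solid color translucent; alpha.f sets opacity
plot(nlcd_bbox, add = TRUE, border = "black",
     col = adjustcolor("steelblue", alpha.f = 0.3))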

  • Q6 (6 pts.): Include a figure of your map in your report

Q7: Scope: Buffer Cover 1

Look at the code for the BufferCover() function and make sure you can identify the four function arguments.

The BufferCover() function stores its output value in a variable called percentcover.

Assume that there is no variable called percentcover in R’s global environment before you call BufferCover().

  • Q7 (3 pts.): After you run BufferCover(), will R’s global environment contain a variable called percentcover? Make sure you explain why or why not.

Q8: Scope: Buffer Cover 2

The BufferCover function relies on at least one global variable to run correctly.

  • Q8 (3 pts.): Identify the name of the global variable, and explain why its use in BufferCover might cause problems if you tried to re-use this function elsewhere.

Q9: Land Cover Classification Scheme

You defined the land cover types 41 - 43 as ‘forest’ (i.e. ‘habitat’) and all of the other types as ‘nonforest’ (‘nonhabitat’) in your walkthrough of book section 2.3.4.1.

Look up the NLCD codes and try an alternate classification scheme. Repeat the buffer analysis using your scheme and see if you find similar conclusions about a characteristic scale.

  • Q9 (5 pts.): Describe your classification scheme and your conclusion about the characteristic scale.