Learning objectives, technical skills and concepts:

Random number generation
Distribution Functions
Building custom functions
Fitting deterministic models
Calculating residuals
Defining and quantifying error

The Ricker Function

The Ricker function is a modification of the exponential function. The formula is:

\(f(x) = a \times x \times e^{-b \times x}\)

You can see the general form of the curve (with some hints as to how to visually parameterize it) in this figure.

How would you describe the roles of the a and b parameters?
- The a parameter is the initial slope
- The highest point of the curve occurs at an x value of \(\frac{1}{b}\).
- The y-coordinate of the curve at its highest point is \(\frac{a}{b}e^{-1}\).
Note that in R \(e^x\) is written as exp(x)

We can create a Ricker function in R:

ricker_fun = function(x, a, b) 
{
  return(a * x * exp(-b * x))
}

Here’s a plot of the shape using the simplest possible parameters: a = 1 and b = 1:

curve(
  ricker_fun(x, 1, 1), 
  from = 0, to = 5, add = FALSE, 
  main = "Ricker function: a = 1, b = 1",
  ylab = "f(x)", xlab = "x")

Note the use of the curve() function:
The from and to arguments tells curve() what range of x-values it should include in the plot.
The add = FALSE argument value creates a new plot.

The exponential function

The exponential function is the building block of the Ricker.

Here’s McGarigal’s infographic:

What do the parameters mean in this case?

The a parameter is the starting height of the curve.
The b parameter determines the rate of decay.
- \(\frac{1}{b}\) is the time it takes to reach a y value of \(\frac{a}{e}\)
- \(\frac{ln(2)}{b}\) is the half life, i.e. the time it takes to reach a y value of \(\frac{a}{2}\)

We can create an exponential function in R:

I’ll let you build it! You can use the Ricker function code I wrote above as inspiration.

exp_fun = function(x ...

NOTE: McGarigal defines the Ricker function with a negative sign before the parameter b but no negative sign before b in the exponential.

For consistency, you need to include the negative sign with parameter b in your exponential function definition.

Plot of the exponential function

What code could you use to create a similar plot?

Self-Test

You can use test code to test your implementation.

Using parameter values b = 1/15 and a = 2.2, your exponential function should like my plot:

curve(
  exp_fun(x, 2.2, 1/15), add = FALSE, from = 0, to = 50,
  ann = FALSE, axes = TRUE, ylab = "f(x)"); box()

Simulating data with different stochastic distributions

We’ve used curve() to plot deterministic models using linear, logistic, Ricker, and exponential functions using parameter values we selected.

We can also create simulated data that use these functions as deterministic models.

We’ll use R’s random deviates functions to generate the noise or ‘errors’.

To begin, we can start with a simple function: a linear function!

Simulated data on a line.

First, let’s choose 50 uniformly-distributed random x-values within the interval 2 to 10:

# Seed the RNG so we can reproduce our results
set.seed(1234567)

# Specify the x-range and number of points:
n_pts = 50
x_min = 2
x_max = 10

# Generate the x-values
x_sim = runif(n_pts, min = x_min, max = x_max)

Next, we can choose an intercept and slope for our deterministic model and generate the ‘predicted’ y values:

param_intercept = 2.3
param_slope = 0.67
y_pred = param_intercept + x_sim * param_slope
plot(x_sim, y_pred, main = "Simulated Data\nNo Errors", xlab = "", ylab = "")

Normal errors 1

Now we can add some normally-distributed noise to generate our ‘observed’ y-values:

error_mean = 0
error_sd = 0.25

y_observed = 
  y_pred + 
  rnorm(
    n = n_pts, 
    mean = error_mean, 
    sd = error_sd)
plot(x_sim, y_observed, main = "Normally Distributed Errors\n Constant Variance", xlab = "", ylab = "")

Normal errors 2

We could also use a more sophisticated stochastic model. For example, we could make the variability larger with increasing values of x:

error_mean = 0
error_sd = 0.1

y_observed_2 = 
  y_pred + 
  rnorm(
    n = n_pts, 
    mean = error_mean, 
    sd = error_sd * x_sim)

par(mfrow = c(1, 2))
plot(x_sim, y_observed, main = "Normally Distributed Errors\n Constant Variance", xlab = "", ylab = "")
plot(x_sim, y_observed_2, main = "Normally Distributed Errors\n Increasing Variance", xlab = "", ylab = "")

What did I do differently to make the points more variable as x increased?

Exponentially-distributed errors.

To produce this plot, I generated exponentially-distributed errors using rexp(). I used a rate parameter of 1.2. I’ll let you think about how you might create such a plot yourselves. To follow along, you can create a new vector called y_observed_3 containing your data with exponentially-distributed errors.

Choosing a model

Consider the three plots

If you didn’t know the process that generated these data, how would you choose deterministic and stochastic models?

The deterministic functions look linear, so we’ll focus on the stochastic models.
A first step might be to fit linear models.
Next, we can examine histograms of the residuals:

Fitted Linear Models

We haven’t covered how to build linear model objects yet, so for now don’t worry about understanding this code:

fit_1 = lm(y_observed ~ x_sim)
fit_2 = lm(y_observed_2 ~ x_sim)
fit_3 = lm(y_observed_3 ~ x_sim)

par(mfrow = c(1, 3))

plot(y_observed ~ x_sim); abline(fit_1)
plot(y_observed_2 ~ x_sim); abline(fit_2)
plot(y_observed_3 ~ x_sim); abline(fit_3)

Model Residuals

We’ll apply these concepts to a real data set below.

Marbled salamander dispersal data

Background

The data for this exercise represent the dispersal of juvenile marbled salamanders from their natal ponds to neighboring ponds.

The data were derived from a long-term study of marbled salamanders in western Massachusetts in which a cluster of 14 vernal pools were monitored continuously between 1999-2004. All juveniles were marked upon leaving their natal ponds. Subsequent recaptures at non-natal ponds were used to determine dispersal rates between ponds for first-time breeders (ftb).

In the data set provided, the dispersal rates have been standardized to account for several factors, including the propensity for dispersal from each pond and the available distances between ponds owing to the unique configuration of ponds. The data set includes three variables:

dist.class = distance class, based on 100 m intervals;
disp.rate.ftb = standardized dispersal rate for first-time breeders, which can be interpreted as a relative dispersal probability.
disp.rate.eb = standardized dispersal rate for experienced breeders, which can be interpreted as a relative dispersal probability.

Data

The data are contained in the file dispersal.csv, which you can download from the course GitHub page.

You’ll try fitting three deterministic functions to the data:

linear
exponential
Ricker

Lab Questions Instructions

Your assignment is to examine the data set provided (dispersal.csv) and find at least three alternative mathematical functions to describe the apparent relationship between juvenile dispersal and distance. The overall goals are for you to

Examine the data set. Plot the relationship between juvenile dispersal (disp.rate.ftb) and distance (dist.class).
Visually parameterize linear, exponential, and Ricker functions via trial-and-error to fit the data.
- Note: Be patient, we’ll eventually use R’s power to automatically fit model parameters. For now, the goal is graphical intuition.
Calculate the residuals and propose a distribution to model the errors.

Lab Questions

Exponential Functions (Q 1-4)

Instructions:

Complete the template R code (from the walkthrough above) to build your exponential function. Make sure you read the note about parameter b in the lab instructions.
To gain visual intuition, try using curve() to plot your function.

Try out different values of the two parameters as well as plotting over different ranges of x-values.
Try plotting two or more curves with slightly different parameter values on the same plot
You may need to try out different ranges of x-values in your plot windows to make sure you are focusing on the interesting parts of the curves.
Try different line colors, widths, and textures.

Some example parameterizations that I created:

Q1 (2 pts.): Show the R code you used to create exp_fun()
Q2 (4 pts.): In your lab report, include a single figure containing four negative exponential curves with the following parameter values and line colors/textures:
- curve 1: a = 1.9, b = 0.1, line color = black, line texture = solid
- curve 2: a = 1.9, b = 0.3, line color = black, line texture = dotted
- curve 3: a = 1.2, b = 0.2, line color = red, line texture = solid
- curve 4: a = 1.2, b = 0.4, line color = red, line texture = dotted
- Hint: check out the from, to, ylim, and add arguments for curve(). Setting appropriate x- and y-limits in your plot will help you see all four curves.
Q3 (2 pts.): Observe how the curves vary as you change the two parameters’ values. Qualitatively describe what happens to the curve as you vary parameter a
Q4 (2 pts.): Observe how the curves vary as you change the two parameters’ values. Qualitatively describe what happens to the curve as you vary parameter b

Ricker Functions (Q 5-7)

Instructions:

To gain visual intuition, try using curve() to plot the Ricker function.
- Try out different values of the two parameters as well as plotting over different ranges of x-values.
- Try plotting two or more curves with slightly different parameter values on the same plot
- You may need to try out different ranges of x-values in your plot windows to make sure you are focusing on the interesting parts of the curves.
- Try different line colors, widths, and textures.

An example exploratory plotting exercise:

Note that I provided the R code for the Ricker function in the lab walkthrough.

Q5 (6 pts.): In your lab report, include a single plot containing 6 Ricker curves with these parameter values:
- curve 1: a = 25, b = 0.2, line color = black, line texture = solid
- curve 2: a = 20, b = 0.2, line color = black, line texture = dotted
- curve 3: a = 10, b = 0.2, line color = black, line texture = dotted
- curve 4: a = 75, b = 0.3, line color = red, line texture = solid
- curve 5: a = 50, b = 0.3, line color = red, line texture = dotted
- curve 6: a = 40, b = 0.3, line color = red, line texture = dotted
Q6 (2 pts.): Observe how the curves vary as you change the two parameters’ values. Qualitatively describe what happens to the curve as you vary parameter a
Q7 (2 pts.): Observe how the curves vary as you change the two parameters’ values. Qualitatively describe what happens to the curve as you vary parameter b

Salamander Models (Q 8-13)

A tip for visually parameterizing models:

You can use locator(1) to find the exact coordinates on a plot. Just type it into your console and click on the location on the plot that you’d like to know the coordinates of.

Using the scatterplot of the salamander dispersal data for first time breeders, visually fit:

a linear model
an exponential model
a Ricker model

For example, here is a very poorly fitted linear model that I created:

  plot(
    dat_dispersal$dist.class,
    dat_dispersal$disp.rate.ftb,
    xlim = c(0, 1500),
    xlab = "distance class", 
    ylab = "standardized dispersal rate", 
    main = "Marbled Salamander - first time breeders\n(Bad) linear model")
  curve(line_point_slope(x, 1500, 0.8, -0.00001), add = TRUE)

Use curve to visually test your parameter guesses.

If you use the argument add = TRUE, you can plot multiple guesses on the same scatterplot.

Q8 (2 pts.): Linear Model. Provide the values of the slope, x1, and y1 parameters you chose. Briefly describe how you chose the values.

Hint: click to show/hide the linear fit function code

  line_point_slope = function(x, x1, y1, slope)
  {
    get_y_intercept = 
      function(x1, y1, slope) 
        return(-(x1 * slope) + y1)
    
    linear = 
      function(x, yint, slope) 
        return(yint + x * slope)
    
    return(linear(x, get_y_intercept(x1, y1, slope), slope))
  }

Q9 (2 pts.): In your lab report, include a scatterplot of the salamander data with your fitted linear model.
Q10 (2 pts.): Exponential Model. Provide the values of the a and b. Briefly describe how you chose the values.
Q11 (2 pts.): In your lab report, include a scatterplot of the salamander data with your fitted exponential model.
Q12 (2 pts.): Ricker Model Provide the values of the a and b. Briefly describe how you chose the values.
Q13 (2 pts.): In your lab report, include a scatterplot of the salamander data with your fitted ricker model.

Hint: If you’re having trouble seeing your exponential fit, you should manually choose appropriate x-limits for your scatterplot. I used the range of 0 to 1500.

Salamander Model Residuals (Q 14-15)

Calculate the residuals for your three fitted models and store them in a data.frame

The columns containing the residuals should have the following names:
“resids_linear”
“resids_exp”
“resids_ricker”

Q14 (4 pts.): Show the R code you used to create your data frame of model residuals.
Q15 (3 pts.): In your lab report, include histograms of the residuals for each of your three models. You may create a single figure with three panels, or include three separate figures.

Report

Compile your answers to all 15 questions into a pdf document and submit via Moodle.

You may also do your work in an R Notebook and submit a rendered html file.

Analysis of Environmental Data Lab

Lab 5