Lab Concepts and Objectives

This lab covers an assortment of [hopefully fun] techniques.

Random Walks

Random walks are useful in various fields. In ecology, along with diffusion models, they serve as useful models of animal movements.

The basic idea behind random walks is that you have a walker (an agent) who takes a series of discrete steps. At each step, the walker must choose a random direction and step length.
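Before we add direction angles, the basic idea can be illustrated in one dimension (a minimal sketch, not part of the lab code): a walker that moves -1 or +1 with equal probability at each step.

```r
# Minimal 1D illustration: each step is -1 or +1 with equal probability.
set.seed(42)
steps = sample(c(-1, 1), size = 10, replace = TRUE)

# The position after each step is the cumulative sum of the steps:
position = cumsum(steps)
position
```

The 2D walks below follow the same pattern; the extra work is in turning an angle and a step length into x- and y-displacements.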

One specific use of random walk concepts is in so-called Brownian bridge models. These models are helpful for quantifying the range of possible intermediate locations when we have location data at specific time points.

The multiple trajectories between the two known points represent possible paths the animal could have taken. As a whole, the set of possible paths helps us quantify the uncertainty in position between the two known points.

You can also use random walks as a naive path-finding method. Here’s an optional fun video about using random walks to find your way through a maze.

Random Walk Components

We can use the basic idea of random walks to formulate a procedure for modeling them in two dimensions. Consider the items that we need to keep track of:

  • The walker’s previous positions.
  • The walker’s current position.
  • The direction of the walker’s next step.
  • The length of the walker’s next step.

We can be proactive in how we think about our random walk components.

Since our walker will be taking multiple steps, eventually we’ll need to use a for-loop.

  • With that in mind, we can structure our coding in anticipation of using a loop variable.

We will also likely want to carry out multiple simulations, perhaps with different numbers of steps. That means we’ll want to package our random walk loop into a function.

  • Anticipating writing a function can help us structure our thinking as we build up the components of the walk.

Position

In a random walk simulation, we’d like to keep track of the location of all the walker’s steps. A simple way to do this is to create a matrix or data.frame to store the x- and y-coordinates at each time step. A possible data.frame structure might be:

# How many steps?
# This is a variable... could be useful as a function argument later on.
n_steps = 1000

# Create the empty data.frame:
walk = data.frame(
  step = numeric(n_steps + 1),
  x = numeric(n_steps + 1),
  y = numeric(n_steps + 1)
)

Note that I made the number of rows n_steps + 1 so that we can store the initial position in the first row.

Direction: Degrees and Radians

Choosing a random direction is a little more difficult and requires a little trigonometry. Don’t worry, we’ll just use the simple sin() and cos() functions to figure out the x- and y-coordinates of an angle!

We might (or might not) recall from trigonometry that the cosine function gives us the x-coordinate of the position on the unit circle given an angle theta. The sine function gives us the y-coordinate.

We might also remember that angles can be expressed in degrees or radians. We’re usually more familiar with degrees, but radians are computationally easier to work with.

A complete circle, i.e. 360 degrees, is \(2\pi\) radians. Therefore some common angles in radians are:

  • \(0^\circ = 0\) radians
  • \(90^\circ = \frac{\pi}{2}\) radians
  • \(180^\circ = \pi\) radians
  • \(270^\circ=\frac{3\pi}{2}\) radians
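Converting between the two units is a one-line calculation. Here’s a small sketch (the helper names are my own, not functions from the lab):

```r
# 360 degrees = 2 * pi radians, so:
deg_to_rad = function(deg) deg * pi / 180
rad_to_deg = function(rad) rad * 180 / pi

deg_to_rad(90)   # pi / 2, i.e. about 1.5708
rad_to_deg(pi)   # 180
```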

Direction: Choosing a Random Angle 1

Our walker needs to choose a random direction for each step. One option is to create a discrete set of possible angles from which to choose. If we choose a set of four angles, pointing in the cardinal directions, we would end up with a walk along a grid.

# Discrete set of angles.
# R works better with radians than degrees.

thetas = c(0, 0.5 * pi, pi, 1.5 * pi)

Next, we could use sample() to pick a random direction and calculate the x- and y-distances traveled:

# Choose a random angle
theta = sample(thetas, size = 1)
print(theta, digits = 3)
## [1] 1.57
# Calculate the x- and y- distances
# Round to 3 decimal points
delta_x = round(cos(theta), 3)
delta_y = round(sin(theta), 3)

print(data.frame(delta_x = delta_x, delta_y = delta_y))
##   delta_x delta_y
## 1       0       1

Notice that the delta_x term is zero. This makes intuitive sense because the x-coordinate of a 90 degree angle is zero. The delta_y term is one.

In other words, the walker will move one unit up, and zero units in the horizontal direction:

Direction: Choosing a Random Angle 2

Another option is to let the angle vary continuously. In this case, we can choose a random angle from a uniform distribution, recalling that the range of 0 to 360 degrees is equivalent to 0 to \(2\pi\) in radians:

# Choose a random angle
theta = runif(
  n = 1,
  min = 0,
  max = 2 * pi
)
print(theta, digits = 3)
## [1] 3.68
# Calculate the x- and y- distances
# Round to 3 decimal points
delta_x = round(cos(theta), 3)
delta_y = round(sin(theta), 3)

print(data.frame(delta_x = delta_x, delta_y = delta_y))
##   delta_x delta_y
## 1  -0.858  -0.513

This time, the walker travels to the left and down:

Step Length

Step length is much easier!

In the simplest case, we can use a fixed step length:

step_length = 1

If we want to vary the step length randomly, we have many options.

For example, we could let the step length vary uniformly between 1 and 2:

step_length = runif(n = 1, min = 1, max = 2)

Another useful technique is to let the step follow a right-skewed distribution. For example, we could have exponentially-distributed step lengths:

step_length = rexp(n = 1, rate = 1)

Random walks with step lengths that follow a right-skewed distribution (often a distribution with “fat” tails such as a Lévy or power-law distribution) are called Lévy flights. They have famously been used to describe foraging behavior in albatrosses.
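If you want to experiment with power-law step lengths, one option is inverse-transform sampling from a Pareto distribution. This is a sketch with illustrative parameter values; r_pareto is my own helper, not a base R function:

```r
# Pareto sampler via inverse-transform: if U ~ Uniform(0, 1), then
# x_min * (1 - U)^(-1 / alpha) follows a Pareto(x_min, alpha) distribution.
r_pareto = function(n, x_min = 1, alpha = 2)
{
  x_min * (1 - runif(n))^(-1 / alpha)
}

set.seed(1)
step_lengths = r_pareto(1000)

# All step lengths are at least x_min, with a heavy right tail:
summary(step_lengths)
```

Smaller values of alpha produce fatter tails, i.e. more frequent very long steps.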

Position 2

If we’ve been clever in constructing our angle and step length code, calculating the next position is simple.

By definition, a straight-line movement from the origin to any position on the unit circle has a distance of 1.

That means we can simply multiply the delta-x and delta-y we calculated from theta by the step length to get the overall change in position.

Then we just need to add these changes in the x- and y-coordinates to the current position!

For example, to add the coordinates of our first step to our data.frame of positions, we could use something like:

# the step number.
# This syntax might be useful in the context of a loop...
i = 1

# Current position:
current_x = walk$x[i]
current_y = walk$y[i]


# Change in position in the x- and y-direction:
delta_x = step_length * round(cos(theta), 3)
delta_y = step_length * round(sin(theta), 3)

# Record the new position.
# Note we use i + 1 because the first row contains the starting point:
walk$x[i + 1] = current_x + delta_x
walk$y[i + 1] = current_y + delta_y

# Record the step number:
walk$step[i + 1] = i

Random Walk Skeleton

Since we were thoughtful in how we set up our code for the different components, it should be straightforward to combine them into a simple random walk loop.

Here’s a skeleton:

n_steps = ...

# Create our empty data frame
walk = data.frame(
  ... 
)

for (i in 1:n_steps)
{
  # Choose a step length:
  step_length = ...
  
  # Choose a random direction:
  theta = ...
  
  # What is the current position?
  current_x = ...
  current_y = ...
  
  # Figure out the change in the x- and y-directions:
  delta_x = ...
  delta_y = ...
  
  # Record the new position
  ...
  
}

Your First Random Walk

You’ve got the basic components, now it’s time to combine them and do some walking!

Grid Random Walk

Use the skeleton and the random walk components we developed above to create a grid-based random walk with a fixed step length of 1.

You should end up with a data.frame called walk.

Create a test plot of your walk data using this code:

plot(y ~ x, data = walk, type = "l", asp = 1)
# Plot a blue-filled point at the starting point
points(
  y ~ x, data = head(walk, 1),
  pch = 21, col = 1, bg = "steelblue")

# Plot a red-filled point at the end of the walk
points(
  y ~ x, data = tail(walk, 1),
  pch = 21, col = 1, bg = "red")

Some things to notice:

We’ve used the head() function before, but you might not be familiar with the tail() function. tail() gives you a glimpse of the last few rows of a data.frame object. The first argument (x) is the data.frame you want to preview, the optional second argument (n) is the number of rows to retrieve.

  • A convenient way to retrieve the last row in a data.frame without having to calculate the number of rows is to simply use tail() to retrieve the last row by passing a value of 1 for the n argument.
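For example, with a toy data.frame (just to show the behavior):

```r
d = data.frame(x = 1:5, y = 6:10)

# Preview the last two rows:
tail(d, n = 2)

# Retrieve just the last row:
tail(d, n = 1)
```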

I used the following arguments in the calls to plot() and points():

  • pch. This stands for ‘plotting character’. Plotting character #21 is a circle with an optional fill color.
  • col. The color argument determines the outline color of character #21.
  • bg. The background argument determines the fill color.
  • asp. This sets the aspect ratio of the plot window. Since we need our walks to plot without being stretched vertically or horizontally, we can use a value of 1.

Walk Plot Function

That was a nice way to plot our walk. Since we’ll be plotting numerous walks, let’s build a function so we don’t have to re-write the plotting code every time:

plot_walk = function(dat_walk, ...)
{
  plot(y ~ x, data = dat_walk, type = "l", asp = 1, ...)
  
  # Plot a blue-filled point at the starting point
  points(
    y ~ x, data = head(dat_walk, 1),
    pch = 21, col = 1, bg = "steelblue")
  
  # Plot a red-filled point at the end of the walk
  points(
    y ~ x, data = tail(dat_walk, 1),
    pch = 21, col = 1, bg = "red")
}

Let’s test it out on our walk data:

plot_walk(walk)

Our plotting function expects a data.frame with the following columns:

  • x: a set of x-coordinates of the random walk
  • y: a set of y-coordinates of the random walk

You may have noticed the ... argument that I used. You might also notice that I used it in my call to plot() within the function. I won’t go into detail of the use of ellipses in functions, but using this argument allows us to pass additional arguments along to plot().

If you’re interested, here is a nice blog entry that describes the use of ellipses in custom functions.

For now we’re not using the step column in our walk plots, but we’ll need it later.

Grid Walk Function

To paraphrase the old saying:

“If you have to perform a task more than once, you should write a function!”

A single realization of a random walk is cool, but the real power of random walks comes from simulating multiple walks.

We’d like to be able to execute our grid-based walk multiple times using a compact syntax. By now you’re a whiz at creating your own functions and you can probably guess what I’m going to ask you to do:

Package your random-walk loop into a function!

  • Your function should accept a single argument, n_steps, which is the number of steps to take.
  • Your function needs to return a data.frame that contains the step number, the x-, and the y-coordinates.
  • The column names must be step, x, and y.

Name your function grid_walk()

Hint: Don’t overthink this! You only need a couple of extra lines of code to package your loop into a function.

Test Your Walk Function

Once you’ve got your walk function working, you can test it out. Use the following as a self-check.

set.seed(123)

par(mfrow = c(2, 2), mar = c(3, 3, 1, 1))
{
  plot_walk(grid_walk(100))
  plot_walk(grid_walk(100))
  plot_walk(grid_walk(100))
  plot_walk(grid_walk(100))
}

Your walkers’ trajectories might not match mine exactly since there is randomness built into the random walk function.

Omnidirectional Walk

For the next random walk, we’ll allow the walker to turn in any direction, i.e. it won’t be constrained to a grid. In this walk, we’ll keep the step length fixed again.

Build a random walk function, call it omni_walk, that accepts a single argument n_steps. This function should perform a random walk in which the walker can turn in any direction.

Hint: If you implemented grid_walk() using the skeleton and template components above, it’s possible to build omni_walk() by changing only a single line of code.

Test out your omni_walk() function:

set.seed(123)

par(mfrow = c(2, 2), mar = c(3, 3, 1, 1))
{
  plot_walk(omni_walk(100))
  plot_walk(omni_walk(100))
  plot_walk(omni_walk(100))
  plot_walk(omni_walk(100))
}

Distance Traveled

A natural question to ask is: How far does the walker travel, on average, in n steps?

Random walk theory has closed-form predictions for the average distance. We’ll perform a simulation experiment to explore the variability in distance traveled.

First we need a function to calculate the walker’s straight-line distance from the origin (its starting point), given a set of x- and y-coordinates. You may remember the formula for Euclidean distance in 2 dimensions:

\(d = \sqrt{x^2 + y^2}\)

We can take advantage of R’s vectorization to calculate the distances for a set of x- and y-coordinates from a random walk simulation:

set.seed(1234)
dat_sim = omni_walk(1000)
dist_sim = sqrt(dat_sim$x^2 + dat_sim$y^2)


# Plot both the walk and the distance:
par(mfrow = c(1, 2))
plot_walk(dat_sim)
title("Random Walk")
plot(
  x = dat_sim$step,
  y = dist_sim,
  xlab = "step", ylab = "distance",
  type = "l", main = "Distance Traveled"
)

Since we’ll want to calculate distance for multiple simulations, we should make a distance function:

euc_dist = function(x, y)
{
  return(sqrt(x^2 + y^2))
}

sim_grid = grid_walk(5000)
par(mfrow = c(1, 2))
plot_walk(sim_grid, main = "Grid Walk")
plot(
  x = sim_grid$step,
  y = euc_dist(sim_grid$x, sim_grid$y),
  type = "l")

Distance Simulation

Individual random walks are idiosyncratic. We can conduct a simulation experiment to understand the behavior of many walks.

n_sims = 200
n_steps = 1000

dat_sim = matrix(
  0,
  nrow = n_steps + 1,
  ncol = n_sims)

for(i in 1:n_sims)
{
  sim_dat_i = omni_walk(n_steps)
  dist_i = euc_dist(sim_dat_i$x, sim_dat_i$y)
  dat_sim[, i] = dist_i
}

We can plot the distance traveled of each simulation with matplot():

matplot(
  dat_sim, 
  type = "l",
  col = gray(0, 0.1),
  xlab = "step", ylab = "distance")

Distance Envelope

It looks like some of our walkers traveled a long distance while others ended up near the origin.

We can create a 90% null envelope and use it to characterize the range of distances within which we expect most random walkers to travel. The quantile() function will be our friend here:

envelope_90 = t(apply(
  dat_sim,
  1,
  function(x) quantile(x, c(0.05, 0.95))))
dim(envelope_90)
## [1] 1001    2
head(envelope_90)
##             5%      95%
## [1,] 0.0000000 0.000000
## [2,] 0.9995958 1.000488
## [3,] 0.2017680 1.998388
## [4,] 0.4125787 2.806290
## [5,] 0.4692512 3.509248
## [6,] 0.5960585 3.765909

Now we can add the envelope, and the mean, to our plot:

matplot(
  dat_sim, 
  type = "l",
  col = gray(0, 0.05),
  xlab = "step", ylab = "distance")

# Lower edge of envelope
lines(
  x = 1:(n_steps + 1),
  y = envelope_90[, 1],
  lwd = 1, col = 2)

# Upper edge of envelope
lines(
  x = 1:(n_steps + 1),
  y = envelope_90[, 2],
  lwd = 2, col = 2)


lines(
  x = 1:(n_steps + 1),
  y = apply(dat_sim, 1, mean),
  lwd = 2, col = "blue"
)

What do you notice about the mean distance traveled? Does it seem linear, monotonic, logarithmic, or something else?

Random Walk Extensions (Optional)

If you are interested in random walks, you can try implementing some optional extensions to the code you’ve already built.

Here are some suggestions:

  • Vary the step lengths. You might allow step lengths to vary in several ways, for example following a normal distribution.
  • Correlated random walk. So far the walker in our simulations has been memoryless, that is each step it is allowed to choose a totally new random direction. It’s possible to give the walker a memory of the previous direction, and allow it to constrain the possible directions in the next step. Perhaps the new step is restricted to be within 15 degrees of the previous direction.
  • 3D random walk. We’ve simulated walks in 2 dimensions, but random walks in 3 dimensions are possible, and they have some interesting properties. To generate a random angle in 3D, you’ll need to figure out how to sample a random point on the unit sphere, which turns out to be surprisingly difficult!
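For the 3D extension, one standard approach to sampling a uniformly random direction (an assumed method, not developed in this lab) is to normalize a vector of three independent standard normal draws:

```r
# A random point on the unit sphere: draw three independent standard
# normals and scale the vector to length 1. The spherical symmetry of
# the multivariate normal makes the resulting direction uniform.
random_sphere_point = function()
{
  v = rnorm(3)
  v / sqrt(sum(v^2))
}

set.seed(1)
p = random_sphere_point()
sum(p^2)  # 1, up to floating-point error
```

Note that naive alternatives, such as drawing two uniform angles, do not produce uniform points on the sphere; they cluster near the poles.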

Ordination

Ordinations focus on finding interpretable trends in data; in ecology these are often changes in species composition, possibly driven by underlying environmental gradients.

Ordinations fall under the umbrella of multivariate statistics. An aim of many ordination techniques is dimensionality reduction. In other words, to distill the information contained in a complex dataset with many variables into a small number of new variables that capture as much of the variation in the data as possible.

Other types of ordinations are based on a dissimilarity matrix. A dissimilarity matrix represents pairwise distances (measured in various ways) between samples. A common dissimilarity metric is the Bray-Curtis distance.

Ordination Data

We’ll use the Oregon bird survey data again, but this time we’ll use the data aggregated by subbasin.

library(here)

# Bird abundance data
dat_bird_sub = read.csv(
  here("data", "bird.sub.csv"))

# Remove the non-species columns.
# Note the use of the select argument with the negative.
dat_bird_sub_2 = subset(
  dat_bird_sub,
  select = -c(
    b.total, b.rich, b.sidi, basin, sub, tsta))

# Site habitat covariates
dat_hab_sub = read.csv(
  here("data", "hab.sub.csv"))

Subbasin Grouping

We’ll be grouping observations by subbasin, so we’ll need a vector of colors for plotting.

# Create a vector of colors, coded by basin
basin_col_all = as.numeric(
  factor(dat_bird_sub$basin))

# A prime rule in colorblind design is to avoid red/green combos.
# Change the green to blue
basin_col_all = replace(
  basin_col_all,
  basin_col_all==3,
  4)

Nonmetric Multidimensional Scaling

NMDS attempts to plot points in 2D or 3D space, such that the distances between the points in the plot correspond as closely as possible to the distances in the dissimilarity matrix.

We’ll use the vegan package to perform NMDS.

The metaMDS function creates the dissimilarity matrix and performs the NMDS algorithm, returning a metaMDS object.

We’ll create 2- and 3- dimensional ordinations:

library(vegan)

mds_b_sub_2 = metaMDS(dat_bird_sub_2, k = 2)
mds_b_sub_3 = metaMDS(dat_bird_sub_2, k = 3)
class(mds_b_sub_2)

MDS Stress Values

The MDS algorithm uses an iterative simulation method to find an optimal way to arrange the points in multidimensional space (that’s a very simplified explanation!).

The stress values are a measure of how well the algorithm was able to position the points in ordination space. Lower stress values are better, with a rule of thumb being:

  • \(stress > 0.3\): positioning is no better than random
  • \(0.3 > stress > 0.2\): good/ok-ish
  • \(0.2 > stress > 0.1\): great
  • \(stress < 0.1\): excellent

Let’s check the stress values for our fits:

axes  stress
   2   0.201
   3   0.118

It looks like the 2-axis ordination did an acceptable job, but the 3-axis version did much better.
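The stress value is stored in the $stress component of the metaMDS object. Here’s a self-contained sketch using varespec, an example dataset bundled with vegan, standing in for the bird data:

```r
library(vegan)

# varespec: lichen pasture community data bundled with vegan
data(varespec)

# trace = 0 suppresses the iteration messages
mds_demo = metaMDS(varespec, k = 2, trace = 0)
round(mds_demo$stress, 3)
```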

2D Results

We can use the ordiplot() function from the vegan package to plot the 2D ordination:

ordiplot(mds_b_sub_2)

Ordination plots are notoriously difficult to interpret, but we can make some simplifications to improve readability:

  • Color-code the sites by basin.
  • Plot only the sites (the round points), excluding the species (the plus signs).

# How many observations are in each basin?
table(dat_bird_sub$basin)
## 
##  D  L  N 
## 10 10 10
# Create a vector of colors, coded by basin
basin_col = as.numeric(factor(dat_hab_sub$basin))

# A prime rule in colorblind design is to avoid red/green combos.
# Change the green to blue
basin_col = replace(
  basin_col,
  basin_col==3,
  4)

# Open a blank plot window:
ordiplot(
  mds_b_sub_2,
  display = "sites",
  type = "n")

# Draw the points with custom colors
points(
  mds_b_sub_2,
  display = "sites",
  col = basin_col,
  lwd = 2)

It looks like there might be some patterning. How would you qualitatively describe the arrangement of the three colors of points?

3D Results

Let’s look at the 3-axis ordination using an interactive 3D scatterplot:

# ordirgl() comes from the vegan3d package
library(vegan3d)

# Make the points big using cex = 3
ordirgl(
  mds_b_sub_3,
  col = basin_col,
  cex = 3, axes = F)

# Make sure the background is white
bg3d("white")
rglwidget()

Make sure you try rotating the plot with your mouse so you can get a better idea of how the points are distributed in 3D space.

The 3rd axis seems to contribute to the clustering.

Spider Plot

Spider plots are another way to visualize the clusters (or hypothesized clusters) in ordination space.

They work by first calculating the centroid (center of gravity) of the proposed group. Next, line segments are drawn from the centroid to each point in the group.

Here’s what a 3D spider plot looks like with the MDS object:

ordirgl(
  mds_b_sub_3,
  col = basin_col,
  cex = 3, axes = F)
orglspider(
  mds_b_sub_3,
  display = "sites",
  col = basin_col,
  groups = dat_bird_sub$basin,
  ax.col = 0) 
bg3d("white")
rglwidget()

Principal Coordinates Analysis

Another common ordination method is Principal Coordinates Analysis (PCoA).

Dissimilarity Matrix

Like NMDS, PCoA does not operate on the original data, but rather on a dissimilarity matrix.

Dissimilarity may be measured in various ways, including Euclidean distance. A particularly useful dissimilarity measure in ecology is the Bray-Curtis dissimilarity.

You can use the vegdist() function from package vegan to create the matrix. By default it creates a Bray-Curtis dissimilarity matrix.
Check the help entry for details on how to calculate different types of dissimilarity matrices.
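Here’s a sketch of vegdist() usage, again with vegan’s bundled varespec data standing in for the bird data:

```r
library(vegan)
data(varespec)

# Bray-Curtis is the default:
d_bray = vegdist(varespec)

# Other metrics are available via the method argument:
d_euc = vegdist(varespec, method = "euclidean")

# The result is a 'dist' object with one entry per pair of sites:
class(d_bray)
```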

Perform a PCoA

Using the function documentation and the code above as a template, perform a Principal Coordinates Analysis on the bird species data.

  1. Create the Bray-Curtis dissimilarity matrix.
  2. Create 2- and 3-axis PCoA objects. You can use the wcmdscale() function.
  3. Plot the 2-axis analysis, and compare the results to the NMDS analysis.
  4. Create a 3D spider plot of the 3-axis PCoA object.

Your 2D plots should look something like this:

Are there any qualitative similarities in the plots of both ordinations?

Your 3D Spider plots should resemble mine:

Ordination and Community Ecology Resources

A moderately detailed, but readable overview of ordination can be found at the Ordination Methods for Ecologists site.

Another good resource for information on community ecology methods in R is the aptly named Analysis of community ecology data in R site.

Tree Diameter Distributions

Characterizing the distribution of tree stem diameters is useful to forest managers. Understanding tree stand structure is important for many economic purposes (e.g. estimating the amount of marketable timber) and for meeting various management goals.

Many techniques exist for describing tree size distributions, but an especially useful approach is fitting a Weibull distribution.

To illustrate, we’ll use some tree diameter data from a 2010 survey of the Tapajos National Forest in Brazil.

The data were retrieved from the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC).

The inventory data is in the file Tapajos_inventory_data_2010.csv.

You can view the metadata here

Inga cylindrica

For the walkthrough, we’ll use a single species from the dataset, Inga cylindrica.

You know how to use the subset() function to retrieve a subset of a dataframe that meets a logical condition.

Create a new data.frame object called dat_inga containing only the rows for which the scientific_name column is Inga cylindrica.

Create a histogram of the I. cylindrica dbh. As a self-test, it should look like the example below. I used the breaks = 8 argument to hist().

Fitting a Weibull Curve

As a first approximation, we can try to fit a Weibull curve by guessing values for the shape and scale parameters.

I. cylindrica Density

First, let’s create a density plot of the I. cylindrica dbh:

plot(
  density(dat_inga$dbh),
  main = "Mike's Kernel Density Plot\nInga cylindrica")

Add a Weibull Curve

Next, I can use the curve() function with dweibull() to draw a Weibull curve:

plot(
  density(dat_inga$dbh),
  main = "Mike's Kernel Density Plot\nInga cylindrica",
  lwd = 2)
curve(
  dweibull(x, shape = 3, scale = 25), add = T,
  col = "steelblue", lwd = 2, lty = 3)
legend(
  "topright",
  legend = c("data", "guess"),
  col = c(1, "steelblue"),
  lty = c(1, 3), lwd = 2,
  bty = "n")

The general shape looks promising, but the position in the plot doesn’t quite match the data.

You should tinker with the shape and scale parameters to see if you can eyeball a better fit.

We’ll use Maximum Likelihood to find a “best fit” below, but for now try the guess and check method to see how close you can get.

Weibull Best Guess

My best guess looked like:

It’s not quite a perfect fit, but it’s better than my first guess.

Before you proceed, see if you can do better than I did!

  • Keep track of your best-guess values, they will come in handy later.

Maximum Likelihood Fit

We’ll use a graphical approach to find an ML estimate of the Weibull parameters.

Fortunately, the Weibull distribution has only 2 parameters, so we can visualize a likelihood surface. No 4-dimensional plots are needed!

The general procedure is:

  1. Choose a set of parameter value pairs.
  2. Calculate the sum of log likelihoods of the data for each of the parameter value pairs.
  3. Visualize the surface.

Candidate Parameter Values

To create our maximum likelihood surface, we can make a regularly-spaced ‘grid’ of parameter value pairs for which to test the likelihood.

This may need some further explanation:

The first step is to choose a gradient of values over which to vary the shape parameter. Let’s try 1 to 3 with 5 values:

w_shape = seq(1, 3, length.out = 5)

Next, we can choose a range for the scale parameter:

w_scale = seq(12, 25, length.out = 5)

To create our set of parameter value pairs, we’ll use the expand.grid function:

param_pairs = expand.grid(shape = w_shape, scale = w_scale)

head(param_pairs)
##   shape scale
## 1   1.0 12.00
## 2   1.5 12.00
## 3   2.0 12.00
## 4   2.5 12.00
## 5   3.0 12.00
## 6   1.0 15.25
tail(param_pairs)
##    shape scale
## 20   3.0 21.75
## 21   1.0 25.00
## 22   1.5 25.00
## 23   2.0 25.00
## 24   2.5 25.00
## 25   3.0 25.00

Calculate Log-likelihoods

Now the fun part: Calculate the sum of log-likelihoods of the data for all pairs of parameter values.

First, we need to figure out a few things, for example:

How do I calculate the sum of log-likelihoods for a Weibull distribution?

Here’s how I can calculate the log-likelihood value for the first set of parameter values:

x = dat_inga$dbh
shape = param_pairs$shape[1]
scale = param_pairs$scale[1]

sum(dweibull(
  x,
  shape = shape,
  scale = scale,
  log = T)
)
## [1] -897.288

Weibull Likelihood Function

You’ll need to do lots of likelihood calculations. By now, you know the routine: I’m asking you to create a Weibull Likelihood Function.

You can use the above code as inspiration for your implementation.

The function specs are:

  • Function name should be weib_lik().
  • First argument (x) is the data (for example DBH).
  • Second argument is the shape parameter.
  • Third argument is the scale parameter.
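One way to package the earlier likelihood calculation into a function matching these specs (treat this as a sketch to check your own implementation against):

```r
# Sum of Weibull log-likelihoods for data x, given shape and scale:
weib_lik = function(x, shape, scale)
{
  sum(dweibull(x, shape = shape, scale = scale, log = TRUE))
}

# Quick check on simulated Weibull data:
set.seed(1)
x_sim = rweibull(100, shape = 2, scale = 20)
weib_lik(x_sim, shape = 2, scale = 20)
```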

Calculate the Surface

Now let’s put it together and calculate a likelihood surface.

We can use a loop along with our new function:

# Create an empty column in the parameter
# data frame to hold the likelihoods

param_pairs$log_lik = 0

for (i in 1:nrow(param_pairs))
{
  param_pairs$log_lik[i] = 
    weib_lik(
      dat_inga$dbh,
      param_pairs$shape[i],
      param_pairs$scale[i])
}

For the plotting functions below, we need a matrix of z-values. We can create one from the log_lik column of param_pairs:

z_matrix = matrix(
  param_pairs$log_lik,
  nrow = length(w_shape),
  byrow = FALSE)
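If you’re unsure about the byrow = FALSE choice, here’s a small sanity check (illustrative values only): expand.grid() varies its first argument fastest, which matches column-wise matrix filling.

```r
# expand.grid varies 'a' fastest, so g$a is rep(1:3, times = 3):
g = expand.grid(a = 1:3, b = 10:12)

# Filling column-wise (byrow = FALSE) puts the 'a' values down the rows:
m = matrix(g$a, nrow = 3, byrow = FALSE)
m[, 1]  # 1 2 3
```

In z_matrix, then, the shape parameter varies down the rows and the scale parameter varies across the columns, matching the order of the arguments to contour().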

Contour Plot - First Draft

We can create a 2D contour plot:

contour(w_shape, w_scale, z_matrix)

The contour plot looks a little grainy. Let’s try a 3D perspective plot.

3D Plot - First Draft

You’ve already used persp3d() in a previous lab. You can use it again here to plot the likelihood surface.

rgl.viewpoint(zoom = 1)
persp3d(
  w_shape, w_scale, z_matrix,
  col = "darkgreen",
  xlab = "shape",
  ylab = "scale",
  zlab = "sum(log-lik)",
  alpha = 0.2,
  smooth = FALSE)
bg3d("white")
rglwidget()

This plot looks angular too. Try re-running the code with smooth = TRUE for a slight improvement.

Higher-Resolution Plots

Now it’s your turn to experiment.

To make your contour and perspective plots higher resolution, you’ll want to increase the number of parameter values you test. Recall that for the example, we used only 5 discrete values of each parameter.

To create the following high-resolution plots, for example, I used sequences of 50 discrete values for each parameter.

Lab Questions

Make your way through the lab walkthrough and answer the following questions:

Q1-2: Random Walks

Create the code for the omnidirectional walk function (omni_walk()) from the walkthrough.

  • Q1 (1 pt.): Include a plot of a random walk using your omni_walk() and plot_walk() functions. Your walk may be any length between 100 and 10000 steps.

  • Q2 (2 pts.): Using either the grid or omnidirectional walk, perform a distance simulation (like in the walkthrough). Your simulation should use 200 random walks of 1000 steps. Use the code from the walkthrough as a template to create a plot of the distance envelope. Your envelope must show:

    • 2.5 and 97.5 percentile lines.
    • The mean line.

Q3-4: Ordinations

  • Q3 (2 pts.): Include a spider plot of your PCoA (Principal Coordinates Analysis).
  • Q4 (2 pts.): Qualitatively describe any patterns you see in the spider plot.

Q5-6: Maximum Likelihood Fitting

  • Q5 (1 pt.): Use my code to create the kernel density plot of Inga cylindrica DBH. Try different values for the two Weibull parameters (shape and scale) to see if you can get a better fitting curve than my example. Include a plot of your best curve and list the values you used for the shape and scale parameters.

  • Q6 (2 pts.): Include a high-resolution contour plot of your likelihood surface. Use the code in the Maximum Likelihood Fit section as a template.