Learning Objectives

  • Calculating likelihoods
  • Estimating distribution parameters using maximum likelihood

Data Files

You’ll need bird.sta.csv and hab.sta.csv in your data subfolder.

What is likelihood?

Likelihood is a measure of the relative probability of observing a set of one or more observations given a proposed model and parameter values.

  • We’ll be calculating likelihoods using probability distributions and their parameter values in this activity.

Likelihood of a single observation is proportional to the value of the probabiltiy density/mass function.

Likelihood of multiple independent observations is proportional the product of individual likelihoods.

What is maximum likelihood?

Maximum likelihood (ML) is a criterion for selecting a model.

  • The likelihood of a collection of observations is measured as the sum of log-likelihoods of individual observations in a dataset.

  • ML selection procedures select the model parameter values that maximize the sum of log-likelihoods.

We’ve already met another selection, least squares.

  • Recall that in contrast to ML, in which we maximize our criterion, we minimize our criterion when we use least squares.

Calculating Likelihood

Likelihood is calculated from the probabiltiy density/mass functions.

  • Remember the d___() functions in R?

Likelihood of Two Observations

Let’s practice calculating likelihoods:

  • How can we calculate the likelihood for 2 observations?
  • Recall the general procedure for calculating likelihoods:
  1. Propose a model or distribution.
  2. Propose a set of model or distribution parameter values.
  3. Calculate the likelihood for each observation.
  4. Multiply the individual likelihoods to get the joint likelihood.
  • In practice, you calculate the sum of the log-likelihoods.

Here’s an example in R:

  • Let’s say I’ve been conducting counts of Wilson’s Warblers at study sites in Oregon.
  • I’ve been to 2 sites where I counted 2, and 6 birds.
x_observed = c(2, 6)
print(x_observed)
## [1] 2 6

I think I can model the population of birds at the study sites with a Poisson distribution.

  • I know that the Poisson distribution has a single parameter: \(\lambda\).
  • I also know that the mean and standard deviation of a Poisson distribution are equal to \(\lambda\)

Choosing a model:

  • I think the count of 2 is unusually low, so I decide to propose a Poisson distribution with \(\lambda = 4.5\) as a model of the counts of Wilson’s Warblers on my study sites.

  • I can use dpois() to calculate the probability mass for the two counts given a Poisson distribution with \(\lambda = 4.5\)

dpois(x = 2, lambda = 4.5)
## [1] 0.1124786
dpois(x = 6, lambda = 4.5)
## [1] 0.1281201

I know the likelihood of observing those particular counts together is the product of the individual likelihoods:

dpois(x = 2, lambda = 4.5) * dpois(x = 6, lambda = 4.5)
## [1] 0.01441077

I can take advantage of vectorization in R by storing the counts in a single vector:

wiwa_counts = c(2, 6)
dpois(x = wiwa_counts, lambda = 4.5)
## [1] 0.1124786 0.1281201

Product of Likelihoods

I can more easily calculate the product now:

prod(dpois(x = wiwa_counts, lambda = 4.5))
## [1] 0.01441077

Sum of Log-Likelihoods

And the sum of log-likelihoods:

sum(log(dpois(x = wiwa_counts, lambda = 4.5)))
## [1] -4.239779
  • How would you modify the code to calculate the likelihood using \(lambda = 4.2\)?

Likelihood of Many Observations

Now let’s say I want to find the value of \(\lambda\) that maximizes the likelihood for the counts of Wilson’s Warblers.

I first need to load the bird data:

dat_bird = read.csv(here::here("data", "bird.sta.csv"))
dat_habitat = read.csv(here::here("data", "hab.sta.csv"))
dat_all = merge(dat_bird, dat_habitat)

Numerical Data Exploration

Next I need to do some numerical and graphical data exploration.

I’ll start with a 5-number summary, and then plot a histogram.

The summary:

summary(dat_all$WIWA)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    1.00    2.00    1.67    2.00    6.00

Graphical Exploration: Histograms

Default histogram

Next, we’ll plot a histogram of census counts.

Here’s the histogram that R produces with default settings:

hist(dat_all$WIWA)

It’s not a pretty histogram.

Histogram with custom breaks

Let’s try setting the breaks argument to 7. This will suggest to R that it should create 7 bins (corresponding to observations between 0 and 7 wrens):

hist(dat_all$WIWA, breaks = 7)

NOTE: if you look closely, this histogram includes both the observations of zero and one in the first bin. We requested 7 bins, but it only plotted 6!

When you use a single number for the breaks argument, R interprets it as a suggestion, but it’s binning algorithm may decide to use a different number!

Histogram with custom breaks attempt 2

One more try: we need to un-group the zero and one counts so that we see them as distinct bars in the histogram. We used a single number for the breaks argument to tell to try to automatically figure out how to divide the counts into 7 bins.

You can use a vector for the breaks argument. R attempts to estimate bin breakpoints when breaks is a single number, but it will honor your input if you provide a vector of breakpoints:

hist(dat_all$WIWA, breaks = 0:7)

Still not exactly what we want. It turns out that for the first bin, R includes both endpoints in its count. In this case it’s counting all of the 0- and 1-observations together.

Histogram with custom breaks attempt 3: success!

We can trick R into only counting the lower endpoint if we cleverly manipulate the sequence that we give to bins.

0:7 - 0.5
## [1] -0.5  0.5  1.5  2.5  3.5  4.5  5.5  6.5

If we write code that subtracts a single value (a scalar in linear algebra lingo) from a vector, R will subtract the number from each element in the vector and return the output.

hist(dat_all$WIWA, breaks = 0:7 - .5)

This works because the data were discrete counts. It also looks nicer because the bars are now centered over the census counts.

  • This trick doesn’t work as well for continuous data.

Histograms with (discrete) count data

If we wanted to use code like this with data for which we didn’t know the maximum value ahead of time we could write:

par(mfrow = c(1, 2))
dat = dat_all$WIWA
hist(dat, breaks = 0:(max(dat) + 1) - 0.5, main = "Histogram of\nWilson's Warbler counts")

dat = dat_all$GRJA
hist(dat, breaks = 0:(max(dat) + 1) - 0.5, main = "Histogram of\nGray Jay counts")

Wilson’s Warbler Sum of Log-Likelihoods

I’ll try a Poisson distribution with lambda = 1.0:

sum(log(dpois(x = dat_all$WIWA, lambda = 1.0)))
## [1] -1739.857

Questions

Q1-2

Consider the likelihood calculation example for 2 observations of Wilson’s Warblers using a Poisson distribution model.

  • Recall the R-code I used in the walkthrough to find the log-likelihood of using \(\lambda = 4.5\):
wiwa_counts = c(2, 6)
dpois(x = wiwa_counts, lambda = 4.5)
## [1] 0.1124786 0.1281201
  • As a group, experiment with different values of \(\lambda\) to find a value that maximizes the sum of log-likelihoods of the hypothetical observations.
    • You only need to consider 2 significant figures!
  • Q1 (1 pt.): What value for \(\lambda\) did you select?
  • Q2 (1 pt.): How did you choose a value?

Q3-5

In the walkthrough, I tried using a Poisson distribution with \(\lambda = 1.0\) to model the set of census counts of Wilson’s Warbler.

You will create your own models for the census count of Winter Wrens.

  • Find the \(\lambda\) value of a Poisson model that makes all of the observed the Winter Wren’s census counts most likely.

  • Plot a histogram of Winter Wren counts (Check the metadata file to find the abbreviation for Winter Wrens). Make sure you choose an appropriate number of bins for the plot.

  • As a group, you’ll be searching for the value of \(\lambda\) that maximizes the sum of log-likelihood. In other words, you’ll find the the model parameter that makes the values of observed wren census counts most likely.

    • Hint: you only need to consider 2 significant figures in your value of \(\lambda\).
  • Q3 (1 pt.): Include your winter wren histogram in your report.
  • Q4 (1 pt.): What value for \(\lambda\) did you select?
  • Q5 (2 pts.): Show the R code you used to calculate the Poisson log-likelihood for the vector of Winter Wren census counts (using your chosen \(\lambda\).

Q6-9

In the walkthrough, I tried using a Poisson distribution with \(\lambda = 1.0\) to model the set of census counts of Wilson’s Warblers.

You will create models for the census count of Winter Wrens.

  • Suppose you wanted to use a binomial distribution instead of a Poisson distribution.
  • Find values of the parameters for a binomial distribution that make the vector of census counts most likely.
  • Q6 (1 pt.): What are the two parameters for a binomial distribution and what do they represent?
  • Q7 (2 pts.): What were the parameter values you selected?
  • Q8 (1 pt.): How did you choose a value for \(n\)?
  • Q9 (2 pts.): Show R code you used to calculate the binomial log-likelihood for the vector of Winter Wren census counts.

Q10-11

You explored the use of two discrete parametric distributions to model the Winter Wren census counts.

  • Which one was better?
  • Q10 (1 pt.): Considering a Maximum Likelihood criterion, which model better fit the data?
  • Q11 (2 pts.): Considering what you know about the Binomial and Poisson distributions, which model is more appropriate for census count data?

Q12-14

Consider a vector of random, normally-distributed numbers:

set.seed(1)
vec_rnorm = rnorm(n = 10, mean = 0, sd = 1)

We know that the numbers came from a normally-distributed population with mean 0 and standard deviation 1.

How could I calculate the log-likelihood that the observed numbers in the vector came from a standard normal distribution?

  • Q12 (1 pt.): Create the vector (make sure you set seed to 1) and calculate the log likelihood that it came from a standard normal distribution.
  • Q13 (2 pts.): Show the R code you used to make the calculation
  • Q14 (2 pts.): Use the guess and check method to find the maximum likelihood optimal values for mean and standard deviation (to 2 significant figures) of the numbers in the vector.

Q15-16

Load the Palmer penguins dataset.

Assume the data for flipper length are normally-distributed (they are not).

Determine maximum likelihood estimates for mean and standard deviation of a normal distribution that best describes the flipper lengths.

  • Q15 (1 pt.): How did you choose your starting guess values for mean and standard deviation?
  • Q16 (2 pts.): Using the guess and check method, what were the maximum likelihood estimates of mean and standard deviation of flipper length (to the nearest whole number).