You’ll need bird.sta.csv and hab.sta.csv in your data subfolder.
Likelihood is a measure of the relative probability of observing a set of one or more observations given a proposed model and parameter values.
The likelihood of a single observation is proportional to the value of the probability density/mass function.
The likelihood of multiple independent observations is proportional to the product of the individual likelihoods.
Maximum likelihood (ML) is a criterion for selecting a model.
The log-likelihood of a collection of observations is measured as the sum of the log-likelihoods of the individual observations in a dataset.
ML selection procedures select the model parameter values that maximize the sum of log-likelihoods.
We’ve already met another selection criterion: least squares.
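The ML idea can be sketched in a few lines of R: evaluate the sum of log-likelihoods over a grid of candidate parameter values and keep the maximizer. The counts and the candidate grid below are made up for illustration.

```r
# A minimal sketch of ML parameter selection for a Poisson model.
x = c(2, 6)
lambda_candidates = seq(0.1, 10, by = 0.1)
log_liks = sapply(
  lambda_candidates,
  function(l) sum(dpois(x, lambda = l, log = TRUE)))

# The grid value with the highest summed log-likelihood:
lambda_candidates[which.max(log_liks)]

## [1] 4
```

For a Poisson model the maximum falls at the sample mean of the observations, which is why the search lands on 4.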
Likelihood is calculated from the probability density/mass functions: the d___() functions in R. Let’s practice calculating likelihoods.
Here’s an example in R:
x_observed = c(2, 6)
print(x_observed)
## [1] 2 6
I think I can model the population of birds at the study sites with a Poisson distribution.
Choosing a model:
I think the count of 2 is unusually low, so I decide to propose a Poisson distribution with \(\lambda = 4.5\) as a model of the counts of Wilson’s Warblers on my study sites.
I can use dpois() to calculate the probability mass for the two counts, given a Poisson distribution with \(\lambda = 4.5\):
dpois(x = 2, lambda = 4.5)
## [1] 0.1124786
dpois(x = 6, lambda = 4.5)
## [1] 0.1281201
I know the likelihood of observing those particular counts together is the product of the individual likelihoods:
dpois(x = 2, lambda = 4.5) * dpois(x = 6, lambda = 4.5)
## [1] 0.01441077
I can take advantage of vectorization in R by storing the counts in a single vector:
wiwa_counts = c(2, 6)
dpois(x = wiwa_counts, lambda = 4.5)
## [1] 0.1124786 0.1281201
I can more easily calculate the product now:
prod(dpois(x = wiwa_counts, lambda = 4.5))
## [1] 0.01441077
And the sum of log-likelihoods:
sum(log(dpois(x = wiwa_counts, lambda = 4.5)))
## [1] -4.239779
Now let’s say I want to find the value of \(\lambda\) that maximizes the likelihood for the counts of Wilson’s Warblers.
I first need to load the bird data:
dat_bird = read.csv(here::here("data", "bird.sta.csv"))
dat_habitat = read.csv(here::here("data", "hab.sta.csv"))
dat_all = merge(dat_bird, dat_habitat)
Next I need to do some numerical and graphical data exploration.
I’ll start with a 5-number summary, and then plot a histogram.
The summary:
summary(dat_all$WIWA)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 1.00 2.00 1.67 2.00 6.00
Next, we’ll plot a histogram of census counts.
Here’s the histogram that R produces with default settings:
hist(dat_all$WIWA)
It’s not a pretty histogram.
Let’s try setting the breaks argument to 7. This suggests to R that it should create 7 bins (corresponding to counts between 0 and 7 warblers):
hist(dat_all$WIWA, breaks = 7)
NOTE: if you look closely, this histogram includes both the observations of zero and one in the first bin. We requested 7 bins, but it only plotted 6!
When you use a single number for the breaks argument, R interprets it as a suggestion, but its binning algorithm may decide to use a different number!
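We can see this suggestion behavior without drawing a plot at all, using the plot = FALSE argument of hist(). The simulated counts below are just a placeholder for the census data:

```r
# Inspect the breakpoints hist() actually chooses when breaks is a
# single number. (Simulated placeholder counts, not the census data.)
set.seed(1)
x = rpois(n = 100, lambda = 2)

h = hist(x, breaks = 7, plot = FALSE)
h$breaks              # R picked its own "pretty" breakpoints
length(h$breaks) - 1  # the actual number of bins may not be 7
```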
One more try: we need to un-group the zero and one counts so that we see them as distinct bars in the histogram. We used a single number for the breaks argument to tell R to try to automatically figure out how to divide the counts into 7 bins.
You can use a vector for the breaks argument. R attempts to estimate bin breakpoints when breaks is a single number, but it will honor your input if you provide a vector of breakpoints:
hist(dat_all$WIWA, breaks = 0:7)
Still not exactly what we want. It turns out that for the first bin, R includes both endpoints in its count. In this case it’s counting all of the 0- and 1-observations together.
We can trick R into only counting the lower endpoint if we cleverly manipulate the sequence that we give to breaks.
0:7 - 0.5
## [1] -0.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5
If we write code that subtracts a single value (a scalar in linear algebra lingo) from a vector, R will subtract the number from each element in the vector and return the output.
hist(dat_all$WIWA, breaks = 0:7 - .5)
This works because the data were discrete counts. It also looks nicer because the bars are now centered over the census counts.
If we wanted to use code like this with data for which we didn’t know the maximum value ahead of time we could write:
par(mfrow = c(1, 2))
dat = dat_all$WIWA
hist(dat, breaks = 0:(max(dat) + 1) - 0.5, main = "Histogram of\nWilson's Warbler counts")
dat = dat_all$GRJA
hist(dat, breaks = 0:(max(dat) + 1) - 0.5, main = "Histogram of\nGray Jay counts")
Now, back to the likelihood question. I’ll try modeling the full set of Wilson’s Warbler counts with a Poisson distribution with lambda = 1.0:
sum(log(dpois(x = dat_all$WIWA, lambda = 1.0)))
## [1] -1739.857
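Rather than trying lambda values one at a time, we can let R search for the maximizer with optimize(). This is a sketch: I simulate counts here so it runs standalone; you would substitute dat_all$WIWA for the real census data.

```r
# Sketch: use optimize() to find the lambda that maximizes the sum of
# log-likelihoods. The simulated counts stand in for dat_all$WIWA.
set.seed(42)
counts = rpois(n = 100, lambda = 1.7)

opt = optimize(
  f = function(l) sum(dpois(counts, lambda = l, log = TRUE)),
  interval = c(0.01, 10),
  maximum = TRUE)

opt$maximum  # for a Poisson model, this should be close to mean(counts)
```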
Consider the likelihood calculation example for 2 observations of Wilson’s Warblers using a Poisson distribution model.
wiwa_counts = c(2, 6)
dpois(x = wiwa_counts, lambda = 4.5)
## [1] 0.1124786 0.1281201
In the walkthrough, I tried using a Poisson distribution with \(\lambda = 1.0\) to model the set of census counts of Wilson’s Warbler.
You will create your own models for the census count of Winter Wrens.
Find the \(\lambda\) value of a Poisson model that makes all of the observed Winter Wren census counts most likely.
Plot a histogram of Winter Wren counts (Check the metadata file to find the abbreviation for Winter Wrens). Make sure you choose an appropriate number of bins for the plot.
As a group, you’ll be searching for the value of \(\lambda\) that maximizes the sum of log-likelihoods. In other words, you’ll find the model parameter that makes the values of the observed wren census counts most likely.
In the walkthrough, I tried using a Poisson distribution with \(\lambda = 1.0\) to model the set of census counts of Wilson’s Warblers.
You will create models for the census count of Winter Wrens.
You explored the use of two discrete parametric distributions to model the Winter Wren census counts.
Consider a vector of random, normally-distributed numbers:
set.seed(1)
vec_rnorm = rnorm(n = 10, mean = 0, sd = 1)
We know that the numbers came from a normally-distributed population with mean 0 and standard deviation 1.
How could I calculate the log-likelihood that the observed numbers in the vector came from a standard normal distribution?
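One way is with dnorm(), the normal density function. Note that the d___() functions accept a log argument, which is numerically safer than wrapping the densities in log() afterwards:

```r
set.seed(1)
vec_rnorm = rnorm(n = 10, mean = 0, sd = 1)

# Sum of log-likelihoods under a standard normal model:
sum(log(dnorm(vec_rnorm, mean = 0, sd = 1)))

# Equivalent, using the built-in log argument:
sum(dnorm(vec_rnorm, mean = 0, sd = 1, log = TRUE))
```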
Load the Palmer penguins dataset.
Assume the data for flipper length are normally-distributed (they are not).
Determine maximum likelihood estimates for mean and standard deviation of a normal distribution that best describes the flipper lengths.
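Here is a sketch of one approach. For a normal model, the ML estimates have closed forms: the sample mean, and a standard deviation computed with an n (not n − 1) denominator. I use simulated numbers so the sketch is self-contained; for the exercise you would substitute penguins$flipper_length_mm (from the palmerpenguins package), with NA values removed first.

```r
# Sketch: ML estimates for a normal model, checked numerically.
# Simulated stand-in for the flipper length data.
set.seed(123)
x = rnorm(n = 200, mean = 190, sd = 14)
n = length(x)

mle_mean = mean(x)                        # ML estimate of the mean
mle_sd = sqrt(sum((x - mle_mean)^2) / n)  # ML sd uses n, not n - 1
c(mle_mean, mle_sd)

# Check numerically: optim() minimizing the negative log-likelihood
# should find nearly the same values.
nll = function(p) -sum(dnorm(x, mean = p[1], sd = p[2], log = TRUE))
optim(par = c(150, 10), fn = nll)$par
```

Note that R’s sd() divides by n − 1, so the ML estimate of the standard deviation is slightly smaller than sd(x).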