R has built-in functions for calculating the probabilities of events for many parametric distributions.
The d-functions (dbinom, dnorm, dpois, etc.) calculate the probability density (or mass) of a single event. For a discrete distribution like the binomial, the probability mass of an event is the probability that the event happens.
Each of the d-functions has different arguments, depending on the specific parameters of the distribution in question.
For example, recall the Poisson distribution, which has one parameter: lambda.
Using dpois, I can ask what the probability is that I observe a count of exactly 7 if I have a Poisson-distributed population with lambda = 10.4:
dpois(x = 7, lambda = 10.4)
## [1] 0.07945848
It turns out I have about an 8% chance to observe such an event.
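The d-functions are vectorized, so one call can return the probability mass for several counts at once. The sketch below reuses lambda = 10.4. The counts 5 to 9 and the upper cutoff of 100 are arbitrary choices for illustration.

```r
# Probability mass for each count from 5 through 9 in a single call
counts = 5:9
p_counts = dpois(x = counts, lambda = 10.4)
p_counts

# A PMF gives true probabilities, so the masses over all possible counts
# add up to 1. For lambda = 10.4, the counts 0:100 cover essentially all
# of the mass, so this sum should be 1 up to rounding.
sum(dpois(x = 0:100, lambda = 10.4))
```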
Recall the binomial distribution and its two parameters: size (the number of trials) and prob (the probability of success on each trial).
For example, suppose I had a set of six forest plots, and I knew from earlier observations that I observe birds in about 2/3 of the plots when I do a survey (4 presences out of 6 sites). I could then use a binomial distribution to model my set of plots.
dbinom
Now, check out the help entry for dbinom. Once I’ve worked out the parameters for my binomial distribution, I can use dbinom to calculate the probabilities of observing different numbers of bird presences.
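Here is a quick check on how dbinom() reads its arguments. It uses a deliberately different, hypothetical example rather than the forest plots: three fair coin flips, so size = 3 and prob = 0.5.

```r
# Hypothetical example: 3 fair coin flips (size = 3, prob = 0.5).
# Each value is the probability of getting exactly that many heads.
heads = 0:3
p_heads = dbinom(x = heads, size = 3, prob = 0.5)
p_heads
## [1] 0.125 0.375 0.375 0.125

# One of these outcomes must happen, so the probabilities sum to 1
sum(p_heads)
```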
Recall that for discrete distributions, the height of the probability mass function is the probability of a specific event. For continuous distributions, the height of the probability density function is only a measure of relative likelihood.
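To see why a density is not a probability, consider a narrow normal distribution. This sketch uses sd = 0.1, a value chosen only for illustration. The height of its PDF at the mean is greater than 1, yet the total area under the curve is still 1.

```r
# Height of the PDF at the mean: larger than 1, so it cannot be a probability
dnorm(0, mean = 0, sd = 0.1)
## [1] 3.989423

# Area under the PDF between -1 and 1 (10 standard deviations on each side),
# which contains essentially all of the probability
area_narrow = integrate(dnorm, lower = -1, upper = 1, mean = 0, sd = 0.1)$value
area_narrow
```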
dnorm()
Keeping this in mind, I can use dnorm() to ask whether I am more likely to observe a value of 0.5 or a value of 1.0 from a standard normal distribution:
# Standard normal has mean = 0 and sd = 1
dnorm(0.5, mean = 0, sd = 1)
## [1] 0.3520653
dnorm(1, mean = 0, sd = 1)
## [1] 0.2419707
Which was more likely?
pnorm()
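The function pnorm() calculates the cumulative probability of a normal distribution, which is the value of its cumulative distribution function (CDF): the probability of observing a value less than or equal to a given value.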
For example, if I wanted to know the probability of observing a value of 0.5 or less from a standard normal distribution, I could use the following R code:
pnorm(0.5, mean = 0, sd = 1)
## [1] 0.6914625
Would you expect the probability of observing a value of 1.0 or less to be smaller or larger than this quantity?
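The link between the d- and p-functions can be checked numerically: pnorm() should equal the area under the dnorm() curve to the left of the value. This sketch uses base R's integrate(), and the name area_below is only illustrative.

```r
# Area under the standard normal PDF from -Inf up to 0.5
area_below = integrate(dnorm, lower = -Inf, upper = 0.5, mean = 0, sd = 1)$value
area_below

# Should agree with the cumulative probability, up to numerical error
pnorm(0.5, mean = 0, sd = 1)
```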
One of the best ways to understand probability functions is to practice plotting them. You’ve already seen lots of plots in the notes, but now you’ll make your own.
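The general procedure for plotting a probability density function is:
1. Decide how many points to use.
2. Create a vector of x-values that spans the range you want to plot.
3. Use a d-function to calculate the corresponding y-values.
4. Plot y against x as a line.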
Here’s an example using the standard normal distribution:
# How many points?
n = 13
# Create a vector of x-values from -6 to 6:
x = seq(from = -6, to = 6, length.out = n)
# Create the corresponding y-values:
y = dnorm(x, mean = 0, sd = 1)
# plot!
plot(y ~ x, type = "l")
That doesn’t look very good. Why not?
Let’s try a higher number of points:
# How many points?
n = 1000
# Create a vector of x-values from -6 to 6:
x = seq(from = -6, to = 6, length.out = n)
# Create the corresponding y-values:
y = dnorm(x, mean = 0, sd = 1)
# plot!
plot(y ~ x, type = "l", ylab = "Probability Density")
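For a quick picture, base R's curve() function combines the x, y, and plot steps into a single call. It evaluates an expression in x over a range and draws the line, and n controls how many points it uses (the default is 101). This is an optional shortcut, and the result name pdf_curve is only illustrative.

```r
# curve() evaluates the expression at n points between `from` and `to`,
# draws the result as a line, and invisibly returns the x and y values.
pdf_curve = curve(dnorm(x, mean = 0, sd = 1), from = -6, to = 6, n = 1000,
                  ylab = "Probability Density")
```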
Now fill in the code below to calculate a normal distribution with a mean of 0 and a standard deviation of 2. Use the same x-values, because you are going to plot both curves on the same figure.
y_2 = dnorm(x, mean = ?, sd = ?)
When you’re successful, you can use the following code to create the plot:
plot(y ~ x, type = "l", ylab = "Probability Density")
points(y_2 ~ x, type = "l", lty = 2)
Finally, I’d like you to create a third Normal distribution with a standard deviation of 1 and a mean of -2:
The cumulative distribution function (CDF) is closely related to the probability density function. Recall that we use the p-functions, like pnorm(), for cumulative probabilities. We can plot the CDF for the standard normal:
y_cdf_1 = pnorm(x, mean = 0, sd = 1)
plot(y_cdf_1 ~ x, type = "l", ylab = "Cumulative Probability")
We can add the CDF curve for the normal distribution with sd = 2:
y_cdf_2 = pnorm(x, mean = 0, sd = 2)
plot(y_cdf_1 ~ x, type = "l", ylab = "Cumulative Probability")
points(y_cdf_2 ~ x, type = "l", lty = 2)
I’ll let you add the curve for the normal with mean = -2 and sd = 1.
Your plot should look like this:
For your submission, use par(mfrow = c(1, 2)) to make a side-by-side plot:
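Here is a sketch of the side-by-side layout. It uses only the standard normal PDF and CDF as placeholders for your own curves. When you change a setting, par() returns the old value, so saving that value lets you restore the one-panel layout afterwards.

```r
# Same x-values as above
x = seq(from = -6, to = 6, length.out = 1000)
y_pdf = dnorm(x, mean = 0, sd = 1)
y_cdf = pnorm(x, mean = 0, sd = 1)

# Switch to 1 row and 2 columns, saving the previous settings
old_par = par(mfrow = c(1, 2))

# Left panel: PDF
plot(y_pdf ~ x, type = "l", ylab = "Probability Density", main = "PDF")
# Right panel: CDF
plot(y_cdf ~ x, type = "l", ylab = "Cumulative Probability", main = "CDF")

# Restore the previous layout
par(old_par)
```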
Recall that the height of the curve for a PDF (for a continuous distribution) gives a relative measure of likelihood, while the height of the curve for a PMF (for a discrete distribution) gives the true probability.
For example, I can create a bar plot of the PMF for a binomial distribution with n = 5 and p = 0.4:
x_bin = 0:5
y_bin_2 = dbinom(x_bin, size = 5, prob = 0.4)
barplot(
height = y_bin_2,
# the names to print with each bar:
names.arg = x_bin,
# Tells R to remove space between bars:
space = 0,
ylab = "Pr(x)",
main = "Binomial: n = 5, p = 0.4")
To get a feel for how the prob parameter (the probability of success) affects the shape, try plotting with different values.
For example, I can change p to 0.5.
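Here is a sketch of that change, which reuses the barplot() call above with prob = 0.5. The variable name y_bin_5 is only an illustrative choice. With p = 0.5, the PMF is symmetric around 2.5.

```r
x_bin = 0:5
# Same number of trials as before, but a 50% chance of success
y_bin_5 = dbinom(x_bin, size = 5, prob = 0.5)

barplot(
  height = y_bin_5,
  names.arg = x_bin,
  space = 0,
  ylab = "Pr(x)",
  main = "Binomial: n = 5, p = 0.5")
```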
Try out some different values for n and p to get a feel for how the PMF shape changes.
Now it’s your turn. Create a bar plot of a binomial PMF using:
Your plot should look like:
Recall the hypothetical forest plots:
Q1 (1 pt.): If I wanted to use a binomial distribution to model my six forest plots, what values should I use for the two parameters of a binomial distribution?
Q2 (1 pt.): Use dbinom to calculate the probability of observing birds in exactly four of the six plots. Include your R code in your answer.
Q3 (1 pt.): Now, suppose I did a survey and observed no birds in my plots. Use dbinom to calculate the probability of observing no presences.
The d-functions calculate the probability density or mass of observing a specific event.
I can use the p-functions to calculate the probabilities of observing a range of events.
For example, I can use ppois to ask what the probability is that I observe a count of 7 or fewer if I have a Poisson-distributed population with lambda = 10.4:
ppois(q = 7, lambda = 10.4)
## [1] 0.1863271
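Because "7 or fewer" means any of the counts 0 through 7, the cumulative probability is the sum of the individual probability masses. The quick check below uses p_sum, which is only an illustrative name.

```r
# Add up Pr(count = 0), Pr(count = 1), ..., Pr(count = 7)
p_sum = sum(dpois(x = 0:7, lambda = 10.4))
p_sum

# Should match the cumulative probability from ppois()
ppois(q = 7, lambda = 10.4)
```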
Law of Total Probability and Complementary Events
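"7 or fewer" and "greater than 7" are complementary events, so their probabilities add up to 1. I can therefore use ppois and the law of total probability to calculate the probability of observing a count greater than 7: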
1 - ppois(q = 7, lambda = 10.4)
## [1] 0.8136729
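The p-functions can also return the upper tail directly through their lower.tail argument. With lower.tail = FALSE, ppois() gives Pr(count > q), which should match the subtraction above.

```r
# Upper tail directly: Pr(count > 7)
p_upper = ppois(q = 7, lambda = 10.4, lower.tail = FALSE)
p_upper
```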
Use pbinom to calculate the probability of observing four or fewer presences in the 6 plots. Show your R code.
Use pbinom and the law of total probability to calculate the probability of observing four or more presences in the 6 plots. Show your R code. Hint: this is not the complementary event of observing four or fewer!
For these questions, you’ll consider a standard normal distribution.
Use the following plot to help:
Create the required Normal plots.
As a reminder, your Normal plot should look similar to:
Create the required Binomial plot.
Your Binomial plot should look similar to:
Upload your group’s answers to the questions to Moodle.