Objectives

  • Practice calculating probabilities using R’s distribution functions.

R’s probability functions

R has built-in functions for calculating the probabilities of events for many parametric distributions.

The d-functions (dbinom, dnorm, dpois, etc.) calculate the probability density (or mass) of a single event. For a discrete distribution like the binomial, the probability mass of an event is the probability that the event happens.

Each of the d-functions has different arguments, depending on the specific parameters of the distribution in question.
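You can check which arguments a d-function expects with args() or with its help page. This is a minimal sketch that compares three of them:

```r
# Print each function's argument list; the parameter names differ by distribution
args(dpois)  # x, lambda, log
args(dbinom) # x, size, prob, log
args(dnorm)  # x, mean, sd, log
```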

For example, recall the Poisson distribution, which has one parameter: lambda (the mean count).

Using dpois(), I can ask for the probability of observing a count of exactly 7 in a Poisson-distributed population with lambda = 10.4:

dpois(x = 7, lambda = 10.4)
## [1] 0.07945848

It turns out I have about an 8% chance to observe such an event.

  • What’s the probability that I observe exactly 8?
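dpois() returns the probability of each individual count, so the probabilities of all possible counts should add up to 1. The sketch below checks this. It assumes that stopping at a count of 100 is enough, because the Poisson support is infinite but counts that large are vanishingly unlikely with lambda = 10.4:

```r
# Probability of each count from 0 to 100 under Poisson(lambda = 10.4)
probs = dpois(x = 0:100, lambda = 10.4)

# The masses should sum to (almost exactly) 1
sum(probs)
```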

Binomial Probabilities

Recall the binomial distribution and its two parameters:

  1. n: the number of trials. R calls this size.
  2. p: the probability of success on each trial. R calls this prob.
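Naming the arguments makes it clear which parameter is which. Here is a small sketch that has nothing to do with the bird plots: the probability of exactly 2 heads in 3 fair coin tosses.

```r
# 3 trials (size), success probability 0.5 (prob), exactly 2 successes (x)
p_two_heads = dbinom(x = 2, size = 3, prob = 0.5)
p_two_heads
## [1] 0.375
```

The value 0.375 is 3/8: there are 3 arrangements of two heads, and each has probability (1/2)^3.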

For example, suppose I had a set of six forest plots, and I knew from earlier surveys that I observe birds in about 2/3 of the plots (4 presences out of 6 sites). I could use a binomial distribution to model my set of plots.

Using dbinom

Now, check out the help entry for dbinom (run ?dbinom in the console). Once I know the parameters for my binomial distribution, I can use dbinom to calculate the probabilities of observing different numbers of bird presences.
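Like the other d-functions, dbinom() is vectorized. If you pass it a vector of counts, it returns one probability per count. This sketch uses hypothetical parameters (10 plots, prob = 0.3), not the bird survey values:

```r
# Probabilities of observing 0, 1, ..., 10 presences
presences = 0:10
probs = dbinom(presences, size = 10, prob = 0.3)
names(probs) = presences
round(probs, 3)

# The count with the highest probability
presences[which.max(probs)]
```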

Normal Probabilities

Recall that for discrete distributions, the height of the probability mass function is the probability of a specific event. For continuous distributions, the height of the probability density function is only a measure of relative likelihood.
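A density can be larger than 1, which shows that it is not a probability. This sketch uses a narrow normal distribution, with sd = 0.1 chosen only for illustration:

```r
# Height of the normal curve at its mean when sd = 0.1
dnorm(0, mean = 0, sd = 0.1)
# about 3.99: larger than 1, so it cannot be a probability
```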

Using dnorm()

With this in mind, I can use dnorm() to ask whether I am more likely to observe a value of 0.5 or 1.0 from a standard normal distribution:

# Standard normal has mean = 0 and sd = 1
dnorm(0.5, mean = 0, sd = 1)
## [1] 0.3520653
dnorm(1, mean = 0, sd = 1)
## [1] 0.2419707

Which was more likely?

Using pnorm()

The function pnorm() calculates the cumulative probability of a normal distribution: the probability of observing a value less than or equal to a given value.

For example, if I wanted to know the probability of observing a value of 0.5 or less from a standard normal distribution, I could use the following R code:

pnorm(0.5, mean = 0, sd = 1)
## [1] 0.6914625

Would you expect the probability of observing a value of 1.0 or less to be smaller or larger than this quantity?
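pnorm() is called cumulative because its value equals the area under the dnorm() curve to the left of the given point. This sketch checks that numerically with base R's integrate():

```r
# Area under the standard normal density from -Inf up to 0.5
area = integrate(dnorm, lower = -Inf, upper = 0.5)$value

# Should match pnorm(0.5) up to numerical integration error
area
pnorm(0.5)
all.equal(area, pnorm(0.5), tolerance = 1e-6)
```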

Plotting Probability Functions

One of the best ways to understand probability functions is to practice plotting them. You’ve already seen lots of plots in the notes, but now you’ll make your own.

Probability Density Plot

The general procedure for plotting a probability density function is:

  • Decide on the range of x-values you want to plot.
  • Create a vector of x-values.
  • Create a vector of the corresponding y-values using a d-function.
  • Plot!

Here’s an example using the standard normal distribution:

# How many points?
n = 13

# Create a vector of x-values from -6 to 6:
x = seq(from = -6, to = 6, length.out = n)

# Create the corresponding y-values:
y = dnorm(x, mean = 0, sd = 1)

# plot!
plot(y ~ x, type = "l")

That doesn’t look very good. Why not?

Let’s try a higher number of points:

# How many points?
n = 1000

# Create a vector of x-values from -6 to 6:
x = seq(from = -6, to = 6, length.out = n)

# Create the corresponding y-values:
y = dnorm(x, mean = 0, sd = 1)

# plot!
plot(y ~ x, type = "l", ylab = "Probability Density")
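Base R's curve() function combines the x-vector, y-vector, and plotting steps into one call. This sketch draws the same standard normal curve, so you can use it to check the step-by-step version:

```r
# Evaluate dnorm at 1000 points from -6 to 6 and draw the line
res = curve(dnorm(x, mean = 0, sd = 1), from = -6, to = 6, n = 1000,
            ylab = "Probability Density")

# curve() invisibly returns the x- and y-values it plotted
length(res$x)
```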

Now I want you to fill in the code below to plot a normal distribution with a mean of 0 and a standard deviation of 2.
Use the same x-values because you are going to plot it on the same figure.

y_2 = dnorm(x, mean = ?, sd = ?)

When you’re successful, you can use the following code to create the plot:

plot(y ~ x, type = "l", ylab = "Probability Density")
points(y_2 ~ x, type = "l", lty = 2)

Finally, I’d like you to create a third normal distribution with a mean of -2 and a standard deviation of 1, and add it to the same plot.

Cumulative Density Plot

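The cumulative distribution function (CDF, sometimes loosely called the cumulative density) is closely related to the probability density function: its value at x is the area under the density curve to the left of x. Recall that we use the p-functions, such as pnorm(), to compute cumulative probabilities. We can plot the CDF for the standard normal: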

y_cdf_1 = pnorm(x, mean = 0, sd = 1)
plot(y_cdf_1 ~ x, type = "l", ylab = "Cumulative Density")

We can add the CDF curve for the normal distribution with sd = 2:

y_cdf_2 = pnorm(x, mean = 0, sd = 2)
plot(y_cdf_1 ~ x, type = "l", ylab = "Cumulative Density")
points(y_cdf_2 ~ x, type = "l", lty = 2)

I’ll let you add the curve for the normal with mean = -2 and sd = 1.

Your plot should look like this:

For your submission, use par(mfrow = c(1, 2)) to place the probability density plot and the cumulative density plot side by side.
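This is a minimal sketch of the par(mfrow) pattern. It shows only the standard normal PDF and CDF, so your submission still needs the additional curves from the exercises. Saving and then restoring the old settings keeps later plots in a single panel:

```r
x = seq(from = -6, to = 6, length.out = 1000)
y_pdf = dnorm(x, mean = 0, sd = 1)
y_cdf = pnorm(x, mean = 0, sd = 1)

# Split the device into 1 row x 2 columns; par() returns the old settings
old_par = par(mfrow = c(1, 2))

# Left panel: PDF
plot(y_pdf ~ x, type = "l", ylab = "Probability Density")

# Right panel: CDF
plot(y_cdf ~ x, type = "l", ylab = "Cumulative Density")

# Restore the previous layout
par(old_par)
```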

Binomial Mass Plot

Recall that the height of the curve for a PDF (for a continuous distribution) gives a relative measure of likelihood, while the height of the curve for a PMF (for a discrete distribution) gives the true probability.

For example, I can create a bar plot of a PMF for a binomial distribution with n = 5 and p = 0.4:

x_bin = 0:5
y_bin_2 = dbinom(x_bin, size = 5, prob = 0.4)

barplot(
  height = y_bin_2,
  # the names to print with each bar:
  names.arg = x_bin,
  # Tells R to remove space between bars:
  space = 0,
  ylab = "Pr(x)",
  main = "Binomial: n = 5, p = 0.4")

  • What is the most likely value?
  • How likely are you to observe a value of 5?

To get a feel for how the probability of success (prob) affects the shape, try plotting with different values.

For example, I can change p to 0.5.
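The code is the same barplot() call with prob = 0.5. Only the prob argument, the title, and the vector name change; y_bin_5 is just an illustrative name:

```r
x_bin = 0:5
y_bin_5 = dbinom(x_bin, size = 5, prob = 0.5)

barplot(
  height = y_bin_5,
  names.arg = x_bin,
  space = 0,
  ylab = "Pr(x)",
  main = "Binomial: n = 5, p = 0.5")
```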

  • Now what’s the most likely value?

Try out some different values for n and p to get a feel for how the PMF shape changes.
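To compare several shapes at once, you can loop over candidate values of p and draw one panel per value. This is a sketch; n = 10 and p in {0.2, 0.5, 0.8} are arbitrary choices:

```r
n_trials = 10
p_values = c(0.2, 0.5, 0.8)
x_vals = 0:n_trials

# One row of panels, one per value of p
old_par = par(mfrow = c(1, length(p_values)))
for (p in p_values) {
  barplot(
    height = dbinom(x_vals, size = n_trials, prob = p),
    names.arg = x_vals,
    space = 0,
    ylab = "Pr(x)",
    main = paste0("n = ", n_trials, ", p = ", p))
}

# Restore the previous layout
par(old_par)
```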

Now it’s your turn. Create a bar plot of a binomial PMF using:

  • n = 6
  • p = 2/3

Your plot should look like:

Questions

Binomial Probabilities: Q1-3

Recall the hypothetical forest plots:

  • There are six plots.
  • I usually observe birds in about 2/3 of the plots.
  • Q1 (1 pt.): If I wanted to use a binomial distribution to model my six forest plots, what values should I use for the two parameters of a binomial distribution?

  • Q2 (1 pt.): Use dbinom to calculate the probability of observing birds in exactly four of the six plots. Include your R code in your answer.

  • Q3 (1 pt.): Now, suppose I did a survey and observed no birds in my plots. Use dbinom to calculate the probability of observing no presences.

Binomial cumulative probability: p-functions: Q4-5

The d-functions calculate the probability density or mass of observing a specific event.

The p-functions calculate cumulative probabilities: the probability of observing a value less than or equal to a given value, which covers a whole range of events.

For example, I can use ppois to ask for the probability of observing a count of 7 or fewer in a Poisson-distributed population with lambda = 10.4:

ppois(q = 7, lambda = 10.4)
## [1] 0.1863271
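A cumulative probability is a sum of individual probabilities. The probability of 7 or fewer adds up the masses for the counts 0, 1, ..., 7, as this quick check shows:

```r
# Add up the probabilities of each count from 0 to 7
p_sum = sum(dpois(x = 0:7, lambda = 10.4))

# Compare with the p-function
p_cum = ppois(q = 7, lambda = 10.4)
all.equal(p_sum, p_cum)
```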

Law of Total Probability and Complementary Events

I can also use ppois() with the law of total probability to find the probability of observing a count greater than 7. Because "greater than 7" is the complement of "7 or fewer", the two probabilities sum to 1:

1 - ppois(q = 7, lambda = 10.4)
## [1] 0.8136729
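The p-functions also have a lower.tail argument. Setting lower.tail = FALSE returns the upper tail, the probability of a value strictly greater than q. That result should agree with the subtraction above:

```r
# Probability of a count strictly greater than 7
upper = ppois(q = 7, lambda = 10.4, lower.tail = FALSE)
upper

# Same value via the complement
all.equal(upper, 1 - ppois(q = 7, lambda = 10.4))
```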
  • How could I change the code to calculate the probability of observing seven or greater?
  • Q4 (1 pt.): Back to the binomial scenario (bird presence/absence in 6 forest plots).
    • Now use pbinom to calculate the probability of observing four or fewer presences in the 6 plots. Show your R code.
  • Q5 (1 pt.): Now use pbinom and the law of total probability to calculate the probability of observing four or more presences in the 6 plots. Show your R code.

Hint: this is not the complementary event of observing four or fewer!

Normal probabilities: Q6-8

For these questions, you’ll consider a standard normal distribution.

Use the following plot to help:

  • Q6 (1 pt.): Are you more likely to observe a value of 1.0 or 2.0?
  • Q7 (1 pt.): What is the probability of observing a value of 1.0 or less? Show the R code you used to find your answer.
  • Q8 (1 pt.): What is the probability of observing a value between 1.0 and 2.0? Show the R code you used to find your answer.

Normal Plots: Q9-10

Create the required Normal plots.

As a reminder, your Normal plot should look similar to:

  • Q9 (2 pts.): Show the complete R-code you used to create your plot. Make sure you include all the code to recreate your plot in a fresh R session.
  • Q10 (1 pt.): Include a figure of your plot.

Binomial Plot: Q11-12

Create the required Binomial plot.

Your Binomial plot should look similar to:

  • Q11 (2 pts.): Show the complete R-code you used to create your plot. Make sure you include all the code to recreate your plot in a fresh R session.
  • Q12 (1 pt.): Include a figure of your plot.

Report

Upload your group’s answers to the questions above to Moodle.