Learning Objectives

  • Use the distribution functions in R.
  • Build intuition about sampling distributions.

Seeing Theory

You’ll use the Seeing Theory site to build graphical intuition for Frequentist concepts.

It’s a fantastic site, and I encourage you to check out the parts that we won’t cover in this assignment.

Distribution functions for discrete distributions

R implements probability, cumulative probability, and quantile functions for many parametric distributions.

The naming convention is to use the name of the distribution, or an abbreviation of it, preceded by a one-letter prefix. For example, with the binomial distribution, the functions are:

dbinom: the probability mass function

pbinom: the cumulative mass function

qbinom: the quantile function

  • Remember that discrete distributions use the term mass rather than density.
  • Remember that each distribution has different parameters, and you’ll have to look at the documentation on how to specify them with the probability functions.

For a discrete parametric distribution, the distribution functions allow you to ask the following questions:

  1. What is the probability that I observe a value of exactly \(x\)?: The probability mass function, e.g. dbinom()
  2. What is the probability that I observe a value of \(x\) or less?: The cumulative mass function, e.g. pbinom()
  3. What is the probability that I observe a value of \(x\) or more?: The cumulative mass function, e.g. pbinom()
  4. What is the median or 50th percentile?: The quantile function, e.g. qbinom()
  5. What is the 90th percentile?: The quantile function, e.g. qbinom()
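As a minimal sketch of questions 1, 2, 4, and 5 above, the following uses a hypothetical binomial distribution with 10 trials and a success probability of 0.3. The parameter values and variable names are assumptions chosen only for illustration.

```r
# Hypothetical parameters (assumed for illustration): 10 trials, success probability 0.3
n_trials <- 10
p_success <- 0.3

# 1. Probability of exactly 3 successes (probability mass function)
p_exactly_3 <- dbinom(3, size = n_trials, prob = p_success)

# 2. Probability of 3 successes or fewer (cumulative mass function)
p_at_most_3 <- pbinom(3, size = n_trials, prob = p_success)

# 4. Median: the smallest count whose cumulative probability is at least 0.5
median_count <- qbinom(0.5, size = n_trials, prob = p_success)

# 5. 90th percentile
pct_90 <- qbinom(0.9, size = n_trials, prob = p_success)

print(c(p_exactly_3, p_at_most_3, median_count, pct_90))
```

The cumulative value is the sum of the mass function over the counts 0 through 3, so it should match sum(dbinom(0:3, size = n_trials, prob = p_success)).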

Distribution functions for continuous distributions

The naming convention is the same as for discrete distributions. For example, with the normal distribution, the functions are:

dnorm: the probability density function

pnorm: the cumulative density function

qnorm: the quantile function

  • Remember that continuous distributions use the term density rather than mass.
  • Remember that each distribution has different parameters, and you’ll have to look at the documentation on how to specify them with the probability functions.

For a continuous parametric distribution, the distribution functions allow you to ask the following questions:

  1. Is a value of 1.2 or 2.4 more likely?: The probability density function, e.g. dnorm()
  2. What is the probability that I observe a value between 1.2 and 2.4?: The cumulative density function, e.g. pnorm()
  3. What is the probability that I observe a value of 1.3 or more?: The cumulative density function, e.g. pnorm()
  4. What is the probability that I observe a value of 2.4 or less?: The cumulative density function, e.g. pnorm()
  5. What is the 20th percentile of fish lengths?: The quantile function, e.g. qnorm()
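A minimal sketch of questions 1, 2, 4, and 5 above, assuming a hypothetical normal distribution with mean 0 and standard deviation 1. These parameter values and the variable names are illustrative assumptions, not part of the assignment.

```r
# Hypothetical distribution (assumed for illustration): mean 0, standard deviation 1
mu <- 0
sigma <- 1

# 1. Compare densities at 1.2 and 2.4.
#    Densities are relative likelihoods, not probabilities.
dens_low <- dnorm(1.2, mean = mu, sd = sigma)
dens_high <- dnorm(2.4, mean = mu, sd = sigma)
more_likely <- if (dens_low > dens_high) 1.2 else 2.4

# 2. Probability of a value between 1.2 and 2.4:
#    the difference of two cumulative probabilities
p_between <- pnorm(2.4, mean = mu, sd = sigma) - pnorm(1.2, mean = mu, sd = sigma)

# 4. Probability of a value of 2.4 or less
p_at_most <- pnorm(2.4, mean = mu, sd = sigma)

# 5. 20th percentile
q_20 <- qnorm(0.2, mean = mu, sd = sigma)

print(c(more_likely, p_between, p_at_most, q_20))
```

qnorm() is the inverse of pnorm(), so pnorm(q_20, mean = mu, sd = sigma) returns 0.2.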

The law of total probability

You can use R’s distribution functions to answer many probability questions, but sometimes you have to be creative and use them in combinations.

The probabilities of all possible outcomes in the sample space sum to 1.0.

Sometimes it’s easier to calculate the probability of the complement of an event, rather than the event’s probability directly.

Suppose you have an event \(E\).

The complement of \(E\), written \(E^c\), is the rest of the sample space not occupied by \(E\).

\(Pr(E) + Pr(E^c) = 1.0\)
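The identity can be checked numerically. The sketch below assumes a hypothetical binomial distribution (10 trials, success probability 0.3) and takes \(E\) to be "at most 3 successes", so that \(E^c\) is "4 or more successes". The parameter values and names are illustrative assumptions.

```r
# Hypothetical parameters (assumed for illustration)
n_trials <- 10
p_success <- 0.3

# Pr(E): at most 3 successes
p_event <- pbinom(3, size = n_trials, prob = p_success)

# Pr(E^c) from the complement rule
p_complement <- 1 - p_event

# Pr(E^c) computed directly by summing the mass function over 4, 5, ..., 10
p_complement_direct <- sum(dbinom(4:n_trials, size = n_trials, prob = p_success))

# The two routes agree up to floating-point tolerance
print(all.equal(p_complement, p_complement_direct))
```

Note that for a discrete distribution the complement of "3 or fewer" is "4 or more", not "3 or more", because the count 3 belongs to \(E\).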

Total probability: Normal distribution example

The R function pnorm() gives us the probability of observing a value of \(x\) or less.

What is the probability of observing a value less than 7.5 in a normal distribution with mean 10 and standard deviation 3?

It’s easy with pnorm():

pnorm(7.5, mean = 10, sd = 3)
## [1] 0.2023284

We have around a 20% chance of observing a value of 7.5 or less.

  • What if we want to know the probability of observing a value of \(x\) or higher?

HINT: you could use pnorm() and the law of total probability to figure it out!

Sampling distribution

Recall that the sampling distribution describes the distribution of a sample statistic.

  • Do you remember the two main factors that determine the width of a sampling distribution?

Central limit theorem and the sampling distribution

Navigate to the probability theory page on Seeing Theory.

Follow the link to the Central Limit Theorem section.

This demo samples values from the beta distribution, a two-parameter continuous distribution with a bounded domain from 0 to 1.

  • Adjust the sliders to see how the two parameters affect the distribution’s shape.

Try sampling with different numbers of draws and samples.
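The demo can be approximated in R. The sketch below assumes that each draw takes one sample of the chosen size from the beta distribution and records its mean; that reading of the sliders, the shape parameters, and all variable names are assumptions made for illustration.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical shape parameters giving a right-skewed beta distribution
shape_alpha <- 2
shape_beta <- 8

sample_size <- 5  # observations averaged within each draw
n_draws <- 50     # number of sample means collected

# Each draw: take one sample from the beta distribution and record its mean
sample_means <- replicate(
  n_draws,
  mean(rbeta(sample_size, shape1 = shape_alpha, shape2 = shape_beta))
)

# Histogram of the simulated sample means
hist(sample_means, xlim = c(0, 1),
     main = "Simulated sample means", xlab = "Sample mean")
```

Changing sample_size and n_draws and rerunning the block mirrors moving the sliders and pressing the sample button.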

Questions

Q 1: Binomial Probability 1

Binomial Probability Mass Function

  • Q1 (2 pts.): What is the probability of observing a count of exactly 3 successes in a binomial distribution with parameters n = 4 and p = 0.75?
    • Include your answer and the R code you used to find it.
    • Note: To receive full credit, you cannot use lower.tail = FALSE in your code.

Q 2: Binomial Probability 2

Binomial Probability Mass Function

  • Q2 (2 pts.): What is the probability of observing a count of 3 successes or fewer in a binomial distribution with parameters n = 4 and p = 0.75?
    • Include your answer and the R code you used to find it.
    • Note: To receive full credit, you cannot use lower.tail = FALSE in your code.

Q 3: Binomial Probability 3

Binomial Probability Mass Function

  • Q3 (2 pts.): What is the probability of observing more than 3 successes in a binomial distribution with parameters n = 5 and p = 0.75?
    • Include your answer and the R code you used to find it.
    • Note: To receive full credit, you cannot use lower.tail = FALSE in your code.

Q 4: Normal Probability 1

Normal Distribution - Probability Density Function

  • Q4 (2 pts.): What is the probability of observing a value less than 1.2 from a normally-distributed population with mean = 2 and standard deviation = 2?
    • Include your answer and the R code you used to find it.
    • Note: To receive full credit, you cannot use lower.tail = FALSE in your code.

Q 5: Normal Probability 2

Normal Distribution - Probability Density Function

  • Q5 (2 pts.): What is the probability of observing a value greater than 1.2 from a normally-distributed population with mean = 2 and standard deviation = 2?
    • Include your answer and the R code you used to find it.
    • Note: To receive full credit, you cannot use lower.tail = FALSE in your code.

Q 6: Normal Probability 3

Normal Distribution - Probability Density Function

  • Q6 (4 pts.): What is the probability of observing a value between 1.2 and 3.2 from a normally-distributed population with mean = 2 and standard deviation = 2?
    • Include both your answer and the R code you used.
    • Note: To receive full credit, you cannot use lower.tail = FALSE in your code.

Q 7: Central Limit Theorem 1

  • Central Limit Theorem

Navigate to the probability theory page on Seeing Theory.

Follow the link to the Central Limit Theorem section.

  • Sampling Distribution
  1. Choose a set of \(\alpha\) and \(\beta\) parameters that result in a skewed beta distribution.
  2. Set the sample size to 1 and the draws to 50.
  3. Hit the sample button several times and observe the evolution of the histogram.
  • Q7 (2 pts.): Describe how the shape of the histogram changes as you continue to press the sample button.

Q 8: Central Limit Theorem 2

  • Sampling Distribution
  1. Choose a set of \(\alpha\) and \(\beta\) parameters that result in a skewed beta distribution.
  2. Set the sample size to 2 and the draws to 50.
  3. Hit the sample button several times and observe the evolution of the histogram.
  • Q8 (2 pts.): Describe how the shape of the histogram changes as you continue to press the sample button.

Q 9-11: Central Limit Theorem 3

  • Sampling Distribution
  1. Choose a set of \(\alpha\) and \(\beta\) parameters that result in a skewed beta distribution.
  2. Set the sample size to 15 and the draws to 50.
  3. Hit the sample button several times and observe the evolution of the histogram.
  • Q9 (2 pts.): Describe how the shape of the histogram changes as you continue to press the sample button.
  • Q10 (2 pts.): Why is there such a drastic change in the shape of the sampling distribution when you change the sample size from 1 to 2?
  • Q11 (2 pts.): What are the two main factors that determine the width of the sampling distribution of the mean?

Q 12: Library of Babel 1

“Those examples allowed a librarian of genius to discover the fundamental law of the Library. This philosopher observed that all books, however different from one another they might be, consist of identical elements: the space, the period, the comma, and the twenty-two letters of the alphabet. …”

A set of 25 characters doesn’t seem unmanageable.

“He also posited a fact which all travelers have since confirmed: In all the Library, there are no two identical books. From those incontrovertible premises, the librarian deduced that the Library is “total” - perfect, complete, and whole - and that its bookshelves contain all possible combinations of the twenty-two orthographic symbols (a number which, though unimaginably vast, is not infinite) - that is, all that is able to be expressed, in every language. …”

That sounds like a permutations/combinations problem.

There are \(25\) possible words consisting of a single character in the Library. There are \(25 \times 25 = 25^2 = 625\) possible 2-character words.

  • Q12 (2 pts.): How many 3-character words are possible?

Q 13: Library of Babel 2

Given the properties of the books in the Library:

  • 410 pages
  • 40 rows per page
  • 80 positions per row

There are \(410 \times 40 \times 80 = 1,312,000\) positions for characters in each book.

Since there are 25 characters in the Library’s character set, there are a total of \(25 ^ {1,312,000}\) possible books.

That’s a very large number. It’s about \(2 \times 10^{1,834,097}\). Imagine a 2 followed by almost 2 million zeroes! It certainly wouldn’t fit on Earth.
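The magnitude can be checked with logarithms, since \(25^{1,312,000}\) is far too large for ordinary floating-point arithmetic. The following is a minimal sketch of one way to do this in R; the variable names are illustrative.

```r
# Character positions per book: pages x rows per page x positions per row
n_positions <- 410 * 40 * 80

# Direct computation overflows to Inf in double precision
overflowed <- is.infinite(25^n_positions)

# Work on the log10 scale instead: log10(25^n) = n * log10(25)
log10_books <- n_positions * log10(25)

# Split into a power of ten and a leading factor
exponent <- floor(log10_books)
leading_factor <- 10^(log10_books - exponent)

print(c(n_positions, exponent, round(leading_factor, 2)))
```

The leading factor comes out slightly below 2, which is where the approximation \(2 \times 10^{1,834,097}\) comes from.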

That’s such a large number that it’s easier to just think of it in symbols. Let’s define the variable \(B\) as the number of books in the Library of Babel.

  • Q13 (2 pts.): How many books would the Library contain if you added one additional position to the book size (i.e. one extra letter on the last page)? Express your answer in terms of \(B\).