Learning Objectives
- Use the distribution functions in R.
- Gain insight into sampling distributions.
Seeing Theory
You’ll be using the Seeing Theory site to practice and to build
graphical intuition for frequentist concepts.
It’s a fantastic site, and I encourage you to check out the parts
that we won’t cover in this assignment.
Distribution functions for discrete distributions
R implements probability, cumulative probability, and quantile
functions for many parametric distributions.
The naming convention is to use the name, or an abbreviation, of the
distribution preceded by a one-letter prefix. For example, with the
binomial distribution the functions are:
- dbinom: the probability mass function
- pbinom: the cumulative distribution function (called the cumulative mass function in this assignment)
- qbinom: the quantile function
- Remember that discrete distributions use the term mass
rather than density.
- Remember that each distribution has different parameters, and you’ll
have to look at the documentation on how to specify them with the
probability functions.
For a discrete parametric distribution, the distribution
functions allow you to ask the following questions:
- What is the probability that I observe a value of exactly \(x\)?: The probability mass function, e.g. dbinom()
- What is the probability that I observe a value of \(x\) or less?: The cumulative mass function, e.g. pbinom()
- What is the probability that I observe a value of \(x\) or more?: The cumulative mass function, e.g. pbinom() evaluated at \(x - 1\) and subtracted from 1
- What is the median or 50th percentile?: The quantile function, e.g. qbinom()
- What is the 90th percentile?: The quantile function, e.g. qbinom()
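As a minimal sketch of how the first two questions and the two percentile questions map onto function calls, the example below uses a binomial distribution with 10 trials and a success probability of 0.5. These parameter values and the variable names are assumptions chosen only for illustration.

```r
# Illustrative binomial parameters (assumed for this sketch)
n_trials <- 10
p_success <- 0.5

# Probability of exactly 5 successes
p_exactly_5 <- dbinom(5, size = n_trials, prob = p_success)

# Probability of 5 or fewer successes
p_5_or_fewer <- pbinom(5, size = n_trials, prob = p_success)

# Median and 90th percentile of the number of successes
median_count <- qbinom(0.5, size = n_trials, prob = p_success)
pct_90_count <- qbinom(0.9, size = n_trials, prob = p_success)

print(c(p_exactly_5, p_5_or_fewer, median_count, pct_90_count))
```

Note that qbinom() returns a count, not a probability: it is the smallest number of successes whose cumulative probability reaches the requested level.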
Distribution functions for continuous distributions
The naming convention is the same as for discrete distributions. For
example, with the normal distribution the functions are:
- dnorm: the probability density function
- pnorm: the cumulative distribution function (called the cumulative density function in this assignment)
- qnorm: the quantile function
- Remember that continuous distributions use the term density
rather than mass.
- Remember that each distribution has different parameters, and you’ll
have to look at the documentation on how to specify them with the
probability functions.
For a continuous parametric distribution, the distribution
functions allow you to ask the following questions:
- Is a value of 1.2 or 2.4 more likely?: The probability density function, e.g. dnorm()
- What is the probability that I observe a value between 1.2 and 2.4?: The cumulative density function, e.g. pnorm()
- What is the probability that I observe a value of 1.3 or more?: The cumulative density function, e.g. pnorm()
- What is the probability that I observe a value of 2.4 or less?: The cumulative density function, e.g. pnorm()
- What is the 20th percentile of fish lengths?: The quantile function, e.g. qnorm()
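The sketch below shows how some of these questions might translate into calls. It assumes a standard normal distribution (mean 0, standard deviation 1), which is a choice made here for illustration and not one of the distributions used in the questions; the variable names are likewise hypothetical.

```r
# Assumed parameters for this sketch: a standard normal distribution
mu <- 0
sigma <- 1

# Compare the density at 1.2 and 2.4; the larger density marks the
# relatively more likely neighborhood
density_1.2 <- dnorm(1.2, mean = mu, sd = sigma)
density_2.4 <- dnorm(2.4, mean = mu, sd = sigma)

# Probability of a value between 1.2 and 2.4:
# the area up to 2.4 minus the area up to 1.2
p_between <- pnorm(2.4, mean = mu, sd = sigma) - pnorm(1.2, mean = mu, sd = sigma)

# 20th percentile: the value with 20% of the distribution below it
pct_20 <- qnorm(0.2, mean = mu, sd = sigma)

print(c(density_1.2, density_2.4, p_between, pct_20))
```

Because qnorm() is the inverse of pnorm(), passing `pct_20` back into pnorm() with the same parameters should return 0.2.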
The law of total probability
You can use R’s distribution functions to answer many probability
questions, but sometimes you have to be creative and use them in
combinations.
The probabilities of all events in the sample space sum to 1.0.
Sometimes it’s easier to calculate the probability of the
complement of an event, rather than the event’s probability
directly.
Suppose you have an event \(E\).
The complement of the event, \(E^c\), is just the rest of the sample space
not occupied by \(E\).
\(Pr(E) + Pr(E^c) = 1.0\)
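One way to convince yourself of this relationship is to check it numerically. The sketch below assumes a binomial distribution with 10 trials and a success probability of 0.5 (values picked only for illustration) and takes \(E\) to be the event "5 or fewer successes".

```r
# Illustrative binomial parameters (assumed for this sketch)
n_trials <- 10
p_success <- 0.5

# The probabilities of every possible count (0 through 10) sum to 1
total <- sum(dbinom(0:n_trials, size = n_trials, prob = p_success))

# E: 5 or fewer successes; E^c: 6 or more successes
p_event <- pbinom(5, size = n_trials, prob = p_success)
p_complement <- 1 - p_event

# Cross-check: add up the mass of each outcome in E^c directly
p_complement_direct <- sum(dbinom(6:n_trials, size = n_trials, prob = p_success))

print(c(total, p_event, p_complement, p_complement_direct))
```

The last two printed values should agree, which is the complement rule in action.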
Total probability: Normal distribution example
The R function pnorm() gives us the probability of
observing a value of \(x\) or less.
What is the probability of observing a value less than 7.5 in a
normal distribution with mean 10 and standard deviation 3?
It’s easy with pnorm():
pnorm(7.5, mean = 10, sd = 3)
## [1] 0.2023284
We have around a 20% chance to observe a value of 7.5 or less.
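If it helps to see where that number comes from, the cumulative probability is the area under the density curve to the left of 7.5. The sketch below approximates that area with R's general-purpose integrate() function and compares it with pnorm(); the variable names are hypothetical, and this is a cross-check for intuition, not how you would normally compute the probability.

```r
# Area under the normal density curve from -Infinity up to 7.5;
# mean and sd are passed through integrate() to dnorm()
area <- integrate(dnorm, lower = -Inf, upper = 7.5, mean = 10, sd = 3)$value

# The same probability straight from the cumulative function
p_direct <- pnorm(7.5, mean = 10, sd = 3)

print(c(area, p_direct))
```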
- What if we want to know the probability of observing the value of
\(x\) or higher?
HINT: you could use pnorm() and the law of total
probability to figure it out!
Sampling distribution
Recall that the sampling distribution describes the
distribution of a sample statistic.
- Do you remember the two main factors that determine the width of a
sampling distribution?
Central limit theorem and the sampling distribution
Navigate to the
probability
theory page on Seeing Theory.
Follow the link to the Central Limit Theorem
section.
This demo samples values from the beta distribution,
a two-parameter continuous distribution with a bounded domain from 0 to
1.
- Adjust the sliders to see how the two parameters affect the
distribution’s shape.
Try sampling with different numbers of draws and samples.
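You can also run a rough version of this experiment in R. The sketch below is not the code behind the Seeing Theory demo; it assumes that the demo's "sample size" corresponds to the number of values averaged in each sample, and it uses a hypothetical helper, `sample_means()`, with arbitrarily chosen shape parameters (2 and 8) that give a skewed beta distribution.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical helper: draw n_samples samples of size sample_size from a
# beta distribution and return the mean of each sample
sample_means <- function(n_samples, sample_size, shape1, shape2) {
  replicate(n_samples, mean(rbeta(sample_size, shape1 = shape1, shape2 = shape2)))
}

means_1  <- sample_means(1000, sample_size = 1,  shape1 = 2, shape2 = 8)
means_15 <- sample_means(1000, sample_size = 15, shape1 = 2, shape2 = 8)

# Histograms of the two simulated sampling distributions of the mean
hist(means_1,  main = "Sample size 1",  xlab = "Sample mean")
hist(means_15, main = "Sample size 15", xlab = "Sample mean")

# Spread of the sample means for each sample size
print(c(sd(means_1), sd(means_15)))
```

Try changing the shape parameters and sample sizes and compare the histograms with what you see in the demo.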
Questions
Q 1: Binomial Probability 1
- Q1 (2 pts.): What is the probability of observing a
count of exactly 3 successes in a binomial distribution with
parameters n = 4 and p = 0.75?
- Include your answer and the R code you used to find it.
- Note: To receive full credit, you cannot use
lower.tail = FALSE
in your code.
Q 2: Binomial Probability 2
- Q2 (2 pts.): What is the probability of observing a
count of 3 successes or fewer in a binomial distribution with
parameters n = 4 and p = 0.75?
- Include your answer and the R code you used to find it.
- Note: To receive full credit, you cannot use
lower.tail = FALSE
in your code.
Q 3: Binomial Probability 3
- Q3 (2 pts.): What is the probability of observing
more than 3 successes in a binomial distribution with
parameters n = 5 and p = 0.75?
- Include your answer and the R code you used to find it.
- Note: To receive full credit, you cannot use
lower.tail = FALSE
in your code.
Q 4: Normal Probability 1
- Q4 (2 pts.): What is the probability of observing
a value less than 1.2 from a normally distributed population
with mean = 2 and standard deviation = 2?
- Include your answer and the R code you used to find it.
- Note: To receive full credit, you cannot use
lower.tail = FALSE
in your code.
Q 5: Normal Probability 2
- Q5 (2 pts.): What is the probability of observing
a value greater than 1.2 from a normally distributed
population with mean = 2 and standard deviation = 2?
- Include your answer and the R code you used to find it.
- Note: To receive full credit, you cannot use
lower.tail = FALSE
in your code.
Q 6: Normal Probability 3
- Q6 (4 pts.): What is the probability of observing
a value between 1.2 and 3.2 from a normally distributed
population with mean = 2 and standard deviation = 2?
- Include both your answer and the R code you used.
- Note: To receive full credit, you cannot use
lower.tail = FALSE
in your code.
Q 7: Central Limit Theorem 1
Navigate to the
probability
theory page on Seeing Theory.
Follow the link to the Central Limit Theorem
section.
- Choose a set of \(\alpha\) and
\(\beta\) parameters that result in a
skewed beta distribution.
- Set the sample size to 1 and the
draws to 50.
- Hit the sample button several times and observe the evolution of the
histogram.
- Q7 (2 pts.): Describe how the shape of the
histogram changes as you continue to press the sample
button.
Q 8: Central Limit Theorem 2
- Choose a set of \(\alpha\) and
\(\beta\) parameters that result in a
skewed beta distribution.
- Set the sample size to 2 and the
draws to 50.
- Hit the sample button several times and observe the evolution of the
histogram.
- Q8 (2 pts.): Describe how the shape of the
histogram changes as you continue to press the sample
button.
Q 9-11: Central Limit Theorem 3
- Choose a set of \(\alpha\) and
\(\beta\) parameters that result in a
skewed beta distribution.
- Set the sample size to 15 and the
draws to 50.
- Hit the sample button several times and observe the evolution of the
histogram.
- Q9 (2 pts.): Describe how the shape of the
histogram changes as you continue to press the sample
button.
- Q10 (2 pts.): Why is there such a drastic change in
the shape of the sampling distribution when you change the sample size
from 1 to 2?
- Q11 (2 pts.): What are the two main factors that
determine the width of the sampling distribution of the mean?
Q 12: Library of Babel 1
“Those examples allowed a librarian of genius to discover the
fundamental law of the Library. This philosopher observed that all
books, however different from one another they might be, consist of
identical elements: the space, the period, the comma, and the twenty-two
letters of the alphabet. …”
A set of 25 characters doesn’t seem unmanageable.
“He also posited a fact which all travelers have since confirmed: In
all the Library, there are no two identical books. From those
incontrovertible premises, the librarian deduced that the Library
is “total” - perfect, complete, and whole - and that its bookshelves contain
all possible combinations of the twenty-two orthographic symbols (a
number which, though unimaginably vast, is not infinite) - that is, all
that is able to be expressed, in every language. …”
That sounds like a permutations/combinations problem.
There are \(25\) possible words
consisting of a single character in the Library. There are \(25 \times 25 = 25^2 = 625\) possible
2-character words.
- Q12 (2 pts.): How many 3-character words are
possible?
Q 13: Library of Babel 2
Given the properties of the books in the Library:
- 410 pages
- 40 rows per page
- 80 positions per row
There are \(410 \times 40 \times 80 =
1,312,000\) positions for characters in each book.
Since there are 25 characters in the Library’s character set, there
are a total of \(25 ^ {1,312,000}\)
possible books.
That’s a very large number. It’s about \(2
\times 10^{1,834,097}\). Imagine a 2 followed by almost 2 million
zeroes! It certainly wouldn’t fit on Earth.
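R cannot hold a number this large directly (evaluating 25^1312000 overflows to Inf), but you can check the approximation by working with base-10 logarithms. The following is a minimal sketch; the variable names are chosen here for illustration.

```r
n_chars <- 25                 # characters in the Library's character set
n_positions <- 410 * 40 * 80  # character positions per book

# log10 of 25^1,312,000, computed without forming the huge number itself
log10_books <- n_positions * log10(n_chars)

exponent <- floor(log10_books)           # power of ten
mantissa <- 10^(log10_books - exponent)  # leading factor between 1 and 10

print(c(n_positions, exponent, mantissa))
```

The mantissa comes out just under 2 and the exponent is 1,834,097, which matches the approximation above.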
That’s such a large number that it’s easier to just think of it in
symbols. Let’s define the variable \(B\) as the number of books in the Library
of Babel.
- Q13 (2 pts.): How many books would the Library
contain if you added one additional position to the book size
(i.e. one extra letter on the last page)? Express your answer in terms
of \(B\).