Introduction

One of the best ways to improve your R skills and comfort level is to experiment with lots of variations on code templates.

One of the best ways to learn about probability distributions and their relationship to real data is to experiment with random number generation. Today, we’ll practice generating random numbers using both the normal and uniform distributions.

Report

Upload a document with answers to questions 1 - 4 (2 points each)

Include each of your group members’ names in the report.

Only one member of your group needs to submit the report, but everybody in your group should keep a copy of the code you used to do the exercises.

You can do your work in an RMarkdown document and knit it to HTML.

Alternatively, you may do your work in a Word doc (or Google doc), pasting your figures into the document and saving it as a PDF.

Random Numbers in R

Because R is a programming language specialized for statistical analysis, it has some sophisticated random number generators built in.

Note: The proper term for random numbers generated via computer is pseudorandom.

More details (optional)

We expect a CPU to produce the same results every time we give it the same instructions. For that reason, current computers running deterministic algorithms cannot produce truly random numbers (but quantum computers may be able to one day).

We do have very good algorithms for producing sequences of numbers that have the statistical properties of sequences of truly random numbers.
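To make the idea concrete, here is a minimal sketch of a toy pseudorandom generator. It is not the algorithm R uses (R's default is the Mersenne Twister). The function name `toy_lcg` is hypothetical, and the constants are the classic Park-Miller "minimal standard" values, chosen here only for illustration.

```r
# Toy multiplicative congruential generator: illustration only, NOT R's generator.
# 'toy_lcg' is a hypothetical name; a and m are the Park-Miller constants.
# 'start' must be a whole number between 1 and m - 1.
toy_lcg <- function(n, start, a = 16807, m = 2^31 - 1) {
  state <- start
  out <- numeric(n)
  for (i in seq_len(n)) {
    state <- (a * state) %% m  # next state depends only on the previous state
    out[i] <- state / m        # rescale to a value between 0 and 1
  }
  out
}

first_run   <- toy_lcg(n = 5, start = 12345)
second_run  <- toy_lcg(n = 5, start = 12345)
other_start <- toy_lcg(n = 5, start = 54321)

identical(first_run, second_run)   # same starting value, same sequence
identical(first_run, other_start)  # different starting value, different sequence
```

The numbers look haphazard, but every value is completely determined by the starting value, which is why the word pseudorandom is used.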

One desirable property of pseudorandom numbers is that we can choose the number we want R to use as the starting key for the generator. This number is called the random seed.

When we specify a seed, R will always create the same sequence. This is useful when we want to test different code on the same data.

set.seed(12345)
rnorm(n = 4)

## [1]  0.5855288  0.7094660 -0.1093033 -0.4534972

set.seed(12345)
rnorm(n = 4)

## [1]  0.5855288  0.7094660 -0.1093033 -0.4534972

Notice what happens if I don’t set the random seed:

rnorm(n = 4)

## [1]  0.6058875 -1.8179560  0.6300986 -0.2761841

rnorm(n = 4)

## [1] -0.2841597 -0.9193220 -0.1162478  1.8173120
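If you would rather not compare the printed numbers by eye, you can store the sequences and compare them with `identical()`. This is a small sketch; the variable names are only for illustration.

```r
# Store two sequences generated from the same seed
set.seed(12345)
seeded_1 <- rnorm(n = 4)

set.seed(12345)
seeded_2 <- rnorm(n = 4)

# Generate again WITHOUT resetting the seed: the generator simply continues
# from wherever it left off
unseeded <- rnorm(n = 4)

identical(seeded_1, seeded_2)  # TRUE: same seed, same sequence
identical(seeded_1, unseeded)  # FALSE: the seed was not reset
```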

Random Number Functions

R’s random number functions are all based on probability distributions.

Some of the most famous distributions are the Normal, Uniform, Binomial, and Poisson distributions. The corresponding R random number generating functions are:

rnorm()
runif()
rbinom()
rpois()
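As a rough sketch, each of these functions takes the number of values to generate as its first argument, followed by the parameters of its distribution. The seed and the parameter values below are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the results are repeatable

# Normal: mean and standard deviation
normal_vals <- rnorm(n = 5, mean = 100, sd = 15)

# Uniform: lower and upper bounds
uniform_vals <- runif(n = 5, min = 10, max = 20)

# Binomial: number of trials (size) and probability of success (prob)
binomial_vals <- rbinom(n = 5, size = 10, prob = 0.5)

# Poisson: average rate (lambda)
poisson_vals <- rpois(n = 5, lambda = 3)

normal_vals
uniform_vals
binomial_vals
poisson_vals
```

Note that the binomial and Poisson generators return whole numbers (counts), while the normal and uniform generators return decimal values.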

Here is a demo that first generates a short sequence of normally distributed numbers, and then uses the uniform distribution for x-values and the Normal distribution for y-values in a scatterplot:

# generate a sequence of 20 normally distributed numbers:
rnorm(n = 20, mean = 10, sd = 1.5)

##  [1] 10.555942 10.780325  8.874202 11.225350  8.670464  9.502634 11.681069
##  [8] 10.448086 11.169433 12.183678  9.033507  7.670294  7.603436 12.707646
## [15]  9.277529 10.930570 10.918185  9.756534 11.217810 13.295250

# generate two sequences that you can use as coordinates to make a plot
n_pts = 3000
x = runif(n = n_pts, min = 2, max = 20)
y = rnorm(n_pts, mean = 4, sd = 0.75)

plot(
  x, y,
  main = "Scatterplot of random numbers",
  col = adjustcolor("steelblue", 0.3))

We can also plot histograms of values of the x- and y-coordinates:

hist(x, main = "Histogram of 3000 uniformly distributed random numbers")

hist(y, main = "Histogram of 3000 normally distributed random numbers")

Note the differences in the appearance of the two histograms.
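One way to compare the two shapes is to draw the histograms side by side and check a few summary statistics against the parameters used to generate the numbers. This is only a sketch: the seed is arbitrary, and the coordinates are regenerated so that the code runs by itself.

```r
set.seed(42)  # arbitrary seed so the figure is repeatable

n_pts <- 3000
x <- runif(n = n_pts, min = 2, max = 20)
y <- rnorm(n = n_pts, mean = 4, sd = 0.75)

# Draw the two histograms in one row, then restore the old plot settings
old_par <- par(mfrow = c(1, 2))
hist(x, main = "Uniform x-values")
hist(y, main = "Normal y-values")
par(old_par)

# The sample statistics should be close to, but not exactly equal to,
# the parameters we asked for
sample_stats <- c(
  min_x  = min(x),
  max_x  = max(x),
  mean_y = mean(y),
  sd_y   = sd(y))
sample_stats
```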

Exercises

Question 1

Histograms of uniformly distributed numbers

Experiment with the runif function. Check out the help entry to see how to use the arguments:

n
min
max
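If you are not sure how to find the help entry, here is a short sketch of a few standard ways to inspect a function in R:

```r
# Open the help entry for runif (typing ?runif at the console does the same)
help("runif")

# Print the function's arguments, along with any default values
args(runif)

# Get just the argument names as a character vector
arg_names <- names(formals(runif))
arg_names
```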

Try creating sequences of different lengths (5, 50, and 500) and printing them to your console.

What are the default upper and lower bounds of the random numbers? How can you change these?

Before you create any plots, discuss your predictions about how histograms made from small numbers of points might look different from histograms made from large numbers of points. Write down your predictions before you make any plots.

Plot histograms with the following uniform random number sequences. You can use this code as a template:

hist(x = runif(n = , min =, max =))

NOTE: you’ll have to fill in numbers for n, min, and max for the code to work.

Histogram with 5 numbers between 0 and 1.
Histogram with 20 numbers between 0 and 1.
Histogram with 200 numbers between 0 and 1.
Histogram with 5000 numbers between 0 and 1.

Your answer needs to contain your histograms and your predictions.

Question 2

Describe the differences in appearance in the histograms as you increased the number of randomly-generated numbers. Did they meet your predictions?

Question 3

Experiment with the rnorm function.

For now, we’ll use the default values for the mean and sd arguments.

Histograms of normally distributed numbers

Before you create any plots, discuss your predictions about how histograms made from small numbers of points might look different from histograms made from large numbers of points. Write down your predictions before you make any plots. Make sure you describe your predictions of how the following will change as you increase the number of points:
- Symmetry
- Smoothness

Plot histograms with the following normally-distributed random number sequences. You can use this code as a template:

hist(x = rnorm(n = ))

NOTE: you’ll have to fill in the appropriate number for n.

Histogram with 5 numbers.
Histogram with 20 numbers.
Histogram with 200 numbers.
Histogram with 5000 numbers.