In addition to my original materials, this lab contains material adapted from Kevin McGarigal’s ECO 634 course.
Kevin taught this course for many years and his lab and lecture materials are an invaluable resource!
You’ll need to retrieve the following files:
vegdata.csv
bird.sub.csv
hab.sub.csv
Save them in the data subdirectory of your main course RProject folder.
In the Bootstrap lab (lab 7), you used bootstrap resampling to calculate confidence intervals and create a rarefaction curve.
In this lab, we’ll be using bootstrap resampling to explore sampling distributions of model parameters in the context of one- and two-sample difference of means tests.
Recall that t-tests are useful when we have a categorical predictor with two levels and a continuous response variable.
We’ll use the palmerpenguins dataset again, but this time, let’s remove the Gentoo penguins.
Don’t forget to use droplevels()!
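Here’s a sketch of one way to do it. I’m assuming penguin_dat is the name you’ll use for the subsetted data (that’s the name used in the t.test() call below):

```r
# Sketch, assuming the palmerpenguins package is installed and loaded:
# library(palmerpenguins)
# penguin_dat = droplevels(subset(penguins, species != "Gentoo"))

# The same subset-then-droplevels pattern on a small made-up data frame:
dat = data.frame(
  species = factor(c("Adelie", "Gentoo", "Chinstrap")),
  flipper_length_mm = c(181, 217, 195))
dat_2 = subset(dat, species != "Gentoo")
levels(dat_2$species)               # "Gentoo" is still a factor level
levels(droplevels(dat_2)$species)   # now it's gone
```

Note that subset() removes the rows but not the unused factor level; droplevels() cleans up the levels.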
Perform a t-test of the alternative hypothesis that Adelie penguins have shorter flippers than Chinstrap penguins. Is this a one- or two-tailed test?
t.test(flipper_length_mm ~ species, data = penguin_dat, alternative = "less")
##
## Welch Two Sample t-test
##
## data: flipper_length_mm by species
## t = -5.7804, df = 119.68, p-value = 3.025e-08
## alternative hypothesis: true difference in means between group Adelie and group Chinstrap is less than 0
## 95 percent confidence interval:
## -Inf -4.186534
## sample estimates:
## mean in group Adelie mean in group Chinstrap
## 189.9536 195.8235
Take note of the confidence interval, the p-value, and the group means.
In a previous lab you calculated bootstrap confidence intervals for the mean value of one group of observations.
You can also create a bootstrap confidence interval on the difference of two means… That sounds a lot like a t-test!
two.boot()
You could write your own custom function to do a two-sample bootstrap, but you are already a whiz at writing functions, so you can save some time by checking out two.boot() in the package simpleboot. Remember that you’ll have to install simpleboot before you can use any of its functions. I’ll let you figure out the syntax for two.boot() since you’re getting good at R.
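If you’re curious what two.boot() is doing under the hood, here’s the core idea sketched in base R on made-up numbers (not the penguin data):

```r
# Manual two-sample bootstrap of a difference in means (toy data):
set.seed(1)
group_1 = c(181, 186, 195, 193, 190)  # made-up measurements
group_2 = c(192, 196, 201, 198, 195)  # made-up measurements

# Resample each group with replacement and record the difference in means:
diffs = replicate(
  1000,
  mean(sample(group_1, replace = TRUE)) - mean(sample(group_2, replace = TRUE)))

# diffs is analogous to the t component of a two.boot() result:
quantile(diffs, c(0.025, 0.975))
```

two.boot() packages this resampling up for you and returns an object that works with the boot package’s helper functions.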
I saved the output of two.boot() to an object called pen_boot. It’s a complex object (actually a list) with lots of named sub-parts. Recall that we can use str() to examine the structure of an object:
str(pen_boot)
## List of 12
## $ t0 : num -5.87
## $ t : num [1:10000, 1] -5.29 -4.02 -6.03 -6.66 -6.89 ...
## $ R : num 10000
## $ data : int [1:219] 181 186 195 193 190 181 195 193 190 186 ...
## $ seed : int [1:626] 10403 337 -31981972 -306496208 255981706 -339368191 1223042828 1397488521 95684349 556754659 ...
## $ statistic:function (x, idx)
## $ sim : chr "ordinary"
## $ call : language boot(data = c(sample1, sample2), statistic = boot.func, R = R, strata = ind, weights = weights)
## $ stype : chr "i"
## $ strata : num [1:219] 1 1 1 1 1 1 1 1 1 1 ...
## $ weights : num [1:219] 0.00662 0.00662 0.00662 0.00662 0.00662 ...
## $ student : logi FALSE
## - attr(*, "class")= chr "simpleboot"
## - attr(*, "boot_type")= chr "boot"
The bootstrapped differences are stored in the component called t. How can you retrieve them?
When you figure out how to retrieve the differences, you can make a histogram:
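Here’s a hint at the pattern using a mock object (the numbers below are made up; your real values come from the two.boot() output):

```r
# A mock list with the same kind of structure as pen_boot:
mock_boot = list(t0 = -5.87, t = c(-5.29, -4.02, -6.03, -6.66, -6.89))

# The $ operator retrieves the component named t:
diffs = mock_boot$t
hist(diffs, main = "Bootstrapped differences", xlab = "Difference in means")
```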
We’ll examine a dataset from a field experiment designed to assess tree seedling response to understory vegetation treatments.
Four treatments (including a control) were randomly assigned to 32 plots in a randomized block design. We’ll not worry about the randomized block design part of the experiment.
For now, let’s consider just the number of pine seedlings (pine) under the various treatments.
Let’s visualize the data using conditional boxplots:
boxplot(pine ~ treatment, data = veg)
Since we’re focusing on two-sample tests, let’s create a new data frame that contains only the observations that received the “clipped” or “control” treatments.
You can use the subset()
function in conjunction with a
new operator: %in%
:
dat_tree = droplevels(subset(veg, treatment %in% c("control", "clipped")))
Don’t forget to use droplevels()!
Create a new conditional boxplot on the new data to verify that your subsetting worked:
Use table()
to determine how many observations are in
each of the treatments. I’ll let you figure out the syntax. Hint: you’ll
need to tell table()
which column of the data to use.
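The general pattern, demonstrated on a toy factor rather than the lab data:

```r
# table() tallies the number of observations at each factor level:
f = factor(c("control", "clipped", "control", "clipped", "control"))
table(f)
```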
Conduct a Wilcoxon rank sum test on the difference in means between the treatments.
The syntax for wilcox.test()
is nearly identical to
that of t.test()
.
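To illustrate the shared formula interface, here’s a toy demonstration on made-up numbers (not the pine data):

```r
# wilcox.test() accepts the same formula syntax as t.test():
toy = data.frame(
  count = c(1, 3, 2, 8, 9, 7),
  treatment = factor(rep(c("a", "b"), each = 3)))
wilcox.test(count ~ treatment, data = toy)
```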
What was the p-value associated with your test?
Conduct a bootstrap of the tree data. For this demo code, I’m using
the object tree_boot
that I created using
two.boot()
.
What are the endpoints of a 95% CI?
I can use the boot.ci() function to find out!
boot.ci(tree_boot)
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 10000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = tree_boot)
##
## Intervals :
## Level Normal Basic
## 95% ( 3.23, 28.83 ) ( 2.50, 27.75 )
##
## Level Percentile BCa
## 95% ( 4.25, 29.50 ) ( 4.38, 29.75 )
## Calculations and Intervals on Original Scale
The function returns four different confidence intervals. Consult the help entry for the reference describing the different interval types. The ‘Percentile’ interval is the type of CI we’ve manually calculated for bootstrapped data in the bootstrapping lab.
To prove this to yourself, you can use the quantile()
function and see if the results match:
quantile(tree_boot$t, c(0.025, 0.975))
## 2.5% 97.5%
## 4.25 29.50
Finally, we can examine a histogram of the bootstrapped differences in means:
We can use Monte Carlo randomization to assess the significance of regression parameters.
We’ll estimate the significance of a slope parameter of a simple linear regression.
Although R’s lm()
function produces a fitted model with
parametric significance values for model coefficients, you may want to
use other methods to assess significance. For example, if your data
don’t meet the assumptions of a Frequentist linear regression,
resampling techniques can sidestep some of the assumptions.
We’ll use a standardized version of the Oregon birds data.
Z-standardization is a form of normalization.
When we model data whose values span different orders of magnitude, normalizing ensures that each variable spans approximately the same range.
This is helpful for comparing model coefficients.
The data in the subbasin files (which we are using here) are aggregated by sub-basin from the full bird census count data we have used previously.
There are habitat variables for 30 subbasins.
You’ll need to retrieve the bird.sub.csv
and
hab.sub.csv
files from the course github site (see the Data
Files section above).
See the corresponding metadata file (birdhab.meta.pdf) for more info.
Read the data files and merge into a single
data.frame
:
# I already read my data into dat_bird and dat_habitat:
dat_all = merge(
dat_bird,
dat_habitat,
by = c("basin", "sub"))
head(dat_all[, c("b.sidi", "s.sidi")])
## b.sidi s.sidi
## 1 0.06678912 0.12
## 2 0.06509689 0.34
## 3 0.06092608 0.78
## 4 0.06012721 0.57
## 5 0.04112905 0.84
## 6 0.06086158 0.73
Note that the b.sidi
column values are about 1/10 the
value of the s.sidi
column.
R has functions for centering and standardizing data, but to illustrate the z-standardization process, we’ll do it manually. Don’t worry, it’s not very difficult!
Recall the process: subtract the sample mean from each value, then divide by the sample standard deviation, i.e. \(z = (x - \bar{x}) / s\).
Here’s how to do it for the b.sidi
column. I’ll create a
new column called b.sidi.standardized
:
# Calculate the sample mean and sd:
b_sidi_mean = mean(dat_all$b.sidi, na.rm = TRUE)
b_sidi_sd = sd(dat_all$b.sidi, na.rm = TRUE)
# Use the subset-by-name symbol ($) to create a
# new column of z-standardized values.
dat_all$b.sidi.standardized = (dat_all$b.sidi - b_sidi_mean)/b_sidi_sd
Now, examine the standardized data to see if our standardization worked:
mean(dat_all$b.sidi.standardized)
## [1] 7.166938e-17
sd(dat_all$b.sidi.standardized)
## [1] 1
Note that \(7.2 \times 10^{-17}\) is effectively 0.
Use the same process to standardize the s.sidi column.
In this case, the relational fields that link the bird and habitat data sets are:
- basin (3 unique basins)
- sub (10 unique sub-basins in each basin)
We’ll only be using two of the many variables in the merged data:
- b.sidi: Simpson’s diversity index for bird communities
- s.sidi: Simpson’s diversity index for vegetation cover
Note that the latter index represents the diversity in landscape composition as defined largely by vegetation seral stage (intermediate stages of succession).
Let’s begin by plotting the data:
plot(
b.sidi ~ s.sidi, data = dat_all,
main = "Simpson's diversity indices",
xlab = "Vegetation cover diversity",
ylab = "Bird diversity")
It appears that there is a negative relationship between bird diversity and vegetation diversity; specifically, bird diversity declines as the vegetation diversity increases.
This seems somewhat counter-intuitive, since we usually think of animal diversity increasing with the diversity of habitats (proxied here by vegetation seral stages).
Could this result have been due to chance?
Or is this relationship likely to be real?
We can fit a simple linear regression using a Least Squares criterion.
We can examine the model coefficients with coef()
. We’ll
want to save the slope coefficient for later.
The slope coefficient is labeled s.sidi
since that is
the predictor we specify in the model.
fit_1 = lm(b.sidi ~ s.sidi, data = dat_all)
coef(fit_1)
## (Intercept) s.sidi
## 0.07116980 -0.02437131
slope_observed = coef(fit_1)[2]
You can easily add a regression line from a model fitted by
lm()
to an existing plot using abline()
and
the fitted model object.
plot(
b.sidi ~ s.sidi, data = dat_all,
main = "Simpson's diversity indices",
xlab = "Vegetation cover diversity",
ylab = "Bird diversity")
abline(fit_1)
We would like to know how likely it is for us to observe a slope coefficient this large and negative if in fact there is no real relationship between bird diversity and vegetation diversity.
We can use the t statistic and corresponding p-value computed by the
lm()
function, but this assumes that the errors are
normally distributed about the mean, which may or may not be the case
here.
To be on the safe side, we are going to construct our own null hypothesis test using a Monte Carlo randomization procedure.
First, let’s simplify our data set by extracting just the two variables we need for this exercise:
dat_1 =
subset(
dat_all,
select = c(b.sidi, s.sidi))
Note the use of the select argument!
Our goal is to use resampling techniques, Monte Carlo and bootstrapping, to characterize the null and alternative distributions.
Monte Carlo randomization breaks up the associations by sampling random values from each column, instead of keeping rows intact.
To create a resampled dataset, we can create two vectors of randomly generated row indices. Then we can use these to create two new vectors of bird and vegetation diversity indices.
set.seed(123)
index_1 = sample(nrow(dat_1), replace = TRUE)
index_2 = sample(nrow(dat_1), replace = TRUE)
dat_resampled_i =
data.frame(
b.sidi = dat_1$b.sidi[index_1],
s.sidi = dat_1$s.sidi[index_2]
)
fit_resampled_i = lm(b.sidi ~ s.sidi, data = dat_resampled_i)
slope_resampled_i = coef(fit_resampled_i)[2]
print(slope_resampled_i)
## s.sidi
## 0.006235381
And we can re-create the scatterplot with regression line:
plot(
b.sidi ~ s.sidi, data = dat_resampled_i,
main = "Simpson's diversity indices (MC resampled data)",
xlab = "Vegetation cover diversity",
ylab = "Bird diversity")
abline(fit_resampled_i)
We need to do this many times to see the range of outcomes we expect by chance alone.
We know that, because of sampling error, a single permutation could by chance produce a slope noticeably different from zero.
We can repeat the process many times to estimate the sampling distribution of the null hypothesis.
First we can pre-allocate a vector to hold the results using
numeric()
. Check out the help entry to find out more about
numeric()
.
m = 10000
result_mc = numeric(m)
Now it’s up to you to build a loop that will resample the data, fit a simple linear regression, and extract the slope parameter. Here’s a skeleton:
for(i in 1:m)
{
index_1 = sample(...)
# ... your loop code ...
result_mc[i] = coef(fit_resampled_i)[2]
}
The output of your loop is a collection of regression slope parameters that were calculated on randomized and resampled data.
Plot a histogram of the slope values.
Use abline() with the argument v = slope_observed to draw a vertical line showing the value of the slope from the regression model on the observed data.
Your result should look something like this:
hist(
result_mc,
main = "Mike's Null Distribution of Regression Slope",
xlab = "Slope Parameter")
abline(v = slope_observed, lty = 2, col = "red", lwd = 2)
Just like finding the critical z-value for confidence intervals, we can calculate a critical value for the slope.
Use quantile() to find the 5th percentile of the null distribution of slopes.
In my simulation I got:
quantile(result_mc, c(.05))
## 5%
## -0.01320388
Do you remember what the observed slope of the real data was?
How many slopes from the Monte Carlo randomization were equal to or less than the observed slope?
For my simulation, I found 18 slopes (out of 10000 simulations) generated from the null distribution that were equal to or less than the observed slope.
If we want an exact p-value for this lower one-sided test, we can compute the proportion of the permuted distribution less than or equal to our observed slope value.
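One way to compute such a proportion: a comparison like result_mc <= slope_observed returns a logical vector, and mean() of a logical vector gives the proportion of TRUE values. A toy demonstration of the pattern (made-up numbers):

```r
null_slopes = c(-3, -1, 0, 2, 5)  # made-up stand-in for result_mc
observed = -1                     # made-up stand-in for slope_observed
mean(null_slopes <= observed)     # 2 of 5 values, i.e. 0.4
```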
## [1] 0.0018
Now that we’ve characterized the null distribution, let’s use bootstrapping to characterize the alternative.
We’ll need slightly different code, and the canned bootstrapping methods in the packages we’ve used won’t help us. But don’t worry, you’ve got the R skills to handle this task!
Recall that for bootstrapping we sample entire rows. This means we only need 1 set of resampled row indices.
set.seed(345)
index_1 = sample(nrow(dat_1), replace = TRUE)
dat_boot = dat_1[index_1, ]
head(dat_boot)
## b.sidi s.sidi
## 29 0.08263485 0.00
## 23 0.05705873 0.62
## 19 0.05820778 0.54
## 21 0.07254766 0.41
## 18 0.06365076 0.49
## 28 0.06284046 0.36
Now, let’s build another linear model and compare the slope coefficient to what we calculated with the original data:
fit_bs1 = lm(b.sidi ~ s.sidi, data = dat_boot)
coef(fit_bs1)
## (Intercept) s.sidi
## 0.07489893 -0.03146039
We can compare this with the coefficients of the model of the original data:
| Model | Intercept | Slope |
|---|---|---|
| Original | 0.071 | -0.024 |
| Bootstrap | 0.075 | -0.031 |
They are pretty close!
Now, we need to repeat the process many times and record the slope coefficients.
The code will be very similar to the code you used for your Monte Carlo loop. I’ll let you figure out the code. Hint: you only need to make a few modifications to your Monte Carlo loop.
A histogram of my bootstrapped model slopes looks like:
hist(
result_boot,
main = "Mike's Alternative Distribution of Regression Slope",
xlab = "Slope Parameter")
abline(v = slope_observed, lty = 2, col = "red", lwd = 2)
abline(v = 0, lty = 2, col = 1, lwd = 2)
Note that I added vertical lines at the observed slope value from the original data and at zero.
We can compare histograms of the null and alternative distributions:
It would be nice if we could plot them on the same panel. It’s difficult to overlay two histograms, but we can easily overlay density plots.
Here’s the syntax to make a density plot of the null distribution:
plot(
density(result_mc),
main = "Mike's Null Distribution Density Plot",
xlab = "Slope Coefficient")
See if you can use similar syntax to plot the alternative distribution:
Now for a challenge: let’s plot both on the same panel.
I’ll let you figure out the code, but I’ll give you a few hints.
- Use the lines() function to plot your alternative density curve. The syntax will be similar to your call to plot().
- You may need to adjust the xlim and ylim arguments for plot().
- Check out the legend() function. I used the arguments x = "topright" to position my legend in the upper right and bty = "n" to suppress the drawing of a box around the legend.

My double density plot looks like:
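For reference, the overlay pattern on made-up data (your version will use result_mc and result_boot):

```r
# Overlay two density curves and add a legend (toy data):
set.seed(42)
null_vals = rnorm(100, mean = -1)
alt_vals  = rnorm(100, mean = 1)

plot(density(null_vals), xlim = c(-5, 5), main = "Two density curves")
lines(density(alt_vals), lty = 2)
legend(
  x = "topright",
  legend = c("null", "alternative"),
  lty = c(1, 2),
  bty = "n")
```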
- Use the two.boot() function to calculate 10000 bootstrap replicates of the difference in mean flipper length of Chinstrap and Adelie penguins.
- How do you tell mean() and sd() to exclude NA values?
- Save the output of two.boot() to a variable called pen_boot. This is so that I can replicate your code easily on my machine for grading and assistance.
- Use quantile() with pen_boot to calculate a 95% confidence interval on the difference in mean flipper lengths.
- Examine pen_boot to figure out what component contains the bootstrapped differences.
- How did you use quantile()? Show the R code you used to answer the question.
- Create a distribution function from pen_boot using ecdf(). Save the output of ecdf() to a variable called pen_ecdf.
- Use pen_ecdf() to calculate the empirical probability of observing a mean difference of -4.5 or greater.
- Use pen_ecdf() to calculate the empirical probability of observing the mean difference predicted by the null hypothesis, i.e. 0 or greater.

The ecdf() function is a little different from other functions that we’ve worked with because it returns a new function instead of an R data object.
pen_ecdf() is a lot like pnorm(): it calculates the area under the density curve to the left of x.
We get the area of the blue shaded region with pnorm(-0.5), which turns out to be 0.3085375.
pen_ecdf() is very similar.
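A toy demonstration that ecdf() returns a function (pen_ecdf would be built from the bootstrapped differences instead):

```r
# ecdf() returns a new function rather than a data object:
f = ecdf(c(1, 2, 3, 4))
f(2.5)        # proportion of values <= 2.5, i.e. 0.5
pnorm(-0.5)   # analogous area under the standard normal curve
```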
Consider the null and alternative hypotheses for a two-sample, and two-tailed, test of the difference in mean flipper length between the two penguin species.
- Read your vegetation data into a data.frame called veg.
- Use droplevels and subset to keep just the control and clipped levels of the treatment column.
- To receive full credit, you must use the formula notation in R to conduct the test.
- Use two.boot() to create a bootstrapped data set of the differences in mean pine tree count between the clipped and control treatments. Save the output to a variable called tree_boot.
- Use quantile() to find a 95% CI.
- Standardize the s.sidi and b.sidi columns of dat_all. Add the standardized data to dat_all using the new column names s.sidi.standardized and b.sidi.standardized.
- Use quantile to find the 5% quantile of slopes in the null distribution. This is the critical value.
- Use abline() along with the argument v = to add a vertical line showing where the observed slope occurred.
- Use abline() along with the argument v = to add a vertical line showing the critical value.