Learning Objectives

Conduct commonly used tests on categorical data

Key Concepts

Analysis of categorical data: 1- and 2-sample tests
Testing assumptions: normality
Interpreting test output

Introduction

NOTE: Make sure you fill in all of the missing code in the walkthrough. You’ll need it for the assignment questions.

Marbled Salamanders and Catastrophe Rate Data

Marbled Salamanders

The data represent the observed reproductive success of marbled salamanders in 13 vernal pools over a period of 7 years in western Massachusetts.

For additional info, check out the Marbled Salamader Conservation Plan for Massachusetts.

Late Filling

From ncwildlife.org:

“Instead of breeding in water during spring like other mole salamanders, the marbled salamander breeds in the fall on land. Females lay their eggs in a variety of dried-up ponds, pools or ditches that have not yet filled with winter rains. Nests are usually laid under leaf litter, logs or other debris within the depression basin, and are guarded by the female until the eggs are covered with water. Clutch size ranges from 50-200 eggs. While in the nest, embryos will develop within their jelly coats, but they cannot hatch until the area fills with water. If there is not enough rain to fill these wetlands in the fall, the eggs will sometimes overwinter and hatch in the spring instead.”

During years in which the ponds fill later than normal, the eggs are at risk of freezing or dessication.

Catastrope Rate

Catastrophe rate (cat.rate) is the proportion of years with any breeding effort (i.e., >0 breeding females) in which the number of emerging juveniles per breeding female (i.e., fecundity) was less than 1, a level that Kevin McGarigal and his collaborators deemed to be a reproductive “catastrophe”

Find the data file catrate.csv on the course GitHub page and save it in the data subdirectory of your main course RProject folder.
Use read.csv() and here() to load the data into a data.frame called catrate. Examine the first few rows using head():

head(catrate)

##   pond success years  cat.rate
## 1    2       5     7 0.2857143
## 2    3       5     7 0.2857143
## 3    4       6     7 0.1428571
## 4    5       4     7 0.4285714
## 5    6       0     7 1.0000000
## 6    7       1     4 0.7500000

Numerical and Graphical exploration

Use the summary() function to view numerical summaries of all the columns in catrate.

##       pond       success          years          cat.rate     
##  Min.   : 2   Min.   :0.000   Min.   :1.000   Min.   :0.1429  
##  1st Qu.: 5   1st Qu.:1.000   1st Qu.:3.000   1st Qu.:0.2857  
##  Median : 8   Median :2.000   Median :4.000   Median :0.4286  
##  Mean   : 8   Mean   :2.538   Mean   :4.692   Mean   :0.5394  
##  3rd Qu.:11   3rd Qu.:5.000   3rd Qu.:7.000   3rd Qu.:0.7500  
##  Max.   :14   Max.   :6.000   Max.   :7.000   Max.   :1.0000

Plot a histogram of the catastrophic rates (column cat.rate).

Check for Normality

There are several methods to check whether your data are likely to have been drawn from a Normal distribution.

Shapiro Test

One of the most common is the Shapiro-Wilk test, implemented in R as shapiro.test().

Try running the shapiro.test() function on the cat.rate column of the catrate data.
- How can you extract just the cat.rate column?

You’ll need to figure out the shapiro.text() syntax.

Once you get it right, your output should look like:

## 
##  Shapiro-Wilk normality test
## 
## data:  catrate$cat.rate
## W = 0.86202, p-value = 0.04097

The R help entry for shapro.test() doesn’t provide much information about how to interpret the output.

The null hypothesis for the Shapiro-Wilk test is: “The data were sampled from a normally-distributed population”.

Recall that low p-values suggest that there is good evidence against the null hypothesis.

How would you interpret the output of the Shapiro test? Is there strong evidence that the cat.rate is non-normal?

Other Normality Tests

The package nortest contains many other functions for assessing normality including ad.test(), lillie.test(), and many others.

One-Sample Tests: Tests for Difference From Expectation

One of the simplest one-sample tests is whether the observed mean (or median) is significantly different from a specified value

The default null hypothesis is that the mean is equal to zero, but this is not a very meaningful null hypothesis in many cases. For example, if we wanted to

In two of the seven years, vernal pools filled later than normal in the fall.

Late pond-filling can cause the desiccation and/or freezing of the eggs prior to inundation, resulting in death.
The observed late-filling rate is \(\frac{2}{7} \approx 0.28\).
The observed mean catastrophic rate from the data is 0.54. How could you calculate this?

We might want to test the alternative hypothesis that the observed mean cat.rate (0.54) is significantly different from an expected value of 0.28.

Under this causal model, 2 of the 7 (2/7=0.28) years of the study between 1999-2005 experienced late pond filling.

Is the mean observed catastrophe rate significantly different from expected (if we assume reproductive catastrophe is caused only by late-filling)?

Parametric One-Sample Test: The t-test

If the data are normally distributed, we can use the Student’s t test.

The function in R is t.test().

Recall that for a one-sample test, the null hypothesis is: “The mean of population from which the data were collected is not different from x”

The default null hypothesis has \(x=0\), but we have a good reason to choose a different value for the null.

You need to use t.test() to perform a 1-sample test on the catastrophic rate.

The t.test() arguments you need to use for this test are x and mu.

mu is the null hypothesis you want to test against. What value should you use for mu in your t.test() call?

Once you have the correct R syntax, you should see t-test output that looks like:

## 
##  One Sample t-test
## 
## data:  catrate$cat.rate
## t = 2.9595, df = 12, p-value = 0.01193
## alternative hypothesis: true mean is not equal to 0.2857143
## 95 percent confidence interval:
##  0.3526250 0.7261295
## sample estimates:
## mean of x 
## 0.5393773

What does the Student’s t test say about the probability that the observed catastrophe rate comes from a population in which catastrophes are solely the result of late pond-filling in the fall? Note, the default t test is a two-sided alternative; i.e., the alternative hypothesis is that the observed mean is not equal to the expected mean, which can mean less than or greater than the expected mean

One-sided Alternative Hypothesis

If we have a good reason to believe that the observed mean should be greater than the null hypothesis mean, you can use a one-tailed test.

NOTE: we saw that the observed catastrophic rate was greater than the late-filling rate. Since we didn’t state that we thought the observed rate would be greater than the late-filling rate before we looked at the data it is not valid to retroactively specify a 1-tailed alternative hypothesis.

In the case that you propose a one-tailed hypothesis before looking at the data you can conduct a one-sided test by including the argument alternative = "greater" in the call to t.test()

For a 1-sided t-test on the catastrophic rate you should obtain results similar to:

## 
##  One Sample t-test
## 
## data:  catrate$cat.rate
## t = 2.9595, df = 12, p-value = 0.005966
## alternative hypothesis: true mean is greater than 0.2857143
## 95 percent confidence interval:
##  0.3866123       Inf
## sample estimates:
## mean of x 
## 0.5393773

If we proposed a one-tailed hypothesis that the observed data would be less than the null hypothesis, we could use alternative = "less"

What p-value do you observe when you conduct the t-test on the catastrophic rate using alternative = "less"?

Non-Parametric One-Sample Test: The Wilcoxon Rank Sum Test

In the case of small samples from a non-normal distribution, we can use the Wilcoxon’s signed rank test (also known as the Mann-Whitney test). The syntax is almost exactly the same as the t.test() syntax. You can use the same value for mu.

wilcox.test(catrate$cat.rate, mu = 2 / 7)

## Warning in wilcox.test.default(catrate$cat.rate, mu = 2/7): cannot compute
## exact p-value with ties

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  catrate$cat.rate
## V = 85, p-value = 0.006275
## alternative hypothesis: true location is not equal to 0.2857143

The syntax for conducting a 1-tailed test is exacly the same as for the t.test(): use alternative = "less" or alternative = "greater"

Comparing two sample means

In the one-sample tests, we compared a sample mean with a fixed value that we specified.

If you need to compare the means of two groups of observations, you can use a two-sample test.

The null hypothesis in a two-sample test is: “There is no difference in mean values between the two groups.”

Alternatively:

The values in the two groups were drawn from the same population.

We’ll use the penguin data for our 2-sample tests. We’ll exclude the Gentoo penguins so that we only have two penguin species.

You can copy this code directly into your assignment script:

require(palmerpenguins)

## Loading required package: palmerpenguins

penguin_dat = droplevels(subset(penguins, species != "Gentoo"))

Numerical/Graphical Exploration

Use summary() to summarize the columns in the data:

summary(penguin_dat)

##       species          island    bill_length_mm  bill_depth_mm  
##  Adelie   :152   Biscoe   : 44   Min.   :32.10   Min.   :15.50  
##  Chinstrap: 68   Dream    :124   1st Qu.:37.75   1st Qu.:17.50  
##                  Torgersen: 52   Median :40.60   Median :18.40  
##                                  Mean   :41.91   Mean   :18.37  
##                                  3rd Qu.:45.95   3rd Qu.:19.10  
##                                  Max.   :58.00   Max.   :21.50  
##                                  NA's   :1       NA's   :1      
##  flipper_length_mm  body_mass_g       sex           year     
##  Min.   :172.0     Min.   :2700   female:107   Min.   :2007  
##  1st Qu.:187.0     1st Qu.:3400   male  :107   1st Qu.:2007  
##  Median :191.0     Median :3700   NA's  :  6   Median :2008  
##  Mean   :191.8     Mean   :3711                Mean   :2008  
##  3rd Qu.:196.0     3rd Qu.:3988                3rd Qu.:2009  
##  Max.   :212.0     Max.   :4800                Max.   :2009  
##  NA's   :1         NA's   :1

For this example we’ll consider the flipper length, in mm.

Let’s visualize the data using a conditional box plot, as follows:

boxplot(
  flipper_length_mm ~ species, 
  data = penguin_dat,
  ylab = "Flipper Length (mm)")

Note that I used the formula notation: flipper_length_mm ~ species along with the data argument.

Testing for normality

I’ll let you use shapiro.test() on your own to determine whether the flipper lengths are normally-distributed, but I’ll give you a couple of hints:

You’ll need to compare the flipper lengths for each species separately. You can use subset() to extract the flipper measurements for individual species as follows:

# Extract the Adelie penguin data
dat_adelie = subset(penguin_dat, species == "Adelie")

Then you can use the normality test on the flipper length column of dat_adelie. You can use a similar syntax to test the Chinstrap penguins.
- If you’re feeling fancy, you can try to use the aggregate() function to conduct the species-specific tests instead of creating separate data sets for each species.

Parametric and Nonparametric Tests

Again, you can use the t-test!

This time the syntax is slightly different because you have two samples, or groups.

You can use exactly the same formula notation syntax that I used for the boxplot above.

When you have your t.test() function call set up correctly, you should see something like:

## 
##  Welch Two Sample t-test
## 
## data:  flipper_length_mm by species
## t = -5.7804, df = 119.68, p-value = 6.049e-08
## alternative hypothesis: true difference in means between group Adelie and group Chinstrap is not equal to 0
## 95 percent confidence interval:
##  -7.880530 -3.859244
## sample estimates:
##    mean in group Adelie mean in group Chinstrap 
##                189.9536                195.8235

The Wilcoxon rank test works with exactly the same syntax. I’ll leave it up to you to conduct the test using wilcox.test()

You can test 1-tailed hypotheses in the two-sample tests as described above for the one-sample tests.

The trick is that you have to figure out which group R considers to be the the ‘base level’.
We won’t try it in this assignment, but you can use levels() to find out which is the base level:

levels(penguin_dat$species)

## [1] "Adelie"    "Chinstrap"

It looks like Adelie is the base level!

Questions

Q1 Catastrophic Rate Histogram

Q1 (1 pt.): Create a histogram of the salamander reproduction catastrophic rates.
- Make sure you include an appropriate title and label for the x-axis.

Q2-4 Normality Test

Q2 (1 pt.): Conduct a Shapiro-Wilk test of normality of the salamander catastrophic rates. Report the p-value and show the R-code you used to conduct the test.
Q3 (1 pt.): What is the null hypothesis for the Shapiro test?
Q4 (1 pt.): Based on the Shapiro test results, is there strong evidence that the sample came from a non-normally-distributed population?

Q5-7 T-Test 1

Conduct a one-sample t-test of the alternative hypothesis that the catastrophic rate is different from the pond late-filling rate.

Review the assignment walkthrough if needed.

Q5 (1 pt.): Show the code you used to conduct the t-test.
- Hint: your answer should only be a single line of code.
Q6 (1 pt.): State the null hypothesis of the test, in plain nontechnical English.
Q7 (1 pt.): Is this a one- or two-tailed test?

Q8-10 T-Test 2

Interpret the results of your one-sample t-test of the alternative hypothesis that the catastrophic rate is different from the pond late-filling rate.

Q8 (2 pts.): What is the p-value from your t-test? Interpret the p-value as a false-positive rate using nontechnical English that a non-scientist would understand.
Q9 (1 pt.): What is the confidence interval for the difference between the null hypothesis and alternative hypothesis means? Did it include zero?
Q10 (2 pts.): Considering the results from your t-test, did you conclude that there was strong evidence to reject the null hypothesis?
- Make sure you justify your answer using the output of the t-test.

Q11-13 Wilcoxon Test 1

Conduct a one-sample Wilcoxon rank sum test of the alternative hypothesis that the catastrophic rate is different from the pond late-filling rate.

Review the assignment walkthrough if needed.

Interpret the results of your one-sample Wilcoxon rank sum test of the alternative hypothesis that the catastrophic rate is different from the pond late-filling rate.

Q11 (1 pt.): Show the code you used to conduct the test.
- Hint: your answer should only be a single line of code.
Q12 (1 pt.): Compare the p-value with the p-value you got from the t-test.
Q13 (2 pts.): Considering the results from your rank sum test, did you conclude that there was strong evidence to reject the null hypothesis?
- Make sure you justify your answer using the output of the test.

Q14-15: Test Comparison

Consider the one-sample t-test and Wilcoxon rank sum tests that you performed on the catastrophic rate data. Explain your reasoning.

Q14 (1 pt.): Compare the overall conclusions you could draw from the results of the two tests.
Q15 (1 pt.): Considering the numerical and graphical data exploration, which test do you think was more appropriate for these data?

Q16-17: Flippers - Normality Tests

Q16 (2 pts.): Show the R-code you used to conduct tests of normality for the flipper lengths of Chinstrap and Adelie penguins.
Q17 (2 pts.): Interpret the test results. Do you conclude that the flipper lengths are normally-distributed for each species? Make sure your answers make reference to the test output.

Q18: Flippers - Histograms

Create a single figure consisting of histograms of flipper lengths of Adelie and Chinstrap penguins.

Your figure should have two histograms arranged in one row.

Q18 (2 pts.): Save your figure to a file and include it in your report. Your figure needs to have appropriate dimensions such that the two histograms are not vertically stretched.
- Hint: Check out the width and height arguments.
- Hint: Remember the par() function? Which argument did we use to include multiple plots in the same figure?

Q19-20 Flippers T-test

Conduct a two-sample t-test of the alternative hypothesis that the Adelie penguins have different flipper lengths than Chinstrap penguins.

Q19 (2 pts.): State the alternative hypothesis of the test, in plain nontechnical English.
- Consider whether you used a one- or two- tailed test.
Q20 (1 pt.): Include the code you used to conduct the t-test.
- Hint: your answer should only be a single line of code.

Report

Compile your answers to all 20 questions into a pdf document and submit via Moodle.

You may also do your work in an R Notebook and submit a rendered html file.

Analysis of Environmental Data

Using Models 1

Michael France Nelson