Overview

Although this is an individual assignment, I encourage students to work through the analyses together. Your submission, however, will be individual and responses must be in your own words.

Eastern Red-Backed Salamanders

You are part of a research team studying populations of Eastern Red-Backed Salamander (Plethodon cinereus) in Massachusetts.

(Plethodon cinereus) Dave Huth from Allegany County, NY, USA / CC BY

Species description from Salamander Species in Massachusetts page from Mass Audubon:

This small salamander may be the most abundant vertebrate (backboned animal) in the northeast, and it’s found all across the state. It is lungless, and breathes through its moist skin. Despite its name, its color varies; it’s often gray with a red stripe down its back, but it may be entirely red or entirely gray. Its belly is finely speckled with white and gray. Unlike our other salamanders, it spends its entire life on land, and lays its eggs on the moist forest floor. The young skip the typical aquatic stage and emerge as tiny terrestrial salamanders.

Research goals

Your research team has collected data on salamanders from several populations near vernal pools in the Pioneer Valley in Western Massachusetts.

We are interested in whether salamander snout-to-vent length (SVL) varies by sex and/or site.

The data

You can find the data file mander_anova.csv on the course GitHub page. Make sure you save it to your data subfolder.

Read the data into a data.frame called sals using here() and read.csv().

The first few rows of the data look like this:

library(here)  # provides here(), which builds paths relative to the project root
sals = read.csv(here("data", "mander_anova.csv"))
head(sals)
##   Collector Year Season Site SVL Total_length    Sex
## 1     Chris 2014   Fall    A  36           72 female
## 2     Chris 2014   Fall    A  46           83 female
## 3     Chris 2014   Fall    A  42           89 female
## 4     Chris 2014   Fall    A  34           75 female
## 5     Chris 2014   Fall    A  37           80 female
## 6     Chris 2014   Fall    A  40           79 female
  • NOTE: You may not need to use all of the data in these analyses.
  • Remember our data-recording concepts from earlier in the course.
  • We recorded information in our data that might be relevant at some point, but that might not be directly needed in every analysis we consider.
  • The data contain the following columns of interest:
  • Site: there are four sites (P1A, P1B, P2A, P2B)
  • Sex: M (male) and F (female)
  • SVL: the snout-to-vent length in mm
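Before modeling, it can help to confirm which group labels actually appear in your copy of the file, since they may be coded differently from a written description. The following is a minimal sketch: `sals_demo` is a small made-up data frame (not the course data) that only borrows the column names shown in the `head(sals)` output.

```r
# Hypothetical stand-in for the real data: a few made-up rows that reuse
# the column names Site, Sex and SVL. These are NOT real measurements.
sals_demo = data.frame(
  Site = c("A", "A", "B", "B", "C"),
  Sex  = c("female", "male", "female", "male", "female"),
  SVL  = c(36, 41, 38, 44, 40)
)

# Column names and types at a glance
str(sals_demo)

# Count the observations in each group
site_counts = table(sals_demo$Site)
sex_counts  = table(sals_demo$Sex)
print(site_counts)
print(sex_counts)

# Cross-tabulate the two factors to see how balanced the design is
print(table(sals_demo$Site, sals_demo$Sex))
```

Running the same `str()` and `table()` calls on `sals` should show the real group labels and sample sizes.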

Data Analysis: ANOVA in R

You’ll conduct all analyses in R. You’ll need to download the data file and start a new R script to store your code for the assignment.

You can use the following code templates to see how Analysis of Variance can be implemented in R. Note that you will need to adjust the code to analyze the variables of interest for this assignment.

Note that while we could use aov() to output the ANOVA directly to the console window, it is better to first create a linear model fit object using lm().

The close connection between ANOVA and linear models will become clear later on when we look at linear regression.
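To illustrate the point about aov() and lm(), here is a small sketch on simulated data. The data frame `demo` and its columns `group` and `y` are invented for illustration and have nothing to do with the salamander data. Both routes should produce the same F statistic.

```r
# Simulated data: three hypothetical groups with slightly different means
set.seed(1)
demo = data.frame(
  group = rep(c("g1", "g2", "g3"), each = 10),
  y     = rnorm(30, mean = rep(c(40, 42, 45), each = 10), sd = 3)
)

# Route 1: fit a linear model, then ask for its ANOVA table
fit_lm = lm(y ~ group, data = demo)
tab_lm = anova(fit_lm)

# Route 2: fit with aov() and summarize
fit_aov = aov(y ~ group, data = demo)
tab_aov = summary(fit_aov)[[1]]

# Pull the F statistic for the group term out of each table
f_lm  = tab_lm[["F value"]][1]
f_aov = tab_aov[["F value"]][1]

print(tab_lm)
print(all.equal(f_lm, f_aov))
```

One practical reason to prefer the lm() route is that the fitted object can be reused with other functions such as summary(), coef(), and residuals().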


Example: 1-way ANOVA - SVL and Collector

I want to know if there is a significant difference between measurements collected by different observers.

  • My null hypothesis is that the mean SVL of salamanders does not differ among observers.

First, I’ll make a boxplot:

boxplot(SVL ~ Collector, data = sals)

There seems to be a difference, but I need to do a statistical test to confirm my intuition.

I can use the following syntax to conduct a 1-way ANOVA of SVL explained by collector in R:

fit_collector = lm(SVL ~ Collector, data = sals)
anova(fit_collector)
## Analysis of Variance Table
## 
## Response: SVL
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## Collector   2 4788.5 2394.24  110.58 < 2.2e-16 ***
## Residuals 270 5846.2   21.65                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • According to the ANOVA output, is the difference between collectors significant?

How do you interpret the formula notation above?

  • The item to the left of the tilde, SVL, is explained by the item(s) to the right, in this case Collector.
  • We are proposing a model of Snout-to-Vent Length as a function of Collector.
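A formula is an ordinary R object, so you can inspect it without fitting anything. The sketch below uses the two formulas from the templates; the object names `f1`, `f2`, `vars_f1`, and `terms_f2` are chosen here only for illustration.

```r
# A one-predictor formula: response on the left of ~, predictor on the right
f1 = SVL ~ Collector
print(class(f1))

# All variable names mentioned in the formula, response first
vars_f1 = all.vars(f1)
print(vars_f1)

# An additive formula: the + adds a second predictor, it does not sum values
f2 = SVL ~ Site + Collector

# The predictor terms on the right-hand side, in the order written
terms_f2 = attr(terms(f2), "term.labels")
print(terms_f2)
```

No data are needed for these calls because a formula only records names; the variables are looked up in the `data` argument when lm() is called.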

Example: 2-way Additive ANOVA - Site and Collector

I also want to know if salamanders differ by collection site:

boxplot(SVL ~ Site, data = sals)

I can use the following syntax to conduct a 2-way additive ANOVA of SVL explained by collector and site in R:

fit2 = lm(SVL ~ Site + Collector, data = sals)
anova(fit2)
## Analysis of Variance Table
## 
## Response: SVL
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## Site        3  738.7  246.22  11.541 3.882e-07 ***
## Collector   2 4199.8 2099.88  98.427 < 2.2e-16 ***
## Residuals 267 5696.3   21.33                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Are salamanders significantly different, with respect to SVL, in the different sites?

Analyses

We will conduct three analyses using the salamander data.

  1. An individual test of whether salamander SVL varies by sex.
  2. An individual test of whether salamander SVL varies by collection site.
  3. A combined test of whether salamander SVL varies by sex and/or site.

Graphical Exploration

Before you begin a formal statistical analysis, make some exploratory plots of your data.

  • A histogram of Snout to Vent Length
  • A boxplot of SVL grouped by Sex
  • A boxplot of SVL grouped by Site
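As a rough sketch of how titles and axis labels can be set in base R graphics, the example below uses simulated data. The data frame `demo` and its columns `length_mm` and `group` are hypothetical placeholders, so the variable names and label text will need adapting to the salamander data.

```r
# Simulated measurements for three hypothetical groups (not real data)
set.seed(42)
demo = data.frame(
  length_mm = rnorm(60, mean = 40, sd = 4),
  group     = rep(c("a", "b", "c"), each = 20)
)

# Histogram of a single numeric variable, with a title and x-axis label
h = hist(
  demo$length_mm,
  main = "Histogram of simulated lengths",
  xlab = "Length (mm)"
)

# Conditional boxplot: numeric response on the left of ~, grouping factor
# on the right, using the same formula notation as lm()
b = boxplot(
  length_mm ~ group,
  data = demo,
  main = "Simulated length by group",
  xlab = "Group",
  ylab = "Length (mm)"
)
```

Both hist() and boxplot() draw the plot and also return the computed summaries invisibly, which is why the results can be stored in `h` and `b`.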

Analysis 1: Does SVL vary by sex?

Review the template code I used to build a model of SVL as explained by Collector.

  • Create a boxplot (with appropriate labels and title) of SVL grouped by sex.
  • Use lm() to create a 1-way ANOVA model of SVL as explained by sex.
  • Call your model object fit_sex.

Analysis 2: Does SVL vary by site?

Review the template code I used to build a model of SVL as explained by Collector.

  • Create a boxplot (with appropriate labels and title) of SVL grouped by site.
  • Use lm() to create a 1-way ANOVA model of SVL as explained by site.
  • Call your model object fit_site.

Analysis 3: Does SVL vary by sex and/or site?

  • Use lm() to create a 2-way ANOVA model of SVL as explained by site and sex.
  • Call your model object fit_sex_site.

Report

Create a report that contains plots and answers to the following questions. Save your report as a single pdf document and upload it via the file input box on the Moodle assignment page.

Graphical Analysis

  • Q1 (1 pt.): Histogram of SVL.
  • Q2 (1 pt.): Conditional boxplot of SVL grouped by Sex
  • Q3 (1 pt.): Conditional boxplot of SVL grouped by Site

Include your histogram and two boxplots with appropriate titles and axis labels.

Analysis 1 Questions

Use the results of your analysis of SVL and Sex to answer the following questions:

  • Q4 (1 pt.): State the null hypothesis in sentence form.
  • Q5 (1 pt.): State an alternative hypothesis in sentence form.
  • Q6 (1 pt.): What is the name of the test statistic you use to test \(H_0\), and what distribution does it follow?
  • Q7 (2 pts.): What are the within- and between-group degrees of freedom?
  • Q8 (1 pt.): What is the value of the test statistic and its associated p-value?
  • Q9 (4 pts.): Write a short paragraph reporting on the evidence that SVL varies (or not) by sex. Use values from the statistical test to support your conclusion.
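The quantities asked about above can all be read from the table that anova() prints. In a 1-way ANOVA with k groups and N observations, the between-group degrees of freedom are k - 1 and the within-group (residual) degrees of freedom are N - k. In the Collector example, 2 and 270 degrees of freedom correspond to 3 collectors and 273 salamanders. The sketch below, on simulated data with hypothetical names (`demo`, `group`, `y`), shows one way to pull these values out of the table and to check the F statistic and p-value by hand.

```r
# Simulated data: k = 3 hypothetical groups with n_per observations each
set.seed(7)
k     = 3
n_per = 12
demo = data.frame(
  group = rep(c("g1", "g2", "g3"), each = n_per),
  y     = rnorm(k * n_per, mean = rep(c(40, 43, 41), each = n_per), sd = 3)
)

tab = anova(lm(y ~ group, data = demo))
print(tab)

# Row 1 is the group term (between groups), row 2 is Residuals (within groups)
df_between = tab[["Df"]][1]
df_within  = tab[["Df"]][2]

# F is the ratio of the between-group to the within-group mean square
f_by_hand = tab[["Mean Sq"]][1] / tab[["Mean Sq"]][2]

# The p-value is the upper tail of the F distribution with those df
p_by_hand = pf(f_by_hand, df_between, df_within, lower.tail = FALSE)

print(c(df_between = df_between, df_within = df_within))
print(c(F = f_by_hand, p = p_by_hand))
```

The hand-computed values should agree with the `F value` and `Pr(>F)` columns of the printed table.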

Analysis 2 Questions

Use the results of your analysis of SVL and collection site to answer the following questions:

  • Q10 (1 pt.): State the null hypothesis in sentence form.
  • Q11 (1 pt.): State the alternative hypothesis in sentence form.
  • Q12 (2 pts.): What are the within- and between-group degrees of freedom?
  • Q13 (1 pt.): What is the value of the test statistic and its associated p-value?
  • Q14 (4 pts.): Write a short paragraph reporting on the evidence that SVL varies (or not) by collection site. Use values from the statistical test to support your conclusion.

Analysis 3 Questions

Use the results of your analysis of SVL and sex + collection site to answer the following questions:

  • Q15 (2 pts.): State the null hypothesis in sentence form.
  • Q16 (2 pts.): State the alternative hypothesis in sentence form.
  • Q17 (3 pts.): What are the within- and between-group degrees of freedom?
  • Q18 (3 pts.): What are the values of the test statistics and their associated p-values?
  • Q19 (6 pts.): Write a short paragraph reporting on the evidence that SVL varies (or not) by sex or collection site. Use values from the statistical tests to support your conclusion.

Synthesis Questions

The following questions require you to compare the output of all 3 analyses. Assume you are using a significance level of \(\alpha = 0.05\), so that a factor is considered significant when \(p < 0.05\).

  • Q20 (1 pt.): Did the significance levels for site or sex in the 1-way ANOVAs change when you combined them into a 2-way ANOVA?
  • Q21 (2 pts.): Explain why your conclusions about site and sex change or do not change when you use individual 1-way ANOVAs versus the complete 2-way ANOVA.
  • Q22 (2 pts.): Which of the three models do you think best explains snout to vent length in salamanders and why? We haven’t yet talked about how to compare models in class, so I don’t need a technical answer. I just want to know what you think given what you know so far.