Objectives and Concepts

  • Practice Analysis of Variance on the penguin data

Data

Load the Palmer penguins data from the palmerpenguins package.

Example

ANOVA Procedure

We will use a slightly different procedure for conducting an ANOVA than the one presented in the Gardner text.

The outline of our procedure is:

  1. Do a graphical/numerical exploration.
  2. Check assumptions.
  3. Create a linear model object using the lm() function and the formula notation.
  4. Use the anova() function on our model object.
  5. (Optional) perform a post-hoc test as needed.

Data Exploration

Let’s suppose that we are interested in knowing whether flipper length is different among the three penguin species. We can explore the relationship using a conditional boxplot:

library(palmerpenguins)

boxplot(
  flipper_length_mm ~ species, 
  data = penguins,
  main = "Mike's Plot of\nPenguin Species and Flipper Lengths",
  ylab = "flipper length (mm)")
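
The outline also calls for a numerical exploration. A minimal sketch of one way to do that, assuming the `penguins` data frame from palmerpenguins; the name `flipper_summary` is only illustrative:

```r
library(palmerpenguins)

# Group size, mean, and standard deviation of flipper length by species.
# The formula interface of aggregate() drops rows with missing values by default.
flipper_summary = aggregate(
  flipper_length_mm ~ species,
  data = penguins,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x)))

flipper_summary
```

Comparing the group means and standard deviations with the boxplot gives a first impression of both the differences among species and the spread within each group.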

Assumptions

Remember some of our important assumptions for ANOVA:

  1. Data are normally distributed within groups.
  2. The variance is the same within each group.

Based on a visual inspection, do you think these assumptions are met? We’re going to assume normality within the groups (but you would want to check this for your own data).
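
If you did want to check normality, one possible approach is sketched below. It is an illustration rather than part of the procedure used here: it runs a Shapiro-Wilk test on flipper length within each species using base R's `shapiro.test()` and `tapply()`. The name `shapiro_p` is made up for this example.

```r
library(palmerpenguins)

# Shapiro-Wilk p-value for flipper length within each species.
# The null hypothesis is that the values in the group are normally distributed.
# shapiro.test() removes missing values before testing.
shapiro_p = tapply(
  penguins$flipper_length_mm,
  penguins$species,
  function(x) shapiro.test(x)$p.value)

shapiro_p
```

Small p-values suggest a departure from normality. A normal Q-Q plot of the model residuals is a common graphical alternative.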

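To check the second assumption we can use a Bartlett test. The syntax for the Bartlett test is very similar to that for the boxplot code above. This is not a coincidence: both functions accept the same formula notation (response ~ grouping variable) and a data argument.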

The null hypothesis of the Bartlett test is that the variance is the same within groups.

bartlett.test(
  flipper_length_mm ~ species, 
  data = penguins
)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  flipper_length_mm by species
## Bartlett's K-squared = 0.91722, df = 2, p-value = 0.6322
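
To see what the Bartlett test is comparing, it can help to look at the sample variances themselves. A small sketch, assuming the same `penguins` data; `flipper_var` and `bt` are illustrative names:

```r
library(palmerpenguins)

# Sample variance of flipper length within each species
flipper_var = tapply(
  penguins$flipper_length_mm,
  penguins$species,
  var, na.rm = TRUE)
flipper_var

# The test result is a list, so the p-value can be pulled out by name
bt = bartlett.test(flipper_length_mm ~ species, data = penguins)
bt$p.value
```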

Based on the result of the Bartlett test, do you conclude that the variances are equal?

Conduct the test!

Now we’ll create the model object and perform the ANOVA.

Note again the similar syntax to the conditional boxplot and the Bartlett test.

fit_flippers = lm(
  flipper_length_mm ~ species, 
  data = penguins
)

anova(fit_flippers)
## Analysis of Variance Table
## 
## Response: flipper_length_mm
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## species     2  52473 26236.6   594.8 < 2.2e-16 ***
## Residuals 339  14953    44.1                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
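
The F value in the table is the ratio of the two mean squares: the species mean square divided by the residual mean square (roughly 26236.6 / 44.1, which agrees with the reported value up to rounding of the displayed numbers). A short sketch that reproduces it from the table itself; `tab`, `mean_sq`, and `f_manual` are illustrative names:

```r
library(palmerpenguins)

fit_flippers = lm(flipper_length_mm ~ species, data = penguins)
tab = anova(fit_flippers)

# Mean square = sum of squares / degrees of freedom
mean_sq = tab[["Sum Sq"]] / tab[["Df"]]

# F = mean square for species / mean square for residuals
f_manual = mean_sq[1] / mean_sq[2]
f_manual

# p-value: upper tail of the F distribution with the table's degrees of freedom
pf(f_manual, df1 = tab[["Df"]][1], df2 = tab[["Df"]][2], lower.tail = FALSE)
```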

Some questions about the ANOVA table:

  1. There are 3 penguin species. How many degrees of freedom are there for the species factor?
  2. What is the p-value for species?
  3. Do you conclude that there is a difference in flipper length among species?
    • Compare this result to the conditional boxplot above and think about whether it makes intuitive sense.

Follow up with a post-hoc test

TukeyHSD(aov(fit_flippers))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = fit_flippers)
## 
## $species
##                       diff       lwr       upr p adj
## Chinstrap-Adelie  5.869887  3.586583  8.153191     0
## Gentoo-Adelie    27.233349 25.334376 29.132323     0
## Gentoo-Chinstrap 21.363462 19.000841 23.726084     0

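It looks like all three species are significantly different from one another: every adjusted p-value is displayed as 0 (that is, very small), and none of the 95% confidence intervals includes zero.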
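
The Tukey output can also be checked programmatically by testing whether each confidence interval excludes zero. A sketch, with the model refit so the block runs on its own; `tukey`, `comparisons`, and `excludes_zero` are illustrative names:

```r
library(palmerpenguins)

fit_flippers = lm(flipper_length_mm ~ species, data = penguins)
tukey = TukeyHSD(aov(fit_flippers))

# One row per pair of species: difference, lower and upper bounds, adjusted p
comparisons = tukey$species

# TRUE when the 95% family-wise interval lies entirely on one side of zero
excludes_zero = comparisons[, "lwr"] > 0 | comparisons[, "upr"] < 0
excludes_zero
```

Calling `plot(tukey)` draws the same intervals, which makes it easy to see how far each one sits from zero.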

Species and Body Mass

Now it’s your turn to perform an ANOVA of penguin body mass and species.

Boxplot

First, create a conditional boxplot of penguin body mass grouped by species. You can use my code above as a guide; just make sure to adjust the title!

Your boxplot should look something like this:

[Figure: conditional boxplot of body mass grouped by species]

Assumptions

Now you need to check for homogeneity of variances with the Bartlett test.

Based on the p-value, do you think the variances are the same in all the groups?

Conduct the ANOVA

For the sake of this activity, we’ll say the assumptions are met.

Now, conduct the ANOVA (using my code above as a template).

Your ANOVA table should look like this:

## Analysis of Variance Table
## 
## Response: body_mass_g
##            Df    Sum Sq  Mean Sq F value    Pr(>F)    
## species     2 146864214 73432107  343.63 < 2.2e-16 ***
## Residuals 339  72443483   213698                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  • What do you conclude about body mass among the three species?

Report

Prepare a report that includes responses to the following:

  • Q1 (1 pt.): Conditional boxplot of body mass grouped by species.
  • Q2 (1 pt.): Show the code you used to perform the Bartlett test and the results of the test.
  • Q3 (2 pts.): Do you conclude the groups have equal variance? Why or why not?
  • Q4 (1 pt.): Show the ANOVA table and the code you used to construct it.
  • Q5 (2 pts.): Do you conclude there are differences in body mass among the three species? Explain your reasoning.
  • Q6 (0 pts.): (Optional) conduct a post-hoc analysis and describe which species (if any) have significantly different body masses.