Overview

In our scenario, we want to test the grazing preferences for cows in two different pastures, and with three grass height varieties. Embedded into our scenario are several questions of interest:

  • 1: Do cows prefer some grass varieties over others? Or does grass height not matter?
  • 2: Do cows prefer to graze in one pasture?
  • 3: Do the cows’ preferences for certain varieties change between the two pastures?

In ANOVA terms, we realize that our data has information about two factors: the grass height and the pasture.

In ANOVA terms we can re-formulate the questions as:

  • 1: Is there a significant effect of grass on grazing preferences?
  • 2: Is there a significant effect of pasture on grazing preferences?
  • 3: Is there an interaction between pasture and grass on grazing preferences?

The first two questions are straightforward, but the third is trickier.

Fortunately we can answer these questions with 2-way interactive ANOVA.

Recreate the grazing data

Build the data by parts

This code will create the grazing data for the examples below:

There are three grass varieties: short, medium, and tall

grass_levels = c("short", "med", "tall")

There are two pastures: upper and lower

pasture_levels = c("upper", "lower")

This is a shortcut to build a matrix of factor levels for a balanced anova design:

expand.grid(
  replicate = 1:3, 
  grass = grass_levels, 
  pasture = pasture_levels)
   replicate grass pasture
1          1 short   upper
2          2 short   upper
3          3 short   upper
4          1   med   upper
5          2   med   upper
6          3   med   upper
7          1  tall   upper
8          2  tall   upper
9          3  tall   upper
10         1 short   lower
11         2 short   lower
12         3 short   lower
13         1   med   lower
14         2   med   lower
15         3   med   lower
16         1  tall   lower
17         2  tall   lower
18         3  tall   lower

The grazer abundance data are:

abundance = c(9,11,6,14,17,19,28,31,32,7,6,5,14,17,15,44,38,37)

Finally we can combine everything into a single data frame:

grazing_data =
  data.frame(
    abundance, 
    expand.grid(
      replicate = 1:3, 
      grass = grass_levels, 
      pasture = pasture_levels))

Create grazing data: complete

The complete code to build the grazing data

grass_levels = c("short", "med", "tall")
pasture_levels = c("upper", "lower")
expand.grid(
  replicate = 1:3, 
  grass = grass_levels, 
  pasture = pasture_levels)
   replicate grass pasture
1          1 short   upper
2          2 short   upper
3          3 short   upper
4          1   med   upper
5          2   med   upper
6          3   med   upper
7          1  tall   upper
8          2  tall   upper
9          3  tall   upper
10         1 short   lower
11         2 short   lower
12         3 short   lower
13         1   med   lower
14         2   med   lower
15         3   med   lower
16         1  tall   lower
17         2  tall   lower
18         3  tall   lower
abundance = c(9,11,6,14,17,19,28,31,32,7,6,5,14,17,15,44,38,37)
grazing_data =
  data.frame(
    abundance, 
    expand.grid(
      replicate = 1:3, 
      grass = grass_levels, 
      pasture = pasture_levels))
head(grazing_data)
  abundance replicate grass pasture
1         9         1 short   upper
2        11         2 short   upper
3         6         3 short   upper
4        14         1   med   upper
5        17         2   med   upper
6        19         3   med   upper

Factorial design

Our Factors

We know that our data contain two predictor variables:

  • grass (3 levels)
  • pasture (2 levels)

We like to use Analysis of Variance with data sets that have a factorial design, preferably a balanced factorial design.

  • Recall that factorial just means that we have some observations at each combination of factor levels.
  • Recall that balanced just means that we have the same number of observations at each combination of factor levels.

We call each possible factor level combination a cell.

Factors and observations

R has a convenient command we can use to check on the factorial structure:

xtabs(
  ~grass + pasture, 
  data = grazing_data)
grass upper lower
short short 3 3
med med 3 3
tall tall 3 3

It looks like we have some observations in each of the cells, and that the numbers of observations are the same in each cell.

We have a balanced, fully-crossed factorial sampling design (the best possible kind)!

Cells

To clarify what we mean by factor level combinations and cells, let’s look at a few examples in our data.

  • We have some observations of cows in the short grass region in the lower pasture. This cell combines the short level of the grass factor and the lower level of the pasture factor.

  • How would you describe the cell (in regular-person English) that contained data from the medium level of the grass and the upper level of the pasture factor?

Numerical Exploration

Pasture

Do the cows prefer the upper or lower pasture?

xtabs(abundance ~ pasture, data = grazing_data)
pasture Freq
upper 167
lower 183

It looks like they spend more time in the lower pasture, but the difference doesn’t seem very large. We’ll need to perform a statistical test to quantify our evidence.

Grass

Do cows like all of the grasses?

xtabs(abundance ~ grass, data = grazing_data)
grass Freq
short 44
med 96
tall 210

They seem to spend more time in the tall grass, but we need to perform a statistical test to quantify our evidence.

Both

We can examine both variables at the same time:
Pasture
grass upper lower
short 26 18
med 50 46
tall 91 119

NOTE: since we have a balanced design we can compare the counts in the cells directly.

Means

What was the mean number of cows observed per sample?
Pasture
grass upper lower
short 8.67 6.00
med 16.67 15.33
tall 30.33 39.67

If we had an unbalanced design, the table of means would allow us to compare cells directly, while the table of sums would be misleading. Why?

Graphical Exploration

Base R

These are very basic boxplots. You would want to customize them for inclusion in assignments or publications.

par(mfrow = c(1, 2))
boxplot(
  abundance ~ grass, 
  data = grazing_data)
boxplot(
  abundance ~ pasture, 
  data = grazing_data)

boxplot(
  abundance ~ grass + pasture,
  data = grazing_data)

Ggplot

Ggplot makes nicer looking boxplots with much shorter code than base R. The syntax is very different from the code we’ve used in class.

I’m providing these as examples (with a template below), but we won’t go over details of ggplot syntax in class.

Part of the ggplot philosophy is that plots can be built up from individual components. You can combine these individual components to create a final product.

Build graphics objects

We’ll be using the grazing data for all of our plots. For each boxplot, the abundance column will be displayed on the y-axis. We can create a ggplot object that keeps track of the data frame and the response variable:

gg_graze = ggplot(grazing_data, aes(y = abundance))

We can create a theme object to apply to all of the final graphs. If we wanted to, we could change our theme, and the easily apply the changes to each plot we create. For now, we’ll jsut specify that the legend will go on the bottom:

t1 = theme(legend.position = "bottom")

Single-factor boxplots

To make the boxplots, we can use geom_boxplot() to extend our basic ggplot object.

  • We’ll use aes() to specify the grouping factor using the x argument:
box_grass = 
  gg_graze + 
  geom_boxplot(aes(x = grass)) 

box_past =
  gg_graze + 
  geom_boxplot(aes(x = pasture))

Displaying a ggplot figure

To actually display the plot, simply call the object you created:

box_grass

  • Notice how the different levels of grass are separated along the x-axis.

Multi-panel display

par(mfrow = c(1, 2)) doesn’t work with ggplot objects.

Instead we can use the plot_grid() function from package cowplot.

plot_grid(box_grass, box_past, nrow = 1)

Models

First create the linear model objects:

fit_pasture = lm(
  abundance ~ pasture, 
  data = grazing_data)

fit_grass = lm(
  abundance ~ grass,
  data = grazing_data)

fit_additive = lm(
  abundance ~ pasture + grass, 
  data = grazing_data)

fit_interactive = lm(
  abundance ~ pasture * grass, 
  data = grazing_data)

1-Way: Pasture

Abundance modeled by pasture:

Analysis of Variance Table

Response: abundance
          Df  Sum Sq Mean Sq F value Pr(>F)
pasture    1   14.22  14.222  0.0874 0.7713
Residuals 16 2602.22 162.639               



1-Way: Grass

Abundance modeled by grass:

Analysis of Variance Table

Response: abundance
          Df  Sum Sq Mean Sq F value    Pr(>F)    
grass      2 2403.11 1201.56  84.484 6.841e-09 ***
Residuals 15  213.33   14.22                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2-Way: Additive

The model, without an interaction

Analysis of Variance Table

Response: abundance
          Df  Sum Sq Mean Sq F value    Pr(>F)    
pasture    1   14.22   14.22   1.000    0.3343    
grass      2 2403.11 1201.56  84.484 1.536e-08 ***
Residuals 14  199.11   14.22                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2-Way: Interactive

The model, with an interaction

Analysis of Variance Table

Response: abundance
              Df  Sum Sq Mean Sq  F value    Pr(>F)    
pasture        1   14.22   14.22   2.4615  0.142644    
grass          2 2403.11 1201.56 207.9615 4.863e-10 ***
pasture:grass  2  129.78   64.89  11.2308  0.001783 ** 
Residuals     12   69.33    5.78                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA Interaction Null Hypothesis

You can think of the null hypothesis for an interaction term as meaning that ‘the differences in group means for factor 1 are the same when the data are grouped by factor 2’ and vice versa.

It’s much easier to understand the null and alternative hypotheses graphically:

Interaction Plots With ggplot

Interaction plots are similar to the 2-factor boxplots, but they show group means by points and use lines to connect the points in the second group:

Interaction Plots

Interaction plots are similar to the 2-factor boxplots, but they show cell means by points and use lines to connect the points in the second group:

Build gg objects:

stat_summary() will calculate a statistic (such as mean, or standard deviation) then plot it.

Two-way interaction plots perform two groupings. We can vary which group is the color and which is along the x-axis:

  • x-axis grouping: pasture
gg_graze +
  stat_summary(mapping = aes(
    x = pasture, colour = grass, group = grass),
    fun = mean, geom = "line")

  • x-axis grouping: grass height
gg_graze +
  stat_summary(mapping = aes(
    x = grass, colour = pasture, group = pasture),
    fun = mean, geom = "line")