Introduction to Quantitative Ecology
In-class Contingency Tables and Chi-square tests:
The goals of this activity are disparate:
Load the palmerpenguins data using require()
Summarize categorical data using table()
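The table() function is helpful for two important tasks: counting the observations in each category of a single variable, and cross-tabulating two variables into a contingency table.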
For example, if I want to know how many penguins were observed in each year, I could use the following syntax:
require(palmerpenguins)
table(penguins$year)
##
## 2007 2008 2009
##  110  114  120
This tells me that there were 110 observations in 2007, 114 in 2008, and 120 in 2009.
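As a minimal sketch of the same idea on data small enough to check by hand, the site vector below is hypothetical and not part of palmerpenguins. It shows that table() returns a named set of counts, so a single count can be pulled out by its category name.

```r
# Hypothetical data: the site where each of six animals was observed
site = c("north", "south", "north", "east", "south", "north")

# Count the observations in each category
site_counts = table(site)
site_counts

# Pull out one count by its category name
site_counts["north"]
```

The same indexing should work on any one-variable table, which is handy when only one category is of interest.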
Question: Use table() to determine how many penguins were counted on each of the three islands. How many penguins were observed on Dream island?
I can use table() to build a contingency table, also known as a two-way table. For example, if I wanted to build a contingency table showing the counts of male and female penguins that were observed each year, I could use:
require(palmerpenguins)
head(penguins)
## # A tibble: 6 x 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm
##   <fct>   <fct>              <dbl>         <dbl>             <int>
## 1 Adelie  Torgersen           39.1          18.7               181
## 2 Adelie  Torgersen           39.5          17.4               186
## 3 Adelie  Torgersen           40.3          18                 195
## 4 Adelie  Torgersen           NA            NA                  NA
## 5 Adelie  Torgersen           36.7          19.3               193
## 6 Adelie  Torgersen           39.3          20.6               190
## # ... with 3 more variables: body_mass_g <int>, sex <fct>, year <int>
table(penguins$sex, penguins$year)
##
##          2007 2008 2009
##   female   51   56   58
##   male     52   57   59
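The counts in this table sum to 333, fewer than the 344 observations in the yearly counts (110 + 114 + 120), because table() leaves out missing values (NA) by default; adding useNA = "ifany" to the call would show them as an extra row. The sketch below rebuilds the contingency table from the printed counts, so it runs without the package, and then adds totals with addmargins() and within-year proportions with prop.table(). The object name sex_year_counts is only an illustration.

```r
# Rebuild the sex-by-year table from the counts printed above
sex_year_counts = as.table(matrix(
  c(51, 52, 56, 57, 58, 59),
  nrow = 2,
  dimnames = list(
    sex = c("female", "male"),
    year = c("2007", "2008", "2009"))))

# Add row and column totals
addmargins(sex_year_counts)

# Proportion of each sex within each year (each column sums to 1)
prop.table(sex_year_counts, margin = 2)
```

Proportions are often easier to compare across years than raw counts, because the number of penguins observed differs from year to year.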
We can test whether significant associations occur in a contingency table using the chi-square test.
The syntax is pretty simple in R: the function is just chisq.test().
chisq.test() works with the output from the table() function:
sex_year_table = table(penguins$sex, penguins$year)
chisq.test(sex_year_table)
##
## Pearson's Chi-squared test
##
## data: sex_year_table
## X-squared = 7.8283e-05, df = 2, p-value = 1
The test statistic for the chi-square test is just the X-squared value in the test output. We don’t usually try to interpret the value directly, but notice here that the value is quite small: approximately 0.00008.
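The printed output is a summary of a larger result object. As a sketch, assuming the same sex-by-year counts shown earlier (rebuilt by hand here so the code runs on its own; the names counts and test_result are illustrative), the individual pieces can be pulled out with $. Comparing the observed counts with the expected counts suggests why the statistic is so small: the two are nearly identical.

```r
# Rebuild the sex-by-year counts so this example is self-contained
counts = as.table(matrix(
  c(51, 52, 56, 57, 58, 59),
  nrow = 2,
  dimnames = list(
    sex = c("female", "male"),
    year = c("2007", "2008", "2009"))))

test_result = chisq.test(counts)

# Individual pieces of the test output
test_result$statistic   # the X-squared value
test_result$parameter   # degrees of freedom: (2 - 1) * (3 - 1)
test_result$p.value

# Counts expected if sex and year were independent
test_result$expected

# Differences between observed and expected counts
counts - test_result$expected
```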
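One possible way you could frame the null and alternative hypotheses for this test might be:
Null hypothesis: There is no association between the row variable and the column variable.
Alternative hypothesis: There is an association between the row variable and the column variable.
Note that these hypotheses are pretty generic. A better set of hypotheses would be:
Null hypothesis: The proportions of female and male penguins observed did not differ among years.
Alternative hypothesis: The proportions of female and male penguins observed differed among years.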
Let’s experiment with some ways to customize scatterplots in R.
We’ll use the following basic scatterplot of penguin body mass and bill length as an example. You should copy my code and use it as a template for your explorations.
plot(
  bill_length_mm ~ body_mass_g,
  data = penguins)
The first elaboration we’ll look at is changing the size of the plotting symbol using the cex argument. For example, if I wanted to make the points twice as large, I could use cex = 2:
plot(
  bill_length_mm ~ body_mass_g,
  data = penguins,
  cex = 2)
I can use the pch argument to change the shape of the plotting symbol:
plot(
  bill_length_mm ~ body_mass_g,
  data = penguins,
  cex = 1.2,
  pch = 16)
I happen to like plotting symbol 16 because it is a filled circle.
Here’s a guide to the basic plotting symbols available in R.
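If a guide is not at hand, one can be drawn directly in R. The sketch below is only one way to lay it out: it plots symbols 0 through 25 in a grid and labels each with its pch code. The grid positions and the name pch_codes are arbitrary choices made for this illustration.

```r
# Plotting symbol codes to display
pch_codes = 0:25

# Arrange the symbols in rows of nine
x = (pch_codes %% 9) + 1
y = 3 - (pch_codes %/% 9)

plot(
  x, y,
  pch = pch_codes,
  cex = 2,
  col = "steelblue",
  bg = "gray80",   # fill color, used only by symbols 21 through 25
  xlim = c(0.5, 9.5), ylim = c(0.5, 3.5),
  axes = FALSE, xlab = "", ylab = "",
  main = "pch codes 0 to 25")

# Label each symbol with its code
text(x, y - 0.3, labels = pch_codes)
```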
It’s easy to change the plotting symbol color using the col argument.
There are several ways you can specify a color. The first is using a numeric code. The code 2 stands for red:
plot(
  bill_length_mm ~ body_mass_g,
  data = penguins,
  cex = 1.2,
  pch = 17,
  col = 2)
You should experiment with some different numbers to see what colors you get!
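One way to see what the numbers mean without guessing is to ask R for its current palette. The sketch below prints the colors that codes 1 through 8 refer to and draws one point per code. The exact shades depend on the R version and on whether the palette has been changed, so treat the picture as a lookup for the current session only.

```r
# The colors that the numeric codes 1 to 8 currently refer to
palette()

# One large point per code, labeled with its number
codes = 1:8
plot(
  codes, rep(1, 8),
  col = codes,
  pch = 16,
  cex = 3,
  ylim = c(0.5, 1.5),
  axes = FALSE, xlab = "", ylab = "")
text(codes, 0.8, labels = codes)
```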
R also understands the names of many colors.
Here’s a guide to the named colors in R.
I like the steelblue color:
plot(
  bill_length_mm ~ body_mass_g,
  data = penguins,
  cex = 1.2,
  pch = 17,
  col = "steelblue")
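Color names can be checked before they are used. As a small sketch, the code below confirms that "steelblue" is a name R recognizes, shows its red, green, and blue components, and builds the equivalent hexadecimal string, which is another form the col argument accepts. The x and y values are hypothetical and stand in for real data.

```r
# Check that a name is one R recognizes
"steelblue" %in% colors()

# Red, green, and blue components of a named color (0 to 255)
col2rgb("steelblue")

# The same color written as a hexadecimal string
steelblue_hex = rgb(70, 130, 180, maxColorValue = 255)
steelblue_hex

# Hypothetical data, used only to show the hexadecimal form in plot()
x = 1:10
y = x^2
plot(x, y, pch = 17, cex = 1.2, col = steelblue_hex)
```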