Learning Objectives

  • Practice interpreting model coefficient tables for categorical and continuous predictors.

Data

For this exercise, we’ll use the famous Iris dataset. It’s one of R’s built-in datasets. Use the data() function to load it:

data(iris)

Model Coefficients: Categorical Predictor

We’ll practice using model coefficients to make predictions.

Let’s fit a simple linear model of sepal length as predicted by species:

fit_species = 
  lm(
    Sepal.Length ~ Species,
    data = iris)

And the model coefficient table:

summary(fit_species)
## 
## Call:
## lm(formula = Sepal.Length ~ Species, data = iris)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6880 -0.3285 -0.0060  0.3120  1.3120 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         5.0060     0.0728  68.762  < 2e-16 ***
## Speciesversicolor   0.9300     0.1030   9.033 8.77e-16 ***
## Speciesvirginica    1.5820     0.1030  15.366  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5148 on 147 degrees of freedom
## Multiple R-squared:  0.6187, Adjusted R-squared:  0.6135 
## F-statistic: 119.3 on 2 and 147 DF,  p-value: < 2.2e-16

Some questions to consider from the model table:

  • What is the base case (the reference level) for species?
  • What is the mean sepal length for the base species?
  • How could you calculate the mean sepal length for Iris virginica?
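As a worked illustration of how the coefficients combine, here is a minimal sketch for Iris versicolor (not one of the species asked about above). In the table, each species row is an offset from the intercept, so 5.0060 + 0.9300 = 5.9360. The names b, mean_versicolor and mean_check are made up for this example.

```r
data(iris)
fit_species = lm(Sepal.Length ~ Species, data = iris)

# Named vector of estimates: (Intercept), Speciesversicolor, Speciesvirginica
b = coef(fit_species)

# Mean of a non-base species = intercept + that species' coefficient
mean_versicolor = unname(b["(Intercept)"] + b["Speciesversicolor"])

# Cross-check against the group mean computed directly from the data
mean_check = mean(iris$Sepal.Length[iris$Species == "versicolor"])

print(mean_versicolor)
print(mean_check)
```

The two printed values should agree, because a linear model with a single categorical predictor fits one mean per group.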

Model Coefficients: Continuous Predictor

The Iris dataset contains measurements of four floral characteristics: sepal length, sepal width, petal length, and petal width (all in cm).

We’ll fit a model of petal width as predicted by petal length.

First, let’s look at a scatterplot:

plot(
  Petal.Width ~ Petal.Length,
  data = iris,
  xlab = "Petal Length (cm)",
  ylab = "Petal Width (cm)")

Now you can fit a model of petal width as predicted by petal length.

Call your model fit_petals.

Use summary() to view the model table:

summary(fit_petals)

Some questions to consider from the model table:

  • What are the intercept and slope coefficients?
  • What is the expected width of a petal of length 0 cm?
  • What is the expected width of a petal of length 4 cm?
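For a continuous predictor, the expected value is the intercept plus the slope times the predictor value. The sketch below shows one possible way to fit the model and evaluate it at a hypothetical length of 2 cm, a value chosen only for illustration. The names x_new, by_hand and by_predict are made up for this example.

```r
data(iris)

# One way to fit the model described above
fit_petals = lm(Petal.Width ~ Petal.Length, data = iris)

# Named vector of estimates: (Intercept) and Petal.Length (the slope)
b = coef(fit_petals)

x_new = 2  # hypothetical petal length in cm, for illustration only

# Expected width by hand: intercept + slope * length
by_hand = unname(b["(Intercept)"] + b["Petal.Length"] * x_new)

# The same value from predict(), which applies the formula for you
by_predict = unname(predict(fit_petals, newdata = data.frame(Petal.Length = x_new)))

print(by_hand)
print(by_predict)
```

The two values should match. Doing the calculation by hand is the way to show your work, and predict() is a convenient check.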

Questions

Examine the model coefficient tables for the two models you created and use the values to answer the following questions.

Model 1: Sepal Length and Species

  • Q1 (1 pt.): What is the base species?
  • Q2 (1 pt.): What is the mean sepal length of the base species?
  • Q3 (1 pt.): What is the mean sepal length of Iris virginica? Show your calculation.
  • Q4 (1 pt.): Include a boxplot of sepal length conditioned on species in your report.
  • Q5 (1 pt.): Conduct a normality test on the residuals of the species/sepal length model and report the p-value. Do the residuals meet the assumption of normality? How do you know?
    • Hint: check out the residuals() and shapiro.test() functions.
  • Q6 (1 pt.): Given your boxplot and the results of your normality test, do you conclude that a linear model is appropriate? Why or why not?
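The sketch below shows the general pattern for a conditional boxplot and a residual normality test, using a different response (sepal width) so that it does not stand in for your own answers. The model fit_demo and the names res_demo and sw_demo are hypothetical and not part of the assignment.

```r
data(iris)

# Hypothetical demo model: sepal width (not sepal length) by species
fit_demo = lm(Sepal.Width ~ Species, data = iris)

# Conditional boxplot: one box per level of the grouping factor
boxplot(
  Sepal.Width ~ Species,
  data = iris,
  xlab = "Species",
  ylab = "Sepal Width (cm)")

# Extract the residuals and run a Shapiro-Wilk normality test on them
res_demo = residuals(fit_demo)
sw_demo = shapiro.test(res_demo)

# The p-value is stored in the p.value element of the test result
print(sw_demo$p.value)
```

The null hypothesis of the Shapiro-Wilk test is that the values come from a normal distribution, so a small p-value is evidence against normality.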

Model 2: Petal Width and Length

  • Q7 (1 pt.): What is the expected width of a petal of length 0 cm? Show your calculation.
  • Q8 (1 pt.): What is the expected width of a petal of length 4 cm? Show your calculation.
  • Q9 (1 pt.): Does the model meet the assumption of normality of the residuals? How do you know?