Learning Objectives

  • Practice plotting data in R
  • Save plots to files programmatically

Introduction

You’ll be working with the palmerpenguins dataset to practice creating and interpreting different plot types in R.

The purpose of this activity is for you to explore different types of plots and discuss the different kinds of insight they provide.

Instructions

Install the palmerpenguins package and dataset

You can check out the information about the data set here. It’s a nice alternative to the original iris data we’ve used before.

Before you can use it you’ll have to install the palmerpenguins package:

install.packages("palmerpenguins")

If you haven’t already, you should also install the here package while you’re at it:

install.packages("here")

Since these packages aren’t part of base R, you have to tell R that you want to use them with the require() or library() function. I prefer require():

require(palmerpenguins)
require(here)
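The two loading functions differ in how they react when a package is not available. The sketch below illustrates this; the package name "notARealPackage123" is made up for the example and is assumed not to exist on your machine.

```r
# "notARealPackage123" is a hypothetical name, used only to trigger the failure case.

# require() returns FALSE (and issues a warning) when the package cannot be loaded.
loaded <- suppressWarnings(require("notARealPackage123", quietly = TRUE))
print(loaded)

# library() stops with an error instead; tryCatch() captures that error here.
result <- tryCatch(
  {
    library("notARealPackage123")
    "loaded"
  },
  error = function(e) "error"
)
print(result)
```

Either function is fine for this activity. Just keep in mind that a script using require() keeps running after a failed load, so the first sign of trouble may be a later line that can't find a function.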

To quote or not to quote

Quotation marks are optional when loading installed packages, even though it seems like you should need them.

  • It’s also OK to include them.
  • This seems inconsistent because we had to use quotation marks to install the packages!

This will also run:

library("palmerpenguins")

The details are kind of complicated, but it has to do with how each function handles its argument.

  • library() and require() capture the package name exactly as you typed it, without evaluating it, so a bare name works just as well as a quoted one.
  • install.packages() evaluates its argument the way most R functions do and expects a character vector of package names, so you must use quotation marks.
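One practical consequence of these quoting rules shows up when a package name is stored in a variable. The sketch below uses the stats package only because it ships with R; the variable name pkg_name is an arbitrary choice for the example.

```r
# stats ships with every R installation, so this runs without installing anything.
pkg_name <- "stats"

# Without character.only = TRUE, library(pkg_name) would look for a package
# literally called "pkg_name". The argument tells library() to use the
# string stored in the variable instead.
library(pkg_name, character.only = TRUE)

# Check that the package is attached to the search path.
is_attached <- "package:stats" %in% search()
print(is_attached)
```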

Prepare the dataset

Use class() to check what kind of object penguins is.

  • The data are in a slightly different type of object than our familiar data.frame.
  • To avoid any difficulties, you need to convert it to a data.frame object:

penguins = data.frame(penguins)
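If you want to confirm that the conversion did what you expected, a quick check along these lines may help. It assumes palmerpenguins is installed, and it stores the result under a separate name (penguins_df, chosen just for this sketch) so that the before and after objects can be compared.

```r
library(palmerpenguins)

# Inspect the class of the object as it ships with the package.
class(penguins)

# Convert to a plain data.frame, keeping the original for comparison.
penguins_df <- data.frame(penguins)
class(penguins_df)

# The conversion should not change the number of rows or columns.
same_shape <- identical(dim(penguins), dim(penguins_df))
print(same_shape)
```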

Data exploration

Numerical exploration

Mean

Try to calculate the mean body mass of all the penguins:

mean(penguins$body_mass_g)
## [1] NA

Oops… What happened?

Use head() to preview the data. Do you see any potential issues?

Some functions, like mean(), return NA by default when the data contain missing values, i.e. elements that are NA.

  • Take a look at the help entry for mean().
  • Did you notice the description for the na.rm argument?
    • What does it do?
    • Try setting it to TRUE.
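Before deciding how to handle missing values, it can help to find out how many there are and where they sit. Here is a minimal sketch on a small made-up vector; the numbers are illustrative and are not taken from the penguins data.

```r
# A toy vector with one missing value (values invented for illustration).
body_mass <- c(3750, 3800, NA, 3450, 3650)

has_missing   <- anyNA(body_mass)         # is there at least one NA?
n_missing     <- sum(is.na(body_mass))    # how many elements are NA?
where_missing <- which(is.na(body_mass))  # which positions hold an NA?

print(has_missing)
print(n_missing)
print(where_missing)

# With the default arguments, a single NA is enough to make the mean NA.
print(mean(body_mass))
```

The same calls work on a data frame column such as penguins$body_mass_g.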

Try out the summary() function on the entire penguins data frame. This function provides a lot of information:

##       species          island    bill_length_mm 
##  Adelie   :152   Biscoe   :168   Min.   :32.10  
##  Chinstrap: 68   Dream    :124   1st Qu.:39.23  
##  Gentoo   :124   Torgersen: 52   Median :44.45  
##                                  Mean   :43.92  
##                                  3rd Qu.:48.50  
##                                  Max.   :59.60  
##                                  NA's   :2      
##  bill_depth_mm   flipper_length_mm
##  Min.   :13.10   Min.   :172.0    
##  1st Qu.:15.60   1st Qu.:190.0    
##  Median :17.30   Median :197.0    
##  Mean   :17.15   Mean   :200.9    
##  3rd Qu.:18.70   3rd Qu.:213.0    
##  Max.   :21.50   Max.   :231.0    
##  NA's   :2       NA's   :2        
##   body_mass_g       sex           year     
##  Min.   :2700   female:165   Min.   :2007  
##  1st Qu.:3550   male  :168   1st Qu.:2007  
##  Median :4050   NA's  : 11   Median :2008  
##  Mean   :4202                Mean   :2008  
##  3rd Qu.:4750                3rd Qu.:2009  
##  Max.   :6300                Max.   :2009  
##  NA's   :2

Graphical exploration

  • Try out some of the plots you already know:
    • pair plot
    • scatterplot
    • histogram
  • And some of the others we haven’t used in class yet:
    • boxplot()
    • coplot()

Boxplots

I’ll give you a template for the coplot and boxplot syntax:

boxplot(penguins$bill_depth_mm)

boxplot(bill_depth_mm ~ sex, data = penguins)

How are these two plots different?

  • What insight can you gain from each boxplot on its own, and from the two together?

  • It might be easier to compare them side-by-side:

par(mfrow = c(1, 2))
boxplot(penguins$bill_depth_mm)
boxplot(bill_depth_mm ~ sex, data = penguins)

See if you can figure out what the par(mfrow = c(1, 2)) code did. Folks who are in the lab course may be able to help.
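One practical note: settings changed with par() stay in effect for every later plot on the same graphics device. The sketch below shows one common pattern for putting things back afterwards. It assumes palmerpenguins is installed; old_par is just a name chosen for the example.

```r
library(palmerpenguins)

# par() returns the previous values of the settings it changes,
# so they can be stored and restored later.
old_par <- par(mfrow = c(1, 2))

boxplot(penguins$bill_depth_mm, ylab = "Bill depth (mm)")
boxplot(bill_depth_mm ~ sex, data = penguins, ylab = "Bill depth (mm)")

# Restore the settings that were in place before the change.
par(old_par)
restored <- par("mfrow")
print(restored)
```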

Coplots

Coplots can be difficult to interpret.

Here’s some sample code:

coplot(body_mass_g ~ bill_depth_mm | sex, data = penguins)

## 
##  Missing rows: 4, 9, 10, 11, 12, 48, 179, 219, 257, 269, 272

  • What variable did I use as the conditioning variable?
  • Try different conditioning variables.
    • You should try both categorical and numerical conditioning variables.

Saving plots to a file

There are several ways to export graphics, but the most repeatable is to use one of the image output functions, such as png(), jpeg(), or tiff(), because the export becomes part of your script.

I’ll demonstrate the png() function.

The process is easy:

  1. Call png() with appropriate arguments.
  2. Run the code you need to build your plot.
  3. Call dev.off() to close the graphics device, which tells R to finish writing the file.

PNG export demo:

require(here)
png(filename = here("basic_histogram.png"), width = 800, height = 600)
hist(penguins$body_mass_g)
dev.off()

Now you should find the image file in your ECO 602 R project directory, since here() builds the file path starting from the project’s root folder.
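If you would rather confirm from code that an export worked, a check like the one below is one option. This is only a sketch: it assumes palmerpenguins is installed, and it writes to R's temporary directory under a made-up file name so that it doesn't clutter your project. Swap in here("your_file_name.png") to save into the project instead.

```r
library(palmerpenguins)

# Hypothetical output location: a file in the session's temporary directory.
out_file <- file.path(tempdir(), "bill_depth_boxplot.png")

png(filename = out_file, width = 800, height = 600)  # 1. open the PNG device
boxplot(bill_depth_mm ~ sex, data = penguins)        # 2. draw the plot
dev.off()                                            # 3. close the device to write the file

# Confirm that the file exists and is not empty.
file_ok <- file.exists(out_file) && file.size(out_file) > 0
print(file_ok)
```

If you forget dev.off(), the file may be empty or incomplete, and later plots will keep going to the file instead of the plot window.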

If you like, you can check out a tutorial I created.

Questions and deliverables

As a group, select two different classes of plots.

Try plotting different combinations of variables.

Your group will submit a single document.

For the two classes of plots you chose:

  • Export your plots to image files and paste them into a document along with your responses to the questions below.
  • Describe whether the plot shows a summary of the data, or all of the data points.
  • Describe the insight your plot can provide, for example:
    • Are the data evenly distributed, or skewed?
    • If your plot contains more than one variable, does the plot reveal any interesting relationships?