You’ll be working with the palmerpenguins dataset to practice creating and interpreting different plot types in R.
The purpose of this activity is for you to explore different types of plots and discuss the different kinds of insight they provide.
palmerpenguins package and dataset
You can check out the information about the data set here. It’s a nice alternative to the original iris data we’ve used before.
Before you can use it you’ll have to install the palmerpenguins package:
install.packages("palmerpenguins")
If you haven’t already, you should also install the here package while you’re at it:
install.packages("here")
Since these packages aren’t part of base R, you have to tell R that you want to use them with the require() or library() function. I prefer require():
require(palmerpenguins)
require(here)
To Quote or not to quote
Quotation marks are optional when loading installed packages, even though it seems like you should need them.
- It’s also ok to include them.
- This seems inconsistent because we had to use quotation marks to install the packages!
This will also run:
library("palmerpenguins")
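As a small illustration of the quoting behavior, here is a sketch that uses the built-in stats package as a stand-in so it runs without installing anything. The variable pkg_name is hypothetical and only there for the example. The character.only argument of library() tells it to treat its argument as a character string, which matters when the package name is stored in a variable.

```r
# Both forms load the same package.
library(stats)
library("stats")

# When the package name is stored in a variable, set character.only = TRUE
# so that library() uses the variable's value rather than its name.
pkg_name = "stats"  # hypothetical variable, for illustration only
library(pkg_name, character.only = TRUE)

# Check that the package is attached to the search path.
"package:stats" %in% search()
```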
The details are kind of complicated, but it has to do with environments. Essentially, R keeps a list of all the packages you’ve installed.
install.packages().
Use class() to check what kind of object penguins is.
To make sure penguins is a plain data.frame, you can convert it to a data.frame object:
penguins = data.frame(penguins)
Try to calculate the mean body mass of all the penguins:
mean(penguins$body_mass_g)
## [1] NA
Oops… What happened?
Use head() to preview the data. Do you see any potential issues?
Some functions, like mean(), don’t work well with missing data, i.e. elements that are NA or NULL.
Check the help entry for mean(), in particular the na.rm argument.
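Here is a minimal sketch of how missing values affect mean(), using a small made-up vector (masses is a hypothetical stand-in, not the penguin data) so it runs without any packages.

```r
# A small stand-in vector with one missing value (made-up numbers).
masses = c(3500, 4000, NA, 4500)

# is.na() flags the missing elements; sum() counts them.
sum(is.na(masses))

# By default, a single NA makes the result NA.
mean(masses)

# na.rm = TRUE drops the missing values before computing the mean.
mean(masses, na.rm = TRUE)
```

The same argument should work on the penguin data, for example mean(penguins$body_mass_g, na.rm = TRUE).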
Try out the summary() function on the entire penguins data frame. This function provides a lot of information:
##       species          island    bill_length_mm
##  Adelie   :152   Biscoe   :168   Min.   :32.10
##  Chinstrap: 68   Dream    :124   1st Qu.:39.23
##  Gentoo   :124   Torgersen: 52   Median :44.45
##                                  Mean   :43.92
##                                  3rd Qu.:48.50
##                                  Max.   :59.60
##                                  NA's   :2
##  bill_depth_mm   flipper_length_mm
##  Min.   :13.10   Min.   :172.0
##  1st Qu.:15.60   1st Qu.:190.0
##  Median :17.30   Median :197.0
##  Mean   :17.15   Mean   :200.9
##  3rd Qu.:18.70   3rd Qu.:213.0
##  Max.   :21.50   Max.   :231.0
##  NA's   :2       NA's   :2
##   body_mass_g       sex           year
##  Min.   :2700   female:165   Min.   :2007
##  1st Qu.:3550   male  :168   1st Qu.:2007
##  Median :4050   NA's  : 11   Median :2008
##  Mean   :4202                Mean   :2008
##  3rd Qu.:4750                3rd Qu.:2009
##  Max.   :6300                Max.   :2009
##  NA's   :2
boxplot()
coplot()
I’ll give you a template for the coplot and boxplot syntax:
boxplot(penguins$bill_depth_mm)
boxplot(bill_depth_mm ~ sex, data = penguins)
How are these two plots different?
What different insights can you gain from each boxplot, separately and together?
It might be easier to compare them side-by-side:
par(mfrow = c(1, 2))
boxplot(penguins$bill_depth_mm)
boxplot(bill_depth_mm ~ sex, data = penguins)
See if you can figure out what the par(mfrow = c(1, 2)) code did. Folks who are in the lab course may be able to help.
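One thing worth knowing while you experiment: settings changed with par() stay in effect for later plots on the same graphics device. The sketch below shows one common way to save and restore them. It uses the built-in iris data as a stand-in, and old_par is a hypothetical variable name.

```r
# par() invisibly returns the previous values of the settings it changes.
old_par = par(mfrow = c(1, 2))

# Two plots drawn while the changed setting is in effect.
# iris is a built-in dataset, used here only as a stand-in.
boxplot(iris$Sepal.Length)
boxplot(Sepal.Length ~ Species, data = iris)

# Restore the previous settings so later plots are not affected.
par(old_par)
```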
Coplots can be difficult to interpret
Here’s some sample code:
coplot(body_mass_g ~ bill_depth_mm | sex, data = penguins)
##
## Missing rows: 4, 9, 10, 11, 12, 48, 179, 219, 257, 269, 272
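The "Missing rows" message appears to list the rows that were left out because of missing values in the plotted variables. As a sketch of how you could find such rows yourself, the example below uses a small made-up data frame (dat and its columns are hypothetical stand-ins) and the base R function complete.cases().

```r
# A made-up data frame with a missing value in three of its four rows.
dat = data.frame(
  mass  = c(3500, NA, 4200, 5000),
  depth = c(18.1, 17.4, NA, 15.2),
  sex   = c("male", "female", "female", NA)
)

# complete.cases() is TRUE for rows with no missing values,
# so negating it and using which() gives the incomplete row numbers.
incomplete_rows = which(!complete.cases(dat))
incomplete_rows
```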
There are several methods of exporting graphics, but the best way is with one of the image output functions.
I’ll demonstrate the png() function.
The process is easy:
- Call png() with appropriate arguments.
- Run your plotting code.
- Call dev.off() to tell R to save the file.
PNG export demo:
require(here)
png(filename = here("basic_histogram.png"), width = 800, height = 600)
hist(penguins$body_mass_g)
dev.off()
Now you should find the image file in your ECO 602 R project directory.
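The same open, plot, close pattern applies to the other graphics device functions. As a sketch, here is the pdf() version. It writes to a temporary directory so it doesn't depend on your project setup, and it uses the built-in iris data as a stand-in. The file name is hypothetical. Note that pdf() takes its width and height in inches rather than pixels.

```r
# Build a path in R's temporary directory (hypothetical file name).
out_file = file.path(tempdir(), "demo_histogram.pdf")

# Open the PDF device; width and height are in inches.
pdf(file = out_file, width = 8, height = 6)

# Draw the plot; it goes to the file instead of the screen.
hist(iris$Sepal.Length)

# Close the device so the file is written.
dev.off()

# Confirm that the file was created.
file.exists(out_file)
```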
If you like, you can check out a tutorial I created.
As a group, select two different classes of plots.
Try plotting different combinations of variables.
Your group will submit a single document.
For the two classes of plots you chose: