Objectives

  • Data frame subsetting
  • Creating histograms

Instructions

Your group will submit a single document, but every member of your group needs to create their own markdown document and follow the code in the exercise.

Create an R Markdown File in RStudio

Create a new R Markdown file and name it in_class_histograms.Rmd.

Install the palmerpenguins R Package

One of the great things about R is that you can extend it by installing packages.

Just use the install.packages() function.

For this exercise your group will need to install the palmerpenguins package, which contains data collected from 3 species of penguin.

install.packages("palmerpenguins")

You only need to install a package once. After the package installs, you should put a comment character (#) at the beginning of the line so that you don’t re-install the package every time you run your code!
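
Commenting the line out is the simplest approach. A slightly more defensive sketch, not part of the original exercise, checks whether the package is already available and installs it only when it is missing. The repos URL is an assumption (the CRAN cloud mirror); RStudio usually has a mirror configured already, in which case you can leave that argument out.

```r
# Option 1: comment the line out after the first successful install.
# install.packages("palmerpenguins")

# Option 2 (a sketch): install only when the package is missing.
# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
if (!requireNamespace("palmerpenguins", quietly = TRUE)) {
  # The repos argument is an assumption; any CRAN mirror should work.
  install.packages("palmerpenguins", repos = "https://cloud.r-project.org")
}
```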

Load a Package

After you’ve successfully installed a package, you have to tell R to load it into memory using the library() or require() functions.

  • I prefer to use require():
require("palmerpenguins")
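
The two functions behave differently when a package is missing: library() stops with an error, while require() gives a warning and returns FALSE. A minimal sketch of checking that return value (the variable name loaded is only for illustration):

```r
# require() returns TRUE if the package was loaded and FALSE (with a warning) if not.
loaded <- suppressWarnings(require("palmerpenguins"))

if (!loaded) {
  message("palmerpenguins is not installed yet; run install.packages() first.")
}
```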

Inspect the Penguins Data Frame

Now you’ll have the penguins data frame loaded into R’s memory.

  • Take a look at the Environment window in the upper right panel of RStudio.

Use the head() function to print the first 6 rows of the penguins data frame.

  • I’ll let you figure out the code for this.
  • Note the names of the columns.
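
If you are unsure how head() behaves, here is a sketch on a different data frame, the built-in mtcars data set, so the penguins code is still yours to write. names() is an extra base R function, not mentioned above, that lists the column names.

```r
# mtcars ships with base R, so no package is needed for this example.
first_rows <- head(mtcars)   # head() returns the first 6 rows by default
print(first_rows)

# names() returns the column names as a character vector.
print(names(mtcars))

# The n argument changes how many rows head() returns.
print(head(mtcars, n = 3))
```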

Subset By Name

Recall from the DataCamp assignment that you can pull out a named column of a data frame using the dollar sign symbol, followed by the name of the column.

Try it for the body_mass_g column.
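
As a reminder of the syntax, here is the same idea on the built-in mtcars data frame rather than penguins. The result of the dollar sign is a plain vector, which is the kind of input hist() works with.

```r
# Pull out the mpg (miles per gallon) column of the built-in mtcars data frame.
mpg_values <- mtcars$mpg

# The result is a numeric vector with one element per row of the data frame.
print(class(mpg_values))
print(length(mpg_values) == nrow(mtcars))
```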

Histograms

There are 4 numeric columns in the penguins data:

  • bill_length_mm
  • bill_depth_mm
  • flipper_length_mm
  • body_mass_g

Each member of your group needs to choose one of the columns and build a histogram. If your group has more than 4 members, then two members may choose the same column.

Remember how to use the dollar sign to subset a named column?

Here’s a very basic histogram of body mass:

hist(penguins$body_mass_g)

Note:

  • The ugly title
  • The ugly x-axis name

Each group member needs to plot a histogram of their chosen column, with the following customizations:

  1. The plot title should include your first name.
  2. The x-axis needs to have a better title.

Here’s how I could use the main argument to customize my body mass histogram with my name:

hist(
  x = penguins$body_mass_g,
  main = "Mike's Histogram of Penguin Mass")

You can also use the xlab argument to customize the x-axis name.

See if you can re-create this plot without seeing the code:

Additional plot options

If your group finishes all of your histograms early and wants to try some more histogram customizations, try out some of the other arguments to hist(), for example:

  • ylab
  • col
  • breaks
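
A minimal sketch of these three arguments, using simulated values rather than the penguins data so that the block runs on its own. The vector name fake_mass_g, the color, and the number of breaks are all arbitrary choices for illustration.

```r
# Simulated data (not real penguin measurements), so no package is required.
set.seed(1)
fake_mass_g <- rnorm(n = 200, mean = 4200, sd = 800)

h <- hist(
  x = fake_mass_g,
  main = "Histogram of Simulated Masses",
  ylab = "Number of observations",  # y-axis label
  col = "steelblue",                # fill color of the bars
  breaks = 20)                      # suggested number of bins; R may adjust it
```

hist() treats a single number for breaks as a suggestion and picks "pretty" break points near it, so the number of bars may not be exactly 20. Swap in your own penguins column for fake_mass_g to try it on real data.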

Group R-Script

Your group will submit an R-script and at least one question for me, for discussion on Thursday. The script should include:

  1. Your group members’ names, in comments, at the top of the script
  2. Code to load the palmerpenguins package
  3. Code to plot the histograms:
     • Custom title
     • Custom x-axis
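
One possible layout for the group script, as a sketch only. The names and the histogram slots are placeholders; replace each commented slot with that member's own customized hist() call.

```r
# Group members: Name One, Name Two, Name Three  (placeholders)

# Load the palmerpenguins package (it must already be installed).
require("palmerpenguins")

# Histogram 1: <member name>, <column name>
# hist(...)   # paste this member's customized hist() call here

# Histogram 2: <member name>, <column name>
# hist(...)
```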

Rubric: 20 points possible

  • 2 pts. Group members’ names in comments
  • 2 pts. Load the package
  • 10 pts. Code runs and produces customized histograms.
  • 6 pts. Group question(s) for me