Importing data using read.csv() and here()

You’ve already installed them, but just in case, here’s a reminder:

You’ll need to install the here and palmerpenguins packages before you can complete this assignment.

Remember how we installed the palmerpenguins package?

install.packages("palmerpenguins")

You can use similar syntax to install the here package.

  • Note the use of quotation marks and lowercase letters.
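
If you aren’t sure whether a package is already installed, a check like the one below can save you a re-install. This is only a sketch: pkg_name and is_installed are names I made up for illustration, and I’ve used the built-in stats package as a stand-in so the code runs without downloading anything. Swap in the name of the package you need.

```r
# Name of the package to check, in quotation marks.
# "stats" ships with R, so it stands in for the package you actually need.
pkg_name = "stats"

# requireNamespace() returns TRUE if the package is installed and can be
# loaded, and FALSE otherwise.
is_installed = requireNamespace(pkg_name, quietly = TRUE)

# Only install when the package is missing.
if (!is_installed) {
  install.packages(pkg_name)
}

print(is_installed)
```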

Data

Data file

You’ll need to download the grazing_data.csv data file and save it in your data subdirectory.

Loading the here package

Do you remember how you used the require() function to load the palmerpenguins package?

This is the syntax you used to load the palmerpenguins package:

require(palmerpenguins)

Use similar syntax to load the here package.

  • Note that require() doesn’t need quotation marks around the package name.
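
If you’re curious what require() gives back, here is a small sketch. It uses the built-in stats package as a stand-in (my choice, not part of the assignment) so that it runs on any R installation, and loaded is just an illustrative variable name.

```r
# require() loads a package and returns TRUE if it worked, FALSE if it didn't.
# No quotation marks are needed around the package name.
loaded = require(stats)
print(loaded)

# library() also loads a package, but it stops with an error
# (instead of returning FALSE) when the package isn't installed.
library(stats)
```

Because require() only returns FALSE when a package is missing, a typo in the package name can go unnoticed until a later line fails.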

Reading the data

To read data from a CSV file, you’ll use the here() and read.csv() functions.

Review the instructions for the week 3 pre-class assignment if you need a refresher.

Read the data file into a data.frame object called grazing_dat.
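
As a rough illustration of how the two functions fit together, the sketch below writes a tiny made-up CSV file to a temporary folder and reads it back. The file name toy_example.csv and the objects toy, toy_path, and toy_dat are hypothetical and are not part of the assignment. The call to here() is wrapped in a check so the code still runs if the here package isn’t installed.

```r
# Write a tiny, made-up CSV file to a temporary folder so the example is
# self-contained.
toy = data.frame(id = 1:3, count = c(4, 8, 15))
toy_path = file.path(tempdir(), "toy_example.csv")
write.csv(toy, toy_path, row.names = FALSE)

# read.csv() takes the path to a file and returns a data.frame.
toy_dat = read.csv(toy_path)
print(class(toy_dat))

# here() builds a path that starts at your project's main folder.
# This is the path to a hypothetical file in a "data" subdirectory:
if (requireNamespace("here", quietly = TRUE)) {
  print(here::here("data", "toy_example.csv"))
}
```

The path that here() returns is what you would hand to read.csv() in place of toy_path.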

Previewing a data.frame

To test that you’ve read the file correctly, run the following code to preview the first six lines:

head(grazing_dat)
##   X abundance replicate grass pasture
## 1 1         9         1 short   upper
## 2 2        11         2 short   upper
## 3 3         6         3 short   upper
## 4 4        14         1   med   upper
## 5 5        17         2   med   upper
## 6 6        19         3   med   upper

By default, the head() function returns the first six rows of a data.frame object, which R then prints to the console.
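
If you want a different number of rows, head() has an n argument. Here is a minimal sketch using a made-up data frame (toy_dat and its columns are invented for this example):

```r
# A made-up data frame with ten rows.
toy_dat = data.frame(
  plot = 1:10,
  count = c(3, 7, 2, 9, 4, 6, 8, 1, 5, 10)
)

# With no extra arguments, head() keeps the first six rows.
first_six = head(toy_dat)
print(nrow(first_six))

# The n argument changes how many rows are kept.
first_three = head(toy_dat, n = 3)
print(first_three)
```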

Extracting a column from a data.frame

You may recall from the DataCamp assignment that there are two primary ways to subset columns from a data frame:

By Name

You can retrieve a named column using the dollar sign. This method searches for a column in the data frame with a matching name. If found, it will print out the contents of the column. For example:

grazing_dat$abundance
##  [1]  9 11  6 14 17 19 28 31 32  7  6  5 14 17 15 44 38 37

returns the contents of the abundance column.

How would you retrieve the pasture column?
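
Here is a short sketch of name-based retrieval on a made-up data frame (toy_dat and its columns are invented for illustration). It also shows double square brackets with a quoted name, which is another way to ask for a column by name, and what happens when no column matches.

```r
# A made-up data frame for illustration.
toy_dat = data.frame(
  plot = 1:4,
  height = c("low", "low", "high", "high"),
  count = c(3, 7, 2, 9)
)

# The dollar sign retrieves a column by name (no quotation marks).
by_dollar = toy_dat$count

# Double square brackets with a quoted name retrieve the same column.
by_brackets = toy_dat[["count"]]
print(identical(by_dollar, by_brackets))

# If no column has a matching name, the result is NULL rather than an error.
missing_col = toy_dat$weight
print(is.null(missing_col))
```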

By Position

You can use square brackets to retrieve one or more columns by their position.

The following retrieves the first column of grazing_dat:

grazing_dat[, 1]
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18

while this syntax retrieves the second and fourth columns:

grazing_dat[, c(2, 4)]
##    abundance grass
## 1          9 short
## 2         11 short
## 3          6 short
## 4         14   med
## 5         17   med
## 6         19   med
## 7         28  tall
## 8         31  tall
## 9         32  tall
## 10         7 short
## 11         6 short
## 12         5 short
## 13        14   med
## 14        17   med
## 15        15   med
## 16        44  tall
## 17        38  tall
## 18        37  tall
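
You may have noticed that the single-column result above printed as a plain vector, while the two-column result printed as a table. The sketch below shows the difference on a made-up data frame (toy_dat and the other object names are invented for illustration), along with the drop argument, which keeps a single column as a data frame.

```r
# A made-up data frame for illustration.
toy_dat = data.frame(
  plot = 1:4,
  height = c("low", "low", "high", "high"),
  count = c(3, 7, 2, 9)
)

# One position: the result is a plain vector.
one_col = toy_dat[, 3]
print(is.data.frame(one_col))

# Several positions: the result is still a data frame.
two_cols = toy_dat[, c(1, 3)]
print(is.data.frame(two_cols))

# drop = FALSE keeps a single column as a one-column data frame.
one_col_df = toy_dat[, 3, drop = FALSE]
print(is.data.frame(one_col_df))
```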

Subsetting data.frame rows with logical tests

We’ll use the penguins data for an example. Run the following code to turn the data into a data.frame before you start:

require(palmerpenguins)
penguins = data.frame(penguins)

Recall that we can use $ or [] to extract entire columns from a data frame. For example, I can pull out the flipper length column:

penguins$flipper_length_mm

I can also use the subset() function along with a logical test to pull out the rows that meet criteria I specify.

For example, I can pull out all the penguins that were measured on Torgersen Island:

subset(penguins, island == "Torgersen")

Then I could plot a histogram of their body masses:

torger_penguins = subset(penguins, island == "Torgersen")
hist(
  x = torger_penguins$body_mass_g,
  main = "Body mass of penguins on Torgersen Island",
  xlab = "body mass (g)"
)

Things to note from the example:

  • The first argument to subset() is the data frame I want to subset.
  • The second argument is a logical test for equality.
  • The syntax for the equality test uses a “double equals” sign: ==
  • The name of the island is in quotes because I wanted to match the literal text of the island’s name.
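
To see what the logical test is doing, it can help to run it by itself. The sketch below uses a made-up data frame (toy_dat, its columns, and the other object names are invented for illustration): the test produces one TRUE or FALSE per row, and subset() keeps the rows where the test is TRUE.

```r
# A made-up data frame for illustration.
toy_dat = data.frame(
  site = c("north", "south", "north", "south"),
  count = c(3, 7, 2, 9)
)

# The equality test on its own: one TRUE or FALSE for each row.
is_north = toy_dat$site == "north"
print(is_north)

# subset() keeps only the rows where the test is TRUE.
north_dat = subset(toy_dat, site == "north")
print(north_dat)

# Square brackets with the logical vector select the same rows.
north_dat_brackets = toy_dat[is_north, ]
print(nrow(north_dat_brackets))
```

Note that inside subset() I can write site by itself, while outside of it I need toy_dat$site.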

Report

Your group will submit code that accomplishes the following tasks:

  • Q1 (1 pt.): Reads the data file into a data.frame object called grazing_dat using only the functions here() and read.csv().
    • You cannot use read_csv() (with an underscore) or file.choose().
  • Q2 (1 pt.): Prints the first six lines of grazing_dat.
  • Q3 (1 pt.): Retrieves the grass column from grazing_dat.
  • Q4 (1 pt.): Creates a histogram of the flipper lengths of penguins on Torgersen Island. Your histogram must have an appropriate x-axis label and a descriptive title. Hint: use the code I provided above as a template.

Submit your answers as a knitted HTML file on Moodle.

Optional challenge question

Challenge: Using two successive calls to subset(), can you create a histogram of the body masses of only the male penguins on Dream Island?