In-class File Import and Logical Subset Exercise
Introduction to Quantitative Ecology: read.csv() and here()
You’ve already installed them, but just in case, here’s a reminder: you’ll need to install the here and palmerpenguins packages before you can complete this assignment.
Remember how we installed the penguins package?
install.packages("palmerpenguins")
You can use similar syntax to install the here package.
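If you aren't sure whether a package is already installed, a minimal sketch like the one below reports which packages are missing without installing anything. It relies on requireNamespace(), which returns TRUE or FALSE instead of stopping with an error. The names needed, is_installed, and missing_pkgs are illustrative choices.

```r
# Packages this assignment relies on
needed <- c("here", "palmerpenguins")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Names of any packages that still need install.packages()
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

If anything is listed, pass those names to install.packages().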
You’ll need to download the grazing_data.csv data file and save it in your data subdirectory.
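Before moving on, you can confirm that the file landed where your code will look for it. This sketch uses only base R's file.path() and file.exists(), and it assumes your working directory is the project folder that contains the data subdirectory. Once the here package is loaded, here("data", "grazing_data.csv") builds the same path starting from the project root instead.

```r
# Build the relative path data/grazing_data.csv
# (assumes the working directory is the project folder)
csv_path <- file.path("data", "grazing_data.csv")

# TRUE if the file is in the expected place
found <- file.exists(csv_path)

if (!found) {
  message("Could not find ", csv_path, " relative to ", getwd())
}
```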
Load the here package
Do you remember how you used the require() function to load the palmerpenguins package?
This is the syntax you used to load the penguins package:
require(palmerpenguins)
Use similar syntax to load the here package.
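require() also returns TRUE or FALSE (invisibly), so you can check whether a package actually loaded. This minimal sketch uses palmerpenguins as the example; the variable name penguins_loaded is illustrative. Setting quietly = TRUE suppresses the startup message and, in most cases, the warning when a package is missing.

```r
# require() returns TRUE if the package was attached, FALSE if it could not be
penguins_loaded <- require(palmerpenguins, quietly = TRUE)

if (!penguins_loaded) {
  message("palmerpenguins is not installed; run install.packages(\"palmerpenguins\") first.")
}
```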
To read data from a CSV file, you’ll use the here() and read.csv() functions.
Review the instructions for the week 3 pre-class assignment if you need a refresher.
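As a reminder of the general pattern, the sketch below writes a tiny, made-up data set to a temporary CSV file and reads it back with read.csv(). In your own code, the path would come from here() pointing at the grazing data file rather than from tempfile(). The example_dat values are invented for illustration.

```r
# A small, made-up data set (not the grazing data)
example_dat <- data.frame(site = c("A", "B", "C"), count = c(4, 7, 2))

# Write it to a temporary CSV file; row.names = FALSE avoids writing the
# row names as an extra unnamed column, which read.csv() would label X
tmp_csv <- tempfile(fileext = ".csv")
write.csv(example_dat, tmp_csv, row.names = FALSE)

# read.csv() takes a file path and returns a data.frame
read_back <- read.csv(tmp_csv)
str(read_back)
```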
Read the data file into a data.frame object called grazing_dat.
To test that you’ve read the file correctly, run the following code to preview the first six rows:
head(grazing_dat)
## X abundance replicate grass pasture
## 1 1 9 1 short upper
## 2 2 11 2 short upper
## 3 3 6 3 short upper
## 4 4 14 1 med upper
## 5 5 17 2 med upper
## 6 6 19 3 med upper
By default, the head() function prints the first six rows of a data.frame object.
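head() takes an optional n argument if you want a different number of rows, and tail() shows the last rows instead. The sketch below uses grazing_demo, a hypothetical stand-in rebuilt from the X, abundance, and grass values printed in this exercise. The replicate and pasture columns are left out because they are not fully shown, so this is not the real file.

```r
# Hypothetical stand-in built from the values printed in this exercise
grazing_demo <- data.frame(
  X = 1:18,
  abundance = c(9, 11, 6, 14, 17, 19, 28, 31, 32, 7, 6, 5, 14, 17, 15, 44, 38, 37),
  grass = rep(rep(c("short", "med", "tall"), each = 3), times = 2)
)

head(grazing_demo, n = 3)  # first three rows
tail(grazing_demo)         # last six rows
nrow(grazing_demo)         # number of rows
```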
Subsetting data.frame columns
You may recall from the DataCamp assignment that there are two primary ways to subset columns from a data frame.
You can retrieve a named column using the dollar sign ($). R looks for a column in the data frame with a matching name and, if it finds one, returns the contents of that column; if no column matches, it returns NULL. For example:
grazing_dat$abundance
## [1] 9 11 6 14 17 19 28 31 32 7 6 5 14 17 15 44 38 37
returns the contents of the abundance column.
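The $ operator and double square brackets ([[ ]]) both return a column as a plain vector, which you can then pass to any vector function. This sketch uses the same hypothetical grazing_demo stand-in rebuilt from the printed values.

```r
# Hypothetical stand-in built from the values printed in this exercise
grazing_demo <- data.frame(
  X = 1:18,
  abundance = c(9, 11, 6, 14, 17, 19, 28, 31, 32, 7, 6, 5, 14, 17, 15, 44, 38, 37),
  grass = rep(rep(c("short", "med", "tall"), each = 3), times = 2)
)

# $ and [[ ]] return the same vector
abund <- grazing_demo$abundance
identical(abund, grazing_demo[["abundance"]])  # TRUE

# Once extracted, the column works with ordinary vector functions
mean(abund)
range(abund)  # smallest and largest values: 5 and 44
```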
How would you retrieve the pasture column?
You can also use square brackets to retrieve one or more columns by their position. The following retrieves the first column of grazing_dat:
grazing_dat[, 1]
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
while this syntax retrieves the second and fourth columns:
grazing_dat[, c(2, 4)]
## abundance grass
## 1 9 short
## 2 11 short
## 3 6 short
## 4 14 med
## 5 17 med
## 6 19 med
## 7 28 tall
## 8 31 tall
## 9 32 tall
## 10 7 short
## 11 6 short
## 12 5 short
## 13 14 med
## 14 17 med
## 15 15 med
## 16 44 tall
## 17 38 tall
## 18 37 tall
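Inside the square brackets, the slot after the comma selects columns and the slot before it selects rows. Columns can also be chosen by name. This sketch uses the hypothetical grazing_demo stand-in; because it omits the replicate column, grass is the third column here rather than the fourth.

```r
# Hypothetical stand-in built from the values printed in this exercise
grazing_demo <- data.frame(
  X = 1:18,
  abundance = c(9, 11, 6, 14, 17, 19, 28, 31, 32, 7, 6, 5, 14, 17, 15, 44, 38, 37),
  grass = rep(rep(c("short", "med", "tall"), each = 3), times = 2)
)

# Select columns by name instead of by position
by_name <- grazing_demo[, c("abundance", "grass")]

# Select rows 1 to 3 (before the comma) and two named columns (after it)
first_three <- grazing_demo[1:3, c("abundance", "grass")]
first_three
```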
Subsetting data.frame rows with logical tests
We’ll use the penguins data for an example. Run the following code to turn the data into a data.frame before you start:
require(palmerpenguins)
penguins = data.frame(penguins)
Recall that we can use $ or [] to extract entire columns from a data frame. For example, I can pull out the flipper length column:
penguins$flipper_length_mm
I can also use the subset() function along with a logical test to pull out rows that meet criteria that we specify.
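A logical test such as grass == "tall" produces one TRUE or FALSE value per row, and subset() keeps the rows where the test is TRUE. Inside subset(), column names can be used without $. The sketch below uses the hypothetical grazing_demo stand-in and also shows one difference worth knowing: subset() drops rows where the test is NA, while square-bracket indexing keeps them as rows of NAs. The small birds data frame is made up for illustration.

```r
# Hypothetical stand-in built from the values printed in this exercise
grazing_demo <- data.frame(
  X = 1:18,
  abundance = c(9, 11, 6, 14, 17, 19, 28, 31, 32, 7, 6, 5, 14, 17, 15, 44, 38, 37),
  grass = rep(rep(c("short", "med", "tall"), each = 3), times = 2)
)

# The logical test returns one TRUE/FALSE per row
is_tall <- grazing_demo$grass == "tall"
sum(is_tall)  # 6 rows are TRUE

# subset() and logical indexing keep the same rows here
tall_subset <- subset(grazing_demo, grass == "tall")
tall_brackets <- grazing_demo[is_tall, ]

# Made-up data with a missing value, similar to the NAs in penguins$sex
birds <- data.frame(sex = c("male", NA, "female"), mass_g = c(3500, 3600, 3700))
nrow(subset(birds, sex == "male"))   # 1: subset() drops the NA row
nrow(birds[birds$sex == "male", ])   # 2: brackets add a row of NAs
```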
For example, I can pull out all the penguins that were measured on Torgersen island:
subset(penguins, island == "Torgersen")
Then I could plot a histogram of their body masses:
torger_penguins = subset(penguins, island == "Torgersen")
hist(
x = torger_penguins$body_mass_g,
main = "Body mass of penguins on Torgersen Island",
xlab = "body mass (g)"
)
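hist() also returns the bins it used. With plot = FALSE, it skips drawing, which is a quick way to check how many values fall in each bin. This sketch uses the hypothetical grazing_demo stand-in, and the breaks vector is an arbitrary choice for illustration.

```r
# Hypothetical stand-in built from the values printed in this exercise
grazing_demo <- data.frame(
  X = 1:18,
  abundance = c(9, 11, 6, 14, 17, 19, 28, 31, 32, 7, 6, 5, 14, 17, 15, 44, 38, 37),
  grass = rep(rep(c("short", "med", "tall"), each = 3), times = 2)
)
tall_grass <- subset(grazing_demo, grass == "tall")

# breaks sets the bin edges; plot = FALSE returns the histogram object without drawing it
h <- hist(tall_grass$abundance, breaks = c(20, 30, 40, 50), plot = FALSE)
h$counts  # 1 4 1
```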
Things to note from the subset() example:
- The first argument to subset() is the data frame I want to subset.
- The logical test uses ==, a double equals sign.
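== is only one of R's comparison operators. This sketch uses the hypothetical grazing_demo stand-in to show a few others that work inside subset(); the thresholds are arbitrary.

```r
# Hypothetical stand-in built from the values printed in this exercise
grazing_demo <- data.frame(
  X = 1:18,
  abundance = c(9, 11, 6, 14, 17, 19, 28, 31, 32, 7, 6, 5, 14, 17, 15, 44, 38, 37),
  grass = rep(rep(c("short", "med", "tall"), each = 3), times = 2)
)

not_short   <- subset(grazing_demo, grass != "short")               # not equal to
over_30     <- subset(grazing_demo, abundance > 30)                 # greater than
med_or_tall <- subset(grazing_demo, grass %in% c("med", "tall"))    # matches any listed value
tall_low    <- subset(grazing_demo, grass == "tall" & abundance < 35)  # both must be TRUE
```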
Your group will submit code that accomplishes the following tasks:
- Read the data file into a data.frame object called grazing_dat using only the functions here() and read.csv(). Do not use read_csv (with an underscore) or file.choose().
- … grazing_dat.
- … grass column from grazing_dat.
Submit your answers as a knitted html file on Moodle.
Challenge: Using two successive calls to subset(), can you create a histogram of the body masses of only the male penguins on Dream Island?