For this exercise, you’re going to load an external dataset into R.
You’ll perform numerical and graphical analyses, and try to visually fit some models to data.
As a group, work through the following sections.
Everyone in your group should run the code on their own computer. Please work with your group members to work through any difficulties!
Make sure you keep track of any R code that works well for you. It’ll be helpful to have an arsenal of code examples you can use for other analyses!
You’ll be working with bird census data collected in Oregon. You can find information about the data in the birds_metadata.pdf file.
All the course data files are housed on the GitHub site under the Course Assignment Data tab of the Course Materials section.
You need to save all the data files to the data subdirectory of your main course RProject folder.
The data files you’ll work with are:
The metadata (data about data) file is:
You should generally use the read.csv() function in conjunction with the here() function to read the data files into data.frame objects.
If your data files are saved in your data subdirectory, the syntax should be very similar to that which you used in the in-class reading data exercise. Super easy!
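As a reminder of the pattern, here is a minimal sketch. It assumes the here package is installed and that you are working inside your course RProject. The file name my_file.csv and the object names are placeholders, not actual course files; substitute the real file name.

```r
# Assumes the 'here' package is installed.
library(here)

# Build a path to a file in the data subdirectory of the project.
# "my_file.csv" is a placeholder name, not one of the course data files.
path_example <- here("data", "my_file.csv")

# Only try to read the file if it actually exists at that path.
if (file.exists(path_example)) {
  dat_example <- read.csv(path_example)
  head(dat_example)
} else {
  message("No file found at: ", path_example)
}
```

Because here() builds the path from the project root, the same code works on every group member's computer, which is not true of a path picked interactively.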
If you’re using file.choose(), you’re doing it wrong. You’ll lose points if you submit work using file.choose() in this course.
Read the bird census data into a data.frame called dat_bird.
Read the habitat data into a data.frame called dat_habitat.
Look at the column names in the two habitat datasets. You can consult the metadata file to see what each column represents.
The names() function will return the column names of a data.frame object. It also works with other data structures which can have named elements, like lists.
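A quick illustration of names() on both kinds of object. The small data.frame and list here are made up for the demonstration and are not part of the course data.

```r
# A tiny made-up data.frame, just to demonstrate names()
dat_demo <- data.frame(site = c("a", "b"), count = c(3, 5))
names(dat_demo)   # "site" "count"

# names() also works on a list with named elements
lst_demo <- list(alpha = 1, beta = "two")
names(lst_demo)   # "alpha" "beta"
```

You would call names(dat_bird) and names(dat_habitat) in the same way once the data are loaded.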
Here’s a quick detour to show you how to create simple pair plots on the Palmer penguin dataset:
require(palmerpenguins)
pairs(penguins)
That created a pair plot of all the columns in penguins.
It’s a little bit unwieldy, so let’s simplify:
I can use the syntax below (subsetting by name) to include only bill, flipper, and body mass characteristics:
pairs(penguins[, c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g")])
At this point, you should be able to answer the following questions:
data.frame object?
data.frame?
Your group should explore the habitat data using pair plots of subsets of the columns in dat_habitat. Note that there are too many columns to create a meaningful pair plot of all of them; you’ll have to use subsetting to select the columns to plot.
## [1] "basin" "sub" "sta" "lat" "long" "elev" "slope" "aspect" "s.id" "s.edge" "p.edge"
## [12] "p.edge.1" "p.cwedge" "ba.con" "ba.hard" "ba.snag" "ba.tot" "ba.ratio" "snag.sml" "snag.ml" "snag.l" "snag.dc1"
## [23] "snag.dc2" "snag.dc4"
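The subsetting-by-name pattern from the penguins example carries over directly. Here is a minimal sketch that uses R's built-in mtcars data as a stand-in so that it runs without the course files. The column selection is purely illustrative, and cols_to_plot is just a name chosen for this example.

```r
# Store the column names you want to plot in a character vector.
cols_to_plot <- c("mpg", "disp", "hp", "wt")

# Check that every requested name really is a column before plotting;
# a misspelled name would otherwise cause an "undefined columns" error.
stopifnot(all(cols_to_plot %in% names(mtcars)))

# Pair plot of only the selected columns
pairs(mtcars[, cols_to_plot])
```

For the habitat data you would replace mtcars with dat_habitat and choose names from the output above, for example "elev", "slope", "aspect", and "ba.tot".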
Use the hist() function to explore the distributions of counts of birds at the various study sites.
I used the data in the CBCH column to create the following:
I used the arguments xlab = "Number of birds counted" and breaks = 0:7 - 0.5 with hist() to produce the following:
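It may help to see what breaks = 0:7 - 0.5 actually evaluates to. The sketch below uses a small made-up vector of counts (counts_demo), not the real CBCH column, to show that each whole-number count ends up centered in its own bin.

```r
# The colon operator is evaluated before the subtraction,
# so this is (0:7) - 0.5
breaks_demo <- 0:7 - 0.5
breaks_demo   # -0.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5

# Made-up counts for illustration only (not the course data)
counts_demo <- c(0, 0, 1, 2, 2, 2, 3, 6)

# Each bin is one unit wide and centered on a whole number from 0 to 6
h <- hist(
  counts_demo,
  breaks = breaks_demo,
  xlab = "Number of birds counted")

# Number of observations in each bin (counts of 0, 1, 2, ..., 6)
h$counts      # 2 1 3 1 0 0 1
```

Shifting the breaks by 0.5 keeps counts such as 1 and 2 from landing on a bin boundary, which is why the bars line up with the integer values on the x-axis.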
Create some histograms of bird counts. Experiment with the breaks argument.
You may want to try to re-create my histogram above as a first step.
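One way to experiment is to build the breaks from the largest count, so that they always cover the data. This is a sketch using a made-up vector (counts_demo) in place of a real species column; the names upper and breaks_ok are chosen just for this example.

```r
# Made-up counts for illustration only; note the largest value is 9
counts_demo <- c(0, 1, 1, 4, 9)

# Breaks that stop at 6.5 do not cover the value 9, so hist() fails.
# tryCatch() captures the error instead of halting the script.
attempt <- tryCatch(
  hist(counts_demo, breaks = 0:7 - 0.5),
  error = function(e) e)
inherits(attempt, "error")   # TRUE

# Find the highest count, then extend the breaks one step past it
upper <- max(counts_demo)
breaks_ok <- 0:(upper + 1) - 0.5

h <- hist(
  counts_demo,
  breaks = breaks_ok,
  xlab = "Number of birds counted")
```

With a real species column you would compute the upper limit from that column in the same way, for example with max() applied to the column of dat_bird you are plotting.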
If your breaks argument does not span the range of count values, you will get an error. You may need to try a higher upper limit.
Use the max() function to determine the highest number of counts in your bird species column.
Proceed as far as you can during the class session and include a report on your group’s progress in the Moodle assignment page. Only one group member needs to submit an answer.
Respond to the following questions: