Objectives

  • Practice R coding
  • More numerical and graphical data exploration
  • Pair plots and histograms

Overview

For this exercise, you’re going to load an external dataset into R.

You’ll perform numerical and graphical analyses, and try to visually fit some models to data.

As a group, work through the following sections.

Everyone in your group should run the code on their own computer. Please help each other work through any difficulties!

Make sure you keep track of any R code that works well for you. It’ll be helpful to have an arsenal of code examples you can use for other analyses!

Assignment Data

You’ll be working with bird census data collected in Oregon. You can find information about the data in the birds_metadata.pdf file.

Obtaining the Data

All the course data files are housed on the GitHub site under the Course Assignment Data tab of the Course Materials section.

You need to save all the data files to the data subdirectory of your main course RProject folder.

The data files you’ll work with are:

  • bird.sta.csv
  • hab.sta.csv

The metadata (data about data) file is:

  • birds_metadata.pdf

Reading the data

You should generally use the read.csv() function in conjunction with the here() function to read the data files into data.frame objects.

If your data files are saved in your data subdirectory, the syntax should be very similar to that which you used in the in-class reading data exercise. Super easy!

If you’re using file.choose() you’re doing it wrong. You’ll lose points if you submit work using file.choose() in this course.

  • Read the file bird.sta.csv into a data.frame called dat_bird.
  • Read the file hab.sta.csv into a data.frame called dat_habitat.
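As a minimal sketch of the two reading steps above: this assumes the here package is installed, that R is running inside your course RProject, and that both files are saved in its data subdirectory. The variable names bird_path and hab_path are just illustrative helpers, not something the assignment requires.

```r
require(here)

# here() builds a file path that starts at the RProject root folder
bird_path <- here("data", "bird.sta.csv")
hab_path  <- here("data", "hab.sta.csv")

# Only read the files if both are where we expect them (assumed location: data/)
if (all(file.exists(c(bird_path, hab_path)))) {
  dat_bird    <- read.csv(bird_path)
  dat_habitat <- read.csv(hab_path)
} else {
  message("Data files not found. Check that they are saved in: ", here("data"))
}
```

If the message appears, the files are most likely saved somewhere other than the data subdirectory.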

Look at the column names in the two datasets. You can consult the metadata file to see what each column represents.

How do you check the column names?

The names() function will return the column names of a data.frame object. It also works with other data structures which can have named elements, like lists.
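Here is a small illustration of names() on both kinds of object. The data.frame and list below are made up for the demonstration and are not part of the course data.

```r
# A small made-up data.frame, for illustration only
dat_demo <- data.frame(site = c("A", "B", "C"), count = c(2, 0, 5))
names(dat_demo)

# names() also works on a list with named elements
lst_demo <- list(species = "CBCH", n_sites = 3)
names(lst_demo)
```

The same call, names(dat_habitat), should list the habitat columns once you have read that file in.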

Pair plot demo

Here’s a quick detour to show you how to create simple pair plots on the Palmer penguin dataset:

require(palmerpenguins)
pairs(penguins)

That created a pair plot of all the columns in penguins. It’s a little bit unwieldy, so let’s simplify:

I can use the syntax below (subsetting by name) to include only bill, flipper, and body mass characteristics:

pairs(penguins[, c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g")])

Self-quiz

At this point, you should be able to answer the following questions:

  • Which two R functions do I use to read a .csv file into a data.frame object?
  • How can I determine the column names in a data.frame?
  • What kind of subsetting can I use to pull out a subset of named columns?
    • What syntax can I use?

Bird habitat data exploration

Your group should explore the habitat data using pair plots of subsets of the columns in dat_habitat. Note that there are too many columns to create a meaningful pair plot of all of them, so you’ll have to use subsetting to select the columns to plot.

##  [1] "basin"    "sub"      "sta"      "lat"      "long"     "elev"     "slope"    "aspect"   "s.id"     "s.edge"   "p.edge"  
## [12] "p.edge.1" "p.cwedge" "ba.con"   "ba.hard"  "ba.snag"  "ba.tot"   "ba.ratio" "snag.sml" "snag.ml"  "snag.l"   "snag.dc1"
## [23] "snag.dc2" "snag.dc4"

Bird count data

Use the hist() function to explore the distributions of counts of birds at the various study sites.

I used the data in the column CBCH to create the following:

  • That looks odd because there are gaps between some of the bars, and the x-axis label is cryptic.

I used the arguments xlab = "Number of birds counted" and breaks = 0:7 - 0.5 with hist() to produce the following:

Bird Count Histograms

Create some histograms of bird counts. Experiment with the breaks argument.

You may want to try to re-create my histogram above as a first step.

  • If your breaks argument does not span the range of count values, you will get an error. You may need to try a higher upper limit.
  • You could also use the max() function to determine the highest number of counts of your bird species column.
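The two hints above can be sketched as follows. The counts_demo vector is made up for illustration (it is not the course data), and the names upper and breaks_demo are just placeholders. Subtracting 0.5 from a sequence of whole numbers puts each bar's edges halfway between integers, so every bar is centered on a single count value.

```r
# Made-up counts, for illustration only (not the course data)
counts_demo <- c(0, 0, 0, 1, 1, 2, 4, 9)

# Breaks from 0:7 - 0.5 stop at 6.5, so the count of 9 falls outside them.
# try() catches the resulting error so the rest of the script keeps running.
res <- try(hist(counts_demo, breaks = 0:7 - 0.5), silent = TRUE)
inherits(res, "try-error")

# Use max() to find the upper limit; na.rm = TRUE guards against missing values
upper <- max(counts_demo, na.rm = TRUE)

# One break on each side of every integer from 0 to upper
breaks_demo <- 0:(upper + 1) - 0.5

h <- hist(
  counts_demo,
  breaks = breaks_demo,
  xlab = "Number of birds counted",
  main = "Made-up counts")
```

With your own data, you would replace counts_demo with a bird species column such as dat_bird$CBCH.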

Instructions

As a group, you should work through the following:
  1. Import the datasets, and take a look at the metadata.
  2. Explore the habitat data using pair plots.
  3. Explore the distributions of counts of bird species using histograms.

Deliverable

Proceed as far as you can during the class session and include a report on your group’s progress in the Moodle assignment page. Only one group member needs to submit an answer.

Respond to the following questions:

  • Q1 (3 pts.): Upload a single pair plot of selected columns in the habitat data.
    • (1 pt.) Qualitatively describe what kinds of patterns you see in the pair plot.
    • (1 pt.) Do any of the variables seem to be associated? Why or why not?
    • (1 pt.) Include the R code you used to create the plot.
  • Q2 (3 pts.): Upload a histogram of counts for one of the bird species.
    • (2 pts.) Qualitatively describe two insights you can learn from the histogram. Consider, for example:
      • Is the distribution of counts skewed?
      • Are there lots of sites with zero observations?
      • Are the bird counts best described in terms of presence/absence, or abundance?
    • (1 pt.) Include the R code you used to create the plot.