In this assignment you’ll:
here package.

R is very extensible. That is one of its greatest strengths! There are hundreds of R packages that contain functions for performing analyses and creating graphics beyond what is included in base R.
I’ll walk through the process of installing and loading an R package in the following sections.
I’m going to use the here package as an example.
The here package is designed to make file import/export easier. It works in conjunction with an RProject.
install.packages() function

Most R packages can be installed with the install.packages() function.
The syntax for basic usage is simple: just type in the name of the package you want to install (in quotes). By default, install.packages() searches the CRAN repositories for a matching package.
To install here, you can just type:
install.packages("here")
Advanced package installation
There are a lot of options for installing packages. You should check out the help entry for install.packages() to learn about the arguments.
You can also install packages from other repositories, including Bioconductor and GitHub.
Some packages come as pre-compiled binaries, while others must be compiled from source code. If you aren’t sure what these terms mean, don’t worry. R will let you know if you have to install a package from source, and you’ll be prompted to install the necessary packages and other tools.
The syntax to install here is simple:
install.packages("here")
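If you’re not sure whether a package is already installed, you can check before calling install.packages(). This is a minimal sketch using base R’s installed.packages(); the variable names are just for illustration, and the install line is commented out so the check runs without changing your package library.

```r
# Name of the package we want, as a character string
pkg <- "here"

# installed.packages() returns a matrix with one row per installed package;
# its row names are the package names
is_installed <- pkg %in% rownames(installed.packages())
is_installed

# Uncomment to install only when the package is missing:
# if (!is_installed) install.packages(pkg)
```

installed.packages() can be slow if you have a lot of packages. requireNamespace(pkg, quietly = TRUE) is a quicker alternative that also returns TRUE or FALSE.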
Depending on which operating system you are using, which versions of R and RStudio you have, and which packages you already have installed, you may get a message about installing dependencies. You should click ‘ok’ to install any of the additional packages that here might need.
You may also get a popup asking you to choose a repository. You can select the cloud option.
If R is able to install the package successfully, you’ll see a message in the console that looks something like this:
package ‘here’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in C:_packages
When R first starts, it loads functions and data from the base packages. These objects are always available.
By default, R does not load the extra packages you may have installed, so you need to tell it you want to use them!
There are two functions to accomplish this:
library()
require()
Both of these functions will load a package into memory, making it directly available to you.
The main difference is what happens when a package can’t be loaded. If the package isn’t installed, library() stops with an error, while require() gives a warning and returns FALSE. Neither function re-loads a package that is already attached: both check first and do nothing if the package is already loaded.
This difference isn’t usually important and it’s up to you to choose which function you want to use.
Because neither function re-loads an attached package, it’s fine to keep your library() or require() calls at the top of a script file you plan to run many times. Packages that take a long time to load are only loaded once per session.
On the other hand, if you have updated a package while you are using R, neither library() nor require() will load the updated version into your current session. Restart R (in RStudio, Session > Restart R) and then load the package again.
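To see how the two functions behave when a package isn’t available, you can try loading a made-up package name. The name notARealPackage123 is hypothetical and assumed not to exist on your system; suppressWarnings() and tryCatch() are only there so the demo keeps running instead of stopping.

```r
# Hypothetical package name that should not exist on any system
missing_pkg <- "notARealPackage123"

# require() does not stop: it returns FALSE (invisibly) when the package is missing
has_pkg <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)
has_pkg

# library() stops with an error; tryCatch() captures the message instead
lib_msg <- tryCatch(
  {
    library(missing_pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) conditionMessage(e)
)
lib_msg
```

Because require() returns TRUE or FALSE, you’ll often see it used inside an if() statement to decide what to do when a package is missing.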
Importing data from files can be a pain. Believe it or not, a lot of the time spent on any project goes to data import and cleaning/screening.
Any tools we can use to speed up the data import process are very helpful!
here package to the rescue!

Some background: you can change R’s working directory with setwd(). You should almost never use the setwd() function. Changing your working directory means that R will look in the new location for files. If you had previously been working with a file called my_data.csv that was in your working directory, but then you changed working directories, R will no longer be able to find my_data.csv.
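Here’s a small, self-contained demonstration of the problem. It uses two throwaway folders under tempdir() (the folder names are made up for the demo), so nothing in your project is touched. It calls setwd() only to show why you shouldn’t rely on it, and it puts the working directory back at the end.

```r
# Remember where we started so we can restore it
old_wd <- getwd()

# Two hypothetical folders inside R's temporary directory
project_dir <- file.path(tempdir(), "wd_demo_project")
other_dir   <- file.path(tempdir(), "wd_demo_other")
dir.create(project_dir, showWarnings = FALSE)
dir.create(other_dir, showWarnings = FALSE)

# Write a small file into the "project" folder
write.csv(data.frame(x = 1:3), file.path(project_dir, "my_data.csv"), row.names = FALSE)

# The relative path "my_data.csv" works while the working directory is project_dir...
setwd(project_dir)
found_before <- file.exists("my_data.csv")

# ...but the same relative path fails after switching folders
setwd(other_dir)
found_after <- file.exists("my_data.csv")

# Put the working directory back
setwd(old_wd)
c(before = found_before, after = found_after)
```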
🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨
Relying on setwd() in your R code is an indicator you may be veering into code smell territory.
🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨
Understanding the working directory and locating data files are two of the biggest sources of frustration for new R users.
We’ll try to save some time and headaches by using RProjects and the package here.
Whenever you find yourself using the setwd() function, it should raise red flags.
You should seriously consider using here() instead.
The here package

The package here is designed to work with RProjects.
One of the most frustrating aspects for new users of R is understanding the concept of a working directory and importing data files.
The here package simplifies these tasks when used within an RProject.
here()
For this example, I’m going to assume that your RProject folder has a subfolder called data, and that there is a file called my_data.csv, containing data in the comma separated value format, within your data subfolder.
To follow along with the example, you can download the data file from the Data Files section of the ECO 602 page.
The function here() returns the absolute path to the base directory of your RProject.
For example, on my computer when I’m working in the RProject for the ECO 602 course, here() produces:
here()
## [1] "C:/git/environmental_data"
Note that here() will always point to this directory, even if my working directory is set to a different location. For example, I might have set my working directory to the assignments subfolder using setwd(). I can check the current working directory with getwd():
getwd()
## [1] "C:/git/environmental_data/assignments"
here()
## [1] "C:/git/environmental_data"
here()
Here’s why here() is so useful.
Recall that my file is located in the data subdirectory of my RProject folder.
If my working directory were set to the main RProject directory I could just type:
read.csv("data/my_data.csv")
But we know my working directory is pointed to a different folder so I get the following:
read.csv("data/my_data.csv")
## Error in file(file, "rt") : cannot open the connection
Here is here() to the rescue:
read.csv(here("data", "my_data.csv"))
## basin sub sta
## 1 D AL 1
## 2 D AL 2
## 3 D AL 3
here() syntax

You’ll notice I typed:
read.csv(here("data", "my_data.csv"))
When you call here(), you should include the subdirectories (in the correct order) and the filename as character values (i.e. with quotation marks). The function will assemble the arguments into an absolute path to the file:
here("data", "my_data.csv")
## [1] "C:/git/environmental_data/data/my_data.csv"
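Conceptually, here() attaches the pieces you give it to the project root, much like base R’s file.path() does. The sketch below mimics that with a hard-coded root (an assumption, standing in for whatever here() returns on your machine). It’s an analogy, not how the here package is implemented, and unlike here() it won’t locate the project root for you.

```r
# Hypothetical project root, copied from the here() output above for illustration only
project_root <- "C:/git/environmental_data"

# file.path() joins the pieces with "/" separators, in the order given
data_path <- file.path(project_root, "data", "my_data.csv")
data_path
```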
NOTE: if your file is located several subdirectories in, you have to list the directory names in the order in which they are nested.
For example, if my data file were located within a subdirectory of data called data_sets, I would type:
here("data", "data_sets", "my_data.csv")
## [1] "C:/git/environmental_data/data/data_sets/my_data.csv"
See the help entry for here() for a more detailed description.

file.exists()
here() is not foolproof. It only assembles a path from the pieces you give it; it doesn’t check whether the file is actually there. If you don’t give it the correct subdirectory or filename, the path won’t point to your file!
An easy way to tell whether you are looking in the right spot for your file is the function file.exists():
file.exists(here("data", "data_sets", "my_data.csv"))
## [1] FALSE
Oops, I forgot that my data file is one directory up, directly in the data folder:
file.exists(here("data", "my_data.csv"))
## [1] TRUE
And I’m good to go!
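If you want a script to fail early with a clear message instead of a confusing read.csv() error, you can wrap file.exists() in a small helper. check_file() below is a hypothetical helper written for this example, not part of here or base R; the demo uses a temporary file so it runs anywhere.

```r
# Hypothetical helper: stop with a clear message if a path doesn't point to an existing file
check_file <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  invisible(path)
}

# Demo with a temporary file so the block runs anywhere
tmp_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3), tmp_csv, row.names = FALSE)
check_file(tmp_csv)  # file exists: returns the path invisibly, no message

# A path that doesn't exist triggers the error; tryCatch() captures the message
missing_msg <- tryCatch(
  check_file(file.path(tempdir(), "no_such_file.csv")),
  error = function(e) conditionMessage(e)
)
missing_msg
```

Since check_file() returns the path, you could nest it directly, e.g. read.csv(check_file(here("data", "my_data.csv"))).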
Now that you know all about using here(), you’re ready to work with the assignment data.
You will be working with the bird census habitat data for this assignment. Download the data file and save it to the data subdirectory of your main ECO 602 repository directory. You can find the file ‘hab.sta.csv’ in Assignment Data Files in the Course Materials section of the class GitHub page.
The metadata is in the file ‘birds_metadata.pdf’, which describes the data and decodes the names of the columns.
You have saved the data file to a subdirectory called data, and you now know how to use the here() function to make finding it easy.
Use here() and read.csv() to read hab.sta.csv into a data.frame called dat_habitat.
Let’s focus on the terrain variables at the sampling locations (elevation, slope, and aspect) and the tree cover, as measured by basal area.
Examine histograms of the three terrain variables.
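As a reminder of the syntax, here’s a minimal hist() sketch using R’s built-in iris data (not the assignment data); swap in the column you want to examine. The main and xlab arguments set the title and the x-axis label.

```r
# Basic histogram with a custom title and x-axis label.
# iris is a built-in example data set; its measurements are in centimeters.
h <- hist(
  iris$Petal.Length,
  main = "Histogram of iris petal length",
  xlab = "Petal length (cm)"
)
```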
This is how my basic histogram of slope looks:
Next, create scatterplots of the three terrain variables (on the x-axis) and basal area (on the y-axis).
Hint: use the plot() function to make scatterplots.
Hint: use the main, xlab, and ylab arguments to plot() to customize your scatterplots.
Here’s my plot of basal area and slope:
Recall the visual estimation of linear models from the in-class activity. Try to estimate linear function parameters using your scatterplots, then add the lines to your scatterplots to judge the fit visually.
Here are the linear parameterization functions again:
# Calculates the value of y for a linear function, given the coordinates
# of a known point (x1, y1) and the slope of the line.
line_point_slope = function(x, x1, y1, slope)
{
  get_y_intercept =
    function(x1, y1, slope)
      return(-(x1 * slope) + y1)

  linear =
    function(x, yint, slope)
      return(yint + x * slope)

  return(linear(x, get_y_intercept(x1, y1, slope), slope))
}
Recall how we used them on the Iris data? You could probably fit a better line than the one I show below.
plot(
x = iris$Petal.Length,
y = iris$Petal.Width,
xlab = "Petal Length",
ylab = "Petal Width",
main = "Visually-estimated linear model fit\nIris petal length and width"
)
curve(line_point_slope(x, x1 = 3.5, y1 = 1.25, slope = 0.4), add = TRUE)
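To see what line_point_slope() computes, you can evaluate it at a few x values and compare the result against the point-slope formula y = y1 + slope * (x - x1). With the point (3.5, 1.25) and slope 0.4 from the iris example, the y-intercept is -(3.5 * 0.4) + 1.25 = -0.15, so the line passes through -0.15 at x = 0, 1.25 at x = 3.5 (the known point), and 1.85 at x = 5. The x values are arbitrary choices for illustration.

```r
# Copy of line_point_slope() from above, so this block runs on its own
line_point_slope = function(x, x1, y1, slope)
{
  get_y_intercept =
    function(x1, y1, slope)
      return(-(x1 * slope) + y1)

  linear =
    function(x, yint, slope)
      return(yint + x * slope)

  return(linear(x, get_y_intercept(x1, y1, slope), slope))
}

# Same point and slope as the iris example
x_vals <- c(0, 3.5, 5)
y_vals <- line_point_slope(x_vals, x1 = 3.5, y1 = 1.25, slope = 0.4)
y_vals

# Point-slope form written out directly: y = y1 + slope * (x - x1)
y_check <- 1.25 + 0.4 * (x_vals - 3.5)
all.equal(y_vals, y_check)
```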
You should review the week 2 in-class activity instructions if you need a refresher.
You’ll need these for the assignment questions.
Instructions:
You might want to skip ahead and read the terrain/basal area scatterplots question below for an idea of how to organize your plots.
Hint: you can use par(mfrow = c(3, 1)) to create a figure with three panels arranged in a single column.
Hint: par(mfrow = c(1, 3)) will create a figure with three panels arranged in a single row.
Hint: Choose dimensions for your output file so that the individual histograms have an appropriate aspect ratio.
Hint: You may notice some peculiarities with the rightmost bin in the aspect histogram. Consider the units in which aspect is measured and check out the breaks argument for hist().
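The multi-panel and breaks hints can be sketched together with the built-in iris data (standing in for the assignment data). par() returns the old values of any settings it changes, so saving them lets you restore a single-panel layout afterwards. The bin edges in seq(4, 8, by = 0.5) are an arbitrary choice for this example; whatever breaks you use must span the full range of the data.

```r
# Save the current settings; par() returns the old values of anything it changes
old_par <- par(mfrow = c(1, 3))

# Three histograms side by side in a single row
hist(iris$Sepal.Length, main = "Sepal length", xlab = "Sepal length (cm)")
hist(iris$Sepal.Width, main = "Sepal width", xlab = "Sepal width (cm)")

# Custom bin edges: 0.5 cm wide bins from 4 to 8 cm
# (iris sepal length runs from 4.3 to 7.9 cm, so these breaks cover all the data)
h <- hist(
  iris$Sepal.Length,
  breaks = seq(4, 8, by = 0.5),
  main = "Sepal length, custom breaks",
  xlab = "Sepal length (cm)"
)

# Restore the previous layout so later plots use a single panel again
par(old_par)
h$breaks
```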
Consider the distribution of elevations at the bird census sample sites.
Your answer should be 1-2 short paragraphs in length.
What are the units of slope in this data set?
Hint: Properly curated data often has associated ____data…
Consider the distribution of slopes at the bird census sample sites.
Your answer should be 1-2 short paragraphs in length.
Consider the distribution of aspect at the bird census sample sites.
Your answer should be 1-2 short paragraphs in length.
Instructions:
line_point_slope() function code
# Calculates the value of y for a linear function, given the coordinates
# of a known point (x1, y1) and the slope of the line.
line_point_slope = function(x, x1, y1, slope)
{
  get_y_intercept =
    function(x1, y1, slope)
      return(-(x1 * slope) + y1)

  linear =
    function(x, yint, slope)
      return(yint + x * slope)

  return(linear(x, get_y_intercept(x1, y1, slope), slope))
}
A plot with three panels in a single row, or in a single column can be awkwardly long or tall. What if you combined your histograms and scatterplots into a larger figure with 6 panels?
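One way to organize a six-panel figure is a 3-by-2 grid in which each row pairs a variable’s histogram with its scatterplot. The sketch below does that with the built-in iris data, using Petal.Width as a stand-in for the y-axis variable. The variable choices, point size (cex = 0.7), and color (col = "steelblue") are arbitrary examples, not the assignment’s answer.

```r
# Stand-in response and predictors from the built-in iris data
response <- iris$Petal.Width
predictors <- c("Sepal.Length", "Sepal.Width", "Petal.Length")

# mfrow fills the grid row by row: 3 rows, 2 columns
old_par <- par(mfrow = c(3, 2))
for (v in predictors) {
  # Left panel: distribution of the predictor
  hist(iris[[v]], main = paste("Histogram of", v), xlab = v)

  # Right panel: predictor vs. response; cex shrinks the points, col sets their color
  plot(
    x = iris[[v]], y = response,
    xlab = v, ylab = "Petal.Width",
    main = paste("Petal.Width vs", v),
    cex = 0.7, col = "steelblue"
  )
}

# Restore the previous layout
par(old_par)
```

If RStudio reports that the figure margins are too large, try enlarging the Plots pane before running the code.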
Hint: you can use par(mfrow = c(3, 1)) to create a figure with three panels arranged in a single column.
Hint: par(mfrow = c(1, 3)) will create a figure with three panels arranged in a single row.
Hint: you can adjust the size of the points with the cex argument in your plot() call.
Hint: you can change the color of the points with the col argument to plot().
For each terrain variable (elevation, slope, aspect), describe the relationship you observe and your model fit. You should consider
Compile your answers to all 8 questions into a pdf document and submit via Moodle.