Importing data from files can be a pain. Believe it or not, on any project a lot of time is spent on data import and cleaning/screening.
Any tools we can use to speed up the data import process are very helpful!
The here package to the rescue!
Some background: setwd()
You should almost never use the setwd() function. Calling setwd() changes your working directory, which means that R will look in the new location for files. If you had previously been working with a file called my_data.csv that was in your working directory, but then changed working directories, R will no longer be able to find my_data.csv.
🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨
Relying on setwd() in your R code is an indicator you may be veering into code smell territory.
🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨
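The problem with setwd() can be reproduced in a few lines of base R. This is a minimal sketch that builds a throwaway project inside tempdir(). The demo_project and assignments folder names are hypothetical stand-ins for your own RProject layout. The same relative path works from the project root and then fails one folder down.

```r
# Sketch: why setwd() breaks relative paths.
# Build a throwaway "project" in a temporary folder (hypothetical layout).
project <- file.path(tempdir(), "demo_project")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "assignments"), showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(project, "data", "my_data.csv"),
          row.names = FALSE)

old_wd <- setwd(project)                  # work from the project root
found_from_root <- file.exists("data/my_data.csv")

setwd("assignments")                      # now move one folder down
found_from_assignments <- file.exists("data/my_data.csv")

setwd(old_wd)                             # put the working directory back

found_from_root
## [1] TRUE
found_from_assignments                    # R looked in assignments/data/
## [1] FALSE
```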
Understanding the working directory and locating data files are two of the biggest sources of frustration for new R users.
We’ll try to save some time and headaches by using RProjects and the package here.
Whenever you find yourself using the setwd() function, it should raise a red flag. Consider using here() instead.
The here package
The package here is designed to work with RProjects. It simplifies working with the working directory and importing data files when used within an RProject.
here()
For this example, I'm going to assume that your RProject folder contains a subfolder called data. I'm also assuming you have a file called my_data.csv, in comma separated value format, within that data subfolder. To follow along with the example, you can download the data file from the Data Files section of the ECO 602 page.
The function here() returns the absolute path to the base directory of your RProject.
For example, on my computer when I'm working in the RProject for the ECO 602 course, here() produces:
here()
## [1] "C:/git/environmental_data"
Note that here() will always point to this directory, even if my working directory is set to a different location. For example, I might have set my working directory to the assignments subfolder. Checking with getwd() and here() shows the difference:
getwd()
## [1] "C:/git/environmental_data/assignments"
here()
## [1] "C:/git/environmental_data"
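You can confirm this behavior yourself. This is a minimal sketch, assuming the here package is installed. The block is guarded with requireNamespace() so it does nothing otherwise. here() is called once, the working directory is moved with setwd(), and then here() is called again. The working directory is restored at the end.

```r
# Sketch: once here is loaded, here() keeps returning the same root,
# even after setwd(). Runs only if the here package is installed.
if (requireNamespace("here", quietly = TRUE)) {
  root_before <- here::here()   # root found when the package was loaded
  old_wd <- setwd(tempdir())    # point the working directory somewhere else
  root_after <- here::here()    # ask again from the new location
  setwd(old_wd)                 # restore the original working directory

  identical(root_before, root_after)
}
```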
here()
Here's why here() is so useful.
Recall that my file is located in the data subdirectory of my RProject folder.
If my working directory were set to the main RProject directory, I could just type:
read.csv("data/my_data.csv")
But my working directory is pointed to a different folder, so I get the following:
read.csv("data/my_data.csv")
## Error in file(file, "rt") : cannot open the connection
Here comes here() to the rescue:
read.csv(here("data", "my_data.csv"))
## basin sub sta
## 1 D AL 1
## 2 D AL 2
## 3 D AL 3
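The idea behind this fix can be sketched in base R: store the project root once as an absolute path, then build every file path from it. This is a minimal sketch of the concept, not how here finds the root. The demo_project folder in tempdir() is hypothetical, and the three rows written to it are a stand-in copy of the output shown above.

```r
# Sketch: build paths from a fixed absolute root, so the working
# directory no longer matters. Hypothetical project in tempdir().
project <- file.path(tempdir(), "demo_project")   # tempdir() is absolute
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(basin = "D", sub = "AL", sta = 1:3),
          file.path(project, "data", "my_data.csv"), row.names = FALSE)

old_wd <- setwd(tempdir())   # working directory is NOT the project root
dat <- read.csv(file.path(project, "data", "my_data.csv"))
setwd(old_wd)                # restore the original working directory

dat
```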
here() syntax
You'll notice I typed:
read.csv(here("data", "my_data.csv"))
When you call here(), you should include the subdirectories (in the correct order) and the filename as character values (i.e. with quotation marks). The function will assemble the arguments into an absolute path to the file:
here("data", "my_data.csv")
## [1] "C:/git/environmental_data/data/my_data.csv"
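To see how the pieces of that path fit together, you can take it apart again. This is a minimal sketch, assuming the here package is installed; the block is guarded with requireNamespace(). The folder and file names are the ones from the text, and here() builds the path whether or not the file actually exists.

```r
# Sketch: inspect the path that here() assembles.
# Runs only if the here package is installed.
if (requireNamespace("here", quietly = TRUE)) {
  root <- here::here()
  csv_path <- here::here("data", "my_data.csv")

  print(startsWith(csv_path, root))    # the path begins at the project root
  print(basename(dirname(csv_path)))   # the subdirectory: "data"
  print(basename(csv_path))            # the filename: "my_data.csv"
}
```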
NOTE: if your file is located several subdirectories deep, you have to list the directory names in the order in which they are nested.
For example, if my data file were located within a subdirectory of data called data_sets, I would type:
here("data", "data_sets", "my_data.csv")
## [1] "C:/git/environmental_data/data/data_sets/my_data.csv"
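Order matters because the names are simply joined left to right. This can be shown with base R's file.path(), which joins pieces the same way. The root below is the example path from the text, used as a plain string. Swapping the folder names does not produce an error. It just produces a path to a location that probably doesn't exist.

```r
# Sketch: folder names are joined in the order given, so they must
# follow the nesting on disk. Root is the example path from the text.
root <- "C:/git/environmental_data"

file.path(root, "data", "data_sets", "my_data.csv")
## [1] "C:/git/environmental_data/data/data_sets/my_data.csv"

file.path(root, "data_sets", "data", "my_data.csv")   # wrong order
## [1] "C:/git/environmental_data/data_sets/data/my_data.csv"
```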
See the documentation for here() for a more detailed description.
file.exists()
here() is not foolproof. If you don't tell it the correct subdirectory or filename to search for, it won't find your file!
An easy way to tell whether you are looking in the right spot for your file is the function file.exists():
file.exists(here("data", "data_sets", "my_data.csv"))
## [1] FALSE
Oops, I forgot that my data file is one directory up, directly in the data folder:
file.exists(here("data", "my_data.csv"))
## [1] TRUE
And I’m good to go!
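If you check paths often, you can fold the file.exists() check into a small wrapper around read.csv(). This is a minimal sketch. read_csv_checked() is a hypothetical helper, not part of here or base R. It stops with a message that shows the full path it tried, which makes a wrong subdirectory easy to spot. The usage lines write a throwaway file to tempdir().

```r
# Sketch of a hypothetical helper (not part of here or base R):
# check that the file exists before reading, and report the full path if not.
read_csv_checked <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  read.csv(path)
}

# Usage with a throwaway file in tempdir():
tmp <- file.path(tempdir(), "my_data.csv")
write.csv(data.frame(sta = 1:3), tmp, row.names = FALSE)

nrow(read_csv_checked(tmp))
## [1] 3
try(read_csv_checked(file.path(tempdir(), "no_such_file.csv")))
```

In a project you would pass it a here() path, e.g. read_csv_checked(here("data", "my_data.csv")).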