Importing data from files can be a pain. Believe it or not, for any project a lot of time is spent on data import and cleaning/screening.

Any tools we can use to speed up the data import process are very helpful!

Working directories: the here package to the rescue!

Some background:

  • R utilizes the concept of a working directory to locate file resources.
  • The working directory is simply the directory where R will look for files.
  • You can change the working directory at any time with the function setwd(), but you should almost never do so.

However, changing your working directory means that R will look in the new location for files. If you had been working with a file called my_data.csv in your working directory and then changed working directories, R would no longer be able to find my_data.csv by its bare file name.
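Here is a minimal sketch of that failure, using only base R. The folder names are hypothetical throwaway directories created under tempdir(), and setwd() appears only to demonstrate the problem:

```r
# Create two throwaway folders; only the first one contains my_data.csv.
demo_dir  <- tempfile("wd_demo_")
other_dir <- tempfile("wd_other_")
dir.create(demo_dir)
dir.create(other_dir)
write.csv(data.frame(x = 1:3), file.path(demo_dir, "my_data.csv"), row.names = FALSE)

# With the working directory inside demo_dir, the bare file name resolves.
old_wd <- setwd(demo_dir)
found_inside <- file.exists("my_data.csv")

# After moving elsewhere, the same relative path no longer resolves.
setwd(other_dir)
found_outside <- file.exists("my_data.csv")

# Restore the original working directory.
setwd(old_wd)

found_inside   # TRUE
found_outside  # FALSE
```

The file never moved. Only the place R was looking changed.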

🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨

Relying on setwd() in your R code is an indicator you may be veering into code smell territory.

🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨 🦨

Understanding the working directory and locating data files are two of the biggest sources of frustration for new R users.

We’ll try to save some time and headaches by using RProjects and the package here.

Whenever you find yourself using the setwd() function, it should raise red flags. You should consider using here() instead.

About the here package

The package here is designed to work with RProjects.

One of the most frustrating aspects for new users of R is understanding the concept of a working directory and importing data files.

The here package simplifies these tasks when used within an RProject.

Using here()

For this example, I’m going to assume:

  • You are working with RStudio and that you have an RProject loaded.
  • You have a subdirectory of your main RProject directory called data.
  • There is a file called my_data.csv containing data in the comma separated value format within your data subfolder.

To follow along with the example, you can download the data file from the Data Files section of the ECO 602 page.

The function here() returns the absolute path to the base directory of your RProject.

For example, on my computer when I’m working in the RProject for the ECO 602 course, here() produces:

here()
## [1] "C:/git/environmental_data"

Note that here() will always point to this directory, even if my working directory is set to a different location. For example, if my working directory had been set to the assignments subdirectory, getwd() and here() would return different paths:

getwd()
here()
## [1] "C:/git/environmental_data/assignments"
## [1] "C:/git/environmental_data"
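You can check this behavior yourself with a short sketch like the one below. It assumes the here package is installed, and it uses setwd() only to show that here() ignores the change. The paths you see will depend on your own computer:

```r
# install.packages("here")  # run once if here is not installed
library(here)

# Record the project root that here() reports right now.
root_before <- here()

# Temporarily switch the working directory (for demonstration only).
old_wd <- setwd(tempdir())
wd_now <- getwd()
root_after <- here()

# Put the working directory back where it was.
setwd(old_wd)

wd_now                              # the temporary directory
identical(root_before, root_after)  # here() did not follow the working directory
```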

Opening a file with here()

Here’s why here() is so useful.

Recall that my file is located in the data subdirectory of my RProject folder.

If my working directory were set to the main RProject directory I could just type:

read.csv("data/my_data.csv")

But we know my working directory is pointed at a different folder, so I get the following:

read.csv("data/my_data.csv")
## Error in file(file, "rt") : cannot open the connection

Here is here() to the rescue:

read.csv(here("data", "my_data.csv"))
##   basin sub sta
## 1     D  AL   1
## 2     D  AL   2
## 3     D  AL   3

Basic here() syntax

You’ll notice I typed:

read.csv(here("data", "my_data.csv"))

When you call here() you should include the subdirectories (in the correct order) and the filename as character values (i.e. with quotation marks). The function will assemble the arguments into an absolute path to the file:

here("data", "my_data.csv")
## [1] "C:/git/environmental_data/data/my_data.csv"

NOTE: if your file is located several subdirectories in, you have to list the directory names in the order in which they are nested.

For example, if my data file were located within a subdirectory of data called data_sets, I would type:

here("data", "data_sets", "my_data.csv")
## [1] "C:/git/environmental_data/data/data_sets/my_data.csv"
  • You can consult the help entry for here() for a more detailed description.
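As a quick sanity check on the ordering, you can take apart the path that here() builds. This is a small sketch assuming the here package is installed. The data_sets folder is the hypothetical one from the example above and does not need to exist, because here() only assembles the path:

```r
# install.packages("here")  # run once if here is not installed
library(here)

# List the folders from the outermost to the innermost, then the file name.
nested <- here("data", "data_sets", "my_data.csv")

basename(nested)                    # "my_data.csv"
basename(dirname(nested))           # "data_sets"
basename(dirname(dirname(nested)))  # "data"
```

If you swapped "data" and "data_sets" in the call, here() would happily build a path, but it would point to a folder that is not there.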

A reality check: file.exists()

here() is not foolproof. If you don’t tell it the correct subdirectory or filename to search for, it won’t find your file!

An easy way to tell whether you are looking in the right spot for your file is the function file.exists():

file.exists(here("data", "data_sets", "my_data.csv"))
## [1] FALSE

Oops, I forgot that my data file is one directory back in the data folder:

file.exists(here("data", "my_data.csv"))
## [1] TRUE

And I’m good to go!
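If you want the check to happen automatically, one option is to wrap it in a small helper. The function read_csv_checked() below is a hypothetical name of my own, not part of here or base R, and the demonstration uses a temporary file so that it runs anywhere. In a real project you would pass it something like here("data", "my_data.csv"):

```r
# Hypothetical helper: stop with a readable message if the file is missing,
# otherwise read it with read.csv().
read_csv_checked <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  read.csv(path)
}

# Demonstration with a temporary file standing in for my_data.csv.
good_path <- tempfile(fileext = ".csv")
write.csv(data.frame(basin = "D", sub = "AL", sta = 1:3), good_path, row.names = FALSE)
dat <- read_csv_checked(good_path)
nrow(dat)  # 3

# A path that does not exist produces the helper's own error message.
bad_path <- file.path(tempdir(), "no_such_folder", "my_data.csv")
msg <- tryCatch(read_csv_checked(bad_path), error = function(e) conditionMessage(e))
msg
```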