Learning Objectives

  • Learn basic R syntax and data structures
  • Learn how R stores different data types

Introduction

In previous years, the lecture portion of the course did not include any training in R. Many students, especially those in programs other than ECo, do not enroll in the lab and therefore do not learn any R.

After considering feedback from students and faculty, and the general need to learn R for academic and professional careers, I have decided to include some basic R in the lecture.

I feel that students who only enroll in the lecture are not served well by learning about data analysis without learning to perform any of the techniques in R.

Students in the lab course will gain much more experience and learn more advanced techniques, but the lecture will now include template R code so everyone can have some hands-on experience.

A note about assignment operators: <- and =

You can assign values to variables using two different operators:

<-

=

You are likely to encounter both. They assign values to variables in slightly different ways, but the difference is only relevant in advanced R programming. The difference won’t matter in this course.

You’ll find people who are passionate about using one or the other. I personally prefer the equals symbol, =, because it is more consistent with the syntax of other languages that I work in, such as Java and C. I also like to reserve the <- symbol for occasions when it is truly needed instead of =.

It’s really a matter of style, and I don’t think it’s worth fighting over!

Be prepared to see both symbols. You may use either symbol in your coding work.
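
The short sketch below illustrates the point about the two operators. The variable names (x_arrow, x_equals, y_side) are made up for this illustration. The second half shows one situation where the two symbols behave differently; it is included only for the curious and is not needed for this course.

```r
# At the top level of a script, both operators create the same variable.
x_arrow <- c(1, 2, 3)
x_equals = c(1, 2, 3)
identical(x_arrow, x_equals)    # TRUE

# Inside a function call the two symbols differ:
# = matches a value to an argument name and does not create a variable,
# while <- creates a variable as a side effect of the call.
mean(x = c(10, 20, 30))         # x is only the name of mean()'s first argument
mean(y_side <- c(10, 20, 30))   # also creates the variable y_side
y_side
```

In both calls to mean() the result is the same; only the second call leaves a new variable behind.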

Instructions

  • If you haven’t already done so, follow the DataCamp course invite link (see course Moodle) and create an account.
  • Complete the first DataCamp assignment, the course “Introduction to R”, in DataCamp.
  • Answer the assignment questions on Moodle.

List of important functions and operators you need to know:

  • create a vector: c(1, 2, 3)
  • show the structure of an object, including its type: str(my_vec)
  • Assignment operators: <- and/or =
  • set or retrieve the names of elements: names(my_vec)
  • calculate the sum of all elements in a vector: sum(my_vec)
  • logical comparisons: <, >, >=, <=, ==, !=
  • subsetting single elements with single brackets: my_vec[3]
  • subsetting multiple elements with single brackets: my_vec[c(1, 2, 4)]
  • sequences with the colon syntax: 2:5
  • subsetting with square brackets and characters: my_vec["element_1"]
  • calculate the arithmetic average, or mean: mean(my_vec)
  • create a vector of Boolean values using a comparator
  • subsetting with square brackets and a vector of Boolean values
  • matrix()
  • cbind() and rbind()
  • rowSums() and colSums()
  • subsetting 2D data (matrices and data.frames)
  • summary() - summary performs different things on different types of objects
  • head() and tail()
  • data.frame()
  • subsetting by ranges
  • subsetting with $
  • conditional subsetting with subset()
  • order()
  • sort()
  • subsetting lists: double brackets and dollar sign
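
As a quick reference, here is a minimal sketch that exercises several of the vector functions and operators from the list above. The data (bird_counts and its site names) are invented for illustration and are not part of the assignment.

```r
# Hypothetical example data: bird counts at four sites
bird_counts = c(12, 7, 30, 7)
names(bird_counts) = c("site_a", "site_b", "site_c", "site_d")

str(bird_counts)              # structure: a named numeric vector
sum(bird_counts)              # 56
mean(bird_counts)             # 14

bird_counts[3]                # a single element, by position
bird_counts[c(1, 4)]          # several elements, by position
bird_counts[2:3]              # a range of positions using the colon syntax
bird_counts["site_b"]         # an element, by name

is_large = bird_counts > 10   # a comparator returns a vector of Boolean values
bird_counts[is_large]         # keeps only the elements where is_large is TRUE

sort(bird_counts)             # the values in increasing order
order(bird_counts)            # the positions that would sort the vector
```

Note the difference between the last two lines: sort() returns the sorted values themselves, while order() returns the positions you would use inside square brackets to get them.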

Functions for building multi-element structures

  • c()
  • matrix()
  • data.frame()
  • list()
  • array() - Note: we probably won’t use this function in the course, but you should know that it exists.
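
The following sketch shows one way each of these structures might be built and inspected. All names and values (scores, score_mat, students, course) are hypothetical and chosen only for illustration.

```r
# A vector holds elements of a single type
scores = c(90, 85, 70, 60)

# A matrix arranges a vector in rows and columns
score_mat = matrix(scores, nrow = 2)
rowSums(score_mat)                 # one total per row
colSums(score_mat)                 # one total per column
rbind(score_mat, c(1, 2))          # add a row
cbind(score_mat, c(1, 2))          # add a column

# A data.frame holds columns that may have different types
students = data.frame(
  name = c("Ana", "Ben", "Cy"),
  score = c(90, 85, 70)
)
students$score                     # one column, using $
students[2, ]                      # the second row, all columns
subset(students, score > 80)       # rows that meet a condition
head(students, 2)                  # the first two rows
summary(students$score)            # summary statistics for a numeric column

# A list can hold elements of different types and lengths
course = list(title = "Intro", n_weeks = 14, scores = scores)
course[["title"]]                  # an element, using double brackets
course$n_weeks                     # an element, using $
```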

Key concepts from the DataCamp course:

  • Arithmetic operators
  • Variable assignment
  • retrieving values using print()
  • arithmetic with variables
  • incompatible data types
  • Data types
  • numeric, integer
  • Boolean
  • strings and characters
  • checking the data type, class of objects
  • variable names
  • use of variables vs ‘hard-coding’
  • vector arithmetic is element-wise
  • element names: names()
  • matrix and data.frame arithmetic is element-wise
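
A few of these concepts can be sketched in code. The variable names and values below are invented for illustration; they are not the variables used in the questions.

```r
# Variables of different data types
n_plots = 4L            # integer (the L suffix requests an integer)
plot_area = 2.5         # numeric
site_name = "meadow"    # character (a string)
is_sampled = TRUE       # logical (Boolean)

# Checking the class of each object
class(n_plots)          # "integer"
class(plot_area)        # "numeric"
class(site_name)        # "character"
class(is_sampled)       # "logical"

# Vector arithmetic is element-wise
lengths_m = c(2, 4, 6)
widths_m = c(1, 3, 5)
areas = lengths_m * widths_m    # 2 12 30
names(areas) = c("plot_1", "plot_2", "plot_3")

# Using variables instead of 'hard-coding' the numbers 4 and 2.5
total_area = n_plots * plot_area
print(total_area)               # 10
```

Because total_area is computed from variables, changing n_plots or plot_area in one place updates the result without editing the calculation itself.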

Questions

Variables: Q1 - Q6

Create:

  • A variable a that contains the text of your first name.
  • A variable b1 that contains the number 45.6
  • A variable b2 that contains the text “45.6”
  • A variable c1 that contains the sequence of integers from 0 to 3
  • Q1 (1 pt.): What type of data is contained in the variable a?
  • Q2 (1 pt.): What type of data is contained in the variable b1?
  • Q3 (1 pt.): What type of data is contained in the variable b2?
  • Q4 (2 pts.): Explain what happens when you try to add b1 and b2 and why.
  • Q5 (1 pt.): Are the variables b1 and c1 the same type? Why or why not?
  • Q6 (3 pts.): Explain what happens when you add b1 and c1. Consider both the number of elements in each variable and the data types.

Vectors: Q7 - Q9

Create a vector called v1 that contains a sequence of integers from -2 to 2.

When you print the contents of v1, it should look like this:

## [1] -2 -1  0  1  2

Now, use v1 to create a new vector called v2 whose elements are the elements of v1 multiplied by 3. It should look like this:

## [1] -6 -3  0  3  6

Finally, calculate the sum of all the elements in v2.

  • Q7 (1 pt.): Show the R code you used to create v1.
  • Q8 (1 pt.): Show the R code you used to create v2.
  • Q9 (1 pt.): Show the R code you used to calculate the sum of elements in v2.

Matrices: Q10 - Q11

Do you remember the byrow argument to the matrix() function?

Create a vector called vec_4 whose elements are the integers from 1 to 12.

Create a matrix mat_1 from vec_4 that has three rows and four columns. The values in mat_1 should be sequentially increasing by row.

  • For example, the first row of mat_1 should contain the values 1, 2, 3, 4.

Create a matrix mat_2 from vec_4 that has three rows and four columns. The values in mat_2 should be sequentially increasing by column.

  • For example, the first column of mat_2 should contain the values 1, 2, 3.
  • Q10 (1 pt.): Show the code you used to create mat_1.
  • Q11 (1 pt.): Show the code you used to create mat_2.

Lists: Q12 - Q14

Create a list named my_list_1 with the following three elements:

  • first element is numeric: 5.2
  • second element is a string: “five point two”
  • third element is a vector of all integers from 0 to 5 [how do you do this?]

Name the elements in my_list_1:

  • “two”
  • “one”
  • “three”

Make sure the elements in your list are in the order specified (look at the names closely).

Hint: remember the subsetting operators [[]] and $?

  • Q12 (2 pts.): Show the R code you used to create my_list_1.
  • Q13 (1 pt.): Show valid R code that selects the third element of the list.
  • Q14 (1 pt.): Show the R code that selects the list element with the name “one”. Note: there are at least two ways to do this!

Logical Tests and Subsetting: Q15 - Q16

Run the following code to build a vector called my_vec and print its contents:

my_vec = rep(1:3, 5)
my_vec
##  [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3

Use the logical equality test operator == to create a vector, my_bool_vec, of Boolean values from my_vec.

  • my_bool_vec should be the same length as my_vec.
  • my_bool_vec should have TRUE values in the positions where my_vec has values of 3.

You can run the following code to check that you have the correct values in my_bool_vec:

data.frame(my_vec, my_bool_vec)
##    my_vec my_bool_vec
## 1       1       FALSE
## 2       2       FALSE
## 3       3        TRUE
## 4       1       FALSE
## 5       2       FALSE
## 6       3        TRUE
## 7       1       FALSE
## 8       2       FALSE
## 9       3        TRUE
## 10      1       FALSE
## 11      2       FALSE
## 12      3        TRUE
## 13      1       FALSE
## 14      2       FALSE
## 15      3        TRUE

Use my_bool_vec to retrieve all of the elements of my_vec that have a value of 3.

Hint: Use the square bracket subsetting operator: [].

Your result should look like this:

## [1] 3 3 3 3 3

  • Q15 (3 pts.): Show the R code that you used to create my_bool_vec.
  • Q16 (2 pts.): Show the R code that you used to subset my_vec using my_bool_vec.