It’s very easy to forget R programming concepts if you don’t use them frequently.
A great way to retain the skills that will be important for you going forward is to create a set of examples using the functions you want to remember.
You’ll create an R Markdown document containing examples of how to use the basic R components that you need to know.
For this portion of the final project, you’ll perform a data analysis on data collected on two species of small mammals in the Atlantic Forest of Brazil.
You’ll find the data file, delomys.csv, in the data tab of the course GitHub page.
It includes data extracted from a larger data set:
You’ll create a new RMarkdown document for your R reference guide:
final_R_reference.Rmd
It needs to be in the docs subfolder (along with your index.Rmd file).
To create a link to the R reference page, you can insert the following line into your index.Rmd file:
[Final Project: R Reference Guide](final_R_reference.html)
When you re-knit index.Rmd, your index.html will contain a link to your R reference guide.
You need to create a document that uses tabs to organize the content. Here’s a template:
# R Reference Guide {.tabset .tabset-pills}
## Loading Data and Packages
## Next Section...
...
A successful code example should communicate what the function or operator does.
The best examples are:
Your code examples need to be written in your own words, in such a way that you will be able to decipher them later.
You should use comments in your code as needed to highlight important points.
The following is a code example for the c() function.
You may use this example verbatim; however, all other examples must be your own.
The function c() combines or concatenates its arguments into a vector (a 1-dimensional data structure consisting of 1 or more elements).
Here are two examples using numeric and character data types:
## Create a vector of numbers:
num_vec = c(1, 4, 8, 9, 13)
## Create a vector of characters:
char_vec = c("a", "fish", "data is cool")
I can show the contents of a vector by typing the name of the vector, or using the print() function.
## Typing the name of the vector into the console prints the contents
num_vec
## [1] 1 4 8 9 13
## The print() function accomplishes the same task:
print(char_vec)
## [1] "a" "fish" "data is cool"
All functions are denoted with parentheses.
The required function arguments are contained in indented bullet points below the corresponding function.
library() and require()
Use these to show how to load the here and palmerpenguins packages.
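Here is a minimal sketch of how the two loaders differ. It uses the base stats package and a deliberately made-up package name (notARealPackage123 is a placeholder, not a real package) so that it runs without installing anything. In your own guide you would load here and palmerpenguins instead.

```r
# library() attaches a package; it stops with an error if the package is missing.
library(stats)

# require() also attaches a package, but it returns TRUE or FALSE instead of
# stopping, so the result can be checked in code.
has_stats <- require(stats)

# "notARealPackage123" is an invented name used only to show the failure case.
has_fake <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)

has_stats  # TRUE
has_fake   # FALSE
```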
Ginkgo data: use the 2021 ginkgo data to create a data.frame called ginkgo using:
here()
read.csv()
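The sketch below shows the read.csv() step on a small made-up table written to a temporary file, so it runs on any machine. The column leaf_id, the values, and the file name in the commented-out here() call are all placeholders, not the real ginkgo file.

```r
# Write a tiny invented table to a temporary CSV file so the example runs anywhere.
tmp_file <- tempfile(fileext = ".csv")
write.csv(
  data.frame(leaf_id = 1:3, seeds_present = c(TRUE, FALSE, TRUE)),
  file = tmp_file,
  row.names = FALSE
)

# read.csv() reads a comma-separated file into a data.frame.
example_dat <- read.csv(tmp_file)

# With the here package, the path is built relative to the project root
# instead, for example (placeholder file name):
# ginkgo <- read.csv(here::here("data", "my_ginkgo_file.csv"))

str(example_dat)
```

Building the path with here() means the same code should work regardless of which folder your Rmd file is knitted from.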
c()
length()
matrix()
data.frame()
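As a rough illustration of how these four functions relate to each other, here is a short sketch with invented values (the object names heights, height_mat, and plants are hypothetical):

```r
# c() combines values into a vector; length() counts its elements.
heights <- c(2.5, 3.1, 4.0, 1.8, 2.2, 3.6)
length(heights)  # 6

# matrix() arranges a vector into rows and columns, filling column by column.
height_mat <- matrix(heights, nrow = 2, ncol = 3)

# data.frame() builds a table from equal-length vectors, one per column.
plants <- data.frame(
  id = c("a", "b", "c"),
  height = c(2.5, 3.1, 4.0)
)
```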
Use the ginkgo data.frame to create examples of:
nrow()
ncol()
dim()
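A quick sketch of the three functions, using the iris data set that ships with R as a stand-in for the ginkgo data.frame:

```r
# iris is built into R; it stands in for the ginkgo data here.
nrow(iris)  # number of rows: 150
ncol(iris)  # number of columns: 5
dim(iris)   # both at once, rows first: 150 5
```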
Use the ginkgo data for these examples:
$
Subset a data frame by name: select one of the columns in the ginkgo data
[]
Use subset by position to:
subset()
Use this function to retrieve all the data for Adelie penguins (in the species column) from the penguins dataset.
You may use the ginkgo or Palmer penguin data to create examples of:
summary()
mean()
sd()
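The subsetting operators and numerical summaries listed above can be sketched as follows. The built-in iris data is used as a stand-in for the ginkgo and penguin data, and the object names are illustrative only.

```r
# $ selects one column by name and returns it as a vector.
petal_len <- iris$Petal.Length

# [] selects by position: [row, column].
first_row <- iris[1, ]   # first row, all columns
cell_2_3  <- iris[2, 3]  # the element in row 2, column 3
third_col <- iris[, 3]   # all rows of column 3

# subset() keeps the rows that satisfy a logical condition.
setosa <- subset(iris, Species == "setosa")

# Numerical summaries of a single column.
summary(petal_len)
mean(petal_len)
sd(petal_len)
```

Some penguins columns contain missing values, so with that data mean() and sd() need na.rm = TRUE to return a number rather than NA.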
Scatterplot: Using the ginkgo data, create a scatterplot of max leaf depth (x) and max leaf width (y).
plot()
required arguments:
col =
pch =
cex =
main =
xlab =
ylab =
xlim =
ylim =
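One possible way to use all of these arguments in a single call is sketched below. It uses two iris columns as a stand-in for the ginkgo leaf measurements; the colour, symbol, and axis limits are arbitrary choices for illustration.

```r
# Draw to a null device so the example also runs as a plain script.
# In an R Markdown chunk this line and dev.off() are not needed.
pdf(NULL)

plot(
  x = iris$Sepal.Length,
  y = iris$Sepal.Width,
  col = "steelblue",  # point colour
  pch = 16,           # plotting symbol (16 is a filled circle)
  cex = 1.2,          # point size relative to the default
  main = "Sepal width against sepal length",
  xlab = "Sepal length (cm)",
  ylab = "Sepal width (cm)",
  xlim = c(4, 8),     # x-axis range
  ylim = c(1.5, 5)    # y-axis range
)

invisible(dev.off())
```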
hist()
Create a histogram of penguin flipper lengths. Required arguments:
breaks =
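A brief sketch of the breaks argument, using iris petal lengths as a stand-in for the flipper lengths. The break points below are an arbitrary choice that happens to cover the full range of this particular column.

```r
pdf(NULL)  # null device; not needed inside an R Markdown chunk

# A vector of break points sets the exact bin edges.
# The edges must span the whole range of the data.
h <- hist(
  iris$Petal.Length,
  breaks = seq(0, 7, by = 0.5),
  main = "Petal lengths",
  xlab = "Petal length (cm)"
)

invisible(dev.off())

# hist() also returns the bin edges and counts it used.
h$breaks
h$counts
```

Passing a single number, such as breaks = 10, is treated only as a suggestion for the number of bins, so the result may not have exactly that many.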
boxplot()
seeds_present column.
Create a 4-panel figure of histograms, arranged in a 2 by 2 grid. You may use any data you like, but each histogram must be different and have appropriate titles and axes.
par()
required arguments:
mfrow =
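A minimal sketch of a 2 by 2 grid, assuming the four numeric iris columns as example data:

```r
pdf(NULL)  # null device; not needed inside an R Markdown chunk

# mfrow = c(rows, columns): later plots fill the grid row by row.
# par() returns the previous settings so they can be restored afterwards.
old_par <- par(mfrow = c(2, 2))

hist(iris$Sepal.Length, main = "Sepal length", xlab = "Length (cm)")
hist(iris$Sepal.Width,  main = "Sepal width",  xlab = "Width (cm)")
hist(iris$Petal.Length, main = "Petal length", xlab = "Length (cm)")
hist(iris$Petal.Width,  main = "Petal width",  xlab = "Width (cm)")

par(old_par)  # restore the single-panel layout
invisible(dev.off())
```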
You’ll perform a complete data analysis on the Delomys species data. You can do your work in an RMarkdown document, or an R script (RMarkdown preferred).
Create a code chunk that includes the following:
Use summary() on the body mass and body length data columns in the Delomys data set to display summary statistics.
Perform a test of normality on the body mass and length columns. You can use shapiro.test().
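The sketch below shows what these two steps look like on simulated numbers, not the Delomys data. The vectors mass and skewed are invented: one is drawn from a normal distribution and one from a strongly skewed distribution, to show how the test output differs.

```r
set.seed(1)
mass   <- rnorm(100, mean = 45, sd = 8)  # simulated, roughly normal
skewed <- rexp(100, rate = 1 / 45)       # simulated, strongly right-skewed

# Minimum, quartiles, median, mean, and maximum.
summary(mass)

# Shapiro-Wilk test: the null hypothesis is that the data were drawn
# from a normally distributed population.
sw_normal <- shapiro.test(mass)
sw_skewed <- shapiro.test(skewed)

sw_normal$p.value
sw_skewed$p.value  # a small p-value is evidence against normality
```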
You can adjust the size of the plots on your rendered document using the following code chunk arguments:
fig.height=
fig.width=
You can adjust the aspect ratio using fig.aspect=
Using the penguins data as an example, here’s an example code chunk using the fig.width option.
```{r fig.width=10}
require(palmerpenguins)
plot(bill_length_mm ~ body_mass_g, data = penguins)
```
Producing the following output:
[Figure: scatterplot of bill_length_mm against body_mass_g for the penguins data]
You will need to experiment with different width, height, and/or aspect values for each of your figures.
Using code chunks, create the following plots, which you’ll use to answer the report questions:
binomial)
sex)
Answer the following in your report:
We know that the normality assumption applies to the residual values after we fit a model.
Using a code chunk, fit 5 models using lm():
body_length ~ body_mass
body_mass ~ sex
body_mass ~ binomial
body_mass ~ sex + binomial
body_mass ~ sex * binomial
binomial and sex to predict body mass.
Save your model objects to variables called fit1, fit2, fit3, fit4, fit5.
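The five model formulas can be sketched on simulated data as below. The data frame dat is invented for illustration: it reuses the column names from the formulas, but the species labels species_A and species_B and all of the numbers are placeholders, not the Delomys data.

```r
set.seed(1)
n <- 200

# Simulated stand-in data with balanced groups.
dat <- data.frame(
  binomial = factor(rep(c("species_A", "species_B"), each = n / 2)),
  sex      = factor(rep(c("female", "male"), times = n / 2))
)
dat$body_mass <- 40 + 5 * (dat$binomial == "species_B") +
  3 * (dat$sex == "male") + rnorm(n, sd = 8)
dat$body_length <- 80 + 1.1 * dat$body_mass + rnorm(n, sd = 10)

fit1 <- lm(body_length ~ body_mass, data = dat)    # simple linear regression
fit2 <- lm(body_mass ~ sex, data = dat)            # one categorical predictor
fit3 <- lm(body_mass ~ binomial, data = dat)       # one categorical predictor
fit4 <- lm(body_mass ~ sex + binomial, data = dat) # additive, two predictors
fit5 <- lm(body_mass ~ sex * binomial, data = dat) # main effects plus interaction
```

The `*` in the last formula expands to both main effects and their interaction, so fit5 estimates one more coefficient than fit4.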
Let’s check whether our models fulfill the assumption of normality of the residuals.
First, use a graphical approach: plot histograms of the model residuals.
You can retrieve the model residuals using the residuals() function. For example, I could get the residuals from the first model using residuals(fit1).
Use a code chunk to create histograms of the residuals of each of the 5 models.
Next, use shapiro.test() on each model to test the null hypothesis that the residuals are drawn from a normally-distributed population.
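One way to run both residual checks over several models at once is sketched below. The data are simulated placeholders (species_A, species_B, and every number are invented), and storing the models in a named list is just one convenient design, not a requirement.

```r
set.seed(1)
n <- 200

# Simulated stand-in data; not the Delomys data.
dat <- data.frame(
  binomial = factor(rep(c("species_A", "species_B"), each = n / 2)),
  sex      = factor(rep(c("female", "male"), times = n / 2))
)
dat$body_mass <- 40 + 5 * (dat$binomial == "species_B") +
  3 * (dat$sex == "male") + rnorm(n, sd = 8)
dat$body_length <- 80 + 1.1 * dat$body_mass + rnorm(n, sd = 10)

fits <- list(
  fit1 = lm(body_length ~ body_mass, data = dat),
  fit2 = lm(body_mass ~ sex, data = dat),
  fit3 = lm(body_mass ~ binomial, data = dat),
  fit4 = lm(body_mass ~ sex + binomial, data = dat),
  fit5 = lm(body_mass ~ sex * binomial, data = dat)
)

# Graphical check: one histogram of residuals per model.
pdf(NULL)  # null device; not needed inside an R Markdown chunk
old_par <- par(mfrow = c(2, 3))
for (nm in names(fits)) {
  hist(residuals(fits[[nm]]), main = paste("Residuals:", nm), xlab = "Residual")
}
par(old_par)
invisible(dev.off())

# Numerical check: Shapiro-Wilk p-value for each model's residuals.
p_values <- sapply(fits, function(f) shapiro.test(residuals(f))$p.value)
p_values
```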
Answer the following in your report:
You can use the following code within a code chunk to print out a nicely formatted model coefficient table:
knitr::kable(coef(summary(my_model_fit)))
where my_model_fit is the name of your fitted model object.
You can use similar syntax to print a nicely formatted ANOVA table:
knitr::kable(anova(my_model_fit))
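As a small sketch of both table types, the example below fits a model to the built-in iris data (a stand-in for your own model) and prints the tables with kable(). It assumes the knitr package is installed and falls back to plain rounded output if it is not.

```r
# A stand-in model fitted to a data set that ships with R.
my_model_fit <- lm(Petal.Length ~ Species, data = iris)

coef_table  <- coef(summary(my_model_fit))  # estimates, standard errors, t and p values
anova_table <- anova(my_model_fit)          # sums of squares, F statistic, p value

if (requireNamespace("knitr", quietly = TRUE)) {
  # digits sets how many decimal places are kept when rounding numeric columns.
  print(knitr::kable(coef_table, digits = 3))
  print(knitr::kable(anova_table, digits = 3))
} else {
  print(round(coef_table, 3))
  print(anova_table)
}
```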
You can use the digits argument to control how many decimal digits are printed.
Print the model coefficient table using summary() and answer the following:
Print the model coefficient tables for each of the body mass model fits.
Answer the following:
Print the ANOVA tables for each of the body mass models.
Answer the following in your report:
You built four different models of body mass. How do you choose the best one?
One option is to choose the model with the lowest AIC. You can calculate AIC using the appropriately named AIC() function.
Create a code chunk that calculates the AIC values for each of the body mass models.
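A minimal sketch of the AIC comparison on simulated stand-in data (the species labels and all values are invented, so which model comes out lowest here says nothing about the real Delomys data):

```r
set.seed(1)
n <- 200

# Simulated stand-in data; not the Delomys data.
dat <- data.frame(
  binomial = factor(rep(c("species_A", "species_B"), each = n / 2)),
  sex      = factor(rep(c("female", "male"), times = n / 2))
)
dat$body_mass <- 40 + 5 * (dat$binomial == "species_B") +
  3 * (dat$sex == "male") + rnorm(n, sd = 8)

fit2 <- lm(body_mass ~ sex, data = dat)
fit3 <- lm(body_mass ~ binomial, data = dat)
fit4 <- lm(body_mass ~ sex + binomial, data = dat)
fit5 <- lm(body_mass ~ sex * binomial, data = dat)

# AIC() accepts several models and returns one row per model.
aic_table <- AIC(fit2, fit3, fit4, fit5)
aic_table

# Name of the model with the lowest AIC.
best_model <- rownames(aic_table)[which.min(aic_table$AIC)]
best_model
```

AIC values are only comparable between models fitted to the same response and the same rows of data, which is why fit1 (a model of body length) is left out.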
Compile your answers to the 18 questions and submit them as a PDF or HTML document in Moodle.
Your final draft report should include only your figures and answers to the questions. Do not include any extraneous R code that you may have used in your rough draft. I do not need to see the code you used to read the data.
If you are using RMarkdown, you may add the code chunk options echo=FALSE and results='hide' to suppress the printing of any R code or output you wish to hide.