For this lab, you’ll need the data files:
You’ll need to read the two data files and use merge() to combine them into a single data.frame object.
You need to merge on the columns ‘basin’ and ‘sub’.
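Here’s a minimal sketch of the read-and-merge step, assuming the files are CSVs in a data subdirectory. The file names below are placeholders: substitute the actual data files linked above.
# Hypothetical file names: substitute the names of the actual data files.
dat_bird = read.csv(here::here("data", "bird.sta.csv"))
dat_hab = read.csv(here::here("data", "hab.sta.csv"))
# Merge on the shared basin and sub-basin identifier columns:
birdhab = merge(dat_bird, dat_hab, by = c("basin", "sub"))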
Make sure to call the merged data frame birdhab so that I can run your code!
dim(birdhab)
## [1] 30 159
We’ll be working with the standardized Brown Creeper abundance (column BRCR) and late-successional forest (column ls) data within birdhab.
The goals and objectives of this lab include:
Simulation is sometimes called forward modeling, to emphasize that you pick a model and parameters and work forward to predict patterns in the data. Environmental scientists use simulation to explore what kinds of patterns emerge from proposed models.
Often they use theoretical models without accompanying data, in order to understand qualitative patterns and plan future studies.
Simulation techniques are especially useful when you have data: you can use data both to guide the structure of your simulation and to calibrate/validate your simulation model.
Some powerful uses of simulation modeling include:
It’s always a good idea to test a best-case scenario, where you know that the functions and distributions you’re using are correct, before you proceed to real data.
There are many different types of simulation models; we’ll cover one type in this lab.
Power analysis using simulation models allows you to explore how different experimental designs and sample sizes might affect your ability to get a reasonably precise estimate of your parameters.
We’ll work through an example of a power analysis simulation in the context of a linear regression.
Static environmental processes, where the data represent a snapshot of some environmental system, are relatively easy to simulate.
Keeping in mind the dual model paradigm, we can specify:
Our deterministic models might be very simple, for example, a linear function.
Alternatively, we could use any of the more complex mechanistic or phenomenological functions we have studied, such as the Ricker or logistic functions.
For our stochastic model, we could use R’s random number generating functions from distributions like the Normal (rnorm()), Poisson (rpois()), or Binomial (rbinom()).
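For instance, each of these generators takes the number of draws as its first argument, followed by distribution-specific parameters:
rnorm(n = 3, mean = 0, sd = 1) # 3 draws from a standard Normal
rpois(n = 3, lambda = 2) # 3 Poisson draws with mean 2
rbinom(n = 3, size = 10, prob = 0.5) # 3 Binomial draws: 10 trials, success probability 0.5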
Here we will illustrate the process of simulating a static environmental process using a simple linear regression model based on the now familiar Oregon birds data set.
For this example, let’s examine the relationship between brown creeper abundance and the extent of late-successional forest across 30 sub-basins in the central Oregon Coast Range.
Recall that a simple linear regression model can be written symbolically as: \(Y \sim Normal(a + bx, \sigma^2)\)
This means that the \(i\)th value of \(Y\), \(y_i\), is equal to \(a + bx_i\) plus a normally-distributed error with mean zero and variance \(\sigma^2\).
Fit a simple linear regression of BRCR on ls using lm(). Save your model to a variable: fit_1.
Plot the data and add the fitted regression line using abline().
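A minimal sketch of the fit and plot; the model formula matches the summary output shown later in the walkthrough:
fit_1 = lm(BRCR ~ ls, data = birdhab)
plot(BRCR ~ ls, data = birdhab)
abline(fit_1)
summary(fit_1)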
|             | Estimate | Std. Error | t value | Pr(>|t|) |
|-------------|----------|------------|---------|----------|
| (Intercept) | 0.0991   | 0.0508     | 1.952   | 0.061    |
| ls          | 0.0058   | 0.0009     | 6.705   | 0.000    |
Does the model suggest a significant relationship between late-successional forest and Brown Creeper abundance?
Recall that the data-collection effort was one realization of the stochastic process of sampling. There are many, many other possible observations that could have been sampled!
Stochastic simulation can help us explore what the data might look like if we could take another snapshot.
We can use simulation to build new datasets using the parameters estimated from the observed data.
We can also vary the parameter values, or even simulate data under alternative models!
We can simulate the system using our model and see what kind of range of data we might observe.
This can give us great insight into the level of certainty we have in our model.
This is in fact the theoretical basis for the Frequentist approach to statistical inference: we evaluate the likelihood of our data given the model, which implicitly asks how likely we would be to observe our original data if we repeatedly sampled the system.
This is exactly what simulation allows us to do: repeatedly sample the environmental system under the assumption that the model is the truth.
To generate data based on a Group 1 Simple Linear Regression, what model parts do we need to know?
Let’s build an R function to calculate the value of a linear function given a value of x, a slope parameter, and an intercept parameter.
Recall the equation for a line: \(y = \alpha + \beta x\)
Write a function, called linear(), that accepts three named arguments:
- x: a vector of x-values
- y_int: the y-intercept
- slope: the slope of the line
Your function should work with values of x that are single numbers or numeric vectors.
Have we built a similar function in a prior lab or lecture assignment?
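Here’s one possible implementation; R’s vectorized arithmetic handles both single numbers and numeric vectors of x:
linear = function(x, y_int, slope)
{
  # y = intercept + slope * x, vectorized over x
  return(y_int + slope * x)
}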
To test your implementation, you can compare your function’s outputs to these reference outputs:
linear(x = 1, y_int = 1, slope = 1)
## [1] 2
linear(x = 3:5, y_int = 1, slope = 1)
## [1] 4 5 6
linear(x = 3:5, y_int = -1, slope = 1)
## [1] 2 3 4
linear(x = 3:5, y_int = -1, slope = 0.01)
## [1] -0.97 -0.96 -0.95
This part is easy: it’s just rnorm() with appropriate arguments for the mean and standard deviation!
Now you can assemble your deterministic and stochastic functions into a single data simulator.
Write a function, called linear_simulator(), that will generate random data based on a linear deterministic function with normally-distributed errors. It should accept four named arguments:
- x: a vector of x-values
- y_int: the y-intercept
- slope: the slope of the line
- st_dev: the standard deviation of the stochastic model
There are many approaches your implementation could use. A simple plan might be:
You are free to implement linear_simulator() any way you like…as long as it works correctly.
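Here’s one possible implementation, a sketch that adds Normal noise to the deterministic values from linear():
linear_simulator = function(x, y_int, slope, st_dev)
{
  # Deterministic part plus Normally-distributed errors with mean 0
  return(linear(x, y_int, slope) + rnorm(n = length(x), mean = 0, sd = st_dev))
}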
Since linear_simulator() will return different values each time, it’s harder to check your implementation than a strictly deterministic function. For now, you can use a graphical validation approach.
Run the code below and compare your plots to the reference plots.
These plots use the following parameter values:
n = 200
par(mfrow = c(2, 2), mar = c(1, 1, 1, 1))
for (i in 1:4)
{
x = runif(n = n)
plot(
x,
linear_simulator(x, y_int = 1, slope = 4.5, st_dev = 0.1),
main = "", xlab = "x", ylab = "y",
pch = 16, col = rgb(0, 0.2, 0, 0.2),
axes = FALSE)
box()
}
Now compare your implementation to the reference using different slope, intercept, and standard deviation parameters:
n = 400
par(mfrow = c(2, 2), mar = c(1, 1, 1, 1))
for (i in 1:4)
{
x = runif(n = n)
plot(
x, linear_simulator(x, y_int = 10, slope = -6.5, st_dev = 1.1),
main = "", xlab = "x", ylab = "y",
pch = 16, col = rgb(0, 0.2, 0, 0.2),
axes = FALSE)
box()
}
Your generated data won’t be exactly the same as mine, but your plots should look similar to the reference plots.
To simulate new data, you need to retrieve the intercept, slope, and standard deviation from the model you fit to the data. You can use coefficients() to extract the intercept and slope values.
Use str() to examine the output of coefficients() to see how the values are stored:
fit_1_coefs = coefficients(fit_1)
str(fit_1_coefs)
## Named num [1:2] 0.0991 0.00584
## - attr(*, "names")= chr [1:2] "(Intercept)" "ls"
Retrieving the standard deviation parameter is slightly more difficult:
- You can find it in the output of summary().
- It is stored in an element called sigma. You can extract it using the dollar sign.
fit_1_summary = summary(fit_1)
str(fit_1_summary)
fit_1_summary$sigma
## List of 11
## $ call : language lm(formula = BRCR ~ ls, data = birdhab)
## $ terms :Classes 'terms', 'formula' language BRCR ~ ls
## .. ..- attr(*, "variables")= language list(BRCR, ls)
## .. ..- attr(*, "factors")= int [1:2, 1] 0 1
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:2] "BRCR" "ls"
## .. .. .. ..$ : chr "ls"
## .. ..- attr(*, "term.labels")= chr "ls"
## .. ..- attr(*, "order")= int 1
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(BRCR, ls)
## .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
## .. .. ..- attr(*, "names")= chr [1:2] "BRCR" "ls"
## $ residuals : Named num [1:30] -0.0747 -0.1297 0.0256 0.1255 0.1009 ...
## ..- attr(*, "names")= chr [1:30] "1" "2" "3" "4" ...
## $ coefficients : num [1:2, 1:4] 0.099104 0.00584 0.050764 0.000871 1.952266 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:2] "(Intercept)" "ls"
## .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
## $ aliased : Named logi [1:2] FALSE FALSE
## ..- attr(*, "names")= chr [1:2] "(Intercept)" "ls"
## $ sigma : num 0.141
## $ df : int [1:3] 2 28 2
## $ r.squared : num 0.616
## $ adj.r.squared: num 0.602
## $ fstatistic : Named num [1:3] 45 1 28
## ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
## $ cov.unscaled : num [1:2, 1:2] 0.129129 -0.001909 -0.001909 0.000038
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:2] "(Intercept)" "ls"
## .. ..$ : chr [1:2] "(Intercept)" "ls"
## - attr(*, "class")= chr "summary.lm"
## [1] 0.1412668
Store the intercept, slope, and standard deviation parameters from the model into the following variables:
- int_obs
- slope_obs
- sd_obs
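A minimal sketch, using fit_1_coefs and fit_1_summary from above:
int_obs = fit_1_coefs[1]   # the (Intercept) element
slope_obs = fit_1_coefs[2] # the ls element
sd_obs = fit_1_summary$sigma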
The first goal of the simulation is to generate some new “data” using our model.
We have several options:
Any of the approaches are legitimate, so for our purpose, let’s keep the original values of x and let brown creeper abundance vary among simulations.
Now that you know the model parameter values and have a strategy for choosing x-values, you can create some sampled Brown Creeper data!
plot(
x = birdhab$ls,
y = linear_simulator(
x = birdhab$ls,
y_int = int_obs,
slope = slope_obs,
st_dev = sd_obs
),
main = "Simulated Data",
xlab = "late-successional forest",
ylab = "Brown Creeper Abundance")
Alternatively, you could plot the observed data first, then add the simulated data to the existing plot:
plot(
birdhab$ls, birdhab$BRCR,
xlab = "late-successional forest extent",
ylab = "Brown Creeper abundance",
pch = 19)
points(
x = birdhab$ls,
y = linear_simulator(
x = birdhab$ls,
y_int = int_obs,
slope = slope_obs,
st_dev = sd_obs
),
col = adjustcolor("red", alpha = 0.3),
pch = 16)
legend(
"topleft",
legend = c("data", "simulation"),
pch = 16,
col = c(1, adjustcolor("red", alpha = 0.3)))
You might want to run the model a few more times to see how variable the results are.
After running the model a few times, do you notice any potential problems with the model?
In other words, does the model reproduce the patterns in the original data perfectly or are there issues with the spread of values or with the generation of illogical values?
One problem with the use of the normal distribution is that it is unbounded on the lower limit.
Thus, negative values are possible.
In this case, because the y-intercept is close to 0, the simulation is likely to produce negative values occasionally when x = 0.
Since brown creeper abundance cannot be negative, this is an undesirable behavior of the model.
You could try modifying linear_simulator() to correct the illogical values.
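For example, here’s a hypothetical variant (named linear_simulator_2() for illustration; not the lab’s required fix) that truncates simulated values at zero using pmax():
linear_simulator_2 = function(x, y_int, slope, st_dev)
{
  y = linear(x, y_int, slope) + rnorm(n = length(x), mean = 0, sd = st_dev)
  # Truncate at zero: abundance cannot be negative
  return(pmax(y, 0))
}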
Power analysis in the narrowest sense means figuring out the (frequentist) statistical power: the probability of correctly rejecting the null hypothesis when it is false.
While we are generally less concerned with power analysis in the conventional sense of hypothesis testing, we are very interested in the role of power analysis in addressing a much broader question:
For any real experiment or observation situation, we don’t know what is really going on (the “true” model or parameters), so we don’t have the information required to answer these questions from the data alone.
Historically, questions about statistical power could only be answered by sophisticated analyses, and only for standard statistical models and experimental designs such as one-way ANOVA or linear regression.
Increases in computing power have extended power analysis to many new areas, and R’s capability to run repeated stochastic simulations is a great help.
Here, we will illustrate the use of stochastic simulation for power analysis using the linear regression model above.
Let’s start by finding out whether we can reject the null hypothesis in a single experiment.
To do this we can use the following procedure:
y_sim = linear_simulator(
x = birdhab$ls,
y_int = int_obs,
slope = slope_obs,
st_dev = sd_obs
)
fit_sim = lm(y_sim ~ birdhab$ls)
summary(fit_sim)
##
## Call:
## lm(formula = y_sim ~ birdhab$ls)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.245329 -0.082111 0.002297 0.075614 0.263888
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0897077 0.0463911 1.934 0.0633 .
## birdhab$ls 0.0057097 0.0007961 7.172 8.33e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1291 on 28 degrees of freedom
## Multiple R-squared: 0.6475, Adjusted R-squared: 0.635
## F-statistic: 51.44 on 1 and 28 DF, p-value: 8.33e-08
Extracting p-values from R analyses can be tricky.
In this case, the coefficients element of the summary() of the linear fit is a matrix that includes the standard error, t statistic, and p-value for each parameter. We can use matrix indexing to pull out the specific value of interest.
Here’s the matrix of coefficients:
sum_1 = summary(fit_sim)
sum_1$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.089707726 0.046391146 1.933725 6.331403e-02
## birdhab$ls 0.005709735 0.000796068 7.172422 8.330308e-08
sum_1$coefficients[2, 4]
## [1] 8.330308e-08
To estimate the probability of successfully rejecting the null hypothesis when it is false (the power), we have to repeat this procedure many times and calculate the proportion of the time that we reject the null hypothesis.
First, we specify the number of simulations to run and set up a vector to hold the p-value for each simulation.
Then, we repeat what we did above saving the p-values to the storage vector.
We can calculate the statistical power as the number of times we correctly rejected the null hypothesis divided by the total number of simulations.
Here’s one possible implementation:
n_sims = 1000
p_vals = numeric(n_sims)
alpha = 0.05
for(i in 1:n_sims)
{
y_sim = linear_simulator(
x = birdhab$ls,
y_int = int_obs,
slope = slope_obs,
st_dev = sd_obs
)
fit_sim = lm(y_sim ~ birdhab$ls)
p_vals[i] = summary(fit_sim)$coefficients[2, 'Pr(>|t|)']
}
sum(p_vals < alpha) / n_sims
## [1] 1
Based on the simulation output:
This is the rate at which our chosen statistical model, a simple linear regression, can detect a slope of roughly b = 0.006 with a sample size of N = 30.
We’re going to be building a lot of models, so let’s write a quick function to save some time and make our simulation loops a little simpler:
linear_sim_fit = function(x, slope, y_int, st_dev)
{
y_sim = linear_simulator(
x = x,
y_int = y_int,
slope = slope,
st_dev = st_dev
)
fit_sim = lm(y_sim ~ x)
return(fit_sim)
}
Since we don’t know the true population values, we often want to estimate the statistical power over a range of model parameter values (or even different models).
We can repeat the simulation process multiple times while varying some parameters of the simulation such as the model slope or sample size.
This example simulation estimates statistical power as a function of the slope (the effect size).
This simulation can take a while to run; you can lower n_sims to speed it up if needed.
alpha = 0.05
n_sims = 1000
p_vals = numeric(n_sims)
n_effect_sizes = 20
effect_sizes_1 = seq(-.01, .01, length.out = n_effect_sizes)
effect_size_powers = numeric(n_effect_sizes)
for(j in 1:n_effect_sizes)
{
for(i in 1:n_sims)
{
fit_sim = linear_sim_fit(
x = birdhab$ls,
y_int = int_obs,
slope = effect_sizes_1[j],
st_dev = sd_obs
)
p_vals[i] = summary(fit_sim)$coefficients[2, 'Pr(>|t|)']
}
effect_size_powers[j] = sum(p_vals < alpha) / n_sims
}
sim_effect_size =
data.frame(
effect_size = effect_sizes_1,
power = effect_size_powers)
Note that this code is just an elaboration of the code we used to perform the simulation above:
- We added an outer loop that iterates over the effect sizes, storing each power estimate in effect_size_powers.
- We packaged the results into a data.frame with columns for effect size and statistical power.
We can plot the result and add a vertical line to show the slope of our original data set.
plot(
power ~ effect_size, data = sim_effect_size,
type = 'l', xlab = 'Effect size', ylab = 'Power')
abline(v = slope_obs, lty = 2, col = 'red')
What is the power for our observed effect size of 0.0058?
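One way to check, a sketch: find the simulated effect size closest to the observed slope and look up its power.
# Index of the simulated effect size nearest the observed slope
i_nearest = which.min(abs(sim_effect_size$effect_size - slope_obs))
sim_effect_size$power[i_nearest]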
We can do the same thing for a gradient in sample sizes, as follows:
alpha = 0.05
n_sims = 1000
p_vals = numeric(n_sims)
sample_sizes = seq(5, 100)
sample_size_powers = numeric(length(sample_sizes))
# The maximum x value in the simulation.
# Use the maximum observed x-value in the data
max_x = max(birdhab$ls)
for(j in 1:length(sample_sizes))
{
# A sequence of equally-spaced x-values:
x_vals = seq(0, max_x, length.out = sample_sizes[j])
for(i in 1:n_sims)
{
fit_sim = linear_sim_fit(
x = x_vals,
y_int = int_obs,
slope = slope_obs,
st_dev = sd_obs
)
p_vals[i] = summary(fit_sim)$coefficients[2, 'Pr(>|t|)']
}
sample_size_powers[j] = sum(p_vals < alpha) / n_sims
}
sim_sample_size =
data.frame(
sample_size = sample_sizes,
power = sample_size_powers)
plot(
power ~ sample_size, data = sim_sample_size,
type = 'l', xlab = 'Sample size', ylab = 'Power')
abline(v = nrow(birdhab), lty = 2, col = 'red')
How much power is lost if we reduce the sample size from 30 to 20?
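A sketch of one way to answer: compare the simulated power values at the two sample sizes (both are in the range of sample_sizes above).
power_30 = sim_sample_size$power[sim_sample_size$sample_size == 30]
power_20 = sim_sample_size$power[sim_sample_size$sample_size == 20]
power_30 - power_20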
We could repeat this process for other parameters such as the error component of the model, but you get the idea.
While we can do these power analysis simulations for one parameter at a time, it might be more interesting to vary combinations of parameters, say slope and sample size. We can do this by adding yet another loop, saving the results in a matrix, and using contour() or persp() to plot the results.
Try the following:
alpha = 0.01
n_sims = 50
p_vals = numeric(n_sims)
n_effect_sizes = 20
effect_sizes = seq(-.01, .01, length.out = n_effect_sizes)
# The maximum x value in the simulation.
# Use the maximum observed x-value in the data
max_x = max(birdhab$ls)
sample_sizes = seq(10, 50)
sim_output_2 = matrix(nrow = length(effect_sizes), ncol = length(sample_sizes))
for(k in 1:length(effect_sizes))
{
effect_size = effect_sizes[k]
for(j in 1:length(sample_sizes))
{
x_vals = seq(0, max_x, length.out = sample_sizes[j])
for(i in 1:n_sims)
{
fit_sim = linear_sim_fit(
x = x_vals,
y_int = int_obs,
slope = effect_size,
st_dev = sd_obs
)
p_vals[i] = summary(fit_sim)$coefficients[2, 'Pr(>|t|)']
}
sim_output_2[k, j] = sum(p_vals < alpha) / n_sims
}
print(paste0("computing effect size ", k," of ", length(effect_sizes)))
}
sim_n_effect_size =
list(
power = sim_output_2,
effect_size = effect_sizes,
sample_size = sample_sizes
)
Note that the only difference in this code is that we added a third, outer loop and created a matrix to store the results, since we have a power result for each combination of slope and sample size.
Our output power data are now stored in a matrix (instead of a vector), so we need a way to visualize 2-dimensional gridded data, i.e. raster data.
I’ve packaged the output power matrix, the effect sizes, and the sample sizes into a list for future use.
image() is a quick way to plot a matrix as if it were raster data, that is, to plot a grid in which the pixel color is determined by the value of the matrix element.
image(
sim_n_effect_size$power,
xlab = "Effect size",
ylab = "Sample Size",
axes = FALSE)
# add x-axis labels
axis(
1,
at = c(0, 0.5, 1),
labels = c(-.01, 0.0, .01))
# add y-axis labels
axis(
2,
at = c(0, 1),
labels = c(sample_sizes[1], tail(sample_sizes, 1)))
I used larger values for n_sims and n_effect_sizes, and I used a sample size range from 2 to 200.
Your plot will look different because you’re using smaller values.
NOTE: your plots may look rougher or more pixellated than the figures in the lab walkthrough.
I suggest you set n_sims and n_effect_sizes to small values as you are experimenting with the code. You can set them to higher values later when you are confident your code works.
You’ll want to make sure you save your simulation output to a file!
Contour plots are similar to topographic maps.
The isolines show interpolated curves along which the value is the same.
Let’s plot the result using the contour() function:
contour() expects arguments for x, y, and z.
Use contour() to re-create the following figure:
contour(
x = sim_n_effect_size$effect_size,
y = sim_n_effect_size$sample_size,
z = sim_n_effect_size$power,
xlab = "effect size",
ylab = "sample size",
main = "Contour Plot of Statistical Power",
levels = seq(0, 1, length.out = 9),
drawlabels = TRUE,
# method = "simple")
method = "edge")
There are several functions and packages for creating 3D plots in R.
Perspective plots can plot a 3D surface.
Let’s try a static perspective plot using the persp() function, as follows:
persp(
x = sim_n_effect_size$effect_size,
y = sim_n_effect_size$sample_size,
z = sim_n_effect_size$power,
xlab = "beta", ylab = "n", zlab = "power",
col = 'lightblue',
theta = 30, phi = 30, expand = .75,
ticktype = 'detailed')
The R package rgl allows you to create similar plots that are interactive.
Install and load rgl now so that you can use the persp3d() function.
You can use the same syntax as for persp() to create a basic interactive plot.
Try it out and see what you get!
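A sketch, assuming rgl is installed; the arguments mirror the persp() call above:
require(rgl)
persp3d(
  x = sim_n_effect_size$effect_size,
  y = sim_n_effect_size$sample_size,
  z = sim_n_effect_size$power,
  xlab = "beta", ylab = "n", zlab = "power",
  col = 'lightblue')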
rgl also has a function for saving an interactive 3D plot to an html file.
For example, I used the following code to save the linked 3D plot above:
require(htmlwidgets)
require(here)
saveWidget(
rglwidget(),
file = here(
"docs", "webGL",
"n_effect_size_power_sim_plot.html"),
selfcontained = TRUE
)
You may have noticed that some of the simulations take a long time to run!
When you do a simulation study, it is often convenient to save your simulation results so that you don’t have to re-run your simulation every time.
You can save R objects to files using save().
You can save any R object to a .Rdata file. These are binary files that can only be read by R.
If you have data stored in a data.frame or matrix, you can write it to a text file. Comma-separated value (CSV) files are commonly used.
An advantage of text files is that you can open them in other applications.
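For example, a sketch of writing one of the earlier data.frame results to CSV (the file name here is a placeholder):
write.csv(
  sim_sample_size,
  file = here::here("data", "lab_11_sample_size_power.csv"),
  row.names = FALSE)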
save(
sim_n_effect_size,
file = here::here("data", "lab_11_n_effect_sizes.Rdata"))
R saved it to a binary file with the extension “.Rdata”.
When I want to load the data again I can use:
load(file = here::here("data", "lab_11_n_effect_sizes.Rdata"))
For the lab exercises, you should save your simulation results to RData files so that you can use them again later without having to re-run the simulations.
You can control the number of simulation iterations (via the n_sims variable). I suggest you use small values as you build your code.
When your simulations are working as expected, you should increase the number of simulations to something large, 1000 or greater.
You can stage your simulations to run and save the results when you plan to be away from your computer for a while. Simulation results using larger numbers of iterations will make much smoother curves.
What does the power surface reveal about the relationship between slope and sample size?
If you wanted say a power of > 0.8 to detect a slope of b = .002, how large would your sample size need to be?
As you can see, stochastic simulation is an extremely powerful tool for examining power, even in this simple linear regression example where canned approaches exist.
For more complex models, the coding is more complex, but the process is the same. The same basic tools learned in this example can be extended to more complex situations.
We varied the sample and effect sizes to examine statistical power; however, we didn’t try varying the dispersion in the data.
After you complete the lab walkthrough, you will carry out additional power analysis on the data dispersion, i.e. the population standard deviation.
Review the effect size simulation above. You will carry out a similar simulation on the population standard deviation instead.
To do the simulation you’ll need to:
Here’s a template:
alpha = 0.05
n_sims = 100
p_vals = ...
# What was the observed standard deviation?
sd_obs
# specify the number of different standard deviation values to simulate:
n_sds = 20
pop_sds = seq(from = 0.01, to = 1.5, length.out = n_sds)
pop_sd_powers = numeric(...)
for(j in 1:length(pop_sds))
{
pop_sd_j = ...
for(i in 1:n_sims)
{
fit_sim = linear_sim_fit(...)
p_vals[i] = ...
}
pop_sd_powers[j] = ...
}
sim_output_dispersion = data.frame(
sd = ...,
power = ...)
# You should save your simulation results so you don't have to run it every time.
save(
sim_output_dispersion,
file = here::here("data", "lab_ll_dat_dispersion_sim.RData"))
# Line plot of standard deviation (x-axis) and statistical power (y-axis)
plot(...)
# Add a dotted vertical red line at the observed population standard deviation value.
abline(v = ...)
It looks like statistical power drops off pretty quickly as the population variability increases.
You’ll now modify your simulation of population standard deviation to include sample size. Review the simulation of both sample size and effect size above.
Here’s a template for the dispersion and sample size simulation:
alpha = 0.05
# Start with a small number
n_sims = 10
p_vals = ...
# What was the observed standard deviation?
sd_obs
# specify the number of different standard deviation values to simulate:
# Start with a small number
n_sds = 20
pop_sds = seq(from = 0.05, to = ..., length.out = n_sds)
# The maximum x value in the simulation.
# Use the maximum observed x-value in the data
max_x = max(birdhab$ls)
pop_sd_powers = numeric(...)
sample_sizes = seq(5, 100)
sim_output_3 = matrix(...)
for(k in 1:length(pop_sds))
{
pop_sd_k = pop_sds[k]
for(j in 1:length(sample_sizes))
{
x_vals = seq(0, max_x, length.out = sample_sizes[j])
for(i in 1:n_sims)
{
fit_sim = ...
p_vals[i] = ...
}
sim_output_3[k, j] = ...
}
print(paste0("Testing standard deviation ", k, " of ", n_sds))
}
image(sim_output_3)
sim_3_dat =
list(
power = sim_output_3,
sample_size = sample_sizes,
pop_sd = pop_sds)
# You should save your simulation results so you don't have to run it every time.
save(
sim_3_dat,
file = here::here("data", "lab_ll_sim_output_dispersion_n_1000.RData"))
You should try out some of the plots from the walkthrough.
Here’s an example of an interactive 3D perspective plot I created from my simulation data.
For the lab questions you’ll have to create plots using your dispersion simulation results.
Create a line plot of dispersion (x-axis) and statistical power (y-axis) with a dotted vertical red line at the observed population standard deviation value.
Create a contour plot of the sample size and population dispersion power simulation.
Create an interactive 3D perspective plot, using persp3d() in package rgl, of the sample size and population dispersion power simulation.
You need to save this to an html file using saveWidget() in package htmlwidgets.
Check out my example in the lab walkthrough if you need a hint.
To save the 3D plot, first you’ll need to run the persp3d() function; the plot will open in a separate window. Next, run saveWidget() while the 3D plot is open to save it to an html file.
Your surface must be a different color than my example here.