Marked Point Patterns

We’ll practice using marked point patterns in this activity.

California Ozone

We’ll use the Southern California Ozone data set.

Original data file from: https://stats.idre.ucla.edu/stat/r/faq/ozone.csv

Read the data from csv.

It’s not in a spatial file format - we’ll have to convert it.

require(sp)
require(here)
ozone = ozone_df = read.csv(
  here("data", "ozone.csv"))

# What is the type

class(ozone)
## [1] "data.frame"
# Examine the data:
head(ozone)
##   Station   Av8top      Lat       Lon
## 1      60 7.225806 34.13583 -117.9236
## 2      69 5.899194 34.17611 -118.3153
## 3      72 4.052885 33.82361 -118.1875
## 4      74 7.181452 34.19944 -118.5347
## 5      75 6.076613 34.06694 -117.7514
## 6      84 3.157258 33.92917 -118.2097

We can use the coordinates() function to create a SpatialPointsDataFrame from a regular data.frame:

# A shortcut to way to convert a data frame to a 
# SpatialPointsDataFrame
coordinates(ozone) = ~ Lon + Lat

Note the unusual use of side effects in this syntax. Unlike most R functions, which might return a modified version of the input, coordinates() actually modifies its argument.

To prove this to yourself, we can check that ozone is no loger a data.frame:

class(ozone)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
class(ozone_df)
## [1] "data.frame"

When we looked at the data above, we should have realized that the coordinates look like degrees, but we don’t know which datum…

Most GPS units use the WGS84 datum, so we’ll assume that’s the right one (a potentially bad assumption)

proj4string(ozone) = 
  "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"

Let’s plot the point locations!

plot(ozone)

Create PPP

Now we need to convert the data to spatstat objects:

spatstat doesn’t work with unprojected coordinates, so we’ll transform into a PCS that works well with California.

California is pretty big, so it’s hard to find a single PCS that works with the entire state.

Although, since it’s tall and skinny, perhaps a transverse mercator would be a good choce… Unfortunately UTM splits the state into two zones.

The state officially has multiple state plane projections.
Here is a website that explains the system:

https://www.conservation.ca.gov/cgs/Pages/Program-RGMP/california-state-plane-coordinate-system.aspx

We’ll use the one for zone 7 (LA county).

I found the proj4string here:

https://epsg.io/26799

la_proj4 = "+proj=lcc +lat_1=34.41666666666666 +lat_2=33.86666666666667 +lat_0=34.13333333333333 +lon_0=-118.3333333333333 +x_0=1276106.450596901 +y_0=1268253.006858014 +datum=NAD27 +units=us-ft +no_defs "
ozone = spTransform(ozone, la_proj4)
require(spatstat)
require(raster)

# Create point pattern and observation window:
ozone_pp = as.ppp(ozone)
ozone_win = as.owin(ozone_ppp)

This creates a marked point pattern since our data had attributes. When we plot, we’ll see the attribute values plotted as relative sizes of points (circles). Note: the pattern may be easier to see in the IDW section below.

plot(ozone_pp, main = "LA Ozone Point Pattern")

Are the station marks meaningful?

Apply the L function

Note the weird syntax for the plotting: .-r~r.

ozone_l = Lest(ozone_pp, ozone_win)
plot(ozone_l,  .-r~r )

What do you notice about the pattern? Does it make sense in the context of the plot of point locations?

L-function significance

The plot by itself doesn’t tell us much about significance, so we can create a significance envelope with the envelope() function:

ozone_env = envelope(ozone_pp, fun = Lest, nsim = 1000)
## Generating 1000 simulations of CSR  ...
## 1, 2, 3, ......10.........20.........30.........40.........50.........60.........70......
## ...80.........90.........100.........110.........120.........130.........140.........150..
## .......160.........170.........180.........190.........200.........210.........220........
## .230.........240.........250.........260.........270.........280.........290.........300....
## .....310.........320.........330.........340.........350.........360.........370.........380
## .........390.........400.........410.........420.........430.........440.........450......
## ...460.........470.........480.........490.........500.........510.........520.........530..
## .......540.........550.........560.........570.........580.........590.........600........
## .610.........620.........630.........640.........650.........660.........670.........680....
## .....690.........700.........710.........720.........730.........740.........750.........760
## .........770.........780.........790.........800.........810.........820.........830......
## ...840.........850.........860.........870.........880.........890.........900.........910..
## .......920.........930.........940.........950.........960.........970.........980........
## .990......... 1000.
## 
## Done.
plot(ozone_env, .-r~r)

The gray blob is a 95% simulation envelope. Based on the plot, is there evidence for clustering or overdispersion with regard to the air quality recording stations?

Marked Points

Since the station mark isn’t very meaningful (it’s just a numeric code for the station ID) we need to remove it. We’ll just replace the marks with the ozone column from the data frame:

marks(ozone_pp) = ozone_df$Av8top
marks(ozone_pp)
##  [1]  7.225806  5.899194  4.052885  7.181452  6.076613  3.157258  5.201613  4.717742
##  [9]  6.532258  7.540323  4.891129  4.149194  7.266129  4.100806  4.737903  4.600806
## [17]  4.358871  8.241935  8.794355  8.899194  7.161290  8.133065  7.858871  9.762097
## [25]  8.149194 11.649194  8.604839  9.947581 10.274194  8.524194  4.447917  8.241935

Now we can apply the mark correlation function:

ozone_mcf = markcorr(ozone_pp)
plot(ozone_mcf)

The mark correlation function asks whether similar values of the marks tend to occur nearby in space. Values above the expected indicate clustering, values below indicate overdispersion.

MCF significance

Again, we can use the envelope() function to create a simulation confidence envelope. I’ll let you figure out the code (it’s very similar to that for the L function).

Note that we might have some weird issues because the points are not distributed throughout the square observation window.

I suggest using not using edge correction (check the help entry for envelope())

Plot it!

plot(ozone_mcf_env)

What do you conclude about clustering of values of ozone?

  • Does this result surprise you?
  • Do you think we can trust this result given the uneven spatial distribution of observation points in the region?

We’ll contrast these results with correlograms and variograms later on.

Inverse Distance Weighting

Now let’s try to determine the values of ozone pollution throughout the LA county.

Recall from the slides that we’ll need a template raster:

nrows = 50; ncols = 100
ozoneTemplateRaster = 
  raster(
    nrow = nrows,
    ncol = ncols,
    ext = extent(ozone))
ozoneTemplateRaster[, ] = 0
ozoneGrid = as(
  ozoneTemplateRaster, 
  'SpatialGridDataFrame')

The IDW function will grumble about mismatched crs, so we can use the following trick since we know they are the same. The issue has to do with comments in the proj4string (that don’t actually affect the PCS parameters)

crs(ozoneTemplateRaster) = crs(ozone)
crs(ozoneGrid) = crs(ozone)

Now we’re ready for IDW!

We’ll use the implementation in gstat package.

require(gstat)
ozoneIDW0 = gstat::idw(
  formula = ozone$Av8top ~ 1, 
  locations = ozone, newdata = ozoneGrid, idp = 0.5)
## [inverse distance weighted interpolation]

Plot it!

plot(ozoneIDW0)
points(ozone)

Now experiment with different weights (the idp parameter) to get a feel for what different weight amounts do.