resample

This code is part of a working paper focused on resampling strategies for categorical data. The premise is that the defacto industry standards of majority rule and nearest neighbor give rise to competing objectives with respect to preserving map structure and map level class proportions.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("mikejohnson51/resample")

Example

To define a minimal example:

Lets load our spatial/visualization packages,
Read in the 30 meter resolution, 2016 CONUS National Land Cover Dataset
Define an area of interest (AOI) around Colorado Springs, and re-project
and crop the NLCD to this boundary

library(resample)

library(raster)
library(sf)
library(ggplot2)

# Start with a native high resolution dataset
nlcd = brick(nlcd_conus_path)

# Define an Area of Interest and transform to NLCD CRS
AOI = AOI::aoi_get(list("Colorado Springs", 21, 21), km = TRUE) %>% 
  st_transform(nlcd@crs)

input  = crop(nlcd, AOI, snap = "out")

Three resampling techniques are highlighted which include nearest neighbor, majority rule and a new method that seeks to preserve both areal proportions and map structure. Nearest neighbor and majority rule utilize GDAL near and mode resampling.

For each method, an input raster and desired cell size must be provided. This is the same principle as the -tr flag found in GDAL utilities. Cell size units are given with respect to the input CRS. In our example this is meters (see below).

st_crs(input)$units
#> [1] "m"

(1) Nearest Neighbor

## Nearest Neighbor
system.time({
  n = resampleData(input, cellsize = 1000, method = 'nn')
})
#>    user  system elapsed 
#>   0.271   0.053   0.335

(2) Majority Rule

## Majority Rule
system.time({
  m = resampleData(input, cellsize = 1000,  method = 'maj')
})
#>    user  system elapsed 
#>   0.778   0.041   0.837

(3) Area Preservation

## Areal Proportion
system.time({
  a = resampleData(input, cellsize = 1000,  method = 'area')
})
#>    user  system elapsed 
#>   0.428   0.053   0.544

Visualize

Here is a quick view of the maps produced by each method:

b = brick(n,m,a) %>% 
  resample(input, method = "ngb") %>% 
  addLayer(input) %>% 
  setNames(c("Nearest Neighbor", "Majority Rule", "Areal", "NLCD 30m"))  

plot(b[[c(4,3,2,1)]], 
     breaks = col_lu$nlcd.code, 
     col = col_lu$color, 
     legend = F)

Disagreement across methods

Overall, we can see that in ~46% of the cells, the three methods assigned different categories:

m1 <- cbind(values(a),values(n),values(m))

sum(rowSums(m1==m1[,1])==ncol(m1)) / ncell(a)
#> [1] 0.463138

Looking class by class, it is evdident that not all classes are treated “equally”:

freq(brick(n,a,m), merge = TRUE) %>% 
  setNames(c("lc", "Nearest Neighbor", "Area Preserving", "Majority Rule"))%>% 
  tidyr::pivot_longer(-lc) %>% 
  ggplot(aes(y = value, x = as.factor(lc), fill = name)) + 
  geom_bar(width=0.5, stat='identity', position = position_dodge(width=0.5)) + 
  theme_bw() + 
  theme(legend.position  = "bottom") + 
  labs(y = "Cells", x = "Landcover Class", fill = "Resampling\nMethod") + 
  ggpubr::fill_palette("aaas")

The key take away from this research is that majority rule tends to overpredict majority classes at the expense of minority classes leading to maps that lose diversity. Nearest neighbor does a better job at preserving map diversity but at the expense of map structure. An area preserving method seeks to achieve both through an integration of global and zonal statistics.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

resample

Installation

Example

(1) Nearest Neighbor

(2) Majority Rule

(3) Area Preservation

Visualize

Disagreement across methods

Files

README.md

Latest commit

History

README.md

File metadata and controls

resample

Installation

Example

(1) Nearest Neighbor

(2) Majority Rule

(3) Area Preservation

Visualize

Disagreement across methods