Title: | Spatial Parallel Computing by Hierarchical Data Partitioning |
Version: | 0.9.4 |
Description: | Geospatial data computation is parallelized by grid, hierarchy, or raster files. Based on 'future' (Bengtsson, 2024 <doi:10.32614/CRAN.package.future>) and 'mirai' (Gao et al., 2025 <doi:10.32614/CRAN.package.mirai>) parallel back-ends, 'terra' (Hijmans et al., 2025 <doi:10.32614/CRAN.package.terra>) and 'sf' (Pebesma et al., 2024 <doi:10.32614/CRAN.package.sf>) functions as well as convenience functions in the package can be distributed over multiple threads. The simplest way of parallelizing generic geospatial computation is to start from par_pad_*() functions to par_grid(), par_hierarchy(), or par_multirasters() functions. Virtually any functions accepting classes in 'terra' or 'sf' packages can be used in the three parallelization functions. A common raster-vector overlay operation is provided as a function extract_at(), which uses 'exactextractr' (Baston, 2023 <doi:10.32614/CRAN.package.exactextractr>), with options for kernel weights for summarizing raster values at vector geometries. Other convenience functions for vector-vector operations including simple areal interpolation (summarize_aw()) and summation of exponentially decaying weights (summarize_sedc()) are also provided. |
License: | MIT + file LICENSE |
URL: | https://docs.ropensci.org/chopin/, https://github.com/ropensci/chopin |
BugReports: | https://github.com/ropensci/chopin/issues |
Depends: | R (≥ 4.1) |
SystemRequirements: | netcdf |
Encoding: | UTF-8 |
LazyData: | true |
LazyDataCompression: | xz |
RoxygenNote: | 7.3.2 |
Imports: | anticlust, cli, dplyr (≥ 1.1.0), exactextractr (≥ 0.8.2), future, future.apply, igraph, methods, rlang, sf (≥ 1.0-10), stars (≥ 0.6-0), terra (≥ 1.7-18), mirai (≥ 1.3.0), collapse |
Suggests: | covr, devtools, targets, DiagrammeR, future.mirai, knitr, lifecycle, rmarkdown, spatstat.random, testthat (≥ 3.0.0), units, withr |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-07-15 05:07:03 UTC; isong |
Author: | Insang Song |
Maintainer: | Insang Song <geoissong@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-07-18 14:40:12 UTC |
Computation of spatial data by hierarchical and objective partitioning of inputs for parallel processing
Description
The chopin
package provides a set of functions to compute on divided
geospatial data.
Basic functionalities
Distribute
terra
,sf
, andchopin
functions to parallel workers set byfuture
ormirai
Set parallelization strategies based on artificial grids, equal-size clusters, hierarchy, and multiple raster files
Convenience functions for raster-vector overlay and weighted summary from vector dataset
chopin
workflow
The simplest way of parallelizing generic geospatial computation is to start from
par_pad_*
functions topar_grid
, or runningpar_hierarchy
, orpar_multirasters
functions at once.
library(chopin) library(terra) library(sf) library(collapse) library(dplyr) library(future) library(future.mirai) library(future.apply) lastpar <- par(mfrow = c(1, 1)) options(sf_use_s2 = FALSE)
Example data
North Carolinian counties and a raster of elevation data are used as example data.
temp_dir <- tempdir(check = TRUE) url_nccnty <- paste0( "https://raw.githubusercontent.com/", "ropensci/chopin/refs/heads/main/", "tests/testdata/nc_hierarchy.gpkg" ) url_ncelev <- paste0( "https://raw.githubusercontent.com/", "ropensci/chopin/refs/heads/main/", "tests/testdata/nc_srtm15_otm.tif" ) nccnty_path <- file.path(temp_dir, "nc_hierarchy.gpkg") ncelev_path <- file.path(temp_dir, "nc_srtm15_otm.tif") # download data download.file(url_nccnty, nccnty_path, quiet = TRUE) download.file(url_ncelev, ncelev_path, quiet = TRUE) nccnty <- terra::vect(nccnty_path) ncelev <- terra::rast(ncelev_path)
To demonstrate chopin functions, we generate 10,000 random points in North Carolina
ncsamp <- terra::spatSample( nccnty, 1e4L ) ncsamp$pid <- 1:nrow(ncsamp)
Creating grids
The example below will generate a regular grid from the random point data.
ncgrid <- par_pad_grid(ncsamp, mode = "grid", nx = 4L, ny = 2L, padding = 10000) plot(ncgrid$original)
Extracting values from raster
Since all
par_*
functions operate onfuture
backends, users should define the future plan before running the functions.multicore
plan supportsterra
objects which may lead to faster computation, but it is not supported in Windows. An alternative isfuture.mirai
'smirai_multisession
plan, which is supported in many platforms and generally faster than plain future multisession plan.-
workers
argument should be defined with an integer value to specify the number of threads to be used.
future::plan(future.mirai::mirai_multisession, workers = 2L)
Then we dispatch multiple
extract_at
runs on the grid polygons.Before we proceed, the terra object should be converted to sf object.
pg <- par_grid( grids = ncgrid, pad_y = FALSE, .debug = TRUE, fun_dist = extract_at, x = ncelev_path, y = sf::st_as_sf(ncsamp), id = "pid", radius = 1e4, func = "mean" )
Hierarchical processing
Here we demonstrate hierarchical processing of the random points using census tract polygons.
nccnty <- sf::st_read(nccnty_path, layer = "county") nctrct <- sf::st_read(nccnty_path, layer = "tracts")
The example below will parallelize summarizing mean elevation at 10 kilometers circular buffers of random sample points by the first five characters of census tract unique identifiers, which are county codes.
This example demonstrates the hierarchy can be defined from any given polygons if the unique identifiers are suitably formatted for defining the hierarchy.
px <- par_hierarchy( # from here the par_hierarchy-specific arguments regions = nctrct, regions_id = "GEOID", length_left = 5, pad = 10000, pad_y = FALSE, .debug = TRUE, # from here are the dispatched function definition # for parallel workers fun_dist = extract_at, # below should follow the arguments of the dispatched function x = ncelev, y = sf::st_as_sf(ncsamp), id = "pid", radius = 1e4, func = "mean" )
Multiraster processing
Here we demonstrate multiraster processing of the random points using multiple rasters.
ncelev <- terra::rast(ncelev_path) tdir <- tempdir(check = TRUE) terra::writeRaster(ncelev, file.path(tdir, "test1.tif"), overwrite = TRUE) terra::writeRaster(ncelev, file.path(tdir, "test2.tif"), overwrite = TRUE) terra::writeRaster(ncelev, file.path(tdir, "test3.tif"), overwrite = TRUE) terra::writeRaster(ncelev, file.path(tdir, "test4.tif"), overwrite = TRUE) terra::writeRaster(ncelev, file.path(tdir, "test5.tif"), overwrite = TRUE) rasts <- list.files(tdir, pattern = "tif$", full.names = TRUE) pm <- par_multirasters( filenames = rasts, fun_dist = extract_at, x = NA, y = sf::st_as_sf(ncsamp)[1:500, ], id = "pid", radius = 1e4, func = "mean", .debug = TRUE ) par(lastpar)
Function selection guide for par_*()
We provide two flowcharts to help users choose the right function for parallel processing. The raster-oriented flowchart is for users who want to start with raster data, and the vector-oriented flowchart is for users with large vector data.
In raster-oriented selection, we suggest four factors to consider:
Number of raster files: for multiple files,
par_multirasters
is recommended. When there are multiple rasters that share the same extent and resolution, consider stacking the rasters into multilayer SpatRaster object by callingterra::rast(filenames)
.Raster resolution: We suggest 100 meters as a threshold. Rasters with resolution coarser than 100 meters and a few layers would be better for the direct call of
exactextractr::exact_extract()
.Raster extent: Using
SpatRaster
inexactextractr::exact_extract()
is often minimally affected by the raster extent.Memory size:
max_cells_in_memory
argument value ofexactextractr::exact_extract()
, raster resolution, and the number of layers inSpatRaster
are multiplicatively related to the memory usage.
For vector-oriented selection, we suggest three factors to consider:
Number of features: When the number of features is over 100,000, consider using
par_grid
orpar_hierarchy
to split the data into smaller chunks.Hierarchical structure: If the data has a hierarchical structure, consider using
par_hierarchy
to parallelize the operation.Data grouping: If the data needs to be grouped in similar sizes, consider using
par_pad_balanced
orpar_pad_grid
withmode = "grid_quantile"
.
Caveats
Why parallelization is slower than the ordinary function run? Parallelization may underperform when the datasets are too small to take advantage of divide-and-compute approach, where parallelization overhead is involved. Overhead here refers to the required amount of computational resources for transferring objects to multiple processes. Since the demonstrations above use quite small datasets, the advantage of parallelization was not as noticeable as it was expected. Should a large amount of data (spatial/temporal resolution or number of files, for example) be processed, users could find the efficiency of this package. A vignette in this package demonstrates use cases extracting various climate/weather datasets.
Notes on data restrictions
chopin
works best with two-dimensional (planar) geometries.
Users should disable s2
spherical geometry mode in sf
by setting
sf::sf_use_s2(FALSE)
.
Running any chopin
functions at spherical or three-dimensional
(e.g., including M/Z dimensions) geometries
may produce incorrect or unexpected results.
Author(s)
Maintainer: Insang Song geoissong@gmail.com (ORCID)
Authors:
Kyle Messier (ORCID) [contributor]
Other contributors:
Alec L. Robitaille (Alec reviewed the package version 0.6.3 for rOpenSci, see <https://github.com/ropensci/software-review/issues/638>) [reviewer]
Eric R. Scott (Eric reviewed the package version 0.6.3 for rOpenSci, see <https://github.com/ropensci/software-review/issues/638>) [reviewer]
See Also
Useful links:
Report bugs at https://github.com/ropensci/chopin/issues
Check the class of an input object
Description
This function checks the class of an input object and returns "raster" if it is a raster object, or "vector" if it is a vector object.
Usage
.check_character(input)
Arguments
input |
The input object to be checked |
Value
A character string indicating the class of the input object ("raster" or "vector")
Check Raster Input
Description
This function checks the input object to ensure it is a valid raster object or a character path to a raster file. It also provides warnings and informative messages based on the input type.
Usage
.check_raster(input, extent = NULL, ...)
Arguments
input |
The input object to be checked. It can be either a SpatRaster object or a character path to a raster file. |
extent |
The extent of the raster. Defaults to NULL.
Numeric vector should be put in order of
|
... |
Placeholder. |
Value
The validated input object.
Check the subject object and perform necessary conversions if needed.
Description
This function checks the class of the input object and performs necessary conversions if needed.
Usage
.check_vector(
input,
input_id = NULL,
extent = NULL,
out_class = character(1),
...
)
Arguments
input |
sf/SpatVector/character. The input object to be checked. |
input_id |
character(1). ID field of the subject object. |
extent |
numeric(4). The extent of the subject object.
Numeric vector should be put in order of
|
out_class |
character(1). The class of the output object.
Should be one of |
... |
Placeholder. |
Value
The checked and converted subject object.
Get intersection extent
Description
Get intersection extent
Usage
.intersect_extent(input, out_class, ...)
## S4 method for signature 'sf,ANY'
.intersect_extent(input, out_class = NULL, ...)
## S4 method for signature 'SpatExtent,ANY'
.intersect_extent(input, out_class = NULL, ...)
## S4 method for signature 'SpatVector,ANY'
.intersect_extent(input, out_class = NULL, ...)
## S4 method for signature 'numeric,character'
.intersect_extent(input, out_class = NULL, ...)
Arguments
input |
sf/SpatExtent/SpatVector/numeric |
out_class |
character(1). "sf" or "terra" |
... |
other arguments. Placeholder. |
Prescreen input data for parallelization
Description
This function takes input object and type character to ingest the input object to return the object in the desired class.
Usage
.par_screen(type, input, input_id = NULL, out_class = "terra", .window = NULL)
Arguments
type |
character(1). "raster" or "vector". |
input |
object. Input object. |
input_id |
character(1). Default is NULL. If NULL, the function will not check the object with an ID column. |
out_class |
character(1). Default is NULL, but should be one of
|
.window |
numeric(4)/SpatExtent/st_bbox object. Loading window. |
Return the input's GIS data model type
Description
This function returns one of 'vector' or 'raster' depending on the input class.
Usage
datamod(input)
Arguments
input |
Spat*/sf/stars object. |
Value
character(1). One of "vector"
or "raster"
.
Note
Although stars
object is a little ambiguous
whether to classify vector or raster,
it will be considered raster in this package.
Author(s)
Insang Song
See Also
Other Helper functions:
dep_check()
,
dep_switch()
,
get_clip_ext()
,
par_def_q()
,
reproject_std()
,
reproject_to_raster()
Return the package the input object is based on
Description
Detect whether the input object is sf or Spat* object.
Usage
dep_check(input)
Arguments
input |
Spat* in terra or sf object. |
Value
A character object; one of "character"
, "terra"
and "sf"
Author(s)
Insang Song
See Also
Other Helper functions:
datamod()
,
dep_switch()
,
get_clip_ext()
,
par_def_q()
,
reproject_std()
,
reproject_to_raster()
Switch spatial data class
Description
Convert class between sf
/stars
-terra
Usage
dep_switch(input)
Arguments
input |
Spat* in terra or sf object. |
Value
Data converted to the other package class (if sf, terra; if terra, sf)
Author(s)
Insang Song
See Also
Other Helper functions:
datamod()
,
dep_check()
,
get_clip_ext()
,
par_def_q()
,
reproject_std()
,
reproject_to_raster()
Extract raster values with point buffers or polygons
Description
Extract raster values with point buffers or polygons
Usage
extract_at(x, y, ...)
## S4 method for signature 'SpatRaster,sf'
extract_at(
x = NULL,
y = NULL,
id = NULL,
func = "mean",
extent = NULL,
radius = NULL,
out_class = "sf",
kernel = NULL,
kernel_func = stats::weighted.mean,
bandwidth = NULL,
max_cells = 3e+07,
.standalone = TRUE,
...
)
## S4 method for signature 'character,character'
extract_at(
x = NULL,
y = NULL,
id = NULL,
func = "mean",
extent = NULL,
radius = NULL,
out_class = "sf",
kernel = NULL,
kernel_func = stats::weighted.mean,
bandwidth = NULL,
max_cells = 3e+07,
.standalone = TRUE,
...
)
## S4 method for signature 'SpatRaster,character'
extract_at(
x = NULL,
y = NULL,
id = NULL,
func = "mean",
extent = NULL,
radius = NULL,
out_class = "sf",
kernel = NULL,
kernel_func = stats::weighted.mean,
bandwidth = NULL,
max_cells = 3e+07,
.standalone = TRUE,
...
)
## S4 method for signature 'SpatRaster,SpatVector'
extract_at(
x = NULL,
y = NULL,
id = NULL,
func = "mean",
extent = NULL,
radius = NULL,
out_class = "sf",
kernel = NULL,
kernel_func = stats::weighted.mean,
bandwidth = NULL,
max_cells = 3e+07,
.standalone = TRUE,
...
)
## S4 method for signature 'character,sf'
extract_at(
x = NULL,
y = NULL,
id = NULL,
func = "mean",
extent = NULL,
radius = NULL,
out_class = "sf",
kernel = NULL,
kernel_func = stats::weighted.mean,
bandwidth = NULL,
max_cells = 3e+07,
.standalone = TRUE,
...
)
## S4 method for signature 'character,SpatVector'
extract_at(
x = NULL,
y = NULL,
id = NULL,
func = "mean",
extent = NULL,
radius = NULL,
out_class = "sf",
kernel = NULL,
kernel_func = stats::weighted.mean,
bandwidth = NULL,
max_cells = 3e+07,
.standalone = TRUE,
...
)
Arguments
x |
|
y |
|
... |
Placeholder. |
id |
character(1). Unique identifier of each point. |
func |
function taking one numeric vector argument.
Default is |
extent |
numeric(4) or SpatExtent. Extent of clipping vector.
It only works with |
radius |
numeric(1). Buffer radius. |
out_class |
character(1). Output class. One of |
kernel |
character(1). Name of a kernel function
One of |
kernel_func |
function.
Kernel function to apply to the extracted values.
Default is |
bandwidth |
numeric(1). Kernel bandwidth. |
max_cells |
integer(1). Maximum number of cells in memory. |
.standalone |
logical(1). Default is |
Details
Inputs are preprocessed in different ways depending on the class.
Vector inputs in
y
:sf
is preferred, thus character andSpatVector
inputs will be converted tosf
object. Ifradius
is not NULL,sf::st_buffer
is used to generate circular buffers as subsequent raster-vector overlay is done withexactextractr::exact_extract
.Raster input in
x
:SpatRaster
is preferred. If the input is notSpatRaster
, it will be converted toSpatRaster
object.
Value
A data.frame object with summarized raster values with respect to the mode (polygon or buffer) and the function.
Author(s)
Insang Song geoissong@gmail.com
See Also
Other Macros for calculation:
kernelfunction()
,
summarize_aw()
,
summarize_sedc()
Examples
ncpath <- system.file("gpkg/nc.gpkg", package = "sf")
rastpath <- file.path(tempdir(), "test.tif")
nc <- terra::vect(ncpath)
nc <- terra::project(nc, "EPSG:5070")
rrast <- terra::rast(nc, nrow = 300, ncol = 660)
terra::values(rrast) <- rgamma(1.98e5, 4, 2)
rpnt <- terra::spatSample(rrast, 16L, as.points = TRUE)
rpnt$pid <- sprintf("ID-%02d", seq(1, 16))
extract_at(rrast, rpnt, "pid", "mean", radius = 1000)
extract_at(rrast, nc, "NAME", "mean")
extract_at(rrast, ncpath, "NAME", "mean")
# Using SpatRaster object
suppressWarnings(
extract_at(
rrast, ncpath, "NAME", "mean",
kernel = "epanechnikov",
bandwidth = 1e5
)
)
# Using raster path
terra::writeRaster(rrast, rastpath, overwrite = TRUE)
suppressWarnings(
extract_at(
rastpath, ncpath, "NAME", "mean",
kernel = "epanechnikov",
bandwidth = 1e5
)
)
Setting the clipping extent
Description
Return clipping extent with buffer radius. It assumes the input CRS is projected and linear unit is meters.
Usage
get_clip_ext(pnts, radius, extrusion = 1.1)
Arguments
pnts |
One of sf or SpatVector object. Target points of computation. |
radius |
numeric(1). Buffer radius. It is assumed to be in meters |
extrusion |
numeric(1). The extent extrusion factor.
Default is 1.1, meaning that the actual padding is 10 percent
wider than |
Value
A terra::ext
or sfc_POLYGON object of the computation extent.
Author(s)
Insang Song
See Also
Other Helper functions:
datamod()
,
dep_check()
,
dep_switch()
,
par_def_q()
,
reproject_std()
,
reproject_to_raster()
Subset for nonidentical package class objects
Description
Subset for nonidentical package class objects
Intersect different data model objects
Usage
## S4 method for signature 'SpatVector,bbox,missing,ANY'
x[i, j]
## S4 method for signature 'SpatVector,sf,missing,ANY'
x[i, j]
## S4 method for signature 'SpatVector,sfc,missing,ANY'
x[i, j]
## S4 method for signature 'SpatVector,SpatExtent,missing,ANY'
x[i, j]
.intersect(x, y)
Arguments
x |
SpatVector/sf/SpatRaster object to be intersected. |
i |
Dataset used to subset x. |
j |
Column indices or names. |
y |
SpatVector/sf object. Intersecting object. |
Kernel functions
Description
Kernel functions
Usage
kernelfunction(
d,
bw,
kernel = c("uniform", "quartic", "triweight", "epanechnikov")
)
Arguments
d |
Distance |
bw |
Bandwidth of a kernel |
kernel |
Kernel type. One of
|
Value
numeric. Kernel weights.
References
https://github.com/JanCaha/SpatialKDE
See Also
Other Macros for calculation:
extract_at()
,
summarize_aw()
,
summarize_sedc()
Examples
v_dist <- c(1, 10, 100, 25, 50, 0.1)
bw_dist1 <- 1
bw_dist2 <- 10
kernelfunction(v_dist, bw_dist1, "uniform")
kernelfunction(v_dist, bw_dist1, "quartic")
kernelfunction(v_dist, bw_dist1, "triweight")
kernelfunction(v_dist, bw_dist1, "epanechnikov")
kernelfunction(v_dist, bw_dist2, "uniform")
kernelfunction(v_dist, bw_dist2, "quartic")
kernelfunction(v_dist, bw_dist2, "triweight")
kernelfunction(v_dist, bw_dist2, "epanechnikov")
Mildly clustered points in North Carolina, United States
Description
Mildly clustered points in North Carolina, United States
Usage
ncpoints
Format
A data frame with 2,304 rows and two variables:
- X
X coordinate
- Y
Y coordinate
Note
Coordinates are in EPSG:5070 (Conus Albers Equal Area)
Source
sf package data nc
See Also
Other Dataset:
prediction_grid
Examples
data("ncpoints", package = "chopin")
Map specified arguments to others in literals
Description
This function creates a new function that wraps an existing function,
remapping the argument names based on a user-specified literal mapping.
Specifically, arguments passed to the new function with
names on the left-hand side of the mapping are renamed to
the corresponding right-hand side names before being passed to
the original function. Users map two arguments without x
and/or y
to
standardize the argument names, x and y, to the target function.
This function is particularly useful to parallelize functions for spatial
data outside sf
and terra
packages that do not have arguments
named x and/or y. par_*
functions could detect such functions by
wrapping nonstandardized functions to parallelize the computation.
Usage
par_convert_f(fun, ...)
Arguments
fun |
A function to be wrapped. |
... |
A set of named arguments representing the mapping from
the new function's argument names (left-hand side) to the original
function's argument names (right-hand side).
For example, |
Value
A new function that accepts the remapped arguments and calls the original function.
Examples
# Define an original function that expects arguments 'group' and 'score'
original_fun <- function(group, score, home = FALSE) {
list(group = group, score = score, home = home)
}
# Create a new function that maps 'x' to 'group' and 'y' to 'score'
new_fun <- par_convert_f(original_fun, x = group, y = score)
# Call the new function using the new argument names
result <- new_fun(x = 10, y = 20)
print(result)
Partition coordinates into quantile polygons
Description
Partition coordinates into quantile polygons
Usage
par_cut_coords(x = NULL, y = NULL, quantiles)
Arguments
x |
numeric/sf/SpatVector. x-coordinates (if numeric). |
y |
numeric. y-coordinates. |
quantiles |
numeric vector. Quantiles. |
Value
A SpatVector
object with field CGRIDID
.
Note
This function is only for two-dimensional points.
See Also
Other Parallelization:
par_grid()
,
par_grid_mirai()
,
par_hierarchy()
,
par_hierarchy_mirai()
,
par_make_grid()
,
par_merge_grid()
,
par_multirasters()
,
par_multirasters_mirai()
,
par_pad_balanced()
,
par_pad_grid()
,
par_split_list()
Quantile definition
Description
Quantile definition
Usage
par_def_q(steps = 4L)
Arguments
steps |
integer(1). The number of quantiles. |
Value
numeric vector of quantiles.
See Also
Other Helper functions:
datamod()
,
dep_check()
,
dep_switch()
,
get_clip_ext()
,
reproject_std()
,
reproject_to_raster()
Parallelize spatial computation over the computational grids
Description
future::multicore, future::multisession, future::cluster
future.mirai::mirai_multisession in future::plan will parallelize
the work in each grid. For details of the terminology in future
package,
refer to future::plan
. This function assumes that
users have one raster file and a sizable and spatially distributed
target locations. Each thread will process
the nearest integer of $|N_g| / |N_t|$ grids
where $|N_g|$ denotes the number of grids and $|N_t|$ denotes
the number of threads.
Usage
par_grid(grids, fun_dist, ..., pad_y = FALSE, .debug = FALSE)
Arguments
grids |
List of two sf/SpatVector objects. Computational grids.
It takes a strict assumption that the grid input is
an output of |
fun_dist |
|
... |
Arguments passed to the argument |
pad_y |
logical(1). Whether to filter y with the padded grid.
Should be TRUE when x is where the values are calculated.
Default is |
.debug |
logical(1). Default is |
Value
a data.frame object with computation results.
For entries of the results, consult the documentation of the function put
in fun_dist
argument.
Note
In dynamic dots (...
), fun_dist
arguments should include
x and y where sf/terra class objects or file paths are accepted.
Virtually any sf/terra functions that accept two arguments
can be put in fun_dist
; however, be advised that
some spatial operations do not necessarily give the
exact result from what would have been done with one thread.
For example, distance calculated through this function may return the
lower value than actual because the computational region was reduced.
This would be the case especially where the target features
are spatially sparsely distributed.
Author(s)
Insang Song geoissong@gmail.com
See Also
future::multisession
, future::multicore
, future::cluster
,
future.mirai::mirai_multisession
, future::plan
, par_convert_f
Other Parallelization:
par_cut_coords()
,
par_grid_mirai()
,
par_hierarchy()
,
par_hierarchy_mirai()
,
par_make_grid()
,
par_merge_grid()
,
par_multirasters()
,
par_multirasters_mirai()
,
par_pad_balanced()
,
par_pad_grid()
,
par_split_list()
Examples
lastpar <- par(mfrow = c(1, 1))
library(sf)
library(future)
library(future.mirai)
options(sf_use_s2 = FALSE)
plan(mirai_multisession, workers = 2)
ncpath <- system.file("shape/nc.shp", package = "sf")
ncpoly <- sf::st_read(ncpath)
ncpoly <- sf::st_transform(ncpoly, "EPSG:5070")
# sf object
ncpnts <- sf::st_sample(ncpoly, 2000)
ncpnts <- sf::st_as_sf(ncpnts)
ncpnts$pid <- seq_len(nrow(ncpnts))
# file path
rrast <- terra::rast(ncpoly, nrow = 600, ncol = 1320)
terra::values(rrast) <- rgamma(7.92e5, 4, 2)
# Using raster path
rastpath <- file.path(tempdir(), "ncelev.tif")
terra::writeRaster(rrast, rastpath, overwrite = TRUE)
nccompreg <-
chopin::par_pad_grid(
input = ncpnts,
mode = "grid",
nx = 4L,
ny = 2L,
padding = 5e3L
)
res <-
par_grid(
grids = nccompreg,
fun_dist = extract_at,
x = rastpath,
y = ncpnts,
qsegs = 90L,
radius = 5e3L,
id = "pid"
)
future::plan(future::sequential)
mirai::daemons(0)
par(lastpar)
Parallelize spatial computation over the computational grids
Description
mirai::daemons will set the parallel backend then mirai::mirai_map
will parallelize the work in each grid. For details of the terminology
in mirai
package, refer to mirai::mirai
. This function assumes that
users have one raster file and a sizable and spatially distributed
target locations. Each thread will process
the nearest integer of $|N_g| / |N_t|$ grids
where $|N_g|$ denotes the number of grids and $|N_t|$ denotes
the number of threads.
Usage
par_grid_mirai(grids, fun_dist, ..., pad_y = FALSE, .debug = TRUE)
Arguments
grids |
List of two sf/SpatVector objects. Computational grids.
It takes a strict assumption that the grid input is
an output of |
fun_dist |
|
... |
Arguments passed to the argument |
pad_y |
logical(1). Whether to filter y with the padded grid.
Should be TRUE when x is where the values are calculated.
Default is |
.debug |
logical(1). Default is |
Value
a data.frame object with computation results.
For entries of the results, consult the documentation of the function put
in fun_dist
argument.
Note
In dynamic dots (...
), fun_dist
arguments should include
x and y where sf/terra class objects or file paths are accepted.
Virtually any sf/terra functions that accept two arguments
can be put in fun_dist
; however be advised that
some spatial operations do not necessarily give the
exact result from what would have been done with one thread.
For example, distance calculated through this function may return the
lower value than actual because the computational region was reduced.
This would be the case especially where the target features
are spatially sparsely distributed.
Author(s)
Insang Song geoissong@gmail.com
See Also
mirai::daemons
, mirai::mirai_map
, par_convert_f
Other Parallelization:
par_cut_coords()
,
par_grid()
,
par_hierarchy()
,
par_hierarchy_mirai()
,
par_make_grid()
,
par_merge_grid()
,
par_multirasters()
,
par_multirasters_mirai()
,
par_pad_balanced()
,
par_pad_grid()
,
par_split_list()
Examples
lastpar <- par(mfrow = c(1, 1))
library(sf)
library(mirai)
options(sf_use_s2 = FALSE)
daemons(4, dispatcher = "process")
ncpath <- system.file("shape/nc.shp", package = "sf")
ncpoly <- sf::st_read(ncpath)
ncpoly <- sf::st_transform(ncpoly, "EPSG:5070")
# sf object
ncpnts <-
sf::st_sample(ncpoly, 2000)
ncpnts <- sf::st_as_sf(ncpnts)
ncpnts$pid <- seq_len(nrow(ncpnts))
# file path
rrast <- terra::rast(ncpoly, nrow = 600, ncol = 1320)
terra::values(rrast) <- rgamma(7.92e5, 4, 2)
# Using raster path
rastpath <- file.path(tempdir(), "ncelev.tif")
terra::writeRaster(rrast, rastpath, overwrite = TRUE)
nccompreg <-
chopin::par_pad_grid(
input = ncpnts,
mode = "grid",
nx = 4L,
ny = 2L,
padding = 5e3L
)
res <-
par_grid_mirai(
grids = nccompreg,
fun_dist = extract_at,
x = rastpath,
y = ncpnts,
qsegs = 90L,
radius = 5e3L,
id = "pid"
)
mirai::daemons(0L)
par(lastpar)
Parallelize spatial computation by hierarchy in input data
Description
"Hierarchy" refers to a system,
which divides the entire study region into multiple subregions.
It is oftentimes reflected in an area code system
(e.g., FIPS for US Census geographies and
Nomenclature of Territorial Units for Statistics (NUTS), etc.).
future::multisession
, future::multicore
, future::cluster
,
future.mirai::mirai_multisession
in future::plan
will parallelize the work by splitting lower level features into
several higher level feature group.
For details of the terminology in future
package,
please refer to future::plan
documentation.
Each thread will process the number of lower level features
in each higher level feature. Be advised that
accessing the same file simultaneously with
multiple processes may result in errors.
Usage
par_hierarchy(
regions,
regions_id = NULL,
length_left = NULL,
pad = 0,
pad_y = FALSE,
fun_dist,
...,
.debug = FALSE
)
Arguments
regions |
|
regions_id |
character(1). Name of unique ID field in |
length_left |
integer(1). Length of the first characters of
the |
pad |
numeric(1). Padding distance for each subregion defined
by |
pad_y |
logical(1). Whether to filter y with the padded grid.
Should be TRUE when x is where the values are calculated.
Default is |
fun_dist |
|
... |
Arguments passed to the argument |
.debug |
logical(1). Default is |
Details
In dynamic dots (...
), fun_dist
arguments should include
x and y where sf/terra class objects or file paths are accepted.
Hierarchy is interpreted by the regions_id
argument first.
regions_id
is assumed to be a field name in the x
or y
argument
object. It is expected that regions
represents the higher level
boundaries and x
or y
in fun_dist
is the lower level boundaries.
However, if that is not the case, with trim
argument, the function
will generate the higher level codes from regions_id
by extracting
left-t
Whether x
or y
is searched is determined by pad_y
value.
pad_y = TRUE
will make the function attempt to find regions_id
in x
, whereas pad_y = FALSE
will look for regions_id
at
y
. If the regions_id
doesn't exist in x
or y
, the function
will utilize spatial relationship (intersects) to filter the data.
Note that dispatching computation by subregions based on the spatial
relationship may lead to a slight discrepancy in the result. For
example, if the higher and lower level features are not perfectly
aligned, there may be some features that are not included or duplicated
in the subregions. The function will alert the user if spatial relation-
ship is used to filter the data.
Value
a data.frame object with computation results.
For entries of the results, consult the function used in
fun_dist
argument.
Note
Virtually any sf/terra functions that accept two arguments
can be put in fun_dist
; however, be advised that
some spatial operations do not necessarily give the
exact result from what would have been done with single thread.
For example, distance calculated through this function may return the
lower value than actual because the computational region was reduced.
This would be the case especially where the target features
are spatially sparsely distributed.
Author(s)
Insang Song geoissong@gmail.com
See Also
future::multisession
, future::multicore
, future::cluster
,
future.mirai::mirai_multisession
, future::plan
, par_convert_f
Other Parallelization:
par_cut_coords()
,
par_grid()
,
par_grid_mirai()
,
par_hierarchy_mirai()
,
par_make_grid()
,
par_merge_grid()
,
par_multirasters()
,
par_multirasters_mirai()
,
par_pad_balanced()
,
par_pad_grid()
,
par_split_list()
Examples
lastpar <- par(mfrow = c(1, 1))
library(terra)
library(sf)
library(future)
library(future.mirai)
options(sf_use_s2 = FALSE)
future::plan(future.mirai::mirai_multisession, workers = 2)
nccnty <- sf::st_read(
system.file("shape/nc.shp", package = "sf")
)
nccnty <- sf::st_transform(nccnty, "EPSG:5070")
nccnty <- nccnty[seq_len(30L), ]
nccntygrid <- sf::st_make_grid(nccnty, n = c(200, 100))
nccntygrid <- sf::st_as_sf(nccntygrid)
nccntygrid$GEOID <- sprintf("%05d", seq_len(nrow(nccntygrid)))
nccntygrid <- sf::st_intersection(nccntygrid, nccnty)
rrast <- terra::rast(nccnty, nrow = 600, ncol = 1320)
terra::values(rrast) <- rgamma(7.92e5, 4, 2)
# Using raster path
rastpath <- file.path(tempdir(), "ncelev.tif")
terra::writeRaster(rrast, rastpath, overwrite = TRUE)
ncsamp <-
sf::st_sample(
nccnty,
size = 1e4L
)
# sfc to sf
ncsamp <- sf::st_as_sf(ncsamp)
# assign ID
ncsamp$kid <- sprintf("K-%05d", seq_len(nrow(ncsamp)))
res <-
par_hierarchy(
regions = nccnty,
regions_id = "FIPS",
fun_dist = extract_at,
y = nccntygrid,
x = rastpath,
id = "GEOID",
func = "mean"
)
future::plan(future::sequential)
mirai::daemons(0)
par(lastpar)
Parallelize spatial computation by hierarchy in input data
Description
"Hierarchy" refers to a system,
which divides the entire study region into multiple subregions.
It is usually reflected in an area code system
(e.g., FIPS for US Census geographies and
Nomenclature of Territorial Units for Statistics (NUTS), etc.).
mirai::daemons will set the parallel backend then mirai::mirai_map
will the work by splitting lower level features into
several higher level feature group. For details of the terminology
in mirai
package, refer to mirai::mirai
.
Each thread will process the number of lower level features
in each higher level feature. Be advised that
accessing the same file simultaneously with
multiple processes may result in errors.
Usage
par_hierarchy_mirai(
regions,
regions_id = NULL,
length_left = NULL,
pad = 0,
pad_y = FALSE,
fun_dist,
...,
.debug = TRUE
)
Arguments
regions |
|
regions_id |
character(1). Name of unique ID field in |
length_left |
integer(1). Length of the first characters of
the |
pad |
numeric(1). Padding distance for each subregion defined
by |
pad_y |
logical(1). Whether to filter y with the padded grid.
Should be TRUE when x is where the values are calculated.
Default is |
fun_dist |
|
... |
Arguments passed to the argument |
.debug |
logical(1). Default is |
Details
In dynamic dots (...
), fun_dist
arguments should include
x and y where sf/terra class objects or file paths are accepted.
Hierarchy is interpreted by the regions_id
argument first.
regions_id
is assumed to be a field name in the x
or y
argument
object. It is expected that regions
represents the higher level
boundaries and x
or y
in fun_dist
is the lower level boundaries.
However, if that is not the case, with trim
argument, the function
will generate the higher level codes from regions_id
by extracting
the code from the left end (controlled by length_left
).
Whether x
or y
is searched is determined by pad_y
value.
pad_y = TRUE
will make the function attempt to find regions_id
in x
, whereas pad_y = FALSE
will look for regions_id
at
y
. If the regions_id
doesn't exist in x
or y
, the function
will utilize spatial relationship (intersects) to filter the data.
Note that dispatching computation by subregions based on the spatial
relationship may lead to a slight discrepancy in the result. For
example, if the higher and lower level features are not perfectly
aligned, there may be some features that are not included or duplicated
in the subregions. The function will alert the user if spatial relation-
ship is used to filter the data.
Value
a data.frame object with computation results.
For entries of the results, consult the function used in
fun_dist
argument.
Note
Virtually any sf/terra functions that accept two arguments
can be put in fun_dist
; however, be advised that
some spatial operations do not necessarily give the
exact result from what would have been done with one thread.
For example, distance calculated through this function may return the
lower value than actual because the computational region was reduced.
This would be the case especially where the target features
are spatially sparsely distributed.
Author(s)
Insang Song geoissong@gmail.com
See Also
mirai::mirai_map
, mirai::daemons
, par_convert_f
Other Parallelization:
par_cut_coords()
,
par_grid()
,
par_grid_mirai()
,
par_hierarchy()
,
par_make_grid()
,
par_merge_grid()
,
par_multirasters()
,
par_multirasters_mirai()
,
par_pad_balanced()
,
par_pad_grid()
,
par_split_list()
Examples
lastpar <- par(mfrow = c(1, 1))
library(terra)
library(sf)
library(mirai)
options(sf_use_s2 = FALSE)
mirai::daemons(4, dispatcher = "process")
nccnty <- sf::st_read(
system.file("shape/nc.shp", package = "sf")
)
nccnty <- sf::st_transform(nccnty, "EPSG:5070")
nccntygrid <- sf::st_make_grid(nccnty, n = c(200, 100))
nccntygrid <- sf::st_as_sf(nccntygrid)
nccntygrid$GEOID <- sprintf("%05d", seq_len(nrow(nccntygrid)))
nccntygrid <- sf::st_intersection(nccntygrid, nccnty)
rrast <- terra::rast(nccnty, nrow = 600, ncol = 1320)
terra::values(rrast) <- rgamma(7.92e5, 4, 2)
# Using raster path
rastpath <- file.path(tempdir(), "ncelev.tif")
terra::writeRaster(rrast, rastpath, overwrite = TRUE)
ncsamp <-
sf::st_sample(
nccnty,
size = 1e4L
)
# sfc to sf
ncsamp <- sf::st_as_sf(ncsamp)
# assign ID
ncsamp$kid <- sprintf("K-%05d", seq_len(nrow(ncsamp)))
res <-
par_hierarchy_mirai(
regions = nccnty,
regions_id = "FIPS",
fun_dist = extract_at,
y = nccntygrid,
x = rastpath,
id = "GEOID",
func = "mean",
.debug = TRUE
)
mirai::daemons(0L)
par(lastpar)
Generate groups based on balanced clustering
Description
For balancing computational loads, the function uses
the anticlust
package to cluster the input points. The number of clusters
is determined by the num_cluster
argument. Each cluster will have
equal number of points. Grids will be generated based on the cluster
extents. At the lower level, the function uses terra::distance()
function to calculate the Euclidean distance between points.
Usage
par_make_balanced(points_in = NULL, n_clusters = NULL)
Arguments
points_in |
|
n_clusters |
integer(1). The number of clusters. |
Value
SpatVector
object with a field "CGRIDID"
.
Note
This function is only for two-dimensional points. The results will be irregular grids with or without overlapping parts.
Author(s)
Insang Song
Generate grid polygons
Description
Returns a sf object that includes x- and y- index by using two inputs ncutsx and ncutsy, which are x- and y-directional splits, respectively.
Usage
par_make_grid(points_in = NULL, ncutsx = NULL, ncutsy = NULL)
Arguments
points_in |
|
ncutsx |
integer(1). The number of splits along x-axis. |
ncutsy |
integer(1). The number of splits along y-axis. |
Value
A sf
or SpatVector
object of computation grids with
unique grid id (CGRIDID).
Note
Grids are generated based on the extent of points_in
first,
then exhaustive grids will be filtered by the intersection between
these and points_in
. Thus, the number of generated grids may be
smaller than ncutsx * ncutsy
.
Author(s)
Insang Song
See Also
Other Parallelization:
par_cut_coords()
,
par_grid()
,
par_grid_mirai()
,
par_hierarchy()
,
par_hierarchy_mirai()
,
par_merge_grid()
,
par_multirasters()
,
par_multirasters_mirai()
,
par_pad_balanced()
,
par_pad_grid()
,
par_split_list()
Merge adjacent grid polygons with given rules
Description
Merge boundary-sharing (in "Rook" contiguity) grids with
fewer target features than the threshold.
This function strongly assumes that the input
is returned from the internal function par_make_grid
,
which has "CGRIDID"
as the unique id field.
Usage
par_merge_grid(
points_in = NULL,
grid_in = NULL,
grid_min_features = NULL,
merge_max = 4L
)
Arguments
points_in |
|
grid_in |
|
grid_min_features |
integer(1). Threshold to merge adjacent grids. |
merge_max |
integer(1).
Maximum number of grids to merge per merged set. Default is 4.
For example, if the number of grids to merge is 20 and |
Value
A sf
or SpatVector
object of computation grids.
Note
This function will not work properly if grid_in
has
more than one million grids.
Author(s)
Insang Song
References
Polsby DD, Popper FJ. (1991). The Third Criterion: Compactness as a Procedural Safeguard Against Partisan Gerrymandering. Yale Law & Policy Review, 9(2), 301–353.
See Also
Other Parallelization:
par_cut_coords()
,
par_grid()
,
par_grid_mirai()
,
par_hierarchy()
,
par_hierarchy_mirai()
,
par_make_grid()
,
par_multirasters()
,
par_multirasters_mirai()
,
par_pad_balanced()
,
par_pad_grid()
,
par_split_list()
Examples
lastpar <- par(mfrow = c(1, 1))
library(sf)
library(igraph)
library(dplyr)
library(spatstat.random)
options(sf_use_s2 = FALSE)
dg <- sf::st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 8e5, ymax = 6e5)))
sf::st_crs(dg) <- 5070
dgs <- sf::st_as_sf(st_make_grid(dg, n = c(20, 15)))
dgs$CGRIDID <- seq(1, nrow(dgs))
dg_sample <- sf::st_sample(dg, kappa = 5e-9, mu = 15,
scale = 15000, type = "Thomas")
sf::st_crs(dg_sample) <- sf::st_crs(dg)
dg_merged <- par_merge_grid(sf::st_as_sf(dg_sample), dgs, 100)
plot(sf::st_geometry(dg_merged))
par(lastpar)
Parallelize spatial computation over multiple raster files
Description
Large raster files usually exceed the memory capacity in size.
This function can be helpful to process heterogenous raster files with
homogeneous summary functions. Heterogenous raster files refer to
rasters with different spatial extents and resolutions.
Cropping a large raster into a small subset even consumes
a lot of memory and adds processing time.
This function leverages terra
SpatRaster
to distribute computation jobs over multiple threads.
It is assumed that users have multiple large raster files
in their disk, then each file path is assigned to a thread.
Each thread will directly read raster values from
the disk using C++ pointers that operate in terra functions.
For use, it is strongly recommended to use vector data with
small and confined spatial extent for computation to avoid
out-of-memory error. y
argument in fun_dist
will be used as-is.
That means no preprocessing or subsetting will be
applied. Please be aware of the spatial extent and size of the
inputs.
Usage
par_multirasters(filenames, fun_dist, ..., .debug = FALSE)
Arguments
filenames |
character. A vector or list of full file paths of raster files. n is the total number of raster files. |
fun_dist |
terra or chopin functions that accept |
... |
Arguments passed to the argument |
.debug |
logical(1). Default is |
Value
a data.frame object with computation results.
For entries of the results,
consult the function used in fun_dist
argument.
Author(s)
Insang Song geoissong@gmail.com
See Also
future::multisession
, future::multicore
, future::cluster
,
future.mirai::mirai_multisession
, future::plan
, par_convert_f
Other Parallelization:
par_cut_coords()
,
par_grid()
,
par_grid_mirai()
,
par_hierarchy()
,
par_hierarchy_mirai()
,
par_make_grid()
,
par_merge_grid()
,
par_multirasters_mirai()
,
par_pad_balanced()
,
par_pad_grid()
,
par_split_list()
Examples
lastpar <- par(mfrow = c(1, 1))
library(terra)
library(sf)
library(future)
library(future.mirai)
options(sf_use_s2 = FALSE)
future::plan(future.mirai::mirai_multisession, workers = 2)
nccnty <- sf::st_read(
system.file("shape/nc.shp", package = "sf")
)
nccnty <- sf::st_transform(nccnty, "EPSG:5070")
nccntygrid <- sf::st_make_grid(nccnty, n = c(200, 100))
nccntygrid <- sf::st_as_sf(nccntygrid)
nccntygrid$GEOID <- sprintf("%05d", seq_len(nrow(nccntygrid)))
nccntygrid <- sf::st_intersection(nccntygrid, nccnty)
rrast <- terra::rast(nccnty, nrow = 600, ncol = 1320)
terra::values(rrast) <- rgamma(7.92e5, 4, 2)
tdir <- tempdir(check = TRUE)
testpath1 <- file.path(tdir, "test1.tif")
testpath2 <- file.path(tdir, "test2.tif")
terra::writeRaster(rrast, testpath1, overwrite = TRUE)
terra::writeRaster(rrast, testpath2, overwrite = TRUE)
testfiles <- list.files(tdir, pattern = "tif$", full.names = TRUE)
res <- par_multirasters(
filenames = testfiles,
fun_dist = extract_at,
x = testpath1,
y = nccnty,
id = "GEOID",
func = "mean"
)
future::plan(future::sequential)
mirai::daemons(0)
par(lastpar)
Parallelize spatial computation over multiple raster files
Description
Large raster files usually exceed the memory capacity in size.
This function can be helpful to process heterogenous raster files with
homogeneous summary functions. Heterogenous raster files refer to
rasters with different spatial extents and resolutions.
Cropping a large raster into a small subset even consumes
a lot of memory and adds processing time.
This function leverages terra
SpatRaster
to distribute computation jobs over multiple threads.
It is assumed that users have multiple large raster files
in their disk, then each file path is assigned to a thread.
Each thread will directly read raster values from
the disk using C++ pointers that operate in terra functions.
For use, it is strongly recommended to use vector data with
small and confined spatial extent for computation to avoid
out-of-memory error. y
argument in fun_dist
will be used as-is.
That means no preprocessing or subsetting will be
applied. Please be aware of the spatial extent and size of the
inputs.
Usage
par_multirasters_mirai(filenames, fun_dist, ..., .debug = TRUE)
Arguments
filenames |
character. A vector or list of full file paths of raster files. n is the total number of raster files. |
fun_dist |
terra or chopin functions that accept |
... |
Arguments passed to the argument |
.debug |
logical(1). Default is |
Value
a data.frame object with computation results.
For entries of the results,
consult the function used in fun_dist
argument.
Author(s)
Insang Song geoissong@gmail.com
See Also
mirai::mirai
, mirai::mirai_map
, mirai::daemons
,
par_convert_f
Other Parallelization:
par_cut_coords()
,
par_grid()
,
par_grid_mirai()
,
par_hierarchy()
,
par_hierarchy_mirai()
,
par_make_grid()
,
par_merge_grid()
,
par_multirasters()
,
par_pad_balanced()
,
par_pad_grid()
,
par_split_list()
Examples
lastpar <- par(mfrow = c(1, 1))
library(terra)
library(sf)
library(mirai)
options(sf_use_s2 = FALSE)
mirai::daemons(4, dispatcher = "process")
nccnty <- sf::st_read(
system.file("shape/nc.shp", package = "sf")
)
nccnty <- sf::st_transform(nccnty, "EPSG:5070")
nccnty <- nccnty[seq_len(30L), ]
nccntygrid <- sf::st_make_grid(nccnty, n = c(200, 100))
nccntygrid <- sf::st_as_sf(nccntygrid)
nccntygrid$GEOID <- sprintf("%05d", seq_len(nrow(nccntygrid)))
nccntygrid <- sf::st_intersection(nccntygrid, nccnty)
rrast <- terra::rast(nccnty, nrow = 600, ncol = 1320)
terra::values(rrast) <- rgamma(7.92e5, 4, 2)
tdir <- tempdir(check = TRUE)
terra::writeRaster(rrast, file.path(tdir, "test1.tif"), overwrite = TRUE)
terra::writeRaster(rrast, file.path(tdir, "test2.tif"), overwrite = TRUE)
testfiles <- list.files(tdir, pattern = "tif$", full.names = TRUE)
res <- par_multirasters_mirai(
filenames = testfiles,
fun_dist = extract_at,
x = rrast,
y = nccnty,
id = "GEOID",
func = "mean"
)
mirai::daemons(0L)
par(lastpar)
Extension of par_make_balanced for padded grids
Description
This function utilizes anticlust::balanced_clustering()
to split the input into equal size subgroups then transform the data
to be compatible with the output of par_pad_grid
, for which
a set of padded grids of the extent of input point subsets
(as recorded in the field named "CGRIDID"
)
is generated out of input points.
Usage
par_pad_balanced(points_in = NULL, ngroups, padding)
Arguments
points_in |
|
ngroups |
integer(1). The number of groups. |
padding |
numeric(1). A extrusion factor to make buffer to clip actual datasets. Depending on the length unit of the CRS of input. |
Value
A list of two,
-
original
: exhaustive and non-overlapping grid polygons in the class of input -
padded
: a square buffer of each polygon inoriginal
. Used for computation.
Author(s)
Insang Song
See Also
Other Parallelization:
par_cut_coords()
,
par_grid()
,
par_grid_mirai()
,
par_hierarchy()
,
par_hierarchy_mirai()
,
par_make_grid()
,
par_merge_grid()
,
par_multirasters()
,
par_multirasters_mirai()
,
par_pad_grid()
,
par_split_list()
Examples
lastpar <- par(mfrow = c(1, 1))
library(terra)
library(sf)
options(sf_use_s2 = FALSE)
ncpath <- system.file("gpkg/nc.gpkg", package = "sf")
nc <- terra::vect(ncpath)
nc_rp <- terra::spatSample(nc, 1000)
nc_gr <- par_pad_balanced(nc_rp, 10L, 1000)
nc_gr
par(lastpar)
Get a set of computational grids
Description
Using input points, the bounding box is split to the predefined numbers of columns and rows. Each grid will be buffered by the radius.
Usage
par_pad_grid(
input,
mode = c("grid", "grid_advanced", "grid_quantile"),
nx = 10L,
ny = 10L,
grid_min_features = 30L,
padding = NULL,
unit = NULL,
quantiles = NULL,
merge_max = NULL,
return_wkt = FALSE,
...
)
Arguments
input |
sf or Spat* object. |
mode |
character(1). Mode of region construction. One of
|
nx |
integer(1). The number of grids along x-axis. |
ny |
integer(1). The number of grids along y-axis. |
grid_min_features |
integer(1). A threshold to merging adjacent grids |
padding |
numeric(1). A extrusion factor to make buffer to clip actual datasets. Depending on the length unit of the CRS of input. |
unit |
character(1). The length unit for padding (optional).
|
quantiles |
numeric. Quantiles for |
merge_max |
integer(1). Maximum number of grids to merge per merged set. |
return_wkt |
logical(1). Return WKT format. When |
... |
arguments passed to the internal function |
Value
A list of two,
-
original
: exhaustive (filling completely) and non-overlapping grid polygons in the class of input -
padded
: a square buffer of each polygon inoriginal
. Used for computation.
Author(s)
Insang Song
See Also
Other Parallelization:
par_cut_coords()
,
par_grid()
,
par_grid_mirai()
,
par_hierarchy()
,
par_hierarchy_mirai()
,
par_make_grid()
,
par_merge_grid()
,
par_multirasters()
,
par_multirasters_mirai()
,
par_pad_balanced()
,
par_split_list()
Examples
lastpar <- par(mfrow = c(1, 1))
# data
library(sf)
options(sf_use_s2 = FALSE)
ncpath <- system.file("shape/nc.shp", package = "sf")
nc <- read_sf(ncpath)
nc <- st_transform(nc, "EPSG:5070")
# run: nx and ny should strictly be integers
nc_comp_region <-
par_pad_grid(
nc,
mode = "grid",
nx = 4L, ny = 2L,
padding = 10000)
par(mfcol = c(1, 2))
plot(nc_comp_region$original$geometry)
plot(nc_comp_region$padded$geometry)
nc_comp_region_wkt <-
par_pad_grid(
nc,
mode = "grid",
nx = 4L, ny = 2L,
padding = 10000,
return_wkt = TRUE)
nc_comp_region_wkt$original
nc_comp_region_wkt$padded
par(lastpar)
Split grid list to a nested list of row-wise data frames
Description
Split grid list to a nested list of row-wise data frames
Usage
par_split_list(gridlist)
Arguments
gridlist |
list. Output of |
Details
If the input is a data frame, the function will return a list of
two data frames: original
and padded
. If the input is a WKT vector,
the function will return a list of two WKT strings: original
and padded
.
Value
A nested list of data frames or WKT strings.
See Also
Other Parallelization:
par_cut_coords()
,
par_grid()
,
par_grid_mirai()
,
par_hierarchy()
,
par_hierarchy_mirai()
,
par_make_grid()
,
par_merge_grid()
,
par_multirasters()
,
par_multirasters_mirai()
,
par_pad_balanced()
,
par_pad_grid()
Examples
lastpar <- par(mfrow = c(1, 1))
library(sf)
library(terra)
options(sf_use_s2 = FALSE)
ncpath <- system.file("shape/nc.shp", package = "sf")
nc <- read_sf(ncpath)
nc <- st_transform(nc, "EPSG:5070")
nc_comp_region <-
par_pad_grid(
nc,
mode = "grid",
nx = 4L, ny = 2L,
padding = 10000)
par_split_list(nc_comp_region)
par(lastpar)
Regular grid points in the mainland United States at 1km spatial resolution
Description
Regular grid points in the mainland United States at 1km spatial resolution
Usage
prediction_grid
Format
A data frame with 8,092,995 rows and three variables:
- site_id
Unique point identifier. Arbitrarily generated.
- lon
Longitude
- lat
Latitude
Note
Coordinates are in EPSG:5070 (Conus Albers Equal Area)
Source
Mainland United States polygon was obtained from the US Census Bureau.
See Also
Other Dataset:
ncpoints
Examples
data("prediction_grid", package = "chopin")
Check coordinate system then reproject
Description
The input is checked whether its coordinate system is
present. If not, it is reprojected to the CRS specified in
crs_standard
.
Usage
reproject_std(input, crs_standard = "EPSG:4326")
Arguments
input |
Input object one of sf or terra::Spat* object |
crs_standard |
character(1). A standard definition of
coordinate reference system. Default is |
Value
A (reprojected) sf
or SpatVector
object.
Note
This function works well with EPSG codes.
Author(s)
Insang Song
See Also
Other Helper functions:
datamod()
,
dep_check()
,
dep_switch()
,
get_clip_ext()
,
par_def_q()
,
reproject_to_raster()
Align vector CRS to raster's
Description
Align vector CRS to raster's
Usage
reproject_to_raster(vector = NULL, raster = NULL)
Arguments
vector |
|
raster |
|
Value
Reprojected object in the same class as vector
Author(s)
Insang Song
See Also
Other Helper functions:
datamod()
,
dep_check()
,
dep_switch()
,
get_clip_ext()
,
par_def_q()
,
reproject_std()
Area weighted summary using two polygon objects
Description
When x
and y
are different classes,
poly_weight
will be converted to the class of x
.
Usage
summarize_aw(x, y, ...)
## S4 method for signature 'SpatVector,SpatVector'
summarize_aw(
x,
y,
target_fields = NULL,
id_x = "ID",
fun = stats::weighted.mean,
extent = NULL,
...
)
## S4 method for signature 'character,character'
summarize_aw(
x,
y,
target_fields = NULL,
id_x = "ID",
fun = stats::weighted.mean,
out_class = "terra",
extent = NULL,
...
)
## S4 method for signature 'sf,sf'
summarize_aw(
x,
y,
target_fields = NULL,
id_x = "ID",
fun = NULL,
extent = NULL,
...
)
Arguments
x |
A sf/SpatVector object or file path of polygons detectable with GDAL driver at weighted means will be calculated. |
y |
A sf/SpatVector object or file path of polygons from which weighted means will be calculated. |
... |
Additional arguments depending on class of |
target_fields |
character. Field names to calculate area-weighted. |
id_x |
character(1).
The unique identifier of each polygon in |
fun |
function(1)/character(1).
The function to calculate the weighted summary.
Default is |
extent |
numeric(4) or SpatExtent object. Extent of clipping |
out_class |
character(1). "sf" or "terra". Output class. |
Value
A data.frame with all numeric fields of area-weighted means.
Note
x
and y
classes should match.
If x
and y
are characters, they will be
read as sf
objects.
Author(s)
Insang Song geoissong@gmail.com
See Also
Other Macros for calculation:
extract_at()
,
kernelfunction()
,
summarize_sedc()
Examples
lastpar <- par(mfrow = c(1, 1))
# package
library(sf)
options(sf_use_s2 = FALSE)
nc <- sf::st_read(system.file("shape/nc.shp", package="sf"))
nc <- sf::st_transform(nc, "EPSG:5070")
pp <- sf::st_sample(nc, size = 300)
pp <- sf::st_as_sf(pp)
pp[["id"]] <- seq(1, nrow(pp))
sf::st_crs(pp) <- "EPSG:5070"
ppb <- sf::st_buffer(pp, nQuadSegs=180, dist = units::set_units(20, "km"))
suppressWarnings(
ppb_nc_aw <-
summarize_aw(
ppb, nc, c("BIR74", "BIR79"),
"id", fun = "sum"
)
)
summary(ppb_nc_aw)
# terra examples
library(terra)
ncpath <- system.file("gpkg/nc.gpkg", package = "sf")
nc <- terra::vect(ncpath)
pp <- terra::spatSample(nc, size = 300)
pp[["id"]] <- seq(1, nrow(pp))
ppb <- terra::buffer(pp, 20000)
suppressWarnings(
ppb_nc_aw <-
summarize_aw(
ppb, nc, c("BIR74", "BIR79"), "id",
fun = sum
)
)
summary(ppb_nc_aw)
par(lastpar)
Calculate Sum of Exponentially Decaying Contributions (SEDC) covariates
Description
Calculate Sum of Exponentially Decaying Contributions (SEDC) covariates
Usage
summarize_sedc(
point_from = NULL,
point_to = NULL,
id = NULL,
sedc_bandwidth = NULL,
threshold = NULL,
target_fields = NULL,
extent_from = NULL,
extent_to = NULL,
...
)
Arguments
point_from |
|
point_to |
|
id |
character(1). Name of the unique id field in |
sedc_bandwidth |
numeric(1).
Distance at which the source concentration is reduced to
|
threshold |
numeric(1). For computational efficiency,
the nearest points in threshold will be selected.
|
target_fields |
character. Field names to calculate SEDC. |
extent_from |
numeric(4) or SpatExtent. Extent of clipping |
extent_to |
numeric(4) or SpatExtent. Extent of clipping |
... |
Placeholder. |
Details
The SEDC is specialized in vector to vector summary of covariates
with exponential decay. Decaying slope will be defined by sedc_bandwidth
,
where the concentration of the source is reduced to $\exp(-3)$
(approximately 5 \
of the attenuating concentration with the distance from the sources.
It can be thought of as a fixed bandwidth kernel weighted sum of covariates,
which encapsulates three steps:
Calculate the distance between each source and target points.
Calculate the weight of each source point with the exponential decay.
Summarize the weighted covariates.
Value
data.frame object with input field names with
a suffix "_sedc"
where the sums of EDC are stored.
Additional attributes are attached for the EDC information.
-
attr(result, "sedc_bandwidth")
: the bandwidth where concentration reduces to approximately five percent -
attr(result, "sedc_threshold")
: the threshold distance at which emission source points are excluded beyond that
Note
Distance calculation is done with terra
functions internally.
Thus, the function internally converts sf
objects in
point_*
arguments to terra
. Please note that any NA
values
in the input will be ignored in SEDC calculation.
Author(s)
Insang Song
References
Messier KP, Akita Y, Serre ML. (2012). Integrating Address Geocoding, Land Use Regression, and Spatiotemporal Geostatistical Estimation for Groundwater Tetrachloroethylene. Environmental Science & Technology 46(5), 2772-2780.(doi:10.1021/es203152a)
Wiesner C. (n.d.). Euclidean Sum of Exponentially Decaying Contributions Tutorial.
See Also
Other Macros for calculation:
extract_at()
,
kernelfunction()
,
summarize_aw()
Examples
library(terra)
library(sf)
set.seed(101)
ncpath <- system.file("gpkg/nc.gpkg", package = "sf")
nc <- terra::vect(ncpath)
nc <- terra::project(nc, "EPSG:5070")
pnt_from <- terra::centroids(nc, inside = TRUE)
pnt_from <- pnt_from[, "NAME"]
pnt_to <- terra::spatSample(nc, 100L)
pnt_to$pid <- seq(1, 100)
pnt_to <- pnt_to[, "pid"]
pnt_to$val1 <- rgamma(100L, 1, 0.05)
pnt_to$val2 <- rgamma(100L, 2, 1)
vals <- c("val1", "val2")
suppressWarnings(
summarize_sedc(pnt_from, pnt_to, "NAME", 1e5, 2e5, vals)
)