--- title: "Coarse-to-fine spatial GLMM: Application examples" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{spCF_glm} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} references: - id: murakami2026cfsm-glm type: report author: - family: Murakami given: Daisuke - family: Comber given: Alexis - family: Yoshida given: Takahiro - family: Tsutsumida given: Narumasa - family: Brunsdon given: Chris - family: Nakaya given: Tomoki title: "Coarse-to-fine spatial GLMM for scalable prediction and multiscale analysis" issued: year: 2026 archive: ArXiv --- ```{=html} ``` ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` This vignette demonstrates how to estimate spatial generalized linear mixed models (GLMMs) using the spCF package for smoothing/denoising, prediction, and multiscale analysis. The spatial process in the GLMMs is trained via a coarse-to-fine approach that sequentially learns spatial patterns from coarser to finer scales. This spatial GLMM framework is particularly suitable for moderate-to-large samples. See @murakami2026cfsm-glm for further details. # Example 1: Count data modeling for disease mapping ## Data and setup Let us load the required packages ```{r setup} library(spCF) library(sf) library(CARBayesdata) ``` The *pollutionhealthdata* dataset included in the *CARBayesdata* package is used here. This dataset comprises panel data on respiratory hospitalisations across 271 zones in Greater Glasgow. For simplicity, the center coordinates of each zone are used for spatial modeling. In this example, the observed number of hospitalisations in 2011 (observed) is modeled for smoothing and predicting the latent risk of respiratory disease (i.e., disease mapping). ```{r} data(pollutionhealthdata) dat <- pollutionhealthdata[pollutionhealthdata$year==2011,] ``` The response variable (observed), covariates (pm10, jsa, price), and an offset variable (expected) are specified as follows: ```{r} y <- dat[,"observed"] # count data x <- dat[,c("pm10","jsa","price")]# covariates offset <- log(dat[,"expected"]) # offset variable ``` The center coordinates are extracted as: ```{r} data(GGHB.IZ) # polygons of the 271 zones coords <- st_coordinates(st_centroid(GGHB.IZ))# coordinates ``` observed is spatially plotted as follows: ```{r, fig.width=4.5, fig.height=4} GGHB.IZ$y <- y plot(GGHB.IZ[,"y"],lwd=0.01,axes=TRUE, key.pos=4,nbreaks=50) ``` ## Coarse-to-fine spatial GLMM (CF-GLM) In CF-GLM, the spatial process is defined as a sum of scalewise processes, where the number of spatial scales, R, is optimized via holdout validation. A smaller R, corresponding to early stopping, allows the spatial process to capture only coarse-scale patterns, whereas a larger R enables the spatial process to represent finer-scale patterns. The `cf_glm_hv` function performs holdout validation as follows: ```{r} mod_hv <- cf_glm_hv(y = y, x = x, offset=offset, coords = coords, family=poisson()) ``` As shown in the output, the deviance loss for the validation samples gradually decreases as learning proceeds from the coarsest scale (Scale 1) to finer scales. After holdout validation, the full model is trained using the `cf_glm` function: ```{r} mod <- cf_glm(y = y, x = x, offset=offset, coords = coords, mod_hv = mod_hv) ``` The estimated regression coefficients, standard deviations of the scalewise components, and error statistics are displayed as follows: ```{r} mod ``` ## Disease mapping ### Predictive values The predictive means, representing the denoised risk of respiratory disease, and their standard deviations (SDs) are mapped as follows: ```{r, fig.height=4, fig.width=4.5} # Predictive mean GGHB.IZ$pred <- mod$pred$pred plot(GGHB.IZ[,c("pred")],lwd=0.01,axes=TRUE, key.pos=4,nbreaks=50) # Predictive SD GGHB.IZ$pred_sd<- mod$pred$pred_sd plot(GGHB.IZ[,c("pred_sd")],pal = hcl.colors(9, "Viridis"), lwd=0.01, axes=TRUE, key.pos=4) ``` The result suggests that both disease risk and their uncertainty increase in the central area. ### Multiscale spatial pattern/feature extraction The `sp_scalewise` function extracts scalewise spatial processes with bandwidth values falling within a pre-specified range. For example, the following commands extract the large- and small-scale processes, corresponding to bandwidth ranges of 4000+ and 0–4000, respectively. ```{r} mod_s1 <- sp_scalewise(mod,bw_range=c(4000,Inf)) # Large scale (4000 <= bandwidth) mod_s2 <- sp_scalewise(mod,bw_range=c(0,4000)) # Small scale (bandwidth <= 4000) ``` The extracted scalewise processes are mapped as follows: ```{r, fig.height=3.5, fig.width=7.5} GGHB.IZ$z1 <- mod_s1$pred$pred GGHB.IZ$z2 <- mod_s2$pred$pred plot(GGHB.IZ[,c("z1","z2")],lwd=0.01,axes=TRUE,key.pos=4, nbreaks=50) ``` The `sp_scalewise` function is useful for multiscale spatial pattern analysis (or feature extraction), which is commonly conducted in ecological, epidemiological, and environmental studies. # Example 2: Binary data modeling and prediction ## Data and setup Let us load the required packages ```{r} library(spCF) library(sp) library(sf) ``` The *meuse* dataset from the *sp* package consists of observations at 155 sample sites in a floodplain along the River Meuse. In this example, we consider a binary response variable (`flood`), which takes the value 1 if the flooding frequency class is once every two years (i.e., flood-prone area) and 0 otherwise. The binary response is predicted over 3,103 regularly spaced grid cells (*meuse.grid*). The covariate considered is the distance to the river (`dist`). ```{r} ### Data at samples sites data(meuse) flood <- ifelse(meuse$ffreq==1, 1, 0 )# Binary response variable coords <- meuse[,c("x","y")] # Coordinates x <- meuse[,"dist"] # Covariate ### Data at prediction sites data(meuse.grid) coords0 <- meuse.grid[,c("x","y")] # Coordinates x0 <- meuse.grid[,"dist"] # Covariate ``` `flood` is spatially plotted as follows: ```{r, fig.width=4.5, fig.height=4} obs_s <- st_as_sf( data.frame(coords, flood), coords= c("x","y"), crs=28992) plot(obs_s[,"flood"], pch = 20, key.pos=4, axes=TRUE) ``` ## Coarse-to-fine spatial GLMM (CF-GLM) The code implementing the CF-GLM is essentially the same as in the previous example, except that `family` is set to `binomial()` for modeling binary data: ```{r} set.seed(1234) # For this vignette, training samples are fixed mod_hv <- cf_glm_hv(y = flood, x = x, coords = coords, family=binomial()) ``` In the subsequent full model training using the `cf_glm` function, the covariates (`x0`) and coordinates (`coords0`) at the prediction sites are also specified to enable spatial prediction: ```{r} mod <- cf_glm(y = flood, x=x, coords = coords, x0=x0, coords0 = coords0, mod_hv = mod_hv) ``` The estimated regression coefficients, standard deviations of the scalewise components, and error statistics are displayed as follows: ```{r} mod ``` ## Mapping ### Predictive values The predictive values and their standard deviations at the grid cells are mapped as follows: ```{r, fig.height=4, fig.width=4.5} ### Convert gridded points to gridded polygons (for clear visualization) meuse.grid_sp <- meuse.grid coordinates(meuse.grid_sp)<- c("x", "y") gridded(meuse.grid_sp) <- TRUE meuse.grid_sf <- st_as_sf(as(meuse.grid_sp, "SpatialPolygons")) st_crs(meuse.grid_sf) <- 28992 ### Mapping predictive mean and standard deviations meuse.grid_sf$pred <- mod$pred0$pred # Predictive mean meuse.grid_sf$pred_sd <- mod$pred0$pred_sd# Predictive standard deviations plot(meuse.grid_sf[,"pred"], border = NA, nbreaks = 20, key.pos=4,axes=TRUE) plot(meuse.grid_sf[,"pred_sd"], pal = hcl.colors(9, "Viridis"), border = NA,key.pos=4,axes=TRUE) ``` ### Multiscale spatial pattern/feature extraction Let us extract the large- and small-scale processes, corresponding to bandwidth ranges of 1000+ and 0–1000, respectively. ```{r} mod_s1<- sp_scalewise(mod,bw_range=c(1000,Inf)) # Large scale (1000 <= bandwidth) mod_s2<- sp_scalewise(mod,bw_range=c(0,1000)) # Small scale (0 <= bandwidth <= 1000) ``` The extracted scalewise processes are mapped as follows: ```{r, fig.height=3.5, fig.width=7.5} meuse.grid_sf$z1 <- mod_s1$pred0$pred meuse.grid_sf$z2 <- mod_s2$pred0$pred plot(meuse.grid_sf[,c("z1","z2")], border=NA, nbreaks=20, key.pos=4, axes=TRUE) ``` ### Reference