---
title: "Small Area Estimation Using Spatial Modeling in 'hbsaems'"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Small area estimation using spatial modeling in hbsaems}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

# Introduction

In spatial data analysis, spatial effects play an important role in understanding patterns and relationships between locations. Spatial effects arise when the value of a variable in one location is affected by the value of the same variable in another location. This can happen through natural processes (e.g. the spread of air pollution) or socio-economic phenomena (e.g. house prices in one area affecting prices in surrounding areas). This vignette shows how the package can be used to add spatial effects in Small Area Estimation Hierarchical Bayesian (SAEHB) modeling.

# Model Overview

The `hbsaems` package handles spatial dependence (spatial autocorrelation) through two model structures:

1. **CAR (Conditional Autoregressive)**
2. **SAR (Simultaneous Autoregressive)**

## 1. **CAR (Conditional Autoregressive)**

The Conditional Autoregressive (CAR) model is a spatial model that describes the relationship between adjacent areas through a local, conditional approach. In this model, the value of a variable in a location is directly influenced by the value of the same variable in the surrounding areas. CAR uses an adjacency matrix to define which locations are related.

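To make the role of the adjacency matrix concrete, the following minimal sketch builds a binary adjacency matrix by hand for five hypothetical areas arranged in a line. The neighbor list and the object names (`neighbors`, `adj`) are illustrative assumptions and are not objects from `hbsaems`. The sketch then checks two properties that a CAR adjacency matrix is normally expected to have: symmetry and a zero diagonal.

```{r}
# Hypothetical neighbor list: five areas in a row, each touching the next one
neighbors <- list(
  "1" = c("2"),
  "2" = c("1", "3"),
  "3" = c("2", "4"),
  "4" = c("3", "5"),
  "5" = c("4")
)

areas <- names(neighbors)
adj <- matrix(0, nrow = length(areas), ncol = length(areas),
              dimnames = list(areas, areas))

# Mark every neighbor pair with 1; all other entries stay 0
for (a in areas) {
  adj[a, neighbors[[a]]] <- 1
}

adj

# Sanity checks: neighbor relations are mutual and no area neighbors itself
isSymmetric(adj)
all(diag(adj) == 0)

# Number of neighbors per area
rowSums(adj)
```

The row and column names carry the area labels, which is what later links the matrix to the grouping variable in the data.
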
### Simulated Data Example

```{r}
library(hbsaems)

# Load data
data("data_fhnorm")
data <- data_fhnorm
head(data)

# Load adjacency matrix
data("adjacency_matrix_car")
adjacency_matrix_car
```

The data are synthetic and ship with the `hbsaems` package to help illustrate its use.

### Fitting the Model

#### `hbm` Function

```{r}
model_car <- hbm(
  formula = bf(y ~ x1 + x2 + x3), # Model formula
  hb_sampling = "gaussian",       # Gaussian family for continuous outcomes
  hb_link = "identity",           # Identity link function (no transformation)
  re = ~(1|group),                # Area-level random effect
  sre = "sre",                    # Spatial random effect variable
  sre_type = "car",               # Type of spatial random effect
  car_type = "icar",              # CAR structure type
  M = adjacency_matrix_car,       # Adjacency matrix
  data = data)                    # Dataset

summary(model_car)
```

To apply spatial effects to the data, we need a variable that maps observations to spatial locations (`sre`). If it is not specified, each observation is treated as a separate location, so the adjacency matrix must then have one row and one column per observation. It is recommended to always specify a grouping factor to allow for handling of new data in postprocessing methods.

Users must specify the type of spatial random effect used in the model (`sre_type`). For the `car` type, the CAR structure is selected with `car_type`. Currently implemented are "escar" (exact sparse CAR), "esicar" (exact sparse intrinsic CAR), "icar" (intrinsic CAR), and "bym2". If `car_type` is not set, it defaults to `escar`.

The model also needs the adjacency matrix (`M`). In this matrix, a value of 1 indicates that two spatial groups are neighbors (here, based on their proximity to the city center), while a value of 0 indicates no adjacency relationship. The row names of the adjacency matrix must be the same as the group names in the variable `sre`. In the example data, the spatial groups are labeled 1, 2, 3, 4, and 5.

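The requirement that the row names of `M` match the group labels in `sre` can be checked before fitting. The sketch below uses a small made-up data frame and matrix; `toy_data`, `toy_M`, and the helper `check_spatial_labels()` are hypothetical names introduced for illustration and are not part of `hbsaems`.

```{r}
# Hypothetical helper: compare the area labels of an adjacency matrix
# with the labels that appear in a grouping variable
check_spatial_labels <- function(M, groups) {
  labels <- unique(as.character(groups))
  list(
    square       = nrow(M) == ncol(M),            # M must be square
    missing_in_M = setdiff(labels, rownames(M)),  # groups without a row in M
    unused_in_M  = setdiff(rownames(M), labels)   # rows of M not seen in data
  )
}

# Toy inputs (made up): six observations falling into five areas
toy_data <- data.frame(
  y   = c(2.1, 1.8, 2.5, 3.0, 2.7, 2.2),
  sre = c(1, 2, 3, 4, 5, 5)
)

area_labels <- as.character(1:5)
toy_M <- matrix(0, 5, 5, dimnames = list(area_labels, area_labels))
toy_M[cbind(1:4, 2:5)] <- 1   # area i neighbors area i + 1
toy_M[cbind(2:5, 1:4)] <- 1   # and the reverse, keeping M symmetric

res <- check_spatial_labels(toy_M, toy_data$sre)
res
```

Empty `missing_in_M` and `unused_in_M` entries indicate that every group in the data has a matching row in the matrix and vice versa.
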
#### `hbm_distribution` Function

In addition to the `hbm` function, spatial effects can be applied through the distribution-specific `hbm_distribution` functions, such as `hbm_betalogitnorm`, `hbm_binlogitnorm`, and `hbm_lnln`.

```{r}
# Load data
data("data_betalogitnorm")
head(data_betalogitnorm)

model_car_beta <- hbm_betalogitnorm(
  response = "y",
  predictors = c("x1", "x2", "x3"),
  sre = "sre",                 # Spatial random effect variable
  sre_type = "car",
  car_type = "icar",
  M = adjacency_matrix_car,    # Adjacency matrix loaded earlier
  data = data_betalogitnorm)

summary(model_car_beta)
```

## 2. **SAR (Simultaneous Autoregressive)**

The Simultaneous Autoregressive (SAR) model handles spatial effects by defining dependence simultaneously across the study area. SAR uses a spatial weight matrix to describe the relationship between regions and can be applied in various forms, such as the Spatial Lag Model (SLM) or the Spatial Error Model (SEM), depending on whether the spatial effect acts on the dependent variable or on the error term. `M` can be either the spatial weight matrix itself or an object of class `listw` or `nb`, from which the spatial weight matrix can be computed.

The **lagsar** structure implements SAR of the response values:

$$y = \rho W y + \eta + e$$

The **errorsar** structure implements SAR of the residuals:

$$y = \eta + u, \quad u = \rho W u + e$$

In the above equations, $\rho$ is the spatial autocorrelation parameter, $W$ is the spatial weight matrix, $\eta$ is the predictor term, and $e$ are independent normally or t-distributed residuals.

Currently, only the **gaussian** and **student** families support SAR structures.

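To see what the two structures imply, both equations can be solved for $y$: the lagsar form gives $y = (I - \rho W)^{-1}(\eta + e)$ and the errorsar form gives $y = \eta + (I - \rho W)^{-1} e$. The following base R sketch simulates from both forms. It assumes a row-standardized weight matrix for five areas on a line and made-up values for $\rho$ and $\eta$; none of the object names come from `hbsaems`.

```{r}
set.seed(123)

n <- 5

# Binary neighbors on a line, then row-standardize so each row sums to 1
A <- matrix(0, n, n)
A[cbind(1:(n - 1), 2:n)] <- 1
A[cbind(2:n, 1:(n - 1))] <- 1
W <- A / rowSums(A)

rho <- 0.5                   # assumed spatial autocorrelation parameter
eta <- 1 + 0.8 * rnorm(n)    # assumed predictor term
e   <- rnorm(n, sd = 0.3)    # independent Gaussian residuals

I_n <- diag(n)

# lagsar: y = rho * W y + eta + e  =>  (I - rho * W) y = eta + e
y_lag <- solve(I_n - rho * W, eta + e)

# errorsar: y = eta + u, with u = rho * W u + e  =>  (I - rho * W) u = e
u     <- solve(I_n - rho * W, e)
y_err <- eta + u

# Check that the simulated values satisfy their defining equations
all.equal(as.vector(y_lag), as.vector(rho * W %*% y_lag + eta + e))
all.equal(as.vector(u), as.vector(rho * W %*% u + e))
```

In the lag form the predictor term itself is propagated through the neighbors, while in the error form only the residual is spatially smoothed, which is the practical difference between the two structures.
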
### Simulated Data Example

```{r}
library(hbsaems)

# Load data
data("data_fhnorm")
data <- data_fhnorm
head(data)

# Load spatial weight matrix
data("spatial_weight_sar")
spatial_weight_sar
```

### Fitting the Model

```{r}
model_sar <- hbm(
  formula = bf(y ~ x1 + x2 + x3), # Model formula
  hb_sampling = "gaussian",       # Gaussian family for continuous outcomes
  hb_link = "identity",           # Identity link function (no transformation)
  re = ~(1|group),                # Area-level random effect
  sre_type = "sar",               # Type of spatial random effect
  sar_type = "lag",               # SAR structure type
  M = spatial_weight_sar,         # Spatial weight matrix
  data = data)                    # Dataset

summary(model_sar)
```

SAR does not need the `sre` variable; it only requires `sre_type`, `sar_type`, and `M`. `sar_type` is the type of SAR structure: either `lag` (for SAR of the response values) or `error` (for SAR of the residuals). If `sar_type` is not set, it defaults to `lag`. The `M` matrix in SAR is a spatial weight matrix that describes the spatial relationship between locations using weights.

# Conclusion

The **hbsaems** package supports spatially informed small area estimation through both Conditional Autoregressive (CAR) and Simultaneous Autoregressive (SAR) models, which can improve precision by capturing spatial dependence among areas. CAR emphasizes local adjacency through a binary neighborhood structure, while SAR allows varying spatial influence through a weight matrix. By using spatial correlation, CAR and SAR can make estimates more reliable, particularly in data-sparse regions, which makes them useful tools for small area estimation.