Small Area Estimation Using Spatial Modeling in ‘hbsaems’

Introduction

In spatial data analysis, spatial effects play an important role in understanding patterns and relationships between locations. Spatial effects arise when the value of a variable in one location is affected by the value of the same variable in another location. This can happen due to natural processes (e.g. the spread of air pollution) or socio-economic phenomena (e.g. house prices in one area affect prices in surrounding areas).

This vignette aims to show how this package can be used to add spatial effects in Small Area Estimation Hierarchical Bayesian (SAEHB) modeling.

Model Overview

The hbsaems package can handle spatial dependence effects (Spatial Autocorrelation) by using the :

  1. CAR (Conditional Autoregressive)
  2. SAR (Simultaneous Autoregressive)

1. CAR (Conditional Autoregressive)

Conditional Autoregressive (CAR) model is a spatial model that describes the relationship between adjacent areas through a local condition approach. In this model, the value of a variable in a location is directly influenced by the value of the same variable in the surrounding areas. CAR uses adjacency matrix to determine the relationship between locations.

Simulated Data Example

library(hbsaems)

# Load data
data("data_fhnorm")
data <- data_fhnorm
head(data)

# Load adjacency matrix
data("adjacency_matrix_car")
adjacency_matrix_car

The data used is synthetic data in the hbsaems package which can be used to help understand the use of the package

Fitting the Model

hbm Function

model_car <- hbm(
  formula = bf(y ~ x1 + x2 + x3),  # Formula model
  hb_sampling = "gaussian",        # Gaussian family for continuous outcomes
  hb_link = "identity",            # Identity link function (no transformation)
  re = ~(1|group),
  sre = "sre",                    # Spatial random effect variable
  sre_type = "car",
  car_type = "icar",
  M = adjacency_matrix_car,
  data = data)                    # Dataset

summary(model_car)

To apply spatial effects to the data, we need a variable that maps observations to spatial locations (sre). If not specified, each observation is treated as a separate location. Therefore, in this case the dimension of the adjacency matrix of locations should be equal to the number of observations. It is recommended to always specify a grouping factor to allow for handling of new data in postprocessing methods.

Users must specify the type of spatial random effects used in the model (sre_type), if they choose car type, users must also specify the CAR structure type (car_type). Currently implemented are “escar” (exact sparse CAR), “esicar” (exact sparse intrinsic CAR), “icar” (intrinsic CAR), and “bym2”. When the car_type value is not specifically defined, its defaults to escar.

We also need information about the adjacency matrix (M). In the spatial adjacency matrix, a value of 1 indicates that two spatial groups are neighbors based on their proximity to the city center, while a value of 0 indicates no adjacency relationship. The row names of the adjacency matrix must be the same as the grouping names in the variable sre. In the data that will be used this time, the spatial grouping names are given the numbers 1, 2, 3, 4, and 5.

hbm_distribution Function

In addition to the hbm function, spatial effects can be applied by calling the hbm_distribution function as in hbm_betalogitnorm, hbm_binlogitnorm, and hbm_lnln.

# Load data
data("data_betalogitnorm")
head(data_betalogitnorm)

model_car_beta <- hbm_betalogitnorm(response = "y",
                                    predictors = c("x1", "x2", "x3"),
                                    sre = "sre",
                                    sre_type = "car",
                                    car_type = "icar",
                                    M = adjacency_matrix_car,
                                    data = data_betalogitnorm)
summary(model_car_beta)

2. SAR (Simultaneous Autoregressive)

The Simultaneous Autoregressive Model (SAR) handles spatial effects by defining dependence simultaneously across the study area. SAR uses a spatial weight matrix to describe the relationship between regions and can be applied in various forms, such as Spatial Lag Model (SLM) or Spatial Error Model (SEM), depending on how spatial effects affect the dependent variable or error in the model. M can be either the spatial weight matrix itself or an object of class listw or nb, from which the spatial weighting matrix can be computed.

The lagsar structure implements SAR of the response values:

\[y = \rho W y + \eta + e\]

The errorsar structure implements SAR of the residuals:

\[y = \eta + u, \quad u = \rho W u + e\]

In the above equations, \(\eta\) is the predictor term and \(e\) are independent normally or t-distributed residuals. Currently, only families gaussian and student support SAR structures.

Simulated Data Example

library(hbsaems)

# Load data
data("data_fhnorm")
data <- data_fhnorm
head(data)

# Load adjacency matrix
data("spatial_weight_sar")
spatial_weight_sar

Fitting the Model

model_sar <- hbm(
  formula = bf(y ~ x1 + x2 + x3),  # Formula model
  hb_sampling = "gaussian",        # Gaussian family for continuous outcomes
  hb_link = "identity",            # Identity link function (no transformation)
  re = ~(1|group),
  sre_type = "sar",
  sar_type = "lag",
  M = spatial_weight_sar,    
  data = data)                    # Dataset

summary(model_sar)

In SAR there is no need for sre information, SAR only requires information related to sre_type, sar_type, and M.sar_type is type of the SAR structure. Either lag (for SAR of the response values) or error (for SAR of the residuals). When the sar_type value is not specifically defined, its defaults to lag.

The M matrix in SAR is a spatial weighting matrix that shows the spatial relationship between locations with certain weights.

Conclusion

The hbsaems package supports spatially-informed small-area estimation through both Conditional Autoregressive (CAR) and Simultaneous Autoregressive (SAR) models, enabling improved precision by capturing spatial dependence among areas. CAR emphasizes local adjacency via binary neighborhood structure, while SAR allows varying spatial influence through a weighted matrix. By leveraging spatial correlation, CAR and SAR enhance the reliability of estimates, particularly in data-sparse regions, making them powerful tools for small-area estimation.