Using pspatreg with cross-sectional data

Román Mínguez (UCLM), Roberto Basile (UNIVAQ), María Durbán (UC3M)

2023-10-06

1 Models for cross-sectional data

We start our demo with an application of pspatreg to the analysis of cross-sectional data. In particular, we use Ames dataset (included in the package AmesHousing) which contains data on 2,930 properties in Ames, Iowa. The dataset includes information related to house characteristics (bedrooms, garages, fireplaces, pools, porches, etc.), location characteristics (neighborhood), lot information (zoning, shape, size, etc.), ratings of condition and quality and sale price (from 2006 to 2010). The section is organized as follows:

1.1 Reading the data

The dataset is a spatial point dataset. It contains cross-sectional information on 2,930 properties in Ames, Iowa. The raw dataset (ames) has been transformed in a spatial point dataset of class sf as follows:

library(pspatreg)
library(spdep)
library(sf)
library(ggplot2)
library(dplyr)
ames <- AmesHousing::make_ames()# Raw Ames Housing Data
ames_sf <- st_as_sf(ames, coords = c("Longitude", "Latitude"))
ames_sf$Longitude <- ames$Longitude
ames_sf$Latitude <- ames$Latitude

The dependent variable in the regression analysis is Sale_Price, while we selected the following variables as covariates:

- Lot_Area: Lot size in square feet

- Total_Bsmt_SF: Total square feet of basement area

- Garage_Cars: Size of garage in car capacity

- Gr_Liv_Area: Above grade (ground) living area square feet

- Fireplaces: Number of fireplaces

Due to the skewed distribution of the dependent variable Sale_Price, we use the log-transformation:

ggplot(data = ames_sf) + geom_histogram(mapping = aes(x = Sale_Price))

ggplot(data = ames_sf) + geom_histogram(mapping = aes(x = log(Sale_Price)))

ames_sf$lnSale_Price <- log(ames_sf$Sale_Price)
ames_sf$lnLot_Area <- log(ames_sf$Lot_Area)
ames_sf$lnTotal_Bsmt_SF <- log(ames_sf$Total_Bsmt_SF+1)
ames_sf$lnGr_Liv_Area <- log(ames_sf$Gr_Liv_Area)

1.2 Constructing the spatial weights matrix

Creating spatial weights is a necessary step when using areal data. To do so, it is necessary to choose a criterion to define the neighbours, and then to assign weights to each of them.

In particular, we have used a graph-based neighbors list (a Delauney triangulation neighbor list) after eliminating duplicates in coordinates values (thus the final sf object used in the analysis is ames_sf1):

ames_sf1 <- ames_sf[duplicated(ames_sf$Longitude) == FALSE,]
coord_sf1 <- cbind(ames_sf1$Longitude, ames_sf1$Latitude)
k5nb <- knn2nb(knearneigh(coord_sf1, k = 5, 
longlat = TRUE, use_kd_tree = FALSE), sym = TRUE)
lw_ames <- nb2listw(k5nb, style = "W", 
zero.policy = FALSE)

1.3 Defining formula for parametric and semiparametric models

We define different formula for linear and nonlinear (semiparametric) models with and without a spatial trend. The Durbin formula is used for types “sdm”, “slx” or “sdem”.

In the case of semiparametric terms, in 3d or in 2d (as it is the case of spatial trend), the number of knots used to construct the B-splines basis needs to be specified.

# Linear Model
formlin <- lnSale_Price ~ lnLot_Area + lnTotal_Bsmt_SF + lnGr_Liv_Area +  Garage_Cars + Fireplaces
durbinformlin <- ~ lnLot_Area + lnTotal_Bsmt_SF + lnGr_Liv_Area + Garage_Cars + Fireplaces
# Semiparametric model without spatial trend
formgam <- lnSale_Price ~ Fireplaces + Garage_Cars + 
  pspl(lnLot_Area, nknots = 20) + 
  pspl(lnTotal_Bsmt_SF, nknots = 20) + 
  pspl(lnGr_Liv_Area, nknots = 20) 
# Semiparametric model with spatial trend in 2d
form2d <- lnSale_Price ~ Fireplaces + Garage_Cars + 
  pspl(lnLot_Area, nknots = 20) + 
  pspl(lnTotal_Bsmt_SF, nknots = 20) +
  pspl(lnGr_Liv_Area, nknots = 20) +
  pspt(Longitude,Latitude, 
       nknots = c(10, 10), 
       psanova = FALSE)
# Semiparametric model with PS-ANOVA spatial trend in 2d
form2d_psanova <- lnSale_Price ~ Fireplaces + Garage_Cars + 
  pspl(lnLot_Area, nknots = 20) + 
  pspl(lnTotal_Bsmt_SF, nknots = 20) + 
  pspl(lnGr_Liv_Area, nknots = 20) +
  pspt(Longitude, Latitude, 
       nknots = c(10, 10), 
       psanova = TRUE)
durbinformnonlin <- ~ Fireplaces + Garage_Cars + 
  pspl(lnLot_Area, nknots = 20)

1.4 Estimating parametric linear models

We first estimate standard spatial linear autoregressive models using the function pspatfit() included in the package pspatreg (based, in the default, on the REML estimators) and compare them with the results obtained using the functions provided by the package spatialreg (based on the ML estimators).

1.4.1 Spatial Lag (SAR) model. REML estimates using pspatfit()

The SAR model for cross-sectional data can be specified as: \[y_{i}=\rho \sum_{j=1}^N w_{ij,N} y_{j} + \sum_{k=1}^K \beta_k x_{k,i} + \epsilon_{i}\]

\[\epsilon_{i} \sim i.i.d.(0,\sigma^2_\epsilon)\] To estimate this model, we use the option type="sar":

linsar <- pspatfit(formlin, data = ames_sf1, listw = lw_ames, method = "Chebyshev", type = "sar")
summary(linsar)
## 
##  Call 
## pspatfit(formula = formlin, data = ames_sf1, listw = lw_ames, 
##     type = "sar", method = "Chebyshev")
## 
##  Parametric Terms 
##                   Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)     2.4596366  0.0991499 24.8073 < 2.2e-16 ***
## lnLot_Area      0.0296596  0.0071596  4.1427 3.536e-05 ***
## lnTotal_Bsmt_SF 0.0488238  0.0029233 16.7019 < 2.2e-16 ***
## lnGr_Liv_Area   0.3773085  0.0130689 28.8706 < 2.2e-16 ***
## Garage_Cars     0.0943737  0.0051695 18.2559 < 2.2e-16 ***
## Fireplaces      0.0486969  0.0058596  8.3106 < 2.2e-16 ***
## rho             0.5019890  0.0123499 40.6471 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Goodness-of-Fit 
##  
##  EDF Total:      7 
##  Sigma: 0.201487 
##  AIC:  -6745.73 
##  BIC:  -6704.22

All \(\beta's\) are significant and positive as expected. The estimated spatial autoregressive (0.50) is also positive and significant.

Extract coefficients:

coef(linsar)
##             rho     (Intercept)      lnLot_Area lnTotal_Bsmt_SF 
##      0.50198896      2.45963657      0.02965961      0.04882380 
##   lnGr_Liv_Area     Garage_Cars      Fireplaces 
##      0.37730846      0.09437373      0.04869691

Extract fitted values and residuals:

fits <- fitted(linsar)
plot(fits, ames_sf1$lnSale_Price)

resids <- residuals(linsar)
plot(fits, resids)

Extract log-likelihood and restricted log-likelihood:

logLik(linsar)
## 'log Lik.' 3379.864 (df=7)
logLik(linsar, REML = TRUE)
## 'log Lik.' 3348.361 (df=7)

Extract the covariance matrix of estimated coefficients. Argument bayesian allows to get frequentist (default) or bayesian covariances:

vcov(linsar)
##                   (Intercept)    lnLot_Area lnTotal_Bsmt_SF
## (Intercept)      9.830695e-03 -3.173257e-04   -3.633123e-05
## lnLot_Area      -3.173257e-04  5.125941e-05    3.099487e-07
## lnTotal_Bsmt_SF -3.633123e-05  3.099487e-07    8.545394e-06
## lnGr_Liv_Area   -9.859045e-04 -1.935085e-05   -2.716499e-06
## Garage_Cars      1.931624e-04 -4.065222e-06   -1.994411e-06
## Fireplaces       2.198995e-04 -5.845037e-06   -1.354598e-06
##                 lnGr_Liv_Area   Garage_Cars    Fireplaces
## (Intercept)     -9.859045e-04  1.931624e-04  2.198995e-04
## lnLot_Area      -1.935085e-05 -4.065222e-06 -5.845037e-06
## lnTotal_Bsmt_SF -2.716499e-06 -1.994411e-06 -1.354598e-06
## lnGr_Liv_Area    1.707974e-04 -2.592596e-05 -2.390023e-05
## Garage_Cars     -2.592596e-05  2.672358e-05 -2.711208e-06
## Fireplaces      -2.390023e-05 -2.711208e-06  3.433484e-05
vcov(linsar, bayesian = TRUE)
##                   (Intercept)    lnLot_Area lnTotal_Bsmt_SF
## (Intercept)      9.830695e-03 -3.173257e-04   -3.633123e-05
## lnLot_Area      -3.173257e-04  5.125941e-05    3.099487e-07
## lnTotal_Bsmt_SF -3.633123e-05  3.099487e-07    8.545394e-06
## lnGr_Liv_Area   -9.859045e-04 -1.935085e-05   -2.716499e-06
## Garage_Cars      1.931624e-04 -4.065222e-06   -1.994411e-06
## Fireplaces       2.198995e-04 -5.845037e-06   -1.354598e-06
##                 lnGr_Liv_Area   Garage_Cars    Fireplaces
## (Intercept)     -9.859045e-04  1.931624e-04  2.198995e-04
## lnLot_Area      -1.935085e-05 -4.065222e-06 -5.845037e-06
## lnTotal_Bsmt_SF -2.716499e-06 -1.994411e-06 -1.354598e-06
## lnGr_Liv_Area    1.707974e-04 -2.592596e-05 -2.390023e-05
## Garage_Cars     -2.592596e-05  2.672358e-05 -2.711208e-06
## Fireplaces      -2.390023e-05 -2.711208e-06  3.433484e-05

A print method to get printed coefficients, standard errors and p-values of parametric terms:

print(linsar)
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)       2.4596     0.0991 24.8073        0
## lnLot_Area        0.0297     0.0072  4.1427        0
## lnTotal_Bsmt_SF   0.0488     0.0029 16.7019        0
## lnGr_Liv_Area     0.3773     0.0131 28.8706        0
## Garage_Cars       0.0944     0.0052 18.2559        0
## Fireplaces        0.0487     0.0059  8.3106        0
## rho               0.5020     0.0123 40.6471        0

Computing average direct, indirect and total marginal impacts:

imp_parvar_sar <- impactspar(linsar, list_varpar)
summary(imp_parvar_sar)
## 
##  Total Parametric Impacts (sar) 
##                   Estimate Std. Error    t value Pr(>|t|)
## lnLot_Area       0.0594026  0.0149621  3.9701965    1e-04
## lnTotal_Bsmt_SF  0.0984129  0.0063839 15.4157296    0e+00
## lnGr_Liv_Area    0.7590813  0.0323611 23.4566321    0e+00
## Garage_Cars      0.1895732  0.0117127 16.1853039    0e+00
## Fireplaces       0.0974619  0.0120173  8.1101402    0e+00
## 
##  Direct Parametric Impacts (sar) 
##                   Estimate Std. Error    t value Pr(>|t|)
## lnLot_Area       0.0313872  0.0078361  4.0054681    1e-04
## lnTotal_Bsmt_SF  0.0520102  0.0031283 16.6257789    0e+00
## lnGr_Liv_Area    0.4011615  0.0139701 28.7158051    0e+00
## Garage_Cars      0.1001874  0.0057010 17.5735794    0e+00
## Fireplaces       0.0515082  0.0062243  8.2753239    0e+00
## 
##  Indirect Parametric Impacts (sar) 
##                   Estimate Std. Error    t value Pr(>|t|)
## lnLot_Area       0.0280154  0.0071880  3.8975412    1e-04
## lnTotal_Bsmt_SF  0.0464027  0.0035756 12.9774408    0e+00
## lnGr_Liv_Area    0.3579198  0.0213475 16.7663830    0e+00
## Garage_Cars      0.0893858  0.0066435 13.4545352    0e+00
## Fireplaces       0.0459537  0.0059825  7.6813921    0e+00

As expected, all marginal impacts are strongly significant and spillover impacts are rather high. We compare these results with those obtained using ML estimates with lagsarlm() (package spatialreg):

spatregsar <- spatialreg::lagsarlm(formlin, data = ames_sf1, listw = lw_ames, method = "Chebyshev") 
summary(spatregsar)
## 
## Call:
## spatialreg::lagsarlm(formula = formlin, data = ames_sf1, listw = lw_ames, 
##     method = "Chebyshev")
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.2023877 -0.0829072  0.0089692  0.1009625  0.6366034 
## 
## Type: lag 
## Coefficients: (numerical Hessian approximate standard errors) 
##                  Estimate Std. Error z value  Pr(>|z|)
## (Intercept)     2.4541133  0.1489302 16.4783 < 2.2e-16
## lnLot_Area      0.0296357  0.0071801  4.1275 3.668e-05
## lnTotal_Bsmt_SF 0.0488015  0.0029538 16.5218 < 2.2e-16
## lnGr_Liv_Area   0.3771303  0.0135377 27.8577 < 2.2e-16
## Garage_Cars     0.0942494  0.0057380 16.4255 < 2.2e-16
## Fireplaces      0.0486575  0.0059116  8.2309 2.220e-16
## 
## Rho: 0.50213, LR test value: 1188, p-value: < 2.22e-16
## Approximate (numerical Hessian) standard error: 0.012436
##     z-value: 40.377, p-value: < 2.22e-16
## Wald statistic: 1630.3, p-value: < 2.22e-16
## 
## Log likelihood: 827.9768 for lag model
## ML residual variance (sigma squared): 0.030598, (sigma: 0.17492)
## Number of observations: 2777 
## Number of parameters estimated: 8 
## AIC: -1640, (AIC for lm: -454)
W <- as(lw_ames, "CsparseMatrix")
trMatc <- spatialreg::trW(W, type="mult")
set.seed(1)
spatialreg::impacts(spatregsar,listw=lw_ames)
## Impact measures (lag, exact):
##                     Direct   Indirect      Total
## lnLot_Area      0.03148261 0.02804245 0.05952506
## lnTotal_Bsmt_SF 0.05184279 0.04617784 0.09802063
## lnGr_Liv_Area   0.40063291 0.35685505 0.75748796
## Garage_Cars     0.10012298 0.08918237 0.18930535
## Fireplaces      0.05168978 0.04604155 0.09773133
SAR.impact <- spatialreg::impacts(spatregsar, tr = trMatc, R=200)
list_varpar <- as.character(names(summary(linsar)$bfixed[-1]))
imp_parvar <- impactspar(linsar, list_varpar)
summary(imp_parvar)
## 
##  Total Parametric Impacts (sar) 
##                   Estimate Std. Error    t value Pr(>|t|)
## lnLot_Area       0.0596211  0.0151611  3.9325067    1e-04
## lnTotal_Bsmt_SF  0.0983107  0.0064457 15.2521203    0e+00
## lnGr_Liv_Area    0.7577776  0.0335347 22.5968356    0e+00
## Garage_Cars      0.1895866  0.0111498 17.0036564    0e+00
## Fireplaces       0.0975581  0.0119594  8.1574186    0e+00
## 
##  Direct Parametric Impacts (sar) 
##                   Estimate Std. Error    t value Pr(>|t|)
## lnLot_Area       0.0314993  0.0079766  3.9489803    1e-04
## lnTotal_Bsmt_SF  0.0519380  0.0031587 16.4427018    0e+00
## lnGr_Liv_Area    0.4003018  0.0138392 28.9252556    0e+00
## Garage_Cars      0.1001584  0.0053392 18.7591373    0e+00
## Fireplaces       0.0515439  0.0062225  8.2835367    0e+00
## 
##  Indirect Parametric Impacts (sar) 
##                   Estimate Std. Error    t value Pr(>|t|)
## lnLot_Area       0.0281218  0.0072492  3.8792726    1e-04
## lnTotal_Bsmt_SF  0.0463727  0.0036211 12.8063504    0e+00
## lnGr_Liv_Area    0.3574758  0.0224361 15.9330734    0e+00
## Garage_Cars      0.0894283  0.0064982 13.7620260    0e+00
## Fireplaces       0.0460142  0.0059409  7.7452807    0e+00
# Let's compare direct impacts
round(data.frame(spatialreg_dir = summary(SAR.impact, zstats = TRUE, short = TRUE)$res$direct, 
pspatreg_dir = summary(imp_parvar_sar)$dir_table[,1]), 3)
##                 spatialreg_dir pspatreg_dir
## lnLot_Area               0.031        0.031
## lnTotal_Bsmt_SF          0.052        0.052
## lnGr_Liv_Area            0.401        0.401
## Garage_Cars              0.100        0.100
## Fireplaces               0.052        0.052
# Let's compare indirect impacts
round(data.frame(spatialreg_ind = summary(SAR.impact, zstats = TRUE, short = TRUE)$res$indirect, 
pspatreg_ind = summary(imp_parvar_sar)$ind_table[,1]),3)
##                 spatialreg_ind pspatreg_ind
## lnLot_Area               0.028        0.028
## lnTotal_Bsmt_SF          0.046        0.046
## lnGr_Liv_Area            0.357        0.358
## Garage_Cars              0.089        0.089
## Fireplaces               0.046        0.046
# Let's compare total impacts
round(data.frame(spatialreg_tot = summary(SAR.impact, zstats = TRUE, short = TRUE)$res$total, 
pspatreg_tot = summary(imp_parvar_sar)$tot_table[,1]), 3)
##                 spatialreg_tot pspatreg_tot
## lnLot_Area               0.060        0.059
## lnTotal_Bsmt_SF          0.098        0.098
## lnGr_Liv_Area            0.757        0.759
## Garage_Cars              0.189        0.190
## Fireplaces               0.098        0.097

1.4.2 Spatial Lag in X variables (SLX) model. REML estimates using pspatfit()

We now estimate the SLX model that only captures local spatial spillovers through the spatial lags of the explanatory variables:

\[y_{i}= \sum_{k=1}^K \beta_k x_{k,i} +\sum_{k=1}^K \delta_k \sum_{j=1}^N w_{ij,N}x_{k,j}+ \epsilon_{i}\] \[\epsilon_{i} \sim i.i.d.(0,\sigma^2_\epsilon)\] This model is estimated with the function pspatfit() using the option type = "slx" and specifying the set of spatial lags through Durbin = durbinformlin:

linslx <- pspatfit(formlin, data = ames_sf1, listw = lw_ames, method = "Chebyshev", 
                   type = "slx", Durbin = durbinformlin)
summary(linslx)
## 
##  Call 
## pspatfit(formula = formlin, data = ames_sf1, listw = lw_ames, 
##     type = "slx", method = "Chebyshev", Durbin = durbinformlin)
## 
##  Parametric Terms 
##                         Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)           7.3856396  0.1837991 40.1832 < 2.2e-16 ***
## lnLot_Area            0.0569806  0.0120583  4.7254 2.411e-06 ***
## lnTotal_Bsmt_SF       0.0510487  0.0033685 15.1548 < 2.2e-16 ***
## lnGr_Liv_Area         0.4213139  0.0166068 25.3700 < 2.2e-16 ***
## Garage_Cars           0.0933193  0.0068160 13.6912 < 2.2e-16 ***
## Fireplaces            0.0569122  0.0069704  8.1648 4.836e-16 ***
## Wlag.lnLot_Area      -0.0296692  0.0153838 -1.9286   0.05388 .  
## Wlag.lnTotal_Bsmt_SF  0.0486076  0.0064998  7.4783 1.005e-13 ***
## Wlag.lnGr_Liv_Area    0.0054634  0.0275963  0.1980   0.84308    
## Wlag.Garage_Cars      0.2214429  0.0105664 20.9572 < 2.2e-16 ***
## Wlag.Fireplaces       0.0491468  0.0123350  3.9843 6.942e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Goodness-of-Fit 
##  
##  EDF Total:     11 
##  Sigma: 0.194399 
##  AIC:  -6308.59 
##  BIC:  -6243.37

Now we compute impacts for the SLX model. In this case, contrary to the case of the SAR and SDM models, we do not need simulations to make inference on these marginal impacts. We only need to properly compute the variance of total impact using this formula:

\[ Var(\hat{\beta_k})\_{tot} = Var(\hat{\beta_k}) + Var(\hat{\delta_k}) + 2* Cov(\hat{\beta_k}, \hat{\delta_k}) \]

imp_parvar_slx <- impactspar(linslx, listw = lw_ames)
summary(imp_parvar_slx)
## 
##  Parametric Impacts (slx) 
##                  Direct  Indirect   Total
## lnLot_Area      0.05698 -0.029669 0.02731
## lnTotal_Bsmt_SF 0.05105  0.048608 0.09966
## lnGr_Liv_Area   0.42131  0.005463 0.42678
## Garage_Cars     0.09332  0.221443 0.31476
## Fireplaces      0.05691  0.049147 0.10606
## 
##  Standard errors: 
##                   Direct Indirect    Total
## lnLot_Area      0.012058  0.01538 0.010220
## lnTotal_Bsmt_SF 0.003368  0.00650 0.006533
## lnGr_Liv_Area   0.016607  0.02760 0.025436
## Garage_Cars     0.006816  0.01057 0.009474
## Fireplaces      0.006970  0.01234 0.011952
## 
##  Z-values: 
##                 Direct Indirect  Total
## lnLot_Area       4.725   -1.929  2.672
## lnTotal_Bsmt_SF 15.155    7.478 15.254
## lnGr_Liv_Area   25.370    0.198 16.779
## Garage_Cars     13.691   20.957 33.224
## Fireplaces       8.165    3.984  8.873
## 
##  p-values: 
##                     Direct  Indirect      Total
## lnLot_Area       1.148e-06 2.689e-02  3.766e-03
## lnTotal_Bsmt_SF  3.520e-52 3.765e-14  7.707e-53
## lnGr_Liv_Area   2.707e-142 4.215e-01  1.746e-63
## Garage_Cars      5.729e-43 8.064e-98 2.387e-242
## Fireplaces       1.609e-16 3.384e-05  3.546e-19

We compare the non-nested models linsar and linslx using the function anova() with the argument lrtest = FALSE:

anova(linsar, linslx, lrtest = FALSE)
##        logLik rlogLik edf     AIC     BIC
## linsar 3379.9  3348.4   7 -6745.7 -6641.2
## linslx 3165.3  3112.4  11 -6308.6 -6137.6

It emerges that, from a statistical point of view, the SAR model outperforms the SLX model, suggesting a global spatial diffusion of idiosyncratic shocks. ### Spatial Durbin model (SDM). REML estimates using the function pspatfit():

The SDM specification encompasses both SAR and SLX: \[y_{i}=\rho \sum_{j=1}^N w_{ij,N} y_{j} + \sum_{k=1}^K \beta_k x_{k,i} +\sum_{k=1}^K \delta_k \sum_{j=1}^N w_{ij,N}x_{k,j}+ \epsilon_{i}\]

\[\epsilon_{i} \sim i.i.d.(0,\sigma^2_\epsilon)\]

We can estimate this model using the option type = "sdm":

linsdm <- pspatfit(formlin, data = ames_sf1, listw = lw_ames, method = "Chebyshev", type = "sdm")
summary(linsdm)
## 
##  Call 
## pspatfit(formula = formlin, data = ames_sf1, listw = lw_ames, 
##     type = "sdm", method = "Chebyshev")
## 
##  Parametric Terms 
##                         Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)           3.4934849  0.1609954 21.6993 < 2.2e-16 ***
## lnLot_Area            0.0643925  0.0105622  6.0965 1.235e-09 ***
## lnTotal_Bsmt_SF       0.0469503  0.0029506 15.9123 < 2.2e-16 ***
## lnGr_Liv_Area         0.4222274  0.0145464 29.0262 < 2.2e-16 ***
## Garage_Cars           0.0756107  0.0059703 12.6644 < 2.2e-16 ***
## Fireplaces            0.0533765  0.0061056  8.7422 < 2.2e-16 ***
## Wlag.lnLot_Area      -0.0538549  0.0134752 -3.9966 6.594e-05 ***
## Wlag.lnTotal_Bsmt_SF  0.0026182  0.0056934  0.4599    0.6456    
## Wlag.lnGr_Liv_Area   -0.2247411  0.0241724 -9.2974 < 2.2e-16 ***
## Wlag.Garage_Cars      0.0710210  0.0092555  7.6734 2.305e-14 ***
## Wlag.Fireplaces       0.0010748  0.0108046  0.0995    0.9208    
## rho                   0.5316238  0.0188584 28.1902 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Goodness-of-Fit 
##  
##  EDF Total:     12 
##  Sigma: 0.194853 
##  AIC:  -6876.84 
##  BIC:  -6805.69

LR test for nested models and ANOVA tables:

anova(linsar, linsdm, lrtest = TRUE)
##        logLik rlogLik edf     AIC     BIC LRtest      p.val
## linsar 3379.9  3348.4   7 -6745.7 -6641.2                  
## linsdm 3450.4  3396.1  12 -6876.8 -6697.0 95.378 4.9718e-19
anova(linslx, linsdm, lrtest = TRUE)
##        logLik rlogLik edf     AIC     BIC LRtest       p.val
## linslx 3165.3  3112.4  11 -6308.6 -6137.6                   
## linsdm 3450.4  3396.1  12 -6876.8 -6697.0 567.33 2.1356e-125

The LR test suggests that the parametric SDM model outperforms both SAR and SLX.

Computing average direct and indirect marginal impacts:

imp_parvar_sdm <- impactspar(linsdm, list_varpar)
summary(imp_parvar_sdm)
## 
##  Total Parametric Impacts (sdm) 
##                  Estimate Std. Error   t value Pr(>|t|)
## lnLot_Area       0.022068   0.018519  1.191658   0.2334
## lnTotal_Bsmt_SF  0.105745   0.013309  7.945635   0.0000
## lnGr_Liv_Area    0.421338   0.052472  8.029746   0.0000
## Garage_Cars      0.314019   0.021179 14.827060   0.0000
## Fireplaces       0.116374   0.022466  5.180063   0.0000
## 
##  Direct Parametric Impacts (sdm) 
##                   Estimate Std. Error    t value Pr(>|t|)
## lnLot_Area       0.0617276  0.0099319  6.2150841        0
## lnTotal_Bsmt_SF  0.0508324  0.0031026 16.3840443        0
## lnGr_Liv_Area    0.4221158  0.0148036 28.5144754        0
## Garage_Cars      0.0910496  0.0059560 15.2870027        0
## Fireplaces       0.0575452  0.0061199  9.4029178        0
## 
##  Indirect Parametric Impacts (sdm) 
##                    Estimate  Std. Error     t value Pr(>|t|)
## lnLot_Area      -0.03965985  0.01989172 -1.99378656   0.0462
## lnTotal_Bsmt_SF  0.05491243  0.01204278  4.55978152   0.0000
## lnGr_Liv_Area   -0.00077776  0.04820256 -0.01613520   0.9871
## Garage_Cars      0.22296923  0.01964790 11.34825000   0.0000
## Fireplaces       0.05882876  0.02080105  2.82816328   0.0047

1.4.3 Spatial Error model (SEM). REML estimates using pspatfit()

The last parametric specification considered here is the SEM. This model that spatial spillovers occurs only for the unobserved random shocks:

\[y_{i}=\sum_{k=1}^K \beta_k x_{k,i} + \epsilon_{i}\] \[\epsilon_{i}=\theta \sum_{j=1}^N w_{ij,N}\epsilon_{j}+u_{i}\] \[u_{i} \sim i.i.d.(0,\sigma^2_u)\] We estimate this model using the option type = "sem":

linsem <- pspatfit(formlin, data = ames_sf1, listw = lw_ames, method = "Chebyshev", type = "sem")
summary(linsem)
## 
##  Call 
## pspatfit(formula = formlin, data = ames_sf1, listw = lw_ames, 
##     type = "sem", method = "Chebyshev")
## 
##  Parametric Terms 
##                   Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)     7.6430151  0.1236671 61.8032 < 2.2e-16 ***
## lnLot_Area      0.0694715  0.0103835  6.6906  2.68e-11 ***
## lnTotal_Bsmt_SF 0.0462867  0.0029934 15.4631 < 2.2e-16 ***
## lnGr_Liv_Area   0.4490360  0.0147628 30.4168 < 2.2e-16 ***
## Garage_Cars     0.0806123  0.0060501 13.3240 < 2.2e-16 ***
## Fireplaces      0.0554595  0.0062206  8.9155 < 2.2e-16 ***
## delta           0.7191707  0.0133720 53.7818 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Goodness-of-Fit 
##  
##  EDF Total:      7 
##  Sigma: 0.250465 
##  AIC:  -6557.27 
##  BIC:  -6515.76
anova(linsem,linsdm, lrtest = TRUE)
##        logLik rlogLik edf     AIC     BIC LRtest      p.val
## linsem 3285.6  3256.3   7 -6557.3 -6457.1                  
## linsdm 3450.4  3396.1  12 -6876.8 -6697.0 279.51 2.5309e-58

The spatial spillover parameter \(\delta\) is rather high (0.72) and statistically significant. As well known, the SEM is also nested in the SDM, so we can use a LR test to compare the two models. The results suggest again that the SDM is the best parametric specification.

Comparing the results with those obtained using ML estimates with errorsarlm() function of package spatialreg:

spatregsem <- spatialreg::errorsarlm(formlin, data = ames_sf1, listw = lw_ames, method = "Chebyshev") 
summary(spatregsem)
## 
## Call:
## spatialreg::errorsarlm(formula = formlin, data = ames_sf1, listw = lw_ames, 
##     method = "Chebyshev")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -2.133698 -0.075264  0.012563  0.092108  0.739429 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                  Estimate Std. Error z value  Pr(>|z|)
## (Intercept)     7.6503755  0.1233360 62.0287 < 2.2e-16
## lnLot_Area      0.0695242  0.0103557  6.7136 1.899e-11
## lnTotal_Bsmt_SF 0.0461694  0.0029854 15.4652 < 2.2e-16
## lnGr_Liv_Area   0.4482484  0.0147233 30.4448 < 2.2e-16
## Garage_Cars     0.0799139  0.0060340 13.2440 < 2.2e-16
## Fireplaces      0.0552629  0.0062040  8.9077 < 2.2e-16
## 
## Lambda: 0.71915, LR test value: 999.53, p-value: < 2.22e-16
## Approximate (numerical Hessian) standard error: 0.013838
##     z-value: 51.968, p-value: < 2.22e-16
## Wald statistic: 2700.7, p-value: < 2.22e-16
## 
## Log likelihood: 733.7619 for error model
## ML residual variance (sigma squared): 0.030231, (sigma: 0.17387)
## Number of observations: 2777 
## Number of parameters estimated: 8 
## AIC: NA (not available for weighted model), (AIC for lm: -454)
spatialreg::Hausman.test(spatregsem)# Test OLS vs. SEM
## 
##  Spatial Hausman test (approximate)
## 
## data:  NULL
## Hausman test = 213.49, df = 6, p-value < 2.2e-16
round(data.frame(linsem = summary(linsem)$bfixed, spatregsem = summary(spatregsem)$coefficients), 3)
##                 linsem spatregsem
## (Intercept)      7.643      7.650
## lnLot_Area       0.069      0.070
## lnTotal_Bsmt_SF  0.046      0.046
## lnGr_Liv_Area    0.449      0.448
## Garage_Cars      0.081      0.080
## Fireplaces       0.055      0.055

1.5 Estimating semiparametric nonlinear models with and without a spatial trend

We now provide examples of the estimation of semiparametric models. Let’s start with a simple semiparametric model without spatial trends and without spatially lagged terms: \[y_{i}=\sum_{k=1}^K \beta^*_k x^*_{k,it} +\sum_{\delta=1}^\Delta g_\delta(x_{\delta, it}) + \epsilon_{i}\]

\[\epsilon_{i}\sim i.i.d.(0,\sigma^2_\epsilon)\] In particular, we introduce the discrete variables Fireplaces and Garage_Cars as linear terms and the continuous variables lnLot_Area, lnTotal_Bsmt_SF, and lnGr_Liv_Area as smooth terms, using the function pspl() with 20 knots:

formgam <- lnSale_Price ~ Fireplaces + Garage_Cars +
  pspl(lnLot_Area, nknots = 20) + 
  pspl(lnTotal_Bsmt_SF, nknots = 20) +
  pspl(lnGr_Liv_Area, nknots = 20) 
gam <- pspatfit(formgam, data = ames_sf1)
summary(gam)
## 
##  Call 
## pspatfit(formula = formgam, data = ames_sf1)
## 
##  Parametric Terms 
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           11.3491400  0.2539552 44.6895   <2e-16 ***
## Fireplaces             0.0709486  0.0069692 10.1803   <2e-16 ***
## Garage_Cars            0.1578024  0.0063802 24.7332   <2e-16 ***
## pspl(lnLot_Area)      -0.2743330  0.2469516 -1.1109   0.2667    
## pspl(lnTotal_Bsmt_SF) -0.8790673  0.8920559 -0.9854   0.3245    
## pspl(lnGr_Liv_Area)   -0.7531665  0.5573690 -1.3513   0.1767    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Non-Parametric Terms 
##                            EDF
## pspl(lnLot_Area)      11.4625
## pspl(lnTotal_Bsmt_SF)  6.5728
## pspl(lnGr_Liv_Area)   13.0650
## 
##  Goodness-of-Fit 
##  
##  EDF Total: 37.1002 
##  Sigma: 0.203045 
##  AIC:  -5870.54 
##  BIC:  -5650.57

The EDF numbers clearly suggest that the three continuout variables enter the model nonlinearly.

Now, we introduce the spatial lag of the dependent variable, thus specifying a semiparametric SAR model: \[y_{i}=\rho \sum_{j=1}^N w_{ij,N} y_{j} +\sum_{k=1}^K \beta^*_k x^*_{k,i} + \sum_{\delta=1}^\Delta g_\delta(x_{\delta, i}) +\epsilon_{i}\] \[\epsilon_{i}\sim i.i.d.(0,\sigma^2_\epsilon)\]

gamsar <- pspatfit(formgam, data = ames_sf1, listw = lw_ames, method = "Chebyshev", type = "sar")
summary(gamsar)
## 
##  Call 
## pspatfit(formula = formgam, data = ames_sf1, listw = lw_ames, 
##     type = "sar", method = "Chebyshev")
## 
##  Parametric Terms 
##                          Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)            5.9827860  0.0995430 60.1025 < 2.2e-16 ***
## Fireplaces             0.0469770  0.0056778  8.2739 < 2.2e-16 ***
## Garage_Cars            0.0857851  0.0051871 16.5383 < 2.2e-16 ***
## pspl(lnLot_Area)      -0.3164999  0.1661106 -1.9054  0.056838 .  
## pspl(lnTotal_Bsmt_SF) -1.0126845  0.3090136 -3.2772  0.001062 ** 
## pspl(lnGr_Liv_Area)   -0.2336545  0.3900992 -0.5990  0.549248    
## rho                    0.4643711  0.0126061 36.8369 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Non-Parametric Terms 
##                            EDF
## pspl(lnLot_Area)       7.8271
## pspl(lnTotal_Bsmt_SF)  4.6303
## pspl(lnGr_Liv_Area)   12.8040
## 
##  Goodness-of-Fit 
##  
##  EDF Total: 32.2614 
##  Sigma: 0.185572 
##  AIC:  -6916.85 
##  BIC:  -6725.57
anova(linsar, gamsar, lrtest = TRUE)
##        logLik rlogLik    edf     AIC     BIC LRtest      p.val
## linsar 3379.9  3348.4  7.000 -6745.7 -6641.2                  
## gamsar 3490.7  3473.9 32.261 -6916.8 -6692.4 251.06 4.5467e-39

The spatial spillover parameter is now 0.46, a bit lower than the one estimated with the linear SAR (0.50) and SDM (0.53), confirming the trade off between nonlinearities and spatial dependence (Basile et al. 2014). The log-likelihood of the semiparametric SAR is higher than that of the linear SAR, and the LR test also suggests that this difference is statistically significant (notice that the linear SAR model is nested in the semiparametric SAR). Moreover, the AIC value of the semiparametric model is lower than that of the linear SAR, confirming that the goodness of fit of the semiparametric model is higher that that of the linear model. However, the BIC value works in favor of the linear specification. This is because the BIC penalizes more strongly more complex models than the AIC.

Let’s now introduce also a spatial trend 2d (without the ANOVA decomposition) in order to control for unobserved spatial heterogeneity:

\[y_{i}=\rho \sum_{j=1}^N w_{ij,N} y_{j}+ \sum_{k=1}^K \beta^*_k x^*_{k,i} + \sum_{\delta=1}^\Delta g_\delta(x_{\delta, i}) + \widetilde{ f}(s_{1i},s_{2i})+\epsilon_{i}\]

\[\epsilon_{i}\sim i.i.d.(0,\sigma^2_\epsilon)\] To speed up the computational time, we compute the spatial Jacobian using the Chebyshev transformation.

sp2dsar <- pspatfit(form2d, data = ames_sf1, listw = lw_ames, method = "Chebyshev", type = "sar")
summary(sp2dsar)
## 
##  Call 
## pspatfit(formula = form2d, data = ames_sf1, listw = lw_ames, 
##     type = "sar", method = "Chebyshev")
## 
##  Parametric Terms 
##                          Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)            8.1647437  0.1417285 57.6083 < 2.2e-16 ***
## Fireplaces             0.0555330  0.0058079  9.5617 < 2.2e-16 ***
## Garage_Cars            0.0652676  0.0056843 11.4820 < 2.2e-16 ***
## Xspt.2                -1.4681131  2.0524543 -0.7153  0.474488    
## Xspt.3                -1.0131127  2.0552400 -0.4929  0.622094    
## Xspt.4                -1.4228087  2.7716161 -0.5133  0.607749    
## pspl(lnLot_Area)      -0.5300118  0.1643349 -3.2252  0.001274 ** 
## pspl(lnTotal_Bsmt_SF) -1.0325820  0.2218682 -4.6540 3.412e-06 ***
## pspl(lnGr_Liv_Area)   -0.4111994  0.3758808 -1.0940  0.274069    
## rho                    0.2890200  0.0177353 16.2963 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Non-Parametric Terms 
##                            EDF
## pspl(lnLot_Area)       6.4063
## pspl(lnTotal_Bsmt_SF)  3.4493
## pspl(lnGr_Liv_Area)   13.3106
## 
##  Non-Parametric Spatio-Temporal Trend 
##                EDF
## f(sp1, sp2) 41.106
## 
##  Goodness-of-Fit 
##  
##  EDF Total: 74.2717 
##  Sigma: 0.187716 
##  AIC:  -7009.96 
##  BIC:  -6569.6
anova(gamsar, sp2dsar, lrtest=TRUE)
##         logLik rlogLik    edf     AIC     BIC LRtest      p.val
## gamsar  3490.7  3473.9 32.261 -6916.8 -6692.4                  
## sp2dsar 3579.3  3564.9 74.272 -7010.0 -6542.9 182.04 2.3831e-19

The estimated spatial spillover parameter \(\rho\) (0.29) is much lower than the one estimated above, suggesting that the SAR model without spatial trend (both linear and nonlinear) actually captures spatial autocorrelated unobserved heterogeneity.

The marginal (direct, indirect and total) impacts for parametric terms are computed as usual with the function impactspar():

list_varpar <- c("Fireplaces", "Garage_Cars")
imp_parvar <- impactspar(sp2dsar, list_varpar)
summary(imp_parvar)
## 
##  Total Parametric Impacts (sar) 
##               Estimate Std. Error    t value Pr(>|t|)
## Fireplaces   0.0781327  0.0085805  9.1058616        0
## Garage_Cars  0.0914568  0.0084322 10.8461620        0
## 
##  Direct Parametric Impacts (sar) 
##               Estimate Std. Error    t value Pr(>|t|)
## Fireplaces   0.0564928  0.0060112  9.3978779        0
## Garage_Cars  0.0661306  0.0058588 11.2873770        0
## 
##  Indirect Parametric Impacts (sar) 
##              Estimate Std. Error   t value Pr(>|t|)
## Fireplaces  0.0216399  0.0029914 7.2339720        0
## Garage_Cars 0.0253262  0.0031389 8.0684257        0

As for the three non-parametric terms, we can plot the estimated smooth impact functions using the algorithms described in the vignette A:

list_varnopar <- c("lnLot_Area", "lnTotal_Bsmt_SF", 
                   "lnGr_Liv_Area")
sp2dsar_impnopar <- impactsnopar(sp2dsar, listw = lw_ames, viewplot = TRUE,smooth = FALSE)
plot_impactsnopar(sp2dsar_impnopar, data = ames_sf1, smooth = TRUE)

Now, an example with the ANOVA decomposition of the spatial trend (PS-ANOVA):

\[y_{i}=\rho \sum_{j=1}^N w_{ij,N} y_{j}+ \sum_{k=1}^K \beta^*_k x^*_{k,i} + \sum_{\delta=1}^\Delta g_\delta(x_{\delta, i}) + f_1(s_{1i})+f_2(s_{2i})+f_{1,2}(s_{1i},s_{2i})+\epsilon_{i}\]

\[\epsilon_{i}\sim i.i.d.(0,\sigma^2_\epsilon)\] This model is estimated using the option psanova = TRUE within the function pspt() for the spatial trend:

# Semiparametric model with PS-ANOVA spatial trend in 2d

  form2d_psanova <- lnSale_Price ~ Fireplaces + Garage_Cars +
    pspl(lnLot_Area, nknots = 20) + 
    pspl(lnTotal_Bsmt_SF, nknots = 20) +
    pspl(lnGr_Liv_Area, nknots = 20) +
  pspt(Longitude,Latitude, nknots = c(10, 10), psanova = TRUE)
sp2danovasar <- pspatfit(form2d_psanova, data = ames_sf1, listw = lw_ames, 
                         method = "Chebyshev", type = "sar")
summary(sp2danovasar)
## 
##  Call 
## pspatfit(formula = form2d_psanova, data = ames_sf1, listw = lw_ames, 
##     type = "sar", method = "Chebyshev")
## 
##  Parametric Terms 
##                          Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)            8.4390005  0.1678775 50.2688 < 2.2e-16 ***
## Fireplaces             0.0560447  0.0058059  9.6531 < 2.2e-16 ***
## Garage_Cars            0.0648201  0.0056895 11.3929 < 2.2e-16 ***
## f1_main.1             -0.1495530  0.1170157 -1.2781  0.201339    
## f2_main.1             -0.0123747  0.1736585 -0.0713  0.943197    
## f12_int.1             -0.0873963  0.1578479 -0.5537  0.579848    
## pspl(lnLot_Area)      -0.5363254  0.1785046 -3.0045  0.002684 ** 
## pspl(lnTotal_Bsmt_SF) -0.8975952  0.3640286 -2.4657  0.013735 *  
## pspl(lnGr_Liv_Area)   -0.5223081  0.3965219 -1.3172  0.187876    
## rho                    0.2704922  0.0182464 14.8244 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Non-Parametric Terms 
##                            EDF
## pspl(lnLot_Area)       8.7545
## pspl(lnTotal_Bsmt_SF)  4.9656
## pspl(lnGr_Liv_Area)   12.9427
## 
##  Non-Parametric Spatio-Temporal Trend 
##        EDF
## f1   3.228
## f2   6.987
## f12 36.790
## 
##  Goodness-of-Fit 
##  
##  EDF Total: 83.6677 
##  Sigma: 0.186914 
##  AIC:  -6959.98 
##  BIC:  -6463.9
anova(sp2dsar, sp2danovasar, lrtest=FALSE)
##              logLik rlogLik    edf   AIC     BIC
## sp2dsar      3579.3  3564.9 74.272 -7010 -6542.9
## sp2danovasar 3563.7  3542.6 83.668 -6960 -6424.3

Plot of non-parametric direct, indirect and total impacts:

sp2danovasarimpnopar <- impactsnopar(sp2danovasar, listw = lw_ames, viewplot = FALSE)
plot_impactsnopar(sp2danovasarimpnopar, data = ames_sf1, smooth = TRUE)

Parametric direct, indirect and total impacts:

list_varpar <- as.character(names(summary(sp2danovasar)$bfixed[1]))
imp_parvar <- impactspar(sp2danovasar, list_varpar)
summary(imp_parvar)
## 
##  Total Parametric Impacts (sar) 
##               Estimate Std. Error    t value Pr(>|t|)
## Fireplaces   0.0771730  0.0082109  9.3988323        0
## Garage_Cars  0.0888643  0.0079863 11.1270727        0
## 
##  Direct Parametric Impacts (sar) 
##               Estimate Std. Error    t value Pr(>|t|)
## Fireplaces   0.0570234  0.0058666  9.7200313        0
## Garage_Cars  0.0656660  0.0056736 11.5739292        0
## 
##  Indirect Parametric Impacts (sar) 
##              Estimate Std. Error   t value Pr(>|t|)
## Fireplaces  0.0201496  0.0027949 7.2093673        0
## Garage_Cars 0.0231983  0.0029039 7.9886944        0

References

Basile, Roberto, María Durbán, Román Mínguez, Jose María Montero, and Jesús Mur. 2014. “Modeling Regional Economic Dynamics: Spatial Dependence, Spatial Heterogeneity and Nonlinearities.” Journal of Economic Dynamics and Control 48: 229–45. https://doi.org/10.1016/j.jedc.2014.06.011.