The importance of using cluster-robust variance estimators (i.e., “clustered standard errors”) in panel models is now widely recognized. Less widely recognized is the fact that standard methods for constructing hypothesis tests and confidence intervals based on CRVE can perform quite poorly in when based on a limited number of independent clusters. Furthermore, it can be difficult to determine what counts as a large-enough sample to trust standard CRVE methods, because the finite-sample behavior of the variance estimators and test statistics depends on the configuration of the covariates, not just the total number of clusters.
One solution to this problem is to use bias-reduced linearization (BRL), which was proposed by Bell and McCaffrey (2002) and has recently begun to receive attention in the econometrics literature (e.g., Cameron & Miller, 2015; Imbens & Kolesar, 2015). The idea of BRL is to correct the bias of standard CRVE based on a working model, and then to use a degrees-of-freedom correction for Wald tests based on the bias-reduced CRVE. That may seem silly (after all, the whole point of CRVE is to avoid making distributional assumptions about the errors in your model), but it turns out that the correction can help quite a bit, even when the working model is wrong. The degrees-of-freedom correction is based on a standard Satterthwaite-type approximation, and also relies on the working model.
A problem with Bell and McCaffrey’s original formulation of BRL is
that it does not work in some very common models for panel data, such as
state-by-year panels that include fixed effects for each state and each
year (Angrist and Pischke, 2009, point out this issue in their chapter
on “non-standard standard error issues”; see also Young, 2016). However,
Pustejovsky and Tipton (2016) proposed a generalization of BRL that
works even in models with arbitrary sets of fixed effects, and this
generalization is implemented in clubSandwich as CRVE type
CR2. The package also implements small-sample corrections
for multiple-constraint hypothesis tests based on an approximation
proposed by Pustejovsky and Tipton (2016). For one-parameter
constraints, the test reduces to a t-test with Satterthwaite degrees of
freedom, and so it is a natural extension of BRL.
The following example demonstrates how to use
clubSandwich to do cluster-robust inference for a
state-by-year panel model with fixed effects in both dimensions,
clustering by states.
Carpenter and Dobkin (2011) analyzed the effects of changes in the minimum legal drinking age on rates of motor vehicle fatalities among 18-20 year olds, using state-level panel data from the National Highway Traffic Administration’s Fatal Accident Reporting System. In their new textbook, Angrist and Pischke (2014) developed a stylized example based on Carpenter and Dobkin’s work. The following example uses Angrist and Pischke’s data and follows their analysis because their data are easily available.
The outcome is the incidence of deaths in motor vehicle crashes among 18-20 year-olds (per 100,000 residents), for each state plus the District of Columbia, over the period 1970 to 1983. There were several changes in the minimum legal drinking age during this time period, with variability in the timing of changes across states. Angrist and Pischke (following Carpenter and Dobkin) use a difference-in-differences strategy to estimate the effects of lowering the minimum legal drinking age from 21 to 18. Their specification is
\[y_{it} = \alpha_i + \beta_t + \gamma b_{it} + \delta d_{it} + \epsilon_{it},\]
for \(i\) = 1,…,51 and \(t\) = 1970,…,1983. In this model, \(\alpha_i\) is a state-specific fixed effect, \(\beta_t\) is a year-specific fixed effect, \(b_{it}\) is the current rate of beer taxation in state \(i\) in year \(t\), \(d_{it}\) is the proportion of 18-20 year-olds in state \(i\) in year \(t\) who are legally allowed to drink, and \(\delta\) captures the effect of shifting the minimum legal drinking age from 21 to 18. Following Angrist and Pischke’s analysis, we estimate this model both by (unweighted) OLS and by weighted least squares with weights corresponding to population size in a given state and year. We also demonstrate random effects estimation and implement a cluster-robust Hausman specification test.
The following code does some simple data-munging and the estimates the model by OLS:
library(clubSandwich)
data(MortalityRates)
# subset for deaths in motor vehicle accidents, 1970-1983
MV_deaths <- subset(MortalityRates, cause=="Motor Vehicle" & 
                      year <= 1983 & !is.na(beertaxa))
# fit by OLS
lm_unweighted <- lm(mrate ~ 0 + legal + beertaxa + 
                      factor(state) + factor(year), data = MV_deaths)The coef_test function from clubSandwich
can then be used to test the hypothesis that changing the minimum legal
drinking age has no effect on motor vehicle deaths in this cohort (i.e.,
\(H_0: \delta = 0\)). The usual way to
test this is to cluster the standard errors by state, calculate the
robust Wald statistic, and compare that to a standard normal reference
distribution. The code and results are as follows:
coef_test(lm_unweighted, vcov = "CR1", 
          cluster = MV_deaths$state, test = "naive-t")[1:2,]##     Coef. Estimate   SE t-stat d.f. (naive-t) p-val (naive-t) Sig.
##     legal     7.59 2.44  3.108             49         0.00313   **
##  beertaxa     3.82 5.14  0.743             49         0.46128A better approach would be to use the generalized, bias-reduced
linearization CRVE, together with Satterthwaite degrees of freedom. In
the clubSandwich package, the BRL adjustment is called
“CR2” because it is directly analogous to the HC2 correction used in
heteroskedasticity-robust variance estimation. When applied to an OLS
model estimated by lm, the default working model is an
identity matrix, which amounts to the “working” assumption that the
errors are all uncorrelated and homoskedastic. Here’s how to apply this
approach in the example:
coef_test(lm_unweighted, vcov = "CR2", 
          cluster = MV_deaths$state, test = "Satterthwaite")[1:2,]##     Coef. Estimate   SE t-stat d.f. (Satt) p-val (Satt) Sig.
##     legal     7.59 2.51  3.019       24.58      0.00583   **
##  beertaxa     3.82 5.27  0.725        5.77      0.49663The Satterthwaite degrees of freedom are different for each
coefficient in the model, and so the coef_test function
reports them right alongside the standard error. For the effect of legal
drinking age, the degrees of freedom are about half of what might be
expected, given that there are 51 clusters. The p-value for the
CR2+Satterthwaite test is about twice as large as the p-value based on
the standard Wald test, although the coefficient is still statistically
significant at conventional levels. Note, however, that the degrees of
freedom on the beer taxation rate are considerably smaller because there
are only a few states with substantial variability in taxation rates
over time.
The plm package in R provides another way to estimate
the same model. It is convenient because it absorbs the state and year
fixed effects before estimating the effect of legal. The
clubSandwich package works with fitted plm
models too:
library(plm)
plm_unweighted <- plm(mrate ~ legal + beertaxa, data = MV_deaths, 
                      effect = "twoways", index = c("state","year"))
coef_test(plm_unweighted, vcov = "CR1", cluster = "individual", test = "naive-t")##     Coef. Estimate   SE t-stat d.f. (naive-t) p-val (naive-t) Sig.
##     legal     7.59 2.44  3.108             49         0.00313   **
##  beertaxa     3.82 5.14  0.743             49         0.46128coef_test(plm_unweighted, vcov = "CR2", cluster = "individual", test = "Satterthwaite")##     Coef. Estimate   SE t-stat d.f. (Satt) p-val (Satt) Sig.
##     legal     7.59 2.51  3.019       24.58      0.00583   **
##  beertaxa     3.82 5.27  0.725        5.77      0.49663The difference between the standard method and the new method are not
terribly exciting in the above example. However, things change quite a
bit if the model is estimated using population weights. We go back to
fitting in lm with dummies for all the fixed effects
because plm does not handle weighted least squares.
lm_weighted <- lm(mrate ~ 0 + legal + beertaxa + factor(state) + factor(year), 
                  weights = pop, data = MV_deaths)
coef_test(lm_weighted, vcov = "CR1", 
          cluster = MV_deaths$state, test = "naive-t")[1:2,]##     Coef. Estimate   SE t-stat d.f. (naive-t) p-val (naive-t) Sig.
##     legal     7.78 2.01   3.87             49          <0.001  ***
##  beertaxa    11.16 4.20   2.66             49          0.0106    *coef_test(lm_weighted, vcov = "CR2", 
          cluster = MV_deaths$state, test = "Satterthwaite")[1:2,]##     Coef. Estimate   SE t-stat d.f. (Satt) p-val (Satt) Sig.
##     legal     7.78 2.13   3.64        8.52      0.00588   **
##  beertaxa    11.16 4.37   2.55        6.85      0.03854    *Using population weights slightly reduces the point estimate of the effect, while also slightly increasing its precision. If you were following the standard approach, you would probably be happy with the weighted estimates and wouldn’t think about it any further. However, using the CR2 variance estimator and Satterthwaite correction produces a p-value that is an order of magnitude larger (though still significant at the conventional 5% level). The degrees of freedom are just 8.5—drastically smaller than would be expected based on the number of clusters.
Even with weights, the coef_test function uses an
“independent, homoskedastic” working model as a default for
lm objects. In the present example, the outcome is a
standardized rate and so a better assumption might be that the error
variances are inversely proportional to population size. The following
code uses this alternate working model:
coef_test(lm_weighted, vcov = "CR2", 
          cluster = MV_deaths$state, target = 1 / MV_deaths$pop, 
          test = "Satterthwaite")[1:2,]##     Coef. Estimate   SE t-stat d.f. (Satt) p-val (Satt) Sig.
##     legal     7.78 2.03   3.83       12.64      0.00221   **
##  beertaxa    11.16 4.17   2.68        5.06      0.04333    *The new working model leads to slightly smaller standard errors and a couple of additional degrees of freedom, though they remain in small-sample territory.
If the unobserved effects \(\alpha_1,...,\alpha_{51}\) are uncorrelated with the regressors, then a more efficient way to estimate \(\gamma,\delta\) is by weighted least squares, with weights based on a random effects model. We still treat the year effects as fixed.
plm_random <- plm(mrate ~ 0 + legal + beertaxa + year, data = MV_deaths, 
                  effect = "individual", index = c("state","year"),
                  model = "random")
coef_test(plm_random, vcov = "CR1", test = "naive-t")[1:2,]##     Coef. Estimate   SE t-stat d.f. (naive-t) p-val (naive-t) Sig.
##     legal     7.31 2.39  3.054             49         0.00364   **
##  beertaxa     3.37 5.11  0.661             49         0.51202coef_test(plm_random, vcov = "CR2", test = "Satterthwaite")[1:2,]##     Coef. Estimate   SE t-stat d.f. (Satt) p-val (Satt) Sig.
##     legal     7.31 2.46  2.966       25.18      0.00652   **
##  beertaxa     3.37 5.22  0.647        5.78      0.54258With random effects estimation, the effect of legal drinking age is
smaller by about 1 death per 100,000. As a procedural aside, note that
coef_test infers that state is the clustering
variable because the call to plm includes only one type of effects
(random state effects).
CRVE is also used in specification tests, as in the artificial Hausman-type test for endogeneity of unobserved effects (Arellano, 1993). As noted above, random effects estimation is more efficient than fixed effects estimation, but requires the assumption that the unobserved effects are uncorrelated with the regressors. However, if the unobserved effects covary with \(\mathbf{b}_i, \mathbf{d}_i\), then the random-effects estimator will be biased.
We can test for whether endogeneity is a problem by including group-centered covariates as additional regressors. Let \(\tilde{d}_{it} = d_{it} - \frac{1}{T}\sum_t d_{it}\), with \(\tilde{b}_{it}\) defined analogously. Now estimate the regression
\[y_{it} = \beta_t + \gamma_1 b_{it} + \gamma_2 \tilde{b}_{it} + \delta_1 d_{it} + \delta_2 \tilde{d}_{it} + \epsilon_{it},\]
which does not include state fixed effects. The parameters \(\gamma_2,\delta_2\) represent the differences between the within-groups and between-groups estimands of \(\gamma_1, \delta_1\). If these are both zero, then the random effects estimator is unbiased. Thus, the joint test for \(H_0: \gamma_2 = \delta_2 = 0\) amounts to a test for exogeneity of the unobserved effects.
For efficiency, we estimate this specification using weighted least squares (although OLS would be valid too):
MV_deaths <- within(MV_deaths, {
  legal_cent <- legal - tapply(legal, state, mean)[factor(state)]
  beer_cent <- beertaxa - tapply(beertaxa, state, mean)[factor(state)]
})
plm_Hausman <- plm(mrate ~ 0 + legal + beertaxa + legal_cent + beer_cent + factor(year), 
                   data = MV_deaths,
                   effect = "individual", index = c("state","year"),
                   model = "random")
coef_test(plm_Hausman, vcov = "CR2", test = "Satterthwaite")[1:4,]##       Coef. Estimate   SE  t-stat d.f. (Satt) p-val (Satt) Sig.
##       legal   -9.180 7.62 -1.2042       24.94       0.2398     
##    beertaxa    3.395 9.40  0.3613        6.44       0.7295     
##  legal_cent   16.768 8.53  1.9665       25.44       0.0602    .
##   beer_cent    0.424 9.25  0.0458        6.42       0.9648To conduct a joint test on the centered covariates, we can use the
Wald_test function. The usual way to test this hypothesis
would be to use the CR1 variance estimator to calculate the
robust Wald statistic, then use a \(\chi^2_2\) reference distribution (or
equivalently, compare a re-scaled Wald statistic to an \(F(2,\infty)\) distribution). The
Wald_test function reports the latter version:
Wald_test(plm_Hausman, 
          constraints = constrain_zero(c("legal_cent","beer_cent")), 
          vcov = "CR1", test = "chi-sq")##    test Fstat df_num df_denom  p_val sig
##  chi-sq  2.93      2      Inf 0.0534   .The test is just shy of significance at the 5% level. If we instead
use the CR2 variance estimator and our newly proposed
approximate F-test (which is the default in Wald_test),
then we get:
Wald_test(plm_Hausman, 
          constraints = constrain_zero(c("legal_cent","beer_cent")), 
          vcov = "CR2")##  test Fstat df_num df_denom p_val sig
##   HTZ  2.56      2     11.7  0.12The low degrees of freedom of the test indicate that we’re definitely in small-sample territory and should not trust the asymptotic \(\chi^2\) approximation.
Angrist, J. D., & Pischke, J. (2009). Mostly harmless econometrics: An empiricist’s companion. Princeton, NJ: Princeton University Press.
Angrist, J. D., and Pischke, J. S. (2014). Mastering’metrics: the path from cause to effect. Princeton, NJ: Princeton University Press.
Arellano, M. (1993). On the testing of correlated effects with panel data. Journal of Econometrics, 59(1-2), 87-97. doi: 10.1016/0304-4076(93)90040-C
Bell, R. M., & McCaffrey, D. F. (2002). Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology, 28(2), 169-181.
Cameron, A. C., & Miller, D. L. (2015). A practitioner’s guide to cluster-robust inference. URL: http://cameron.econ.ucdavis.edu/research/Cameron_Miller_JHR_2015_February.pdf
Carpenter, C., & Dobkin, C. (2011). The minimum legal drinking age and public health. Journal of Economic Perspectives, 25(2), 133-156. doi: 10.1257/jep.25.2.133
Imbens, G. W., & Kolesar, M. (2015). Robust standard errors in small samples: Some practical advice. URL: https://doi.org/10.1162/REST_a_00552
Pustejovsky, J. E. & Tipton, E. (2016). Small sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. arXiv: 1601.01981 [stat.ME]
Young, A. (2016). Improved, nearly exact, statistical inference with robust and clustered covariance matrices using effective degrees of freedom corrections.