geex
In the basic set-up, M-estimation applies to estimators of the p \times 1 parameter \theta = (\theta_1, \theta_2, \ldots, \theta_p)^{\intercal} which can be obtained as solutions to an equation of the form
\begin{equation} \label{eq:psi} \sum_{i = 1}^m \psi(O_i, \theta) = 0, \end{equation}
where O_1, \ldots, O_m are m independent and identically distributed (iid) random variables, and the function \psi returns a vector of length p and does not depend on i or m (Stefanski and Boos 2002, hereafter SB). See SB for the case where the O_i are independent but not necessarily identically distributed. The roots of \eqref{eq:psi} are referred to as M-estimators and denoted by \hat{\theta}. M-estimators can be solved for analytically in some cases or computed numerically in general. Under certain regularity conditions, the asymptotic properties of \hat{\theta} can be derived from Taylor series approximations, the law of large numbers, and the central limit theorem. In brief, let \theta_0 be the true parameter value defined by \int \psi(o, \theta_0) dF(o) = 0, where F is the distribution function of O. Let \dot{\psi}(o, \theta) = \partial \psi(o, \theta)/\partial \theta^{\intercal}, A(\theta_0) = E[-\dot{\psi}(O_1, \theta_0)], and B(\theta_0) = E[\psi(O_1, \theta_0)\psi(O_1, \theta_0)^{\intercal}]. Then under suitable regularity assumptions, \hat{\theta} is consistent and asymptotically Normal, i.e.,
\begin{equation*} \label{eq:variance} \sqrt{m}(\hat{\theta} - \theta_0) \overset{d}{\to} N(0, V(\theta_0)) \text{ as } m \to \infty, \end{equation*}
where V(\theta_0) = A(\theta_0)^{-1} B(\theta_0) \{A(\theta_0)^{-1} \}^{\intercal}. The sandwich form of V(\theta_0) suggests several possible large sample variance estimators. For some problems, the analytic form of V(\theta_0) can be derived and estimators of \theta_0 and other unknowns simply plugged into V(\theta_0). Alternatively, V(\theta_0) can be consistently estimated by the empirical sandwich variance estimator, where the expectations in A(\theta) and B(\theta) are replaced with their empirical counterparts. Let A_i = - \dot{\psi}(O_i, \theta)|_{\theta = \hat{\theta}}, A_m = m^{-1} \sum_{i = 1}^m A_i, B_i = \psi(O_i, \hat{\theta}) \psi(O_i, \hat{\theta})^{\intercal}, and B_m = m^{-1} \sum_{i = 1}^m B_i. The empirical sandwich estimator of the variance of \hat{\theta} is:
\begin{equation} \label{eq:esve} \hat{\Sigma} = A_m^{-1} B_m \{A_m^{-1}\}^{\intercal}/m . \end{equation}
The geex package provides an application programming interface (API) for carrying out M-estimation. The analyst provides a function, called estFUN, corresponding to \psi(O_i, \theta) that maps data O_i to a function of \theta. Numerical derivatives approximate \dot{\psi} so that evaluating \hat{\Sigma} is entirely a computational exercise. No analytic derivations are required from the analyst.
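Conceptually, the computation geex carries out can be sketched as follows. This is a minimal illustration, assuming the numDeriv package for the numerical derivatives; sandwich_sketch and data_list are hypothetical names used for exposition, not part of the geex API.

library(numDeriv)

# Sketch: empirical sandwich variance for a generic estFUN,
# given a list of per-unit data (data_list) and a point estimate theta_hat
sandwich_sketch <- function(estFUN, data_list, theta_hat){
  p <- length(theta_hat)
  A <- B <- matrix(0, p, p)
  for (d in data_list) {
    psi_i <- estFUN(d)                            # psi(O_i, .) as a function of theta
    A <- A - numDeriv::jacobian(psi_i, theta_hat) # accumulate -psi_dot(O_i, theta_hat)
    B <- B + tcrossprod(psi_i(theta_hat))         # accumulate psi psi^T
  }
  m <- length(data_list)
  A <- A/m; B <- B/m
  solve(A) %*% B %*% t(solve(A)) / m              # empirical sandwich estimator
}

Applied to the examples below with data_list = split(geexex, seq_len(nrow(geexex))), this sketch should agree with the geex results up to numerical error.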
Consider estimating the population mean \theta = E[Y_i] using the sample mean \hat{\theta} = m^{-1} \sum_{i=1}^m Y_i of m iid random variables Y_1, \dots, Y_m. The estimator \hat{\theta} can be expressed as the solution to the following estimating equation:
\sum_{i = 1}^m (Y_i - \theta) = 0,
which is equivalent to solving \eqref{eq:psi} where O_i = Y_i and \psi(O_i, \theta) = Y_i - \theta. An estFUN is a translation of \psi into an R function whose first argument is data and which returns a function whose first argument is theta. An estFUN corresponding to the estimating equation for the sample mean of Y is:

meanFUN <- function(data){
  function(theta){
    data$Y - theta
  }
}
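Note the closure structure: meanFUN(dat) returns a function of theta, and summing that function's value over the observations gives the left-hand side of the estimating equation, which is zero at the sample mean. A quick check with a hypothetical toy data frame:

dat <- data.frame(Y = c(2, 4, 9))  # hypothetical toy data
psi <- meanFUN(dat)                # a function of theta
sum(psi(mean(dat$Y)))              # 0: the sample mean solves the estimating equation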
If an estimator fits into the above framework, then the user need only specify estFUN. No other programming is required to obtain point and variance estimates. The remaining sections provide examples of translating \psi into an estFUN.
The three examples of M-estimation from SB are presented here for demonstration. For these examples, the data are O_i = \{Y_{1i}, Y_{2i}\}, where Y_1 \sim N(5, 16) and Y_2 \sim N(2, 1) for m = 100, and are included in the geexex dataset.
The first example estimates the population mean (\theta_1) and variance (\theta_2) of Y_1. The solutions to the estimating equations below are the sample mean \hat{\theta}_1 = m^{-1} \sum_{i = 1}^m Y_{1i} and the sample variance \hat{\theta}_2 = m^{-1} \sum_{i = 1}^m (Y_{1i} - \hat{\theta}_1)^2.
\psi(Y_{1i}, \theta) = \begin{pmatrix} Y_{1i} - \theta_1 \\ (Y_{1i} - \theta_1)^2 - \theta_2 \end{pmatrix}
SB1_estfun <- function(data){
  Y1 <- data$Y1
  function(theta){
    c(Y1 - theta[1],
      (Y1 - theta[1])^2 - theta[2])
  }
}
The primary geex function is m_estimate, which requires two inputs: estFUN (the \psi function) and data (the data frame containing O_i for i = 1, \dots, m). The package defaults to rootSolve::multiroot for estimating roots of \eqref{eq:psi}, though the solver algorithm can be specified in the root_control argument. Starting values for rootSolve::multiroot are passed via the root_control argument; e.g.,
library(geex)
results <- m_estimate(
  estFUN = SB1_estfun,
  data = geexex,
  root_control = setup_root_control(start = c(1, 1)))
n <- nrow(geexex)
A <- diag(1, nrow = 2)

B <- with(geexex, {
  Ybar <- mean(Y1)
  B11 <- mean( (Y1 - Ybar)^2 )
  B12 <- mean( (Y1 - Ybar) * ((Y1 - Ybar)^2 - B11) )
  B22 <- mean( ((Y1 - Ybar)^2 - B11)^2 )
  matrix(
    c(B11, B12,
      B12, B22),
    nrow = 2
  )
})

# closed form roots
theta_cls <- c(mean(geexex$Y1),
               # since var() divides by n - 1, not n
               var(geexex$Y1) * (n - 1)/n)
# closed form sigma
Sigma_cls <- (solve(A) %*% B %*% t(solve(A))) / n

comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)),
                   cls = list(estimates = theta_cls, vcov = Sigma_cls))
The m_estimate function returns an object of the S4 class geex, which contains an estimates slot and a vcov slot for \hat{\theta} and \hat{\Sigma}, respectively. These slots can be accessed by the functions coef (or roots) and vcov. The point estimates obtained for \theta_1 and \theta_2 are analogous to the base R functions mean and var (after multiplying by (m-1)/m for the latter). SB gave a closed form for A(\theta_0) (an identity matrix) and B(\theta_0) (not shown here) and suggest plugging in sample moments to compute B(\hat{\theta}). The point and variance estimates using both geex and the analytic solutions are shown below. The maximum absolute difference between either the point or variance estimates is 4e-11, demonstrating excellent agreement between the numerical results obtained from geex and the closed form solutions for this set of estimating equations and data.
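comparison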
## $geex
## $geex$estimates
## [1] 5.044563 10.041239
##
## $geex$vcov
## [,1] [,2]
## [1,] 0.10041239 0.03667969
## [2,] 0.03667969 2.49219638
##
##
## $cls
## $cls$estimates
## [1] 5.044563 10.041239
##
## $cls$vcov
## [,1] [,2]
## [1,] 0.10041239 0.03667969
## [2,] 0.03667969 2.49219638
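The agreement figure quoted above can be reproduced along these lines, e.g.:

with(comparison, max(abs(geex$estimates - cls$estimates),
                     abs(geex$vcov - cls$vcov)))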
This example calculates a ratio estimator and illustrates the delta method via M-estimation. The estimating equations target the means of Y_1 and Y_2, labelled \theta_1 and \theta_2, as well as the estimand \theta_3 = \theta_1/ \theta_2.
\psi(Y_{1i}, Y_{2i}, \theta) = \begin{pmatrix} Y_{1i} - \theta_1 \\ Y_{2i} - \theta_2 \\ \theta_1 - \theta_3\theta_2 \end{pmatrix}
The solution to \eqref{eq:psi} for this \psi function yields \hat{\theta}_3 = \bar{Y}_1 / \bar{Y}_2, where \bar{Y}_j denotes the sample mean of Y_{j1}, \ldots,Y_{jm} for j=1,2.
SB2_estfun <- function(data){
  Y1 <- data$Y1; Y2 <- data$Y2
  function(theta){
    c(Y1 - theta[1],
      Y2 - theta[2],
      theta[1] - (theta[3] * theta[2])
    )
  }
}
results <- m_estimate(
  estFUN = SB2_estfun,
  data = geexex,
  root_control = setup_root_control(start = c(1, 1, 1)))
# Comparison to an analytically derived sandwich estimator
A <- with(geexex, {
  matrix(
    c( 1, 0, 0,
       0, 1, 0,
      -1, mean(Y1)/mean(Y2), mean(Y2)),
    byrow = TRUE, nrow = 3)
})
B <- with(geexex, {
  matrix(
    c(var(Y1),     cov(Y1, Y2), 0,
      cov(Y1, Y2), var(Y2),     0,
      0,           0,           0),
    byrow = TRUE, nrow = 3)
})

## closed form roots
theta_cls <- c(mean(geexex$Y1), mean(geexex$Y2))
theta_cls[3] <- theta_cls[1]/theta_cls[2]
## closed form covariance
Sigma_cls <- (solve(A) %*% B %*% t(solve(A))) / nrow(geexex)

comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)),
                   cls = list(estimates = theta_cls, vcov = Sigma_cls))
SB gave closed form expressions for A(\theta_0) and B(\theta_0), into which we plug appropriate estimates of the matrix components and compare to the results from geex. The point estimates again show excellent agreement (maximum absolute difference 4.4e-16), while the covariance estimates differ in the third decimal place (maximum absolute difference 1e-03).
comparison
## $geex
## $geex$estimates
## [1] 5.044563 2.012793 2.506250
##
## $geex$vcov
## [,1] [,2] [,3]
## [1,] 0.100412389 -0.008718712 0.06074328
## [2,] -0.008718712 0.010693092 -0.01764626
## [3,] 0.060743283 -0.017646263 0.05215103
##
##
## $cls
## $cls$estimates
## [1] 5.044563 2.012793 2.506250
##
## $cls$vcov
## [,1] [,2] [,3]
## [1,] 0.10142666 -0.00880678 0.06135685
## [2,] -0.00880678 0.01080110 -0.01782451
## [3,] 0.06135685 -0.01782451 0.05267781
This example extends Example 1 to again illustrate the delta method. The estimating equations target not only the mean (\theta_1) and variance (\theta_2) of Y_1, but also the standard deviation (\theta_3) and the log of the variance (\theta_4) of Y_1.
\psi(Y_{1i}, \theta) = \begin{pmatrix} Y_{1i} - \theta_1 \\ (Y_{1i} - \theta_1)^2 - \theta_2 \\ \sqrt{\theta_2} - \theta_3 \\ \log(\theta_2) - \theta_4 \end{pmatrix}
SB3_estfun <- function(data){
  Y1 <- data$Y1
  function(theta){
    c(Y1 - theta[1],
      (Y1 - theta[1])^2 - theta[2],
      sqrt(theta[2]) - theta[3],
      log(theta[2]) - theta[4])
  }
}
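As in the previous examples, point and variance estimates are obtained with m_estimate; the starting values below are illustrative.

results <- m_estimate(
  estFUN = SB3_estfun,
  data = geexex,
  root_control = setup_root_control(start = c(1, 1, 1, 1)))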
## closed form roots
theta_cls <- numeric(4)
theta_cls[1] <- mean(geexex$Y1)
theta_cls[2] <- sum((geexex$Y1 - theta_cls[1])^2)/nrow(geexex)
theta_cls[3] <- sqrt(theta_cls[2])
theta_cls[4] <- log(theta_cls[2])

## Compare to closed form ##
theta2 <- theta_cls[2]
mu3 <- moments::moment(geexex$Y1, order = 3, central = TRUE)
mu4 <- moments::moment(geexex$Y1, order = 4, central = TRUE)
## closed form covariance
Sigma_cls <- matrix(
  c(theta2,               mu3,                               mu3/(2*sqrt(theta2)),              mu3/theta2,
    mu3,                  mu4 - theta2^2,                    (mu4 - theta2^2)/(2*sqrt(theta2)), (mu4 - theta2^2)/theta2,
    mu3/(2*sqrt(theta2)), (mu4 - theta2^2)/(2*sqrt(theta2)), (mu4 - theta2^2)/(4*theta2),       (mu4 - theta2^2)/(2*theta2^(3/2)),
    mu3/theta2,           (mu4 - theta2^2)/theta2,           (mu4 - theta2^2)/(2*theta2^(3/2)), (mu4/theta2^2) - 1),
  nrow = 4, byrow = TRUE) / nrow(geexex)

comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)),
                   cls = list(estimates = theta_cls, vcov = Sigma_cls))
SB again provided a closed form for A(\theta_0) and B(\theta_0), which we compare to the geex results. The maximum absolute difference between geex and the closed form estimates for both the parameters and the covariance is 3.8e-11.
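comparison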
## $geex
## $geex$estimates
## [1] 5.044563 10.041239 3.168791 2.306700
##
## $geex$vcov
## [,1] [,2] [,3] [,4]
## [1,] 0.100412389 0.03667969 0.005787646 0.003652905
## [2,] 0.036679687 2.49219638 0.393240840 0.248196105
## [3,] 0.005787646 0.39324084 0.062049026 0.039162582
## [4,] 0.003652905 0.24819611 0.039162582 0.024717678
##
##
## $cls
## $cls$estimates
## [1] 5.044563 10.041239 3.168791 2.306700
##
## $cls$vcov
## [,1] [,2] [,3] [,4]
## [1,] 0.100412389 0.03667969 0.005787646 0.003652905
## [2,] 0.036679687 2.49219638 0.393240840 0.248196105
## [3,] 0.005787646 0.39324084 0.062049026 0.039162582
## [4,] 0.003652905 0.24819611 0.039162582 0.024717678