\documentclass{article}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage[english]{babel}
\newcommand{\code}[1]{{\tt #1}}
\usepackage{url}
\newcommand{\likA}{\code{likelihoodAsy}~}
\addtolength{\textwidth}{1.1in}
\addtolength{\oddsidemargin}{-.5in}
\setlength{\evensidemargin}{\oddsidemargin}
%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{Likelihood Asymptotics in R}
%\VignetteDepends{likelihoodAsy}
<<>>=
library(knitr)
opts_chunk$set(comment="", message=FALSE, warning=FALSE,
               tidy.opts=list(keep.blank.line=TRUE, width.cutoff=180),
               fig.width=8, fig.height=6, out.width='1\\textwidth',
               options(width=180), fig.show='hold', fig.align='center', cache=TRUE)
@
\title{Practical likelihood asymptotics with the \likA package}
\author{Ruggero Bellio, Donald A. Pierce}
\date{}
\begin{document}
\maketitle

\section{Introduction}

The \likA package is designed to simplify the application of some tools for likelihood asymptotics, namely the $r^*$ statistic and the modified profile likelihood. In order to use the methods, the main requirement for the user is the definition of two functions. The first function should return the log likelihood at a given parameter value, whereas the second function should generate a data set under the assumed parametric statistical model. The user may also supply a function that evaluates the gradient of the log likelihood. Providing the gradient is not compulsory, but it may lead to safer computation in some models and, at times, to substantial savings in computing time.

Both the log likelihood function and the data-generating function should have just two arguments. The first argument of either function, \code{theta}, is a numeric vector containing the value of the model parameter. The other argument, \code{data}, is a list that should contain all the data for the model at hand, starting from the values of the response variable and including covariate values (if any). Any additional information required to evaluate the log likelihood, such as quadrature nodes and weights, should also be included in the \code{data} list.

In the following, we first illustrate the usage of the functions for the $r^*$ formula, and then those for the modified profile likelihood. All the examples are taken from Pierce and Bellio (2017), with the exception of Example 6, which can be found in the help files of the package.

\section{Functions for the $r^*$ formula}

Before starting with the examples, we load the package.
<<>>=
library(likelihoodAsy)
@
This also loads several dependencies.

\subsection{Example 1: Inference on the survival function in Weibull regression}

The data used in this example are taken from Feigl and Zelen (1965), and they are included in the \code{MASS} package in the \code{leuk} data frame.
<<>>=
library(MASS)
data(leuk)
@
The function returning the log likelihood at a given parameter point for Weibull regression is given by
<<>>=
loglik.Wbl <- function(theta, data) {
  logy <- log(data$y)
  X <- data$X
  loggam <- theta[1]
  beta <- theta[-1]
  gam <- exp(loggam)
  H <- exp(gam * logy + X %*% beta)
  out <- sum(X %*% beta + loggam + (gam - 1) * logy - H)
  return(out)
}
@
This instance of the user-provided function assumes that the \code{data} list includes the components \code{y} and \code{X}: the former contains the survival times, the latter the design matrix. Here we focus on the data subset corresponding to \code{ag="present"}, and take as regressor the $\log_{10}$ white blood count of each patient in the subset considered.
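As an optional check on the parameterization (an illustration added here, not part of the original analysis), the value returned by \code{loglik.Wbl} can be compared with a computation based on the built-in Weibull density: the model corresponds to a Weibull distribution with shape $\gamma = \exp(\theta_1)$ and, for a subject with linear predictor $x^{T}\beta$, scale $\exp(-x^{T}\beta/\gamma)$, so the two computations should agree. A minimal sketch with a hypothetical parameter value and simulated inputs:
<<eval=FALSE>>=
theta0 <- c(0.2, 1, -0.5)                            # hypothetical parameter value
dat0 <- list(X = cbind(1, rnorm(10)), y = rexp(10))  # small simulated data set
gam0 <- exp(theta0[1])
scale0 <- exp(-drop(dat0$X %*% theta0[-1]) / gam0)
c(loglik.Wbl(theta0, dat0),
  sum(dweibull(dat0$y, shape = gam0, scale = scale0, log = TRUE)))
@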
We then define the required list
<<>>=
X <- model.matrix(~log(wbc, base=10), data=leuk[leuk$ag=="present",])
data.fz <- list(X = X, y = leuk$time[leuk$ag=="present"])
@
Since use of such a data object is central to the package, we offer some further comments. The organization of the data list is only required to be compatible with the user-provided functions for the likelihood and for generating a dataset. For those unaccustomed to using \code{R}, we note that reading a flat data file with \code{read.table()}, or more simply with \code{read.csv()}, results in a data frame that can be used as above. With \code{read.csv()} it suffices either to have variable names in a header line, or to use the option \code{header = FALSE}, which results in variables named \code{V1, V2,...}.

We proceed to define a function for generating a data set from the Weibull regression model. The function returns a copy of the \code{data} argument with \code{y} replaced by a simulated vector.
<<>>=
gendat.Wbl <- function(theta, data) {
  X <- data$X
  n <- nrow(X)
  beta <- theta[-1]
  gam <- exp(theta[1])
  data$y <- (rexp(n) / exp(X %*% beta)) ^ (1 / gam)
  return(data)
}
@
The last function defines the scalar function of inferential interest, here the log survival function evaluated at the response value 130 and at a covariate value of 4.
<<>>=
psifcn.Wbl <- function(theta) {
  beta <- theta[-1]
  gam <- exp(theta[1])
  y0 <- 130
  x0 <- 4
  psi <- -(y0 ^ gam) * exp(beta[1] + x0 * beta[2])
  return(psi)
}
@
Now everything is in place for computing the $r^*$ statistic for the hypothesis $\psi=\log(0.03)$, which is approximately a 95\% first-order lower confidence limit. We set the random seed to a given value, here equal to 10, in order to get reproducible results.
<<>>=
rs <- rstar(data=data.fz, thetainit = c(0, 0, 0), floglik = loglik.Wbl, fpsi = psifcn.Wbl,
            psival = log(0.03), datagen = gendat.Wbl, trace=FALSE, seed=10,
            psidesc="Log survival function")
rs
@
A more detailed set of results is displayed by the \code{summary} method
<<>>=
summary(rs)
@
Providing code for the gradient of the log likelihood may lead to some gain in computing time. This can be done by defining another function, with the same arguments as \code{loglik.Wbl}
<<>>=
grad.Wbl <- function(theta, data) {
  logy <- log(data$y)
  X <- data$X
  loggam <- theta[1]
  beta <- theta[-1]
  gam <- exp(loggam)
  H <- exp(gam * logy + X %*% beta)
  score.beta <- t(X) %*% (1 - H)
  score.nu <- sum(1 + gam * logy - gam * H * logy)
  out <- c(score.nu, score.beta)
  return(out)
}
@
This can be checked against a numerical result by using the \code{grad} function of the \code{pracma} package (Borchers, 2015), which is included in the package dependencies.
<<>>=
cbind(pracma::grad(loglik.Wbl, rs$theta.hyp, data=data.fz), grad.Wbl(rs$theta.hyp, data.fz))
@
The computation of confidence intervals based on the $r^*$ statistic can be done by calling the \code{rstar.ci} function.
<<>>=
rs.int <- rstar.ci(data=data.fz, thetainit = c(0, 0, 0), floglik = loglik.Wbl, fpsi = psifcn.Wbl,
                   fscore=grad.Wbl, datagen=gendat.Wbl, trace=FALSE, seed=1223,
                   psidesc="Log survival function")
rs.int
@
There are both \code{summary} and \code{plot} methods for the output.
<<>>=
summary(rs.int)
@
<<>>=
plot(rs.int)
@
The resulting plot displays the behavior of $r(\psi)$ and $r^*(\psi)$ as functions of the parameter of interest $\psi$, defined here by the \code{psifcn.Wbl} function.
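Before moving on, the user-provided functions can also be cross-checked against one another: since $\psi$ is the log survival probability at $y_0=130$ for a covariate value of 4, \code{exp(psifcn.Wbl(theta))} should be close to the corresponding proportion in a large sample generated by \code{gendat.Wbl}. A minimal sketch (added here as an illustration, with a hypothetical parameter value):
<<eval=FALSE>>=
theta0 <- c(0, -1, -1)                                # hypothetical parameter value
simdata <- gendat.Wbl(theta0, list(X = cbind(1, rep(4, 1e5)), y = NULL))
c(mean(simdata$y > 130), exp(psifcn.Wbl(theta0)))     # the two values should be close
@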
Note that confidence intervals based on the signed likelihood ratio statistic, whether computed with the first-order or with the second-order asymptotic formula, are invariant under interest-preserving reparameterizations. For example, the 95\% confidence interval based on the $r^*$ formula for the survival function (rather than the log survival) at the same point is simply given by
<<>>=
print(exp(rs.int$CIrs[2,]), digits=3)
@
As a final variation, we consider the case where some observations might be censored. This is readily handled by modifying the log likelihood function (and, possibly, the gradient) and the function for generating data sets. The change required in the former case is simple, as we need to introduce a censoring indicator in the data object and make a minor change to the log likelihood computation, namely
<<>>=
loglik.Wbl.cens <- function(theta, data) {
  logy <- log(data$y)
  X <- data$X
  f <- data$f   ### binary censoring indicator: 0=censored, 1=observed
  loggam <- theta[1]
  beta <- theta[-1]
  gam <- exp(loggam)
  H <- exp(gam * logy + X %*% beta)
  out <- sum(f * (X %*% beta + loggam + (gam - 1) * logy) - H)
  return(out)
}
@
The change required for the data-generating function is more delicate, as we need to specify a censoring model. Here we assume Type II censoring, with the five largest failure times censored at the immediately preceding failure time. This is carried out by the following function
<<>>=
gendat.Wbl.cens <- function(theta, data) {
  X <- data$X
  n <- nrow(X)
  beta <- theta[-1]
  gam <- exp(theta[1])
  y <- (rexp(n) / exp(X %*% beta)) ^ (1 / gam)
  maxv <- n - 5   ### the five largest observations are censored
  ymaxv <- sort(y)[maxv]
  data$y <- ifelse(y < ymaxv, y, ymaxv)
  data$f <- ifelse(y < ymaxv, 1, 0)
  return(data)
}
@
To run the example, we also need to modify the data list:
<<>>=
data.fz.cens <- list(X = X, y = leuk$time[leuk$ag=="present"], f=rep(1,nrow(X)))
data.fz.cens$y <- ifelse(data.fz$y < sort(data.fz$y)[12], data.fz$y, sort(data.fz$y)[12])
data.fz.cens$f <- ifelse(data.fz$y < sort(data.fz$y)[12], 1, 0)
@
Finally, we compute the confidence intervals for the same parameter of interest considered without censoring. We note in passing that here the log likelihood function is harder to maximize under the null hypothesis, and for this task we employ the constrained optimizer made available by the \code{alabama} package. This is invoked by setting the argument \code{constr.opt} to \code{"alabama"}.
<<>>=
rs.int.cens <- rstar.ci(data=data.fz.cens, thetainit = c(0, 0, 0), floglik = loglik.Wbl.cens,
                        fpsi = psifcn.Wbl, datagen=gendat.Wbl.cens, constr.opt="alabama",
                        trace=FALSE, seed=1223, psidesc="Log survival function")
summary(rs.int.cens)
@

\subsection{Example 2: Autoregressive model of order 1}

For this example, the required functions are given as follows. As no covariates are involved, they are relatively simple to code. The log likelihood function is given by
<<>>=
likAR1 <- function(theta, data) {
  y <- data$y
  mu <- theta[1]
  sigma2 <- exp(theta[2] * 2)
  rho <- theta[3]
  n <- length(y)
  Gamma1 <- diag(1 + c(0, rep(rho^2, n-2), 0))
  for(i in 2:n) Gamma1[i,i-1] <- Gamma1[i-1,i] <- -rho
  lik <- -n/2 * log(sigma2) + 0.5 * log(1 - rho^2) -
          1 / (2 * sigma2) * mahalanobis(y, rep(mu,n), Gamma1, inverted = TRUE)
  return(lik)
}
@
and we note that the inverse of the covariance matrix (\code{Gamma1}) has been coded explicitly. Here \code{theta[2]} corresponds to the log standard deviation of the error term, so the variance is recovered by \code{exp(theta[2] * 2)}.
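As a check on this parameterization (an illustration added here, using a short simulated series), \code{likAR1} should agree with the exact Gaussian AR(1) log likelihood computed by \code{arima}, apart from the additive constant $-(n/2)\log(2\pi)$, which \code{likAR1} omits:
<<eval=FALSE>>=
set.seed(1)
y.sim <- as.numeric(arima.sim(list(ar = 0.5), n = 50)) + 2      # simulated AR(1) series
fit.sim <- arima(y.sim, order = c(1, 0, 0), method = "ML")
theta.sim <- c(coef(fit.sim)["intercept"], 0.5 * log(fit.sim$sigma2), coef(fit.sim)["ar1"])
c(likAR1(theta.sim, list(y = y.sim)) - 50 / 2 * log(2 * pi), logLik(fit.sim))
@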
It would be preferable to use a different parameterization for the correlation parameter as well, and indeed the help file for the \code{rstar} function illustrates the same example with Fisher's $z$-transform employed for it. The gradient of the log likelihood function is coded as follows.
<<>>=
grAR1 <- function(theta, data) {
  y <- data$y
  mu <- theta[1]
  sigma2 <- exp(theta[2] * 2)
  rho <- theta[3]
  n <- length(y)
  Gamma1 <- diag( 1 + c(0, rep(rho^2, n-2), 0))
  DGamma1 <- diag(c(0, rep( 2 * rho, n-2), 0))
  for(i in 2:n) {
    Gamma1[i,i-1] <- Gamma1[i-1,i] <- -rho
    DGamma1[i,i-1] <- DGamma1[i-1,i] <- -1
  }
  out <- rep(0, length(theta))
  out[1] <- 1 / sigma2 * t(rep(1,n)) %*% Gamma1 %*% (y-mu)
  out[2] <- -n / (2 * sigma2) + 1 / (2 * sigma2^2) * mahalanobis(y, rep(mu,n), Gamma1, inverted = TRUE)
  out[2] <- out[2] * sigma2 * 2
  out[3] <- -rho / (1 - rho^2) - 1 / (2 * sigma2) * mahalanobis(y, rep(mu,n), DGamma1, inverted = TRUE)
  return(out)
}
@
Finally, the following function generates a data set.
<<>>=
genDataAR1 <- function(theta, data) {
  out <- data
  mu <- theta[1]
  sigma <- exp(theta[2])
  rho <- theta[3]
  n <- length(data$y)
  y <- rep(0,n)
  y[1] <- rnorm(1, mu, sd = sigma * sqrt(1 / (1 - rho^2)))
  for(i in 2:n) y[i] <- mu + rho * (y[i-1] - mu) + rnorm(1) * sigma
  out$y <- y
  return(out)
}
@
For an illustrative example, we consider the \code{lh} data set, available in the standard \code{datasets} package, as done in Lozada-Can and Davison (2010).
<<>>=
data.AR1 <- list( y = as.numeric(lh) )
@
We proceed to test the hypothesis $H_0: \rho=0.765$ by means of the \code{rstar} function; the value being tested is the upper limit of a first-order 95\% confidence interval based on $r$.
<<>>=
rsAR1 <- rstar(data=data.AR1, thetainit = c(0, 0, 0), floglik = likAR1,
               fpsi = function(theta) theta[3], fscore=grAR1, psival = 0.765,
               datagen=genDataAR1, trace=FALSE, seed=10121, psidesc="Autocorrelation parameter")
summary(rsAR1)
@
For comparison, we can also test the same hypothesis by means of a parametric bootstrap, which is a practical route to validating the results of likelihood asymptotics. For example, we may use 50,000 bootstrap trials, employing the \code{rstar} function with the argument \code{ronly} set to \code{TRUE}.
<<>>=
rvals <- rep(0, 50000)
set.seed(107)
for(i in 1:length(rvals)) {
  data.boot <- genDataAR1(rsAR1$theta.hyp, data.AR1)
  if(i%%1000==0) cat("i=",i,"\n")
  r.boot <- rstar(data=data.boot, thetainit = rsAR1$theta.hyp, floglik = likAR1,
                  fpsi = function(theta) theta[3], fscore=grAR1, psival = 0.765,
                  datagen=genDataAR1, trace=FALSE, ronly=TRUE)
  rvals[i] <- r.boot$r
}
@
The computation (not shown) takes a few minutes on most machines. The bootstrap-based $p$-value agrees with that based on $r^*$, as can be found by running the command
<<>>=
c(mean(rvals < rsAR1$r), pnorm(rsAR1$rs) )
@
which returns, with the given random seed, the values 0.1313 and 0.1240, respectively.

\subsection{Example 3: Binomial overdispersion}

The data for this example (Finney, 1947) are included in the \code{finndat} data frame. After loading it, we can define the \code{data.binOD} list, required for usage of the package routines. The log likelihood function will be approximated by Gauss--Hermite quadrature, so the quadrature nodes and weights are also included in the \code{data.binOD} list; they are obtained from the \code{gauss.quad} function of the \code{statmod} package (Smyth et al., 2015).
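Readers less familiar with Gauss--Hermite quadrature may find a small stand-alone illustration useful (added here, not part of the original analysis): with the nodes rescaled by $\sqrt{2}$ and the weighted sum divided by $\sqrt{\pi}$, the rule approximates expectations with respect to a standard normal distribution.
<<eval=FALSE>>=
gq <- gauss.quad(40, "hermite")
# E{plogis(Z)} for Z ~ N(0,1): Gauss-Hermite rule versus adaptive quadrature (both give 0.5)
sum(gq$weights * plogis(sqrt(2) * gq$nodes)) / sqrt(pi)
integrate(function(z) plogis(z) * dnorm(z), -Inf, Inf)$value
@
Returning to the example, the \code{data.binOD} list is defined as follows.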
<<>>=
data(finndat)
z <- scale(finndat$z * 10, scale=FALSE)
X <- cbind(rep(1,length(z)), z)
data.binOD <- list(X=X, den = finndat$den, y = finndat$y, gq=gauss.quad(40,"hermite"))
@
Now we can define the log likelihood function, which derives from the assumption of a Gaussian random effect on the linear predictor, with a logit link function.
<<>>=
loglik.binOD <- function(theta, data) {
  p.range <- function(p, eps=2.22e-15) {
    out <- p
    out[p < eps] <- eps
    out[p > (1-eps)] <- (1-eps)
    return(out)
  }
  y <- data$y
  den <- data$den
  X <- data$X
  gq <- data$gq
  n <- length(y)
  p <- ncol(X)
  beta <- theta[1:p]
  sigma <- exp(theta[p+1])
  linpred <- X %*% beta
  L <- rep(0,n)
  for (i in 1:n) {
    prob <- p.range(plogis(linpred[i] + gq$nodes * sqrt(2) * sigma))
    likq <- y[i] * log(prob) + (den[i] - y[i]) * log(1-prob)
    L[i] <- sum(gq$weights * exp(likq) ) / sqrt(2 * pi)
  }
  return(log(prod(L)))
}
@
This Gaussian quadrature with 40 nodes yields results that are accurate enough for numerical differentiation (many of the more standard quadrature routines are not). This is therefore an example where gradient code could be essential, although it is not in this instance, because of the attention paid to the quadrature method. The gradient is coded as follows.
<<>>=
grad.binOD <- function(theta, data) {
  p.range <- function(p, eps=2.22e-15) {
    out <- p
    out[p < eps] <- eps
    out[p > (1-eps)] <- (1-eps)
    return(out)
  }
  y <- data$y
  den <- data$den
  X <- data$X
  gq <- data$gq
  n <- length(y)
  p <- ncol(X)
  beta <- theta[1:p]
  sigma <- exp(theta[p+1])
  linpred <- X %*% beta
  L <- rep(0,n)
  LB <- matrix(0, nrow=n, ncol=p+1)
  out <- rep(0,p+1)
  for (i in 1:n) {
    prob <- p.range(plogis(linpred[i] + gq$nodes * sqrt(2) * sigma))
    likq <- y[i] * log(prob) + (den[i] - y[i]) * log(1-prob)
    score <- (y[i] - den[i] * prob)
    L[i] <- sum(gq$weights * exp(likq) ) / sqrt(2 * pi)
    LB[i,1] <- sum(gq$weights * exp(likq) * score) / sqrt(2 * pi)
    LB[i,2] <- sum(gq$weights * exp(likq) * score * X[i,2] ) / sqrt(2 * pi)
    LB[i,3] <- sum(gq$weights * exp(likq) * score * gq$nodes * sqrt(2)) / sqrt(2 * pi) * sigma
    out <- out + LB[i,] / L[i]
  }
  return(out)
}
@
The function that generates a data set is as follows.
<<>>=
gendat.binOD <- function(theta, data) {
  out <- data
  den <- data$den
  X <- data$X
  p <- ncol(X)
  n <- length(data$y)
  beta <- theta[1:p]
  sigma <- exp(theta[p+1])
  u <- rnorm(n) * sigma
  linpred <- X %*% beta + u
  out$y <- rbinom(n, size=den, prob=plogis(linpred))
  return(out)
}
@
Now we can apply the \code{rstar} function for testing the hypothesis that the slope is equal to one. For this model, it seems that 500 Monte Carlo trials are enough for stable results, meaning that the variation in the results is rather limited across repetitions with different random seeds.
<<>>=
rs <- rstar(data=data.binOD, thetainit=c(0, 0, 0), floglik=loglik.binOD, fscore=grad.binOD,
            fpsi=function(theta) return(theta[2]), seed=110, trace=FALSE, R=500, psival=1,
            datagen=gendat.binOD, psidesc="Regression slope")
summary(rs)
@

\subsection{Example 4: $2 \times 2$ contingency table}

This further example shows a simple application to Poisson models for counts, in the special case of a $2 \times 2$ table. The log likelihood and the function to simulate a data set are easily defined, by representing the table as a vector of length 4. Note the use of a continuity correction in the log likelihood function, with each cell of the table perturbed by 0.5.
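Under the log-linear parameterization adopted below, the interaction coefficient \code{theta[4]} is exactly the log odds ratio of the four cell means, so that independence corresponds to \code{theta[4]}~$=0$. A quick numerical verification (an illustration added here, using an arbitrary parameter value and the same design matrix defined below):
<<eval=FALSE>>=
rowf <- c(1, 0, 1, 0)
colf <- c(1, 1, 0, 0)
intf <- c(0, 0, 0, 1)
X <- cbind(rep(1, 4), rowf, colf, intf)                  # same design matrix as below
theta.try <- c(0.3, -0.2, 0.5, 0.8)                      # arbitrary parameter value
mu <- exp(X %*% theta.try)
c(log(mu[1] * mu[4] / (mu[2] * mu[3])), theta.try[4])    # the two values coincide
@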
<<>>=
loglik.Pois <- function(theta, data) {
  y <- data$y
  y <- y + 0.50 * c(-1,1,1,-1)  ### continuity correction
  mu <- exp(data$X %*% theta)
  el <- sum(y * log(mu) - mu)
  return(el)
}
gendat.Pois <- function(theta, data) {
  out <- data
  mu <- exp(data$X %*% theta)
  out$y <- rpois(n=4, lambda=mu)
  return(out)
}
@
Let us now define the list representing the observed table \code{c(15, 9, 7, 13)}.
<<>>=
rowf <- c(1, 0, 1, 0)
colf <- c(1, 1, 0, 0)
intf <- c(0, 0, 0, 1)
X <- cbind( rep(1, 4), rowf, colf, intf)
data.2x2 <- list(y = c(15, 9, 7, 13), X=X)
@
The $p$-value for independence based on the $r^*$ statistic is quickly obtained.
<<>>=
rs <- rstar(data=data.2x2, thetainit = c(0, 0, 0, 0), floglik = loglik.Pois,
            fpsi = function(theta) theta[4], psival = 0, datagen=gendat.Pois, trace=FALSE,
            R=50, psidesc="Independence test")
summary(rs)
@
Here it is important to note that the model of this example is a full exponential family model, for which the $r^*$ statistic has a closed-form analytic expression, so that Monte Carlo simulation is not required for its computation. The \code{rstar} function, however, does not attempt to detect such instances, and the general algorithm (employing Monte Carlo computation) is used nevertheless, even though the outcome of the Monte Carlo computation cancels out in the end. In such cases, we recommend setting the \code{R} argument to a small but nonzero value, such as the value 50 used here, in order to avoid any numerical problems that may occur in the Monte Carlo computation.

\subsection{Example 5: logistic regression}

This example features a high-dimensional nuisance parameter. The data are the famous \lq\lq crying babies dataset\rq\rq, already employed by many authors. The data can be found in the \code{cond} package (Brazzale, Davison and Reid, 2007).
<<>>=
library(cond)
data(babies)
@
We first fit a standard logistic regression model, with fixed effects for \code{lull} and \code{day}, and then define the list containing all the data information.
<<>>=
mod.glm <- glm(formula = cbind(r1, r2) ~ day + lull - 1, family = binomial, data = babies)
data.obj <- list(y = babies$r1, den = babies$r1 + babies$r2, X = model.matrix(mod.glm))
@
The data definition is compatible with the functions providing the log likelihood and data simulation. For this model, coding the gradient of the log likelihood is straightforward.
<<>>=
loglik.logit <- function(theta, data) {
  y <- data$y
  den <- data$den
  X <- data$X
  eta <- X %*% theta
  p <- plogis(eta)
  l <- sum(y * log(p) + (den - y) * log(1-p))
  return(l)
}
grad.logit <- function(theta, data) {
  y <- data$y
  den <- data$den
  X <- data$X
  eta <- X %*% theta
  p <- plogis(eta)
  out <- t(y - p * den) %*% X
  return(drop(out))
}
gendat.logit <- function(theta, data) {
  X <- data$X
  eta <- X %*% theta
  p <- plogis(eta)
  out <- data
  out$y <- rbinom(length(data$y), size = data$den, prob = p)
  return(out)
}
@
Here we obtain confidence intervals for the coefficient of \code{lull}. For the sake of comparison, we do it twice, with and without employing the coded gradient, and record the computing time in each case.
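Before doing so, the coded gradient can be checked numerically against \code{pracma::grad}, as was done in Example 1; a quick optional check (added here), evaluated at the maximum likelihood estimate:
<<eval=FALSE>>=
cbind(pracma::grad(loglik.logit, coef(mod.glm), data=data.obj),
      grad.logit(coef(mod.glm), data.obj))
@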
<<>>=
time.with <- system.time(
  rs.int <- rstar.ci(data=data.obj, thetainit = coef(mod.glm), floglik = loglik.logit,
                     fpsi = function(theta) theta[19], fscore=grad.logit, datagen=gendat.logit,
                     trace=FALSE, psidesc="Coefficient of lull") )
time.without <- system.time(
  rs.int.no <- rstar.ci(data=data.obj, thetainit = coef(mod.glm), floglik = loglik.logit,
                        fpsi = function(theta) theta[19], datagen=gendat.logit, trace=FALSE,
                        psidesc="Coefficient of lull") )
@
We now have a look at the intervals obtained, and at the computing times.
<<>>=
summary(rs.int)
@
<<>>=
time.with
time.without
@
There is a close agreement between the results obtained here and those provided by the \code{cond} package. The latter are readily computed.
<<>>=
res.cond <- cond(object = mod.glm, offset = lullyes)
summary(res.cond)
@

\section{Functions for the Modified Profile Likelihood (MPL)}

The \likA package also has two functions for the Modified Profile Likelihood and the Profile Likelihood, namely \code{logMPL} and \code{logPL} respectively. Both functions evaluate the corresponding log likelihood at a given value of the parameter of interest, which may have dimension larger than one. The returned value can be multiplied by $-1$ to ease usage with general-purpose optimizers, which typically perform minimization rather than maximization. Indeed, differently from the functions for the $r^*$ statistic, their optimization and graphical display are left to the user, who can employ for this task the functionality available within \code{R}.

The only difference in design with respect to the functions for the $r^*$ computation lies in the way the parameter of interest is represented in the input arguments. These functions handle only the case where the parameter of interest is a subset of the vector representing the model parameters, so there is no need to define a specific function for the parameter of interest as in the \code{rstar} function. A numeric vector \code{indpsi}, containing the indexes (coordinates) of the parameter of interest, is used instead.

\subsection{Example 5: logistic regression}

Let us consider again the crying babies dataset. Here the parameter of interest is scalar, so we are able to plot both the profile and the modified profile log likelihoods. The data list is the same \code{data.obj} defined above. We first proceed to the numerical optimization of both functions, by means of the \code{nlminb} optimizer.
<<>>=
max.prof <- nlminb(0, logPL, data=data.obj, thetainit=coef(mod.glm), floglik=loglik.logit,
                   fscore=grad.logit, indpsi=19, trace=FALSE, minus=TRUE)
max.mpl <- nlminb(0, logMPL, data=data.obj, mle=coef(mod.glm), floglik=loglik.logit,
                  fscore=grad.logit, datagen=gendat.logit, indpsi=19, R=50, seed=2020,
                  trace=FALSE, minus=TRUE)
c(max.prof$par, max.mpl$par)
@
As before, the theory of exponential family models implies that the MPL formula does not require any Monte Carlo computation, but the software does not recognize this fact. Once again, there is no need to employ a large simulation sample size. The final lines of code produce the plot of the two log likelihoods.
<<>>=
psi.vals <- seq(-0.3, 3.7, l=30)
obj.prof <- sapply(psi.vals, logPL, data=data.obj, thetainit=coef(mod.glm), floglik=loglik.logit,
                   fscore=grad.logit, indpsi=19)
obj.mpl <- sapply(psi.vals, logMPL, data=data.obj, mle=coef(mod.glm), floglik=loglik.logit,
                  fscore=grad.logit, datagen=gendat.logit, indpsi=19, R=50, seed=2020)
@
<<>>=
par(pch="s")
plot(psi.vals, obj.prof - max(obj.prof), type="l", xlab=expression(psi), ylab="log likelihood",
     lwd=2, las=1)
lines(psi.vals, obj.mpl - max(obj.mpl), col="red", lwd=2)
legend("topright", col=c(1, 2), lty=1, lwd=2, legend=c("Profile","MPL"), bty="n")
@
Again, by plotting the \code{res.cond} object it is possible to verify the agreement with the results provided by the \code{cond} package.

\subsection{Example 6: random intercept model}

As a further example we consider a simple linear mixed model, with only random intercepts. The log likelihood function is taken from Wood (2006), and to speed up the computation we code the gradient as well. The data generation function is the simplest of the three.
<<>>=
logLikLme <- function(theta, data) {
  X <- data$X
  Z <- data$Z
  y <- data$y
  beta <- theta[1:ncol(X)]
  sigma.b <- theta[ncol(X)+1]
  sigma <- theta[ncol(X)+2]
  n <- nrow(X)
  V <- tcrossprod(Z) * sigma.b^2 + diag(n) * sigma^2
  L <- chol(V)
  XL <- backsolve(L, X, transpose=TRUE)
  yL <- backsolve(L, y, transpose=TRUE)
  out <- - sum(log(diag(L))) - sum( (yL - XL %*% beta)^2) / 2
  return(out)
}
gradLikLme <- function(theta, data) {
  X <- data$X
  Z <- data$Z
  y <- data$y
  beta <- theta[1:ncol(X)]
  sigma.b <- theta[ncol(X)+1]
  sigma <- theta[ncol(X)+2]
  n <- nrow(X)
  V <- tcrossprod(Z) * sigma.b^2 + diag(n) * sigma^2
  L <- chol(V)
  XL <- backsolve(L, X, transpose=TRUE)
  yL <- backsolve(L, y, transpose=TRUE)
  out <- rep(0, length(theta))
  out[1:ncol(X)] <- t(yL - XL %*% beta) %*% XL
  ni <- as.vector(t(Z) %*% rep(1,n))
  Zv <- matvec(Z, sqrt(1/(sigma^2 + sigma.b^2 * ni)))
  V1 <- diag(n) / sigma^2 - tcrossprod(Zv) * sigma.b^2 / sigma^2
  Vb <- tcrossprod(Z) * 2 * sigma.b
  Vs <- diag(n) * 2 * sigma
  Mb <- V1 %*% Vb
  Ms <- V1 %*% Vs
  r <- as.vector(y - X %*% beta)
  out[ncol(X)+1] <- -sum(diag(Mb)) / 2 + as.numeric( t(r) %*% Mb %*% V1 %*% r) / 2
  out[ncol(X)+2] <- -sum(diag(Ms)) / 2 + as.numeric( t(r) %*% Ms %*% V1 %*% r) / 2
  return(out)
}
genDataLme <- function(theta, data) {
  out <- data
  X <- data$X
  Z <- data$Z
  y <- data$y
  beta <- theta[1:ncol(X)]
  sigma.b <- theta[ncol(X)+1]
  sigma <- theta[ncol(X)+2]
  n <- nrow(X)
  mu <- X %*% beta
  b <- rnorm(ncol(Z), sd=sigma.b)
  e <- rnorm(nrow(Z), sd=sigma)
  out$y <- mu + e + Z %*% b
  return(out)
}
@
We take the \code{sleepstudy} data from the \code{lme4} package (Bates et al., 2014) for this example.
<<>>=
library(lme4)
fm1R <- lmer(Reaction ~ Days + (1|Subject), sleepstudy)
sleepdata <- list(X=model.matrix(Reaction ~ Days, sleepstudy),
                  Z=model.matrix(Reaction ~ factor(Subject)-1, sleepstudy),
                  y=sleepstudy$Reaction)
@
We start by computing the maximum likelihood estimates.
<<>>=
mleFull <- optim( c(250, 10, 30, 30), logLikLme, gr=gradLikLme, data=sleepdata,
                  method="BFGS", control=list(fnscale=-1))
@
Then we maximize the MPL, employing just 100 Monte Carlo simulations for its computation. This number of trials seems to suffice, owing to the curved exponential family structure of linear mixed models.
<<>>=
mleM <- optim(mleFull$par[3:4], logMPL, data=sleepdata, mle=mleFull$par, floglik=logLikLme,
              fscore=gradLikLme, minus=TRUE, indpsi=3:4, datagen=genDataLme, trace=FALSE,
              seed=11, R=100)
@
The optimization takes up to a few minutes on most machines.
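The resulting estimates of the two variance components can be placed alongside the REML fit obtained earlier from \code{lmer}; a minimal sketch (added here), extracting the estimated standard deviations with \code{VarCorr}:
<<eval=FALSE>>=
mleM$par                               # MPL estimates of (sigma.b, sigma)
as.data.frame(VarCorr(fm1R))$sdcor     # REML estimates from lmer
@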
The result (not shown) agrees well with what is found by the \code{lmer} function, which uses REML estimation by default.

\begin{thebibliography}{9}

\bibitem{bates2014}
Bates, D., Maechler, M., Bolker, B. and Walker, S. (2014). \texttt{lme4}: Linear mixed-effects models using \texttt{Eigen} and \texttt{S4}. \texttt{R} package version 1.1-7. \url{http://CRAN.R-project.org/package=lme4}.

\bibitem{pracma2014}
Borchers, H. W. (2015). \texttt{pracma}: Practical Numerical Math Functions. \texttt{R} package version 1.8.3. \url{http://CRAN.R-project.org/package=pracma}.

\bibitem{brazz2007}
Brazzale, A.R., Davison, A.C. and Reid, N. (2007). \emph{Applied Asymptotics: Case Studies in Small-Sample Statistics}. Cambridge University Press, Cambridge. \url{http://statwww.epfl.ch/AA/}.

\bibitem{feigl1965}
Feigl, P. and Zelen, M. (1965). Estimation of exponential survival probabilities with concomitant information. \emph{Biometrics}, \textbf{21}, 826--838.

\bibitem{finney1947}
Finney, D.J. (1947). \emph{Probit Analysis: A Statistical Treatment of the Sigmoid Response Curve}. Cambridge University Press, London and New York.

\bibitem{loz2010}
Lozada-Can, C. and Davison, A.C. (2010). Three examples of accurate likelihood inference. \emph{The American Statistician}, \textbf{64}, 131--139.

\bibitem{pierce2015}
Pierce, D.A. and Bellio, R. (2017). Modern likelihood-frequentist inference. \emph{International Statistical Review}, \textbf{85}, 519--541.

\bibitem{statmod2015}
Smyth, G., Hu, Y., Dunn, P., Phipson, B. and Chen, Y. (2015). \texttt{statmod}: Statistical Modeling. \texttt{R} package version 1.4.21. \url{http://CRAN.R-project.org/package=statmod}.

\bibitem{wood2006generalized}
Wood, S. (2006). \emph{Generalized Additive Models: An Introduction with R}. Chapman \& Hall/CRC, Boca Raton.

\end{thebibliography}

\end{document}