Main functions
The two main functions provided by the package are
glmmsel()
and cv.glmmsel()
, responsible for
model fitting and cross-validation, respectively.
The glmmsel()
function provides a convenient way of
fitting the model for a path of \(\lambda\) values. To demonstrate this
functionality, let’s simulate some clustered data.
set.seed(1234)
n <- 100 # Number of observations
m <- 4 # Number of clusters
p <- 5 # Number of predictors
s.fix <- 2 # Number of nonzero fixed effects
s.rand <- 1 # Number of nonzero random effects
x <- matrix(rnorm(n * p), n, p) # Predictor matrix
beta <- c(rep(1, s.fix), rep(0, p - s.fix)) # True fixed effects
u <- cbind(matrix(rnorm(m * s.rand), m, s.rand), matrix(0, m, p - s.rand)) # True random effects
cluster <- sample(1:m, n, replace = TRUE) # Cluster labels
xb <- rowSums(x * sweep(u, 2, beta, '+')[cluster, ]) # x %*% (beta + u) matrix
y <- rnorm(n, xb) # Response vector
Of the five candidate predictors, the first two have nonzero fixed effects. Only the first predictor has a nonzero random effect.
The values of \(\lambda\) are
automatically computed from the data, providing a path of solutions from
the null model (intercept only) to the full model (all predictors
included). The fixed effects and random effects from the path of fits
can be extracted using the fixef()
and ranef()
functions.
fixef(fit)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -0.2922322 -0.1167263 -0.11498018 -0.11484951 -0.10708102
#> [2,] 0.0000000 1.1104188 1.11240541 1.12407682 1.12770012
#> [3,] 0.0000000 1.1609182 1.16109489 1.17013772 1.17421123
#> [4,] 0.0000000 0.0000000 0.00000000 0.00000000 -0.04771683
#> [5,] 0.0000000 0.0000000 0.00000000 -0.05988147 -0.05850389
#> [6,] 0.0000000 0.0000000 0.04929141 0.04396767 0.04550589
ranef(fit)
#> , , 1
#>
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 0
#> [3,] 0 0 0 0 0 0
#> [4,] 0 0 0 0 0 0
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 -1.14669569 0 0 0 0
#> [2,] 0 0.03435146 0 0 0 0
#> [3,] 0 0.75530136 0 0 0 0
#> [4,] 0 0.35703619 0 0 0 0
#>
#> , , 3
#>
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 -1.15635158 0 0 0 0
#> [2,] 0 0.03829802 0 0 0 0
#> [3,] 0 0.75971247 0 0 0 0
#> [4,] 0 0.35850014 0 0 0 0
#>
#> , , 4
#>
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 -1.1514971 0 0 0 0
#> [2,] 0 0.0327921 0 0 0 0
#> [3,] 0 0.7502685 0 0 0 0
#> [4,] 0 0.3687197 0 0 0 0
#>
#> , , 5
#>
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 -1.14421868 0 0 0 0
#> [2,] 0 0.03769651 0 0 0 0
#> [3,] 0 0.74853993 0 0 0 0
#> [4,] 0 0.35795599 0 0 0 0
Each column in the output of fixef()
corresponds to a
set of fixed effects for a particular value of \(\lambda\), with the first row containing
intercept terms. In the output of ranef()
, each slice
corresponds to a set of random effects for a particular value of \(\lambda\), with each row containing the
random effects for a given cluster.
When making predictions, it is often useful to add the fixed and
random effects to get the cluster-specific coefficients. The
coef()
function provides this functionality.
coef(fit)
#> , , 1
#>
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] -0.2922322 0 0 0 0 0
#> [2,] -0.2922322 0 0 0 0 0
#> [3,] -0.2922322 0 0 0 0 0
#> [4,] -0.2922322 0 0 0 0 0
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] -0.1167263 -0.03627693 1.160918 0 0 0
#> [2,] -0.1167263 1.14477022 1.160918 0 0 0
#> [3,] -0.1167263 1.86572012 1.160918 0 0 0
#> [4,] -0.1167263 1.46745495 1.160918 0 0 0
#>
#> , , 3
#>
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] -0.1149802 -0.04394616 1.161095 0 0 0.04929141
#> [2,] -0.1149802 1.15070343 1.161095 0 0 0.04929141
#> [3,] -0.1149802 1.87211788 1.161095 0 0 0.04929141
#> [4,] -0.1149802 1.47090555 1.161095 0 0 0.04929141
#>
#> , , 4
#>
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] -0.1148495 -0.02742033 1.170138 0 -0.05988147 0.04396767
#> [2,] -0.1148495 1.15686892 1.170138 0 -0.05988147 0.04396767
#> [3,] -0.1148495 1.87434529 1.170138 0 -0.05988147 0.04396767
#> [4,] -0.1148495 1.49279648 1.170138 0 -0.05988147 0.04396767
#>
#> , , 5
#>
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] -0.107081 -0.01651856 1.174211 -0.04771683 -0.05850389 0.04550589
#> [2,] -0.107081 1.16539663 1.174211 -0.04771683 -0.05850389 0.04550589
#> [3,] -0.107081 1.87624005 1.174211 -0.04771683 -0.05850389 0.04550589
#> [4,] -0.107081 1.48565611 1.174211 -0.04771683 -0.05850389 0.04550589
Each row in each of these slices represents the fixed effects plus the random effects for a given cluster, e.g., the second row represents \(\hat{\boldsymbol{\beta}}+\hat{\mathbf{u}}_2\).
The predict()
function is available for making
predictions on new data.
x.new <- x[1:3, ]
cluster.new <- cluster[1:3]
predict(fit, x.new, cluster.new)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -0.2922322 0.40829027 0.3588954 0.3840867 0.35454510
#> [2,] -0.2922322 -0.35024290 -0.3451526 -0.2907129 -0.31701760
#> [3,] -0.2922322 -0.07945345 -0.1067836 -0.0751470 -0.06503485
Again, the columns represent predictions for different values of \(\lambda\).
In practice, \(\lambda\) usually
needs to be cross-validated. The cv.glmmsel()
function
provides a convenient way to perform cross-validation.
glmmsel()
does not need to be run after using
cv.glmmsel()
, as the latter calls the former and saves the
result as fit$fit
.
The coef()
and predict()
functions applied
to the output of cv.glmmsel()
return the result
corresponding to the value of \(\lambda\) that minimises the
cross-validation loss.