--- title: "An Introduction to glmtlp" author: "Chunlin Li, Yu Yang, Chong Wu" date: "`r format(Sys.time(), '%B %d, %Y')`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{glmtlp} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r include=FALSE} # to control the output hook_output <- knitr::knit_hooks$get("output") knitr::knit_hooks$set(output = function(x, options) { lines <- options$output.lines if (is.null(lines)) { return(hook_output(x, options)) # pass to default hook } x <- unlist(strsplit(x, "\n")) more <- "..." if (length(lines)==1) { # first n lines if (length(x) > lines) { # truncate the output, but add .... x <- c(head(x, lines), more) } } else { x <- c(more, x[lines], more) } # paste these lines together x <- paste(c(x, ""), collapse = "\n") hook_output(x, options) }) ``` ## Introduction `glmtlp` fits generalized linear models via penalized maximum likelihood. It currently supports linear and logistic regression models. The regularization path is computed for the l0, l1, and TLP penalty at a grid of values for the regularization parameter lambda $\lambda$ (for l1 and TLP penalty) or kappa $\kappa$ (for l0 penalty). In addition, the package provides methods for prediction and plotting, and functions for cross-validation. The authors of `glmtlp` are Chunlin Li, Yu Yang, and Chong Wu, and the R package is maintained by Chunlin Li and Yu Yang. A Python version is under development. This vignette describes basic usage of `glmtlp` in R. ## Installation Install the package from CRAN. ```{r, eval=FALSE} install.packages("glmtlp") ``` ## Quick Start In this section, we go over the main functions and outputs in the package. First, we load the `glmtlp` package: ```{r} library(glmtlp) ``` We load a simulated data set with continuous response to illustrate the usage of linear regression. ```{r} data(gau_data) X <- gau_data$X y <- gau_data$y ``` We fit three models by calling `glmtlp` with `X`, `y`, `family="gaussian"` and three different `penalty`. The returned `fit` is an object of class `glmtlp` that contains all relevant information of the fitted model for further use. Users can apply `plot`, `coef` and `predict` methods to the fitted objects to get more detailed results. ```{r} fit <- glmtlp(X, y, family = "gaussian", penalty = "tlp") fit2 <- glmtlp(X, y, family = "gaussian", penalty = "l0") fit3 <- glmtlp(X, y, family = "gaussian", penalty = "l1") ``` We can visualize the coefficients and the solution path by executing the `plot` method. The output is a `ggplot` object. Therefore, the users are allowed to customize the plot to suit their own needs. The plot shows the solution path of the model, with each curve corresponding to a variable. Users may also choose to annotate the curves by setting `label=TRUE`. `xvar` is the index variable to plot against. Note that for "l1" or "tlp" penalty, `xvar` could be chosen from `c("lambda", "log_lambda", "deviance", "l1_norm")`, and for "l0" penalty, `xvar` could be chosen as `"kappa"`. ```{r} #| fig.alt: > #| Solution Path Plot: coefficients against parameter lambda. plot(fit, xvar = "lambda") ``` We can use the `coef` function to obtain the fitted coefficients. By default, the results would be a matrix, with each column representing the coefficients for every $\lambda$ or $\kappa$. The users may also choose to input the desired value of $\lambda$ or $\kappa$. Note that the user-supplied $\lambda$ or $\kappa$ parameter should be in the range of the parameter sequence used in the fitted model. ```{r output.lines = 1:10} coef(fit) coef(fit, lambda = 0.1) ``` In addition, we can make predictions by applying the `predict` method. For this, users need to input a design matrix and the type of prediction to be made. Also, users can provide the desired level of regularization parameters or the indices of the parameter sequence. If neither is provided, then the prediction will be made for the whole lambda or kappa sequence. ```{r} predict(fit, X[1:5, ], lambda = 0.1) predict(fit, X[1:5, ], which = 10) # the 10th lambda in the lambda sequence ``` Cross-validation can be implemented by `cv.glmtlp` to find the best regularization parameter. `cv.glmtlp` returns a `cv.glmtlp` object, a list with all the ingredients of the cross-validated fit. Users may use `coef`, `predict`, and `plot` to further check the cross-validation results. ```{r} cv.fit <- cv.glmtlp(X, y, family = "gaussian", penalty = "tlp") ``` The `plot` method will plot the deviance against the parameter sequence. The vertical dashed line shows the position of the index where the smallest CV error is achieved, and users may also choose to omit it by setting `vertical.line = FALSE`. Again, the output is a `ggplot` object, so users are free to make modifications to it. ```{r} #| fig.alt: > #| Cross-Validation Plot: deviance against log(lambda). plot(cv.fit) ``` The `coef` and `predict` method by default use the parameter that gives the smallest CV error, namely, `which = cv.fit$idx.min`. ```{r output.lines = 10} coef(cv.fit) predict(cv.fit, X[1:5, ]) ``` ## References Li, C., Shen, X., & Pan, W. (2021). Inference for a large directed graphical model with interventions. *arXiv preprint* arXiv:2110.03805. . Shen, X., Pan, W., & Zhu, Y. (2012). Likelihood-based selection and sharp parameter estimation. *Journal of the American Statistical Association*, 107(497), 223-232. . Shen, X., Pan, W., Zhu, Y., & Zhou, H. (2013). On constrained and regularized high-dimensional regression. *Annals of the Institute of Statistical Mathematics*, 65(5), 807-832. . Tibshirani, R., Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J., & Tibshirani, R. J. (2012). Strong rules for discarding predictors in lasso‐type problems. *Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, 74(2), 245-266. . Yang, Y. & Zou, H. A coordinate majorization descent algorithm for l1 penalized learning. *Journal of Statistical Computation and Simulation* 84.1 (2014): 84-95. . Zhu, Y., Shen, X., & Pan, W. (2020). On high-dimensional constrained maximum likelihood inference. *Journal of the American Statistical Association*, 115(529), 217-230. . Zhu, Y. (2017). An augmented ADMM algorithm with application to the generalized lasso problem. *Journal of Computational and Graphical Statistics*, 26(1), 195-204. .