--- title: "Large-Scale Poisson Regression with lsReg" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Large-Scale Poisson Regression with lsReg} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Overview This vignette demonstrates large-scale Poisson regression using `lsReg` with the score test. The score test is computed entirely from the base model fit and does not require fitting the full model for each candidate covariate, making it especially efficient for large-scale screening. Because the full model is never fit, no parameter estimate is produced — only the test statistic. ## Example data We use the same simulated dataset as the other vignettes. The count outcome `ypois` was generated from a Poisson model with `x1`, `x2`, `z5`, and `z9` as predictors. The variables `z1` through `z10` are the candidates to be screened. ```{r load-data} library(lsReg) datafile <- system.file("extdata", "simulated_data.rds", package = "lsReg") dat <- readRDS(datafile) head(dat[, c("ypois", "x1", "x2", "z1", "z2", "z5", "z9")]) ``` ## Step 1: Fit the base model ```{r base-model} basemdl <- glm(ypois ~ x1 + x2, data = dat, family = poisson) summary(basemdl) ``` ## Step 2: Allocate memory ```{r allocate} mem <- lsReg(basemdl, colstoadd = 1, testtype = "score") ``` ## Step 3: Test each candidate covariate After each call to `addcovar()` the score test statistic is in `mem$testvalue`. The score test returns a z-score, so p-values are computed from the standard normal distribution. ```{r run-tests} zvars <- paste0("z", 1:10) results <- data.frame( variable = zvars, statistic = NA_real_ ) for (i in seq_along(zvars)) { xr <- as.matrix(dat[, zvars[i], drop = FALSE]) addcovar(mem, xr) results$statistic[i] <- mem$testvalue[1, 1] } ``` ## Results ```{r results} results$pvalue <- 2 * pnorm(-abs(results$statistic)) print(results, digits = 4) ``` The variables `z5` and `z9` have the largest score statistics and the smallest p-values, consistent with the data-generating model. ## Verification against statmod The `glm.scoretest()` function from the `statmod` package computes the score test statistic for a single candidate covariate added to a fitted GLM. We use it to verify the `lsReg` results for `z5` and `z9`. ```{r verify} library(statmod) statmod_stat_z5 <- glm.scoretest(basemdl, dat[, "z5"]) statmod_stat_z9 <- glm.scoretest(basemdl, dat[, "z9"]) lsreg_stat_z5 <- results$statistic[results$variable == "z5"] lsreg_stat_z9 <- results$statistic[results$variable == "z9"] comparison <- data.frame( variable = c("z5", "z9"), statmod_statistic = c(statmod_stat_z5, statmod_stat_z9), lsreg_statistic = c(lsreg_stat_z5, lsreg_stat_z9) ) print(comparison, digits = 6) ``` The score statistics from `lsReg` match those from `statmod::glm.scoretest()` to numerical precision.