---
title: "Getting Started with achieveGap"
author: "[Your Name], Michigan State University"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with achieveGap}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5,
  warning = FALSE,
  message = FALSE
)
```

## Overview

The **achieveGap** package provides a statistically rigorous framework for estimating achievement gap *trajectories* in longitudinal educational data. Rather than computing the gap as a post hoc difference between two separately estimated curves, the package parameterizes the gap function directly as a smooth estimand within a joint hierarchical model. Uncertainty about the gap itself can then be quantified with simultaneous confidence bands.

The package is motivated by the observation that achievement gaps often evolve nonlinearly across grades: widening rapidly in early elementary school, plateauing in middle grades, or narrowing following policy interventions. Standard linear growth models cannot capture these patterns.

---

## Installation

```r
# From CRAN (once published):
install.packages("achieveGap")

# From GitHub (development version):
# install.packages("devtools")
devtools::install_github("yourusername/achieveGap")
```

---

## Quickstart: Simulated Data

The simplest way to get started is with `simulate_gap()`, which generates synthetic longitudinal data with a known true gap function.
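To make "a known true gap function" concrete, here is a minimal base-R sketch of long-format data with a gap built in. It is an illustration only and not the internals of `simulate_gap()`: the gap function, effect sizes, and all `toy_` names are assumptions chosen for the example, and only the column names follow the layout used in this vignette.

```r
set.seed(1)

n_students <- 100
n_schools  <- 10
grades     <- 0:7

# Hypothetical gap function: the low-SES deficit widens steadily with grade
toy_gap_fun <- function(g) -0.2 - 0.05 * g

# One row per student, each assigned to a school and an SES group
toy_students <- data.frame(
  student   = seq_len(n_students),
  school    = sample(seq_len(n_schools), n_students, replace = TRUE),
  SES_group = rbinom(n_students, 1, 0.5)
)

# Random intercepts for schools and students (assumed standard deviations)
toy_school_eff  <- rnorm(n_schools, sd = 0.3)
toy_student_eff <- rnorm(n_students, sd = 0.5)

# merge() with no shared columns gives the cross join:
# one row per student-grade combination
toy_data <- merge(toy_students, data.frame(grade = grades))

# Score = common growth + group-specific gap + random effects + noise
toy_data$score <- 0.1 * toy_data$grade +
  toy_data$SES_group * toy_gap_fun(toy_data$grade) +
  toy_school_eff[toy_data$school] +
  toy_student_eff[toy_data$student] +
  rnorm(nrow(toy_data), sd = 0.4)

head(toy_data)
```

Because the gap function is written down explicitly, any estimate can be checked against `toy_gap_fun(grades)`. The chunk that follows uses the package's own `simulate_gap()` rather than this sketch.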
```{r simulate}
library(achieveGap)

# Generate data: 400 students in 30 schools, grades K-7
# Gap shape: monotone widening (true gap increases steadily across grades)
sim <- simulate_gap(
  n_students = 400,
  n_schools = 30,
  gap_shape = "monotone",
  seed = 2024
)

# Preview the data
head(sim$data)
```

```{r true-gap-table}
# The true gap at each grade
sim$true_gap
```

The data frame has one row per student-grade observation, with columns `student`, `grade`, `school`, `SES_group` (0 = high SES, 1 = low SES), and `score`.

---

## Fitting the Model

The main function is `gap_trajectory()`, which takes the column names for the outcome, grade, group indicator, school ID, and student ID. The chunk below uses the formula interface, `achieve_gap()`, instead.

```{r fit, cache = TRUE}
# Formula interface (recommended), in the style of lme4/nlme
fit <- achieve_gap(
  score ~ grade,
  group = "SES_group",
  random = ~ 1 | school/student,
  data = sim$data,
  k = 6,        # spline basis dimension (must be < unique grade values)
  n_sim = 5000  # posterior draws for simultaneous bands
)
```

Printing the fitted object gives a concise overview:

```{r print}
print(fit)
```

---

## Summarizing Results

`summary()` displays estimated gap values with standard errors and simultaneous confidence band bounds at equally spaced grade points. Grades marked with `*` have bands that exclude zero, indicating a statistically significant gap with multiplicity control.

```{r summary}
summary(fit)
```

---

## Visualizing the Gap Trajectory

`plot()` produces a publication-ready figure. By default, both the simultaneous band (light shading) and pointwise band (dark shading) are shown. In simulation settings where the true gap is known, you can overlay it with the `true_gap` argument.
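The difference between the two bands can be illustrated outside the package. The sketch below uses `mgcv` on toy data and a common simulation-based construction: draw coefficient deviations from the approximate posterior, record the largest standardized deviation over the grid for each draw, and take the 95% quantile of those maxima as the critical value. This is an illustration under those assumptions and not necessarily how achieveGap computes its bands. All `toy_` names are hypothetical.

```r
library(mgcv)

set.seed(1)

# Toy data (hypothetical): one smooth signal plus noise
n <- 300
toy_x <- runif(n, 0, 7)
toy_y <- 0.3 * sqrt(toy_x) + rnorm(n, sd = 0.5)
toy_model <- gam(toy_y ~ s(toy_x, k = 6), method = "REML")

# Fitted values and pointwise standard errors on a grid
toy_grid <- data.frame(toy_x = seq(0, 7, length.out = 50))
toy_pred <- predict(toy_model, toy_grid, se.fit = TRUE)
toy_est  <- as.numeric(toy_pred$fit)
toy_se   <- as.numeric(toy_pred$se.fit)

# Matrix mapping coefficients to predictions, and posterior covariance
Xp <- predict(toy_model, toy_grid, type = "lpmatrix")
Vp <- vcov(toy_model)

# Draw coefficient deviations from N(0, Vp): one row per draw
n_draws <- 2000
toy_dev <- rmvn(n_draws, rep(0, length(coef(toy_model))), Vp)

# Standardized deviations on the grid (rows = grid points, cols = draws),
# then the maximum over the grid for each draw
toy_std <- abs(Xp %*% t(toy_dev)) / toy_se
toy_max <- apply(toy_std, 2, max)

crit_pointwise    <- qnorm(0.975)
crit_simultaneous <- unname(quantile(toy_max, 0.95))

# Band limits: same center, different multipliers
toy_bands <- data.frame(
  x         = toy_grid$toy_x,
  pw_lower  = toy_est - crit_pointwise * toy_se,
  pw_upper  = toy_est + crit_pointwise * toy_se,
  sim_lower = toy_est - crit_simultaneous * toy_se,
  sim_upper = toy_est + crit_simultaneous * toy_se
)
```

The simultaneous multiplier exceeds the pointwise one because it has to cover the whole curve at once, which is why the simultaneous band in the figure is the wider, lighter region.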
```{r plot-with-truth, fig.cap = "Estimated gap with both confidence bands and true gap overlaid."}
# Grade labels for x-axis
grade_labs <- c("K", "G1", "G2", "G3", "G4", "G5", "G6", "G7")

# True gap evaluated at the model's grade grid
true_gap_vec <- sim$f1_fun(fit$grade_grid)

plot(
  fit,
  true_gap = true_gap_vec,
  grade_labels = grade_labs,
  title = "SES Achievement Gap Trajectory (Simulated Data)"
)
```

To show only the simultaneous band:

```{r plot-sim-only, fig.cap = "Gap trajectory with simultaneous band only."}
plot(fit, band = "simultaneous", grade_labels = grade_labs)
```

---

## Hypothesis Testing

`test_gap()` runs two types of tests:

1. **Global test**: Is the gap smooth significantly different from zero anywhere? (Approximate chi-squared test from `mgcv`.)
2. **Simultaneous test**: Which specific grade intervals have a statistically significant gap, with joint error rate control?

```{r test}
tryCatch(
  test_gap(fit, type = "both"),
  error = function(e) message("test_gap: ", conditionMessage(e))
)
```

---

## Comparing to Separate Splines

`fit_separate()` provides the comparison approach: two independent spline models, one per group, with the gap computed by post hoc subtraction. As discussed in the paper, this approach underestimates uncertainty.
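To make "post hoc subtraction" concrete, the following sketch fits one `mgcv` smooth per group on toy data and differences the predictions. It is not the implementation of `fit_separate()`: it omits the school and student random effects, the toy data and `toy_` names are hypothetical, and combining the standard errors as if the two fits were independent is an assumption of this sketch.

```r
library(mgcv)

set.seed(42)

# Toy data (hypothetical): two groups observed at grades 0 to 7
n <- 400
toy <- data.frame(
  grade = sample(0:7, n, replace = TRUE),
  group = rbinom(n, 1, 0.5)
)
toy$score <- 0.2 * toy$grade -
  toy$group * (0.2 + 0.05 * toy$grade) +
  rnorm(n, sd = 0.6)

# One smooth per group, fitted separately
toy_m0 <- gam(score ~ s(grade, k = 6), data = toy[toy$group == 0, ],
              method = "REML")
toy_m1 <- gam(score ~ s(grade, k = 6), data = toy[toy$group == 1, ],
              method = "REML")

# Predict both curves on a common grid
toy_grid <- data.frame(grade = seq(0, 7, length.out = 30))
toy_p0 <- predict(toy_m0, toy_grid, se.fit = TRUE)
toy_p1 <- predict(toy_m1, toy_grid, se.fit = TRUE)

# Gap by subtraction; the SE treats the two fits as independent (assumption)
toy_gap_hat <- as.numeric(toy_p1$fit - toy_p0$fit)
toy_gap_se  <- sqrt(as.numeric(toy_p0$se.fit)^2 + as.numeric(toy_p1$se.fit)^2)

head(data.frame(grade = toy_grid$grade, gap = toy_gap_hat, se = toy_gap_se))
```

In this construction the gap is never a parameter of either model, so its interval is assembled after the fact from two separate fits. The package's own comparison follows.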
```{r separate, cache = TRUE}
sep <- fit_separate(
  data = sim$data,
  score = "score",
  grade = "grade",
  group = "SES_group",
  school = "school",
  student = "student"
)

# Compare gap estimates side by side
cat("Joint model gap at grade 4: ",
    round(fit$gap_hat[fit$grade_grid >= 3.9][1], 3), "\n")
cat("Separate model gap at grade 4:",
    round(sep$gap_hat[sep$grade_grid >= 3.9][1], 3), "\n")

# Separate model CIs are narrower (anti-conservative)
cat("\nMean CI width - Joint (pointwise):",
    round(mean(fit$pw_upper - fit$pw_lower), 3))
cat("\nMean CI width - Separate: ",
    round(mean(sep$ci_upper - sep$ci_lower), 3), "\n")
```

---

## Non-Monotone Gap Example

The `nonmonotone` gap shape widens early, plateaus, and narrows slightly, a pattern that linear models cannot capture.

```{r nonmono, cache = TRUE, fig.cap = "Non-monotone gap trajectory."}
sim2 <- simulate_gap(n_students = 400, n_schools = 30,
                     gap_shape = "nonmonotone", seed = 99)

fit2 <- gap_trajectory(
  data = sim2$data,
  score = "score",
  grade = "grade",
  group = "SES_group",
  school = "school",
  student = "student",
  n_sim = 1000,
  verbose = FALSE
)

plot(fit2,
     true_gap = sim2$f1_fun(fit2$grade_grid),
     grade_labels = grade_labs,
     title = "Non-Monotone SES Gap: Widens Early, Plateaus, Narrows")
```

---

## Running a Benchmark Simulation

`run_simulation()` replicates the simulation study from the paper. Use `n_reps = 5` for a quick test; the paper used `n_reps = 500`.

```{r sim-study, eval = FALSE}
results <- run_simulation(
  n_reps = 5,   # increase to 500 for paper results
  n_sim = 1000  # increase to 5000 for final run
)
```

```{r sim-summary, eval = FALSE}
summarize_simulation(results)
```

---

## Using Your Own Data

To fit the model to your own data, prepare a long-format data frame with columns for:

- **outcome** (standardized test score, numeric)
- **grade/time** (numeric, e.g., 0 = kindergarten, 1 = grade 1, ...)
- **group** (binary 0/1; 0 = reference group, 1 = focal group)
- **school ID** (factor or integer)
- **student ID** (factor or integer)

```r
# Example with ECLS-K:2011 data (after loading and preparing)
fit_real <- achieve_gap(
  math_score_z ~ grade_numeric,
  group = "low_ses",
  random = ~ 1 | school_id/child_id,
  data = eclsk_clean,
  k = 6,
  n_sim = 10000
)

summary(fit_real)
plot(fit_real, grade_labels = c("K", "G1", "G2", "G3", "G4", "G5"))
test_gap(fit_real)
```

---

## Session Info

```{r session}
sessionInfo()
```