---
title: "Offline Changepoint Detection"
author: "José Mauricio Gómez Julián"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Offline Changepoint Detection}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4
)
```

## Introduction

Offline changepoint detection applies when you have a complete dataset and want to identify, retrospectively, where regime changes occurred. This is the "archaeological" approach to changepoint detection.

## When to Use Offline Detection

- Historical data analysis
- Research and scientific studies
- Batch processing of data
- When the complete dataset is available

## PELT: Pruned Exact Linear Time

PELT finds the exact optimum of a penalized cost function and, by pruning candidate changepoints, typically runs in time linear in the series length. It is a sound default for offline multiple changepoint detection:

```{r message=FALSE, warning=FALSE}
library(RegimeChange)

# Generate data with multiple changepoints
set.seed(42)
data <- c(
  rnorm(100, 0, 1),    # Regime 1
  rnorm(100, 3, 1),    # Regime 2
  rnorm(100, 1, 2),    # Regime 3
  rnorm(100, 4, 0.5)   # Regime 4
)
true_cps <- c(100, 200, 300)

# Detect with PELT
result_pelt <- detect_regimes(data, method = "pelt", penalty = "BIC")
print(result_pelt)
```

### Penalty Selection

The penalty controls the trade-off between goodness of fit and the number of changepoints:

```{r}
# Different penalties
result_bic <- detect_regimes(data, method = "pelt", penalty = "BIC")
result_aic <- detect_regimes(data, method = "pelt", penalty = "AIC")
result_mbic <- detect_regimes(data, method = "pelt", penalty = "MBIC")

cat("BIC:", result_bic$n_changepoints, "changepoints\n")
cat("AIC:", result_aic$n_changepoints, "changepoints\n")
cat("MBIC:", result_mbic$n_changepoints, "changepoints\n")
```

- **BIC**: Bayesian Information Criterion (balanced, default)
- **AIC**: Akaike Information Criterion (lighter penalty, tends to report more changepoints)
- **MBIC**: Modified BIC (heavier penalty, tends to report fewer changepoints)
- **Manual**: Pass a numeric value to set a custom penalty

### Minimum Segment Length

Use `min_segment` to prevent very short segments:

```{r}
result <- detect_regimes(data, method = "pelt", min_segment = 30)
```

## Binary Segmentation

A fast, greedy approach:

```{r}
result_binseg <- detect_regimes(data, method = "binseg", n_changepoints = 5)
print(result_binseg)
```

Binary segmentation finds changepoints by recursively splitting the series, but it does not guarantee a globally optimal segmentation.

## Wild Binary Segmentation

More robust than standard binary segmentation:

```{r}
result_wbs <- detect_regimes(data, method = "wbs", M = 100)
print(result_wbs)
```

WBS searches randomly drawn subintervals, which makes it more robust to closely spaced changepoints.

## Detecting Different Types of Changes

### Mean Changes

```{r}
result_mean <- detect_regimes(data, type = "mean")
```

### Variance Changes

```{r}
# Data with variance change
set.seed(123)
var_data <- c(rnorm(100, 0, 1), rnorm(100, 0, 3))

result_var <- detect_regimes(var_data, type = "variance")
print(result_var)
```

### Mean and Variance Changes

```{r}
result_both <- detect_regimes(data, type = "both")
```

## Visualization

```{r}
# Basic plot with changepoints
plot(result_pelt, type = "data")
```

```{r}
# Segment-colored plot
plot(result_pelt, type = "segments")
```

## Segment Analysis

Access segment information:

```{r}
# Get segment details
for (i in seq_along(result_pelt$segments)) {
  seg <- result_pelt$segments[[i]]
  cat(sprintf("Segment %d: [%d, %d] - Mean: %.2f, SD: %.2f\n",
              i, seg$start, seg$end, seg$params$mean, seg$params$sd))
}
```

## Uncertainty Quantification

Obtain bootstrap confidence intervals for the changepoint locations:

```{r}
result_ci <- detect_regimes(data, method = "pelt",
                            uncertainty = TRUE, bootstrap_reps = 100)

if (length(result_ci$confidence_intervals) > 0) {
  print(result_ci$confidence_intervals[[1]])
}
```

## Evaluation Against Ground Truth

```{r}
eval_result <- evaluate(result_pelt, true_changepoints = true_cps)
print(eval_result)
```

Key metrics:

- **Hausdorff distance**: Maximum error in changepoint location
- **F1 score**: Balance of precision and recall
- **Adjusted Rand Index**: Segmentation agreement corrected for chance

## Comparing Methods

```{r}
comparison <- compare_methods(
  data = data,
  methods = c("pelt", "binseg", "wbs"),
  true_changepoints = true_cps
)
print(comparison)
```

## Best Practices

1. **Start with PELT** using the BIC penalty
2. **Validate segment length**: use `min_segment` to avoid short segments
3. **Compare multiple methods** when stakes are high
4. **Use bootstrap confidence intervals** for critical applications
5. **Visualize results** to sanity-check detection

## Tips for Difficult Cases

### Closely Spaced Changepoints

Use WBS instead of PELT:

```{r eval=FALSE}
detect_regimes(data, method = "wbs", M = 200)
```

### Small Change Magnitudes

Lower the penalty:

```{r eval=FALSE}
detect_regimes(data, method = "pelt", penalty = "AIC")
```

### Many Changepoints

Use ensemble methods:

```{r eval=FALSE}
detect_regimes(data, method = "ensemble",
               methods = c("pelt", "wbs", "binseg"))
```
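
To build intuition for what combining detectors can look like, the sketch below merges changepoint estimates from several methods with a simple majority vote. It is an illustration in base R only: `consensus_changepoints()`, `tolerance`, and `min_votes` are hypothetical names introduced here, the input locations are made up, and nothing in it is claimed to match how the package's `"ensemble"` method works internally.

```r
# Hypothetical helper (not part of RegimeChange): merge changepoint
# estimates from several detectors by majority vote.
consensus_changepoints <- function(cp_list, tolerance = 5, min_votes = 2) {
  # Pool all reported locations and remember which method reported each
  locations <- unlist(cp_list)
  if (length(locations) == 0) return(integer(0))
  method_id <- rep(seq_along(cp_list), lengths(cp_list))

  # Sort locations, carrying the method labels along
  ord <- order(locations)
  locations <- locations[ord]
  method_id <- method_id[ord]

  # Start a new cluster whenever the gap to the previous location
  # exceeds the tolerance (single-linkage grouping along the time axis)
  cluster <- cumsum(c(TRUE, diff(locations) > tolerance))

  # Count distinct methods per cluster; summarize each cluster by its median
  votes <- tapply(method_id, cluster, function(m) length(unique(m)))
  centers <- tapply(locations, cluster, median)

  # Keep only clusters supported by at least `min_votes` methods
  as.integer(round(centers[votes >= min_votes]))
}

# Illustrative inputs only; these are made-up locations, not package output
cps <- list(
  pelt   = c(100, 200, 300),
  binseg = c(98, 203, 250, 301),
  wbs    = c(101, 199, 299)
)
consensus_changepoints(cps)
```

In this toy input, the location 250 is reported by only one method and is therefore dropped, while the three clusters near 100, 200, and 300 are each supported by all three methods. A larger `tolerance` merges more distant estimates into one cluster, and a larger `min_votes` makes the consensus stricter.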