--- title: "Bayesian Changepoint Detection" author: "José Mauricio Gómez Julián" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Bayesian Changepoint Detection} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4 ) ``` ## Introduction Bayesian changepoint detection offers several advantages over frequentist methods: 1. **Probabilistic uncertainty**: Full posterior distributions over changepoint locations 2. **Prior information**: Incorporate domain knowledge 3. **Natural online updates**: Sequential updating as new data arrives 4. **Existence probability**: Probability that a changepoint exists at each location ## BOCPD: Bayesian Online Changepoint Detection BOCPD (Adams & MacKay, 2007) is the flagship Bayesian method: ```{r message=FALSE, warning=FALSE} library(RegimeChange) # Generate example data set.seed(123) data <- c(rnorm(100, 0, 1), rnorm(100, 3, 1)) # Basic BOCPD result <- detect_regimes(data, method = "bocpd") print(result) ``` ## Prior Specification ### Normal-Gamma Prior For unknown mean and variance (most common case): ```{r} # Define prior prior <- normal_gamma( mu0 = 0, # Prior mean for the mean kappa0 = 1, # Prior precision for the mean alpha0 = 1, # Shape for inverse-gamma on variance beta0 = 1 # Rate for inverse-gamma on variance ) print(prior) ``` Using the prior: ```{r} result <- detect_regimes(data, method = "bocpd", prior = prior) ``` ### Normal Prior with Known Variance When variance is known: ```{r} prior_known <- normal_known_var( mu0 = 0, # Prior mean sigma0 = 1, # Prior standard deviation for mean known_var = 1 # Known variance ) result <- detect_regimes(data, method = "bocpd", prior = prior_known) ``` ### Poisson-Gamma Prior For count data: ```{r} prior_poisson <- poisson_gamma(alpha0 = 1, beta0 = 1) ``` ## Hazard Prior The hazard prior controls the expected frequency of changepoints: ### Geometric Hazard Constant hazard rate: ```{r} # Expected run length = 1/lambda = 100 hazard <- geometric_hazard(lambda = 0.01) print(hazard) ``` ### Constant Hazard Fixed probability per time step: ```{r} hazard_const <- constant_hazard(lambda = 0.05) ``` ### Negative Binomial Hazard For overdispersed run lengths: ```{r} hazard_nb <- negbin_hazard(r = 5, p = 0.1) ``` ## Posterior Analysis ### Probability of Change at Each Point ```{r} result <- detect_regimes(data, method = "bocpd") # Access changepoint probabilities prob_change <- result$prob_change # Plot posterior probability plot(prob_change, type = "l", main = "Posterior Probability of Changepoint", xlab = "Time", ylab = "P(changepoint)") abline(v = 100, col = "red", lty = 2) ``` ### Posterior Visualization ```{r} plot(result, type = "posterior") ``` ### Run Length Distribution ```{r} plot(result, type = "runlength") ``` ## Shiryaev-Roberts Procedure An alternative Bayesian approach optimal for minimizing detection delay: ```{r} result_sr <- detect_regimes(data, method = "shiryaev", mu0 = 0, # Pre-change mean mu1 = 3, # Post-change mean sigma = 1, # Known standard deviation threshold = 100) print(result_sr) ``` The Shiryaev-Roberts statistic: ```{r} plot(result_sr, type = "diagnostic") ``` ## Uncertainty Quantification ### Credible Intervals BOCPD provides natural credible intervals from the posterior: ```{r} # Highest posterior density interval if (length(result$changepoints) > 0) { cat("Changepoint estimate:", result$changepoints[1], "\n") # Find 95% credible interval prob <- result$prob_change cumprob <- cumsum(prob) / sum(prob) lower <- which(cumprob >= 0.025)[1] upper <- which(cumprob >= 0.975)[1] cat("95% Credible interval: [", lower, ",", upper, "]\n") } ``` ### Existence Probability Probability that at least one changepoint exists: ```{r} existence_prob <- max(result$prob_change) cat("Maximum changepoint probability:", round(existence_prob, 3), "\n") ``` ## Online Mode BOCPD is naturally suited for online detection: ```{r} # Create online detector detector <- regime_detector( method = "bocpd", prior = normal_gamma(), hazard = geometric_hazard(0.01), threshold = 0.5 ) # Simulate online processing set.seed(123) stream <- c(rnorm(100, 0, 1), rnorm(50, 3, 1)) alarm_time <- NULL for (i in seq_along(stream)) { detector <- update(detector, stream[i]) if (isTRUE(detector$last_result$alarm) && is.null(alarm_time)) { alarm_time <- i cat("Alarm at observation:", i, "\n") cat("Probability of change:", round(detector$last_result$prob_change, 3), "\n") break } } ``` ## Multivariate BOCPD For multivariate time series: ```{r} # Generate bivariate data set.seed(42) n <- 200 cp <- 100 data_mv <- rbind( matrix(rnorm(cp * 2, 0), ncol = 2), matrix(rnorm((n - cp) * 2, 2), ncol = 2) ) # Normal-Wishart prior for multivariate data prior_mv <- normal_wishart( mu0 = c(0, 0), kappa0 = 1, nu0 = 4, Psi0 = diag(2) ) result_mv <- detect_regimes(data_mv, method = "bocpd", prior = prior_mv) print(result_mv) ``` ## Comparison: BOCPD vs Frequentist ```{r} # Compare BOCPD with PELT comparison <- compare_methods( data = data, methods = c("bocpd", "pelt"), true_changepoints = 100 ) print(comparison) ``` ## Advantages and Limitations ### Advantages - Full posterior uncertainty quantification - Natural sequential updates - Principled prior incorporation - Works well with limited data ### Limitations - Requires prior specification - Computationally more intensive - Sensitive to prior misspecification ## Best Practices 1. **Choose appropriate priors**: Use domain knowledge when available 2. **Calibrate hazard rate**: Set based on expected changepoint frequency 3. **Check sensitivity**: Try different priors to assess robustness 4. **Use truncation**: For long series, truncate run length for efficiency 5. **Visualize posterior**: Always examine the full posterior distribution ## References Adams, R. P., & MacKay, D. J. C. (2007). Bayesian online changepoint detection. arXiv preprint arXiv:0710.3742. Tartakovsky, A. G., & Moustakides, G. V. (2010). State-of-the-art in Bayesian changepoint detection. Sequential Analysis, 29(2), 125-145.