---
title: "Introduction to Dormancy: Detecting Hidden Patterns in Data"
author: "Dany Mukesha"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to Dormancy: Detecting Hidden Patterns in Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## Overview

The **dormancy** package introduces a novel framework for detecting and analyzing *dormant patterns* in multivariate data. Unlike traditional pattern detection methods that focus on currently active relationships, dormancy identifies statistical patterns that exist but remain inactive until specific trigger conditions emerge.

### What are Dormant Patterns?

A dormant pattern is a genuine statistical relationship that is currently suppressed by prevailing conditions in your data. Consider these real-world analogies:

- **Biological dormancy**: Seeds can remain dormant for years, only germinating when conditions (temperature, moisture) are right
- **Geological dormancy**: Fault lines can be dormant for centuries before triggering earthquakes
- **Epidemiological latency**: Pathogens can exist in a latent state before becoming active

In data analysis, dormant patterns manifest as:

1. Relationships that are strong in specific data regions but weak overall
2. Correlations that only emerge when certain threshold conditions are met
3. Patterns hidden by confounding variables that, when controlled for, reveal strong associations

### Why Dormancy Matters

Traditional correlation analysis can miss dormant patterns because:

- Aggregate statistics mask conditional relationships
- Weak overall correlations may hide strong local correlations
- Threshold effects create piecewise relationships
- Phase-dependent patterns vary across the data space

The dormancy package addresses these limitations with specialized detection methods.

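The first two points in the list above can be illustrated without the package at all. The following is a minimal base R sketch, not part of the dormancy API: the variable `group` and the 20% activation rate are assumptions chosen purely for illustration. It compares the correlation over all rows with the correlation inside each subgroup.

```r
set.seed(1)
n <- 1000

x <- rnorm(n)
# Hypothetical switch: the relationship is active in roughly 20% of rows
group <- rbinom(n, 1, 0.2)

# y follows x only in the active rows; elsewhere it is pure noise
y <- ifelse(group == 1, 0.8 * x + rnorm(n, sd = 0.3), rnorm(n))

# Aggregate correlation versus correlation within each subgroup
r_overall  <- cor(x, y)
r_active   <- cor(x[group == 1], y[group == 1])
r_inactive <- cor(x[group == 0], y[group == 0])

round(c(overall = r_overall, active = r_active, inactive = r_inactive), 2)
```

The aggregate figure is pulled toward zero by the inactive rows, while the subgroup figure recovers the strong relationship. This is the gap that conditional analysis is meant to expose.
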
## Getting Started

```{r setup}
library(dormancy)
```

### Basic Detection

Let's create a simple example with a dormant pattern:

```{r basic-example}
set.seed(42)
n <- 500

# Create variables
x <- rnorm(n)
condition <- sample(c(0, 1), n, replace = TRUE)

# Relationship between x and y only exists when condition == 1
y <- ifelse(condition == 1,
            0.8 * x + rnorm(n, 0, 0.3),  # Strong relationship
            rnorm(n))                    # No relationship

data <- data.frame(x = x, y = y, condition = factor(condition))

# Overall correlation is diluted by the rows where condition == 0
cor(data$x, data$y)
```

The overall correlation understates the strong relationship that exists when `condition == 1`, because the unrelated rows pull it toward zero. Let's detect the dormant pattern:

```{r detect}
result <- dormancy_detect(data, method = "conditional", verbose = TRUE)
print(result)
```

The detection reveals that the relationship between x and y is dormant: it only becomes active when `condition == 1`.

## Detection Methods

The package provides four complementary detection methods:

### 1. Conditional Detection

Identifies patterns that are conditionally suppressed, that is, active only under specific conditions.

```{r conditional}
result_cond <- dormancy_detect(data, method = "conditional")
```

This method works by segmenting the data space and comparing relationship strengths across segments.

### 2. Threshold Detection

Identifies patterns that emerge when variables cross specific thresholds.

```{r threshold}
set.seed(123)
n <- 400
x <- runif(n, -2, 2)

# Relationship only emerges for extreme values of x
y <- ifelse(abs(x) > 1,
            0.9 * sign(x) + rnorm(n, 0, 0.2),
            rnorm(n, 0, 0.5))

data_thresh <- data.frame(x = x, y = y)

result_thresh <- dormancy_detect(data_thresh, method = "threshold")
print(result_thresh)
```

### 3. Phase Detection

Identifies patterns that exist in specific phase regions of the data space.

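One way to picture a phase region is to assign each observation an angle with `atan2(y, x)`, bin the angles into sectors, and compute the correlation inside each sector. The sketch below uses base R only and illustrates the idea; it is not the algorithm behind `method = "phase"`, and the noisy-circle data, the four-sector split, and the names `sector` and `r_by_sector` are assumptions made for this example.

```r
set.seed(2)
n <- 2000

# Points scattered around the unit circle
theta <- runif(n, 0, 2 * pi)
x <- cos(theta) + rnorm(n, sd = 0.1)
y <- sin(theta) + rnorm(n, sd = 0.1)

# Phase angle of each observation, in [-pi, pi]
phase <- atan2(y, x)

# Split the phase range into four equal angular sectors
sector <- cut(phase, breaks = seq(-pi, pi, length.out = 5), include.lowest = TRUE)

# Correlation over all points versus correlation within each sector
r_overall <- cor(x, y)
r_by_sector <- sapply(split(data.frame(x, y), sector), function(d) cor(d$x, d$y))

round(r_overall, 2)
round(r_by_sector, 2)
```

Across the whole circle the correlation is close to zero, yet each sector carries a strong correlation whose sign alternates from one sector to the next. A purely aggregate view would miss this structure entirely.
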

```{r phase}
set.seed(456)
n <- 300
t <- seq(0, 4*pi, length.out = n)
x <- sin(t) + rnorm(n, 0, 0.2)
y <- cos(t) + rnorm(n, 0, 0.2)

# Add relationship only in certain phases
phase <- atan2(y, x)
y <- ifelse(phase > 0 & phase < pi/2,
            0.7 * x + rnorm(n, 0, 0.2),
            y)

data_phase <- data.frame(x = x, y = y)

result_phase <- dormancy_detect(data_phase, method = "phase")
```

### 4. Cascade Detection

Identifies patterns that could trigger chain reactions through other variables.

```{r cascade}
set.seed(789)
n <- 300
x <- rnorm(n)
y <- rnorm(n)
z <- rnorm(n)

# Relationship x-y emerges when z is extreme
y <- ifelse(abs(z) > 1.5,
            0.8 * x + rnorm(n, 0, 0.2),
            y)

data_cascade <- data.frame(x = x, y = y, z = z)

result_cascade <- dormancy_detect(data_cascade, method = "cascade")
```

## Trigger Analysis

Once dormant patterns are detected, analyze their trigger conditions:

```{r trigger}
set.seed(42)
n <- 500
x <- rnorm(n)
z <- sample(c(0, 1), n, replace = TRUE)
y <- ifelse(z == 1, 0.8 * x + rnorm(n, 0, 0.3), rnorm(n))
data <- data.frame(x = x, y = y, z = factor(z))

result <- dormancy_detect(data, method = "conditional")

if (nrow(result$patterns) > 0) {
  triggers <- dormancy_trigger(result, sensitivity = 0.5, n_bootstrap = 50)
  print(triggers)
}
```

The trigger analysis provides:

- Specific trigger conditions for each pattern
- Confidence intervals via bootstrap
- Activation probabilities
- Actionable recommendations

## Depth Measurement

Measure how "deeply asleep" each dormant pattern is:

```{r depth}
if (nrow(result$patterns) > 0) {
  depths <- dormancy_depth(result, method = "combined")
  print(depths)
}
```

Depth measurements help prioritize monitoring:

- **Shallow dormancy**: Easily activated, needs immediate attention
- **Deep dormancy**: Requires significant changes to activate, lower priority

## Risk Assessment

Evaluate the risk associated with dormant patterns:

```{r risk}
if (nrow(result$patterns) > 0) {
  risk <- dormancy_risk(result, time_horizon = 1, risk_tolerance = 0.3)
  print(risk)
}
```

Risk assessment considers:

- Activation probability
- Impact magnitude
- Cascade potential
- Uncertainty in estimates

## Awakening Simulation

Simulate what would happen if a dormant pattern activated:

```{r awaken}
if (nrow(result$patterns) > 0) {
  awakening <- awaken(result, pattern_id = 1, intensity = 1, n_sim = 100)
  print(awakening)
}
```

This is valuable for:

- Scenario planning
- Stress testing
- Understanding potential system behaviors

## Hibernation Analysis

Identify patterns that were once active but have become dormant:

```{r hibernate}
set.seed(42)
n <- 400
time <- 1:n
x <- rnorm(n)

# Relationship that fades over time
effect_strength <- exp(-time / 150)
y <- effect_strength * 0.8 * x + (1 - effect_strength) * rnorm(n)

data_time <- data.frame(time = time, x = x, y = y)

hib <- hibernate(data_time, time_var = "time", window_size = 0.15)
print(hib)
```

## Scouting for Potential

Map the data space to find regions where dormant patterns might emerge:

```{r scout}
set.seed(42)
n <- 300
x <- rnorm(n)
y <- rnorm(n)

# Create a region with hidden pattern
mask <- x > 1 & y > 1
y[mask] <- 0.9 * x[mask] + rnorm(sum(mask), 0, 0.1)

data_scout <- data.frame(x = x, y = y)

scout <- dormancy_scout(data_scout, grid_resolution = 15, scout_method = "density")
print(scout)
```

## Visualization

Plot the results:

```{r plot, eval=FALSE}
# Overview plot
dormancy:::plot.dormancy(result, type = "overview")

# Pattern network
dormancy:::plot.dormancy(result, type = "patterns")

# Risk-focused
dormancy:::plot.dormancy(result, type = "risk")

# Scout map
dormancy:::plot.dormancy_map(scout)
```

```{r example_visual}
dormancy:::plot.dormancy_map(scout)
```

## Practical Applications

### Financial Risk Analysis

Detect dormant correlations that could activate during market stress:

```r
# Assume financial_data has returns for multiple assets
result <- dormancy_detect(financial_data, method = "threshold")
risk <- dormancy_risk(result, time_horizon = 30)  # 30-day horizon
```

### Quality Control

Find patterns that only emerge under certain production conditions:

```r
# Assume production_data with process variables
result <- dormancy_detect(production_data, method = "conditional")
triggers <- dormancy_trigger(result)
```

### Environmental Monitoring

Identify dormant patterns that could signal ecological shifts:

```r
# Assume environmental_data with sensor readings
scout <- dormancy_scout(environmental_data)
hib <- hibernate(environmental_data, time_var = "date")
```

## Summary

The dormancy package provides a comprehensive toolkit for:

1. **Detection**: Four methods for finding dormant patterns
2. **Characterization**: Understand trigger conditions and dormancy depth
3. **Risk Assessment**: Quantify activation risk and potential impact
4. **Simulation**: Explore "what-if" scenarios
5. **Monitoring**: Track patterns that have hibernated or scout for future risks

By focusing on patterns that *could* become active rather than just currently active patterns, you gain insights into latent risks and opportunities that traditional methods miss.