---
title: "Co-occurrence Networks with Nestimate"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Co-occurrence Networks with Nestimate}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.alt = "Network visualization"
)
```

## Introduction

Co-occurrence networks analyze **binary indicator data**---situations where multiple states can be active (1) or inactive (0) simultaneously. They differ from transition networks in what they capture: co-occurrence methods model **contemporaneous relationships** (which states tend to occur together?), whereas transition methods model sequences (which states follow one another?).

Nestimate provides two methods for binary data:

- Simple **co-occurrence networks** (`method = "co_occurrence"` or `"cna"`) count how often pairs of states are both active at the same time point
- **Ising networks** (`method = "ising"`) use L1-regularized logistic regression to estimate conditional dependencies, producing sparse networks

This vignette demonstrates both methods using the `learning_activities` dataset---binary indicators of 6 learning activities across 200 students and 30 time points.

## Data

The `learning_activities` dataset contains 6,000 observations (200 students x 30 time points). At each time point, each of 6 learning activities is either active (1) or inactive (0).

```{r data}
library(Nestimate)
data(learning_activities)
head(learning_activities, 10)
```

The 6 activities are:

```{r activities}
activities <- c("Reading", "Video", "Forum", "Quiz", "Coding", "Review")
colSums(learning_activities[, activities])
```

Multiple activities can be active simultaneously:

```{r concurrent}
# How many activities are active at each time point?
n_active <- rowSums(learning_activities[, activities])
table(n_active)
```

## Co-occurrence Network

Co-occurrence networks count how often pairs of states are **both active** at the same time point. An edge between A and B indicates that they frequently co-occur.

### Basic Co-occurrence

```{r cna}
net_cna <- build_network(learning_activities,
                         method = "co_occurrence",
                         codes = activities,
                         actor = "student")
net_cna
```

**Interpretation**: Edge weights represent raw co-occurrence counts summed across all students and time points. Higher weights mean those activities frequently happen together.

### Normalized Co-occurrence

Raw counts can be misleading---frequent activities will have high co-occurrence simply because they're common. Normalizing by expected co-occurrence (under independence) reveals associations beyond base rates.

```{r cna-weights}
# View the co-occurrence matrix
round(net_cna$weights, 0)
```

The diagonal shows how often each activity occurs (self-co-occurrence = frequency).

### Windowed Co-occurrence

You can aggregate across time windows before computing co-occurrence. This captures activities that occur in the same temporal neighborhood, not just at the exact same time point:

```{r cna-window}
net_cna_windowed <- build_network(learning_activities,
                                  method = "co_occurrence",
                                  codes = activities,
                                  actor = "student",
                                  window_size = 10)
net_cna_windowed
```

Larger windows capture broader temporal associations at the cost of temporal precision.

## Ising Network

Ising networks use **L1-regularized logistic regression** to estimate conditional dependencies between binary variables. Each variable is regressed on all others, and the resulting coefficients form the network edges.
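To make the nodewise regression structure concrete, here is a base-R sketch on a made-up binary matrix `X` (the names `X` and `B` are illustrative, not Nestimate internals). It uses unpenalized `glm()` purely to show the "each variable regressed on all others" structure; the actual Ising estimation adds an L1 penalty via glmnet:

```{r nodewise-sketch}
# Illustrative sketch of nodewise estimation on a toy binary matrix.
# Plain glm() stands in for glmnet's L1-penalized fit here, so these
# coefficients are not shrunk to zero the way real Ising edges are.
set.seed(1)
X <- matrix(rbinom(200 * 4, 1, 0.4), ncol = 4,
            dimnames = list(NULL, c("A", "B", "C", "D")))

p <- ncol(X)
B <- matrix(0, p, p, dimnames = list(colnames(X), colnames(X)))

for (j in seq_len(p)) {
  # Regress state j on all remaining states
  fit <- glm(X[, j] ~ X[, -j], family = binomial())
  B[j, -j] <- coef(fit)[-1]  # drop the intercept
}

# B[j, k] is the log-odds coefficient for state k when predicting state j;
# note that B is generally asymmetric (B[j, k] != B[k, j]).
round(B, 2)
```

Because each row of `B` comes from a different regression, the matrix is asymmetric, which is why the symmetrization rules discussed later are needed.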
Key advantages over simple co-occurrence:

- **Sparsity**: L1 regularization shrinks weak associations to exactly zero
- **Conditional**: Edges represent direct relationships, controlling for other variables
- **Interpretable**: Edge weights reflect log-odds of co-activation

### Basic Ising Network

```{r ising-check, include = FALSE}
has_glmnet <- requireNamespace("glmnet", quietly = TRUE)
```

```{r ising, eval = has_glmnet}
# Aggregate to student-level summaries for Ising (requires cross-sectional data)
student_summary <- aggregate(learning_activities[, activities],
                             by = list(student = learning_activities$student),
                             FUN = function(x) as.integer(mean(x) > 0.5))
student_summary <- student_summary[, -1]  # Remove student column

net_ising <- build_network(student_summary,
                           method = "ising",
                           params = list(gamma = 0.25))
net_ising
```

```{r ising-alt, eval = !has_glmnet, echo = FALSE}
cat("Ising networks require the 'glmnet' package.\n")
cat("Install with: install.packages('glmnet')\n")
```

### Ising Parameters

The `gamma` parameter controls sparsity via EBIC model selection:

- `gamma = 0`: Less sparse (BIC-like selection)
- `gamma = 0.25`: Moderate sparsity (default)
- `gamma = 0.5`: More sparse

```{r ising-sparse, eval = has_glmnet}
net_ising_sparse <- build_network(student_summary,
                                  method = "ising",
                                  params = list(gamma = 0.5))

# Compare edge counts
cat("Gamma = 0.25:", sum(net_ising$weights != 0), "edges\n")
cat("Gamma = 0.50:", sum(net_ising_sparse$weights != 0), "edges\n")
```

### Symmetrization Rules

Ising estimation produces asymmetric coefficients (A predicting B may differ from B predicting A).
The `rule` parameter controls symmetrization:

- `"AND"` (default): Keep an edge only if both directions are non-zero
- `"OR"`: Keep an edge if either direction is non-zero

```{r ising-rules, eval = has_glmnet}
net_and <- build_network(student_summary,
                         method = "ising",
                         params = list(gamma = 0.25, rule = "AND"))
net_or <- build_network(student_summary,
                        method = "ising",
                        params = list(gamma = 0.25, rule = "OR"))

cat("AND rule edges:", sum(net_and$weights != 0), "\n")
cat("OR rule edges:", sum(net_or$weights != 0), "\n")
```

The `"AND"` rule is more conservative; `"OR"` retains more edges.
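The two rules can be sketched in base R on a toy asymmetric coefficient matrix. The names `B`, `avg`, `w_and`, and `w_or` are illustrative, and the symmetric average used to combine the two directed coefficients is one common convention; Nestimate's exact combination may differ:

```{r rule-sketch}
# Toy asymmetric nodewise coefficient matrix: B[j, k] is the coefficient
# for state k when predicting state j. Zeros mark directions the penalty
# dropped.
B <- matrix(c(0.0, 0.8, 0.0,
              0.5, 0.0, 0.3,
              0.0, 0.0, 0.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

avg <- (B + t(B)) / 2  # symmetric average of the two directed coefficients

# AND rule: keep an edge only when both directed coefficients are non-zero
w_and <- avg * (B != 0 & t(B) != 0)

# OR rule: keep an edge when either directed coefficient is non-zero
w_or <- avg * (B != 0 | t(B) != 0)

sum(w_and[upper.tri(w_and)] != 0)  # 1 edge survives AND (A--B)
sum(w_or[upper.tri(w_or)] != 0)    # 2 edges survive OR (A--B and B--C)
```

Under the AND rule a single spurious direction cannot create an edge, which is why it yields sparser networks than OR.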