---
title: "Co-occurrence Networks with Nestimate"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Co-occurrence Networks with Nestimate}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.alt = "Network visualization"
)
```

## Introduction

Co-occurrence networks analyze **binary indicator data**---situations where multiple states can be active (1) or inactive (0) simultaneously. They differ from transition networks in what they capture: co-occurrence methods model **contemporaneous relationships** (which states tend to occur together?), whereas transition methods model sequences (which states follow one another?).

Nestimate provides two methods for binary data:

- Simple **co-occurrence networks** (`method = "co_occurrence"` or `"cna"`) count how often pairs of states are both active at the same time point
- **Ising networks** (`method = "ising"`) use L1-regularized logistic regression to estimate conditional dependencies, producing sparse networks

This vignette demonstrates both methods using the `learning_activities` dataset---binary indicators of 6 learning activities across 200 students and 30 time points.

## Data

The `learning_activities` dataset contains 6,000 observations (200 students x 30 time points). At each time point, each of 6 learning activities is either active (1) or inactive (0).

```{r data}
library(Nestimate)
data(learning_activities)
head(learning_activities, 10)
```

The 6 activities are:

```{r activities}
activities <- c("Reading", "Video", "Forum", "Quiz", "Coding", "Review")
colSums(learning_activities[, activities])
```

Multiple activities can be active simultaneously:

```{r concurrent}
# How many activities are active at each time point?
n_active <- rowSums(learning_activities[, activities])
table(n_active)
```

## Co-occurrence Network

Co-occurrence networks count how often pairs of states are **both active** at the same time point. An edge between A and B indicates that they frequently co-occur.

### Basic Co-occurrence

```{r cna}
net_cna <- build_network(learning_activities,
                         method = "co_occurrence",
                         codes = activities,
                         actor = "student")
net_cna
```

**Interpretation**: Edge weights represent raw co-occurrence counts summed across all students and time points. Higher weights mean those activities frequently happen together.

### Normalized Co-occurrence

Raw counts can be misleading---frequent activities will have high co-occurrence simply because they're common. Normalizing by expected co-occurrence (under independence) reveals associations beyond base rates.

```{r cna-weights}
# View the co-occurrence matrix
round(net_cna$weights, 0)
```

The diagonal shows how often each activity occurs (self-co-occurrence = frequency).

### Windowed Co-occurrence

You can aggregate across time windows before computing co-occurrence. This captures activities that occur in the same temporal neighborhood, not just at the exact same time point:

```{r cna-window}
net_cna_windowed <- build_network(learning_activities,
                                  method = "co_occurrence",
                                  codes = activities,
                                  actor = "student",
                                  window_size = 10)
net_cna_windowed
```

Larger windows capture broader temporal associations at the cost of temporal precision.

## Ising Network

Ising networks use **L1-regularized logistic regression** to estimate conditional dependencies between binary variables. Each variable is regressed on all others, and the resulting coefficients form the network edges.
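To make the nodewise regression structure concrete, here is a base-R sketch on a made-up binary matrix `X` (the names `X` and `B` are illustrative, not Nestimate internals). It uses unpenalized `glm()` purely to show the "each variable regressed on all others" structure; the actual Ising estimation adds an L1 penalty via glmnet:

```{r nodewise-sketch}
# Illustrative sketch of nodewise estimation on a toy binary matrix.
# Plain glm() stands in for glmnet's L1-penalized fit here, so these
# coefficients are not shrunk to zero the way real Ising edges are.
set.seed(1)
X <- matrix(rbinom(200 * 4, 1, 0.4), ncol = 4,
            dimnames = list(NULL, c("A", "B", "C", "D")))

p <- ncol(X)
B <- matrix(0, p, p, dimnames = list(colnames(X), colnames(X)))

for (j in seq_len(p)) {
  # Regress state j on all remaining states
  fit <- glm(X[, j] ~ X[, -j], family = binomial())
  B[j, -j] <- coef(fit)[-1]  # drop the intercept
}

# B[j, k] is the log-odds coefficient for state k when predicting state j;
# note that B is generally asymmetric (B[j, k] != B[k, j]).
round(B, 2)
```

Because each row of `B` comes from a different regression, the matrix is asymmetric, which is why the symmetrization rules discussed later are needed.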
Key advantages over simple co-occurrence:

- **Sparsity**: L1 regularization shrinks weak associations to exactly zero
- **Conditional**: Edges represent direct relationships, controlling for other variables
- **Interpretable**: Edge weights reflect log-odds of co-activation

### Basic Ising Network

```{r ising-check, include = FALSE}
has_glmnet <- requireNamespace("glmnet", quietly = TRUE)
```

```{r ising, eval = has_glmnet}
# Aggregate to student-level summaries for Ising (requires cross-sectional data)
student_summary <- aggregate(learning_activities[, activities],
                             by = list(student = learning_activities$student),
                             FUN = function(x) as.integer(mean(x) > 0.5))
student_summary <- student_summary[, -1]  # Remove student column

net_ising <- build_network(student_summary,
                           method = "ising",
                           params = list(gamma = 0.25))
net_ising
```

```{r ising-alt, eval = !has_glmnet, echo = FALSE}
cat("Ising networks require the 'glmnet' package.\n")
cat("Install with: install.packages('glmnet')\n")
```

### Ising Parameters

The `gamma` parameter controls sparsity via EBIC model selection:

- `gamma = 0`: Less sparse (BIC-like selection)
- `gamma = 0.25`: Moderate sparsity (default)
- `gamma = 0.5`: More sparse

```{r ising-sparse, eval = has_glmnet}
net_ising_sparse <- build_network(student_summary,
                                  method = "ising",
                                  params = list(gamma = 0.5))

# Compare edge counts
cat("Gamma = 0.25:", sum(net_ising$weights != 0), "edges\n")
cat("Gamma = 0.50:", sum(net_ising_sparse$weights != 0), "edges\n")
```

### Symmetrization Rules

Ising estimation produces asymmetric coefficients (A predicting B may differ from B predicting A).
The `rule` parameter controls symmetrization:

- `"AND"` (default): Keep an edge only if both directions are non-zero
- `"OR"`: Keep an edge if either direction is non-zero

```{r ising-rules, eval = has_glmnet}
net_and <- build_network(student_summary,
                         method = "ising",
                         params = list(gamma = 0.25, rule = "AND"))
net_or <- build_network(student_summary,
                        method = "ising",
                        params = list(gamma = 0.25, rule = "OR"))

cat("AND rule edges:", sum(net_and$weights != 0), "\n")
cat("OR rule edges:", sum(net_or$weights != 0), "\n")
```

The `"AND"` rule is more conservative; `"OR"` retains more edges.
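The two rules can be sketched in base R on a toy asymmetric coefficient matrix. The names `B`, `avg`, `w_and`, and `w_or` are illustrative, and the symmetric average used to combine the two directed coefficients is one common convention; Nestimate's exact combination may differ:

```{r rule-sketch}
# Toy asymmetric nodewise coefficient matrix: B[j, k] is the coefficient
# for state k when predicting state j. Zeros mark directions the penalty
# dropped.
B <- matrix(c(0.0, 0.8, 0.0,
              0.5, 0.0, 0.3,
              0.0, 0.0, 0.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

avg <- (B + t(B)) / 2  # symmetric average of the two directed coefficients

# AND rule: keep an edge only when both directed coefficients are non-zero
w_and <- avg * (B != 0 & t(B) != 0)

# OR rule: keep an edge when either directed coefficient is non-zero
w_or <- avg * (B != 0 | t(B) != 0)

sum(w_and[upper.tri(w_and)] != 0)  # 1 edge survives AND (A--B)
sum(w_or[upper.tri(w_or)] != 0)    # 2 edges survive OR (A--B and B--C)
```

Under the AND rule a single spurious direction cannot create an edge, which is why it yields sparser networks than OR.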