---
title: "Link Prediction and Association Rules"
output:
  rmarkdown::html_vignette:
    df_print: kable
vignette: >
  %\VignetteIndexEntry{Link Prediction and Association Rules}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.alt = "Visualization",
  warning = FALSE,
  message = FALSE
)
```

# Introduction

Nestimate provides two complementary tools for discovering hidden structure in networks:

- **Link prediction** --- which transitions are structurally expected but missing?
- **Association rules** --- which activities co-occur within sessions?

Both integrate with `cograph::plot_simplicial()` via the `pathways()` method.

# Data

We use the bundled `group_regulation_long` dataset: self-regulated learning sequences from 2000 students, grouped by `Achiever` (High/Low).

```{r data}
library(Nestimate)
data(group_regulation_long)
head(group_regulation_long)
```

```{r network}
net <- build_network(
  group_regulation_long, method = "relative",
  actor = "Actor", action = "Action", time = "Time"
)
net
```
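For intuition, here is a base-R sketch of a `relative` network. It assumes that `method = "relative"` corresponds to row-normalized transition counts, so each row holds the probabilities of the next action. The toy sequences are made up, and this is not Nestimate's implementation:

```r
# Made-up action sequences (one per student)
seqs <- list(
  c("plan", "monitor", "adapt", "plan"),
  c("plan", "adapt", "monitor"),
  c("monitor", "plan", "adapt")
)
states <- sort(unique(unlist(seqs)))
counts <- matrix(0, length(states), length(states),
                 dimnames = list(from = states, to = states))
# Count each consecutive transition s[i] -> s[i + 1]
for (s in seqs) {
  for (i in seq_len(length(s) - 1)) {
    counts[s[i], s[i + 1]] <- counts[s[i], s[i + 1]] + 1
  }
}
# Row-normalize so each row holds the probabilities of the next action
probs <- counts / rowSums(counts)
round(probs, 2)
```

Each row of `probs` sums to 1. For example, `plan` is followed by `adapt` in two of its three transitions.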

---

# Link Prediction

## Running predictions

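`predict_links()` runs all six methods by default and combines their rankings into a consensus. Setting `exclude_existing = FALSE` also scores pairs that are already connected, which is useful on a dense network where few pairs are missing: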

```{r predict}
pred <- predict_links(net, exclude_existing = FALSE)
pred
```

The consensus ranking averages each pair's rank across all methods, so pairs that every method ranks highly rise to the top:

```{r consensus}
head(pred$consensus, 10)
```
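The averaging step can be sketched in base R. The scores below are made up, and details such as tie handling (here `ties.method = "average"`) are assumptions rather than Nestimate's exact behavior:

```r
# Made-up scores from three methods for three candidate pairs
scores <- data.frame(
  pair    = c("plan->adapt", "monitor->plan", "adapt->discuss"),
  katz    = c(0.42, 0.31, 0.10),
  jaccard = c(0.30, 0.60, 0.20),
  ra      = c(0.90, 0.50, 0.70)
)
# Rank within each method (1 = highest score), then average across methods
ranks <- sapply(scores[, -1], function(x) rank(-x, ties.method = "average"))
scores$mean_rank <- rowMeans(ranks)
scores[order(scores$mean_rank), c("pair", "mean_rank")]
```

Here `plan->adapt` comes first with a mean rank of 4/3, even though it tops only two of the three methods.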

## Choosing a method

You can also run a single method:

```{r single-method}
pred_katz <- predict_links(net, methods = "katz", exclude_existing = FALSE)
pred_katz
```

| Method | Idea | Best for |
|--------|------|----------|
| `common_neighbors` | Shared out- and in-neighbors | Dense local structure |
| `resource_allocation` | Shared neighbors weighted by 1/degree | Hub-dominated networks |
| `adamic_adar` | Shared neighbors weighted by 1/log(degree) | General-purpose |
| `jaccard` | Shared neighbors / union of neighbors | Cross-network comparison |
| `preferential_attachment` | out-degree(i) × in-degree(j) | Scale-free networks |
| `katz` | All paths, down-weighted exponentially by length | Global structure |
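To make the table concrete, the sketch below computes each score on a toy directed graph in base R. It assumes that the common neighbors of (i, j) are the intermediaries k with i -> k and k -> j, that resource allocation and Adamic-Adar weight each k by its total (in + out) degree, and that Katz uses a decay factor `beta` smaller than 1 over the spectral radius. Nestimate's own definitions for directed, weighted networks may differ:

```r
# Toy directed graph (hypothetical): a -> b, a -> c, b -> d, c -> d, d -> a
nodes <- c("a", "b", "c", "d")
A <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
A["a", c("b", "c")] <- 1
A[c("b", "c"), "d"] <- 1
A["d", "a"] <- 1

out_deg <- rowSums(A)
in_deg  <- colSums(A)
deg     <- out_deg + in_deg  # assumption: total degree weights intermediaries

# Common neighbors: number of intermediaries k with i -> k and k -> j
cn <- A %*% A
# Resource allocation (1/degree) and Adamic-Adar (1/log(degree)) weight each k
ra <- A %*% diag(ifelse(deg > 0, 1 / deg, 0)) %*% A
aa <- A %*% diag(ifelse(deg > 1, 1 / log(deg), 0)) %*% A
# Jaccard: shared intermediaries over the union of out(i) and in(j)
union_size <- outer(out_deg, in_deg, "+") - cn
jac <- ifelse(union_size > 0, cn / union_size, 0)
# Preferential attachment: out-degree(i) x in-degree(j)
pa <- outer(out_deg, in_deg)
# Katz: sum over l of beta^l A^l = (I - beta A)^-1 - I, needs beta < 1 / spectral radius
beta <- 0.1
katz <- solve(diag(4) - beta * A) - diag(4)

c(cn = cn["a", "d"], ra = ra["a", "d"], aa = aa["a", "d"],
  jaccard = jac["a", "d"], pa = pa["a", "d"], katz = katz[1, 4])
```

For the pair (a, d), both b and c bridge a to d, so `cn` is 2 and `ra` is 1/2 + 1/2 = 1. The Jaccard score is 1 because the out-neighbors of a and the in-neighbors of d coincide.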

## Sparse networks

Link prediction is most informative when edges are missing. Setting `threshold = 0.05` drops weak edges to create gaps, and `predict_links()` then scores the missing pairs:

```{r sparse}
net_sparse <- build_network(
  group_regulation_long, method = "relative",
  actor = "Actor", action = "Action", time = "Time",
  threshold = 0.05
)
pred_sparse <- predict_links(net_sparse, methods = "resource_allocation")
pred_sparse
```
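Conceptually, thresholding zeroes out weak transitions. A minimal sketch on a made-up transition matrix, assuming that edges with weight strictly below the threshold are dropped and that rows are not renormalized afterwards:

```r
# Hypothetical transition-probability matrix (rows sum to 1)
states <- c("plan", "monitor", "adapt")
P <- matrix(c(0.70, 0.26, 0.04,
              0.02, 0.50, 0.48,
              0.30, 0.03, 0.67),
            nrow = 3, byrow = TRUE, dimnames = list(states, states))

# Assumption: drop edges with weight strictly below the threshold, no renormalizing
threshold <- 0.05
P_sparse <- P
P_sparse[P_sparse < threshold] <- 0

# Pairs that were edges before thresholding but are missing afterwards
removed <- which(P > 0 & P_sparse == 0, arr.ind = TRUE)
data.frame(from = states[removed[, "row"]], to = states[removed[, "col"]])
```

These removed pairs are exactly the kind of gaps that link prediction tries to recover.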

## Evaluating predictions

Treat the full network's edges as ground truth and evaluate the predictions from the sparse network at the ranking cutoffs given by `k`:

```{r evaluation}
true_edges <- extract_edges(net)
pred_eval <- predict_links(net_sparse, exclude_existing = FALSE)
evaluate_links(pred_eval, true_edges[, c("from", "to")], k = c(3, 5, 10))
```
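Precision@k and recall@k are two common ways to score a ranked list against ground truth. The sketch below computes both on made-up pairs encoded as strings; it illustrates the idea behind the `k` cutoffs and is not a claim about exactly which metrics `evaluate_links()` reports:

```r
# Made-up ranked predictions (best first) and ground-truth edges
predicted <- c("plan->adapt", "adapt->plan", "monitor->emotion",
               "plan->monitor", "emotion->adapt")
truth <- c("plan->adapt", "plan->monitor", "adapt->plan", "monitor->adapt")

# Share of the top-k predictions that are true edges
precision_at_k <- function(pred, truth, k) mean(head(pred, k) %in% truth)
# Share of all true edges recovered within the top k
recall_at_k <- function(pred, truth, k) sum(head(pred, k) %in% truth) / length(truth)

sapply(c(1, 3, 5), function(k) c(
  k = k,
  precision = precision_at_k(predicted, truth, k),
  recall = recall_at_k(predicted, truth, k)
))
```

Precision usually falls and recall rises as `k` grows: here precision@3 is 2/3 and recall@5 is 3/4.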

## Per-group predictions

Passing `group = "Achiever"` builds one network per group; `lapply()` then predicts links within each network and keeps the top three pairs (`top_n = 3`):

```{r per-group-link}
nets <- build_network(
  group_regulation_long, method = "relative",
  actor = "Actor", action = "Action", time = "Time",
  group = "Achiever", threshold = 0.05
)
lapply(nets, function(g) predict_links(g, methods = "katz", top_n = 3))
```

## Extracting pathways

`pathways()` converts predictions into strings for `cograph::plot_simplicial()`. With `evidence = TRUE` (default), each predicted edge is enriched with common neighbors that structurally bridge source to target:

```{r pathways-link}
pathways(pred_sparse, top = 5, evidence = TRUE)
```

A pathway `"A cn1 cn2 -> B"` means that A is predicted to connect to B, with `cn1` and `cn2` as the bridging nodes. Without evidence, the bridging nodes are left out:

```{r pathways-link-simple}
pathways(pred_sparse, top = 5, evidence = FALSE)
```
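The evidence format can be reproduced on a toy graph. This sketch assumes that the bridging nodes for a predicted edge `from -> to` are the nodes k with `from -> k` and `k -> to`; `evidence_pathway()` is a hypothetical helper, not part of Nestimate:

```r
# Toy directed adjacency (hypothetical)
nodes <- c("plan", "monitor", "adapt", "discuss")
A <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
A["plan", c("monitor", "discuss")] <- 1
A["monitor", "adapt"] <- 1
A["discuss", "adapt"] <- 1

# Bridging nodes for a predicted edge from -> to: k with from -> k and k -> to
evidence_pathway <- function(A, from, to) {
  bridges <- colnames(A)[A[from, ] > 0 & A[, to] > 0]
  paste(paste(c(from, bridges), collapse = " "), "->", to)
}
evidence_pathway(A, "plan", "adapt")
```

When no bridging node exists, the helper returns a plain `"from -> to"` string.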

---

# Association Rules

## Basic usage

`association_rules()` treats each sequence as a basket of its unique states, then finds co-occurrence patterns via Apriori:

```{r rules-basic}
rules <- association_rules(net, min_support = 0.3,
                           min_confidence = 0.5, min_lift = 1.0)
rules
```
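The basket step and the frequent-itemset search can be sketched in base R. Each sequence is reduced to its unique states, and itemsets are grown one item at a time, extending only those that already meet `min_support` (adding items can never increase support). This is a simplified Apriori-style search on made-up data, not Nestimate's implementation:

```r
# Hypothetical sequences; each becomes a basket of its unique states
seqs <- list(
  c("plan", "discuss", "plan", "execute"),
  c("plan", "research", "analyze", "research"),
  c("discuss", "execute", "reflect"),
  c("plan", "discuss", "execute", "reflect", "plan"),
  c("research", "analyze", "reflect")
)
baskets <- lapply(seqs, unique)

# Support: share of baskets containing every item of the itemset
support <- function(itemset) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Level-wise search: only frequent itemsets are extended to the next size
frequent_itemsets <- function(min_support) {
  items <- sort(unique(unlist(baskets)))
  level <- Filter(function(s) support(s) >= min_support, as.list(items))
  frequent <- level
  while (length(level) > 0) {
    extended <- lapply(level, function(s)
      lapply(setdiff(items, s), function(i) sort(c(s, i))))
    candidates <- unique(unlist(extended, recursive = FALSE))
    level <- Filter(function(s) support(s) >= min_support, candidates)
    frequent <- c(frequent, level)
  }
  frequent
}
frequent <- frequent_itemsets(min_support = 0.3)
vapply(frequent, paste, character(1), collapse = " ")
```

With `min_support = 0.3`, all six single items, six pairs and two triples (`discuss execute plan` and `discuss execute reflect`) are frequent. Rules are then formed by splitting each frequent itemset into an antecedent and a consequent.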

## Understanding the output

```{r rules-summary}
summary(rules)
```

| Metric | Definition | Meaning |
|--------|------------|---------|
| **Support** | P(A and B) | How common is this pattern? |
| **Confidence** | P(B given A) = support / P(A) | How reliable is the implication A => B? |
| **Lift** | confidence / P(B) | 1 = independence; > 1 = positive association; < 1 = negative |
| **Conviction** | (1 - P(B)) / (1 - confidence) | 1 = independence; higher = stronger implication |
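The four metrics can be computed by hand from their textbook definitions. This sketch uses five toy transactions; `supp()` and `rule_metrics()` are hypothetical helpers for illustration:

```r
transactions <- list(
  c("plan", "discuss", "execute"),
  c("plan", "research", "analyze"),
  c("discuss", "execute", "reflect"),
  c("plan", "discuss", "execute", "reflect"),
  c("research", "analyze", "reflect")
)
# Support: share of transactions containing every item in `items`
supp <- function(items) {
  mean(vapply(transactions, function(tx) all(items %in% tx), logical(1)))
}
rule_metrics <- function(lhs, rhs) {
  s    <- supp(c(lhs, rhs))             # P(A and B)
  conf <- s / supp(lhs)                 # P(B given A)
  lift <- conf / supp(rhs)              # 1 means independence
  conv <- (1 - supp(rhs)) / (1 - conf)  # Inf when confidence is 1
  c(support = s, confidence = conf, lift = lift, conviction = conv)
}
rule_metrics("plan", "discuss")
```

For `{plan} => {discuss}`, plan appears in 3 of 5 transactions, discuss in 3, and both together in 2. That gives support 0.4, confidence 2/3, lift 10/9 (about 1.11) and conviction 1.2.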

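## Adjusting thresholds

Lower thresholds return more rules, including weaker ones; stricter thresholds keep only frequent, reliable patterns. Because lift is never negative, `min_lift = 0` effectively disables the lift filter: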

```{r threshold-comparison}
rules_broad  <- association_rules(net, min_support = 0.1,
                                  min_confidence = 0.3, min_lift = 0)
rules_strict <- association_rules(net, min_support = 0.5,
                                  min_confidence = 0.8, min_lift = 1.0)
data.frame(
  Setting = c("Broad (sup>=0.1, conf>=0.3)", "Strict (sup>=0.5, conf>=0.8, lift>=1)"),
  Rules   = c(rules_broad$n_rules, rules_strict$n_rules)
)
```

## Per-group rules

```{r per-group-rules}
nets <- build_network(
  group_regulation_long, method = "relative",
  actor = "Actor", action = "Action", time = "Time",
  group = "Achiever"
)
lapply(nets, function(g) {
  association_rules(g, min_support = 0.3, min_confidence = 0.5, min_lift = 1.0)
})
```

Rules unique to one group suggest activity combinations associated with that group's outcomes.
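One way to find such rules is to compare rule strings across groups. A base-R sketch with made-up strings in the `"A B -> C"` pathway format; in practice these might come from `pathways()` on each group's rules, assuming it returns a character vector:

```r
# Made-up rule strings for two groups, in the "A B -> C" pathway format
high <- c("plan -> monitor", "monitor adapt -> plan", "discuss -> consensus")
low  <- c("plan -> monitor", "emotion -> discuss")
only_high <- setdiff(high, low)  # rules found only in the High group
only_low  <- setdiff(low, high)  # rules found only in the Low group
list(only_high = only_high, only_low = only_low)
```

Exact string matching assumes the antecedent items are listed in the same order in both groups.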

## Plotting rules

`plot()` draws a support-versus-confidence scatter plot, with point size and color encoding lift. The `if` guard skips plotting when no rules pass the thresholds:

```{r rules-plot, fig.alt = "Association rules scatter plot"}
rules <- association_rules(net, min_support = 0.2,
                           min_confidence = 0.4, min_lift = 1.0)
if (rules$n_rules > 0) plot(rules)
```

## Extracting pathways

Each rule `{A, B} => {C}` becomes `"A B -> C"`:

```{r pathways-rules}
rules <- association_rules(net, min_support = 0.3,
                           min_confidence = 0.5, min_lift = 1.0)
pathways(rules)
```

Keep at most five rules, restricted to those with a lift of at least 1.2:

```{r pathways-rules-filtered}
pathways(rules, top = 5, min_lift = 1.2)
```
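The conversion and filtering can be sketched in base R. This assumes that `top` keeps the highest-lift rules after the `min_lift` filter; the rule data and the `to_pathway()` helper are hypothetical:

```r
# Made-up rules: antecedent items, consequent, and lift
lhs  <- list(c("plan", "monitor"), "discuss", c("adapt", "plan"))
rhs  <- c("adapt", "consensus", "monitor")
lift <- c(1.05, 1.40, 1.25)

# {A, B} => {C} becomes "A B -> C"
to_pathway <- function(l, r) paste(paste(l, collapse = " "), "->", r)

# Assumption: filter by min_lift, then keep the `top` highest-lift rules
min_lift <- 1.2
top <- 5
idx <- which(lift >= min_lift)
idx <- head(idx[order(lift[idx], decreasing = TRUE)], top)
paths <- vapply(idx, function(i) to_pathway(lhs[[i]], rhs[i]), character(1))
paths
```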

## Other input formats

`association_rules()` also accepts a list of transactions, each a character vector of items:

```{r rules-list}
transactions <- list(
  c("plan", "discuss", "execute"),
  c("plan", "research", "analyze"),
  c("discuss", "execute", "reflect"),
  c("plan", "discuss", "execute", "reflect"),
  c("research", "analyze", "reflect")
)
association_rules(transactions, min_support = 0.3,
                  min_confidence = 0.5, min_lift = 0)
```

It also accepts a binary matrix, with one row per transaction and one column per item (1 = present):

```{r rules-matrix}
mat <- matrix(c(
  1, 1, 1, 0, 0,
  1, 0, 0, 1, 1,
  0, 1, 1, 0, 1,
  1, 1, 1, 1, 0,
  1, 0, 1, 0, 1
), nrow = 5, byrow = TRUE)
colnames(mat) <- c("plan", "discuss", "execute", "research", "reflect")
association_rules(mat, min_support = 0.3, min_confidence = 0.5, min_lift = 0)
```
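The two formats carry the same information: a binary matrix is a transaction list with one column per distinct item. A base-R sketch of the round trip on toy data (not a Nestimate function):

```r
transactions <- list(
  c("plan", "discuss", "execute"),
  c("plan", "research", "analyze"),
  c("discuss", "execute", "reflect")
)
items <- sort(unique(unlist(transactions)))

# List -> binary matrix: one row per transaction, 1 if the item is present
mat <- t(vapply(transactions, function(tx) as.integer(items %in% tx),
                integer(length(items))))
colnames(mat) <- items
mat

# Binary matrix -> list: the items marked 1 in each row
back <- lapply(seq_len(nrow(mat)), function(i) items[mat[i, ] == 1])
back
```

The round trip returns each transaction with its items in sorted order, which does not matter for basket analysis.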

---

# Simplicial Visualization

Both link predictions and association rules integrate with `cograph::plot_simplicial()` for higher-order visualization. Each pathway becomes a blob that groups its source nodes and points to the target.

```{r simplicial-rules, eval = requireNamespace("cograph", quietly = TRUE), fig.width = 8, fig.height = 6, fig.alt = "Simplicial visualization of association rules"}
library(cograph)
net_sparse <- build_network(
  group_regulation_long, method = "relative",
  actor = "Actor", action = "Action", time = "Time",
  threshold = 0.05
)

# Association rules as simplicial blobs
rules <- association_rules(net_sparse, min_support = 0.3,
                           min_confidence = 0.5, min_lift = 1.0)
plot_simplicial(net_sparse, pathways(rules, top = 5),
                title = "Top Association Rules")
```

```{r simplicial-pred, eval = requireNamespace("cograph", quietly = TRUE), fig.width = 8, fig.height = 6, fig.alt = "Simplicial visualization of predicted links"}
# Predicted links with structural evidence
pred <- predict_links(net_sparse, methods = "resource_allocation")
plot_simplicial(net_sparse, pathways(pred, top = 5, evidence = TRUE),
                title = "Predicted Links with Evidence",
                dismantled = TRUE)
```

The **dismantled** view shows each pathway in its own panel.

`plot_simplicial()` accepts either `pathways()` output or the result objects themselves:

```r
# Via pathways() (works with any cograph version)
plot_simplicial(net, pathways(rules, top = 5))
plot_simplicial(net, pathways(pred, top = 5))

# Direct pass (requires cograph >= 1.9.0)
plot_simplicial(net, rules)
plot_simplicial(net, pred)
```