---
title: "Getting Started with predictset"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with predictset}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## What is conformal prediction?

Machine learning models produce point predictions - a single number for regression, a single class for classification. But in practice, you need to know *how uncertain* that prediction is.

**Conformal prediction** wraps any model in a layer of calibrated uncertainty quantification. Given a target coverage level (say 90%), it produces:

- **Prediction intervals** (regression) guaranteed to contain the true value at least 90% of the time
- **Prediction sets** (classification) guaranteed to contain the true class at least 90% of the time

The key property: this guarantee holds in **finite samples**, without distributional assumptions. The only requirement is that calibration and test data are exchangeable (roughly: drawn from the same distribution).

`predictset` implements the main conformal methods from the recent literature in a lightweight, model-agnostic R package with only two dependencies (`cli` and `stats`).

## Quick start: regression

Split conformal prediction with a linear model using formula shorthand:

```{r}
library(predictset)

set.seed(42)
n <- 500
x <- matrix(rnorm(n * 5), ncol = 5)
y <- x[, 1] * 2 + x[, 2] + rnorm(n)
x_new <- matrix(rnorm(100 * 5), ncol = 5)

result <- conformal_split(x, y, model = y ~ ., x_new = x_new, alpha = 0.10)
print(result)
```

The `alpha` parameter controls the miscoverage rate: `alpha = 0.10` targets 90% coverage.

## Model interface

There are three ways to specify a model:

**1. Formula shorthand** (fits `lm` internally):

```{r}
result <- conformal_split(x, y, model = y ~ ., x_new = x_new)
```

**2. Fitted model** (auto-detected for `lm`, `glm`, `ranger`):

```{r}
fit <- lm(y ~ ., data = data.frame(y = y, x))
result <- conformal_split(x, y, model = fit, x_new = x_new)
```

**3. Custom model** via `make_model()` (works with anything):

```{r, eval = FALSE}
rf <- make_model(
  train_fun = function(x, y) {
    ranger::ranger(y ~ ., data = data.frame(y = y, x))
  },
  predict_fun = function(object, x_new) {
    predict(object, data = as.data.frame(x_new))$predictions
  },
  type = "regression"
)

result <- conformal_split(x, y, model = rf, x_new = x_new)
```

## Classification with Adaptive Prediction Sets

APS produces prediction sets that adapt to the difficulty of each observation. Easy cases get small sets (often a single class), while ambiguous cases get larger sets.

```{r, eval = FALSE}
set.seed(42)
n <- 600
x <- matrix(rnorm(n * 4), ncol = 4)
y <- factor(ifelse(x[, 1] > 0.5, "A", ifelse(x[, 2] > 0, "B", "C")))
x_new <- matrix(rnorm(100 * 4), ncol = 4)

clf <- make_model(
  train_fun = function(x, y) {
    ranger::ranger(y ~ ., data = data.frame(y = y, x), probability = TRUE)
  },
  predict_fun = function(object, x_new) {
    predict(object, data = as.data.frame(x_new))$predictions
  },
  type = "classification"
)

result <- conformal_aps(x, y, model = clf, x_new = x_new, alpha = 0.10)
print(result)

# Most predictions are a single class; ambiguous ones include 2-3
table(set_size(result))
```

## Mondrian conformal: group-conditional coverage

Standard conformal guarantees *marginal* coverage (across all test points), but coverage can vary across subgroups. Mondrian conformal computes a separate quantile for each group, guaranteeing coverage *within* each subgroup. This is critical for fairness and regulatory compliance.

```{r}
set.seed(42)
n <- 600
x <- matrix(rnorm(n * 3), ncol = 3)
groups <- factor(ifelse(x[, 1] > 0, "high", "low"))
y <- x[, 1] * 2 + ifelse(groups == "high", 3, 0.5) * rnorm(n)
x_new <- matrix(rnorm(200 * 3), ncol = 3)
groups_new <- factor(ifelse(x_new[, 1] > 0, "high", "low"))

result <- conformal_mondrian(x, y, model = y ~ ., x_new = x_new,
                              groups = groups, groups_new = groups_new)

# Check per-group coverage
y_new <- x_new[, 1] * 2 + ifelse(groups_new == "high", 3, 0.5) * rnorm(200)
coverage_by_group(result, y_new, groups_new)
```

## Diagnostics

After producing predictions, use diagnostic functions to evaluate calibration and efficiency:

```{r, eval = FALSE}
# Empirical coverage (should be close to 1 - alpha)
coverage(result, y_test)

# Average interval width (regression) - narrower is better
mean(interval_width(result))

# Prediction set sizes (classification) - smaller is better
table(set_size(result))

# Coverage within subgroups (fairness check)
coverage_by_group(result, y_test, groups = groups_test)

# Coverage by prediction quantile bin
coverage_by_bin(result, y_test, bins = 5)

# Conformal p-values for outlier detection
pvals <- conformal_pvalue(result$scores, new_scores)
```

## Comparing methods

Benchmark multiple conformal methods side-by-side with `conformal_compare()`:

```{r}
set.seed(42)
n <- 500
x <- matrix(rnorm(n * 3), ncol = 3)
y <- x[, 1] * 2 + rnorm(n)
x_new <- matrix(rnorm(200 * 3), ncol = 3)
y_new <- x_new[, 1] * 2 + rnorm(200)

comp <- conformal_compare(x, y, model = y ~ ., x_new = x_new, y_new = y_new,
                           methods = c("split", "cv", "jackknife"))
print(comp)
```

## Choosing a method

| Scenario | Recommended method | Why |
|---|---|---|
| Large dataset, speed matters | `conformal_split()` | Single model fit, fastest |
| Small dataset, need tight intervals | `conformal_cv()` or `conformal_jackknife()`* | Uses all data for training and calibration |
| Heteroscedastic data | `conformal_cqr()` or `conformal_split(..., score_type = "normalized")` | Adaptive interval widths |
| Multi-class classification | `conformal_aps()` | Adaptive set sizes, well-calibrated |
| Classification with many classes | `conformal_raps()` | Regularized APS, smaller sets |
| Coverage must hold per subgroup | `conformal_mondrian()` / `conformal_mondrian_class()` | Group-conditional guarantees |
| Covariate shift between train/test | `conformal_weighted()` | Importance-weighted calibration |
| Sequential/online prediction | `conformal_aci()`** | Adapts to distribution drift |

\*Jackknife+ and CV+ have a theoretical coverage guarantee of 1-2α (Barber et al. 2021), weaker than split conformal's 1-α. In practice, coverage is typically near 1-α.
\*\*ACI provides asymptotic (not finite-sample) coverage guarantees.

## Further reading

- Angelopoulos & Bates (2023). [A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification](https://arxiv.org/abs/2107.07511). The best accessible introduction.
- Barber, Candes, Ramdas, Tibshirani (2021). [Predictive inference with the Jackknife+](https://doi.org/10.1214/20-AOS1965). Jackknife+ and CV+ methods.
- Romano, Sesia, Candes (2020). [Classification with valid and adaptive coverage](https://arxiv.org/abs/2006.02544). APS for classification.