---
title: "Introduction to CausalSpline: Nonlinear Causal Dose-Response Estimation"
author: "Your Name"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to CausalSpline}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5,
  warning = FALSE
)
```

## Why nonlinear causal effects?

Most causal inference software assumes the treatment effect enters linearly:

$$Y = \beta_0 + \beta_1 T + \gamma X + \varepsilon$$

But many real-world relationships are genuinely nonlinear:

- **Dosage effects** in pharmacology (toxicity thresholds)
- **Pollution-health curves** (non-monotone at extreme exposures)
- **Education/income** (diminishing returns)
- **Policy intensity** (minimum effective dose, saturation)

**CausalSpline** replaces the linear $\beta_1 T$ with a spline $f(T)$:

$$Y = \beta_0 + f(T) + \gamma X + \varepsilon, \quad E[Y(t)] = \beta_0 + f(t) + \gamma E[X]$$

under standard unconfoundedness and positivity assumptions.
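
To make the idea concrete, the sketch below compares a linear and a spline specification using only base R (`lm()` and `splines::ns()`). It does not use the CausalSpline API; the simulated data, the variable names (`trt`, `x`, `y`) and the kink at 3 are illustrative assumptions.

```r
library(splines)  # ships with base R

set.seed(1)
n <- 500
d <- data.frame(x = rnorm(n))                  # a single confounder
d$trt <- 3 + d$x + rnorm(n)                    # treatment depends on x
d$y <- 1 + pmax(d$trt - 3, 0) + 0.8 * d$x +    # flat below 3, linear above
  rnorm(n, sd = 0.5)

# Linear specification: beta_1 * T
fit_linear <- lm(y ~ trt + x, data = d)

# Spline specification: f(T) as a natural cubic spline with 5 df
fit_spline <- lm(y ~ ns(trt, df = 5) + x, data = d)

# A lower AIC indicates the better trade-off between fit and complexity
c(linear = AIC(fit_linear), spline = AIC(fit_spline))
```

Because the simulated curve has a kink, the spline specification should be clearly preferred here; on truly linear data the two fits would be close.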

---

## Installation

```r
# From CRAN (once published)
install.packages("CausalSpline")

# Development version from GitHub
remotes::install_github("yourgithub/CausalSpline")
```

---

## Quick start

```{r setup}
library(CausalSpline)
```

### 1. Simulate data with a threshold effect

```{r simulate}
set.seed(42)
dat <- simulate_dose_response(n = 600, dgp = "threshold", confounding = 0.6)
head(dat)
```

The true dose-response is flat below $T = 3$, then rises linearly:

```{r true-curve, fig.cap="True vs observed relationship"}
plot(dat$T, dat$Y, pch = 16, col = rgb(0, 0, 0, 0.2),
     xlab = "Treatment T", ylab = "Outcome Y",
     main = "Observed data (confounded)")
lines(sort(dat$T), dat$true_effect[order(dat$T)],
      col = "red", lwd = 2)
legend("topleft", legend = "True f(T)", col = "red", lwd = 2)
```

### 2. Fit with IPW

```{r fit-ipw}
fit_ipw <- causal_spline(
  Y ~ T | X1 + X2 + X3,
  data        = dat,
  method      = "ipw",
  df_exposure = 5,
  eval_grid   = 100
)
summary(fit_ipw)
```

```{r plot-ipw, fig.cap="IPW estimated dose-response with 95% CI"}
# Build true curve data frame for overlay
truth_df <- data.frame(
  t           = dat$T,
  true_effect = dat$true_effect
)
plot(fit_ipw, truth = truth_df)
```
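
For intuition about what an IPW fit of this kind involves, here is a minimal base-R sketch of stabilised generalized propensity score weighting followed by a weighted spline regression. It is an illustration under stated assumptions (a normal linear model for the treatment, simulated data, hypothetical variable names) and is not necessarily how `causal_spline()` is implemented.

```r
library(splines)

set.seed(2)
n <- 1000
d <- data.frame(x = rnorm(n))
d$trt <- 3 + 0.8 * d$x + rnorm(n)                      # confounded treatment
d$y <- pmax(d$trt - 3, 0) + d$x + rnorm(n, sd = 0.5)   # threshold at 3

# 1. Treatment model (assumption: T | X is normal with a linear mean)
gps_fit <- lm(trt ~ x, data = d)
dens_cond <- dnorm(d$trt, mean = fitted(gps_fit),
                   sd = summary(gps_fit)$sigma)

# 2. Marginal density of T, used to stabilise the weights
dens_marg <- dnorm(d$trt, mean = mean(d$trt), sd = sd(d$trt))

# 3. Stabilised weights: marginal density over conditional density
d$w <- dens_marg / dens_cond

# 4. Weighted spline regression of Y on T alone
msm <- lm(y ~ ns(trt, df = 5), data = d, weights = w)

# 5. Evaluate the estimated curve on a grid of doses
grid <- data.frame(trt = seq(1, 5, length.out = 50))
curve_ipw <- predict(msm, newdata = grid)
```

The covariates enter only through the weights, which is why this approach relies on the treatment model being correctly specified.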

### 3. Fit with G-computation

```{r fit-gcomp}
fit_gc <- causal_spline(
  Y ~ T | X1 + X2 + X3,
  data        = dat,
  method      = "gcomp",
  df_exposure = 5
)
plot(fit_gc, truth = truth_df,
     title = "G-computation — Threshold DGP")
```
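
The logic of G-computation can be sketched in a few lines of base R: fit an outcome model, set everyone's treatment to a fixed dose, predict, and average. The data and variable names below are simulated and hypothetical; this is a sketch of the general technique, not a description of the package internals.

```r
library(splines)

set.seed(3)
n <- 1000
d <- data.frame(x = rnorm(n))
d$trt <- 3 + 0.8 * d$x + rnorm(n)                      # confounded treatment
d$y <- pmax(d$trt - 3, 0) + d$x + rnorm(n, sd = 0.5)   # threshold at 3

# Outcome model: spline in the treatment plus the confounder
out_fit <- lm(y ~ ns(trt, df = 5) + x, data = d)

# For each dose t0, assign t0 to every unit and average the predictions
grid <- seq(1, 5, length.out = 50)
curve_gc <- vapply(grid, function(t0) {
  d_t <- d
  d_t$trt <- t0                     # counterfactual: everyone receives t0
  mean(predict(out_fit, newdata = d_t))
}, numeric(1))

plot(grid, curve_gc, type = "l", xlab = "Dose t", ylab = "Estimated E[Y(t)]")
```

Averaging over the observed covariates is what turns the conditional outcome model into a marginal dose-response curve.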

### 4. Check overlap (positivity)

```{r overlap}
ov <- check_overlap(dat$T, fit_ipw$weights, plot = TRUE)
cat("ESS:", round(ov$ess), "/ n =", nrow(dat), "\n")
ov$plot
```
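
A common definition of the effective sample size is Kish's formula, $(\sum_i w_i)^2 / \sum_i w_i^2$. Whether `check_overlap()` uses exactly this formula is an assumption here; the helper `kish_ess()` below is hypothetical and only illustrates how a few extreme weights shrink the ESS.

```r
# Kish's effective sample size for a vector of weights
kish_ess <- function(w) sum(w)^2 / sum(w^2)

# Equal weights: every observation counts fully, so ESS equals n
ess_equal <- kish_ess(rep(1, 100))

# One dominant weight: the ESS drops far below n
ess_skewed <- kish_ess(c(rep(1, 99), 50))

c(equal = ess_equal, skewed = ess_skewed)
```

An ESS much smaller than `n` suggests that a handful of observations dominate the weighted fit, which is a practical warning sign for positivity problems.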

---

## Comparing DGPs

```{r compare-dgps, fig.height=8, fig.width=7}
dgps <- c("threshold", "diminishing", "nonmonotone", "sinusoidal")
plots <- lapply(dgps, function(d) {
  dat_d <- simulate_dose_response(500, dgp = d, seed = 1)
  fit_d <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat_d,
                          method = "ipw", df_exposure = 5,
                          verbose = FALSE)
  truth_d <- data.frame(t = dat_d$T, true_effect = dat_d$true_effect)
  plot(fit_d, truth = truth_d,
       title = paste("DGP:", d),
       rug = FALSE)
})

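# Arrange in a grid with patchwork if it is installed (this assumes the
# plot method returns ggplot objects); otherwise print the plots one by one
if (requireNamespace("patchwork", quietly = TRUE)) {
  print(patchwork::wrap_plots(plots, ncol = 2))
} else {
  for (p in plots) print(p)
}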
```

---

## Choosing degrees of freedom

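The `df_exposure` argument controls spline flexibility. Too few degrees of freedom underfit the curve and can smooth away real features such as thresholds; too many increase the variance of the estimate. As a rough guide: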

| Shape                  | Recommended df |
|------------------------|----------------|
| Linear / simple trend  | 3              |
| One bend / threshold   | 4–5            |
| Inverted-U / hump      | 5–6            |
| Oscillatory            | 7–10           |

To choose `df_exposure` from the data, compare AIC or BIC of the outcome model across candidate values, or use cross-validation.
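
As a minimal sketch of information-criterion selection, the code below fits a plain `lm()` outcome model with `splines::ns()` over a grid of candidate df values and compares AIC and BIC. The simulated data and variable names are assumptions for illustration; with CausalSpline you would apply the same comparison to your own outcome model.

```r
library(splines)

set.seed(4)
n <- 600
d <- data.frame(x = rnorm(n))
d$trt <- 3 + 0.8 * d$x + rnorm(n)
d$y <- pmax(d$trt - 3, 0) + d$x + rnorm(n, sd = 0.5)   # one bend at 3

# Candidate degrees of freedom for the treatment spline
df_grid <- 3:10

# Fit one outcome model per candidate and keep the fitted objects
fits <- lapply(df_grid, function(k) {
  lm(y ~ ns(trt, df = k) + x, data = d)
})

ic <- data.frame(
  df  = df_grid,
  AIC = vapply(fits, AIC, numeric(1)),
  BIC = vapply(fits, BIC, numeric(1))
)
ic

# Candidate with the smallest AIC
best_df <- ic$df[which.min(ic$AIC)]
```

BIC penalises complexity more heavily than AIC, so it tends to pick a smaller df when the two disagree.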

---

## Methods summary

| Argument           | Consistent if ...                          |
|--------------------|--------------------------------------------|
| `method = "ipw"`   | GPS model correctly specified              |
| `method = "gcomp"` | Outcome model correctly specified          |
| `method = "dr"`    | At least one of the two models is correct  |

---

## References

- Hirano, K., & Imbens, G. W. (2004). *The propensity score with continuous treatments.* doi:10.1002/0470090456.ch7  
- Imbens, G. W. (2000). *The role of the propensity score in estimating dose-response functions.* Biometrika.  
- Flores et al. (2012). *Estimating the effects of length of exposure to instruction.* Review of Economics and Statistics.