--- title: "Cohort-by-time effect matrices and heatmaps" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Cohort-by-time effect matrices and heatmaps} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4 ) has_fect <- requireNamespace("fect", quietly = TRUE) has_dcdh <- requireNamespace("DIDmultiplegtDYN", quietly = TRUE) && requireNamespace("polars", quietly = TRUE) ``` ```{r setup} library(nonabsdid) ``` This is a separate feature line from the event-study workflow in *Getting started*. It supports **DCDH** and the **fect** family only; see [Why not PanelMatch?](#why-not-panelmatch) below. ## Event study vs. effect matrix The main `nonabsdid` workflow ([nabs_event_study()] / [nabs_event_plot()]) collapses every treated cohort onto a single relative-time axis: one curve per estimator. That is the right summary most of the time, but it hides *which* cohorts drive the average. The **effect matrix** keeps the onset cohort as a second dimension. Instead of a curve you get a grid -- rows are onset cohorts, columns are relative (or calendar) time, and the fill is the estimated effect -- drawn as a heatmap. It is the two-dimensional companion to the event-study overlay, built from the same estimator objects. Three user-facing pieces mirror the event-study API: - `nabs_effect_cells()` -- fit one estimator and return its cohort cells. - `as_nabs_effect_cells()` -- coerce an existing estimator object into the cell schema. - `plot_effect_matrix()` -- draw one or more cell tables as heatmaps. ## A toy non-absorbing panel ```{r toy} set.seed(1) N <- 120; TT <- 14 panel <- expand.grid(id = 1:N, t = 1:TT) grp <- panel$id %% 4 # group 0 = never treated onset <- c(`1` = 4L, `2` = 6L, `3` = 8L)[as.character(grp)] # a quarter of switchers turn OFF again 3 periods later (non-absorbing) off <- (panel$id %% 8 == 1) & !is.na(onset) & panel$t >= onset + 3L panel$d <- as.integer(!is.na(onset) & panel$t >= onset & !off) panel$y <- rnorm(N, sd = .5)[panel$id] + 0.15 * panel$t + ifelse(panel$d == 1, 0.4, 0) + rnorm(nrow(panel)) ``` ## One-step fit: `nabs_effect_cells()` `nabs_effect_cells()` wires up what a cohort breakdown needs for each estimator (a unit-level onset cohort for DCDH; `keep.sims = TRUE` for fect bootstrap cell SEs), so you only pass the usual arguments. ```{r fect-fit, eval = has_fect} res_ife <- nabs_effect_cells( panel, outcome = "y", treatment = "d", unit = "id", time = "t", method = "IFE", lags = 4, leads = 6, nboots = 100 ) res_ife$cells ``` ```{r fect-plot, eval = has_fect} plot_effect_matrix(res_ife$cells, show_estimates = TRUE, show_se = TRUE) ``` A single-method call is titled with the method automatically, and `show_se = TRUE` prints the standard error (in parentheses) beneath each estimate. The fect surface only covers **treated** cells, so the matrix starts at `event_time = 0` (the first treated period) and has no pre-period column. For DCDH, `dcdh_strategy = "loop"` (the default) re-estimates the event study once per onset cohort against the never-treated units; `"by"` instead issues a single `did_multiplegt_dyn(..., by = cohort)` call. ```{r dcdh-fit, eval = has_dcdh} res_dcdh <- nabs_effect_cells( panel, outcome = "y", treatment = "d", unit = "id", time = "t", method = "DCDH", lags = 3, leads = 5, dcdh_strategy = "loop" ) plot_effect_matrix(res_dcdh$cells, show_estimates = TRUE, show_se = TRUE) ``` Unlike fect, DCDH reports placebo (pre-period) cells and a reference period (normalized to `0` at `event_time = -1`), so its matrix spans negative event time too. ## Comparing methods The recommended view is **one heatmap per method** (above): each is titled with its method and stays readable. If you do want them in one figure, passing several cell tables facets them with a shared fill scale and legend: ```{r side-by-side, eval = has_fect && has_dcdh} plot_effect_matrix(res_dcdh$cells, res_ife$cells) ``` The faceted view is convenient but gets crowded fast (especially with in-tile labels), which is why per-method plots are the default emphasis. Either way, **read this as triangulation of the *pattern*, not as cell-by-cell equality.** The two estimators line up on the same axes -- both define the cohort as the onset period and anchor `event_time = 0` at the first treated period -- but they do not target identical quantities: - *Different estimands / identification.* fect imputes a counterfactual ($Y - \hat Y(0)$) from a fixed-effects / factor model; DCDH forms long differences from the period before each switch. Level offsets between the two are expected. - *Different controls.* The DCDH `"loop"` strategy compares each cohort to the never-treated; fect's counterfactual is model-based over all controls. - *Different coverage.* fect is post-only; DCDH adds placebos and a reference cell. - *Non-absorbing wrinkle.* The cohort is the *first* onset. Units that switch off and on again contribute to large-`event_time` cells under both methods, but each handles carryover differently, so those cells are the least comparable. The fill encodes the point estimate only. Standard errors live in the `std.error` column and can be drawn in each tile with `show_se = TRUE` (`"bootstrap"` for fect, `"native"` or CI-recovered `"ci"` for DCDH; see the schema below). For claims about whether two cells differ, look at those SEs rather than the colours. ## Working from existing objects If you already fit an estimator, coerce it with `as_nabs_effect_cells()`. For fect you need `imputed_outcomes()` (fect >= 2.4.0); for bootstrap cell SEs the fit must have been run with `se = TRUE, keep.sims = TRUE`. For DCDH, pass an object run with a unit-level cohort `by` variable. ```{r from-existing, eval = FALSE} fit <- fect::fect(y ~ d, data = panel, index = c("id", "t"), method = "fe", force = "two-way", se = TRUE, nboots = 100, keep.sims = TRUE) cells <- as_nabs_effect_cells(fit, method = "FE", outcome = "y") ``` A data-frame escape hatch needs no estimator packages -- handy for testing the plot or building cells from numbers you already have: ```{r escape-hatch} raw <- expand.grid(cohort = c(4L, 6L, 8L), event_time = -2:5) raw$estimate <- with(raw, ifelse(event_time < 0, 0, 0.4 + 0.05 * event_time)) raw$std.error <- 0.07 cells <- as_nabs_effect_cells(raw, method = "FE", outcome = "y") plot_effect_matrix(cells, show_estimates = TRUE, show_se = TRUE) ``` ## Collapsing back to an event study `aggregate_effects()` averages cells over cohorts and returns a `nabs_event_study_tbl`, making explicit that the event study is the cohort-collapsed view of the same cells. (Re-aggregated standard errors are not computed, so they come back `NA`; use it for a quick overlay, not inference.) ```{r aggregate, eval = has_fect} agg <- aggregate_effects(res_ife$cells, by = "event_time") nabs_event_plot(agg, xlim = c(0, 6)) ``` ## The cell schema `as_nabs_effect_cells()` returns a tibble of class `nabs_effect_cell_tbl`: | column | type | description | |-----------------|------|--------------------------------------------------------| | `cohort` | int | Onset calendar period (first treated period). | | `event_time` | int | Relative period; `0` = onset. | | `calendar_time` | int | `cohort + event_time` (may be `NA`). | | `estimate` | num | Cell point estimate. | | `std.error` | num | Standard error (may be `NA`). | | `conf.low/high` | num | CI bounds. | | `n` | int | Treated cells aggregated (fect only; `NA` for DCDH). | | `window` | chr | `"pre"` / `"post"`. | | `method` | chr | Estimator label. | | `outcome` | chr | Outcome name. | | `se_method` | chr | `"bootstrap"` (fect), `"native"`/`"ci"` (DCDH), or `"none"`. | The `se_method` column records how uncertainty was produced. fect cells use the bootstrap surface (`imputed_outcomes(replicates = TRUE)`), re-aggregated within each replicate; DCDH cells carry the estimator's own SEs; otherwise SEs are `NA`. ## Why not PanelMatch? {#why-not-panelmatch} A faithful cohort matrix needs cohort-level estimates *and* cohort-level uncertainty. For DCDH and fect both fall out of objects the packages already expose. PanelMatch reports lead-specific ATTs aggregated over all matched sets; recovering a per-cohort cell means re-aggregating matched-set effects by switch time **and** re-running the matched-set bootstrap on that re-aggregation to get honest SEs. That is real work and out of scope for this pass, so PanelMatch is omitted here rather than shipped with naive (wrong) standard errors. The `se_method` column is reserved so a PanelMatch path can slot in later without changing the plotting code.