--- title: "nonabsdid for Stata users" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{nonabsdid for Stata users} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} has_haven <- requireNamespace("haven", quietly = TRUE) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4.5 ) ``` ```{r setup} library(nonabsdid) ``` This vignette is for researchers whose main workflow is in Stata. It covers: 1. **Why bother**: which of the estimators wrapped here exist in Stata, and which are R-only. 2. **Getting data in**: reading `.dta` files with `nabs_read_dta()`, and the labelled-variable / extended-missing pitfalls it handles for you. 3. **A Rosetta stone**: option-by-option mapping from Stata's `did_multiplegt_dyn` to `nabs_event_study()`. 4. **Stata-style argument aliases**: `group`, `effects`, `placebo`, and `df` are accepted directly. 5. **Getting results out**: writing estimates back to `.dta` with `nabs_write_dta()` so you (or a coauthor) can finish in Stata. ## 1. Why use R for this at all? Of the heterogeneity-robust estimators that `nonabsdid` harmonizes, only one has an official Stata implementation: | Estimator | Stata | R | |---|---|---| | DCDH (de Chaisemartin & D'Haultfoeuille) | `did_multiplegt_dyn` (SSC) | `DIDmultiplegtDYN` | | PanelMatch (Imai, Kim, & Wang) | — | `PanelMatch` | | fect: IFE / FE-imputation / MC (Liu, Wang, & Xu) | — | `fect` | If your treatment is **non-absorbing** (it can switch on and off) and you want to compare DCDH against matching-based and imputation/factor-model-based estimators on the same axis, R is currently the only place where all of them live. `nonabsdid` exists to make that comparison a few lines of code; this vignette exists to make those lines feel familiar if you arrive from Stata. Because the same DCDH estimator is implemented in both languages by the same authors, the DCDH series is also your *bridge for trust*: run `did_multiplegt_dyn` on the same data in Stata and through `nonabsdid`, check that the point estimates agree, and then read the R-only estimators with the same confidence. (Pin the version of `DIDmultiplegtDYN` you used; see "Reproducibility" at the end.) ## 2. Getting your data in: `nabs_read_dta()` The two classic stumbling blocks when moving a `.dta` file into R are: * **Value labels.** Stata variables with `label values` arrive in R as `haven_labelled` vectors, which most estimation packages (including the ones wrapped here) do not understand. * **Extended missing values.** Stata's `.a`–`.z` arrive as *tagged* `NA`s, which look like ordinary `NA` when printed but are a distinct thing internally. `nabs_read_dta()` handles both with sensible defaults: labelled columns become factors, and all extended missings collapse to regular `NA`. ```{r, eval = has_haven} # For this vignette we fabricate a .dta file; in real life you already # have one. tmp <- tempfile(fileext = ".dta") panel <- expand.grid(id = 1:60, t = 1:10) panel$d <- with(panel, as.integer( (id %% 4 == 1 & t %in% 4:7) | (id %% 4 == 2 & t %in% 5:8) | (id %% 4 == 3 & t %in% 6:9) )) panel$y <- 0.2 * panel$t + 0.5 * panel$d + rnorm(nrow(panel)) haven::write_dta(panel, tmp) mydata <- nabs_read_dta(tmp) head(mydata) ``` If a labelled variable is really numeric — a 0/1 treatment dummy that happens to carry "treated"/"untreated" labels is the common case — use `labelled = "numeric"` to keep the underlying codes: ```{r, eval = FALSE} mydata <- nabs_read_dta("mypanel.dta", labelled = "numeric") ``` You can also skip the explicit read entirely: `nabs_event_study()` and `nabs_event_study_simple()` accept a path to a `.dta` file as their `data` argument. ```{r, eval = FALSE} res <- nabs_event_study_simple( "mypanel.dta", outcome = "y", treatment = "d", unit = "id", time = "t" ) ``` ## 3. Rosetta stone: `did_multiplegt_dyn` → `nabs_event_study()` A typical Stata call: ```stata did_multiplegt_dyn y, group(id) time(t) treatment(d) /// effects(8) placebo(6) cluster(state) controls(x1 x2) ``` The equivalent through `nonabsdid`: ```{r, eval = FALSE} res <- nabs_event_study( mydata, outcome = "y", treatment = "d", unit = "id", # Stata: group() time = "t", method = "DCDH", leads = 7, # Stata: effects(8) -> leads = 8 - 1 lags = 6, # Stata: placebo(6) cluster = "state", controls = c("x1", "x2") ) ``` Option by option: | Stata (`did_multiplegt_dyn`) | `nabs_event_study()` | Note | |---|---|---| | `varlist` first variable (Y) | `outcome = "y"` | | | `group(id)` | `unit = "id"` | | | `time(t)` | `time = "t"` | | | `treatment(d)` | `treatment = "d"` | | | `effects(k)` | `leads = k - 1` | see below | | `placebo(k)` | `lags = k` | same count of placebos | | `cluster(v)` | `cluster = "v"` | defaults to `unit` | | `controls(x1 x2)` | `controls = c("x1", "x2")` | | | any other option | pass through `...` | forwarded to `DIDmultiplegtDYN::did_multiplegt_dyn()` | **Why `leads = effects - 1`?** Pure axis convention, not a difference in the estimator. `did_multiplegt_dyn` counts `effects(k)` post-treatment estimates labelled 1 through *k*; `nonabsdid` places treatment onset at relative time 0, so a window of `leads` produces estimates at 0, 1, ..., `leads` — that is, `leads + 1` post-period estimates. `effects(8)` in Stata and `leads = 7` here produce the *identical* underlying call and the same number of estimated effects; only the x-axis labels shift by one. The pre-period side has no shift: `placebo(6)` and `lags = 6` both give six placebo estimates. For options the unified wrapper doesn't name explicitly (e.g. `normalized`, `switchers`, `trends_nonparam`), pass them through `...` using the R package's argument names — they generally match the Stata option names — or call `DIDmultiplegtDYN::did_multiplegt_dyn()` directly and tidy the result with `as_nabs_event_study()`. ### What about csdid / did_imputation / xtevent? `csdid` (Callaway–Sant'Anna), `did_imputation` (Borusyak–Jaravel–Spiess), and `eventstudyinteract` (Sun–Abraham) are built for **absorbing** treatment (staggered adoption with no reversals). If your treatment switches off, those designs don't apply directly — that is exactly the gap `nonabsdid`'s estimator set targets. There is no option-level translation to give, because the estimators are different; conceptually, your `csdid`-style event-study plot maps onto `nabs_event_study_simple()`'s overlay figure. ## 4. Stata-style argument aliases If you paste arguments from a Stata script, the wrappers understand the Stata names directly and tell you how they were translated: ```{r, eval = FALSE} # These two calls are identical: nabs_event_study(mydata, outcome = "y", treatment = "d", time = "t", method = "DCDH", group = "id", effects = 8, placebo = 6) #> Translated Stata-style arguments: #> * `group` -> `unit` #> * `placebo` = 6 -> `lags` = 6 #> * `effects` = 8 -> `leads` = 7 #> i nonabsdid puts treatment onset at relative time 0, so `effects` #> post-period estimates correspond to `leads = effects - 1`. ... nabs_event_study(mydata, outcome = "y", treatment = "d", time = "t", method = "DCDH", unit = "id", leads = 7, lags = 6) ``` `df` is likewise accepted for `data`. Supplying both a canonical name and its alias (e.g. `unit` *and* `group`) is an error rather than a silent choice. ## 5. Getting results out: `nabs_write_dta()` Every estimator's output lands in one tidy schema (`time`, `estimate`, `std.error`, `conf.low`, `conf.high`, `window`, `method`, `outcome`), so exporting all of it for a Stata-using coauthor is one line: ```{r, eval = FALSE} res <- nabs_event_study_simple(mydata, outcome = "y", treatment = "d", unit = "id", time = "t") nabs_write_dta(res$tidy, "event_study_results.dta") ``` Dots are not legal in Stata variable names, so `std.error`, `conf.low`, and `conf.high` are renamed to `std_error`, `conf_low`, and `conf_high` on the way out (you'll see a message listing the renames). Back in Stata, rebuilding the figure for one method is the usual `twoway`: ```stata use event_study_results.dta, clear keep if method == "DCDH" twoway (rcap conf_low conf_high time) /// (scatter estimate time), /// yline(0, lpattern(dash)) xline(-0.5, lpattern(dot)) /// xtitle("Periods since treatment") ytitle("Effect on outcome") /// legend(off) ``` Or compare methods side by side: ```stata use event_study_results.dta, clear encode method, gen(m) twoway (scatter estimate time if m == 1) /// (scatter estimate time if m == 2) /// (scatter estimate time if m == 3), /// yline(0) legend(order(1 "DCDH" 2 "IFE" 3 "PanelMatch")) ``` `nabs_write_dta()` also accepts the result objects themselves (`nabs_event_study_result` / `nabs_event_study_simple`) and routes them through `as_nabs_event_study()` for you. ## Reproducibility checklist * **Cross-check DCDH.** Run `did_multiplegt_dyn` on the same data in both Stata and R once, and confirm the estimates match before relying on the R-only estimators. * **Pin versions.** Record `packageVersion("DIDmultiplegtDYN")` (and the SSC version on the Stata side); the authors occasionally change defaults between releases. * **Mind the axis.** When comparing figures across the two programs, remember the one-period shift in post-treatment labels described above. ```{r, include = FALSE, eval = has_haven} unlink(tmp) ```