| Title: | Sensitivity of Formula Instrument to Shock Design |
| Version: | 0.1.0 |
| Description: | Functions to implement the formula instrument method in Borusyak and Hull (2023) <doi:10.3982/ECTA19367> and examine its sensitivity to the assumed distributional of counterfactual shocks. |
| License: | MIT + file LICENSE |
| URL: | https://github.com/peizansheng/formulaiv |
| BugReports: | https://github.com/peizansheng/formulaiv/issues |
| Encoding: | UTF-8 |
| LazyData: | true |
| LazyDataCompression: | xz |
| RoxygenNote: | 7.3.3 |
| Imports: | stats, lpSolve |
| Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown |
| Config/testthat/edition: | 3 |
| Depends: | R (≥ 3.5) |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-06-19 15:38:42 UTC; peizan |
| Author: | Peizan Sheng [aut, cre] |
| Maintainer: | Peizan Sheng <peizan@uchicago.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-24 09:00:11 UTC |
Precomputed BH (2023) Sensitivity Bounds
Description
A named list of four formulaiv() bound tables for the Borusyak and Hull
(2023) China market-access application, used as the expected values in
tests/testthat/test-bh2023.R. Each element is the $beta data frame
returned by formulaiv() for one specification. Because they are produced by
the package itself, that test is a self-regression check; regenerate the list
with source("data-raw/BH_precomputed_results.R").
Usage
BH_precomputed_results
Format
A named list of four data frames:
BH_sens_joint_cons_no_controlsjoint set, no controls,
eps = seq(1, 20, 0.2)BH_sens_joint_cons_with_controlsjoint set, with controls,
eps = seq(1, 20, 0.2)BH_sens_marginal_cons_no_controlsmarginal set, no controls,
eps = seq(1, 2.5, 0.25)BH_sens_marginal_cons_with_controlsmarginal set, with controls,
eps = seq(1, 2.5, 0.25)
Each data frame has columns eps, lb, and ub (the sensitivity bounds).
Sensitivity of formula instrument to shock design
Description
formulaiv() evaluates the sensitivity of formula instrument estimates to
small or large deviations away from an assumed baseline distribution of shocks.
Usage
formulaiv(y, x, z, f, eps, cons, controls = NULL, denom_tol = 1e-08)
Arguments
y |
outcome ( |
x |
endogenous regressor / treatment ( |
z |
formula instrument ( |
f |
counterfactual shock realizations, one scenario per column
( |
eps |
degree of sensitivity of interest (
|
cons |
constraint list specifying the sensitivity set. One of:
|
controls |
control variables
( |
denom_tol |
numeric tolerance below which the first-stage denominator is
treated as zero when reading off its sign over the feasible set (default
|
Value
A list of two data frames:
-
beta— the sensitivity bounds, one row pereps(columnseps,lb,ub). -
p_opt— the optimal weights achieving each bound, in tidy long form (columnseps,scenario,lb,ub;MSrows).
Numerical conditioning
Each bound is the optimal value of a linear program solved with
lpSolve::lp(), whose simplex routine is sensitive to the magnitude of the
problem coefficients. Inputs on a very large scale — for example population-
or weight-aggregated sums — can make the solver report a numerical failure
(lpSolve status 5) instead of returning a bound. The estimate is invariant
to multiplying y and x by a common positive constant, so when this happens
it is enough to divide both y and x by a single scale factor that brings
them to within a few orders of magnitude of 1: every bound (beta$lb,
beta$ub) is unchanged. (Rescaling only one of y or x, or rescaling them by
different factors, does change the estimate.)
Examples
# BH market-access data (bundled with the package)
y <- ma$emp_growth # outcome (N x 1)
x <- ma$dma0 # endogenous regressor (N x 1)
z <- x # formula instrument (N x 1)
f <- ma[, paste0("ma_nlink", 1:1999)] - ma$ma2007 # shock draws (N x S)
pbar <- rep(1 / 1999, 1999) # baseline weights (S x 1)
# Joint sensitivity set without controls, evaluated at eps = 1.5 and 2.
# Wrapped in \donttest{} because solving the linear-fractional programs over
# all S = 1999 shock draws takes longer than CRAN's 5s example budget.
formulaiv(
y = y,
x = x,
z = z,
f = f,
eps = c(1.5, 2),
cons = list(name = "joint", pbar = pbar)
)$beta
High Speed Railway (HSR) Data
Description
high speed railway (HSR) data from Borusyak and Hull (2023) replication file
Usage
line
Format
A data frame with 150 rows and 2022 columns. The package directly uses the following groups of columns:
line characteristics used in the ML model:
line_type_en,speed,computed_length,nlinks,year_approved,plan_typerealized opening year:
year_operate0simulated opening years:
year_operate1toyear_operate1999
Source
https://zenodo.org/records/8286785
Market Access (MA) Data
Description
city-level data from Borusyak and Hull (2023) replication file
Usage
ma
Format
A data frame with 275 rows and 2013 columns. The sensitivity analysis uses the following groups of columns:
identifiers and geography:
cityid,year,latitude,longitude,scaled_lat,scaled_lon,distance_Boutcomes and baseline market access:
emp_growth,dma0,ma0,ma2007market-access simulations:
ma_nlink1toma_nlink1999expected market access:
ma_nlink_pscore,dma_nlink_pscorerecentered market access:
ma_nlink_rc