formulaiv

The formulaiv package implements a method for assessing the sensitivity of formula instrument estimators to the assumed distribution of the counterfactual shocks. Given a formula instrument and a baseline shock distribution, the method computes the range of estimates consistent with shock distributions in a neighborhood of that baseline. Each bound is the optimal value of a linear-fractional program solved as a linear program via the Charnes–Cooper transform using lpSolve.

Installation

formulaiv can be installed from its GitHub repository via

devtools::install_github("peizansheng/formulaiv")

Load the Package

library(formulaiv)

Main Functions

The generic engine is formulaiv():

formulaiv(y, x, z, f, eps, cons, controls = NULL)

where:

y: outcome (\(N \times 1\) vector)
x: endogenous regressor / treatment (\(N \times 1\) vector)
z: formula instrument (\(N \times 1\) vector)
f: counterfactual shock realizations, one scenario per column (\(N \times S\) data frame or matrix)
eps: size(s) of the sensitivity neighborhood to evaluate (scalar or \(M \times 1\) vector)
cons: the sensitivity-set (constraint) specification
controls: optional controls (\(N \times J\) data frame or matrix, or NULL)

The sensitivity set is a neighborhood of either the joint shock distribution or its marginals: There are two supported constraint types:

joint constraints: reweight the baseline joint distribution pbar

cons <- list(name = "joint", pbar = pbar)

where pbar is the baseline \(S \times 1\) weight vector

marginal constraints: bound the implied per-shock marginals. Supply the shock realization matrix g, and the baseline marginals either directly as qbar or via a baseline joint distribution pbar (from which qbar = g %*% pbar):

cons <- list(name = "marginal", g = g, qbar = qbar)   # baseline marginals directly
cons <- list(name = "marginal", g = g, pbar = pbar)   # baseline joint -> qbar = g %*% pbar

where

g is the \(L \times S\) shock realization matrix (data frame or matrix),
qbar is the baseline \(L \times 1\) vector of marginal shock probabilities, or
pbar is the baseline \(S \times 1\) joint weight vector

Which inputs to supply

f (the instrument’s recentering matrix) is always required. For a joint neighborhood you also give the baseline weights pbar. For a marginal neighborhood you also give g — the matrix of shock realizations whose marginals are constrained — plus the baseline marginals, either directly as qbar or computed from a baseline joint pbar. Note that the instrument is a function of the shocks (f = f(g)), so g generally cannot be recovered from f, which is why a marginal analysis needs it explicitly.

Examples

For a detailed empirical application, see the China application vignette.

Note on numerical conditioning

formulaiv() solves its linear programs with lpSolve, which is sensitive to the scale of the input coefficients. If y, x, z, or f are on a very large scale (for example population- or weight-aggregated sums), lpSolve may report a numerical failure instead of returning a bound. The estimate is invariant to dividing y and x by a common constant, so rescaling both to within a few orders of magnitude of 1 resolves this without changing any bound.