formulaiv

The formulaiv package implements a method for assessing the sensitivity of formula instrument estimators to the assumed distribution of the counterfactual shocks. Given a formula instrument and a baseline shock distribution, the method computes the range of estimates consistent with shock distributions in a neighborhood of that baseline. Each bound is the optimal value of a linear-fractional program solved as a linear program via the Charnes–Cooper transform using lpSolve.

Installation

formulaiv can be installed from its GitHub repository via

devtools::install_github("peizansheng/formulaiv")

Load the Package

library(formulaiv)

Main Functions

The generic engine is formulaiv():

formulaiv(y, x, z, f, eps, cons, controls = NULL)

where:

The sensitivity set is a neighborhood of either the joint shock distribution or its marginals: There are two supported constraint types:

cons <- list(name = "joint", pbar = pbar)

where pbar is the baseline \(S \times 1\) weight vector

cons <- list(name = "marginal", g = g, qbar = qbar)   # baseline marginals directly
cons <- list(name = "marginal", g = g, pbar = pbar)   # baseline joint -> qbar = g %*% pbar

where

Which inputs to supply

f (the instrument’s recentering matrix) is always required. For a joint neighborhood you also give the baseline weights pbar. For a marginal neighborhood you also give g — the matrix of shock realizations whose marginals are constrained — plus the baseline marginals, either directly as qbar or computed from a baseline joint pbar. Note that the instrument is a function of the shocks (f = f(g)), so g generally cannot be recovered from f, which is why a marginal analysis needs it explicitly.

Examples

For a detailed empirical application, see the China application vignette.

Note on numerical conditioning

formulaiv() solves its linear programs with lpSolve, which is sensitive to the scale of the input coefficients. If y, x, z, or f are on a very large scale (for example population- or weight-aggregated sums), lpSolve may report a numerical failure instead of returning a bound. The estimate is invariant to dividing y and x by a common constant, so rescaling both to within a few orders of magnitude of 1 resolves this without changing any bound.