The RPIV package implements a residual prediction test for the well-specification of linear instrumental variable (IV) models, as presented in Scheidegger, Londschien and Bühlmann (2025). For a response \(Y_i\in \mathbb R\), endogenous explanatory variables \(X_i\in \mathbb R^p\) and instruments \(Z_i\in \mathbb R^d\) (\(d\geq p\)), it addresses the question "is the linear IV model appropriate?". More formally, it tests the null hypothesis \[H_0: \exists \beta\in \mathbb R^p\text{ s.t. } \mathbb E[Y_i - X_i^T\beta|Z_i] = 0 \text{ a.s., } i =1,\ldots, n,\] which is implied by well-specification of the linear IV model (under a mean-independence assumption on the errors).
The model allows for additional exogenous explanatory variables ("exogenous controls", denoted by \(C\) in the R function) and an intercept, which are added both to \(X\) and \(Z\).
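Written out, and assuming the controls \(C_i\) and the intercept enter the model linearly as described above, the null hypothesis with controls would read as follows. The symbols \(\gamma\), \(\mu\) and the dimension \(q\) of \(C_i\) are notation introduced here for illustration and are not taken from the paper.
\[H_0: \exists (\beta, \gamma, \mu)\in \mathbb R^p\times\mathbb R^q\times\mathbb R\text{ s.t. } \mathbb E[Y_i - X_i^T\beta - C_i^T\gamma - \mu|Z_i, C_i] = 0 \text{ a.s., } i =1,\ldots, n.\]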
For a detailed discussion of the method, we refer to Scheidegger, Londschien and Bühlmann (2025). A Python implementation is available in the package ivmodels. We now demonstrate how the RPIV package is used in practice.
You can install the released CRAN version of RPIV with
install.packages("RPIV")
You can install the development version of RPIV from GitHub with
devtools::install_github("cyrillsch/RPIV")
This basic example shows how the well-specification of linear IV models can be tested with the RPIV package. We simulate a dataset with \(n = 200\) observations and three responses.
set.seed(1)
n <- 200
C <- rnorm(n) # exogenous explanatory variable
Z <- cbind(rnorm(n), C + rnorm(n)) # instrumental variable
H <- rnorm(n) # hidden confounding
X <- Z[, 1] - Z[, 2] + rnorm(n) # endogenous explanatory variable
Y1 <- X - C + H + rnorm(n) # linear IV model
Y2 <- X - C + H + Z[, 1]^2 + rnorm(n) # invalid IV -> misspecified
Y3 <- 2 * sign(X - C) + H + rnorm(n) # nonlinear IV model -> misspecified
To apply the well-specification test to the three responses, we use the function RPIV_test, which uses a heteroskedasticity-robust variance estimator by default.
library(RPIV)
result1 <- RPIV_test(Y = Y1, X = X, C = C, Z = Z)
result2 <- RPIV_test(Y = Y2, X = X, C = C, Z = Z)
result3 <- RPIV_test(Y = Y3, X = X, C = C, Z = Z)
result1$p_value
#> [1] 0.1575286
result2$p_value
#> [1] 0.0004228503
result3$p_value
#> [1] 0.005525054
We see that, indeed, well-specification is rejected at significance level \(\alpha = 0.05\) for the responses \(Y_2\) and \(Y_3\).
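For intuition about why a response such as \(Y_2\) is flagged, the following base-R sketch regenerates the simulated data, computes two-stage least squares (2SLS) residuals by hand, and checks whether a nonlinear function of the instruments still correlates with them. This is only a crude illustration of the residual prediction idea and is not the procedure implemented in RPIV_test. The helper `tsls_resid` and the choice of \(Z_1^2\) as the probing function are assumptions made for this example.

```r
set.seed(1)
n <- 200
C <- rnorm(n)                          # exogenous control
Z <- cbind(rnorm(n), C + rnorm(n))     # instruments
H <- rnorm(n)                          # hidden confounding
X <- Z[, 1] - Z[, 2] + rnorm(n)        # endogenous explanatory variable
Y1 <- X - C + H + rnorm(n)             # linear IV model
Y2 <- X - C + H + Z[, 1]^2 + rnorm(n)  # invalid IV

# Hypothetical helper: 2SLS residuals with intercept and controls
# appended to both the regressors and the instruments.
tsls_resid <- function(Y, X, C, Z) {
  X_aug <- cbind(1, X, C)
  Z_aug <- cbind(1, Z, C)
  # First stage: project the regressors onto the instruments.
  X_hat <- Z_aug %*% solve(crossprod(Z_aug), crossprod(Z_aug, X_aug))
  # Second stage: solve the 2SLS normal equations.
  beta <- solve(crossprod(X_hat, X_aug), crossprod(X_hat, Y))
  drop(Y - X_aug %*% beta)
}

r1 <- tsls_resid(Y1, X, C, Z)
r2 <- tsls_resid(Y2, X, C, Z)

# Under well-specification, functions of Z should not predict the residuals.
cor1 <- cor(r1, Z[, 1]^2)
cor2 <- cor(r2, Z[, 1]^2)
print(c(Y1 = cor1, Y2 = cor2))
```

In this sketch the residuals of \(Y_2\) remain clearly correlated with \(Z_1^2\), while those of \(Y_1\) do not. A fixed probing function only works here because the form of the misspecification is known from the simulation.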
The RPIV package also supports cluster-robust inference. We simulate data with 50 clusters of size 4, in which observations within a cluster are dependent but the linear IV model is otherwise well-specified.
set.seed(1)
n <- 200
clustering <- rep(1:50, length.out = n)
Z <- rep(rnorm(1:50), length.out = n) + 0.1 * rnorm(n)
H <- rep(rnorm(1:50), length.out = n) + 0.1 * rnorm(n)
X <- Z + rep(rnorm(1:50), length.out = n) + 0.1 * rnorm(n)
Y <- X + H + rep(rnorm(1:50), length.out = n) + 0.1 * rnorm(n)
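The cluster structure of this simulation can be inspected with a few lines of base R. The sketch below regenerates only the cluster labels and the instrument, then compares between-cluster and within-cluster variation. The variable names `cluster_sizes`, `var_between` and `var_within` are introduced here for illustration.

```r
set.seed(1)
n <- 200
clustering <- rep(1:50, length.out = n)
Z <- rep(rnorm(1:50), length.out = n) + 0.1 * rnorm(n)

# Every cluster has exactly four members.
cluster_sizes <- table(clustering)
print(range(cluster_sizes))       # 4 4

# Members of a cluster are not contiguous: cluster 1 holds
# observations 1, 51, 101 and 151.
print(which(clustering == 1))     # 1 51 101 151

# Most of the variation in Z comes from the shared cluster-level draw.
Z_cluster_mean <- ave(Z, clustering)
var_between <- var(Z_cluster_mean)
var_within <- var(Z - Z_cluster_mean)
print(c(between = var_between, within = var_within))
```

Because the within-cluster variation is small relative to the between-cluster variation, the 200 observations carry far less independent information than the sample size suggests.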
We apply the test with three different variance estimators: assuming homoskedasticity, robust to heteroskedasticity, robust to clustering.
result <- RPIV_test(Y = Y, X = X, C = NULL, Z = Z, variance_estimator =
  c("homoskedastic", "heteroskedastic", "cluster"), clustering = clustering)
result$homoskedastic$p_value
#> [1] 0.02844595
result$heteroskedastic$p_value
#> [1] 0.01728716
result$cluster$p_value
#> [1] 0.1347029
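We see that only the test with the cluster-robust variance estimator does not reject the null hypothesis at significance level \(\alpha = 0.05\). The homoskedastic and heteroskedasticity-robust variants reject, although the model is well-specified, because they ignore the dependence within clusters.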
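When several variance estimators are requested, it can be convenient to collect the p-values in one table. The sketch below uses a stand-in list that mirrors only the `$<estimator>$p_value` structure and the values reported above. It assumes that each element of the returned object is a list with a `p_value` entry, and with the package installed, the object returned by RPIV_test would take the place of the stand-in.

```r
# Stand-in for the object returned by RPIV_test (values copied from above).
result <- list(
  homoskedastic   = list(p_value = 0.02844595),
  heteroskedastic = list(p_value = 0.01728716),
  cluster         = list(p_value = 0.1347029)
)

alpha <- 0.05
# Extract one p-value per variance estimator, keeping the names.
p_values <- vapply(result, function(r) r$p_value, numeric(1))
reject <- p_values < alpha

summary_table <- data.frame(p_value = p_values, reject = reject)
print(summary_table)
```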
More examples can be found in Scheidegger, Londschien and Bühlmann (2025) and the associated GitHub repository RPIV_Application.
Cyrill Scheidegger, Malte Londschien and Peter Bühlmann. A residual prediction test for the well-specification of linear instrumental variable models. Preprint, arXiv:2506.12771, 2025.