| Title: | Conditional Independence of Missingness Test |
| Version: | 0.1.1 |
| Description: | Tests whether missingness in explanatory variables is conditionally independent of the outcome, given observed data. Uses multiply-imputed datasets and cross-validated classifiers to produce a test statistic and p-value, with a sensitivity parameter (kappa) for calibrating interpretation. Wraps the 'citest' 'Python' engine via a local 'FastAPI' server over 'HTTP', so no 'reticulate' dependency is needed at runtime. |
| License: | MIT + file LICENSE |
| URL: | https://github.com/midasverse/citest |
| BugReports: | https://github.com/midasverse/citest/issues |
| Depends: | R (≥ 4.1.0) |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| SystemRequirements: | Python (>= 3.9) with the 'midasverse-citest-api' package |
| Imports: | curl, httr2 (≥ 1.0.0), processx (≥ 3.8.0), rlang (≥ 1.1.0) |
| Suggests: | arrow, jsonlite, reticulate, testthat (≥ 3.0.0), knitr, rmarkdown |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2026-03-19 14:38:48 UTC; t.robinson7 |
| Author: | Thomas Robinson [aut, cre], Ranjit Lall [aut] |
| Maintainer: | Thomas Robinson <t.robinson7@lse.ac.uk> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-23 17:40:19 UTC |
citestR: Conditional Independence of Missingness Test
Description
Tests whether missingness in explanatory variables is conditionally independent of the outcome, given observed data. Uses multiply-imputed datasets and cross-validated classifiers to produce a test statistic and p-value, with a sensitivity parameter (kappa) for calibrating interpretation. Wraps the 'citest' 'Python' engine via a local 'FastAPI' server over 'HTTP', so no 'reticulate' dependency is needed at runtime.
Author(s)
Maintainer: Thomas Robinson <t.robinson7@lse.ac.uk>
Authors:
Ranjit Lall <ranjit.lall@politics.ox.ac.uk>
See Also
Useful links:
https://github.com/midasverse/citest
Report bugs at https://github.com/midasverse/citest/issues
Build a base request pointing at the running server
Description
Build a base request pointing at the running server
Usage
base_req(path)
Arguments
path |
API path (e.g. "/fit"). |
Value
An httr2 request object.
Generate a calibration pivot table
Description
Rows are R-squared values, columns are gamma_x values, for a fixed beta_yx.
Usage
calibration_pivot(
beta_yx = 0.3,
r2_grid = NULL,
beta_grid = NULL,
gamma_grid = NULL,
...
)
Arguments
beta_yx |
Numeric. Fixed beta_yx value (default 0.3). |
r2_grid |
Numeric vector, or NULL (the default). |
beta_grid |
Numeric vector, or NULL (the default). |
gamma_grid |
Numeric vector, or NULL (the default). |
... |
Arguments forwarded to |
Value
A data frame (pivot table).
Examples
calibration_pivot(beta_yx = 0.3)
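The row and column layout described above can be sketched in base R. This is only an illustration under stated assumptions: it assumes the pivot is derived from a long table with the columns documented for kappa_calibration_table(), and the kappa values below are dummy placeholders, not output from the package.

```r
# Placeholder long table using the column names documented for
# kappa_calibration_table(); the kappa values are dummy numbers.
long <- expand.grid(
  r2_x_z  = c(0.3, 0.5),
  beta_yx = c(0.3, 0.5),
  gamma_x = c(0.1, 0.2)
)
long$kappa <- seq_len(nrow(long)) / 100

# Keep one beta_yx value, then spread gamma_x across the columns.
fixed <- long[long$beta_yx == 0.3, ]
pivot <- tapply(
  fixed$kappa,
  list(r2_x_z = fixed$r2_x_z, gamma_x = fixed$gamma_x),
  mean
)
pivot
```

Each cell then holds the kappa value for one (R-squared, gamma_x) pair at the chosen beta_yx.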
Check whether the installed backend is up-to-date with PyPI
Description
Compares the locally installed version of midasverse-citest-api against
the latest release on PyPI.
Runs silently on success; emits a message when an update is available.
Failures (e.g. no network) are silently ignored.
Usage
check_backend_version(python, package = "midasverse-citest-api")
Arguments
python |
Path to the Python interpreter. |
package |
PyPI package name (default "midasverse-citest-api"). |
Value
No return value, called for side effects.
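The comparison step can be illustrated with base R alone. The helper update_available() below is hypothetical and is not the package's implementation; it assumes both versions are plain dotted numeric strings and treats anything unparseable as "no update", in the spirit of the silent-failure behaviour described above.

```r
# Hypothetical helper (not part of citestR): compare two version strings.
update_available <- function(installed, latest) {
  # numeric_version() parses dotted versions such as "0.1.1" and supports
  # comparison operators; strict = FALSE gives NA for unparseable input.
  inst <- numeric_version(installed, strict = FALSE)
  late <- numeric_version(latest, strict = FALSE)
  if (is.na(inst) || is.na(late)) {
    return(FALSE)  # treat failures as "nothing to report"
  }
  late > inst
}

if (update_available("0.1.1", "0.2.0")) {
  message("A newer backend release is available.")
}
```

Comparing parsed versions rather than raw strings matters because "0.10.0" sorts before "0.9.0" as text but is the newer release.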
Run the conditional independence test
Description
All-in-one convenience function: creates a dataset on the server, builds a
CIMissTest, runs it, and returns the results.
Usage
ci_test(
data,
y,
expl_vars = NULL,
onehot = TRUE,
imputer = "midas",
classifier = "rf",
m = 10L,
n_folds = 10L,
classifier_args = list(),
imputer_args = list(),
random_state = 42L,
target_level = "variable",
variance_method = "mi_crossfit",
subsample_cap = 2000L,
...
)
Arguments
data |
A data frame (may contain NA values). |
y |
Character. Name of the outcome variable. |
expl_vars |
Character vector of explanatory variable names, or NULL (the default). |
onehot |
Logical. One-hot encode categoricals (default TRUE). |
imputer |
Character. Imputer backend (default "midas"). |
classifier |
Character. Classifier backend (default "rf"). |
m |
Integer. Number of multiply-imputed datasets (default 10). |
n_folds |
Integer. Number of cross-validation folds (default 10). |
classifier_args |
Named list of extra classifier arguments. |
imputer_args |
Named list of extra imputer arguments. |
random_state |
Integer. Random seed (default 42). |
target_level |
Character. Default "variable". |
variance_method |
Character. Default "mi_crossfit". |
subsample_cap |
Integer or NULL (default 2000). |
... |
Arguments forwarded to |
Value
A list with elements model_id, dataset_id, and results.
The results element contains m, B, W_bar, T, t_k, p_k,
p_2s, and optionally df.
Examples
df <- data.frame(Y = rnorm(200), X1 = rnorm(200), X2 = rnorm(200))
df$X1[sample(200, 20)] <- NA
result <- ci_test(df, y = "Y")
result$results$p_2s
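Because ci_test() needs the Python backend, a script that may run on machines without it can guard the call. The sketch below assumes has_server() is exported and behaves as documented in this manual; the reduced m and n_folds values are arbitrary choices to shorten the run, not recommendations.

```r
set.seed(1)
df <- data.frame(Y = rnorm(200), X1 = rnorm(200), X2 = rnorm(200))
df$X1[sample(200, 20)] <- NA  # 20 distinct rows of X1 set to missing

# Only call the backend when citestR is installed and its server is running.
if (requireNamespace("citestR", quietly = TRUE) && citestR::has_server()) {
  result <- citestR::ci_test(df, y = "Y", m = 5L, n_folds = 5L)
  str(result$results)  # inspect the documented elements (m, B, W_bar, ...)
}
```

When the guard is FALSE the block does nothing, so the surrounding script still runs to completion.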
Remove the saved virtualenv path
Description
Remove the saved virtualenv path
Usage
clear_venv_path()
Value
No return value, called for side effects.
Compute theoretical imputation bias kappa
Description
Compute theoretical imputation bias kappa
Usage
compute_kappa(r2_x_z, beta_yx, gamma_x, ...)
Arguments
r2_x_z |
Numeric. R-squared of X on observed covariates Z. |
beta_yx |
Numeric. Coefficient of X in the Y equation. |
gamma_x |
Numeric. Loading of X in the missingness equation. |
... |
Arguments forwarded to |
Value
A single numeric value (kappa).
Examples
compute_kappa(r2_x_z = 0.5, beta_yx = 0.3, gamma_x = 0.2)
Path to the package config directory
Description
Path to the package config directory
Usage
config_dir()
Value
Character path to the config directory.
Ensure the server is running
Description
Starts the server if it is not already running. Called internally by every client function so users never have to manage the server manually.
Usage
ensure_server(...)
Arguments
... |
Arguments forwarded to |
Value
Invisibly returns the base URL of the running server.
Examples
ensure_server()
Find a free TCP port
Description
Samples random ports in the dynamic range and uses serverSocket() to
verify availability.
Usage
find_free_port()
Value
Integer port number.
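The sampling-and-probing approach described above might look roughly like the following. This is a hypothetical re-implementation for illustration, not the package's source; the function name, the retry limit, and the exact port range (49152 to 65535, the IANA dynamic range) are assumptions.

```r
# Hypothetical sketch: probe random ports until one can be bound.
find_free_port_sketch <- function(max_tries = 20L) {
  for (i in seq_len(max_tries)) {
    port <- sample(49152:65535, 1L)  # dynamic/private port range
    # serverSocket() (base R >= 4.0.0) fails if the port cannot be opened.
    sock <- tryCatch(serverSocket(port), error = function(e) NULL)
    if (!is.null(sock)) {
      close(sock)  # release the port so another process can bind to it
      return(port)
    }
  }
  stop("No free port found after ", max_tries, " attempts.")
}

port <- find_free_port_sketch()
```

A port found this way can still be taken by another process before the server binds to it, so callers should be prepared to retry.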
GET and return parsed body
Description
GET and return parsed body
Usage
get_json(path, timeout = 30)
Get a summary of test results
Description
Retrieves a structured summary for a previously fitted model.
Usage
get_summary(model_id, ...)
Arguments
model_id |
Character. UUID returned by ci_test(). |
... |
Arguments forwarded to |
Value
A list with elements outcome, imputer, classifier,
variance_method, mean_difference, t_statistic, df, p_value,
and p_value_two_sided.
Examples
df <- data.frame(Y = rnorm(200), X1 = rnorm(200), X2 = rnorm(200))
df$X1[sample(200, 20)] <- NA
result <- ci_test(df, y = "Y")
get_summary(result$model_id)
Check whether the citest server is running
Description
Returns TRUE if the package's background server process is alive.
Used as the guard for @examplesIf so that examples requiring the
Python backend are skipped when no server is available.
Usage
has_server()
Value
Logical.
Estimate imputer out-of-sample R-squared
Description
Runs a mask-and-impute diagnostic on the server.
Usage
imputer_r2(model_id, mask_frac = 0.2, m_eval = 1L, ...)
Arguments
model_id |
Character. UUID returned by ci_test(). |
mask_frac |
Numeric. Fraction of observed cells to hold out (default 0.2). |
m_eval |
Integer. Number of imputations to average over (default 1). |
... |
Arguments forwarded to |
Value
A list with mean_r2 and per_variable (named numeric vector).
Examples
df <- data.frame(Y = rnorm(200), X1 = rnorm(200), X2 = rnorm(200))
df$X1[sample(200, 20)] <- NA
result <- ci_test(df, y = "Y")
imputer_r2(result$model_id)
Install the citest Python backend
Description
Creates an isolated Python environment and installs the midasverse-citest-api
package (which pulls in midasverse-citest as a dependency).
Usage
install_backend(
method = c("pip", "conda", "uv"),
envname = "citest_env",
package = "midasverse-citest-api"
)
Arguments
method |
Character. One of "pip", "conda", or "uv". |
envname |
Character. Name of the virtual environment to create
(default "citest_env"). |
package |
Character. Package specifier to install
(default "midasverse-citest-api"). |
Details
This is the only function in the package that uses reticulate, and
only for environment creation. It is never used at runtime.
Value
No return value, called for side effects.
Examples
install_backend()
install_backend(method = "conda")
Generate a kappa calibration table
Description
Generate a kappa calibration table
Usage
kappa_calibration_table(
r2_grid = NULL,
beta_grid = NULL,
gamma_grid = NULL,
...
)
Arguments
r2_grid |
Numeric vector of R-squared values, or NULL (the default). |
beta_grid |
Numeric vector of beta values, or NULL (the default). |
gamma_grid |
Numeric vector of gamma values, or NULL (the default). |
... |
Arguments forwarded to |
Value
A data frame with columns r2_x_z, beta_yx, gamma_x, kappa,
abs_kappa.
Examples
kappa_calibration_table(r2_grid = c(0.3, 0.5, 0.7))
Load the saved virtualenv path (or NULL)
Description
Load the saved virtualenv path (or NULL)
Usage
load_venv_path()
Value
Character path or NULL.
Create a dataset on the server
Description
Sends a data frame to the citest API server and creates a Dataset object.
Usage
make_dataset(data, y, expl_vars = NULL, onehot = TRUE, ...)
Arguments
data |
A data frame (may contain NA values). |
y |
Character. Name of the outcome variable. |
expl_vars |
Character vector of explanatory variable names, or NULL (the default). |
onehot |
Logical. One-hot encode categorical columns (default TRUE). |
... |
Arguments forwarded to |
Value
A list with elements dataset_id, n, columns, y_name,
expl_vars, and pct_missing.
Examples
df <- data.frame(Y = rnorm(100), X1 = rnorm(100))
ds <- make_dataset(df, y = "Y")
ds$dataset_id
Create a dataset from a Parquet file
Description
Uploads a Parquet file to the citest API server.
Usage
make_dataset_parquet(file, y, expl_vars = NULL, onehot = TRUE, ...)
Arguments
file |
Path to a Parquet file. |
y |
Character. Name of the outcome variable. |
expl_vars |
Character vector of explanatory variable names, or NULL (the default). |
onehot |
Logical. One-hot encode categorical columns (default TRUE). |
... |
Arguments forwarded to |
Value
A list with elements dataset_id, n, columns, y_name,
expl_vars, and pct_missing.
Examples
ds <- make_dataset_parquet("data.parquet", y = "Y")
POST JSON and return parsed body
Description
POST JSON and return parsed body
Usage
post_json(path, body, timeout = 300)
Print a citest result
Description
Displays a concise summary of the conditional independence test result, including the test statistic, degrees of freedom, p-value, and a plain language interpretation.
Usage
## S3 method for class 'citest_result'
print(x, ...)
Arguments
x |
A citest_result object. |
... |
Additional arguments (currently ignored). |
Value
Invisibly returns x.
Examples
result <- structure(list(
model_id = "example-id",
dataset_id = "example-ds",
results = list(m = 0.12, t_k = 2.5, df = 9, p_2s = 0.034)
), class = "citest_result")
print(result)
Print a citest summary
Description
Displays a formatted summary of a fitted conditional independence test, including model configuration and key results.
Usage
## S3 method for class 'citest_summary'
print(x, ...)
Arguments
x |
A citest_summary object. |
... |
Additional arguments (currently ignored). |
Value
Invisibly returns x.
Examples
smry <- structure(list(
outcome = "Y",
imputer = "midas",
classifier = "rf",
variance_method = "mi_crossfit",
mean_difference = 0.12,
t_statistic = 2.5,
df = 9,
p_value_two_sided = 0.034
), class = "citest_summary")
print(smry)
Save the virtualenv path to persistent config
Description
Save the virtualenv path to persistent config
Usage
save_venv_path(path)
Arguments
path |
Character path to save. |
Value
No return value, called for side effects.
Generate a simulated dataset
Description
Calls one of the built-in data-generating processes on the Python server.
Usage
simulate_data(
dgp,
n = 1000L,
ci = TRUE,
missing_mech = "linear",
beta_y = NULL,
mcar_prop = NULL,
k = NULL,
...
)
Arguments
dgp |
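Character. Name of the DGP (e.g. "single_mar"). |
n |
Integer. Number of observations (default 1000). |
ci |
Logical. Whether conditional independence holds (default TRUE). |
missing_mech |
Character. Missingness mechanism (default "linear"). |
beta_y |
Numeric or NULL (the default). |
mcar_prop |
Numeric or NULL (the default). |
k |
Integer or NULL (the default). |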
... |
Arguments forwarded to |
Value
A list with dataset_id, n, columns, pct_missing.
Examples
sim <- simulate_data("single_mar", n = 500, ci = TRUE)
Start the citest API server
Description
Launches python -m citest_api as a background process and waits for the
/health endpoint to respond.
Usage
start_server(python = "python3", port = NULL, venv = NULL, max_wait = 120L)
Arguments
python |
Path to the Python interpreter (default "python3"). |
port |
Port to bind to. If NULL (the default), a free port is selected automatically. |
venv |
Path to a Python virtual environment.
If supplied, the interpreter is taken from |
max_wait |
Maximum number of 0.5-second polling attempts (default 120, i.e. 60 seconds). The first launch may be slower due to Python import caching. |
Value
Invisibly returns the port number.
Examples
start_server()
start_server(venv = "~/.virtualenvs/citest_env")
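The relationship between max_wait and the 0.5-second polling interval can be sketched as a simple retry loop. The helper wait_until_ready() and its check argument are hypothetical and stand in for whatever request the package sends to /health; this is not the package's implementation.

```r
# Hypothetical helper: poll `check()` until it returns TRUE or attempts run out.
wait_until_ready <- function(check, max_wait = 120L, interval = 0.5) {
  for (attempt in seq_len(max_wait)) {
    # A failing check (e.g. connection refused) counts as "not ready yet".
    ok <- tryCatch(isTRUE(check()), error = function(e) FALSE)
    if (ok) {
      return(attempt)
    }
    Sys.sleep(interval)
  }
  stop("Server did not become ready after ", max_wait * interval, " seconds.")
}

# Stand-in for a real health check: reports ready on the third call.
calls <- 0L
fake_health <- function() {
  calls <<- calls + 1L
  calls >= 3L
}
attempts <- wait_until_ready(fake_health, max_wait = 10L, interval = 0.01)
```

With the defaults, 120 attempts at 0.5 seconds each give the documented 60-second ceiling.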
Stop the citest API server
Description
Kills the background Python process and clears the internal state.
Usage
stop_server()
Value
No return value, called for side effects.
Examples
stop_server()
Convert an R matrix / data.frame to a nested list suitable for JSON
Description
Convert an R matrix / data.frame to a nested list suitable for JSON
Usage
to_nested_list(x)
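One plausible shape for such a conversion is a list with one named entry per row. The sketch below is hypothetical and may differ from what the package actually sends to the server; the row-wise layout and the function name are assumptions.

```r
# Hypothetical sketch, not the package's implementation.
to_nested_list_sketch <- function(x) {
  x <- as.data.frame(x)  # matrices gain default column names (V1, V2, ...)
  # One list per row; each row becomes a named list of length-one values.
  lapply(seq_len(nrow(x)), function(i) as.list(x[i, , drop = FALSE]))
}

df <- data.frame(Y = c(1.5, 2.0), X1 = c(NA, 0.3))
rows <- to_nested_list_sketch(df)
str(rows)
```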
Uninstall the citest Python backend
Description
Stops the running server (if any), removes the Python environment created by
install_backend(), and clears the saved configuration.
Usage
uninstall_backend(method = c("pip", "conda", "uv"), envname = "citest_env")
Arguments
method |
Character. One of "pip", "conda", or "uv". |
envname |
Character. Name of the virtual environment to remove
(default "citest_env"). |
Value
No return value, called for side effects.
Examples
uninstall_backend()
uninstall_backend(method = "conda")
Update the citest Python backend
Description
Upgrades the midasverse-citest-api package (and its dependencies) in the
existing Python environment. Stops the running server first so that the
new version is loaded on next use.
Usage
update_backend(
method = c("pip", "conda", "uv"),
envname = "citest_env",
package = "midasverse-citest-api"
)
Arguments
method |
Character. One of "pip", "conda", or "uv". |
envname |
Character. Name of the virtual environment
(default "citest_env"). |
package |
Character. Package specifier to upgrade
(default "midasverse-citest-api"). |
Value
No return value, called for side effects.
Examples
update_backend()