Type: | Package |
Title: | Assessing Proximal and Distal Causal Excursion Effects for Micro-Randomized Trials |
Version: | 0.2.0 |
Description: | Estimates marginal and moderated causal excursion effects for micro-randomized trials (MRTs) with binary treatment options. Supports continuous and binary proximal outcomes as well as distal outcomes. Methods include weighted and centered least squares (WCLS) for continuous proximal outcomes by Boruvka et al. (2018) <doi:10.1080/01621459.2017.1305274>, the estimator for marginal excursion effect (EMEE) for binary proximal outcomes by Qian et al. (2021) <doi:10.1093/biomet/asaa070>, and two-stage estimation of distal causal excursion effects (DCEE) for continuous distal outcomes <doi:10.48550/arXiv.2502.13500>. |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Imports: | rootSolve, stats, geepack, sandwich, mgcv, randomForest, ranger |
Depends: | R (≥ 4.2) |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0), SuperLearner, earth |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-08-28 22:54:41 UTC; tqian |
Author: | Tianchen Qian |
Maintainer: | Tianchen Qian <t.qian@uci.edu> |
Repository: | CRAN |
Date/Publication: | 2025-08-29 08:41:17 UTC |
A synthetic data set of an MRT with binary proximal outcomes
Description
Baseline model:
\log E\{Y_{t+1} \mid A_t = 0, I_t = 1\} =
\alpha_0 + \alpha_1 \cdot \mathrm{time} / \mathrm{total\_T}
+ \alpha_2 \cdot \mathbf{1}(\mathrm{time} > \mathrm{total\_T}/2).
Treatment effect model:
\log RR_t = \beta_0 + \beta_1 \cdot \mathrm{time} / \mathrm{total\_T}.
Randomization probabilities p_t cycle over 0.3, 0.5, 0.7 (with repetition).
Availability is exogenous at 0.8 for all time points.
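The generative model above can be sketched in base R. This is only an illustration, not the code used to build data_binary: the function name simulate_binary_mrt, the coefficient values in alpha and beta, the sample sizes, and the seed are all hypothetical. It also assumes that the randomization probabilities cycle across decision points and that no treatment is delivered when a participant is unavailable.

```r
# Hypothetical sketch of the generative model described above.
# All numeric parameter values are illustrative assumptions.
simulate_binary_mrt <- function(n = 100, total_T = 30,
                                alpha = c(-1.5, 0.3, 0.2), # assumed baseline coefficients
                                beta = c(0.1, 0.3)) {      # assumed treatment-effect coefficients
  dta <- expand.grid(time = seq_len(total_T), userid = seq_len(n))
  dta <- dta[, c("userid", "time")]
  dta$time_var1 <- dta$time / total_T
  dta$time_var2 <- as.integer(dta$time > total_T / 2)
  # Assumption: probabilities cycle over 0.3, 0.5, 0.7 across decision points.
  dta$rand_prob <- c(0.3, 0.5, 0.7)[(dta$time - 1) %% 3 + 1]
  dta$avail <- rbinom(nrow(dta), 1, 0.8) # exogenous availability
  # Assumption: treatment is never delivered when unavailable.
  dta$A <- rbinom(nrow(dta), 1, dta$rand_prob) * dta$avail
  log_baseline <- alpha[1] + alpha[2] * dta$time_var1 + alpha[3] * dta$time_var2
  log_rr <- beta[1] + beta[2] * dta$time_var1
  prob_y <- exp(log_baseline + dta$A * log_rr)
  stopifnot(all(prob_y <= 1)) # the log-linear model must yield valid probabilities
  dta$Y <- rbinom(nrow(dta), 1, prob_y)
  dta
}

set.seed(1)
sim <- simulate_binary_mrt()
head(sim)
```

With a log-linear outcome model the coefficients have to be chosen so that every success probability stays at or below 1, which is why the sketch checks this before drawing Y.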
Usage
data_binary
Format
A data frame with 3000 observations and 10 variables:
- userid
Individual id number.
- time
Decision point index.
- time_var1
Time-varying covariate 1, the "standardized time in study", defined as the current decision point index divided by the total number of decision points.
- time_var2
Time-varying covariate 2, indicator of "the second half of the study", defined as whether the current decision point index is greater than the total number of decision points divided by 2.
- Y
Binary proximal outcome.
- A
Treatment assignment: whether the intervention is randomized to be delivered (=1) or not (=0) at the current decision point.
- rand_prob
Randomization probability P(A=1) for the current decision point.
- avail
Availability indicator (=1 available, =0 not available) at the current decision point.
A synthetic data set of an MRT with continuous distal outcome
Description
Simulated longitudinal dataset suitable for illustrating the 'dcee()' function. Each row corresponds to one decision point for one subject. The distal outcome 'Y' is constant within subject because it is measured once, at the end of the study; here it is appended to the long-format data as an extra column to conform with the 'dcee()' function requirement.
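As a rough illustration of this layout, the sketch below appends a subject-level distal outcome to long-format data with base R. The toy data frames long and distal and their values are hypothetical; only the column names userid, dp, A, and Y follow the format of this data set.

```r
# Hypothetical toy data: 3 subjects, 4 decision points each.
long <- data.frame(
  userid = rep(1:3, each = 4),
  dp = rep(1:4, times = 3),
  A = c(0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0)
)
# One distal outcome per subject, measured at the end of the study.
distal <- data.frame(userid = 1:3, Y = c(2.1, 3.4, 1.8))

# Append the distal outcome so every row of a subject carries the same Y.
long_with_y <- merge(long, distal, by = "userid", sort = FALSE)
long_with_y <- long_with_y[order(long_with_y$userid, long_with_y$dp), ]

# Check: Y is constant within subject.
y_is_constant <- tapply(long_with_y$Y, long_with_y$userid,
                        function(y) length(unique(y)) == 1)
stopifnot(all(y_is_constant))
long_with_y
```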
Usage
data_distal_continuous
Format
a data frame with 1500 observations and 11 variables
- userid
Subject identifier
- dp
Decision point (1..T)
- X
Endogenous continuous time-varying covariate
- Z
Endogenous binary time-varying covariate
- avail
Availability indicator (0/1)
- A
Treatment (0/1)
- prob_A
Randomization probability P(A=1|H_t)
- A_lag1
Lagged treatment
- Y
Distal continuous outcome (constant per subject)
A synthetic data set that mimics the HeartSteps V1 data structure to illustrate the use of the wcls() function for continuous proximal outcomes
Description
A synthetic data set that mimics the HeartSteps V1 data structure to illustrate the use of the wcls() function for continuous proximal outcomes.
Usage
data_mimicHeartSteps
Format
a data frame with 7770 observations and 9 variables
- userid
individual id number
- time
decision point index
- day_in_study
day in the study
- logstep_30min
proximal outcome: the step count in the 30 minutes following the current decision point (log-transformed)
- logstep_30min_lag1
proximal outcome at the previous decision point (lag-1 outcome): the step count in the 30 minutes following the previous decision point (log-transformed)
- logstep_pre30min
the step count in the 30 minutes prior to the current decision point (log-transformed); used as a control variable
- is_at_home_or_work
whether the individual is at home or work (=1) or at other locations (=0) at the current decision point
- intervention
whether the intervention is randomized to be delivered (=1) or not (=0) at the current decision point
- rand_prob
the randomization probability P(A=1) for the current decision point
- availability
whether the individual is available (=1) or not (=0) at the current decision point
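A lag-1 variable such as logstep_30min_lag1 can be built by shifting the outcome by one decision point within each individual. The sketch below uses a hypothetical toy data frame and assumes the first decision point of each individual has no previous outcome and is therefore set to NA; how this data set handles that first value is not stated here.

```r
# Hypothetical toy data: 2 individuals, 3 decision points each.
toy <- data.frame(
  userid = rep(c("a", "b"), each = 3),
  time = rep(1:3, times = 2),
  logstep_30min = c(1.2, 0.0, 2.5, 3.1, 0.7, 1.9)
)
toy <- toy[order(toy$userid, toy$time), ]

# Shift the outcome by one decision point within each individual.
# Assumption: the first decision point has no previous outcome, so it is NA.
toy$logstep_30min_lag1 <- ave(
  toy$logstep_30min, toy$userid,
  FUN = function(y) c(NA, head(y, -1))
)
toy
```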
Distal Causal Excursion Effect (DCEE) Estimation
Description
Fits distal causal excursion effects in micro-randomized trials using a **two-stage** estimator: (i) learn nuisance outcome regressions \mu_a(H_t) with a specified learner (parametric/ML), optionally with cross-fitting; (ii) solve estimating equations for the distal excursion effect parameters (\beta).
This wrapper standardizes inputs and delegates computation to dcee_helper_2stage_estimation().
Usage
dcee(
data,
id,
outcome,
treatment,
rand_prob,
moderator_formula,
control_formula,
availability = NULL,
control_reg_method = c("gam", "lm", "rf", "ranger", "sl", "sl.user-specified-library",
"set_to_zero"),
cross_fit = FALSE,
cf_fold = 10,
weighting_function = NULL,
verbose = TRUE,
...
)
Arguments
data |
A data.frame in long format. |
id |
Character scalar: column name for subject identifier. |
outcome |
Character scalar: column name for proximal/distal outcome. |
treatment |
Character scalar: column name for binary treatment {0,1}. |
rand_prob |
Character scalar: column name for the randomization probability P(A=1|H_t). |
moderator_formula |
RHS-only formula of moderators of the excursion effect (e.g., '~ 1', '~ Z', or '~ Z1 + Z2'). |
control_formula |
RHS-only formula of covariates for learning nuisance outcome regressions. When 'control_reg_method = "gam"', 's(x)' terms are allowed (e.g., '~ x1 + s(x2)'). For SuperLearner methods, variables are extracted from this formula to build the design matrix 'X'. |
availability |
Optional character scalar: column name for availability indicator (0/1). If 'NULL', availability is taken as 1 for all rows. |
control_reg_method |
One of '"gam"', '"lm"', '"rf"', '"ranger"', '"sl"', '"sl.user-specified-library"', '"set_to_zero"'. See Details. |
cross_fit |
Logical; if 'TRUE', perform K-fold cross-fitting by subject id. |
cf_fold |
Integer; number of folds if 'cross_fit = TRUE' (default 10). |
weighting_function |
Either a single numeric constant applied to all rows, or a character column name in 'data' giving decision-point weights. |
verbose |
Logical; print minimal preprocessing messages (default 'TRUE'). |
... |
Additional arguments passed through to the chosen learner (e.g., 'num.trees', 'mtry' for random forests; 'sl.library' when 'control_reg_method = "sl.user-specified-library"'). |
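Cross-fitting "by subject id" means that all rows of a subject fall in the same fold, so a nuisance model is never evaluated on a subject it was trained on. The sketch below shows one way to build such folds in base R; the helper assign_folds_by_id and the toy ids are hypothetical, and this is not necessarily how 'dcee()' splits the data internally.

```r
# Hypothetical helper: assign each subject (not each row) to one of K folds.
assign_folds_by_id <- function(id, K) {
  ids <- unique(id)
  stopifnot(K >= 2, K <= length(ids))
  # Shuffle fold labels of near-equal frequency and attach them to subjects.
  fold_of_id <- setNames(sample(rep_len(seq_len(K), length(ids))), ids)
  unname(fold_of_id[as.character(id)])
}

set.seed(123)
toy_id <- rep(1:20, each = 5) # 20 subjects, 5 decision points each
fold <- assign_folds_by_id(toy_id, K = 5)

# For fold k: fit the nuisance model on rows with fold != k,
# then predict on rows with fold == k (out-of-fold predictions).
for (k in 1:5) {
  train_rows <- which(fold != k)
  test_rows <- which(fold == k)
  # No subject may appear on both sides of the split.
  stopifnot(length(intersect(toy_id[train_rows], toy_id[test_rows])) == 0)
}
table(fold)
```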
Details
**Learners.**
- 'gam' uses mgcv and supports 's(.)' terms in 'control_formula'.
- 'lm' uses base stats::lm.
- 'rf' uses randomForest; 'ranger' uses ranger.
- 'sl' / 'sl.user-specified-library' use SuperLearner. For the former, 'sl.library = c("SL.mean", "SL.glm", "SL.earth")' is used. For the latter, please provide 'sl.library = c("SL.mean", ...)' via '...'.
**Notes.**
- Treatment must be coded 0/1; 'rand_prob' must lie strictly in (0,1).
- 'control_formula = ~ 1' is only valid with 'control_reg_method = "set_to_zero"'.
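The coding requirements in the notes above can be checked before fitting. The helper check_dcee_inputs below is a hypothetical convenience function written for this illustration; it is not part of the package, and the checks that 'dcee()' performs itself may differ.

```r
# Hypothetical pre-flight checks mirroring the notes above; not part of the package.
check_dcee_inputs <- function(data, treatment, rand_prob) {
  a <- data[[treatment]]
  p <- data[[rand_prob]]
  problems <- character(0)
  if (!all(a %in% c(0, 1))) {
    problems <- c(problems, "treatment must be coded 0/1")
  }
  if (any(is.na(p)) || any(p <= 0 | p >= 1, na.rm = TRUE)) {
    problems <- c(problems, "rand_prob must lie strictly in (0, 1)")
  }
  problems
}

ok <- data.frame(A = c(0, 1, 1), prob_A = c(0.5, 0.3, 0.7))
bad <- data.frame(A = c(0, 2, 1), prob_A = c(0.5, 1, 0.7))
check_dcee_inputs(ok, "A", "prob_A")  # no problems reported
check_dcee_inputs(bad, "A", "prob_A") # both problems reported
```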
Value
An object of class '"dcee_fit"' with components:
- call
The matched call to dcee().
- fit
A list returned by the two-stage helper with elements:
  - beta_hat
  Named numeric vector of distal causal excursion effect estimates \beta. Names are "Intercept" and the moderator names (if any) from moderator_formula.
  - beta_se
  Named numeric vector of standard errors for beta_hat (same order/names).
  - beta_varcov
  Variance-covariance matrix of beta_hat (square matrix; row/column names match names(beta_hat)).
  - conf_int
  Matrix of large-sample (normal) Wald 95% confidence intervals for beta_hat; columns are "2.5 %" and "97.5 %".
  - conf_int_tquantile
  Matrix of small-sample (t-quantile) 95% confidence intervals for beta_hat; columns are "2.5 %" and "97.5 %"; degrees of freedom are provided in $df of the "dcee_fit" object.
  - regfit_a0
  Stage-1 nuisance regression fit for \mu_0(H_t) (outcome model among A=0), or NULL when control_reg_method = "set_to_zero". Note: when cross_fit = TRUE, this is the learner object from the last fold and is provided for inspection only (do not use for out-of-fold prediction).
  - regfit_a1
  Stage-1 nuisance regression fit for \mu_1(H_t) (outcome model among A=1); same caveats as regfit_a0 regarding cross_fit.
- df
Small-sample degrees of freedom used for t-based intervals: number of unique subjects minus length(fit$beta_hat).
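The relationship between the two interval types can be sketched from an estimate, its standard error, and the degrees of freedom. The numbers below are hypothetical stand-ins for fit$beta_hat, fit$beta_se, and the subject count, and make_ci is a helper defined only for this illustration.

```r
# Hypothetical numbers standing in for fit$beta_hat, fit$beta_se and the subject count.
beta_hat <- c(Intercept = 0.40, Z = -0.15)
beta_se <- c(Intercept = 0.10, Z = 0.08)
n_subjects <- 50
df <- n_subjects - length(beta_hat) # unique subjects minus number of betas

# Hypothetical helper: estimate +/- critical value * standard error.
make_ci <- function(est, se, crit) {
  ci <- cbind(est - crit * se, est + crit * se)
  dimnames(ci) <- list(names(est), c("2.5 %", "97.5 %"))
  ci
}

conf_int <- make_ci(beta_hat, beta_se, qnorm(0.975))            # large-sample Wald
conf_int_tquantile <- make_ci(beta_hat, beta_se, qt(0.975, df)) # small-sample t
conf_int
conf_int_tquantile
```

Because the t quantile exceeds the normal quantile for any finite df, the t-based interval is always the wider of the two.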
Examples
data(data_distal_continuous, package = "MRTAnalysis")
## Fast example: marginal effect with linear nuisance (CRAN-friendly)
fit_lm <- dcee(
data = data_distal_continuous,
id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
moderator_formula = ~1, # marginal (no moderators)
control_formula = ~X, # simple linear nuisance
availability = "avail",
control_reg_method = "lm",
cross_fit = FALSE
)
summary(fit_lm)
summary(fit_lm, show_control_fit = TRUE) # show Stage-1 fit info
## Moderated effect with GAM nuisance (allows smooth terms); may be slower
fit_gam <- dcee(
data = data_distal_continuous,
id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
moderator_formula = ~Z, # test moderation by Z
control_formula = ~ s(X) + Z, # smooth in nuisance via mgcv::gam
availability = "avail",
control_reg_method = "gam",
cross_fit = TRUE, cf_fold = 5
)
summary(fit_gam, lincomb = c(0, 1)) # linear combo: the Z coefficient
summary(fit_gam, show_control_fit = TRUE) # show Stage-1 fit info
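## Optional: SuperLearner (runs only if SuperLearner and earth are installed;
## the default "sl" library includes SL.earth, which needs the earth package)
if (requireNamespace("SuperLearner", quietly = TRUE) &&
  requireNamespace("earth", quietly = TRUE)) {
  library(SuperLearner)
  fit_sl <- dcee(
    data = data_distal_continuous,
    id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
    moderator_formula = ~1,
    control_formula = ~ X + Z,
    availability = "avail",
    control_reg_method = "sl",
    cross_fit = FALSE
  )
  summary(fit_sl)
}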
Estimates the causal excursion effect for binary outcome MRT
Description
Returns the estimated causal excursion effect (on log relative risk scale) and the estimated standard error. Small sample correction using the "Hat" matrix in the variance estimate is implemented.
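Because the effect is reported on the log relative risk scale, exponentiating the estimate and the interval endpoints gives a relative risk. The sketch below uses a hypothetical estimate and standard error and a normal quantile for simplicity; it does not reproduce the interval that the package's summary method reports.

```r
# Hypothetical estimate and standard error on the log relative risk scale.
log_rr_hat <- 0.25
log_rr_se <- 0.10
z <- qnorm(0.975) # assumption: large-sample normal quantile

# Back-transform the point estimate and the interval endpoints.
rr_hat <- exp(log_rr_hat)
rr_ci <- exp(log_rr_hat + c(-1, 1) * z * log_rr_se)
round(c(RR = rr_hat, lower = rr_ci[1], upper = rr_ci[2]), 3)
```

A relative risk above 1 means the outcome is more likely under treatment than under no treatment; the interval is built on the log scale first so that it stays positive after back-transformation.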
Usage
emee(
data,
id,
outcome,
treatment,
rand_prob,
moderator_formula,
control_formula,
availability = NULL,
numerator_prob = NULL,
start = NULL,
verbose = TRUE
)
Arguments
data |
A data set in long format. |
id |
The subject id variable. |
outcome |
The outcome variable. |
treatment |
The binary treatment assignment variable. |
rand_prob |
The randomization probability variable. |
moderator_formula |
A formula for the moderator variables. This should start with ~ followed by the moderator variables. When set to ~ 1, the fully marginal excursion effect (no moderators) is estimated. |
control_formula |
A formula for the control variables. This should start with ~ followed by the control variables. When set to ~ 1, only an intercept is included as the control variable. |
availability |
The availability variable. Use the default value (NULL) if every decision point is treated as available. |
numerator_prob |
Either a number between 0 and 1, or a variable name for a column in data. If you are not sure what this is, use the default value (NULL). |
start |
A vector of the initial value of the estimators used in the numerical solver. If using the default value (NULL), the starting values are set internally. |
verbose |
If default ('TRUE'), additional messages will be printed during data preprocessing. |
Value
An object of type "emee_fit"
Examples
## estimating the fully marginal excursion effect by setting
## moderator_formula = ~ 1
emee(
data = data_binary,
id = "userid",
outcome = "Y",
treatment = "A",
rand_prob = "rand_prob",
moderator_formula = ~1,
control_formula = ~ time_var1 + time_var2,
availability = "avail"
)
## estimating the causal excursion effect moderated by time_var1
## by setting moderator_formula = ~ time_var1
emee(
data = data_binary,
id = "userid",
outcome = "Y",
treatment = "A",
rand_prob = "rand_prob",
moderator_formula = ~time_var1,
control_formula = ~ time_var1 + time_var2,
availability = "avail"
)
Estimates the causal excursion effect for binary outcome MRT
Description
Returns the estimated causal excursion effect (on log relative risk scale) and the estimated standard error.
Small sample correction using the "Hat" matrix in the variance estimate is implemented.
This is a slightly altered version of emee(), where the treatment assignment indicator is also centered in the residual term. It would have similar (but not exactly the same) numerical output as emee(). This is the estimator based on which the sample size calculator for binary outcome MRT is developed. (See R package MRTSampleSizeBinary.)
Usage
emee2(
data,
id,
outcome,
treatment,
rand_prob,
moderator_formula,
control_formula,
availability = NULL,
numerator_prob = NULL,
start = NULL,
verbose = TRUE
)
Arguments
data |
A data set in long format. |
id |
The subject id variable. |
outcome |
The outcome variable. |
treatment |
The binary treatment assignment variable. |
rand_prob |
The randomization probability variable. |
moderator_formula |
A formula for the moderator variables. This should start with ~ followed by the moderator variables. When set to ~ 1, the fully marginal excursion effect (no moderators) is estimated. |
control_formula |
A formula for the control variables. This should start with ~ followed by the control variables. When set to ~ 1, only an intercept is included as the control variable. |
availability |
The availability variable. Use the default value (NULL) if every decision point is treated as available. |
numerator_prob |
Either a number between 0 and 1, or a variable name for a column in data. If you are not sure what this is, use the default value (NULL). |
start |
A vector of the initial value of the estimators used in the numerical solver. If using the default value (NULL), the starting values are set internally. |
verbose |
If default ('TRUE'), additional messages will be printed during data preprocessing. |
Value
An object of type "emee_fit"
Examples
## estimating the fully marginal excursion effect by setting
## moderator_formula = ~ 1
emee2(
data = data_binary,
id = "userid",
outcome = "Y",
treatment = "A",
rand_prob = "rand_prob",
moderator_formula = ~1,
control_formula = ~ time_var1 + time_var2,
availability = "avail"
)
## estimating the causal excursion effect moderated by time_var1
## by setting moderator_formula = ~ time_var1
emee2(
data = data_binary,
id = "userid",
outcome = "Y",
treatment = "A",
rand_prob = "rand_prob",
moderator_formula = ~time_var1,
control_formula = ~ time_var1 + time_var2,
availability = "avail"
)
Summary for DCEE fits
Description
Produce inference tables for distal causal excursion effects from a dcee() model. By default uses small-sample t-tests with df = object$df (subjects minus number of betas). If df is missing or nonpositive, falls back to large-sample normal (z) inference.
Usage
## S3 method for class 'dcee_fit'
summary(
object,
lincomb = NULL,
conf_level = 0.95,
show_control_fit = FALSE,
...
)
Arguments
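object |
An object of class "dcee_fit". |
lincomb |
Optional numeric vector or matrix specifying linear combinations of the coefficients \beta. |
conf_level |
Confidence level for intervals (default 0.95). |
show_control_fit |
Logical; if TRUE, include information on the Stage-1 fits in the summary (default FALSE). |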
... |
Currently ignored. |
Value
A list of class "summary.dcee_fit" with components:
- call
the original call
- df
degrees of freedom used for t-tests (may be NA)
- conf_level
the confidence level
- excursion_effect
data frame with coefficient table for \beta
- lincomb
optional data frame with linear-combination results
- control_fit
optional list describing Stage-1 fits (only if show_control_fit)
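The choice between t-based and z-based inference described for this method can be sketched as a small helper. The function p_value_with_fallback and the numbers passed to it are hypothetical; the sketch only mirrors the documented rule (use t when df is available and positive, otherwise use the normal distribution) and is not the package's code.

```r
# Hypothetical helper: two-sided p-value, t-based when df is usable, z-based otherwise.
p_value_with_fallback <- function(estimate, se, df = NA) {
  stat <- estimate / se
  use_t <- length(df) == 1 && !is.na(df) && df > 0
  if (use_t) {
    2 * pt(abs(stat), df = df, lower.tail = FALSE)
  } else {
    2 * pnorm(abs(stat), lower.tail = FALSE)
  }
}

p_t <- p_value_with_fallback(0.40, 0.10, df = 48) # small-sample t-test
p_z <- p_value_with_fallback(0.40, 0.10, df = NA) # large-sample z fallback
c(t = p_t, z = p_z)
```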
Summarize Causal Excursion Effect Fits for MRT with Binary Outcomes
Description
summary method for class "emee_fit".
Usage
## S3 method for class 'emee_fit'
summary(
object,
lincomb = NULL,
conf_level = 0.95,
show_control_fit = FALSE,
...
)
Arguments
object |
An object of class "emee_fit". |
lincomb |
A vector of length p (p is the number of moderators including intercept) or a matrix with p columns. When not set to 'NULL', the summary will include the specified linear combinations of the causal excursion effect coefficients and the corresponding confidence interval, standard error, and p-value. |
conf_level |
A numeric value indicating the confidence level for confidence intervals. Default to 0.95. |
show_control_fit |
A logical value of whether the fitted coefficients for the control variables will be printed in the summary. Default to FALSE. (Interpreting the fitted coefficients for control variables is not recommended.) |
... |
Further arguments passed to or from other methods. |
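A linear combination of coefficients and its standard error follow from the coefficient vector and its variance-covariance matrix. The sketch below uses hypothetical values for both, with p = 2 (intercept plus one moderator named time_var1), and applies the standard formula for the variance of a linear combination; the internals of the summary method are not shown in this manual, so treat this as an illustration of what 'lincomb' expresses rather than its implementation.

```r
# Hypothetical coefficient estimates and variance-covariance matrix (p = 2).
beta_hat <- c(Intercept = 0.10, time_var1 = 0.30)
beta_varcov <- matrix(c(0.010, -0.004,
                        -0.004, 0.020), nrow = 2, byrow = TRUE)

# Each row of L is one linear combination (a matrix with p columns).
L <- rbind(
  "effect at time_var1 = 0.5" = c(1, 0.5),
  "slope in time_var1" = c(0, 1)
)

est <- drop(L %*% beta_hat)                  # L beta
se <- sqrt(diag(L %*% beta_varcov %*% t(L))) # sqrt of diag(L V L')
cbind(Estimate = est, StdErr = se)
```

The first row evaluates the moderated effect at a specific moderator value, which is the typical use of a length-p vector passed as 'lincomb'.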
Value
the original function call and the estimated causal excursion effect coefficients, confidence interval with conf_level, standard error, t-statistic value, degrees of freedom, and p-value.
Examples
fit <- emee(
data = data_binary,
id = "userid",
outcome = "Y",
treatment = "A",
rand_prob = "rand_prob",
moderator_formula = ~time_var1,
control_formula = ~ time_var1 + time_var2,
availability = "avail",
numerator_prob = 0.5,
start = NULL
)
summary(fit)
Summarize Causal Excursion Effect Fits for MRT with Continuous Outcomes
Description
summary method for class "wcls_fit".
Usage
## S3 method for class 'wcls_fit'
summary(
object,
lincomb = NULL,
conf_level = 0.95,
show_control_fit = FALSE,
...
)
Arguments
object |
An object of class "wcls_fit". |
lincomb |
A vector of length p (p is the number of moderators including intercept) or a matrix with p columns. When not set to 'NULL', the summary will include the specified linear combinations of the causal excursion effect coefficients and the corresponding confidence interval, standard error, and p-value. |
conf_level |
A numeric value indicating the confidence level for confidence intervals. Default to 0.95. |
show_control_fit |
A logical value of whether the fitted coefficients for the control variables will be printed in the summary. Default to FALSE. (Interpreting the fitted coefficients for control variables is not recommended.) |
... |
Further arguments passed to or from other methods. |
Value
the original function call and the estimated causal excursion effect coefficients, confidence interval (95% by default), standard error, test statistic or Wald-statistic value (depending on whether sample size is < 50), degrees of freedom, and p-value.
Examples
fit <- wcls(
data = data_mimicHeartSteps,
id = "userid",
outcome = "logstep_30min",
treatment = "intervention",
rand_prob = 0.6,
moderator_formula = ~1,
control_formula = ~logstep_pre30min,
availability = "avail",
numerator_prob = 0.6
)
summary(fit)
Estimates the causal excursion effect for continuous outcome MRT
Description
Returns the estimated causal excursion effect (on additive scale) and the estimated standard error. Small sample correction using the "Hat" matrix in the variance estimate is implemented.
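The point estimate behind a weighted and centered least squares fit can be sketched with base R's lm(): center the treatment indicator at the numerator probability, weight each row by the ratio of numerator to actual randomization probability, and use only available decision points. The simulated data, the variable names, and the true effect of 0.5 below are all hypothetical, and the sketch omits the standard errors, which require a cluster-robust variance with the small-sample correction and are not reproduced here. It is an illustration of the idea in Boruvka et al. (2018), not the code of wcls().

```r
set.seed(2024)
n <- 200      # hypothetical number of subjects
total_T <- 30 # hypothetical number of decision points per subject
dta <- data.frame(userid = rep(seq_len(n), each = total_T),
                  time = rep(seq_len(total_T), times = n))
N <- nrow(dta)
dta$x <- rnorm(N)              # hypothetical control variable
dta$avail <- rbinom(N, 1, 0.8) # availability indicator
p <- 0.6                       # actual randomization probability
dta$A <- rbinom(N, 1, p) * dta$avail
true_beta <- 0.5               # assumed true (constant) excursion effect
dta$Y <- 1 + 0.8 * dta$x + true_beta * dta$A + rnorm(N)

p_tilde <- 0.6                 # numerator probability
# Weight: numerator probability over actual probability of the observed treatment.
w <- ifelse(dta$A == 1, p_tilde / p, (1 - p_tilde) / (1 - p))
dta$A_centered <- dta$A - p_tilde # centered treatment indicator

# Weighted least squares; rows with avail == 0 get weight zero and drop out.
fit <- lm(Y ~ x + A_centered, data = dta, weights = w * dta$avail)
coef(fit)["A_centered"]
```

When the numerator probability equals the randomization probability, as in this sketch, all weights for available rows equal 1 and only the centering changes the fit relative to an ordinary regression.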
Usage
wcls(
data,
id,
outcome,
treatment,
rand_prob,
moderator_formula,
control_formula,
availability = NULL,
numerator_prob = NULL,
verbose = TRUE
)
Arguments
data |
A data set in long format. |
id |
The subject id variable. |
outcome |
The outcome variable. |
treatment |
The binary treatment assignment variable. |
rand_prob |
The randomization probability variable. |
moderator_formula |
A formula for the moderator variables. This should start with ~ followed by the moderator variables. When set to ~ 1, the fully marginal excursion effect (no moderators) is estimated. |
control_formula |
A formula for the control variables. This should start with ~ followed by the control variables. When set to ~ 1, only an intercept is included as the control variable. |
availability |
The availability variable. Use the default value (NULL) if every decision point is treated as available. |
numerator_prob |
Either a number between 0 and 1, or a variable name for a column in data. If you are not sure what this is, use the default value (NULL). |
verbose |
If default ('TRUE'), additional messages will be printed during data preprocessing. |
Value
An object of type "wcls_fit"
Examples
wcls(
data = data_mimicHeartSteps,
id = "userid",
outcome = "logstep_30min",
treatment = "intervention",
rand_prob = 0.6,
moderator_formula = ~1,
control_formula = ~logstep_pre30min,
availability = "avail",
numerator_prob = 0.6
)