Type: Package
Title: Assessing Proximal and Distal Causal Excursion Effects for Micro-Randomized Trials
Version: 0.2.0
Description: Estimates marginal and moderated causal excursion effects for micro-randomized trials (MRTs) with binary treatment options. Supports continuous and binary proximal outcomes as well as distal outcomes. Methods include weighted and centered least squares (WCLS) for continuous proximal outcomes by Boruvka et al. (2018) <doi:10.1080/01621459.2017.1305274>, the estimator for marginal excursion effect (EMEE) for binary proximal outcomes by Qian et al. (2021) <doi:10.1093/biomet/asaa070>, and two-stage estimation of distal causal excursion effects (DCEE) for continuous distal outcomes <doi:10.48550/arXiv.2502.13500>.
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.2
Imports: rootSolve, stats, geepack, sandwich, mgcv, randomForest, ranger
Depends: R (≥ 4.2)
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0), SuperLearner, earth
VignetteBuilder: knitr
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2025-08-28 22:54:41 UTC; tqian
Author: Tianchen Qian [aut, cre], Shaolin Xiang [aut], Zhaoxi Cheng [aut], Audrey Boruvka [ctb]
Maintainer: Tianchen Qian <t.qian@uci.edu>
Repository: CRAN
Date/Publication: 2025-08-29 08:41:17 UTC

A synthetic data set of an MRT with binary proximal outcomes

Description

Baseline model:

\log E\{Y_{t+1} \mid A_t = 0, I_t = 1\} = \alpha_0 + \alpha_1 \cdot \mathrm{time} / \mathrm{total\_T} + \alpha_2 \cdot \mathbf{1}(\mathrm{time} > \mathrm{total\_T}/2).

Treatment effect model:

\log RR_t = \beta_0 + \beta_1 \cdot \mathrm{time} / \mathrm{total\_T}.

Randomization probabilities p_t cycle over 0.3, 0.5, 0.7 (with repetition). Availability is exogenous at 0.8 for all time points.
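The generative model above can be sketched in base R. This is only an illustration, not the code used to build data_binary: the coefficient values, the 100 x 30 layout, the per-time-point cycling of p_t, and the handling of unavailable decision points (A forced to 0, Y drawn from the baseline model) are all assumptions made here.

```r
set.seed(1)

# Hypothetical design: 100 subjects x 30 decision points (assumed, not from the package)
n_user  <- 100L
total_T <- 30L
alpha <- c(-1.5, 0.5, 0.3)  # assumed baseline coefficients (alpha_0, alpha_1, alpha_2)
beta  <- c(0.1, 0.3)        # assumed treatment-effect coefficients (beta_0, beta_1)

# Long format: one row per subject per decision point
sim <- expand.grid(time = seq_len(total_T), userid = seq_len(n_user))
sim <- sim[, c("userid", "time")]
sim$time_var1 <- sim$time / total_T
sim$time_var2 <- as.integer(sim$time > total_T / 2)

# Randomization probability cycles over 0.3, 0.5, 0.7 along the time index (assumed)
sim$rand_prob <- c(0.3, 0.5, 0.7)[(sim$time - 1) %% 3 + 1]

# Exogenous availability with probability 0.8; no treatment when unavailable
sim$avail <- rbinom(nrow(sim), 1, 0.8)
sim$A     <- rbinom(nrow(sim), 1, sim$rand_prob) * sim$avail

# log E(Y | A = 0) and log relative risk, following the two models above
log_base <- alpha[1] + alpha[2] * sim$time_var1 + alpha[3] * sim$time_var2
log_rr   <- beta[1] + beta[2] * sim$time_var1
p_y      <- exp(log_base + sim$A * log_rr)
stopifnot(all(p_y < 1))  # a log-linear model must still yield valid probabilities

sim$Y <- rbinom(nrow(sim), 1, p_y)
head(sim)
```

Because the model is log-linear rather than logistic, the coefficients have to be chosen so that the success probability stays below 1 at every decision point, which is what the `stopifnot()` guard checks.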

Usage

data_binary

Format

A data frame with 3000 observations and 10 variables:

userid

Individual id number.

time

Decision point index.

time_var1

Time-varying covariate 1, the "standardized time in study", defined as the current decision point index divided by the total number of decision points.

time_var2

Time-varying covariate 2, indicator of "the second half of the study", defined as whether the current decision point index is greater than the total number of decision points divided by 2.

Y

Binary proximal outcome.

A

Treatment assignment: whether the intervention is randomized to be delivered (=1) or not (=0) at the current decision point.

rand_prob

Randomization probability P(A=1) for the current decision point.

avail

Availability indicator (=1 available, =0 not available) at the current decision point.


A synthetic data set of an MRT with continuous distal outcome

Description

Simulated longitudinal dataset suitable for illustrating the 'dcee()' function. Each row corresponds to one decision point for one subject. The distal outcome 'Y' is constant within subject because it is measured once, at the end of the study; it is appended to the long-format data as an extra column to conform with the 'dcee()' input requirement.
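For data that arrive with the distal outcome in a separate subject-level table, the layout described above can be produced with a join. A minimal base R sketch; the data frames `long` and `distal` and their values are hypothetical.

```r
# Hypothetical long-format data: 3 subjects, 4 decision points each
long <- data.frame(
  userid = rep(1:3, each = 4),
  dp     = rep(1:4, times = 3),
  A      = c(0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0)
)

# Hypothetical subject-level distal outcomes, one row per subject
distal <- data.frame(userid = 1:3, Y = c(2.5, 1.0, 3.2))

# Left-join Y onto every decision-point row, then restore the row order
long_y <- merge(long, distal, by = "userid", all.x = TRUE)
long_y <- long_y[order(long_y$userid, long_y$dp), ]

# Check: Y takes exactly one value within each subject
n_distinct_y <- tapply(long_y$Y, long_y$userid, function(y) length(unique(y)))
stopifnot(all(n_distinct_y == 1))
```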

Usage

data_distal_continuous

Format

a data frame with 1500 observations and 11 variables

userid

Subject identifier

dp

Decision point (1..T)

X

Endogenous continuous time-varying covariate

Z

Endogenous binary time-varying covariate

avail

Availability indicator (0/1)

A

Treatment (0/1)

prob_A

Randomization probability P(A=1|H_t)

A_lag1

Lagged treatment

Y

Distal continuous outcome (constant per subject)


A synthetic data set that mimics the HeartSteps V1 data structure to illustrate the use of the [wcls()] function for continuous proximal outcomes

Description

A synthetic data set that mimics the HeartSteps V1 data structure to illustrate the use of the [wcls()] function for continuous proximal outcomes

Usage

data_mimicHeartSteps

Format

a data frame with 7770 observations and 9 variables

userid

individual id number

time

decision point index

day_in_study

day in the study

logstep_30min

proximal outcome: the step count in the 30 minutes following the current decision point (log-transformed)

logstep_30min_lag1

proximal outcome at the previous decision point (lag-1 outcome): the step count in the 30 minutes following the previous decision point (log-transformed)

logstep_pre30min

the step count in the 30 minutes prior to the current decision point (log-transformed); used as a control variable

is_at_home_or_work

whether the individual is at home or work (=1) or at other locations (=0) at the current decision point

intervention

whether the intervention is randomized to be delivered (=1) or not (=0) at the current decision point

rand_prob

the randomization probability P(A=1) for the current decision point

avail

whether the individual is available (=1) or not (=0) at the current decision point


Distal Causal Excursion Effect (DCEE) Estimation

Description

Fits distal causal excursion effects in micro-randomized trials using a **two-stage** estimator: (i) learn nuisance outcome regressions \mu_a(H_t) with a specified learner (parametric/ML), optionally with cross-fitting; (ii) solve estimating equations for the distal excursion effect parameters (\beta).

This wrapper standardizes inputs and delegates computation to [dcee_helper_2stage_estimation()].
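The cross-fitting idea in stage (i) can be illustrated generically: subjects, not rows, are split into folds, and each row's nuisance prediction comes from a model fitted without that row's subject. The sketch below uses simulated data and a plain `lm()` learner; it is an assumption-laden illustration of subject-level K-fold cross-fitting, not the package's internal procedure.

```r
set.seed(42)

# Hypothetical data: 12 subjects, 5 decision points each
d <- data.frame(userid = rep(1:12, each = 5), X = rnorm(60))
d$Y <- 1 + 0.5 * d$X + rnorm(60)

K   <- 4
uid <- unique(d$userid)

# Assign each subject (not each row) to one of K equal-sized folds
fold_of_subject <- sample(rep(seq_len(K), length.out = length(uid)))
names(fold_of_subject) <- uid
fold_of_row <- unname(fold_of_subject[as.character(d$userid)])

# Out-of-fold predictions: fit on the other folds, predict on the held-out fold
d$mu_hat <- NA_real_
for (k in seq_len(K)) {
  held_out <- fold_of_row == k
  fit_k <- lm(Y ~ X, data = d[!held_out, ])
  d$mu_hat[held_out] <- predict(fit_k, newdata = d[held_out, ])
}
```

Splitting by subject keeps all decision points of one person in the same fold, so a prediction never uses a model trained on that person's own data.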

Usage

dcee(
  data,
  id,
  outcome,
  treatment,
  rand_prob,
  moderator_formula,
  control_formula,
  availability = NULL,
  control_reg_method = c("gam", "lm", "rf", "ranger", "sl", "sl.user-specified-library",
    "set_to_zero"),
  cross_fit = FALSE,
  cf_fold = 10,
  weighting_function = NULL,
  verbose = TRUE,
  ...
)

Arguments

data

A data.frame in long format.

id

Character scalar: column name for subject identifier.

outcome

Character scalar: column name for proximal/distal outcome.

treatment

Character scalar: column name for binary treatment {0,1}.

rand_prob

Character scalar: column name for randomization probability giving P(A_t=1\mid H_t) (must lie in (0,1)).

moderator_formula

RHS-only formula of moderators of the excursion effect (e.g., '~ 1', '~ Z', or '~ Z1 + Z2').

control_formula

RHS-only formula of covariates for learning nuisance outcome regressions. When 'control_reg_method = "gam"', 's(x)' terms are allowed (e.g., '~ x1 + s(x2)'). For SuperLearner methods, variables are extracted from this formula to build the design matrix 'X'.

availability

Optional character scalar: column name for availability indicator (0/1). If 'NULL', availability is taken as 1 for all rows.

control_reg_method

One of '"gam"', '"lm"', '"rf"', '"ranger"', '"sl"', '"sl.user-specified-library"', '"set_to_zero"'. See Details.

cross_fit

Logical; if 'TRUE', perform K-fold cross-fitting by subject id.

cf_fold

Integer; number of folds if 'cross_fit = TRUE' (default 10).

weighting_function

Either a single numeric constant applied to all rows, or a character column name in 'data' giving decision-point weights \omega_t.

verbose

Logical; print minimal preprocessing messages (default 'TRUE').

...

Additional arguments passed through to the chosen learner (e.g., 'num.trees', 'mtry' for random forests; 'sl.library' when 'control_reg_method = "sl.user-specified-library"').

Details

**Learners.**
- 'gam' uses mgcv and supports 's(.)' terms in 'control_formula'.
- 'lm' uses base stats::lm.
- 'rf' uses randomForest; 'ranger' uses ranger.
- 'sl' / 'sl.user-specified-library' use SuperLearner. For the former, the library 'sl.library = c("SL.mean", "SL.glm", "SL.earth")' is used. For the latter, please provide 'sl.library = c("SL.mean", ...)' via '...'.

**Notes.**
- Treatment must be coded 0/1; 'rand_prob' must lie strictly in (0,1).
- 'control_formula = ~ 1' is only valid with 'control_reg_method = "set_to_zero"'.
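The coding requirements in the notes can be checked before fitting. The helper below is hypothetical (it is not part of the package) and only covers the treatment and randomization-probability rules stated above.

```r
# Hypothetical pre-flight check; returns a character vector of problems found
check_mrt_inputs <- function(data, treatment, rand_prob) {
  a <- data[[treatment]]
  p <- data[[rand_prob]]
  problems <- character(0)
  # Treatment must be coded 0/1 (NA values are flagged as well)
  if (!all(a %in% c(0, 1))) {
    problems <- c(problems, "treatment must be coded 0/1")
  }
  # Randomization probability must lie strictly inside (0, 1)
  if (any(is.na(p)) || any(p <= 0 | p >= 1)) {
    problems <- c(problems, "rand_prob must lie strictly in (0, 1)")
  }
  problems
}

ok  <- data.frame(A = c(0, 1, 1), prob_A = c(0.3, 0.5, 0.7))
bad <- data.frame(A = c(0, 2, 1), prob_A = c(0.3, 1.0, 0.7))

check_mrt_inputs(ok, "A", "prob_A")   # no problems reported
check_mrt_inputs(bad, "A", "prob_A")  # both rules violated
```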

Value

An object of class '"dcee_fit"' with components:

call

The matched call to dcee().

fit

A list returned by the two-stage helper with elements:

beta_hat

Named numeric vector of distal causal excursion effect estimates \beta. Names are "Intercept" and the moderator names (if any) from moderator_formula.

beta_se

Named numeric vector of standard errors for beta_hat (same order/names).

beta_varcov

Variance–covariance matrix of beta_hat (square matrix; row/column names match names(beta_hat)).

conf_int

Matrix of large-sample (normal) Wald 95% confidence intervals for beta_hat; columns are "2.5 %" and "97.5 %".

conf_int_tquantile

Matrix of small-sample (t-quantile) 95% confidence intervals for beta_hat; columns are "2.5 %" and "97.5 %"; degrees of freedom are provided in $df of the "dcee_fit" object.

regfit_a0

Stage-1 nuisance regression fit for \mu_0(H_t) (outcome model among A=0), or NULL when control_reg_method = "set_to_zero". Note: when cross_fit = TRUE, this is the learner object from the last fold and is provided for inspection only (do not use for out-of-fold prediction).

regfit_a1

Stage-1 nuisance regression fit for \mu_1(H_t) (outcome model among A=1); same caveats as regfit_a0 regarding cross_fit.

df

Small-sample degrees of freedom used for t-based intervals: number of unique subjects minus length(fit$beta_hat).
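The relationship between the two interval components can be sketched as follows, assuming the usual "estimate plus or minus quantile times standard error" construction. The estimates, standard errors, and subject count are hypothetical numbers, not output from the package.

```r
# Hypothetical estimates and standard errors for an intercept and one moderator
beta_hat <- c(Intercept = 0.40, Z = -0.15)
beta_se  <- c(Intercept = 0.12, Z = 0.20)

n_subjects <- 50                             # assumed number of unique subjects
df <- n_subjects - length(beta_hat)          # subjects minus number of betas

z_crit <- qnorm(0.975)                       # large-sample (normal) quantile
t_crit <- qt(0.975, df = df)                 # small-sample (t) quantile

conf_int <- cbind(
  "2.5 %"  = beta_hat - z_crit * beta_se,
  "97.5 %" = beta_hat + z_crit * beta_se
)
conf_int_tquantile <- cbind(
  "2.5 %"  = beta_hat - t_crit * beta_se,
  "97.5 %" = beta_hat + t_crit * beta_se
)

conf_int
conf_int_tquantile
```

The t-based intervals are always somewhat wider, and the gap shrinks as the number of subjects grows.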

Examples

data(data_distal_continuous, package = "MRTAnalysis")

## Fast example: marginal effect with linear nuisance (CRAN-friendly)
fit_lm <- dcee(
    data = data_distal_continuous,
    id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
    moderator_formula = ~1, # marginal (no moderators)
    control_formula = ~X, # simple linear nuisance
    availability = "avail",
    control_reg_method = "lm",
    cross_fit = FALSE
)
summary(fit_lm)
summary(fit_lm, show_control_fit = TRUE) # show Stage-1 fit info

## Moderated effect with GAM nuisance (allows smooth terms); may be slower

fit_gam <- dcee(
    data = data_distal_continuous,
    id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
    moderator_formula = ~Z, # test moderation by Z
    control_formula = ~ s(X) + Z, # smooth in nuisance via mgcv::gam
    availability = "avail",
    control_reg_method = "gam",
    cross_fit = TRUE, cf_fold = 5
)
summary(fit_gam, lincomb = c(0, 1)) # linear combo: the Z coefficient
summary(fit_gam, show_control_fit = TRUE) # show Stage-1 fit info


## Optional: SuperLearner (runs only if installed)

if (requireNamespace("SuperLearner", quietly = TRUE) &&
    requireNamespace("earth", quietly = TRUE)) { # default library includes SL.earth
    library(SuperLearner)
    fit_sl <- dcee(
        data = data_distal_continuous,
        id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
        moderator_formula = ~1,
        control_formula = ~ X + Z,
        availability = "avail",
        control_reg_method = "sl",
        cross_fit = FALSE
    )
    summary(fit_sl)
}


Estimates the causal excursion effect for binary outcome MRT

Description

Returns the estimated causal excursion effect (on log relative risk scale) and the estimated standard error. Small sample correction using the "Hat" matrix in the variance estimate is implemented.
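Since the estimate is on the log relative risk scale, it is often reported after exponentiation. A minimal sketch with hypothetical numbers (the estimate, standard error, and degrees of freedom below are not package output), assuming a t-based interval:

```r
# Hypothetical fitted values on the log relative risk scale
log_rr_hat <- 0.18
se         <- 0.07
df         <- 98

# 95% interval on the log scale, then back-transform both endpoints
crit   <- qt(0.975, df = df)
ci_log <- log_rr_hat + c(-1, 1) * crit * se

rr_hat <- exp(log_rr_hat)  # relative risk
ci_rr  <- exp(ci_log)      # interval for the relative risk

round(c(RR = rr_hat, lower = ci_rr[1], upper = ci_rr[2]), 3)
```

Back-transforming the endpoints keeps the interval inside the positive range, although it is no longer symmetric around the point estimate.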

Usage

emee(
  data,
  id,
  outcome,
  treatment,
  rand_prob,
  moderator_formula,
  control_formula,
  availability = NULL,
  numerator_prob = NULL,
  start = NULL,
  verbose = TRUE
)

Arguments

data

A data set in long format.

id

The subject id variable.

outcome

The outcome variable.

treatment

The binary treatment assignment variable.

rand_prob

The randomization probability variable.

moderator_formula

A formula for the moderator variables. This should start with ~ followed by the moderator variables. When set to ~ 1, a fully marginal excursion effect (no moderators) is estimated.

control_formula

A formula for the control variables. This should start with ~ followed by the control variables. When set to ~ 1, only an intercept is included as the control variable.

availability

The availability variable. Use the default value (NULL) if your MRT doesn't have availability considerations.

numerator_prob

Either a number between 0 and 1, or a variable name for a column in data. If you are not sure what this is, use the default value (NULL).

start

A vector of the initial value of the estimators used in the numerical solver. If using default value (NULL), a vector of 0 will be used internally. If specifying a non-default value, this needs to be a numeric vector of length (number of moderator variables including the intercept) + (number of control variables including the intercept).

verbose

If default ('TRUE'), additional messages will be printed during data preprocessing.

Value

An object of class "emee_fit"
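The length rule for a non-default 'start' can be checked by counting design-matrix columns for the two formulas. The toy data frame below is hypothetical; the ordering of entries within 'start' is not stated here, so the sketch only builds an all-zero vector of the right length.

```r
moderator_formula <- ~ time_var1
control_formula   <- ~ time_var1 + time_var2

# Hypothetical rows, only needed so that model.matrix() can resolve the variables
toy <- data.frame(time_var1 = c(0.1, 0.5, 0.9), time_var2 = c(0, 0, 1))

# Column counts include the intercept of each formula
p <- ncol(model.matrix(moderator_formula, data = toy))  # moderators + intercept
q <- ncol(model.matrix(control_formula, data = toy))    # controls + intercept

start <- rep(0, p + q)  # same values as the internal default, written out explicitly
length(start)
```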

Examples


## estimating the fully marginal excursion effect by setting
## moderator_formula = ~ 1
emee(
    data = data_binary,
    id = "userid",
    outcome = "Y",
    treatment = "A",
    rand_prob = "rand_prob",
    moderator_formula = ~1,
    control_formula = ~ time_var1 + time_var2,
    availability = "avail"
)

## estimating the causal excursion effect moderated by time_var1
## by setting moderator_formula = ~ time_var1
emee(
    data = data_binary,
    id = "userid",
    outcome = "Y",
    treatment = "A",
    rand_prob = "rand_prob",
    moderator_formula = ~time_var1,
    control_formula = ~ time_var1 + time_var2,
    availability = "avail"
)

Estimates the causal excursion effect for binary outcome MRT

Description

Returns the estimated causal excursion effect (on log relative risk scale) and the estimated standard error. Small sample correction using the "Hat" matrix in the variance estimate is implemented. This is a slightly altered version of emee(), where the treatment assignment indicator is also centered in the residual term. It would have similar (but not exactly the same) numerical output as emee(). This is the estimator based on which the sample size calculator for binary outcome MRT is developed. (See R package MRTSampleSizeBinary.)

Usage

emee2(
  data,
  id,
  outcome,
  treatment,
  rand_prob,
  moderator_formula,
  control_formula,
  availability = NULL,
  numerator_prob = NULL,
  start = NULL,
  verbose = TRUE
)

Arguments

data

A data set in long format.

id

The subject id variable.

outcome

The outcome variable.

treatment

The binary treatment assignment variable.

rand_prob

The randomization probability variable.

moderator_formula

A formula for the moderator variables. This should start with ~ followed by the moderator variables. When set to ~ 1, a fully marginal excursion effect (no moderators) is estimated.

control_formula

A formula for the control variables. This should start with ~ followed by the control variables. When set to ~ 1, only an intercept is included as the control variable.

availability

The availability variable. Use the default value (NULL) if your MRT doesn't have availability considerations.

numerator_prob

Either a number between 0 and 1, or a variable name for a column in data. If you are not sure what this is, use the default value (NULL).

start

A vector of the initial value of the estimators used in the numerical solver. If using default value (NULL), a vector of 0 will be used internally. If specifying a non-default value, this needs to be a numeric vector of length (number of moderator variables including the intercept) + (number of control variables including the intercept).

verbose

If default ('TRUE'), additional messages will be printed during data preprocessing.

Value

An object of class "emee_fit"

Examples

## estimating the fully marginal excursion effect by setting
## moderator_formula = ~ 1
emee2(
    data = data_binary,
    id = "userid",
    outcome = "Y",
    treatment = "A",
    rand_prob = "rand_prob",
    moderator_formula = ~1,
    control_formula = ~ time_var1 + time_var2,
    availability = "avail"
)

## estimating the causal excursion effect moderated by time_var1
## by setting moderator_formula = ~ time_var1
emee2(
    data = data_binary,
    id = "userid",
    outcome = "Y",
    treatment = "A",
    rand_prob = "rand_prob",
    moderator_formula = ~time_var1,
    control_formula = ~ time_var1 + time_var2,
    availability = "avail"
)

Summary for DCEE fits

Description

Produce inference tables for distal causal excursion effects from a [dcee()] model. By default uses small-sample t-tests with df = object$df (subjects minus number of betas). If df is missing or nonpositive, falls back to large-sample normal (z) inference.

Usage

## S3 method for class 'dcee_fit'
summary(
  object,
  lincomb = NULL,
  conf_level = 0.95,
  show_control_fit = FALSE,
  ...
)

Arguments

object

An object of class "dcee_fit" returned by [dcee()].

lincomb

Optional numeric vector or matrix specifying linear combinations L \beta. If a vector of length p (number of betas), a single linear combination is evaluated. If a matrix, it must have p columns; each row defines one combination. Row names (if present) are used as labels.

conf_level

Confidence level for intervals (default 0.95).

show_control_fit

Logical; if TRUE, include compact information about the Stage-1 nuisance regressions (if available). When cross_fit = TRUE in [dcee()], regfit_a0/regfit_a1 refer to the last fold fit and are provided for inspection only.

...

Currently ignored.
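What a 'lincomb' request computes can be sketched by hand: for each row l of L, the estimate is l'beta and its variance is l'Vl. The coefficient vector, covariance matrix, and degrees of freedom below are hypothetical, and the t-based inference mirrors the default described above.

```r
# Hypothetical fit: intercept and one binary moderator Z
beta_hat <- c(Intercept = 0.40, Z = -0.15)
V <- matrix(c(0.0144, -0.0060,
              -0.0060, 0.0400),
            nrow = 2, byrow = TRUE,
            dimnames = list(names(beta_hat), names(beta_hat)))
df <- 48

# Each row of L defines one linear combination of the betas
L <- rbind("effect at Z = 0" = c(1, 0),
           "effect at Z = 1" = c(1, 1))

est    <- drop(L %*% beta_hat)            # L beta
se     <- sqrt(diag(L %*% V %*% t(L)))    # sqrt of diag(L V L')
t_stat <- est / se
p_val  <- 2 * pt(-abs(t_stat), df = df)
crit   <- qt(0.975, df = df)

data.frame(
  Estimate = est,
  SE       = se,
  lower    = est - crit * se,
  upper    = est + crit * se,
  p_value  = p_val
)
```

The off-diagonal covariance matters here: the standard error of the effect at Z = 1 uses Var(Intercept) + Var(Z) + 2 Cov(Intercept, Z), not just the two variances.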

Value

A list of class "summary.dcee_fit".


Summarize Causal Excursion Effect Fits for MRT with Binary Outcomes

Description

summary method for class "emee_fit".

Usage

## S3 method for class 'emee_fit'
summary(
  object,
  lincomb = NULL,
  conf_level = 0.95,
  show_control_fit = FALSE,
  ...
)

Arguments

object

An object of class "emee_fit".

lincomb

A vector of length p (p is the number of moderators including intercept) or a matrix with p columns. When not set to 'NULL', the summary will include the specified linear combinations of the causal excursion effect coefficients and the corresponding confidence interval, standard error, and p-value.

conf_level

A numeric value indicating the confidence level for confidence intervals. Default to 0.95.

show_control_fit

A logical value of whether the fitted coefficients for the control variables will be printed in the summary. Default to FALSE. (Interpreting the fitted coefficients for control variables is not recommended.)

...

Further arguments passed to or from other methods.

Value

the original function call and the estimated causal excursion effect coefficients, confidence interval at level conf_level, standard error, t-statistic value, degrees of freedom, and p-value.

Examples

fit <- emee(
    data = data_binary,
    id = "userid",
    outcome = "Y",
    treatment = "A",
    rand_prob = "rand_prob",
    moderator_formula = ~time_var1,
    control_formula = ~ time_var1 + time_var2,
    availability = "avail",
    numerator_prob = 0.5,
    start = NULL
)
summary(fit)

Summarize Causal Excursion Effect Fits for MRT with Continuous Outcomes

Description

summary method for class "wcls_fit".

Usage

## S3 method for class 'wcls_fit'
summary(
  object,
  lincomb = NULL,
  conf_level = 0.95,
  show_control_fit = FALSE,
  ...
)

Arguments

object

An object of class "wcls_fit".

lincomb

A vector of length p (p is the number of moderators including intercept) or a matrix with p columns. When not set to 'NULL', the summary will include the specified linear combinations of the causal excursion effect coefficients and the corresponding confidence interval, standard error, and p-value.

conf_level

A numeric value indicating the confidence level for confidence intervals. Default to 0.95.

show_control_fit

A logical value of whether the fitted coefficients for the control variables will be printed in the summary. Default to FALSE. (Interpreting the fitted coefficients for control variables is not recommended.)

...

Further arguments passed to or from other methods.

Value

the original function call and the estimated causal excursion effect coefficients, confidence interval at level conf_level (95% by default), standard error, t-statistic value or Wald-statistic value (depending on whether sample size is < 50), degrees of freedom, and p-value.

Examples

fit <- wcls(
    data = data_mimicHeartSteps,
    id = "userid",
    outcome = "logstep_30min",
    treatment = "intervention",
    rand_prob = 0.6,
    moderator_formula = ~1,
    control_formula = ~logstep_pre30min,
    availability = "avail",
    numerator_prob = 0.6
)
summary(fit)

Estimates the causal excursion effect for continuous outcome MRT

Description

Returns the estimated causal excursion effect (on additive scale) and the estimated standard error. Small sample correction using the "Hat" matrix in the variance estimate is implemented.

Usage

wcls(
  data,
  id,
  outcome,
  treatment,
  rand_prob,
  moderator_formula,
  control_formula,
  availability = NULL,
  numerator_prob = NULL,
  verbose = TRUE
)

Arguments

data

A data set in long format.

id

The subject id variable.

outcome

The outcome variable.

treatment

The binary treatment assignment variable.

rand_prob

The randomization probability variable.

moderator_formula

A formula for the moderator variables. This should start with ~ followed by the moderator variables. When set to ~ 1, a fully marginal excursion effect (no moderators) is estimated.

control_formula

A formula for the control variables. This should start with ~ followed by the control variables. When set to ~ 1, only an intercept is included as the control variable.

availability

The availability variable. Use the default value (NULL) if your MRT doesn't have availability considerations.

numerator_prob

Either a number between 0 and 1, or a variable name for a column in data. If you are not sure what this is, use the default value (NULL).

verbose

If default ('TRUE'), additional messages will be printed during data preprocessing.

Value

An object of class "wcls_fit"

Examples

wcls(
    data = data_mimicHeartSteps,
    id = "userid",
    outcome = "logstep_30min",
    treatment = "intervention",
    rand_prob = 0.6,
    moderator_formula = ~1,
    control_formula = ~logstep_pre30min,
    availability = "avail",
    numerator_prob = 0.6
)