% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rctglm.R
\name{rctglm}
\alias{rctglm}
\title{Fit GLM and find any estimand (marginal effect) using plug-in estimation with variance estimation using
influence functions}
\usage{
rctglm(
  formula,
  exposure_indicator,
  exposure_prob,
  data,
  family = gaussian,
  estimand_fun = "ate",
  estimand_fun_deriv0 = NULL,
  estimand_fun_deriv1 = NULL,
  cv_variance = FALSE,
  cv_variance_folds = 10,
  verbose = options::opt("verbose"),
  ...
)
}
\arguments{
\item{formula}{an object of class "formula" (or one that can be coerced to that class):
a symbolic description of the model to be fitted. The details of model specification are
given under ‘Details’ in the \link{glm} documentation.}

\item{exposure_indicator}{(name of) the \emph{binary} variable in \code{data} that
identifies randomisation groups. The variable is required to be binary to
make the "orientation" of the \code{estimand_fun} clear.}

\item{exposure_prob}{a \code{numeric} with the probability of being in
"group 1" (rather than group 0) in groups defined by \code{exposure_indicator}.}

\item{data}{an optional data frame, list or environment (or object coercible
by as.data.frame to a data frame) containing the variables in the model. If
not found in data, the variables are taken from environment(formula), typically
the environment from which the function is called.}

\item{family}{a description of the error distribution and link
    function to be used in the model.  For \code{glm} this can be a
    character string naming a family function, a family function or the
    result of a call to a family function.  For \code{glm.fit} only the
    third option is supported.  (See \code{\link[stats]{family}} for details of
    family functions.)}

\item{estimand_fun}{a \code{function} with arguments \code{psi1} and \code{psi0} specifying
the estimand. Alternatively, specify "ate" or "rate_ratio" as a \code{character}
to use one of the default estimand functions. See
more details in the "Estimands" section of \link{rctglm}.}

\item{estimand_fun_deriv0}{a \code{function} specifying the derivative of \code{estimand_fun} with respect to \code{psi0}. By default,
symbolic differentiation is used to find the derivative from \code{estimand_fun} automatically.}

\item{estimand_fun_deriv1}{a \code{function} specifying the derivative of \code{estimand_fun} with respect to \code{psi1}. By default,
symbolic differentiation is used to find the derivative from \code{estimand_fun} automatically.}

\item{cv_variance}{a \code{logical} determining whether to estimate the variance
using cross-validation (see details of \link{rctglm}).}

\item{cv_variance_folds}{a \code{numeric} with the number of folds to use for cross
validation if \code{cv_variance} is \code{TRUE}.}

\item{verbose}{\code{numeric} verbosity level. Higher values mean more information is
printed to the console. A value of 0 means nothing is printed to the console during
execution. (Defaults to \code{2}, overwritable using option 'postcard.verbose' or environment variable 'R_POSTCARD_VERBOSE'.)}

\item{...}{Additional arguments passed to \code{\link[stats:glm]{stats::glm()}}}
}
\value{
\code{rctglm} returns an object of class inheriting from \code{"rctglm"}.

An object of class \code{rctglm} is a list containing the following components:
\itemize{
\item \strong{\code{estimand}}: A \code{data.frame} with the plug-in estimate of the estimand, its standard
error (SE) estimate and its variance estimate
\item \code{estimand_funs}: A \code{list} with
\itemize{
\item \code{f}: The \code{estimand_fun} used to obtain an estimate of the estimand from counterfactual means
\item \code{d0}: The derivative with respect to \code{psi0}
\item \code{d1}: The derivative with respect to \code{psi1}
}
\item \code{means_counterfactual}: A \code{data.frame} with counterfactual means \code{psi0} and \code{psi1}
\item \code{fitted.values_counterfactual}: A \code{data.frame} with counterfactual mean
values, obtained by transforming the linear predictors for each group
by the inverse of the link function.
\item \code{glm}: A \code{glm} object returned from running \link[stats:glm]{stats::glm} within the procedure
\item \code{call}: The matched \code{call}
}
}
\description{
The procedure uses plug-in estimation and influence functions to perform robust inference of any specified
estimand in the setting of a randomised clinical trial, even in the case of heterogeneous effect of
covariates in randomisation groups. See
\href{https://arxiv.org/abs/2503.22284}{Powering RCTs for marginal effects with GLMs using prognostic score adjustment}
by Højbjerre-Frandsen et al. (2025) for more details on methodology.
}
\details{
The procedure assumes the setup of a randomised clinical trial with observations grouped by a binary
\code{exposure_indicator} variable, allocated randomly with probability \code{exposure_prob}. A GLM is
fit and then used to predict the response of all observations in the event that the \code{exposure_indicator}
is 0 and 1, respectively. Taking means of these predictions produces the \emph{counterfactual means}
\code{psi0} and \code{psi1}, and an estimand \code{r(psi0, psi1)} is calculated using any specified \code{estimand_fun}.
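
The steps above can be sketched in base R. This is only an illustration of plug-in estimation, not the package's internal code; the simulated data and the object names \code{dat}, \code{fit}, \code{pred0} and \code{pred1} are assumptions made for the example.

\preformatted{
set.seed(1)
n <- 200
dat <- data.frame(X1 = rnorm(n), A = rbinom(n, 1, 0.5))
dat$Y <- 1 + 1.5 * dat$X1 + 2 * dat$A + rnorm(n)

# Fit a GLM on the observed data
fit <- glm(Y ~ X1 + A, data = dat, family = gaussian)

# Predict the response for all observations under A = 0 and under A = 1
pred0 <- predict(fit, newdata = transform(dat, A = 0), type = "response")
pred1 <- predict(fit, newdata = transform(dat, A = 1), type = "response")

# Counterfactual means and the plug-in estimate of the ATE
psi0 <- mean(pred0)
psi1 <- mean(pred1)
psi1 - psi0
}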

The variance of the estimand is found by taking the variance of the influence function of the estimand.
If \code{cv_variance} is \code{TRUE}, then the counterfactual predictions for each observation (which are
used to calculate the value of the influence function) are obtained as out-of-sample (OOS) predictions
using cross validation with number of folds specified by \code{cv_variance_folds}. The cross validation splits
are performed using stratified sampling with \code{exposure_indicator} as the \code{strata} argument in \link[rsample:vfold_cv]{rsample::vfold_cv}.
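
As a rough sketch of the variance calculation for the ATE, the code below assumes the standard influence function of a counterfactual mean in a randomised trial with known exposure probability, and divides the variance of the influence function by the sample size. The exact form used internally may differ in details, and all object names (\code{pi1}, \code{if0}, \code{if1}, \code{if_ate}) are hypothetical. In-sample predictions are used here; with \code{cv_variance = TRUE}, \code{pred0} and \code{pred1} would instead be out-of-sample predictions.

\preformatted{
set.seed(1)
n <- 200
pi1 <- 0.5
dat <- data.frame(X1 = rnorm(n), A = rbinom(n, 1, pi1))
dat$Y <- 1 + 1.5 * dat$X1 + 2 * dat$A + rnorm(n)

fit <- glm(Y ~ X1 + A, data = dat, family = gaussian)
pred0 <- predict(fit, newdata = transform(dat, A = 0), type = "response")
pred1 <- predict(fit, newdata = transform(dat, A = 1), type = "response")
psi0 <- mean(pred0)
psi1 <- mean(pred1)

# Assumed influence functions of the two counterfactual means
if0 <- (1 - dat$A) / (1 - pi1) * (dat$Y - pred0) + pred0 - psi0
if1 <- dat$A / pi1 * (dat$Y - pred1) + pred1 - psi1

# Chain rule for the ATE psi1 - psi0: derivative 1 wrt psi1, -1 wrt psi0
if_ate <- 1 * if1 + (-1) * if0

# Variance and standard error of the plug-in estimate
var_ate <- var(if_ate) / n
sqrt(var_ate)
}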

Read more in \code{vignette("model-fit")}.
}
\section{Estimands}{

As noted in the details, \code{psi0} and \code{psi1} are the counterfactual means found by prediction using
a fitted GLM in the binary groups defined by \code{exposure_indicator}.

Default estimand functions can be specified via \code{"ate"} (which uses the function
\code{function(psi1, psi0) psi1-psi0}) and \code{"rate_ratio"} (which uses the function
\code{function(psi1, psi0) psi1/psi0}). See more information on specifying the \code{estimand_fun}
in \code{vignette("model-fit")}.

As a default, the \code{Deriv} package is used to perform symbolic differentiation to find the derivatives of
the \code{estimand_fun}.
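
For illustration, the derivatives of the rate ratio can be obtained with \code{Deriv::Deriv()} as below. How \code{rctglm} calls \code{Deriv} internally is not shown here, so this is a sketch of the idea rather than the package's exact code; the names \code{rate_ratio}, \code{d0} and \code{d1} are chosen for the example.

\preformatted{
rate_ratio <- function(psi1, psi0) psi1 / psi0

# Partial derivatives with respect to each argument
d1 <- Deriv::Deriv(rate_ratio, "psi1")
d0 <- Deriv::Deriv(rate_ratio, "psi0")

d1(psi1 = 0.6, psi0 = 0.3)  # evaluates 1 / psi0
d0(psi1 = 0.6, psi0 = 0.3)  # evaluates -psi1 / psi0^2
}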
}

\examples{
# Generate some data to showcase example
n <- 100
exp_prob <- .5

dat_gaus <- glm_data(
  Y ~ 1+1.5*X1+2*A,
  X1 = rnorm(n),
  A = rbinom(n, 1, exp_prob),
  family = gaussian()
)

# Fit the model
ate <- rctglm(formula = Y ~ .,
              exposure_indicator = A,
              exposure_prob = exp_prob,
              data = dat_gaus,
              family = gaussian)

# Pull information on estimand
estimand(ate)

## Another example with different family and specification of estimand_fun
dat_binom <- glm_data(
  Y ~ 1+1.5*X1+2*A,
  X1 = rnorm(n),
  A = rbinom(n, 1, exp_prob),
  family = binomial()
)

rr <- rctglm(formula = Y ~ .,
             exposure_indicator = A,
             exposure_prob = exp_prob,
             data = dat_binom,
             family = binomial(),
             estimand_fun = "rate_ratio")

odds_ratio <- function(psi1, psi0) (psi1*(1-psi0))/(psi0*(1-psi1))
or <- rctglm(formula = Y ~ .,
             exposure_indicator = A,
             exposure_prob = exp_prob,
             data = dat_binom,
             family = binomial,
             estimand_fun = odds_ratio)
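
# A sketch of supplying the derivatives of the odds ratio by hand instead of
# relying on symbolic differentiation. It is assumed here that the derivative
# functions take the same arguments (psi1, psi0) as estimand_fun; the names
# odds_ratio_d1, odds_ratio_d0 and or_manual are only for this example.
odds_ratio_d1 <- function(psi1, psi0) (1 - psi0) / (psi0 * (1 - psi1)^2)
odds_ratio_d0 <- function(psi1, psi0) -psi1 / (psi0^2 * (1 - psi1))
or_manual <- rctglm(formula = Y ~ .,
                    exposure_indicator = A,
                    exposure_prob = exp_prob,
                    data = dat_binom,
                    family = binomial,
                    estimand_fun = odds_ratio,
                    estimand_fun_deriv0 = odds_ratio_d0,
                    estimand_fun_deriv1 = odds_ratio_d1)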

}
\seealso{
See how to extract information using methods in \link{rctglm_methods}.

Use \code{\link[=rctglm_with_prognosticscore]{rctglm_with_prognosticscore()}} to include prognostic covariate adjustment.

See the vignettes, for example \code{vignette("model-fit")}.
}
