Help for package pammtools

Title:

Piece-Wise Exponential Additive Mixed Modeling Tools for Survival Analysis

Version:

0.8.0

Date:

2026-06-17

Description:

The Piece-wise exponential (Additive Mixed) Model (PAMM; Bender and others (2018) <doi:10.1177/1471082X17748083>) is a powerful model class for the analysis of survival (or time-to-event) data, based on Generalized Additive (Mixed) Models (GA(M)Ms). It offers intuitive specification and robust estimation of complex survival models with stratified baseline hazards, random effects, time-varying effects, time-dependent covariates and cumulative effects (Bender and others (2019)), as well as support for left-truncated data, competing risks, recurrent events and multi-state settings. pammtools provides a tidy workflow for survival analysis with PAMMs, including data simulation, transformation and other functions for data preprocessing and model post-processing as well as visualization.

Depends:

R (≥ 4.1.0)

Imports:

mgcv, survival (≥ 2.39-5), checkmate, magrittr, rlang, tidyr (≥ 1.0.0), ggplot2 (≥ 3.2.2), dplyr (≥ 1.0.0), purrr (≥ 0.2.3), tibble, lazyeval, Formula, mvtnorm, pec, vctrs (≥ 0.3.0), scam

Suggests:

testthat, mstate, broom, etm, xgboost

Config/Needs/website:

coxme, eha, etm, scam, msm, mvna, rjags, brms, xgboost, TBFmultinomial

License:

MIT + file LICENSE

LazyData:

true

URL:

https://adibender.github.io/pammtools/

BugReports:

https://github.com/adibender/pammtools/issues

Encoding:

UTF-8

Config/roxygen2/version:

8.0.0

NeedsCompilation:

Packaged:

2026-06-22 11:37:49 UTC; abender

Author:

Andreas Bender [aut, cre], Fabian Scheipl [aut], Johannes Piller [aut], Philipp Kopper [aut], Lukas Burk [ctb]

Maintainer:

Andreas Bender <andreas.bender@stat.uni-muenchen.de>

Repository:

CRAN

Date/Publication:

2026-06-22 12:50:15 UTC

pammtools: Piece-wise exponential Additive Mixed Modeling tools.

Description

pammtools provides functions and utilities that facilitate fitting Piece-wise Exponential Additive Mixed Models (PAMMs), including data transformation and other convenience functions for pre- and post-processing as well as plotting.

Details

The best way to get an overview of the functionality provided and how to fit PAMMs is to view the vignettes available at https://adibender.github.io/pammtools/articles/. A summary of the vignettes' content is given below:

basics: Introduction to PAMMs and basic modeling.
baseline: Shows how to estimate and visualize a baseline model (without covariates) and how it compares to the corresponding Cox-PH model.
convenience: Convenience functions for post-processing and plotting PAMMs.
data-transformation: Transforming data into a format suitable to fit PAMMs.
frailty: Specifying "frailty" terms, i.e., random effects for PAMMs.
splines: Specifying spline smooth terms for PAMMs.
strata: Specifying stratified models in which each level of a grouping variable has a different baseline hazard.
tdcovar: Dealing with time-dependent covariates.
tveffects: Specifying time-varying effects.
left-truncation: Estimation for left-truncated data.
competing-risks: Competing risks analysis.

Author(s)

Maintainer: Andreas Bender andreas.bender@stat.uni-muenchen.de (ORCID)

Authors:

Andreas Bender andreas.bender@stat.uni-muenchen.de (ORCID)
Fabian Scheipl fabian.scheipl@stat.uni-muenchen.de (ORCID)
Johannes Piller johannes.piller@lmu.de (ORCID)
Philipp Kopper philipp.kopper@stat.uni-muenchen.de (ORCID)

Other contributors:

Lukas Burk burk@leibniz-bips.de (ORCID) [contributor]

References

Bender, Andreas, Andreas Groll, and Fabian Scheipl. 2018. “A Generalized Additive Model Approach to Time-to-Event Analysis.” Statistical Modelling, February. https://doi.org/10.1177/1471082X17748083.

Bender, Andreas, Fabian Scheipl, Wolfgang Hartl, Andrew G. Day, and Helmut Küchenhoff. 2019. “Penalized Estimation of Complex, Non-Linear Exposure-Lag-Response Associations.” Biostatistics 20 (2): 315–31. https://doi.org/10.1093/biostatistics/kxy003.

Bender, Andreas, and Fabian Scheipl. 2018. “pammtools: Piece-Wise Exponential Additive Mixed Modeling Tools.” arXiv:1806.01042 [stat], June. https://arxiv.org/abs/1806.01042.

Ramjith, J., A. Bender, K. C. B. Roes, and M. A. Jonker. 2022. “Recurrent Events Analysis with Piece-Wise Exponential Additive Mixed Models.” Statistical Modelling.

Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Add cumulative incidence function to data

Description

Add cumulative incidence function to data

Usage

add_cif(newdata, object, ...)

## Default S3 method:
add_cif(
  newdata,
  object,
  ci = TRUE,
  overwrite = FALSE,
  alpha = 0.05,
  nsim = 500L,
  cause_var = "cause",
  time_var = NULL,
  interval_length = "intlen",
  ...
)

## S3 method for class 'pamm_ic'
add_cif(
  newdata,
  object,
  ci = TRUE,
  alpha = 0.05,
  nsim = 500L,
  cause_var = "cause",
  time_var = NULL,
  interval_length = "intlen",
  ...
)

Arguments

newdata

A data frame or list containing the values of the model covariates at which predictions are required. If this is not provided then predictions corresponding to the original data are returned. If newdata is provided then it should contain all the variables needed for prediction: a warning is generated if not. See details for use with linear.functional.terms.

object

a fitted gam object as produced by gam().

...

Further arguments passed to predict.gam and get_hazard

ci

logical. Indicates if confidence intervals should be calculated. Defaults to TRUE.

overwrite

Should hazard columns be overwritten if already present in the data set? Defaults to FALSE. If TRUE, columns with names c("hazard", "se", "lower", "upper") will be overwritten.

alpha

Significance level for pooled confidence intervals.

nsim

Total number of pooled posterior draws used for the interval.

cause_var

Character. Column name of the 'cause' variable.

time_var

Name of the variable used for the baseline hazard. Defaults to "tend".

interval_length

Character, defaults to "intlen". Name of the column in newdata that contains the interval lengths.

Details

When computing cumulative incidence for multiple groups, the input data must be grouped via group_by() before calling this function. Omitting group_by() will not produce an error or warning but will silently return incorrect results, as the cumulative incidence will be accumulated over the entire dataset rather than within each group.

The returned data contains one boundary row per group at time_var = 0 for plotting cumulative incidence from the time origin. On this row, cif = 0; if confidence intervals are requested, cif_lower = cif_upper = 0. If an interval-length column is present, it is set to 0 on the boundary row. add_cumu_hazard() adds an analogous boundary row (with cumu_hazard = 0) for continuous-time models (GAM/SCAM/PAMM), controllable via its boundary argument; interval-factor models (e.g. PEM via glm) keep the original prediction grid without a boundary row.
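To make the accumulation concrete, here is a base-R sketch (not the pammtools implementation) that computes cumulative incidence functions from cause-specific hazards assumed constant within each interval; cif_piecewise() is a hypothetical helper. Within interval j, the probability of an event of cause k is the all-cause survival at the interval start times (1 - exp(-lambda_j * intlen_j)) times lambda_kj / lambda_j, where lambda_j is the all-cause hazard. The running sum covers a single group; with several groups it has to restart within each group, which is what group_by() ensures.

```r
# Sketch only (base R, not pammtools internals): CIFs from piecewise-constant
# cause-specific hazards for one group.
cif_piecewise <- function(haz, intlen) {
  # haz: matrix (intervals x causes) of hazards; intlen: interval lengths
  total <- rowSums(haz)                         # all-cause hazard per interval
  surv_end <- exp(-cumsum(total * intlen))      # all-cause survival at interval ends
  surv_start <- c(1, head(surv_end, -1))        # all-cause survival at interval starts
  # P(any event in interval | alive at its start); assumes total > 0
  p_event <- 1 - exp(-total * intlen)
  # split that probability among causes in proportion to their hazards
  incr <- haz / total * (surv_start * p_event)
  cif <- apply(incr, 2, cumsum)
  rbind(0, cif)                                 # boundary row at time 0: CIF = 0
}

haz <- cbind(cause1 = c(0.10, 0.20, 0.15),
             cause2 = c(0.05, 0.05, 0.10))
intlen <- c(1, 2, 1)
cif <- cif_piecewise(haz, intlen)
surv <- c(1, exp(-cumsum(rowSums(haz) * intlen)))
all.equal(unname(rowSums(cif)) + surv, rep(1, 4))
```

Because each increment only splits the all-cause event probability among the causes, the CIFs and the all-cause survival add up to one at every time point, which is a handy sanity check.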

Examples


if (require("etm")) {
  data("fourD", package = "etm")
  ped_stacked <- fourD |>
    dplyr::select(-medication, -treated) |>
    as_ped(Surv(time, status) ~., id = "id") |>
    dplyr::mutate(cause = as.factor(cause))
  pam <- pamm(
    ped_status ~ s(tend, by = cause) + sex + sex:cause + age + age:cause,
    data = ped_stacked)
  ped_stacked |>
    make_newdata(tend = unique(tend), cause = unique(cause)) |>
    dplyr::group_by(cause) |>
    add_cif(pam)
}

Add counterfactual observations for possible transitions

Description

If data only contains one row per transition that took place, this function adds additional rows for each transition that was possible at that time (for each subject in the data).

Usage

add_counterfactual_transitions(
  data,
  from_to_pairs = list(),
  from_col = "from",
  to_col = "to",
  transition_col = "transition"
)

Arguments

data

Data set that only contains rows for transitions that took place.

from_to_pairs

A list with one element for each possible initial state. The values of each list element indicate possible transitions from that state. Will be calculated from the data if unspecified.

from_col

Name of the column that stores initial state.

to_col

Name of the column that stores end state.

transition_col

Name of the column that contains the transition identifier (factor variable).
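The row expansion described above can be illustrated with a hypothetical base-R helper, expand_possible_transitions(); it is not the pammtools function. It assumes from_to_pairs is a list named by initial state and adds a status column (1 for the observed transition, 0 otherwise); the name and coding of that column are assumptions made for the sketch.

```r
# Hypothetical helper (not pammtools code): expand each observed transition
# into all transitions possible from the same initial state.
expand_possible_transitions <- function(data, from_to_pairs,
                                        from_col = "from", to_col = "to",
                                        transition_col = "transition") {
  rows <- lapply(seq_len(nrow(data)), function(i) {
    obs <- data[i, , drop = FALSE]
    targets <- from_to_pairs[[as.character(obs[[from_col]])]]
    out <- obs[rep(1, length(targets)), , drop = FALSE]
    out[[to_col]] <- targets
    out[[transition_col]] <- paste0(obs[[from_col]], "->", targets)
    # assumed status coding: 1 = observed transition, 0 = counterfactual
    out$status <- as.integer(targets == obs[[to_col]])
    out
  })
  res <- do.call(rbind, rows)
  res[[transition_col]] <- factor(res[[transition_col]])
  rownames(res) <- NULL
  res
}

observed <- data.frame(id = c(1, 1, 2), from = c(1, 2, 1), to = c(2, 3, 3),
                       tstart = c(0, 2, 0), tstop = c(2, 5, 4))
pairs <- list("1" = c(2, 3), "2" = c(1, 3))
expanded <- expand_possible_transitions(observed, pairs)
expanded
```

Each of the three observed rows becomes two rows, one per transition that was possible from its initial state, and exactly one of them carries the event.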

Add predicted (cumulative) hazard to data set

Description

Add (cumulative) hazard based on the provided data set and model. If ci = TRUE, confidence intervals (CI) are also added. Their width can be controlled via the se_mult argument. The method by which the CI are calculated can be specified by ci_type. This is a wrapper around predict.gam. When reference is specified, the (log-)hazard ratio is calculated. In addition to models fit with gam/bam or glm, shape-constrained additive models fit with scam are supported (e.g., for monotone baseline hazards). For scam models, all calculations (including delta-method and simulation-based confidence intervals) are based on the re-parametrized coefficients and their covariance matrix, i.e., on the same normal approximation that underlies the standard errors reported by scam itself.

Usage

add_hazard(newdata, object, ...)

## Default S3 method:
add_hazard(
  newdata,
  object,
  reference = NULL,
  type = c("response", "link"),
  ci = TRUE,
  se_mult = 2,
  ci_type = c("default", "delta", "sim"),
  overwrite = FALSE,
  time_var = NULL,
  nsim = 100L,
  alpha = 0.05,
  ...
)

add_cumu_hazard(newdata, object, ...)

## Default S3 method:
add_cumu_hazard(
  newdata,
  object,
  ci = TRUE,
  se_mult = 2,
  overwrite = FALSE,
  time_var = NULL,
  interval_length = "intlen",
  boundary = TRUE,
  ...
)

## S3 method for class 'pamm_ic'
add_hazard(
  newdata,
  object,
  ci = TRUE,
  alpha = 0.05,
  nsim = 500L,
  time_var = NULL,
  ...
)

## S3 method for class 'pamm_ic'
add_cumu_hazard(
  newdata,
  object,
  ci = TRUE,
  alpha = 0.05,
  nsim = 500L,
  time_var = NULL,
  interval_length = "intlen",
  ...
)

Arguments

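newdata

A data frame or list containing the values of the model covariates at which predictions are required. If this is not provided then predictions corresponding to the original data are returned. If newdata is provided then it should contain all the variables needed for prediction: a warning is generated if not. See details for use with linear.functional.terms.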

object

a fitted gam object as produced by gam().

...

Further arguments passed to predict.gam and get_hazard

reference

A data frame with number of rows equal to nrow(newdata) or one, or a named list with (partial) covariate specifications. See examples.

type

Either "response" or "link". The former calculates hazard, the latter the log-hazard.

ci

logical. Indicates if confidence intervals should be calculated. Defaults to TRUE.

se_mult

Factor by which standard errors are multiplied for calculating the confidence intervals.

ci_type

The method by which standard errors/confidence intervals will be calculated. "default" computes the interval for the linear predictor at the respective intervals and transforms its bounds. "delta" calculates CIs based on the standard error calculated by the Delta method. "sim" draws the property of interest from its posterior based on the normal distribution of the estimated coefficients. See here for details and empirical evaluation. For ci_type = "sim", interval bounds are empirical quantiles (type 6, see quantile) of nsim posterior draws (default nsim = 100L, passed via ...). Type-6 quantiles avoid the systematic inward bias that the quantile default (type 7) exhibits for small nsim, but at the default nsim = 100 the bounds are estimated from few tail draws and are thus noisy; increase nsim (e.g., to 500 or more) for more stable interval bounds. Very small nsim (nsim < 2 / alpha - 1, i.e., below 39 for alpha = 0.05) cannot achieve the nominal level at all.

overwrite

Should hazard columns be overwritten if already present in the data set? Defaults to FALSE. If TRUE, columns with names c("hazard", "se", "lower", "upper") will be overwritten.

time_var

Name of the variable used for the baseline hazard. Defaults to "tend".

nsim

Total number of pooled posterior draws used for the interval.

alpha

Significance level for pooled confidence intervals (yielding a (1 - alpha) interval).

interval_length

The variable in newdata containing the interval lengths. Can be either bare unquoted variable name or character. Defaults to "intlen".

boundary

Logical. If TRUE (default), a boundary row at time = 0 with cumulative hazard 0 is prepended (per group), so that cumulative hazards start at the natural origin (consistent with add_surv_prob, add_cif and add_trans_prob).
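The ci_type = "sim" description above can be illustrated with a base-R sketch: coefficients are drawn from a normal distribution around made-up estimates, the hazard is computed for each draw, and the bounds are type-6 empirical quantiles. All numbers are illustrative assumptions. The last lines check the nsim < 2 / alpha - 1 bound: with so few draws both type-6 bounds collapse to the minimum and maximum draw, and such an interval covers a new draw with probability at most (nsim - 1) / (nsim + 1).

```r
# Sketch (base R): simulation-based CI for a hazard, in the spirit of
# ci_type = "sim". Estimates, covariance and design row are made up.
set.seed(42)
beta_hat <- c(-2, 0.5)                          # assumed coefficient estimates
V <- matrix(c(0.04, 0.005, 0.005, 0.01), 2, 2)  # assumed covariance matrix
x0 <- c(1, 1.2)                                 # one design-matrix row
alpha <- 0.05
nsim <- 500L

# draw coefficients from N(beta_hat, V) via the Cholesky factor (V = t(R) %*% R)
R <- chol(V)
z <- matrix(rnorm(2 * nsim), nrow = 2)
beta_draws <- beta_hat + t(R) %*% z             # 2 x nsim matrix of draws
haz_draws <- exp(drop(x0 %*% beta_draws))       # hazard for each draw
ci <- quantile(haz_draws, probs = c(alpha / 2, 1 - alpha / 2), type = 6)

# type 6 puts probability p at order-statistic position p * (n + 1);
# positions below 1 are clamped to the minimum draw
few <- haz_draws[1:20]
lower_few <- unname(quantile(few, probs = alpha / 2, type = 6))
lower_few == min(few)

# best-case coverage of a [min, max] interval from n draws
max_coverage <- function(n) (n - 1) / (n + 1)
max_coverage(c(38, 39))
```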

Details

When computing cumulative hazards or survival probabilities across groups, the input data must be grouped via group_by() prior to calling add_cumu_hazard() or add_surv_prob(). Omitting group_by() will not produce an error or warning but will silently return incorrect results, as the cumulative hazard will be accumulated over the entire dataset rather than within each group. See the workflow vignette for a worked example.
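The grouping pitfall can be reproduced in a few lines of base R, assuming the cumulative hazard is the running sum of hazard * intlen (a piecewise-constant hazard). The data frame and its columns are made up for illustration; ave() plays the role that group_by() plays in the tidy workflow.

```r
# Two groups evaluated on the same three intervals (made-up hazards)
nd <- data.frame(
  group  = rep(c("a", "b"), each = 3),
  tend   = rep(c(1, 2, 4), times = 2),
  intlen = rep(c(1, 1, 2), times = 2),
  hazard = c(0.1, 0.2, 0.1, 0.3, 0.3, 0.2)
)
# grouped: the running sum restarts for each level of `group`
nd$cumu_hazard <- ave(nd$hazard * nd$intlen, nd$group, FUN = cumsum)
# ungrouped (the silent mistake): one running sum over all rows
nd$cumu_wrong <- cumsum(nd$hazard * nd$intlen)
nd
```

In the ungrouped column, group "b" starts from the total accumulated for group "a" instead of from zero, and no warning is raised.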

Examples

ped <- tumor[1:50,] %>% as_ped(Surv(days, status)~ age)
pam <- mgcv::gam(ped_status ~ s(tend)+age, data = ped, family=poisson(), offset=offset)
ped_info(ped) %>% add_hazard(pam, type="link")
ped_info(ped) %>% add_hazard(pam, type = "response")
ped_info(ped) %>% add_cumu_hazard(pam)

Turn exact event times into interval-censored observations

Description

Convenience helper to manufacture interval-censored (panel) data from exact simulated survival times (e.g., the output of sim_pexp), for coverage studies and examples. Each subject is "inspected" at a sequence of times; the true event time is then only known to lie between the last clean and the first positive inspection. The exact time is retained (by default in column true_time) so that coverage can be scored against the truth.

Usage

add_inspections(
  data,
  time_var = "time",
  status_var = "status",
  mechanism = c("random", "fixed", "mixed"),
  rate = 1,
  schedule = NULL,
  max_time = NULL,
  terminal_exam = TRUE,
  keep_truth = TRUE,
  L = "L",
  R = "R"
)

Arguments

data

A data frame with one row per subject containing an exact event time and a status indicator (as produced by sim_pexp).

time_var, status_var

Names of the (exact) event-time and status columns. status_var may be missing, in which case all rows are treated as events.

mechanism

Inspection mechanism: "random" (default) draws inter-inspection gaps from an Exp(rate) distribution; "fixed" uses the common grid given in schedule; "mixed" jitters the fixed grid by a random offset per subject.

rate

Inspection rate for mechanism = "random" / "mixed" (expected gap 1/rate).

schedule

Numeric vector of inspection times for mechanism = "fixed"/"mixed".

max_time

Inspection horizon. Defaults to max(data[[time_var]]).

terminal_exam

Logical; if TRUE (default), every subject is additionally examined at max_time (an end-of-study examination), so events before max_time always have a finite upper bound and only subjects event-free at max_time are right-censored. If FALSE, there is no closing examination: events after a subject's last inspection are right-censored at that inspection, and subjects that exit event-free (status == 0) are likewise right-censored at their last inspection before exit (not at their exact exit time). Both conventions yield coarsening-at-random data; mixing them (exact exit times for survivors but open intervals for undetected events) would make the right-censoring informative and bias every interval-censoring likelihood.

keep_truth

Logical; keep the exact event time in true_time.

L, R

Names of the created lower/upper bound columns.

Value

data augmented with interval bounds in columns L and R (and true_time). Use Surv(L, R, type = "interval2") on the result.
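The "random" mechanism with a terminal examination can be sketched in base R. inspect_one() is a hypothetical helper, not part of pammtools. It assumes status == 0 only occurs for subjects still event-free at max_time (administrative censoring) and marks right-censored subjects with R = Inf; both are simplifications for the sketch.

```r
# Hypothetical sketch of random inspections plus a terminal exam at max_time
inspect_one <- function(true_time, status, rate, max_time) {
  # inspection times: cumulative Exp(rate) gaps strictly before max_time
  times <- numeric(0)
  t_next <- rexp(1, rate = rate)
  while (t_next < max_time) {
    times <- c(times, t_next)
    t_next <- t_next + rexp(1, rate = rate)
  }
  times <- c(times, max_time)                   # terminal examination
  if (status == 1 && true_time <= max_time) {
    R <- times[times >= true_time][1]           # first positive inspection
    clean <- times[times < true_time]
    L <- if (length(clean) > 0) max(clean) else 0  # last clean inspection
    c(L = L, R = R)
  } else {
    c(L = max_time, R = Inf)                    # event-free at max_time
  }
}

set.seed(1)
max_t <- 10
true_t <- rexp(50, rate = 0.2)
status <- as.integer(true_t <= max_t)
bounds <- t(mapply(inspect_one, true_t, status,
                   MoreArgs = list(rate = 1, max_time = max_t)))
head(cbind(true_time = true_t, bounds))
```

Every event ends up in a finite interval (L, R] that contains its exact time, which is the property a coverage study scores against.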

Examples


set.seed(1)
df <- data.frame(x = runif(100, -1, 1))
sdf <- sim_pexp(~ -2 + 0.4 * x, df, cut = seq(0, 10, by = 0.5))
icd <- add_inspections(sdf, rate = 1)
fit <- pamm_ic(Surv(L, R, type = "interval2") ~ x, icd, m = 5)

Add survival probability estimates

Description

Given suitable data (i.e., data with all columns used for estimation of the model), this function adds a column surv_prob containing survival probabilities for the specified covariate and follow-up information (and CIs surv_lower, surv_upper if ci = TRUE).

Usage

add_surv_prob(newdata, object, ...)

## Default S3 method:
add_surv_prob(
  newdata,
  object,
  ci = TRUE,
  se_mult = 2,
  overwrite = FALSE,
  time_var = NULL,
  interval_length = "intlen",
  boundary = TRUE,
  ...
)

## S3 method for class 'pamm_ic'
add_surv_prob(
  newdata,
  object,
  ci = TRUE,
  alpha = 0.05,
  nsim = 500L,
  time_var = NULL,
  interval_length = "intlen",
  ...
)

Arguments

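newdata

A data frame or list containing the values of the model covariates at which predictions are required. If this is not provided then predictions corresponding to the original data are returned. If newdata is provided then it should contain all the variables needed for prediction: a warning is generated if not. See details for use with linear.functional.terms.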

object

a fitted gam object as produced by gam().

...

Further arguments passed to predict.gam and get_hazard

ci

logical. Indicates if confidence intervals should be calculated. Defaults to TRUE.

se_mult

Factor by which standard errors are multiplied for calculating the confidence intervals.

overwrite

Should hazard columns be overwritten if already present in the data set? Defaults to FALSE. If TRUE, columns with names c("hazard", "se", "lower", "upper") will be overwritten.

time_var

Name of the variable used for the baseline hazard. Defaults to "tend".

interval_length

The variable in newdata containing the interval lengths. Can be either bare unquoted variable name or character. Defaults to "intlen".

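boundary

Logical. If TRUE (default), a boundary row at time = 0 with survival probability 1 is prepended (per group), so that survival curves start at the natural origin.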

alpha

Significance level for pooled confidence intervals.

nsim

Total number of pooled posterior draws used for the interval.

Details

The returned data contains one boundary row per group at time_var = 0 for plotting cumulative quantities from the time origin. On this row, surv_prob = 1; if confidence intervals are requested, surv_lower = surv_upper = 1. If an interval-length column is present, it is set to 0 on the boundary row.
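A base-R sketch of the boundary-row convention, assuming a piecewise-constant hazard so that surv_prob = exp(-cumsum(hazard * intlen)). surv_with_boundary() is a hypothetical helper for a single group, not the package's code.

```r
# Hypothetical helper: survival probabilities plus a boundary row at time 0
surv_with_boundary <- function(tend, intlen, hazard) {
  out <- data.frame(tend = tend, intlen = intlen,
                    surv_prob = exp(-cumsum(hazard * intlen)))
  # boundary row: time 0, interval length 0, survival probability 1
  boundary <- data.frame(tend = 0, intlen = 0, surv_prob = 1)
  rbind(boundary, out)
}

sp <- surv_with_boundary(tend = c(1, 2, 4), intlen = c(1, 1, 2),
                         hazard = c(0.1, 0.2, 0.1))
sp
```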

Examples

ped <- tumor[1:50,] %>% as_ped(Surv(days, status)~ age)
pam <- mgcv::gam(ped_status ~ s(tend)+age, data=ped, family=poisson(), offset=offset)
ped_info(ped) %>% add_surv_prob(pam, ci=TRUE)

Add time-dependent covariate to a data set

Description

Given a data set in standard format (with one row per subject/observation), this function adds a column with the specified exposure time points and a column with respective exposures, created from rng_fun. This function should usually only be used to create data sets passed to sim_pexp.

Usage

add_tdc(data, tz, rng_fun, ...)

Arguments

data

A data set with variables specified in formula.

tz

A numeric vector of exposure times (relative to the beginning of the follow-up time t)

rng_fun

A random number generating function that creates the time-dependent covariates at time points tz. First argument of the function should be n, the number of random numbers to generate. Within add_tdc, n will be set to length(tz).

...

Currently not used.
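The contract for rng_fun (first argument n, called with n = length(tz)) can be sketched with a hypothetical helper, add_tdc_sketch(). Storing the exposure times and exposures as list columns named tz and z is an assumption about the output layout, chosen because each subject carries a whole vector of values.

```r
# Hypothetical sketch of the idea behind add_tdc(); column names are assumed
add_tdc_sketch <- function(data, tz, rng_fun, ...) {
  n_subj <- nrow(data)
  # same exposure time points for every subject, stored as a list column
  data$tz <- rep(list(tz), n_subj)
  # one draw of length(tz) exposures per subject
  data$z <- lapply(seq_len(n_subj), function(i) rng_fun(n = length(tz), ...))
  data
}

set.seed(3)
df <- data.frame(id = 1:4, x = rnorm(4))
df_tdc <- add_tdc_sketch(df, tz = 0:9, rng_fun = rnorm, sd = 0.5)
str(df_tdc)
```

Any generator whose first argument is n works, for example rnorm or runif; extra arguments such as sd are forwarded through the dots in this sketch.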

Embeds the data set with the specified (relative) term contribution

Description

Adds the contribution of a specific term to the linear predictor to the data specified by newdata. Essentially a wrapper to predict.gam, with type = "terms". Thus most arguments and their documentation below are taken from predict.gam. Shape-constrained additive models fit with scam are supported as well.

Usage

add_term(newdata, object, term, reference = NULL, ci = TRUE, se_mult = 2, ...)

Arguments

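newdata

A data frame or list containing the values of the model covariates at which predictions are required. If this is not provided then predictions corresponding to the original data are returned. If newdata is provided then it should contain all the variables needed for prediction: a warning is generated if not. See details for use with linear.functional.terms.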

object

a fitted gam object as produced by gam().

term

A character (vector) or regular expression indicating for which term(s) information should be extracted and added to data set.

reference

A data frame with number of rows equal to nrow(newdata) or one, or a named list with (partial) covariate specifications. See examples.

ci

logical. Indicates if confidence intervals should be calculated. Defaults to TRUE.

se_mult

The factor by which standard errors are multiplied to form confidence intervals.

...

Further arguments passed to predict.gam
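The idea of a term contribution relative to a reference can be sketched with base R only. A term's contribution is the part of the linear predictor coming from its design columns; the relative version subtracts the contribution at the reference covariate value, and the standard errors come from the coefficient covariance. The sketch uses a Poisson glm with a quadratic age term in place of a gam, simulated data, and a hypothetical helper term_rel(); predict.gam with type = "terms" also handles smooth terms, which this sketch does not.

```r
# Simulated data and a parametric stand-in for a smooth age effect
set.seed(10)
n <- 300
train <- data.frame(age = runif(n, 20, 80))
train$y <- rpois(n, exp(-1 + 0.03 * train$age - 0.0002 * train$age^2))
fit <- glm(y ~ age + I(age^2), family = poisson(), data = train)

# Hypothetical helper: term contribution relative to a one-row reference
term_rel <- function(fit, newdata, reference, pattern, se_mult = 2) {
  tt <- delete.response(terms(fit))
  X  <- model.matrix(tt, newdata)
  Xr <- model.matrix(tt, reference)
  idx <- grep(pattern, colnames(X), fixed = TRUE)   # columns of the term
  # difference of design rows, restricted to the term's columns
  Xd <- sweep(X[, idx, drop = FALSE], 2, Xr[1, idx])
  V  <- vcov(fit)[idx, idx, drop = FALSE]
  est <- drop(Xd %*% coef(fit)[idx])
  se  <- sqrt(rowSums((Xd %*% V) * Xd))
  data.frame(newdata, fit = est,
             ci_lower = est - se_mult * se, ci_upper = est + se_mult * se)
}

nd_age <- data.frame(age = seq(20, 80, by = 10))
rel <- term_rel(fit, nd_age, reference = data.frame(age = mean(train$age)),
                pattern = "age")
rel
```

At the reference value both the relative contribution and its interval width are exactly zero, which is why relative curves pinch to a point there.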

Examples

library(ggplot2)
ped <- as_ped(tumor, Surv(days, status)~ age, cut = seq(0, 2000, by = 100))
pam <- mgcv::gam(ped_status ~ s(tend) + s(age), family = poisson(),
  offset = offset, data = ped)
# term contribution for a sequence of ages
s_age <- ped %>% make_newdata(age = seq_range(age, 50)) %>%
  add_term(pam, term = "age")
ggplot(s_age, aes(x = age, y = fit)) + geom_line() +
  geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = .3)
# term contribution relative to mean age
s_age2 <- ped %>% make_newdata(age = seq_range(age, 50)) %>%
  add_term(pam, term = "age", reference = list(age = mean(ped$age)))
ggplot(s_age2, aes(x = age, y = fit)) + geom_line() +
  geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = .3)

Add transition probabilities confidence intervals

Description

Add transition probabilities confidence intervals

Usage

add_trans_ci(newdata, object, nsim = 100L, alpha = 0.05, ...)

Add transition probabilities

Description

add_trans_prob adds transition probabilities based on the provided data set and model. Optionally, confidence intervals (CI) are added if ci = TRUE. The function builds on cumulative hazards (cumu_hazard) from mgcv::gam models.

Usage

add_trans_prob(
  newdata,
  object,
  overwrite = FALSE,
  ci = FALSE,
  alpha = 0.05,
  nsim = 100L,
  time_var = "tend",
  interval_length = "intlen",
  transition = "transition",
  ...
)

Arguments

newdata

A data frame or list containing the values of the model covariates at which predictions are required. If this is not provided then predictions corresponding to the original data are returned. If newdata is provided then it should contain all the variables needed for prediction: a warning is generated if not. See details for use with linear.functional.terms.

object

A fitted gam object as produced by mgcv::gam

overwrite

Should transition probability columns be overwritten if already present in the data set? Defaults to FALSE. If TRUE, columns with names c("trans_prob", "trans_upper", "trans_lower") will be overwritten.

ci

Logical, defaults to FALSE. Decides if confidence intervals for transition probabilities are calculated.

alpha

Sets the alpha level of the confidence intervals. Defaults to 0.05.

nsim

Sets the number of iterations for simulated confidence intervals. Defaults to 100L. Interval bounds are empirical type-6 quantiles of the nsim draws; larger values of nsim yield more stable interval bounds.

time_var

Name of the variable used for the baseline hazard. Defaults to "tend".

interval_length

Character, defaults to "intlen". Name of the column in newdata that contains the interval lengths.

transition

Character, defaults to "transition". Name of the column in newdata that contains the transition labels.

...

Further arguments passed to underlying methods.

Details

When computing transition probabilities for multiple groups, the input data must be grouped via group_by() before calling this function. Omitting group_by() will not produce an error or warning but will silently return incorrect results, as the transition probability will be accumulated over the entire dataset rather than within each group.

The returned data contains one boundary row per group and transition at time_var = 0 for plotting transition probabilities from the time origin. On this row, trans_prob = 0; if confidence intervals are requested, trans_lower = trans_upper = 0. If an interval-length column is present, it is set to 0 on the boundary row.
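One standard way to turn piecewise-constant transition hazards into transition probabilities is a discrete product integral, P(0, t_j) = (I + Q_1 intlen_1)(I + Q_2 intlen_2)...(I + Q_j intlen_j), where Q_l holds the transition hazards of interval l off the diagonal and minus their row sums on the diagonal. The base-R sketch below uses the four transitions 1->2, 1->3, 2->1 and 2->3 with made-up hazards; whether add_trans_prob() uses exactly this scheme is an assumption. The identity matrix at time 0 mirrors the boundary row with trans_prob = 0 for every transition.

```r
# Sketch: transition probabilities via a discrete product integral
trans_prob_sketch <- function(Q_list, intlen) {
  k <- nrow(Q_list[[1]])
  P <- diag(k)
  out <- vector("list", length(Q_list) + 1)
  out[[1]] <- P                      # time 0: no transition has happened yet
  for (j in seq_along(Q_list)) {
    P <- P %*% (diag(k) + Q_list[[j]] * intlen[j])
    out[[j + 1]] <- P
  }
  out
}

# states 1, 2, 3 (3 absorbing); rows of Q sum to zero
make_Q <- function(h12, h13, h21, h23) {
  matrix(c(-(h12 + h13), h12, h13,
           h21, -(h21 + h23), h23,
           0, 0, 0), nrow = 3, byrow = TRUE)
}
Q_list <- list(make_Q(0.10, 0.02, 0.05, 0.10),
               make_Q(0.12, 0.03, 0.05, 0.15),
               make_Q(0.15, 0.03, 0.04, 0.20))
P <- trans_prob_sketch(Q_list, intlen = c(1, 1, 1))
round(P[[4]], 3)
```

The approximation requires I + Q_l * intlen_l to have non-negative entries, i.e., short enough intervals relative to the hazards; each row of every P then sums to one.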

Examples


if (require("mstate")) {
  data("prothr", package = "mstate")
  prothr <- prothr |>
    dplyr::mutate(
      transition = as.factor(paste0(from, "->", to)),
      treat = as.factor(treat)) |>
    dplyr::filter(Tstart != Tstop, id <= 100) |>
    dplyr::select(-trans)
  ped <- as_ped(data = prothr, formula = Surv(Tstart, Tstop, status) ~ .,
    transition = "transition", id = "id", timescale = "calendar")
  pam <- mgcv::bam(ped_status ~ s(tend, by = transition) + transition * treat,
    data = ped, family = poisson(), offset = offset,
    method = "fREML", discrete = TRUE)
  ndf <- make_newdata(ped, tend = unique(tend),
    treat = unique(treat),
    transition = unique(transition)) |>
    dplyr::group_by(treat, transition) |>  # important!
    add_trans_prob(pam)
}

Transform crps object to data.frame

Description

An as.data.frame S3 method for objects of class crps.

Usage

## S3 method for class 'crps'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)

Arguments

x

An object of class crps. See crps.

row.names

NULL or a character vector giving the row names for the data frame. Missing values are not allowed.

optional

logical. If TRUE, setting row names and converting column names (to syntactic names: see make.names) is optional. Note that all of R's base package as.data.frame() methods use optional only for column names treatment, basically with the meaning of data.frame(*, check.names = !optional). See also the make.names argument of the matrix method.

...

additional arguments to be passed to or from methods.

Transform data to Piece-wise Exponential Data (PED)

Description

This is the general data transformation function provided by the pammtools package. The following main applications must be distinguished:

Transformation of standard time-to-event data.
Transformation of left-truncated time-to-event data.
Transformation of time-to-event data with time-dependent covariates (TDC).
Transformation of competing risks data (single or stacked data sets).
Transformation of recurrent events and multi-state data.

For TDC data, the type of effect one wants to estimate is also important for the data transformation step. In case of TDCs, the right-hand-side of the formula can contain formula specials concurrent and cumulative.

Usage

as_ped(data, ...)

## S3 method for class 'data.frame'
as_ped(
  data,
  formula,
  cut = NULL,
  max_time = NULL,
  tdc_specials = c("concurrent", "cumulative"),
  censor_code = 0L,
  transition = character(),
  timescale = c("gap", "calendar"),
  min_events = 1L,
  ...
)

## S3 method for class 'nested_fdf'
as_ped(data, formula, ...)

## S3 method for class 'list'
as_ped(
  data,
  formula,
  tdc_specials = c("concurrent", "cumulative"),
  censor_code = 0L,
  ...
)

is.ped(x)

## S3 method for class 'ped'
as_ped(data, newdata, ...)

## S3 method for class 'pamm'
as_ped(data, newdata, ...)

as_ped_multistate(
  data,
  formula,
  cut = NULL,
  max_time = NULL,
  tdc_specials = c("concurrent", "cumulative"),
  censor_code = 0L,
  transition = character(),
  timescale = c("gap", "calendar"),
  min_events = 1L,
  ...
)

Arguments

data

Either an object inheriting from data frame or in case of time-dependent covariates a list of data frames (of length 2), where the first data frame contains the time-to-event information and static covariates while the second (and potentially further data frames) contain information on time-dependent covariates and the times at which they have been observed.

...

Further arguments passed to the data.frame method and eventually to survSplit. Notably, id (character string) sets the name of the subject identifier variable in data, and, for competing risks data, combine (logical, default TRUE) controls whether cause-specific data sets are stacked into a single data frame with an additional cause column (TRUE) or returned as a list of cause-specific data sets (FALSE); see the competing-risks vignette for details.

formula

A two sided formula with a Surv object on the left-hand-side and covariate specification on the right-hand-side (RHS). The RHS can be an extended formula, which specifies how TDCs should be transformed using specials concurrent and cumulative. The left-hand-side can be in start-stop notation. This, however, is only used to create left-truncated data and does not support the full functionality.

cut

Split points, used to partition the follow-up into intervals. If unspecified, all unique event times will be used. For competing risks, when combine = TRUE split points are derived from all event types combined.

max_time

If cut is unspecified, this will be the last possible event time. All event times after max_time will be administratively censored at max_time.

tdc_specials

A character vector of names of potential specials in formula for concurrent and/or cumulative effects.

censor_code

Specifies the value of the status variable that indicates censoring. Often this will be 0, which is the default.

transition

Character string. Name of the column in data that identifies the transition type in multi-state models. When supplied, as_ped performs the multi-state PED transformation, stacking interval-transition rows for each subject.

timescale

Character string, either "gap" (time since last transition) or "calendar" (time since study entry, not reset after each transition).

x

any R object.

newdata

A new data set (data.frame) that contains the same variables that were used to create the PED object (data).

Details

For competing risks data, as_ped can return either:

A list of cause-specific data sets (combine = FALSE), where each element corresponds to one event type and uses cause-specific interval split points. This is suitable for cause-specific hazards models without shared effects.
A single stacked data set (combine = TRUE, the default), where all cause-specific data sets are combined with a cause column as covariate. Common split points are derived from all event times. This is required for models with shared covariate effects across causes, estimated via interaction terms (e.g., s(tend, by = cause)).

For multi-state data, as_ped extends the standard PED transformation to each transition type. The follow-up of each subject is split at all observed transition times across the entire dataset, and a row is added for every interval-transition combination the subject is at risk for. Two key differences arise compared to the single-event case:

Delayed entry into the risk set is handled automatically, since subjects are only at risk for transitions out of a state after they have entered it.
Competing events are treated as censoring for all other transitions within the same interval.

In any case, the data transformation is specified by a two-sided formula. See the data-transformation, competing-risks, and recurrent-events vignettes for details.

Value

For standard and left-truncated data, a data frame of class ped in piece-wise exponential data format. For competing risks data, either a stacked data frame of class ped_cr (when combine = TRUE) or a list of cause-specific ped data frames of class ped_cr_list (when combine = FALSE). For multistate data, the result is a stacked long-format dataset with one row per subject, interval, and transition, which can be passed directly to a Poisson regression model.
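The core of the transformation for standard right-censored data can be sketched with survival::survSplit(), to which as_ped() eventually passes arguments: follow-up is split at the cut points, each row records the event indicator for its interval, and the offset is the log of the time at risk in that interval. The toy data are made up, and the real ped object carries additional columns (such as tend and ped_status), so this is an illustration rather than a replacement for as_ped().

```r
library(survival)

# three subjects: event at 1.5, censored at 4, event at 7
df <- data.frame(id = 1:3, time = c(1.5, 4, 7), status = c(1, 0, 1),
                 x = c(0.1, 0.5, 0.9))
# split follow-up at the cut points 2 and 5
ped_sketch <- survSplit(Surv(time, status) ~ ., data = df,
                        cut = c(2, 5), episode = "interval")
# log of the time at risk within each interval
ped_sketch$offset <- log(ped_sketch$time - ped_sketch$tstart)
ped_sketch
```

Per subject, the exponentiated offsets add up to the original follow-up time, and the event indicators add up to the original status; a Poisson model with this offset then estimates piecewise-constant hazards.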

Examples

# Standard single-event transformation
tumor[1:3, ]
tumor[1:3, ] %>% as_ped(Surv(days, status) ~ age + sex, cut = c(0, 500, 1000))
tumor[1:3, ] %>% as_ped(Surv(days, status) ~ age + sex)

# Competing risks: stacked data set (combine = TRUE, default)
# Suitable for cause-specific hazards models with shared effects,
# estimated via interaction terms e.g. s(tend, by = cause)
## Not run: 
data("fourD", package = "etm")
ped_stacked <- fourD %>%
  as_ped(Surv(time, status) ~ ., id = "id")
head(ped_stacked)

# Competing risks: list output (combine = FALSE)
# Suitable for cause-specific hazards models without shared effects
ped_list <- fourD %>%
  as_ped(Surv(time, status) ~ ., id = "id", combine = FALSE)
# ped_list[[1]]: data for cause 1 (cardiovascular death)
# ped_list[[2]]: data for cause 2 (death from other causes)
head(ped_list[[1]])
head(ped_list[[2]])

# Multi-state: illness-death model on calendar timescale
# Uses the prothr data (liver cirrhosis patients, n = 488) from mstate.
# Patients can transition between normal (1) and abnormal (2) prothrombin
# levels and death (3): transitions 1->2, 1->3, 2->1, 2->3.
# Calendar timescale is used because hazards depend on overall disease
# duration, not time since last transition.
data("prothr", package = "mstate")
ped_msm <- prothr %>%
  dplyr::filter(Tstart != Tstop) %>%
  as_ped(
    formula    = Surv(Tstart, Tstop, status) ~ .,
    transition = "trans",
    id         = "id",
    timescale  = "calendar"
  )
head(ped_msm)

## End(Not run)
## Not run: 
data("cgd", package = "frailtyHL")
cgd2 <- cgd %>%
  dplyr::select(id, tstart, tstop, enum, status, age) %>%
  dplyr::filter(enum %in% 1:2)
ped_re <- as_ped_multistate(
  formula    = Surv(tstart, tstop, status) ~ age + enum,
  data       = cgd2,
  transition = "enum",
  timescale  = "calendar")

## End(Not run)

Competing risks transformation

Description

This is the general data transformation function provided by the pammtools package. The following main applications must be distinguished:

Transformation of standard time-to-event data.
Transformation of left-truncated time-to-event data.
Transformation of time-to-event data with time-dependent covariates (TDC).
Transformation of competing risks data (single or stacked data sets).
Transformation of recurrent events and multi-state data.

Usage

as_ped_cr(
  data,
  formula,
  cut = NULL,
  max_time = NULL,
  tdc_specials = c("concurrent", "cumulative"),
  censor_code = 0L,
  combine = TRUE,
  ...
)

Arguments

data

formula

cut

max_time

If cut is unspecified, this will be the last possible event time. All event times after max_time will be administratively censored at max_time.

tdc_specials

A character vector of names of potential specials in formula for concurrent and/or cumulative effects.

censor_code

Specifies the value of the status variable that indicates censoring. Often this will be 0, which is the default.

combine

Logical. If TRUE (the default), cause-specific data sets are stacked into a single data frame with an additional cause column, using split points common to all event types. If FALSE, a list of cause-specific data sets is returned.

...

Details

For competing risks data, as_ped can return either:

A list of cause-specific data sets (combine = FALSE), where each element corresponds to one event type and uses cause-specific interval split points. This is suitable for cause-specific hazards models without shared effects.
A single stacked data set (combine = TRUE, the default), where all cause-specific data sets are combined with a cause column as covariate. Common split points are derived from all event times. This is required for models with shared covariate effects across causes, estimated via interaction terms (e.g., s(tend, by = cause)).

Delayed entry into the risk set is handled automatically, since subjects are only at risk for transitions out of a state after they have entered it.
Competing events are treated as censoring for all other transitions within the same interval.

In any case, the data transformation is specified by a two-sided formula. See the data-transformation, competing-risks, and recurrent-events vignettes for details.
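As a rough illustration of the stacked layout (this is not output of as_ped), the base-R sketch below builds a stacked data set by hand from hypothetical toy rows for a single subject, assuming two causes and shared split points. The column names mirror those used elsewhere in this manual (tend, intlen, ped_status, cause), but the construction itself is an assumption made for illustration.

```r
# Hypothetical toy data: one subject (id = 1) with two shared intervals
# (0, 1] and (1, 2], who experiences an event of cause 2 in the second interval.
ped_base <- data.frame(id = 1, tstart = c(0, 1), tend = c(1, 2), intlen = 1)
causes <- c(1, 2)
observed_cause <- 2

# Stack one copy of the rows per cause and add a cause column.
stacked <- do.call(rbind, lapply(causes, function(k) {
  d <- ped_base
  d$cause <- k
  # The event counts only in the last interval and only for its own cause;
  # for the competing cause the subject is treated as censored at that time.
  is_last <- seq_len(nrow(d)) == nrow(d)
  d$ped_status <- as.integer(is_last & k == observed_cause)
  d
}))
stacked$cause <- factor(stacked$cause)
stacked
```

In such a stacked data set, cause-specific baseline hazards can be estimated through an interaction such as s(tend, by = cause), while other terms can be shared across causes.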

Value

Examples

# Standard single-event transformation
tumor[1:3, ]
tumor[1:3, ] %>% as_ped(Surv(days, status) ~ age + sex, cut = c(0, 500, 1000))
tumor[1:3, ] %>% as_ped(Surv(days, status) ~ age + sex)

# Competing risks: stacked data set (combine = TRUE, default)
# Suitable for cause-specific hazards models with shared effects,
# estimated via interaction terms e.g. s(tend, by = cause)
## Not run: 
data("fourD", package = "etm")
ped_stacked <- fourD %>%
  as_ped(Surv(time, status) ~ ., id = "id")
head(ped_stacked)

# Competing risks: list output (combine = FALSE)
# Suitable for cause-specific hazards models without shared effects
ped_list <- fourD %>%
  as_ped(Surv(time, status) ~ ., id = "id", combine = FALSE)
# ped_list[[1]]: data for cause 1 (cardiovascular death)
# ped_list[[2]]: data for cause 2 (death from other causes)
head(ped_list[[1]])
head(ped_list[[2]])

# Multi-state: illness-death model on calendar timescale
# Uses the prothr data (liver cirrhosis patients, n = 488) from mstate.
# Patients can transition between normal (1) and abnormal (2) prothrombin
# levels and death (3): transitions 1->2, 1->3, 2->1, 2->3.
# Calendar timescale is used because hazards depend on overall disease
# duration, not time since last transition.
data("prothr", package = "mstate")
ped_msm <- prothr %>%
  filter(Tstart != Tstop) %>%
  as_ped(
    formula    = Surv(Tstart, Tstop, status) ~ .,
    transition = "trans",
    id         = "id",
    timescale  = "calendar"
  )
head(ped_msm)

## End(Not run)
## Not run: 
# Recurrent events: the event number (enum) defines the transitions
data("cgd", package = "frailtyHL")
cgd2 <- cgd %>%
  select(id, tstart, tstop, enum, status, age) %>%
  filter(enum %in% 1:2)  # first two events per subject
ped_re <- as_ped_multistate(
  formula    = Surv(tstart, tstop, status) ~ age + enum,
  data       = cgd2,
  transition = "enum",
  timescale  = "calendar"
)

## End(Not run)

Calculate confidence intervals

Description

Given a two-column matrix or data frame, returns a three-column data.frame with the coefficient estimate plus the lower and upper bounds of the 95% confidence interval.

Usage

calc_ci(ftab)

Arguments

ftab

A table with two columns, containing coefficients in the first column and standard errors in the second column.
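A minimal sketch of this computation, assuming Wald-type intervals based on a normal approximation (estimate plus or minus qnorm(0.975) times the standard error). The function name calc_ci_sketch and the output column names coef, lower and upper are hypothetical and need not match what calc_ci returns.

```r
# Hypothetical stand-in for calc_ci(): Wald-type 95% intervals from a
# two-column table (estimate, standard error), assuming normality.
calc_ci_sketch <- function(ftab) {
  ftab <- as.data.frame(ftab)
  z <- qnorm(0.975)  # approximately 1.96
  data.frame(
    coef  = ftab[[1]],
    lower = ftab[[1]] - z * ftab[[2]],
    upper = ftab[[1]] + z * ftab[[2]]
  )
}

ftab <- cbind(estimate = c(0.5, -1), se = c(0.1, 0.2))
calc_ci_sketch(ftab)
```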

Create a data frame from all combinations of data frames

Description

Works like expand.grid but for data frames.

Usage

combine_df(...)

Arguments

...

Data frames that should be combined into one data frame. Elements of the first data frame vary fastest, elements of the last data frame vary slowest.

Examples

combine_df(
  data.frame(x=1:3, y=3:1),
  data.frame(x1=c("a", "b"), x2=c("c", "d")),
  data.frame(z=c(0, 1)))
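The same kind of result can be sketched with base R alone: merge() with by = NULL returns the Cartesian product of two data frames, with the rows of its first argument varying fastest. combine_df_sketch is a hypothetical helper for illustration and is not necessarily how combine_df is implemented.

```r
# Cartesian product of several data frames via repeated merge(by = NULL);
# rows of earlier data frames vary fastest.
combine_df_sketch <- function(...) {
  out <- Reduce(function(a, b) merge(a, b, by = NULL), list(...))
  rownames(out) <- NULL
  out
}

res <- combine_df_sketch(
  data.frame(x = 1:3, y = 3:1),
  data.frame(x1 = c("a", "b"), x2 = c("c", "d")),
  data.frame(z = c(0, 1)))
dim(res)  # 12 rows (3 * 2 * 2), 5 columns
```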

Calculate difference in cumulative hazards and respective standard errors

Description

CIs are calculated by sampling coefficients from their posterior and calculating the cumulative hazard difference nsim times. The CIs are obtained from the 2.5% and 97.5% quantiles of the simulated differences.

Usage

compute_cumu_diff(
  d1,
  d2,
  model,
  alpha = 0.05,
  nsim = 100L,
  time_var = "tend",
  interval_length = "intlen"
)

Arguments

d1

A data set used as newdata in make_X

d2

See d1

model

A model object for which a predict method is implemented which returns the design matrix (e.g., mgcv::gam).
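The simulation scheme described above can be sketched in base R. The function sim_cumu_diff_ci and its arguments (design matrices X1 and X2, interval lengths intlen, coefficients beta_hat, covariance V) are hypothetical. Coefficients are drawn via a Cholesky factor of V rather than a dedicated multivariate normal sampler, and the cumulative hazard is taken as the running sum of hazard times interval length. This illustrates the technique and is not the code behind compute_cumu_diff.

```r
sim_cumu_diff_ci <- function(X1, X2, intlen, beta_hat, V,
                             nsim = 100L, alpha = 0.05) {
  # cumulative hazard difference for one coefficient vector
  cumu_diff <- function(beta) {
    cumsum(exp(drop(X1 %*% beta)) * intlen) -
      cumsum(exp(drop(X2 %*% beta)) * intlen)
  }
  R <- chol(V)  # upper triangular, V = t(R) %*% R
  draws <- replicate(nsim, {
    # draw beta ~ N(beta_hat, V)
    beta <- beta_hat + drop(crossprod(R, rnorm(length(beta_hat))))
    cumu_diff(beta)
  })
  # one row per interval, one column per draw
  draws <- matrix(draws, nrow = nrow(X1))
  data.frame(
    cumu_diff = cumu_diff(beta_hat),
    lower     = apply(draws, 1, quantile, probs = alpha / 2),
    upper     = apply(draws, 1, quantile, probs = 1 - alpha / 2)
  )
}

set.seed(1)
X <- cbind(1, c(0.5, 1, 1.5))  # intercept + hypothetical time term
res <- sim_cumu_diff_ci(
  X1 = cbind(X, 1), X2 = cbind(X, 0),  # covariate x = 1 vs. x = 0
  intlen = rep(1, 3), beta_hat = c(-2, 0.3, 0.5), V = diag(0.01, 3))
res
```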

Formula specials for defining time-dependent covariates

Description

So far, two specials are implemented. concurrent is used when the goal is to estimate a concurrent effect of the TDC. cumulative is used when the goal is to estimate a cumulative effect of the TDC. These should usually not be called directly but rather as part of the formula argument to as_ped. See the vignette on data transformation for details.

Usage

cumulative(..., tz_var, ll_fun = function(t, tz) t >= tz, suffix = NULL)

concurrent(..., tz_var, lag = 0, suffix = NULL)

has_special(formula, special = "cumulative")

Arguments

...

Variables that will be transformed to covariate matrices. The number of columns of each covariate depends on tz. Usually, the elements specified here are time (which should be the name of the time variable used on the LHS of the formula argument to as_ped), tz, the variable containing the times at which the TDC was observed (can be wrapped in latency), and the TDCs that share the same tz and lag-lead window (ll_fun).

tz_var

The name of the variable that stores information on the times at which the TDCs specified in this term were observed.

ll_fun

Function that specifies how the lag-lead matrix should be constructed. The first argument is the follow-up time, the second argument is the time of exposure.

lag

A single non-negative number giving the time lag for a concurrent effect to occur (i.e., the TDC at time of exposure t - lag affects the hazard in the interval containing follow-up time t). Defaults to 0.

formula

special

The name of the special whose existence in the formula should be checked
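To see what the default ll_fun encodes, it can be evaluated with base R's outer() on a grid of follow-up times (rows) and exposure times (columns), giving a logical lag-lead matrix. The grids and the alternative window ll_lagged below are hypothetical; as_ped builds its own matrices from the data.

```r
t_grid  <- 1:5  # follow-up times (rows)
tz_grid <- 1:5  # exposure times (columns)

# Default window: exposure at tz can affect the hazard at any t >= tz.
ll_default <- function(t, tz) t >= tz
LL <- outer(t_grid, tz_grid, ll_default)
LL * 1  # lower triangle incl. diagonal: past and current exposures

# Hypothetical alternative: the effect starts 1 time unit after exposure
# and lasts until 3 time units after exposure.
ll_lagged <- function(t, tz) t >= tz + 1 & t <= tz + 3
outer(t_grid, tz_grid, ll_lagged) * 1
```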

Time-dependent covariates of the `patient` data set.

Description

This data set contains the time-dependent covariates (TDCs) for the patient data set. Note that nutrition was recorded for at most 12 days after ICU admission. The data set includes:

CombinedID: Unique patient identifier. Can be used to merge with patient data
Study_Day: The calendar (!) day at which calories (or proteins) were administered
caloriesPercentage: The percentage of target calories supplied to the patient by the ICU staff
proteinGproKG: The amount of protein supplied to the patient by the ICU staff

Usage

daily

Format

An object of class tbl_df (inherits from tbl, data.frame) with 18797 rows and 4 columns.

`dplyr` Verbs for `ped`-Objects

Description

See dplyr documentation of the respective functions for description and examples.

Usage

## S3 method for class 'ped'
arrange(.data, ...)

## S3 method for class 'ped'
group_by(.data, ..., .add = FALSE)

## S3 method for class 'ped'
ungroup(x, ...)

## S3 method for class 'ped'
distinct(.data, ..., .keep_all = FALSE)

## S3 method for class 'ped'
filter(.data, ...)

## S3 method for class 'ped'
sample_n(tbl, size, replace = FALSE, weight = NULL, .env = NULL, ...)

## S3 method for class 'ped'
sample_frac(tbl, size = 1, replace = FALSE, weight = NULL, .env = NULL, ...)

## S3 method for class 'ped'
slice(.data, ...)

## S3 method for class 'ped'
select(.data, ...)

## S3 method for class 'ped'
mutate(.data, ...)

## S3 method for class 'ped'
rename(.data, ...)

## S3 method for class 'ped'
summarise(.data, ...)

## S3 method for class 'ped'
summarize(.data, ...)

## S3 method for class 'ped'
transmute(.data, ...)

## S3 method for class 'ped'
inner_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = NULL,
  na_matches = c("na", "never"),
  multiple = "all",
  unmatched = "drop",
  relationship = NULL
)

## S3 method for class 'ped'
full_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = NULL,
  na_matches = c("na", "never"),
  multiple = "all",
  relationship = NULL
)

## S3 method for class 'ped'
left_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = NULL,
  na_matches = c("na", "never"),
  multiple = "all",
  unmatched = "drop",
  relationship = NULL
)

## S3 method for class 'ped'
right_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = NULL,
  na_matches = c("na", "never"),
  multiple = "all",
  unmatched = "drop",
  relationship = NULL
)

Arguments

.data

an object of class ped, see as_ped.

...

see dplyr documentation

x

an object of class ped, see as_ped.

tbl

an object of class ped, see as_ped.

size

<tidy-select> For sample_n(), the number of rows to select. For sample_frac(), the fraction of rows to select. If tbl is grouped, size applies to each group.

replace

Sample with or without replacement?

weight

<tidy-select> Sampling weights. This must evaluate to a vector of non-negative numbers the same length as the input. Weights are automatically standardised to sum to 1.

.env

DEPRECATED.

by

A join specification created with join_by(), or a character vector of variables to join by.

If NULL, the default, *_join() will perform a natural join, using all variables in common across x and y. A message lists the variables so that you can check they're correct; suppress the message by supplying by explicitly.

To join on different variables between x and y, use a join_by() specification. For example, join_by(a == b) will match x$a to y$b.

To join by multiple variables, use a join_by() specification with multiple expressions. For example, join_by(a == b, c == d) will match x$a to y$b and x$c to y$d. If the column names are the same between x and y, you can shorten this by listing only the variable names, like join_by(a, c).

join_by() can also be used to perform inequality, rolling, and overlap joins. See the documentation at ?join_by for details on these types of joins.

For simple equality joins, you can alternatively specify a character vector of variable names to join by. For example, by = c("a", "b") joins x$a to y$a and x$b to y$b. If variable names differ between x and y, use a named character vector like by = c("x_a" = "y_a", "x_b" = "y_b").

To perform a cross-join, generating all combinations of x and y, see cross_join().

copy

If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.

suffix

If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.

keep

Should the join keys from both x and y be preserved in the output?

If NULL, the default, joins on equality retain only the keys from x, while joins on inequality retain the keys from both inputs.
If TRUE, all keys from both inputs are retained.
If FALSE, only keys from x are retained. For right and full joins, the data in key columns corresponding to rows that only exist in y are merged into the key columns from x. Can't be used when joining on inequality conditions.

na_matches

Should two NA or two NaN values match?

"na", the default, treats two NA or two NaN values as equal, like %in%, match(), and merge().
"never" treats two NA or two NaN values as different, and will never match them together or to any other values. This is similar to joins for database sources and to base::merge(incomparables = NA).

multiple

Handling of rows in x with multiple matches in y. For each row of x:

"all", the default, returns every match detected in y. This is the same behavior as SQL.
"any" returns one match detected in y, with no guarantees on which match will be returned. It is often faster than "first" and "last" if you just need to detect if there is at least one match.
"first" returns the first match detected in y.
"last" returns the last match detected in y.

unmatched

How should unmatched keys that would result in dropped rows be handled?

"drop" drops unmatched keys from the result.
"error" throws an error if unmatched keys are detected.

unmatched is intended to protect you from accidentally dropping rows during a join. It only checks for unmatched keys in the input that could potentially drop rows.

For left joins, it checks y.
For right joins, it checks x.
For inner joins, it checks both x and y. In this case, unmatched is also allowed to be a character vector of length 2 to specify the behavior for x and y independently.

relationship

Handling of the expected relationship between the keys of x and y. If the expectations chosen from the list below are invalidated, an error is thrown.

NULL, the default, doesn't expect there to be any relationship between x and y. However, for equality joins it will check for a many-to-many relationship (which is typically unexpected) and will warn if one occurs, encouraging you to either take a closer look at your inputs or make this relationship explicit by specifying "many-to-many".

See the Many-to-many relationships section for more details.
"one-to-one" expects:
- Each row in x matches at most 1 row in y.
- Each row in y matches at most 1 row in x.
"one-to-many" expects:
- Each row in y matches at most 1 row in x.
"many-to-one" expects:
- Each row in x matches at most 1 row in y.
"many-to-many" doesn't perform any relationship checks, but is provided to allow you to be explicit about this relationship if you know it exists.

relationship doesn't handle cases where there are zero matches. For that, see unmatched.

Value

a modified ped object (except for do)

A formula special used to handle cumulative effect specifications

Description

Can be used in the second part of the formula specification provided to sim_pexp and should only be used in this context.

Usage

fcumu(..., by = NULL, f_xyz, ll_fun)

Extract transition information from different objects

Description

Extract transition information from different objects

Usage

from_to_pairs(t_mat, ...)

from_to_pairs2(t_mat, ...)

## S3 method for class 'data.frame'
from_to_pairs(t_mat, from_col = "from", to_col = "to", ...)

Arguments

t_mat

an object that contains information about possible transitions.

from_col

The name of the column in the data frame that contains "from" states.

to_col

The name of the column in the data frame that contains "to" states.

Examples

## Not run: 
df = data.frame(id = c(1,1, 2,2), from = c(1, 1, 2, 2), to = c(2, 3, 2, 2))
from_to_pairs(df)

## End(Not run)

(Cumulative) (Step-) Hazard Plots.

Description

geom_hazard is an extension of geom_line and is optimized for (cumulative) hazard plots. Essentially, it adds a (0,0) row to the data if one is not already present. Adapted, with slight modifications, from RcmdrPlugin.KMggplot2.
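The idea of adding a starting row can be sketched on a plain data frame. add_origin is a hypothetical helper, not part of pammtools, and it ignores grouping, which a real geom would have to handle per group. The comments in the examples for these geoms suggest that geom_surv starts survival curves at (0, 1) instead, which corresponds to y0 = 1 here.

```r
# Prepend a row with x = 0 and y = y0 unless the data already contain x = 0.
add_origin <- function(df, x = "tend", y = "cumu_hazard", y0 = 0) {
  if (any(df[[x]] == 0)) return(df)
  first <- df[1, , drop = FALSE]
  first[[x]] <- 0
  first[[y]] <- y0
  out <- rbind(first, df)
  rownames(out) <- NULL
  out
}

df <- data.frame(tend = c(1, 2, 3), cumu_hazard = c(0.1, 0.25, 0.4))
add_origin(df)
```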

Usage

geom_hazard(
  mapping = NULL,
  data = NULL,
  stat = "identity",
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE,
  ...
)

geom_stephazard(
  mapping = NULL,
  data = NULL,
  stat = "identity",
  position = "identity",
  direction = "vh",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE,
  ...
)

geom_surv(
  mapping = NULL,
  data = NULL,
  stat = "identity",
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE,
  ...
)

Arguments

mapping

Set of aesthetic mappings created by aes(). If specified and inherit.aes = TRUE (the default), it is combined with the default mapping at the top level of the plot. You must supply mapping if there is no plot mapping.

data

The data to be displayed in this layer. There are three options:

If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot().

A data.frame, or other object, will override the plot data. All objects will be fortified to produce a data frame. See fortify() for which variables will be created.

A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data. A function can be created from a formula (e.g. ~ head(.x, 10)).

stat

The statistical transformation to use on the data for this layer. When using a geom_*() function to construct a layer, the stat argument can be used to override the default coupling between geoms and stats. The stat argument accepts the following:

A Stat ggproto subclass, for example StatCount.
A string naming the stat. To give the stat as a string, strip the function name of the stat_ prefix. For example, to use stat_count(), give the stat as "count".
For more information and other ways to specify the stat, see the layer stat documentation.

position

A position adjustment to use on the data for this layer. This can be used in various ways, including to prevent overplotting and improving the display. The position argument accepts the following:

The result of calling a position function, such as position_jitter(). This method allows for passing extra arguments to the position.
A string naming the position adjustment. To give the position as a string, strip the function name of the position_ prefix. For example, to use position_jitter(), give the position as "jitter".
For more information and other ways to specify the position, see the layer position documentation.

na.rm

If FALSE, the default, missing values are removed with a warning. If TRUE, missing values are silently removed.

show.legend

logical. Should this layer be included in the legends? NA, the default, includes if any aesthetics are mapped. FALSE never includes, and TRUE always includes. It can also be a named logical vector to finely select the aesthetics to display. To include legend keys for all levels, even when no data exists, use TRUE. If NA, all levels are shown in legend, but unobserved levels are omitted.

inherit.aes

If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. annotation_borders().

...

Other arguments passed on to layer()'s params argument. These arguments broadly fall into one of 4 categories below. Notably, further arguments to the position argument, or aesthetics that are required can not be passed through .... Unknown arguments that are not part of the 4 categories below are ignored.

Static aesthetics that are not mapped to a scale, but are at a fixed value and apply to the layer as a whole. For example, colour = "red" or linewidth = 3. The geom's documentation has an Aesthetics section that lists the available options. The 'required' aesthetics cannot be passed on to the params. Please note that while passing unmapped aesthetics as vectors is technically possible, the order and required length is not guaranteed to be parallel to the input data.
When constructing a layer using a stat_*() function, the ... argument can be used to pass on parameters to the geom part of the layer. An example of this is stat_density(geom = "area", outline.type = "both"). The geom's documentation lists which parameters it can accept.
Inversely, when constructing a layer using a geom_*() function, the ... argument can be used to pass on parameters to the stat part of the layer. An example of this is geom_area(stat = "density", adjust = 0.5). The stat's documentation lists which parameters it can accept.
The key_glyph argument of layer() may also be passed on through .... This can be one of the functions described as key glyphs, to change the display of the layer in the legend.

direction

direction of stairs: 'vh' for vertical then horizontal, 'hv' for horizontal then vertical, or 'mid' for step half-way between adjacent x-values.

Examples

library(ggplot2)
library(pammtools)
ped <- tumor[10:50,] %>% as_ped(Surv(days, status)~1)
pam <- mgcv::gam(ped_status ~ s(tend), data=ped, family = poisson(), offset = offset)
ndf <- make_newdata(ped, tend = unique(tend)) %>% add_hazard(pam)
# piece-wise constant hazards
ggplot(ndf, aes(x = tend, y = hazard)) +
 geom_vline(xintercept = c(0, ndf$tend[c(1, (nrow(ndf)-2):nrow(ndf))]), lty = 3) +
 geom_hline(yintercept = c(ndf$hazard[1:3], ndf$hazard[nrow(ndf)]), lty = 3) +
 geom_stephazard() +
 geom_step(col=2) +
 geom_step(col=2, lty = 2, direction="vh")

# cumulative hazard
ndf <- ndf %>% add_cumu_hazard(pam)
ggplot(ndf, aes(x = tend, y = cumu_hazard)) +
 geom_hazard() +
 geom_line(col=2) # doesn't start at (0, 0)

# survival probability
ndf <- ndf %>% add_surv_prob(pam)
ggplot(ndf, aes(x = tend, y = surv_prob)) +
 geom_surv() +
 geom_line(col=2) # doesn't start at c(0,1)

Step ribbon plots.

Description

geom_stepribbon is an extension of geom_ribbon and is optimized for Kaplan-Meier plots with pointwise confidence intervals or a confidence band. The default direction argument "hv" is appropriate for right-continuous step functions, such as the hazard rates and related quantities returned by pammtools.

Usage

geom_stepribbon(
  mapping = NULL,
  data = NULL,
  stat = "identity",
  position = "identity",
  direction = "hv",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE,
  ...
)

Arguments

mapping

data

The data to be displayed in this layer. There are three options:

If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot().

A data.frame, or other object, will override the plot data. All objects will be fortified to produce a data frame. See fortify() for which variables will be created.

stat

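The statistical transformation to use on the data for this layer. When using a geom_*() function to construct a layer, the stat argument can be used to override the default coupling between geoms and stats. The stat argument accepts the following:

A Stat ggproto subclass, for example StatCount.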
A string naming the stat. To give the stat as a string, strip the function name of the stat_ prefix. For example, to use stat_count(), give the stat as "count".
For more information and other ways to specify the stat, see the layer stat documentation.

position

A position adjustment to use on the data for this layer. This can be used in various ways, including to prevent overplotting and improving the display. The position argument accepts the following:

The result of calling a position function, such as position_jitter(). This method allows for passing extra arguments to the position.
A string naming the position adjustment. To give the position as a string, strip the function name of the position_ prefix. For example, to use position_jitter(), give the position as "jitter".
For more information and other ways to specify the position, see the layer position documentation.

direction

direction of stairs: 'vh' for vertical then horizontal, 'hv' for horizontal then vertical, or 'mid' for step half-way between adjacent x-values.

na.rm

If FALSE, the default, missing values are removed with a warning. If TRUE, missing values are silently removed.

show.legend

inherit.aes

...

Static aesthetics that are not mapped to a scale, but are at a fixed value and apply to the layer as a whole. For example, colour = "red" or linewidth = 3. The geom's documentation has an Aesthetics section that lists the available options. The 'required' aesthetics cannot be passed on to the params. Please note that while passing unmapped aesthetics as vectors is technically possible, the order and required length is not guaranteed to be parallel to the input data.
When constructing a layer using a stat_*() function, the ... argument can be used to pass on parameters to the geom part of the layer. An example of this is stat_density(geom = "area", outline.type = "both"). The geom's documentation lists which parameters it can accept.
Inversely, when constructing a layer using a geom_*() function, the ... argument can be used to pass on parameters to the stat part of the layer. An example of this is geom_area(stat = "density", adjust = 0.5). The stat's documentation lists which parameters it can accept.
The key_glyph argument of layer() may also be passed on through .... This can be one of the functions described as key glyphs, to change the display of the layer in the legend.

Examples

library(ggplot2)
huron <- data.frame(year = 1875:1972, level = as.vector(LakeHuron))
h <- ggplot(huron, aes(year))
h + geom_stepribbon(aes(ymin = level - 1, ymax = level + 1), fill = "grey70") +
    geom_step(aes(y = level))
h + geom_ribbon(aes(ymin = level - 1, ymax = level + 1), fill = "grey70") +
    geom_line(aes(y = level))

Extract the (Bayesian) covariance matrix of the model coefficients

Description

Returns the covariance matrix that matches the coefficients returned by get_coefs. For mgcv models this is the Bayesian posterior covariance matrix object$Vp, for scam models the covariance matrix of the re-parametrized coefficients object$Vp.t and vcov(object) otherwise.

Usage

get_Vp(object, ...)

## Default S3 method:
get_Vp(object, ...)

## S3 method for class 'gam'
get_Vp(object, ...)

## S3 method for class 'scam'
get_Vp(object, ...)

Arguments

object

A fitted model object.

...

Further arguments passed to methods.

Calculate CIF for one cause

Description

Internal generic dispatching CIF calculation based on the model class.

Usage

get_cif(newdata, object, ...)

## Default S3 method:
get_cif(
  newdata,
  object,
  ci,
  time_var,
  interval_length = "intlen",
  alpha,
  nsim,
  cause_var,
  ...
)

Arguments

newdata

A data frame of new observations, typically created via make_newdata().

object

A fitted model object. The method is dispatched on this argument.

...

Additional arguments passed to the respective method.

Value

A data frame with CIF estimates appended.

Extract model coefficients on the scale of the design matrix

Description

Returns the coefficient vector coefs such that make_X(object, newdata) %*% coefs yields the linear predictor. For most models this is simply coef(object). For scam models, however, coef() returns the coefficients on the underlying unconstrained scale, while the linear predictor is calculated from the re-parametrized (partially exponentiated) coefficients object$coefficients.t.
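The identity described above can be checked with an ordinary Poisson GLM from the stats package, for which coef() is already on the design-matrix scale. This example uses model.matrix() as a stand-in for make_X and simulated toy data; it does not involve scam, where the identity only holds with the re-parametrized coefficients.

```r
set.seed(42)
d <- data.frame(x = runif(50))
d$y <- rpois(50, lambda = exp(0.5 + d$x))
fit <- glm(y ~ x, family = poisson(), data = d)

# design matrix times coefficients vs. the model's linear predictor
eta_manual <- drop(model.matrix(fit) %*% coef(fit))
eta_pred   <- predict(fit, type = "link")
all.equal(unname(eta_manual), unname(eta_pred))  # TRUE
```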

Usage

get_coefs(object, ...)

## Default S3 method:
get_coefs(object, ...)

## S3 method for class 'scam'
get_coefs(object, ...)

Arguments

object

A fitted model object.

...

Further arguments passed to methods.

Extract cumulative coefficients (cumulative hazard differences)

Description

These functions are designed to extract (or mimic) the cumulative coefficients usually used in additive hazards models (Aalen model) to depict (time-varying) covariate effects. For PAMMs, these are the differences between the cumulative hazard rates where all covariates except one have identical values. For a numeric covariate of interest, this calculates \Lambda(t|x+1) - \Lambda(t|x). For non-numeric covariates, the cumulative hazard of the reference level is subtracted from the cumulative hazards evaluated at all non-reference levels. Standard errors are calculated using the delta method.
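For intuition, the cumulative coefficient of a numeric covariate can be computed by hand under a simple piece-wise exponential model. The sketch assumes a time-constant log-linear effect beta and hypothetical baseline hazards; cumu_hazard_pem is an illustrative helper only. Because such an effect acts multiplicatively on the hazard, the difference equals (exp(beta * (x + 1)) - exp(beta * x)) times the baseline cumulative hazard, so unlike in the Aalen model it depends on the value of x.

```r
# Cumulative hazard at interval end points: running sum of hazard * length,
# with log-hazard log_h0[j] + beta * x in interval j.
cumu_hazard_pem <- function(log_h0, beta, x, intlen) {
  cumsum(exp(log_h0 + beta * x) * intlen)
}

log_h0 <- log(c(0.10, 0.15, 0.20))  # hypothetical baseline hazards
intlen <- c(1, 1, 2)                 # interval lengths
beta   <- 0.4

# Lambda(t | x + 1) - Lambda(t | x) for x = 1
cumu_coef <- cumu_hazard_pem(log_h0, beta, x = 2, intlen) -
  cumu_hazard_pem(log_h0, beta, x = 1, intlen)
cumu_coef
```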

Usage

get_cumu_coef(model, data = NULL, terms, ...)

## S3 method for class 'gam'
get_cumu_coef(
  model,
  data,
  terms,
  time_var = "tend",
  interval_length = "intlen",
  ...
)

## S3 method for class 'scam'
get_cumu_coef(
  model,
  data,
  terms,
  time_var = "tend",
  interval_length = "intlen",
  ...
)

## S3 method for class 'aalen'
get_cumu_coef(model, data = NULL, terms, ci = TRUE, ...)

## S3 method for class 'cox.aalen'
get_cumu_coef(model, data = NULL, terms, ci = TRUE, ...)

Arguments

model

Object from which to extract cumulative coefficients.

data

Additional data if necessary.

terms

A character vector of variables for which the cumulative coefficient should be calculated.

...

Further arguments passed to methods.

time_var

Name of the evaluation time variable in data. Defaults to "tend".

interval_length

Name of the interval-length variable in data. Defaults to "intlen".

ci

Logical. Indicates if confidence intervals should be returned as well.

Calculate (or plot) cumulative effect for all time-points of the follow-up

Description

Calculate (or plot) cumulative effect for all time-points of the follow-up

Usage

get_cumu_eff(data, model, term, z1, z2 = NULL, se_mult = 2)

gg_cumu_eff(data, model, term, z1, z2 = NULL, se_mult = 2, ci = TRUE)

Arguments

data

Data used to fit the model.

model

A suitable model object which will be used to estimate the partial effect of term.

term

A character string indicating the model term for which partial effects should be plotted.

z1

The exposure profile for which to calculate the cumulative effect. Can be either a single number or a vector of the same length as the number of unique observation time points.

z2

If provided, calculated cumulative effect is for the difference between the two exposure profiles (g(z1,t)-g(z2,t)).

se_mult

Multiplicative factor used to calculate confidence intervals (e.g., lower = fit - 2*se).

ci

Logical. Indicates if confidence intervals for the term of interest should be calculated/plotted. Defaults to TRUE.

Calculate cumulative hazard

Description

Calculate cumulative hazard
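As background (not a description of the implementation), under the piece-wise constant hazard of a PAMM with split points \kappa_0 < \kappa_1 < \dots and hazard \lambda_j in interval (\kappa_{j-1}, \kappa_j], the cumulative hazard at a time t in interval J is given below. At the interval end point t = \kappa_J (the tend value of a row in newdata) it reduces to the running sum of hazard times interval length (intlen).

```latex
\Lambda(t) = \sum_{j=1}^{J-1} \lambda_j \, (\kappa_j - \kappa_{j-1})
           + \lambda_J \, (t - \kappa_{J-1}),
\qquad \kappa_{J-1} < t \le \kappa_J,
\qquad\text{so}\qquad
\Lambda(\kappa_J) = \sum_{j=1}^{J} \lambda_j \, (\kappa_j - \kappa_{j-1}).
```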

Usage

get_cumu_hazard(
  newdata,
  object,
  ci = TRUE,
  ci_type = c("default", "delta", "sim"),
  time_var = NULL,
  se_mult = 2,
  interval_length = "intlen",
  nsim = 100L,
  ...
)

Arguments

newdata

object

a fitted gam object as produced by gam().

ci

logical. Indicates if confidence intervals should be calculated. Defaults to TRUE.

ci_type

time_var

Name of the variable used for the baseline hazard. Defaults to "tend".

se_mult

Factor by which standard errors are multiplied for calculating the confidence intervals.

interval_length

The variable in newdata containing the interval lengths. Can be either bare unquoted variable name or character. Defaults to "intlen".

nsim

Total number of pooled posterior draws used for the interval.

...

Further arguments passed to predict.gam and get_hazard

Expand time-dependent covariates to functionals

Description

Given a formula specification of how time-dependent covariates affect the outcome, creates the respective functional covariate as well as auxiliary matrices for time, latency, etc.

Usage

get_cumulative(data, formula)

expand_cumulative(data, func, n_func)

Arguments

data

Data frame (or similar) in which variables specified in ... will be looked for

formula

A formula containing cumulative specials, that specify the type of cumulative effect one wants to estimate. For details see the vignettes on data transformation and time-dependent covariates.

func

Single evaluated cumulative term.

Obtain interval break points

Description

The default method works for data frames. The list method applies the default method to each data set within the list.
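As an illustration of what default split points typically look like, the hypothetical helper below takes the sorted unique times of observed events (status == event), prefixed by 0, and truncates them at max_time if supplied. This mirrors the description of max_time elsewhere in this manual, but it is an assumption; the package's own rules may differ in details.

```r
get_cut_sketch <- function(time, status, max_time = NULL, event = 1L) {
  cut <- unique(time[status == event])  # observed event times of this type
  if (!is.null(max_time)) {
    # administrative censoring: drop later times, end at max_time
    cut <- c(cut[cut < max_time], max_time)
  }
  sort(unique(c(0, cut)))
}

time   <- c(5, 3, 8, 3, 10)
status <- c(1, 1, 0, 1, 1)
get_cut_sketch(time, status)                # 0 3 5 10
get_cut_sketch(time, status, max_time = 6)  # 0 3 5 6
```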

Usage

get_cut(data, formula, cut = NULL, ...)

## Default S3 method:
get_cut(data, formula, cut = NULL, max_time = NULL, event = 1L, ...)

## S3 method for class 'list'
get_cut(
  data,
  formula,
  cut = NULL,
  max_time = NULL,
  event = 1L,
  timescale = "gap",
  ...
)

Extract event types

Description

Given a formula that specifies the status variable of the outcome, this function extracts the different event types (except for censoring, specified by censor_code).

Usage

get_event_types(data, formula, censor_code)

Arguments

data

formula

censor_code

Specifies the value of the status variable that indicates censoring. Often this will be 0, which is the default.
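The extraction amounts to taking all observed status values except the censoring code. A base-R sketch on a made-up status vector (not the package's implementation):

```r
# Hypothetical status column: 0 = censored, 1 and 2 = two event types
status <- c(0, 1, 2, 1, 0, 2)
censor_code <- 0

# Event types are all observed status values except the censoring code
event_types <- sort(setdiff(unique(status), censor_code))
event_types
#> [1] 1 2
```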

Point hazard predictor (backend primitive)

Description

Returns the predicted hazard (response scale) as a plain numeric vector, one value per row of newdata. Together with sim_hazard this is the only primitive a new estimation backend must provide: every derived quantity (cumulative hazard, survival probability, CIF, transition probabilities) and its simulation-based confidence intervals are built from these two. Analytic ("default"/"delta") confidence intervals additionally use make_X/get_coefs/get_Vp.

Usage

get_hazard(object, newdata, ...)

## Default S3 method:
get_hazard(object, newdata, ...)

Arguments

object

A fitted model object.

newdata

A data frame for which the hazard is predicted.

...

Further arguments passed to methods.

Value

A numeric vector of hazards on the response scale.
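To illustrate the backend contract, here is a minimal sketch of a get_hazard method for a hypothetical model class toy_pexp that simply stores one hazard per interval end point. The class, its constructor and the stand-in generic are assumptions made for illustration only; with pammtools attached, its own get_hazard generic would dispatch to such a method instead.

```r
# Stand-in generic so the snippet runs on its own (pammtools defines the real one)
get_hazard <- function(object, newdata, ...) UseMethod("get_hazard")

# Hypothetical backend: one constant hazard per interval, keyed by interval end point
new_toy_pexp <- function(tend, hazard) {
  structure(list(tend = tend, hazard = hazard), class = "toy_pexp")
}

# Point hazard primitive: one response-scale value per row of newdata, as a plain vector
get_hazard.toy_pexp <- function(object, newdata, ...) {
  idx <- match(newdata$tend, object$tend)
  unname(object$hazard[idx])
}

fit <- new_toy_pexp(tend = c(1, 3, 7), hazard = c(0.10, 0.20, 0.05))
get_hazard(fit, data.frame(tend = c(3, 1, 7)))
#> [1] 0.20 0.10 0.05
```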

Information on intervals in which times fall

Description

Information on intervals in which times fall

Usage

get_intervals(x, times, ...)

## Default S3 method:
get_intervals(x, times, left.open = TRUE, rightmost.closed = TRUE, ...)

Arguments

x

An object from which interval information can be obtained, see int_info.

times

A vector of times for which corresponding interval information should be returned.

...

Further arguments passed to findInterval.

left.open

logical; if TRUE, all intervals are open at the left and closed at the right, i.e. of the form (vec[i], vec[i+1]], and rightmost.closed then means ‘leftmost is closed’ (see findInterval). This may be useful, e.g., in survival analysis computations.

rightmost.closed

logical; if TRUE, the rightmost interval vec[N-1] .. vec[N] is treated as closed (with left.open = TRUE, the leftmost interval is treated as closed instead); see findInterval.

Value

A data.frame containing information on intervals in which values of times fall.

Examples

set.seed(111018)
brks <- c(0, 4.5, 5, 10, 30)
int_info(brks)
x <- runif(3, 0, 30)
x
get_intervals(brks, x)
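get_intervals passes further arguments on to findInterval. A base-R sketch of the same lookup with left.open = TRUE and rightmost.closed = TRUE (intervals of the form (a, b]) on the break points from the example above; the column names tstart and tend are chosen for illustration:

```r
brks <- c(0, 4.5, 5, 10, 30)
times <- c(4.5, 7, 30)

# With left.open = TRUE, index i satisfies brks[i] < time <= brks[i + 1]
idx <- findInterval(times, brks, left.open = TRUE, rightmost.closed = TRUE)
data.frame(time = times, tstart = brks[idx], tend = brks[idx + 1])
```

Note that 4.5 falls into (0, 4.5] rather than (4.5, 5], because intervals are closed on the right.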

Construct or extract data that represents a lag-lead window

Description

Constructs lag-lead window data set from raw inputs or from data objects with suitable information stored in attributes, e.g., objects created by as_ped.

Usage

get_laglead(x, ...)

## Default S3 method:
get_laglead(x, tz, ll_fun, ...)

## S3 method for class 'data.frame'
get_laglead(x, ...)

Arguments

x

Either a numeric vector of follow-up cut points or a suitable object.

...

Further arguments passed to methods.

tz

A vector of exposure times

ll_fun

Function that specifies how the lag-lead matrix should be constructed. The first argument is the follow-up time, the second argument is the time of exposure.

Examples

get_laglead(0:10, tz=-5:5, ll_fun=function(t, tz) { t >= tz + 2 & t <= tz + 2 + 3})
gg_laglead(0:10, tz=-5:5, ll_fun=function(t, tz) { t >= tz + 2 & t <= tz + 2 + 3})
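Conceptually, the lag-lead matrix evaluates ll_fun for every combination of follow-up time t and exposure time tz. A base-R sketch with outer(), using the ll_fun from the example above (the object names are illustrative, not pammtools internals):

```r
t <- 0:10    # follow-up time points
tz <- -5:5   # exposure times
ll_fun <- function(t, tz) t >= tz + 2 & t <= tz + 2 + 3

# Evaluate ll_fun for every (t, tz) pair; 1 = exposure at tz lies in the window at t
ll_mat <- outer(t, tz, ll_fun) * 1L
dimnames(ll_mat) <- list(t = t, tz = tz)
ll_mat[1:4, 1:4]
```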

Extract variables from the left-hand side of a formula

Description

Extract variables from the left-hand side of a formula

Extract variables from the right-hand side of a formula

Usage

get_lhs_vars(formula)

get_rhs_vars(formula)

Arguments

formula

A formula object.
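A base-R sketch of the same idea using all.vars() on each side of a two-sided formula; this is an illustration, not necessarily how get_lhs_vars/get_rhs_vars are implemented. The Surv() call is never evaluated, so the survival package need not be loaded:

```r
f <- Surv(time, status) ~ age + sex

# The left-hand side is f[[2]], the right-hand side is f[[3]]
lhs_vars <- all.vars(f[[2]])
rhs_vars <- all.vars(f[[3]])
lhs_vars
rhs_vars
```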

Extract variables from the left-hand side of a formula

Description

Extract variables from the left-hand side of a formula

Extract variables from the right-hand side of a formula

Usage

get_ped_form(
  formula,
  data = NULL,
  tdc_specials = c("concurrent", "cumulative")
)

Arguments

formula

A formula object.

Extract plot information for all special model terms

Description

Given an mgcv gamObject (or a scam object), returns the information used for the default plots produced by plot.gam (plot.scam, respectively).

Usage

get_plotinfo(x, ...)

Arguments

x

a fitted gam object as produced by gam().

...

Further arguments passed to plot.gam

Calculate simulation based confidence intervals

Description

These helpers draw the simulated hazard trajectories once for the whole (possibly grouped) newdata via sim_hazard – so one set of draws is shared across groups – and then summarise them into pointwise quantile intervals. Cumulative quantities are accumulated within each group of newdata.

Usage

get_sim_ci(newdata, object, alpha = 0.05, nsim = 100L, ...)
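The summarising step can be sketched in base R: given a matrix of simulated hazard trajectories (here drawn from a made-up log-normal distribution rather than via sim_hazard), each trajectory is accumulated into a cumulative hazard and pointwise quantiles are taken across draws. The sketch covers a single group; with grouped newdata the accumulation would run within each group.

```r
set.seed(1)
nsim <- 100L
alpha <- 0.05
intlen <- c(1, 2, 4)

# Hypothetical simulated log-hazards: rows = intervals, columns = draws
log_h <- matrix(rnorm(3 * nsim, mean = log(c(0.10, 0.20, 0.05)), sd = 0.1), nrow = 3)
h_draws <- exp(log_h)

# Accumulate each simulated trajectory into a cumulative hazard (column-wise cumsum)
cumu_draws <- apply(h_draws * intlen, 2, cumsum)

# Pointwise quantile interval across draws, one row per time point
ci <- t(apply(cumu_draws, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2)))
ci
```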

Enumerate plottable univariate smooth terms of a fitted model

Description

Internal helper. Given the model data and a fitted gam object, returns a tibble with one row per smooth curve to be drawn by get_terms / gg_smooth. Only smooths that vary over exactly one numeric covariate are returned. This includes ordinary 1d smooths (s(), 1d ti()), by-variable smooths (s(x, by = z)) and factor-smooth interactions (s(x, fac, bs = "fs"), s(x, fac, bs = "sz")). Tensor and multivariate smooths (te(), t2(), 2d ti(), s(x, z)) as well as (correlated) random effects (bs = "re", bs = "mrf") are excluded – use gg_tensor / gg_re for those.

Usage

get_smooth_terms(data, fit)

Arguments

data

A data frame containing the variables used to fit the model.

fit

A fitted model object.

Details

Smooths that are indexed by a factor (a factor by-variable or the factor in an fs/sz interaction) are expanded into one row per factor level, all sharing the same facet so that gg_smooth can draw them in a single panel, distinguished by colour/fill.

Returns NULL for models without a $smooth component (e.g. coxph), in which case get_terms falls back to label-based extraction.

Value

A tibble with columns facet, level, var, col and the list-column settings, or NULL.

Calculate survival probabilities

Description

Calculate survival probabilities

Usage

get_surv_prob(
  newdata,
  object,
  ci = TRUE,
  ci_type = c("default", "delta", "sim"),
  se_mult = 2L,
  time_var = NULL,
  interval_length = "intlen",
  nsim = 100L,
  ...
)

Arguments

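newdata

A data frame (e.g., as created by make_newdata) containing the covariate values and time points for which survival probabilities are calculated.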

object

a fitted gam object as produced by gam().

ci

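logical. Indicates if confidence intervals should be calculated. Defaults to TRUE.

ci_type

Type of confidence interval. "default" and "delta" compute analytic intervals from the linear predictor (make_X/get_coefs/get_Vp); "sim" computes simulation-based intervals from posterior draws (see get_sim_ci).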

se_mult

Factor by which standard errors are multiplied for calculating the confidence intervals.

time_var

Name of the variable used for the baseline hazard. Defaults to "tend".

interval_length

The variable in newdata containing the interval lengths. Can be either a bare unquoted variable name or a character string. Defaults to "intlen".

nsim

Total number of pooled posterior draws used for simulation-based confidence intervals (ci_type = "sim").

...

Further arguments passed to predict.gam and get_hazard
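Survival probabilities follow from the cumulative hazard via S(t) = exp(-H(t)), where H(t) is the cumulative hazard. Because exp(-x) is decreasing, an interval for H(t) maps to an interval for S(t) with the bounds swapped. This is a sketch of the transformation only; the bounds below are made up, and how get_surv_prob itself builds its intervals for each ci_type is not shown.

```r
hazard <- c(0.10, 0.20, 0.05)
intlen <- c(1, 2, 4)
cumu <- cumsum(hazard * intlen)

# Hypothetical pointwise bounds for the cumulative hazard
cumu_lower <- cumu * 0.8
cumu_upper <- cumu * 1.2

# S(t) = exp(-H(t)); the upper bound of H gives the lower bound of S
surv <- exp(-cumu)
surv_lower <- exp(-cumu_upper)
surv_upper <- exp(-cumu_lower)
round(cbind(surv_lower, surv, surv_upper), 3)
```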

Extract variables from the left-hand side of a formula

Description

Extract variables from the left-hand side of a formula

Extract variables from the right-hand side of a formula

Usage

get_tdc_form(
  formula,
  data = NULL,
  tdc_specials = c("concurrent", "cumulative"),
  invert = FALSE
)

Arguments

formula

A formula object.

Extract variables from the left-hand side of a formula

Description

Extract variables from the left-hand side of a formula

Extract variables from the right-hand side of a formula

Usage

get_tdc_vars(formula, specials = "cumulative", data = NULL)

Arguments

formula

A formula object.

Extract the partial effect of a single smooth curve

Description

Extract the partial effect of a single smooth curve

Usage

get_term(data, fit, spec, n = 100, conf_level = 0.95, ...)

Arguments

data

A data frame containing variables used to fit the model. The first row is used as the basis for all covariates other than the one being varied (their values are irrelevant for the term-wise contribution).

fit

A fitted object of class gam.

spec

A single-row tibble (one row of get_smooth_terms) describing the curve to extract.

n

Number of points at which to evaluate the smooth over the range of its covariate.

conf_level

The confidence level for the pointwise confidence interval.

...

Further arguments (currently unused).

Extract a partial effect for models without a $smooth component

Description

Fallback used for fits such as coxph that support predict(type = "terms") but expose no mgcv smooth metadata. Matching is anchored to the variable name (exact, or as a parenthesised argument such as pspline(karno)) rather than an unanchored substring.

Usage

get_term_legacy(data, fit, term, n = 100, conf_level = 0.95, ...)

Arguments

data

A data frame containing the variables used to fit the model.

fit

A fitted model object that supports predict(type = "terms"), e.g. of class coxph.

term

A character string naming the model term/variable.

n

Number of points at which to evaluate the smooth over the range of its covariate.

conf_level

The confidence level for the pointwise confidence interval.

...

Further arguments (currently unused).

Extract the partial effects of univariate smooth model terms

Description

Creates, for each requested univariate smooth, a sequence over the range of the smooth's numeric covariate, evaluates the term-wise contribution via predict(fit, newdata = ., type = "terms") and stacks the results into a tidy data frame.

Usage

get_terms(data, fit, terms = NULL, ...)

Arguments

data

A data frame containing the variables used to fit the model.

fit

A fitted model object, e.g. of class gam or coxph.

terms

A character vector (can be length one) specifying the terms for which partial effects will be returned. If NULL (the default) all univariate smooth terms in the model are used.

...

Further arguments controlling extraction, passed on per term, e.g. n (number of evaluation points) and conf_level.

Details

For gam fits the requested terms are matched against the model's smooths (see get_smooth_terms): a bare variable name (e.g. "tend") selects every univariate smooth over that variable – the main effect s(tend) as well as any s(tend, by = ...) or factor-smooth interaction – while an exact smooth label (e.g. "s(tend)") selects a single smooth. Names that do not match any smooth (for example parametric factor main effects) are skipped with a warning; use gg_fixed for those. For factor-indexed smooths one curve per factor level is returned, identified by the level column.

For models without mgcv smooth metadata (e.g. coxph), terms must be supplied and are matched against the columns of predict(type = "terms").

Value

A tibble with columns term, x, level, eff, se, ci_lower and ci_upper.

Examples

library(survival)
fit <- coxph(Surv(time, status) ~ pspline(karno) + pspline(age), data=veteran)
terms_df <- veteran %>% get_terms(fit, terms = c("karno", "age"))
head(terms_df)
tail(terms_df)
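Roughly the same steps can be written out directly with mgcv for a gam fit (here on iris rather than survival data). The sketch assumes a normal-approximation interval eff ± z * se with z derived from conf_level; the column names mirror the Value section above, but this is not the get_terms implementation.

```r
library(mgcv)

fit <- gam(Sepal.Length ~ s(Sepal.Width) + s(Petal.Length), data = iris)

# Grid over the covariate of interest; other covariates are taken from the first row
n <- 50
nd <- iris[rep(1, n), ]
nd$Sepal.Width <- seq(min(iris$Sepal.Width), max(iris$Sepal.Width), length.out = n)

# Term-wise contributions and their standard errors
pt <- predict(fit, newdata = nd, type = "terms", se.fit = TRUE)
z <- qnorm(1 - (1 - 0.95) / 2)   # conf_level = 0.95

term_df <- data.frame(
  x   = nd$Sepal.Width,
  eff = pt$fit[, "s(Sepal.Width)"],
  se  = pt$se.fit[, "s(Sepal.Width)"]
)
term_df$ci_lower <- term_df$eff - z * term_df$se
term_df$ci_upper <- term_df$eff + z * term_df$se
head(term_df)
```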

Forest plot of fixed coefficients

Description

Given a model object, returns a data frame with columns variable, coef (coefficient), ci_lower (lower 95% CI) and ci_upper (upper 95% CI).

Usage

gg_fixed(x, intercept = FALSE, ...)

Arguments

x

A model object.

intercept

Logical, indicating whether intercept term should be included. Defaults to FALSE.

...

Currently not used.

Examples

g <- mgcv::gam(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
 data=iris)
gg_fixed(g, intercept=TRUE)
gg_fixed(g)
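The data frame behind such a forest plot can be sketched for any model with coef() and vcov() methods. This sketch uses lm and Wald intervals (estimate ± 1.96 standard errors), which is an assumption about how the 95% limits are formed:

```r
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)

est <- coef(fit)
se <- sqrt(diag(vcov(fit)))
z <- qnorm(0.975)

fixed_df <- data.frame(
  variable = names(est),
  coef     = unname(est),
  ci_lower = unname(est - z * se),
  ci_upper = unname(est + z * se)
)
# Drop the intercept, mirroring intercept = FALSE
fixed_df <- fixed_df[fixed_df$variable != "(Intercept)", ]
fixed_df
```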

Plot Lag-Lead windows

Description

Given data defining a lag-lead window, returns the respective plot as a ggplot2 object.

Usage

gg_laglead(x, ...)

## Default S3 method:
gg_laglead(x, tz, ll_fun, ...)

## S3 method for class 'LL_df'
gg_laglead(
  x,
  high_col = "grey20",
  low_col = "whitesmoke",
  grid_col = "lightgrey",
  ...
)

## S3 method for class 'nested_fdf'
gg_laglead(x, ...)

Arguments

x

Either a numeric vector of follow-up cut points or a suitable object.

...

Further arguments passed to methods.

tz

A vector of exposure times

ll_fun

Function that specifies how the lag-lead matrix should be constructed. The first argument is the follow-up time, the second argument is the time of exposure.

high_col

Color used to highlight exposure times within the lag-lead window.

low_col

Color of exposure times outside the lag-lead window.

grid_col

Color of grid lines.

Examples

## Example 1: supply t, tz, ll_fun directly
 gg_laglead(1:10, tz=-5:5,
  ll_fun=function(t, tz) { t >= tz + 2 & t <= tz + 2 + 3})

## Example 2: extract information on t, tz, ll_fun from data with respective attributes
data("simdf_elra", package = "pammtools")
gg_laglead(simdf_elra)

Visualize effect estimates for specific covariate combinations

Description

Depending on the plot function and input, creates either 1-dimensional slices, a bivariate surface or a (1D) cumulative effect.

Usage

gg_partial(data, model, term, ..., reference = NULL, ci = TRUE)

gg_partial_ll(
  data,
  model,
  term,
  ...,
  reference = NULL,
  ci = FALSE,
  time_var = "tend"
)

get_partial_ll(
  data,
  model,
  term,
  ...,
  reference = NULL,
  ci = FALSE,
  time_var = "tend"
)

Arguments

data

Data used to fit the model.

model

A suitable model object which will be used to estimate the partial effect of term.

term

A character string indicating the model term for which partial effects should be plotted.

...

Covariate specifications (expressions) that will be evaluated by looking for variables in data. Must be of the form z = f(z), where z is a variable in the data set and f a known function that can be usefully applied to z. Note that this is also necessary for single value specifications (e.g. age = c(50)). For data in PED (piece-wise exponential data) format, one can also specify the time argument; see "Details" and "Examples" of make_newdata.

reference

If specified, should be a list with covariate value pairs, e.g. list(x1 = 1, x2=50). The calculated partial effect will be relative to an observation specified in reference.

ci

Logical. Indicates if confidence intervals for the term of interest should be calculated/plotted. Defaults to TRUE.

time_var

The name of the variable that was used in model to represent follow-up time.

Plot Normal QQ plots for random effects

Description

Plot Normal QQ plots for random effects

Usage

gg_re(x, ...)

Arguments

x

a fitted gam object as produced by gam().

...

Further arguments passed to plot.gam

Examples

library(pammtools)
data("patient")
ped <- patient %>%
 dplyr::slice(1:100) %>%
 as_ped(Surv(Survdays, PatientDied)~ ApacheIIScore + CombinedicuID, id="CombinedID")
pam <- mgcv::gam(ped_status ~ s(tend) + ApacheIIScore + s(CombinedicuID, bs="re"),
 data=ped, family=poisson(), offset=offset)
gg_re(pam)
plot(pam, select = 2)

Plot 1D (smooth) effects

Description

Flexible, high-level plotting function for (non-linear) effects conditional on further covariate specifications and potentially relative to a comparison specification.

Usage

gg_slice(data, model, term, ..., reference = NULL, ci = TRUE)

Arguments

data

Data used to fit the model.

model

A suitable model object which will be used to estimate the partial effect of term.

term

A character string indicating the model term for which partial effects should be plotted.

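...

Covariate specifications (expressions) of the form z = f(z) that define the covariate combinations for which the effect is plotted (see make_newdata).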

reference

If specified, should be a list with covariate value pairs, e.g. list(x1 = 1, x2=50). The calculated partial effect will be relative to an observation specified in reference.

ci

Logical. Indicates if confidence intervals for the term of interest should be calculated/plotted. Defaults to TRUE.

Examples

ped <- tumor[1:200, ] %>% as_ped(Surv(days, status) ~ . )
model <- mgcv::gam(ped_status~s(tend) + s(age, by = complications), data=ped,
  family = poisson(), offset=offset)
make_newdata(ped, age = seq_range(age, 20), complications = levels(complications))
gg_slice(ped, model, "age", age=seq_range(age, 20), complications=levels(complications))
gg_slice(ped, model, "age", age=seq_range(age, 20), complications=levels(complications),
 ci = FALSE)
gg_slice(ped, model, "age", age=seq_range(age, 20), complications=levels(complications),
  reference=list(age = 50))

Plot smooth 1d terms of gam objects

Description

Given a gam model this convenience function returns a plot of its univariate smooth terms. If terms is not specified, all univariate smooths are plotted; otherwise only the requested ones (see get_terms for how terms are matched). Different smooths are faceted. Smooths that are indexed by a factor – a factor by-variable or a factor-smooth interaction (bs = "fs"/"sz") – are drawn in a single facet with one coloured/filled curve per factor level.

Usage

gg_smooth(x, ...)

## Default S3 method:
gg_smooth(x, fit, ...)

Arguments

x

A data frame or object of class ped.

...

Further arguments passed to get_terms (e.g. terms).

fit

A model object.

Value

A ggplot object.

Examples

g1 <- mgcv::gam(Sepal.Length ~ s(Sepal.Width) + s(Petal.Length), data=iris)
gg_smooth(iris, g1, terms=c("Sepal.Width", "Petal.Length"))
# all univariate smooths (terms omitted)
gg_smooth(iris, g1)
# factor-by smooth: one coloured curve per Species
g2 <- mgcv::gam(Sepal.Length ~ s(Sepal.Width, by = Species), data = iris)
gg_smooth(iris, g2, terms = "Sepal.Width")

Plot State Occupation Probabilities

Description

Creates a stacked area plot of state occupation probabilities over time, computed from transition probability matrices stored as an attribute of the input data. Optionally facets by a grouping variable.

Usage

gg_state_occupation(
  newdata,
  init_state,
  group_var = NULL,
  time_var = "tend",
  ncol = NULL
)

Arguments

newdata

A data frame with an attribute matrix containing a data frame with a column trans_prob_matrix. Each element of trans_prob_matrix should be a 3-dimensional array of dimensions n_states x n_states x n_timepoints.

init_state

A numeric vector specifying the initial state distribution. Should sum to 1 and have length equal to the number of states. For example, c(0, 1, 0, 0) places all subjects in state 2 at baseline.

group_var

A character string giving the name of the column in newdata to facet by (e.g., "treat"). If NULL (default), no faceting is applied.

time_var

A character string giving the name of the time variable in newdata. Defaults to "tend".

ncol

An integer specifying the number of columns in the facet wrap. If NULL (default), defaults to the number of unique groups.

Value

A ggplot object showing stacked-area state occupation probabilities over time, optionally faceted by group_var.
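A sketch of the underlying calculation, assuming the usual relation that the state occupation probabilities at time t are the row vector init_state multiplied by the transition probability matrix P(0, t). The array layout (n_states x n_states x n_timepoints) follows the newdata description above; the matrices themselves are made up.

```r
n_t <- 3
P <- array(0, dim = c(2, 2, n_t))
# Hypothetical 2-state model (1 = alive, 2 = dead, absorbing), hazard 0.1 per time unit
for (k in seq_len(n_t)) {
  a <- exp(-0.1 * k)
  P[, , k] <- matrix(c(a, 1 - a, 0, 1), nrow = 2, byrow = TRUE)
}
init_state <- c(1, 0)

# Occupation probabilities at each time point: init_state %*% P(t)
occ <- t(sapply(seq_len(n_t), function(k) drop(init_state %*% P[, , k])))
colnames(occ) <- c("alive", "dead")
occ
```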

Plot tensor product effects

Description

Given a gam model this convenience function returns a ggplot2 object depicting 2d smooth terms specified in the model as heat/contour plots. If more than one 2d smooth term is present individual terms are faceted.

Usage

gg_tensor(x, ci = FALSE, ...)

Arguments

x

a fitted gam object as produced by gam().

ci

A logical value indicating whether confidence intervals should be calculated and returned. Defaults to FALSE.

...

Further arguments passed to plot.gam

Examples

g <- mgcv::gam(Sepal.Length ~ te(Sepal.Width, Petal.Length), data=iris)
gg_tensor(g)
gg_tensor(g, ci=TRUE)
gg_tensor(update(g, .~. + te(Petal.Width, Petal.Length)))

Checks if data contains time-dependent covariates

Description

Checks if data contains time-dependent covariates

Usage

has_tdc(data, id_var)

Arguments

data

A data frame (potentially) containing time-dependent covariates.

id_var

A character string indicating the grouping variable. For each covariate, it is checked whether its values change within a group specified by id_var.

Value

Logical. TRUE if data contains time-dependent covariates, else FALSE.
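The check described above can be sketched with a hypothetical helper (has_tdc_sketch is not part of pammtools) that tests whether any covariate takes more than one value within an id group:

```r
# Hypothetical re-implementation of the check described above (not pammtools code)
has_tdc_sketch <- function(data, id_var) {
  covars <- setdiff(names(data), id_var)
  varies <- vapply(covars, function(v) {
    # TRUE if the covariate takes more than one value within any id group
    any(tapply(data[[v]], data[[id_var]], function(x) length(unique(x)) > 1))
  }, logical(1))
  any(varies)
}

d_fixed <- data.frame(id = c(1, 1, 2), x = c(5, 5, 7))
d_tdc   <- data.frame(id = c(1, 1, 2), x = c(5, 6, 7))
has_tdc_sketch(d_fixed, "id")
#> [1] FALSE
has_tdc_sketch(d_tdc, "id")
#> [1] TRUE
```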

Analytic hazard with confidence interval (coefficient models)

Description

Adds a hazard column and, for ci_type "default" or "delta", the columns se, ci_lower and ci_upper, using the linear-predictor triplet make_X/get_coefs/get_Vp. This is the analytic CI path (also used to evaluate reference hazard ratios and type = "link"). Simulation-based CIs instead use the get_hazard and sim_hazard primitives.

Usage

hazard_ci(
  object,
  newdata,
  reference = NULL,
  ci = TRUE,
  type = c("response", "link"),
  ci_type = c("default", "delta", "sim"),
  time_var = NULL,
  se_mult = 2,
  ...
)

Arguments

object

a fitted gam object as produced by gam().

newdata

A data frame (e.g., as created by make_newdata) for which the hazard is calculated.

reference

A data frame with number of rows equal to nrow(newdata) or one, or a named list with (partial) covariate specifications. See examples.

ci

logical. Indicates if confidence intervals should be calculated. Defaults to TRUE.

type

Either "response" or "link". The former calculates hazard, the latter the log-hazard.

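ci_type

Type of confidence interval. "default" and "delta" compute analytic intervals from the linear predictor (make_X/get_coefs/get_Vp); "sim" computes simulation-based intervals from posterior draws (see get_sim_ci).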

time_var

Name of the variable used for the baseline hazard. Defaults to "tend".

se_mult

Factor by which standard errors are multiplied for calculating the confidence intervals.

...

Further arguments passed to predict.gam and get_hazard
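A base-R sketch of the analytic path, assuming a link-scale (log-hazard) interval that is then exponentiated. The design matrix, coefficients and covariance below are made up and stand in for make_X, get_coefs and get_Vp; how "delta" differs from "default" is not shown.

```r
X <- cbind(1, c(0, 1, 2))     # hypothetical design matrix (intercept + covariate)
beta <- c(-2, 0.5)            # hypothetical coefficients
Vp <- diag(c(0.04, 0.01))     # hypothetical coefficient covariance
se_mult <- 2

# Linear predictor (log-hazard) and its standard error, sqrt(diag(X Vp X'))
eta <- drop(X %*% beta)
se <- sqrt(rowSums((X %*% Vp) * X))

# Interval on the link scale, transformed to the hazard scale
hazard <- exp(eta)
ci_lower <- exp(eta - se_mult * se)
ci_upper <- exp(eta + se_mult * se)
cbind(hazard, ci_lower, ci_upper)
```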

Build the subject-by-interval prediction grid used for IC imputation

Description

Constructs the (subjects × intervals) grid on the fixed cut-points and evaluates the lpmatrix of the fitted PAMM once, so that across imputations only the linear predictor (and hence the hazard) needs to be recomputed for a new coefficient draw. Rows are subject-major: the first n_int rows belong to subject 1, the next n_int to subject 2, etc., so that matrix(h, nrow = n_int) has one column per subject.

Usage

ic_pred_cache(object, ic, cut, cause_levels = NULL, cause_var = "cause")

Arguments

object

A fitted pamm model used as imputation model.

ic

A data frame as returned by parse_ic_surv (subject-level, with covariates and ic_L/ic_R/ic_kind).

cut

The fixed vector of interval cut-points (shared across imputations).

cause_levels

Optional character vector of competing-risk cause levels. When supplied, one lpmatrix per cause is built and returned in X_list.

cause_var

Name of the cause column expected by the model.

Value

A list with the interval information ii, n_int, n_sub, and either a single design matrix X or a list X_list (competing risks).
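The point of the cache is that the design matrix is built once and only its product with a new coefficient draw is recomputed. A sketch with a small made-up design matrix, showing the subject-major row order and the matrix(h, nrow = n_int) reshaping described above:

```r
n_int <- 3
n_sub <- 2
# Hypothetical cached design matrix, rows in subject-major order
X <- cbind(1,
           rep(c(0.5, 1.5, 2.5), times = n_sub),  # interval mid-points
           rep(c(0, 1), each = n_int))            # subject-level covariate

# Only the coefficient draw changes across imputations; X is reused
beta_draw <- c(-1, 0.2, 0.3)
h <- exp(drop(X %*% beta_draw))
H <- matrix(h, nrow = n_int)   # one column per subject
H
```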

Draw event times and causes for interval-censored competing-risks subjects

Description

Draws the event time from the all-cause conditional hazard within (L, R] (as in impute_ic_times) and assigns a cause. If the cause is observed, it is retained (the time is then drawn by a rejection step so that it follows the cause-specific conditional density); if the cause is unknown, it is sampled with probability $h_k(T)/h_\bullet(T)$ at the imputed time, where $h_\bullet(T) = \sum_k h_k(T)$ denotes the all-cause hazard, mirroring the cause assignment in sim_pexp and the CIF increment in get_cif.

Usage

impute_ic_cr(object, ic, cut, beta = NULL, cache = NULL, cause_known = NULL)

Arguments

object

A fitted pamm model used as imputation model.

ic

A data frame as returned by parse_ic_surv (subject-level, with covariates and ic_L/ic_R/ic_kind).

cut

The fixed vector of interval cut-points (shared across imputations).

beta

Coefficient vector to evaluate the hazard at. Defaults to coef(object); pass a posterior draw for proper multiple imputation.

cache

Optional pre-built cache from ic_pred_cache; avoids recomputing the (expensive) design matrix across imputations.

cause_known

Optional vector (length nrow(ic)) of observed causes (as levels of cache$cause_levels); NA marks unknown cause. Censored and exact rows are ignored.

Value

A list with numeric time and character cause (both length nrow(ic); cause is NA for censored rows).
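For subjects with an unknown cause, the assignment step amounts to a categorical draw with probabilities proportional to the cause-specific hazards at the imputed time. A base-R sketch with made-up hazard values (the rejection step for observed causes is not shown):

```r
set.seed(4)
# Hypothetical cause-specific hazards evaluated at one imputed time
h_causes <- c(cause1 = 0.03, cause2 = 0.01)

# Cause probabilities h_k(T) / sum over k of h_k(T)
p <- h_causes / sum(h_causes)

# One draw for a subject with unknown cause; many draws to check the frequencies
cause <- sample(names(h_causes), size = 1, prob = p)
draws <- sample(names(h_causes), size = 10000, replace = TRUE, prob = p)
mean(draws == "cause1")
```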

Draw event times for interval-censored subjects from the conditional hazard

Description

For a fitted PAMM with piecewise-constant hazard, draws $T_i \sim p(T \mid L_i < T \le R_i, x_i, \theta)$ by inverting the cumulative-hazard increment between $L_i$ and $R_i$. Exact and right-censored observations are returned unchanged (right-censored subjects are not imputed: they contribute correctly as censored at ic_L).

Usage

impute_ic_times(object, ic, cut, beta = NULL, cache = NULL)

Arguments

object

A fitted pamm model used as imputation model.

ic

A data frame as returned by parse_ic_surv (subject-level, with covariates and ic_L/ic_R/ic_kind).

cut

The fixed vector of interval cut-points (shared across imputations).

beta

Coefficient vector to evaluate the hazard at. Defaults to coef(object); pass a posterior draw for proper multiple imputation.

cache

Optional pre-built cache from ic_pred_cache; avoids recomputing the (expensive) design matrix across imputations.

Value

Numeric vector of (possibly imputed) event times, length nrow(ic).
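The inversion can be sketched for a single subject with a known piecewise-constant hazard. Writing H(t) for the cumulative hazard, the increment x = H(T) - H(L) conditional on L < T <= R has distribution function (1 - exp(-x)) / (1 - exp(-(H(R) - H(L)))). A uniform draw u is therefore mapped to H(T) = H(L) - log(1 - u * (1 - exp(-(H(R) - H(L))))), which is then mapped back to T. The helper names and hazard values below are hypothetical; this is not the pammtools implementation.

```r
# Cumulative hazard H(t) for hazards constant on (cut[j], cut[j + 1]]
cumhaz_pw <- function(t, cut, hazard) {
  vapply(t, function(ti) {
    sum(hazard * pmax(0, pmin(ti, cut[-1]) - cut[-length(cut)]))
  }, numeric(1))
}

# Inverse of H: find the interval reached, then move linearly within it
inv_cumhaz_pw <- function(target, cut, hazard) {
  H_end <- cumsum(hazard * diff(cut))
  j <- findInterval(target, c(0, H_end), left.open = TRUE, rightmost.closed = TRUE)
  H_start <- c(0, H_end)[j]
  cut[j] + (target - H_start) / hazard[j]
}

# Draw T from p(T | L < T <= R) by inverting the cumulative-hazard increment
draw_ic_time <- function(L, R, cut, hazard, u = runif(1)) {
  H_L <- cumhaz_pw(L, cut, hazard)
  delta <- cumhaz_pw(R, cut, hazard) - H_L
  target <- H_L - log1p(-u * (1 - exp(-delta)))
  inv_cumhaz_pw(target, cut, hazard)
}

cut <- c(0, 1, 3, 7)
hazard <- c(0.10, 0.20, 0.05)
set.seed(3)
t_imp <- replicate(1000, draw_ic_time(L = 0.5, R = 4, cut = cut, hazard = hazard))
range(t_imp)
```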

Create start/end times and interval information

Description

Given interval break points, returns a data frame with information on interval start time, interval end time, interval length and a factor variable indicating the interval (left-open intervals). If an object of class ped is provided, extracts unique interval information from the object.

Usage

int_info(x, ...)

## Default S3 method:
int_info(x, min_time = 0L, ...)

## S3 method for class 'data.frame'
int_info(x, min_time = 0L, ...)

## S3 method for class 'ped'
int_info(x, ...)

## S3 method for class 'pamm'
int_info(x, ...)

Arguments

x

A numeric vector of cut points into which the follow-up should be partitioned, or an object of class ped.

...

Currently ignored.

min_time

Only intervals that have lower borders larger than this value will be included in the resulting data frame.

Value

A data frame containing the start and end times of the intervals specified by the x argument. Additionally, it contains the interval length, the interval mid-point and a factor variable indicating the intervals.

Examples

## create interval information from cut points
int_info(c(1, 2.3, 5))

## extract interval information used to create ped object
tdf <- data.frame(time=c(1, 2.3, 5), status=c(0, 1, 0))
ped <- tdf %>% as_ped(Surv(time, status)~., id="id")
int_info(ped)
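A base-R sketch of the same bookkeeping for cut points that already include the origin. The column names tstart, tend, intlen and interval are used elsewhere in this manual, while intmid and the exact label format are assumptions:

```r
brks <- c(0, 1, 2.3, 5)

info <- data.frame(
  tstart = head(brks, -1),   # interval start times
  tend   = brks[-1]          # interval end times
)
info$intlen <- info$tend - info$tstart
info$intmid <- info$tstart + info$intlen / 2
# Left-open intervals (tstart, tend], kept in time order
labels <- paste0("(", info$tstart, ",", info$tend, "]")
info$interval <- factor(labels, levels = labels)
info
```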

Detect, parse and transform interval-censored survival data

Description

Interval-censored (IC) data record the event time of subject i only up to an interval (L_i, R_i]. pammtools handles such data via multiple imputation (MI): exact event times are repeatedly drawn from the model-based conditional distribution and the resulting (exact) data sets are transformed and re-fit using the standard right-censored PAMM pipeline (see pamm_ic). The functions documented here implement the preprocessing building blocks of that workflow.

Usage

detect_ic(formula, data)

parse_ic_surv(formula, data, id = "id")

resolve_ic_cut(ic, cut = NULL, max_time = NULL)

ic_event_data(ic, t_imp)

drop_zero_followup(evd, warn = TRUE)

as_ped_ic(data, formula, cut = NULL, max_time = NULL, id = "id", ...)

Arguments

formula

A two-sided formula whose left-hand side is an interval-censored Surv object.

data

A data frame containing the variables referenced in formula.

id

Name of the subject identifier column. If it does not exist in data it is created as a row index.

ic

A data frame as returned by parse_ic_surv.

cut

Optional numeric vector of interval cut-points. If NULL (default) the finite interval endpoints are used.

max_time

Optional numeric scalar; cut-points are capped at this value.

t_imp

Numeric vector of imputed event times (length nrow(ic)). Ignored for exact and right-censored rows.

evd

A data frame with a .ped_time column (output of ic_event_data).

warn

Logical; emit a one-time warning when rows are dropped.

...

Details

IC data are specified through the standard survival interface, i.e. a three-argument response of the form Surv(L, R, type = "interval2"). The four observation types are encoded as usual:

exact: $L = R$ (known event time).
right-censored: $R = \infty$ (event after $L$).
left-censored: $L = 0$ (event in $(0, R]$).
interval-censored: $0 < L < R < \infty$ (event in $(L, R]$).

Functions

detect_ic(): Detect whether formula specifies interval-censored data. Returns "interval2" for interval-censored responses and "none" otherwise (right-censored and left-truncated counting-process responses both return "none" and are handled by the standard pipeline).
parse_ic_surv(): Parse the interval-censored response into a tibble of lower/upper bounds and observation type, augmenting data with the columns ic_L, ic_R and ic_kind (a factor with levels exact, right, left, interval) and, if absent, an id column.
resolve_ic_cut(): Resolve a fixed vector of interval cut-points for the IC transformation. When cut is supplied it is sorted and de-duplicated; otherwise the unique finite interval endpoints (the inspection times) are used, capped at max_time. The resolved cut must be shared across all imputations so that the PED interval structure is consistent across refits. Note that mgcv's centering constraints can still make the lpmatrix differ by fit.
ic_event_data(): Build the subject-level data frame of (exact) event times implied by an imputation. Exact observations keep their event time; right-censored observations are censored at ic_L; left- and interval-censored observations take the imputed time t_imp. Returns a data frame with the response columns .ped_time and .ped_status ready for split_data via a two-argument Surv.
drop_zero_followup(): Drop subjects with non-positive follow-up time (e.g., right-censored at time 0 with no observed inspection), which carry no information and would break the interval split. Returns the filtered data.
as_ped_ic(): Transform interval-censored data into an initial (midpoint-imputed) PED object. Left- and interval-censored event times are initialised at the interval midpoint ((L+R)/2, and R/2 for left-censored observations); this object is only an initialiser for pamm_ic and should not be used for inference on its own. The parsed interval bounds and the resolved cut-points are attached as the "ic" and "breaks" attributes.
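The classification of observation types and the midpoint initialisation used by as_ped_ic can be sketched in base R on made-up bounds. This mirrors the rules listed above but is not the package code:

```r
# Hypothetical interval bounds: exact, right-, left- and interval-censored
L <- c(2, 3, 0, 1)
R <- c(2, Inf, 4, 5)

ic_kind <- factor(
  ifelse(L == R, "exact",
    ifelse(is.infinite(R), "right",
      ifelse(L == 0, "left", "interval"))),
  levels = c("exact", "right", "left", "interval")
)

# Midpoint initialisation: exact keeps L (= R), right-censored is censored at L,
# left-/interval-censored use (L + R) / 2 (which equals R / 2 when L = 0)
t_init <- ifelse(ic_kind %in% c("left", "interval"), (L + R) / 2, L)
data.frame(L, R, ic_kind, t_init)
```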

Create design matrix from a suitable object

Description

Create design matrix from a suitable object

Usage

make_X(object, ...)

## Default S3 method:
make_X(object, newdata, ...)

## S3 method for class 'gam'
make_X(object, newdata, ...)

Arguments

object

A suitable object from which a design matrix can be generated. Often a model object.

newdata

A data frame from which design matrix will be constructed

Create design matrix from a suitable object

Description

Create design matrix from a suitable object

Usage

## S3 method for class 'scam'
make_X(object, newdata, ...)

Arguments

object

A suitable object from which a design matrix can be generated. Often a model object.

newdata

A data frame from which design matrix will be constructed

Construct a data frame suitable for prediction

Description

This function provides a flexible interface to create a data set that can be plugged in as newdata argument to a suitable predict function (or similar). The function is particularly useful in combination with one of the add_* functions, e.g., add_term, add_hazard, etc.

Usage

make_newdata(x, ...)

## Default S3 method:
make_newdata(x, ...)

## S3 method for class 'ped'
make_newdata(x, ...)

## S3 method for class 'fped'
make_newdata(x, ...)

Arguments

x

A data frame (or object that inherits from data.frame).

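...

Covariate specifications (expressions) that will be evaluated by looking for variables in x. Must be of the form z = f(z), where z is a variable in the data set and f a known function that can be usefully applied to z. Note that this is also necessary for single value specifications (e.g. age = c(50)). For data in PED format, one can also specify the time argument (see "Details").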

Details

Depending on the type of variables in x, mean or modus values will be used for variables not specified in ellipsis (see also sample_info). If x is an object that inherits from class ped, useful data set completion will be attempted depending on variables specified in ellipsis. This is especially useful when creating a data set with different time points, e.g. to calculate survival probabilities over time (add_surv_prob) or to calculate time-varying covariate effects (add_term). To do so, the time variable has to be specified in ..., e.g., tend = seq_range(tend, 20). The problem with this specification is that not all values produced by seq_range(tend, 20) will be actual values of tend used at the stage of estimation (and in general, it will often be tedious to specify exact tend values). make_newdata therefore finds the correct interval and sets tend to the respective interval endpoint. For example, if the intervals of the PED object are (0,1], (1,2], then tend = 1.5 will be set to 2.

The returned data frame contains tend, id, the user-supplied covariates (and cause/transition for competing risks / multi-state models). Internal PED columns tstart, intlen, interval, offset, and ped_status are dropped. Downstream add_* functions reconstruct intlen on demand via reconstruct_intlen() when needed. See examples below.
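The snapping of tend to interval end points can be sketched with findInterval. This illustrates the rule (e.g. 1.5 becomes 2 for intervals (0,1], (1,2]) rather than reproducing make_newdata:

```r
brks <- c(0, 1, 2)                 # intervals (0,1] and (1,2]
requested <- c(0.4, 1, 1.5, 2)     # hypothetical user-supplied tend values

# Find the left-open interval containing each value and use its end point
idx <- findInterval(requested, brks, left.open = TRUE, rightmost.closed = TRUE)
tend <- brks[idx + 1]
tend
#> [1] 1 1 2 2
```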

Examples

# General functionality
tumor %>% make_newdata()
tumor %>% make_newdata(age=c(50))
tumor %>% make_newdata(days=seq_range(days, 3), age=c(50, 55))
tumor %>% make_newdata(days=seq_range(days, 3), status=unique(status), age=c(50, 55))
# mean/modus values of unspecified variables are calculated over whole data
tumor %>% make_newdata(sex=unique(sex))
tumor %>% group_by(sex) %>% make_newdata()

# Examples for PED data
ped <- tumor %>% slice(1:3) %>% as_ped(Surv(days, status)~., cut = c(0, 500, 1000))
ped %>% make_newdata(age=c(50, 55))

# if time information is specified, the data set is completed accordingly
# (tend is matched to the respective interval end points)
ped %>% make_newdata(tend = c(1000), age = c(50, 55))
ped %>% make_newdata(tend = unique(tend))
ped %>% group_by(sex) %>% make_newdata(tend = unique(tend))

# tend is set to the end point of respective interval:
ped <- tumor %>% as_ped(Surv(days, status)~.)
seq_range(ped$tend, 3)
make_newdata(ped, tend = seq_range(tend, 3))

Create matrix components for cumulative effects

Description

These functions are called internally by get_cumulative and should usually not be called directly.

Usage

make_time_mat(data, nz)

make_latency_mat(data, tz)

make_lag_lead_mat(data, tz, ll_fun = function(t, tz) t >= tz)

make_z_mat(data, z_var, nz, ...)

Arguments

data

A data set (or similar) from which meta information on cut-points, interval-specific time, covariates etc. can be obtained.

z_var

The variable that should be transformed into a functional covariate format suitable for fitting cumulative effects in mgcv::gam.
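To illustrate what such matrix components can look like, the sketch below evaluates the default lag-lead window function(t, tz) t >= tz and the latency t - tz on every combination of follow-up time t (rows) and exposure time tz (columns). It is a hypothetical illustration with made-up time grids; the exact layout produced by make_lag_lead_mat and make_latency_mat may differ.

```r
# Follow-up times t (rows) and exposure times tz (columns); made-up values
t  <- c(1, 2, 3, 4)
tz <- c(0, 1, 2)

# Default lag-lead window of make_lag_lead_mat(): an exposure at tz can only
# contribute to the hazard at t if it occurred at or before t
ll_fun <- function(t, tz) t >= tz

# 1 = (t, tz) pair inside the window, 0 = outside
lag_lead <- outer(t, tz, ll_fun) * 1L

# Latency: time elapsed since exposure for every (t, tz) pair
latency <- outer(t, tz, "-")

lag_lead
latency
```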

Calculate the modus

Description

Calculate the modus

Usage

modus(var)

Arguments

var

An atomic vector.

Create nested data frame from data with time-dependent covariates

Description

Provides methods to nest data with time-dependent covariates (TDCs). A formula must be provided where the right-hand side (RHS) contains the structure of the TDCs.

Usage

nest_tdc(data, formula, ...)

## Default S3 method:
nest_tdc(data, formula, ...)

## S3 method for class 'list'
nest_tdc(data, formula, ...)

Arguments

data

A suitable data structure (e.g. unnested data frame with concurrent TDCs or a list where each element is a data frame, potentially containing TDCs as specified in the RHS of formula). Only TDCs present in formula will be returned.

formula

A two-sided formula with a two-part RHS, where the second part indicates the structure of the TDCs.

...

Further arguments passed to methods.

Time until nuclear power plant construction in different regions.

Description

This data set originates from the IAEA and contains 730 power plants. The data contain the following variables:

months: Construction time
status: Event indicator (0 = censored, 1 = construction finished).
region: Continent (Africa/Asia, America, Europe, Soviet Union and Warsaw Pact).

Usage

nuclear

Format

An object of class data.frame with 724 rows and 3 columns.

Fit a piece-wise exponential additive model

Description

A thin wrapper around gam; however, some arguments are prespecified: family=poisson() and offset=data$offset. These two cannot be overwritten. In many cases it will also be advisable to set method="REML".
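The prespecified family=poisson() and offset reflect that the piece-wise exponential likelihood is proportional to a Poisson likelihood with the log time at risk as offset. A minimal base-R sketch, assuming a single interval (constant hazard) and simulated data: the Poisson GLM then reproduces the exponential maximum likelihood estimate, i.e., the number of events divided by the total time at risk.

```r
set.seed(1)
n <- 200
time   <- rexp(n, rate = 0.5)
status <- as.integer(time <= 3)   # administrative censoring at t = 3
time   <- pmin(time, 3)

# One interval covering the whole follow-up: one row per subject, Poisson
# response = event indicator, offset = log(time at risk)
fit <- glm(status ~ 1, family = poisson(), offset = log(time))

# exp(intercept) is the estimated constant hazard; it equals events / exposure
c(glm = exp(unname(coef(fit))), mle = sum(status) / sum(time))
```

With several intervals, the data are split first (one row per subject and interval) and the intercept is replaced by interval effects or a smooth of tend, which is what pamm fits via gam.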

Usage

pamm(formula, data = list(), ..., trafo_args = NULL, engine = "gam")

is.pamm(x)

## S3 method for class 'pamm'
print(x, ...)

## S3 method for class 'pamm'
summary(object, ...)

## S3 method for class 'pamm'
plot(x, ...)

Arguments

formula

A GAM formula, or a list of formulae (see formula.gam and also gam.models). These are exactly like the formula for a GLM except that smooth terms, s, te, ti and t2, can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these).

data

A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which gam is called.

...

Further arguments passed to engine.

trafo_args

Deprecated. A named list passed to as_ped for inline data transformation. Convert your data with as_ped() before calling pamm() instead.

engine

Character name of the function that will be called to fit the model. The intended entries are "gam" or "bam" (both from package mgcv) or "scam" (from package scam, for shape-constrained PAMMs, e.g. monotone baseline hazards).

x

Any R object.

object

An object of class pamm as returned by pamm.

Examples

ped <- tumor[1:100, ] %>%
 as_ped(Surv(days, status) ~ complications, cut = seq(0, 3000, by = 50))
pam <- pamm(ped_status ~ s(tend) + complications, data = ped)
summary(pam)
## trafo_args (inline transformation) is deprecated; transform with as_ped() first:
# ped2 <- as_ped(tumor[1:100, ], Surv(days, status) ~ complications)
# pamm(ped_status ~ s(tend) + complications, data = ped2)

Fit a competing-risks PAMM to interval-censored data via multiple imputation

Description

Competing-risks extension of pamm_ic. The event time is drawn from the all-cause conditional hazard within (L, R] and a cause is assigned: observed causes are retained (with the time drawn so that it follows the cause-specific conditional density, via rejection), unknown causes are sampled with probability proportional to the cause-specific hazards at the imputed time (see impute_ic_cr). Each completed data set is transformed with as_ped_cr (cause-specific hazards) and re-fit. Cf. Delord & Genin (2016) for MI of interval-censored competing-risks data.
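The cause-assignment step for events with unknown cause can be sketched as a weighted draw. This minimal sketch assumes the cause-specific hazards at the imputed time are already available (the values below are hypothetical) and omits the rejection step used for observed causes.

```r
# Hypothetical cause-specific hazards at one imputed event time
haz <- c(relapse = 0.02, death = 0.005)

# Unknown cause: draw with probability proportional to the hazards;
# sample() normalises prob internally, so haz need not sum to 1
set.seed(42)
draws <- sample(names(haz), size = 10000, replace = TRUE, prob = haz)

# Relative frequencies should be close to haz / sum(haz), i.e. 0.8 and 0.2
prop.table(table(draws))
```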

Usage

pamm_ic_cr(
  formula,
  data,
  cause,
  model_formula = NULL,
  cut = NULL,
  max_time = NULL,
  m = 10L,
  iter = 1L,
  censor_code = 0L,
  id = "id",
  engine = "gam",
  ...
)

Arguments

formula

A two-sided formula whose left-hand side is an interval-censored response Surv(L, R, type = "interval2") and whose right-hand side lists the covariates to retain (as in as_ped).

data

A data frame in standard (one row per subject) format.

cause

Name of the column in data giving the observed cause for events (any factor/character coding). Rows with the censoring code are treated as right-censored; NA marks an event with unknown cause.

model_formula

Optional model formula passed to pamm (e.g. ped_status ~ s(tend) + x). If NULL, a default ped_status ~ s(tend) + <covariates> formula is constructed.

cut

Optional fixed vector of interval cut-points shared across all imputations. If NULL, the finite interval endpoints are used.

max_time

Optional cap on the cut-points.

m

Number of imputations (default 10).

iter

Number of impute-refit iterations per imputation chain (default 1 = classic one-step MI: all m imputations are drawn from the single initialiser fit). For iter = k > 1, each chain alternates imputation and re-fitting on its own completed data set k times, so later imputations are drawn from fits whose dependence on the midpoint initialiser is progressively attenuated. This sequential ("chained") MI scheme reduces initialiser bias under sparse inspection at roughly iter-fold fitting cost. Simulation evidence (see the package's interval-censoring benchmark): with inspection gaps that are small relative to the time scale, iter = 1 is unbiased; with wide gaps (mean gap of order 1/3 of the follow-up), early-time survival estimates from iter = 1 are biased upward and iter = 3 removes most of that bias (iter = 5 essentially all of it), with bias shrinking roughly geometrically in iter. Caveat: with flexible time-varying effect terms and small samples, iterating can occasionally amplify a weakly identified imputation chain into divergent estimates with very wide intervals (without mgcv warnings); inspect pooled smooth effects for plausibility when iterating such models.

censor_code

Value of cause that encodes censoring (default 0).

id

Name of the subject identifier column.

engine

Estimation engine passed to pamm ("gam" or "bam").

...

Further arguments passed to pamm / mgcv.

Value

An object of class pamm_ic with type = "cr"; fits are cause-specific (stacked ped_cr) pamm objects and cause_levels records the competing causes.

Pooling of multiple-imputation PAMM fits

Description

Inference for interval-censored PAMMs (pamm_ic) pools the m re-fits by drawing from each fit's empirical-Bayes posterior N(\hat\beta^{(m)}, V_\beta^{(m)}) and propagating every draw through the quantity of interest using that fit's own design matrix, then taking empirical quantiles of the combined draws. Because mgcv's identifiability constraints make the (centered) spline basis depend on each imputed data set, the design matrix is not shared across fits, so each fit must be evaluated with its own lpmatrix. Before empirical quantiles are taken, the per-fit prediction draws are shifted on the quantity-of-interest scale so their between-imputation component has Rubin's finite-m variance (1 + 1/M)B rather than the raw mixture variance (M - 1)B/M. Point estimates are the average of the per-fit point estimates (the MI estimate).

Details

These methods are dispatched automatically by add_hazard, add_cumu_hazard, add_surv_prob and add_cif when given a pamm_ic object.
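The finite-m adjustment can be illustrated for a scalar quantity. In this sketch (hypothetical estimates, normal within-imputation draws), each fit's centre is stretched away from the MI average by k = sqrt((M + 1)/(M - 1)), which turns the mixture variance (M - 1)B/M of the centres into (1 + 1/M)B. It shows the idea only and does not reproduce the lpmatrix-based pooling of the actual methods.

```r
set.seed(2024)
M    <- 10
nsim <- 1000
# Hypothetical per-imputation point estimates and within-imputation SDs
q_hat <- rnorm(M, mean = 1, sd = 0.2)
w_sd  <- rep(0.1, M)
q_bar <- mean(q_hat)              # MI point estimate
B     <- var(q_hat)               # between-imputation variance (denominator M - 1)

# Stretch factor: k^2 * (M - 1) B / M = (1 + 1/M) B
k <- sqrt((M + 1) / (M - 1))
centres <- q_bar + k * (q_hat - q_bar)

# Draws from each fit's approximate posterior, shifted onto the new centre
draws <- unlist(lapply(seq_len(M), function(m) {
  rnorm(nsim, mean = q_hat[m], sd = w_sd[m]) + (centres[m] - q_hat[m])
}))

# Point estimate and empirical 95% interval of the combined draws
c(estimate = q_bar, quantile(draws, c(0.025, 0.975)))
```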

Parse the factor level from a factor-by smooth label

Description

Fallback for the rare case where by.level is unavailable. A label such as "s(tend):metastasesyes" encodes the level ("yes") as the suffix of the by-variable name ("metastases").

Usage

parse_by_level(label, by, lvls)

Survival data of critically ill ICU patients

Description

A data set containing the survival time (or hospital release time) among other covariates. The full data is available here. The following variables are provided:

Year: The year of ICU Admission
CombinedicuID: Intensive Care Unit (ICU) ID
CombinedID: Patient identifier
Survdays: Survival time of patients. Here it is assumed that patients survive until t=30 if released from hospital.
PatientDied: Status indicator; 1=death, 0=censoring
survhosp: Survival time in hospital. Here it is assumed that patients are censored at time of hospital release (potentially informative)
Gender: Male or female
Age: The patient's age at admission
AdmCatID: Admission category: medical, surgical elective or surgical emergency
ApacheIIScore: The patient's Apache II Score at Admission
BMI: Patient's Body Mass Index
DiagID2: Diagnosis at admission in 9 categories

Usage

patient

Format

An object of class data.frame with 2000 rows and 12 columns.

Extract interval information and median/modus values for covariates

Description

Given an object of class ped, returns a data frame with one row for each interval containing interval information, mean values for numerical variables and modus for non-numeric variables in the data set.

Usage

ped_info(ped)

## S3 method for class 'ped'
ped_info(ped)

Arguments

ped

An object of class ped as returned by as_ped.

Value

A data frame with one row for each unique interval in ped.

Examples

ped <- tumor[1:4,] %>% as_ped(Surv(days, status)~ sex + age)
ped_info(ped)

Pool a list of (stripped) imputation fits into a pooled summary object

Description

Combines the m imputation fits with Rubin's rules: the pooled parametric coefficients use \bar Q and V = \bar W + (1 + 1/m) B. Smooth-term p-values are pooled with the median-p rule (see references in strip_pamm_fit). Because mgcv smooth coefficients can use different centered bases across imputations, this returns a plain pooled summary object rather than a gam: add_*() methods evaluate every imputation fit with its own design matrix for predictions.
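Rubin's rules for the parametric coefficients amount to a few matrix operations. The sketch below uses hypothetical estimates and covariance matrices from m = 3 fits; B is the between-imputation covariance with denominator m - 1.

```r
# Hypothetical estimates of two coefficients from m = 3 imputation fits
Q <- rbind(c(0.50, -1.20),
           c(0.55, -1.10),
           c(0.45, -1.25))                     # one row per imputation
U <- list(diag(c(0.010, 0.020)),
          diag(c(0.012, 0.018)),
          diag(c(0.011, 0.021)))               # per-fit covariance matrices
m <- nrow(Q)

Q_bar <- colMeans(Q)                           # pooled point estimates
W_bar <- Reduce(`+`, U) / m                    # mean within-imputation covariance
B     <- cov(Q)                                # between-imputation covariance
V     <- W_bar + (1 + 1 / m) * B               # total covariance

cbind(estimate = Q_bar, se = sqrt(diag(V)))
```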

Usage

pool_pamm_fits(fits, smry, skeleton = NULL)

Arguments

fits

List of stripped imputation fits.

smry

List of summary.gam objects, one per fit (computed before stripping).

skeleton

Optional full (unstripped) fit used only to supply a common training-grid model frame for smooth-term FMI summaries.

S3 method for pamm objects for compatibility with package pec

Description

S3 method for pamm objects for compatibility with package pec

Usage

## S3 method for class 'pamm'
predictSurvProb(object, newdata, times, ...)

Arguments

object

A fitted model from which to extract predicted survival probabilities

newdata

A data frame containing predictor variable combinations for which to compute predicted survival probabilities.

times

A vector of times in the range of the response variable, e.g. times when the response is a survival object, at which to return the survival probabilities.

...

Additional arguments that are passed on to the current method.

Extract information on concurrent effects

Description

Extract information on concurrent effects

Usage

prep_concurrent(x, formula, ...)

## S3 method for class 'list'
prep_concurrent(x, formula, ...)

Arguments

x

A suitable object from which variables contained in formula can be extracted.

...

Further arguments passed to methods.

Fit a PAMM to interval-censored data via multiple imputation

Description

Fits a piecewise exponential additive (mixed) model to interval-censored time-to-event data using a multiple-imputation (MI) and re-fit strategy: exact event times are repeatedly drawn from the model-based conditional distribution p(T \mid L < T \le R, x, \theta) (see impute_ic_times), with \theta drawn from the imputation model's asymptotic posterior before each imputation ("proper" MI – this is what makes the pooled intervals calibrated), each completed data set is transformed to PED format with the standard (right-censored) pipeline and re-fit, and the resulting fits are pooled for inference with the existing add_* family (see add_surv_prob and the pamm_ic methods).
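The imputation step can be illustrated by inverse-transform sampling from a piece-wise constant hazard restricted to (L, R]. This is a minimal sketch under simplifying assumptions: the hazard (cut points and rates) is taken as known and strictly positive, and the posterior draw of theta that makes the MI "proper" is not shown. The helpers cumhaz_pc, inv_cumhaz_pc and draw_ic are hypothetical and not the impute_ic_times implementation.

```r
# Cumulative hazard H(t) of a piece-wise constant hazard: cuts starts at 0,
# rate[j] applies from cuts[j] onwards (the last rate extends to infinity)
cumhaz_pc <- function(t, cuts, rate) {
  upper <- c(cuts[-1], Inf)
  vapply(t, function(ti) sum(rate * pmax(0, pmin(ti, upper) - cuts)),
         numeric(1))
}

# Inverse of H(t), assuming all rates are strictly positive
inv_cumhaz_pc <- function(h, cuts, rate) {
  H_cut <- cumhaz_pc(cuts, cuts, rate)   # H at the cut points
  j <- findInterval(h, H_cut)            # piece in which level h is reached
  cuts[j] + (h - H_cut[j]) / rate[j]
}

# Draw T from p(T | L < T <= R): S(T) is uniform between S(R) and S(L)
draw_ic <- function(L, R, cuts, rate) {
  S_L <- exp(-cumhaz_pc(L, cuts, rate))
  S_R <- exp(-cumhaz_pc(R, cuts, rate))  # R = Inf gives S_R = 0
  u <- runif(length(L))
  inv_cumhaz_pc(-log(S_L - u * (S_L - S_R)), cuts, rate)
}

set.seed(7)
cuts <- c(0, 1, 2)
rate <- c(0.2, 0.5, 1)
t_imp <- draw_ic(L = rep(0.5, 1000), R = rep(2.5, 1000), cuts, rate)
range(t_imp)   # all draws lie inside (0.5, 2.5)
```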

Usage

## S3 method for class 'pamm_ic'
print(x, ...)

## S3 method for class 'pamm_ic'
summary(object, ...)

## S3 method for class 'summary.pamm_ic'
print(x, ...)

pamm_ic(
  formula,
  data,
  model_formula = NULL,
  cut = NULL,
  max_time = NULL,
  m = 10L,
  iter = 1L,
  init = c("midpoint", "uniform"),
  id = "id",
  engine = "gam",
  ...
)

Arguments

x, object

A pamm_ic object.

...

Further arguments passed to pamm / mgcv.

formula

A two-sided formula whose left-hand side is an interval-censored response Surv(L, R, type = "interval2") and whose right-hand side lists the covariates to retain (as in as_ped).

data

A data frame in standard (one row per subject) format.

model_formula

Optional model formula passed to pamm (e.g. ped_status ~ s(tend) + x). If NULL, a default ped_status ~ s(tend) + <covariates> formula is constructed.

cut

Optional fixed vector of interval cut-points shared across all imputations. If NULL, the finite interval endpoints are used.

max_time

Optional cap on the cut-points.

m

Number of imputations (default 10).

iter

init

Initialiser for the first fit: "midpoint" (default) or "uniform" imputation within each interval.

id

Name of the subject identifier column.

engine

Estimation engine passed to pamm ("gam" or "bam").

Details

An imputed event time is an exact event time, so once imputation has produced it, the entire downstream pipeline (split_data -> pamm -> add_*) is reused unchanged. The interval cut-points are resolved once and shared across all imputations, but mgcv's smooth bases and centering constraints can still differ across completed data sets. Pooled predictions therefore evaluate each fitted imputation model with its own design matrix; object$pooled is a summary container, not a gam-like model for direct predict() or plot() calls.

Value

An object of class pamm_ic: a list with

fits: the m imputation fits, each slimmed (via strip_pamm_fit) to drop per-observation slots so memory does not scale with the number of imputations; they still support coef, vcov and predict(type = "lpmatrix"), which is all the pooled add_* methods need.
pooled: a pooled summary container with Rubin-pooled parametric coefficients and covariance, pooled parametric/smooth tables with median-p values ($p.table, $s.table), parametric coefficient FMI diagnostics ($fmi.table) and smooth-term FMI five-number summaries over the training grid ($smooth.fmi).
init_fit: the (slimmed) initialiser/imputation model.
unstable_chains: indices of imputation chains flagged as numerically unstable (extreme coefficients or coefficient SEs on the log-hazard scale; also raised as a warning). Degenerate chains can arise – silently, without mgcv warnings – when iterating flexible time-varying models on small samples.
others: the parsed bounds ic, the shared cut, and metadata.

print/summary report the pooled summary; add_* compute pooled quantities of interest from fits.

Ensure all breakpoints are present in newdata for cumulative calculations

Description

Checks whether all cut points up to the maximum observed time are present in newdata. If not, expands the data frame to include the missing breakpoints via expand_df. In either case the function guarantees that an interval-length column (default intlen) exists on return. Existing grouping is preserved after expansion.

Usage

reconstruct_cutpoints(newdata, object, time_var, interval_length)

Arguments

newdata

A data frame with a time column and, optionally, grouping. Must carry the trafo_args and intvars attributes set by as_ped.

object

A fitted PAM/PAMM model object, passed to expand_df.

time_var

Character name of the time variable (e.g. "tend").

interval_length

Character name of the interval-length column. If absent from newdata it is created via reconstruct_intlen.

Value

A data frame with all required breakpoints present and an interval_length column guaranteed to exist.

Reconstruct intlen from time variable and stored cut points

Description

Computes interval lengths from the sorted unique values of the time variable in newdata. This is used by add_* functions that need intlen for cumulative calculations. If tstart is not available, the first interval length is taken as the first sorted time value (implicitly assuming a 0-origin time scale).
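The rule described above can be sketched in base R; intlen_sketch is a hypothetical helper that assumes no tstart column is available and a 0-origin time scale.

```r
# Interval lengths from the sorted unique values of the time variable,
# with the first interval starting at 0
intlen_sketch <- function(tend) {
  t_unique <- sort(unique(tend))
  len <- diff(c(0, t_unique))
  # map the lengths back to the (possibly repeated, unsorted) input values
  len[match(tend, t_unique)]
}

intlen_sketch(c(2, 1, 5, 2, 10))
# returns 1 1 3 1 5
```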

Usage

reconstruct_intlen(newdata, time_var = "tend", interval_length = "intlen")

Arguments

newdata

A data frame with a time column (default tend).

time_var

Character name of the time variable. Defaults to "tend".

interval_length

Character name of the interval-length column to create.

Value

The input data frame with an intlen column added.

Resolve requested terms against the model's plottable smooths

Description

Resolve requested terms against the model's plottable smooths

Usage

resolve_terms(smooth_tbl, terms)

Arguments

smooth_tbl

Output of get_smooth_terms.

terms

A character vector of requested terms, or NULL for all.

Draw random numbers from piece-wise exponential distribution.

Description

This is a copy of the function rpexp from package msm, copied here to reduce dependencies.

Usage

rpexp(n = 1, rate = 1, t = 0)

Arguments

n

number of observations. If length(n) > 1, the length is taken to be the number required.

rate

vector of rates.

t

vector of the same length as rate, giving the times at which the rate changes. The values of t should be in increasing order.
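A piece-wise exponential draw can be obtained by inverting the cumulative hazard at a unit-rate exponential variate. The sketch below (hypothetical rpexp_sketch) uses this inversion approach, which is not necessarily how msm's rpexp is implemented; it assumes that t starts at 0 and that all rates are positive.

```r
rpexp_sketch <- function(n = 1, rate = 1, t = 0) {
  # assumptions: t starts at 0, is increasing, and all rates are positive
  stopifnot(length(rate) == length(t), t[1] == 0, !is.unsorted(t), all(rate > 0))
  if (length(n) > 1) n <- length(n)        # as in rpexp: use the length of n
  e <- rexp(n)                             # unit-rate exponential variates
  # cumulative hazard at the change points t
  H_t <- c(0, cumsum(rate[-length(rate)] * diff(t)))
  j <- findInterval(e, H_t)                # piece in which H(T) = e is reached
  t[j] + (e - H_t[j]) / rate[j]
}

set.seed(5)
x1 <- rpexp_sketch(1e5, rate = 2)                      # plain exponential
mean(x1)                                               # close to 1 / 2
x2 <- rpexp_sketch(1e5, rate = c(1, 3), t = c(0, 1))   # rate changes at t = 1
mean(x2 > 1)                                           # close to exp(-1)
```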

Draw coefficients from their approximate posterior distribution

Description

Simulation based confidence intervals are calculated by drawing coefficient vectors from their asymptotic (posterior) distribution, a multivariate normal with mean get_coefs and covariance get_Vp. For scam models this means that draws are obtained on the scale of the re-parametrized (partially exponentiated) coefficients, i.e., based on the same normal approximation that underlies the reported standard errors of the model (the exact posterior of the constrained coefficients is not Gaussian, so individual draws may violate the shape constraints slightly).
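Drawing from a multivariate normal only requires a Cholesky factor of the covariance. A base-R sketch, assuming the coefficient vector and its covariance (playing the roles of get_coefs and get_Vp) are supplied directly; sample_coefs_sketch is hypothetical, and mvtnorm::rmvnorm would be an equivalent alternative.

```r
sample_coefs_sketch <- function(beta_hat, Vp, nsim) {
  p <- length(beta_hat)
  R <- chol(Vp)                                  # upper triangular, Vp = t(R) %*% R
  z <- matrix(rnorm(nsim * p), nrow = nsim)      # iid standard normal draws
  # rows of z %*% R have covariance Vp; add the mean to every row
  sweep(z %*% R, 2, beta_hat, `+`)
}

set.seed(11)
beta_hat <- c(1, -2)
Vp <- matrix(c(0.04, 0.01, 0.01, 0.09), nrow = 2)
draws <- sample_coefs_sketch(beta_hat, Vp, nsim = 1e5)
colMeans(draws)   # close to beta_hat
cov(draws)        # close to Vp
```

As in sample_coefs, the result has one coefficient vector per row.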

Usage

sample_coefs(object, nsim, ...)

## Default S3 method:
sample_coefs(object, nsim, ...)

Arguments

object

A fitted model object.

nsim

Number of draws.

...

Further arguments passed to methods.

Value

A matrix with nsim rows, one coefficient vector per row, on the scale of the design matrix returned by make_X.

Extract information of the sample contained in a data set

Description

Given a data set and grouping variables, this function returns mean values for numeric variables and modus for characters and factors. Usually this function should not be called directly but will rather be called as part of a call to make_newdata.

Usage

sample_info(x)

## S3 method for class 'data.frame'
sample_info(x)

## S3 method for class 'ped'
sample_info(x)

## S3 method for class 'fped'
sample_info(x)

Arguments

x

A data frame (or object that inherits from data.frame).

Value

A data frame containing sample information (for each group). If applied to an object of class ped, the sample means of the original data are returned. Note: when applied to a ped object that does not contain covariates (only interval information), a data frame with 0 columns is returned.
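The summary rule (mean for numeric variables, most frequent value otherwise) can be sketched for an ungrouped data frame. sample_info_sketch is hypothetical; grouping and the ped-specific handling are omitted, and ties for the most frequent value are broken by table order.

```r
sample_info_sketch <- function(data) {
  summarise_col <- function(x) {
    if (is.numeric(x)) return(mean(x, na.rm = TRUE))
    tab <- table(x)
    val <- names(tab)[which.max(tab)]       # most frequent value (modus)
    if (is.factor(x)) factor(val, levels = levels(x)) else val
  }
  as.data.frame(lapply(data, summarise_col), stringsAsFactors = FALSE)
}

dat <- data.frame(age = c(40, 50, 60),
                  sex = factor(c("male", "female", "female")))
sample_info_sketch(dat)   # age = 50, sex = female
```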

Generate a sequence over the range of a vector

Description

Adapted from seq_range in package modelr.

Usage

seq_range(x, n, by, trim = NULL, expand = NULL, pretty = FALSE)

Arguments

x

A numeric vector

n, by

Specify the output sequence either by supplying the length of the sequence with n, or the spacing between values with by. Specifying both is an error.

It is recommended to name these arguments to make the code clear to the reader.

trim

Optionally, trim values off the tails. trim / 2 * length(x) values are removed from each tail.

expand

Optionally, expand the range by expand * diff(range(x)), split equally between both ends (computed after trimming).

pretty

If TRUE, will generate a pretty sequence. If n is supplied, this will use pretty() instead of seq(). If by is supplied, it will round the first value to a multiple of by.

Examples

x <- rcauchy(100)
seq_range(x, n = 10)
seq_range(x, n = 10, trim = 0.1)
seq_range(x, by = 1, trim = 0.1)

# Make pretty sequences
y <- runif(100)
seq_range(y, n = 10)
seq_range(y, n = 10, pretty = TRUE)
seq_range(y, n = 10, expand = 0.5, pretty = TRUE)

seq_range(y, by = 0.1)
seq_range(y, by = 0.1, pretty = TRUE)
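How trim and expand determine the range before the sequence is generated can be sketched as follows. The helper seq_range_rng is hypothetical and assumes the semantics of modelr::seq_range: trimming via quantiles and expanding by expand/2 of the range width on each side.

```r
seq_range_rng <- function(x, trim = NULL, expand = NULL) {
  rng <- if (is.null(trim)) {
    range(x, na.rm = TRUE)
  } else {
    # drop trim / 2 of the values from each tail
    unname(quantile(x, c(trim / 2, 1 - trim / 2), na.rm = TRUE))
  }
  # widen the (trimmed) range by expand / 2 of its width on each side
  if (!is.null(expand)) rng <- rng + c(-expand / 2, expand / 2) * diff(rng)
  rng
}

x <- 0:100
seq_range_rng(x)                      # 0 100
seq_range_rng(x, trim = 0.1)          # 5 95
seq_range_rng(x, expand = 0.5)        # -25 125
rng <- seq_range_rng(x, trim = 0.1)
seq(rng[1], rng[2], length.out = 10)  # 5 15 25 ... 95
```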

Draw hazard trajectories from a model's sampling distribution

Description

Internal seam used by the simulation-based confidence interval helpers (get_sim_ci, get_sim_ci_cumu, get_sim_ci_surv). It returns a matrix of nsim draws of the (response-scale) hazard, one column per draw and one row per row of newdata. The default method draws coefficient vectors via sample_coefs and evaluates the linear predictor make_X(object, newdata) %*% z; other backends (e.g. a bootstrap ensemble that has no coefficient covariance) can provide their own method to obtain simulation-based intervals from the same machinery.

Usage

sim_hazard(object, newdata, nsim = 100L, ...)

## Default S3 method:
sim_hazard(object, newdata, nsim = 100L, ...)

Arguments

object

A fitted model object.

newdata

A data frame for which hazards are predicted.

nsim

Number of draws.

...

Further arguments passed to methods.

Value

A numeric matrix with nrow(newdata) rows and nsim columns of hazard draws on the response scale. The draws are produced once for the whole newdata, so the callers can share one set of draws across groups by passing the full (grouped) data.
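The default behaviour described above (coefficient draws, linear predictor, exponentiation) can be sketched with a hypothetical design matrix standing in for make_X(object, newdata); all numbers are made up for illustration.

```r
set.seed(3)
# Hypothetical design matrix (intercept + one covariate) for 4 rows of newdata
X <- cbind(1, c(0, 0.5, 1, 1.5))
beta_hat <- c(-2, 0.4)
Vp <- diag(c(0.01, 0.004))
nsim <- 200

# nsim coefficient draws from N(beta_hat, Vp), one draw per row
coefs <- sweep(matrix(rnorm(nsim * 2), nrow = nsim) %*% chol(Vp), 2, beta_hat, `+`)

# Linear predictor for each draw, exponentiated to the hazard scale:
# one row per row of newdata, one column per draw
haz_draws <- exp(X %*% t(coefs))
dim(haz_draws)   # 4 200

# e.g. pointwise 95% simulation interval for each row of newdata
t(apply(haz_draws, 1, quantile, probs = c(0.025, 0.975)))
```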

Simulate survival times from the piece-wise exponential distribution

Description

Simulate survival times from the piece-wise exponential distribution

Usage

sim_pexp(formula, data, cut)

Arguments

formula

An extended formula that specifies the linear predictor. If you want to include a smooth baseline or time-varying effects, use t within your formula as if it were a covariate in the data, although it is not and should not be included in the data provided to sim_pexp. See examples below. Covariates enter the (numeric) linear predictor directly, so factor/character covariates must be encoded explicitly, e.g. as an indicator (trt == "1"); using a factor in arithmetic (b * trt) is an error rather than a silent coercion.

data

A data set with variables specified in formula.

cut

A sequence of time-points starting with 0.

Examples

library(survival)
library(dplyr)
library(pammtools)

# set number of observations/subjects
n <- 250
# create data set with variables which will affect the hazard rate.
df <- cbind.data.frame(x1 = runif(n, -3, 3), x2 = runif(n, 0, 6)) %>%
  as_tibble()
# the formula which specifies how covariates affect the hazard rate
f0 <- function(t) {
 dgamma(t, 8, 2) *6
}
form <- ~ -3.5 + f0(t) -0.5*x1 + sqrt(x2)
set.seed(24032018)
sim_df <- sim_pexp(form, df, 1:10)
head(sim_df)
plot(survfit(Surv(time, status) ~ 1, data = sim_df))

# for control, estimate with Cox PH
mod <- coxph(Surv(time, status) ~ x1 + pspline(x2), data=sim_df)
coef(mod)[1]
layout(matrix(1:2, nrow=1))
termplot(mod, se = TRUE)

# and using PAMs
layout(1)
ped <- sim_df %>% as_ped(Surv(time, status)~., max_time=10)
library(mgcv)
pam <- gam(ped_status ~ s(tend) + x1 + s(x2), data=ped, family=poisson, offset=offset)
coef(pam)[2]
plot(pam, page=1)

## Not run: 
# Example 2: Functional covariates/cumulative coefficients
# function to generate one exposure profile of length nz, i.e., one value
# per time point in tz at which TDC z is observed
rng_z <- function(nz) {
  as.numeric(arima.sim(n = nz, list(ar = c(.8, -.6))))
}
# two different exposure times for two different exposures
tz1 <- 1:10
tz2 <- -5:5
# generate exposures and add to data set
df <- df %>%
  add_tdc(tz1, rng_z) %>%
  add_tdc(tz2, rng_z)
df

# define tri-variate function of time, exposure time and exposure z
ft <- function(t, tmax) {
  -1*cos(t/tmax*pi)
}
fdnorm <- function(x) (dnorm(x,1.5,2)+1.5*dnorm(x,7.5,1))
wpeak2 <- function(lag) 15*dnorm(lag,8,10)
wdnorm <- function(lag) 5*(dnorm(lag,4,6)+dnorm(lag,25,4))
f_xyz1 <- function(t, tz, z) {
  ft(t, tmax=10) * 0.8*fdnorm(z)* wpeak2(t - tz)
}
f_xyz2 <- function(t, tz, z) {
  wdnorm(t-tz) * z
}

# define lag-lead window function
ll_fun <- function(t, tz) {t >= tz}
ll_fun2 <- function(t, tz) {t - 2 >= tz}
# simulate data with cumulative effect
sim_df <- sim_pexp(
  formula = ~ -3.5 + f0(t) -0.5*x1 + sqrt(x2)|
     fcumu(t, tz1, z.tz1, f_xyz=f_xyz1, ll_fun=ll_fun) +
     fcumu(t, tz2, z.tz2, f_xyz=f_xyz2, ll_fun=ll_fun2),
  data = df,
  cut = 0:10)

## End(Not run)

Simulate data for competing risks scenario

Description

Simulate data for competing risks scenario

Usage

sim_pexp_cr(formula, data, cut)

Simulated data with cumulative effects

Description

This is data simulated using the sim_pexp function. It contains two time-constant and two time-dependent covariates (observed on different exposure time grids). The code used for simulation is contained in the examples of ?sim_pexp.

Usage

simdf_elra

Format

An object of class nested_fdf (inherits from sim_df, tbl_df, tbl, data.frame) with 250 rows and 9 columns.

New basis for penalized lag selection

Description

Originally proposed in Obermeier et al., 2015, Flexible Distributed Lags for Modelling Earthquake Data, Journal of the Royal Statistical Society: Series C (Applied Statistics), 10.1111/rssc.12077. Here extended in order to penalize lead times in addition to lag times. Ideally the lag-lead window would then be selected in a data-driven fashion. Treat as experimental.

Usage

## S3 method for class 'fdl.smooth.spec'
smooth.construct(object, data, knots)

Arguments

object

An object handled by mgcv

data

The data set

knots

A vector of knots

Turn a single mgcv smooth into zero or more curve specifications

Description

Turn a single mgcv smooth into zero or more curve specifications

Usage

smooth_term_rows(s, data)

Arguments

s

A single smooth object from fit$smooth.

data

A data frame containing the variables used to fit the model.

Function to transform data without time-dependent covariates into piece-wise exponential data format

Description

Function to transform data without time-dependent covariates into piece-wise exponential data format
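The core of the transformation can be sketched in base R for data with columns id, time and status. The helper split_sketch is hypothetical: it creates one row per interval (cuts[j-1], cuts[j]] that a subject enters, sets ped_status = 1 only in the interval containing the event, and uses the log time at risk within the interval as offset. Covariates, the interval factor and the other columns that as_ped adds are omitted.

```r
split_sketch <- function(data, cuts) {
  rows <- lapply(seq_len(nrow(data)), function(i) {
    time   <- data$time[i]
    status <- data$status[i]
    tstart <- cuts[-length(cuts)]
    tend   <- cuts[-1]
    keep   <- tstart < time                 # intervals the subject enters
    tstart <- tstart[keep]
    tend   <- tend[keep]
    data.frame(
      id         = data$id[i],
      tstart     = tstart,
      tend       = tend,
      # 1 only in the interval that contains an observed event time
      ped_status = as.integer(status == 1 & time > tstart & time <= tend),
      # log of the time at risk within each interval
      offset     = log(pmin(time, tend) - tstart)
    )
  })
  do.call(rbind, rows)
}

d <- data.frame(id = 1:2, time = c(2.5, 4), status = c(1, 0))
ped_sketch <- split_sketch(d, cuts = c(0, 1, 2, 3, 4))
ped_sketch
```

Event times beyond the last cut point end up administratively censored in this sketch, because no interval then contains the event.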

Usage

split_data(
  formula,
  data,
  cut = NULL,
  max_time = NULL,
  multiple_id = FALSE,
  ...
)

Arguments

formula

data

cut

max_time

If cut is unspecified, this will be the last possible event time. All event times after max_time will be administratively censored at max_time.

multiple_id

Whether multiple occurrences of the same id are allowed (per transition). Defaults to FALSE, but is sometimes set to TRUE, e.g., in case of multi-state models with back transitions.

...

Split data to obtain recurrent event data in PED format

Description

Currently, the input data must be in start-stop notation for each spell and contain a column that indicates the spell (event number).

Usage

split_data_multistate(
  formula,
  data,
  transition = character(),
  cut = NULL,
  max_time = NULL,
  event = 1L,
  min_events = 1L,
  timescale = c("gap", "calendar"),
  ...
)

Arguments

formula

data

transition

A character indicating the column in data that indicates the event/episode number for recurrent events.

cut

max_time

If cut is unspecified, this will be the last possible event time. All event times after max_time will be administratively censored at max_time.

event

The value that encodes the occurrence of an event in the data set.

min_events

Minimum number of events for each event number.

timescale

Defines the timescale for the recurrent event data transformation. Defaults to "gap".

...

Time until Staphylococcus aureus infection in children, with possible recurrence

Description

This dataset originates from the Drakenstein child health study. The data contains the following variables:

id: Randomly generated unique child ID
t.start: The time at which the child enters the risk set for the k-th event
t.stop: Time of k-th infection or censoring
enum: Event number. Maximum of 6.
hiv

Usage

staph

Format

An object of class tbl_df (inherits from tbl, data.frame) with 374 rows and 6 columns.

Slim down a fitted PAMM for storage inside a `pamm_ic` object

Description

Removes the per-observation slots (model frame, fitted values, residuals, working weights, ...) and the call (which captures the full PED data), none of which are needed for the downstream multiple-imputation pooling: the pooled add_* methods only require each fit's coefficients, Vp/Ve and the smooth/parametric structure used by predict(type = "lpmatrix"). Stripping makes the stored size independent of the data set size, so memory does not blow up with many imputations.

Usage

strip_pamm_fit(fit)

Arguments

fit

A fitted pamm/gam object.

Value

The same object with large per-observation slots removed; class and everything needed for predict/coef/vcov are retained.

Extract fixed coefficient table from model object

Description

Given a model object, returns a data frame with columns variable, coef (coefficient), ci_lower (lower 95% CI) and ci_upper (upper 95% CI).
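Such a table can be sketched with Wald intervals (estimate ± 1.96 standard errors) from coef() and vcov(). This is an assumption about how the intervals are formed, illustrated with a logistic glm on the built-in mtcars data rather than a survival model.

```r
# Logistic model on built-in data (stand-in for a fitted PAM)
fit <- glm(am ~ wt + hp, family = binomial(), data = mtcars)
est <- coef(fit)[-1]                      # drop the intercept
se  <- sqrt(diag(vcov(fit)))[-1]
z   <- qnorm(0.975)                       # about 1.96 for a 95% interval
tab <- data.frame(variable = names(est), coef = unname(est),
                  ci_lower = unname(est - z * se), ci_upper = unname(est + z * se))
tab
```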

Usage

tidy_fixed(x, ...)

## S3 method for class 'gam'
tidy_fixed(x, intercept = FALSE, ...)

## S3 method for class 'scam'
tidy_fixed(x, intercept = FALSE, ...)

## S3 method for class 'coxph'
tidy_fixed(x, ...)

Arguments

x

A model object.

...

Currently not used.

intercept

Should intercept also be returned? Defaults to FALSE.

Examples

library(survival)
gc <- coxph(Surv(days, status)~age + sex, data = tumor)
tidy_fixed(gc)

Extract random effects in tidy data format.

Description

Extract random effects in tidy data format.

Usage

tidy_re(x, keep = c("fit", "main", "xlab", "ylab"), ...)

Arguments

x

a fitted gam object as produced by gam().

keep

A vector of variables to keep.

...

Further arguments passed to plot.gam

Extract 1d smooth objects in tidy data format.

Description

Extract 1d smooth objects in tidy data format.

Usage

tidy_smooth(
  x,
  keep = c("x", "fit", "se", "xlab", "ylab"),
  ci = TRUE,
  conf_level = 0.95,
  ...
)

Arguments

x

a fitted gam object as produced by gam().

keep

A vector of variables to keep.

ci

A logical value indicating whether confidence intervals should be calculated and returned. Defaults to TRUE.

conf_level

Numeric scalar in (0, 1). Confidence level used for the returned confidence intervals when ci = TRUE. Defaults to 0.95.

...

Further arguments passed to plot.gam

Extract 2d smooth objects in tidy format.

Description

Extract 2d smooth objects in tidy format.

Usage

tidy_smooth2d(
  x,
  keep = c("x", "y", "fit", "se", "xlab", "ylab", "main"),
  ci = FALSE,
  conf_level = 0.95,
  ...
)

Arguments

x

A fitted gam object as produced by gam().

keep

A vector of variables to keep.

ci

A logical value indicating whether confidence intervals should be calculated and returned. Defaults to FALSE.

conf_level

Numeric scalar in (0, 1). Confidence level used for the returned confidence intervals when ci = TRUE. Defaults to 0.95.

...

Further arguments passed to plot.gam.
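The following is a minimal sketch for a model with a tensor-product smooth of time and age. It assumes the tumor data and an illustrative cut-point grid. Because ci defaults to FALSE for this function, confidence intervals are requested explicitly.

```r
library(pammtools)
library(survival)

ped <- as_ped(tumor, Surv(days, status) ~ age, cut = seq(0, 3000, by = 100))
# Log-hazard modeled as a joint smooth surface over time (tend) and age
pam <- pamm(ped_status ~ te(tend, age), data = ped)

# One row per (x, y) grid point of the 2d smooth, with intervals requested
sm2d <- tidy_smooth2d(pam, ci = TRUE)
head(sm2d)
```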

Stomach area tumor data

Description

Information on patients treated for a cancer located in the stomach area. The data set includes:

days: Time from operation until death in days.
status: Event indicator (0 = censored, 1 = death).
age: The subject's age.
sex: The subject's sex (male/female).
charlson_score: Charlson comorbidity score, 1-6.
transfusion: Did the subject receive transfusions (no/yes)?
complications: Did major complications occur during operation (no/yes)?
metastases: Did the tumor develop metastases (no/yes)?
resection: Was the operation accompanied by a major resection (no/yes)?

Usage

tumor

Format

An object of class tbl_df (inherits from tbl, data.frame) with 776 rows and 9 columns.

Warn if new t_j are used

Description

Issues a warning if the interval end points t_j in newdata include values that were not used when fitting object.

Usage

warn_about_new_time_points(object, newdata, ...)

## S3 method for class 'pamm'
warn_about_new_time_points(object, newdata, ...)

Warn if new t_j are used

Description

Issues a warning if the interval end points t_j in newdata include values that were not used when fitting object.

Usage

## S3 method for class 'glm'
warn_about_new_time_points(object, newdata, time_var, ...)
