| Title: | Piece-Wise Exponential Additive Mixed Modeling Tools for Survival Analysis |
| Version: | 0.8.0 |
| Date: | 2026-06-17 |
| Description: | The Piece-wise exponential (Additive Mixed) Model (PAMM; Bender and others (2018) <doi:10.1177/1471082X17748083>) is a powerful model class for the analysis of survival (or time-to-event) data, based on Generalized Additive (Mixed) Models (GA(M)Ms). It offers intuitive specification and robust estimation of complex survival models with stratified baseline hazards, random effects, time-varying effects, time-dependent covariates and cumulative effects (Bender and others (2019)). It also supports left-truncated data, competing risks, recurrent events and multi-state settings. pammtools provides a tidy workflow for survival analysis with PAMMs, including data simulation, transformation and other functions for data preprocessing and model post-processing, as well as visualization. |
| Depends: | R (≥ 4.1.0) |
| Imports: | mgcv, survival (≥ 2.39-5), checkmate, magrittr, rlang, tidyr (≥ 1.0.0), ggplot2 (≥ 3.2.2), dplyr (≥ 1.0.0), purrr (≥ 0.2.3), tibble, lazyeval, Formula, mvtnorm, pec, vctrs (≥ 0.3.0), scam |
| Suggests: | testthat, mstate, broom, etm, xgboost |
| Config/Needs/website: | coxme, eha, etm, scam, msm, mvna, rjags, brms, xgboost, TBFmultinomial |
| License: | MIT + file LICENSE |
| LazyData: | true |
| URL: | https://adibender.github.io/pammtools/ |
| BugReports: | https://github.com/adibender/pammtools/issues |
| Encoding: | UTF-8 |
| Config/roxygen2/version: | 8.0.0 |
| NeedsCompilation: | no |
| Packaged: | 2026-06-22 11:37:49 UTC; abender |
| Author: | Andreas Bender |
| Maintainer: | Andreas Bender <andreas.bender@stat.uni-muenchen.de> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-22 12:50:15 UTC |
pammtools: Piece-wise exponential Additive Mixed Modeling tools.
Description
pammtools provides functions and utilities that facilitate fitting
Piece-wise Exponential Additive Mixed Models (PAMMs), including data
transformation and other convenience functions for pre- and post-processing
as well as plotting.
Details
The best way to get an overview of the functionality provided and how to fit PAMMs is to view the vignettes available at https://adibender.github.io/pammtools/articles/. A summary of the vignettes' content is given below:
- basics: Introduction to PAMMs and basic modeling.
- baseline: Shows how to estimate and visualize the baseline model (without covariates) and compares it with the respective Cox-PH model.
- convenience: Convenience functions for post-processing and plotting PAMMs.
- data-transformation: Transforming data into a format suitable to fit PAMMs.
- frailty: Specifying "frailty" terms, i.e., random effects for PAMMs.
- splines: Specifying spline smooth terms for PAMMs.
- strata: Specifying stratified models in which each level of a grouping variable has a different baseline hazard.
- tdcovar: Dealing with time-dependent covariates.
- tveffects: Specifying time-varying effects.
- left-truncation: Estimation for left-truncated data.
- competing-risks: Competing risks analysis.
Author(s)
Maintainer: Andreas Bender andreas.bender@stat.uni-muenchen.de (ORCID)
Authors:
Andreas Bender andreas.bender@stat.uni-muenchen.de (ORCID)
Fabian Scheipl fabian.scheipl@stat.uni-muenchen.de (ORCID)
Johannes Piller johannes.piller@lmu.de (ORCID)
Philipp Kopper philipp.kopper@stat.uni-muenchen.de (ORCID)
Other contributors:
Lukas Burk burk@leibniz-bips.de (ORCID) [contributor]
References
Bender, Andreas, Andreas Groll, and Fabian Scheipl. 2018. “A Generalized Additive Model Approach to Time-to-Event Analysis.” Statistical Modelling, February. https://doi.org/10.1177/1471082X17748083.
Bender, Andreas, Fabian Scheipl, Wolfgang Hartl, Andrew G. Day, and Helmut Küchenhoff. 2019. “Penalized Estimation of Complex, Non-Linear Exposure-Lag-Response Associations.” Biostatistics 20 (2): 315–31. https://doi.org/10.1093/biostatistics/kxy003.
Bender, Andreas, and Fabian Scheipl. 2018. “pammtools: Piece-Wise Exponential Additive Mixed Modeling Tools.” ArXiv:1806.01042 Stat, June. https://arxiv.org/abs/1806.01042.
Ramjith, J., A. Bender, K. C. B. Roes, and M. A. Jonker. 2022. “Recurrent Events Analysis with Piece-Wise Exponential Additive Mixed Models.” Statistical Modelling.
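See Also
Useful links:
- https://adibender.github.io/pammtools/
- Report bugs at https://github.com/adibender/pammtools/issues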
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Add cumulative incidence function to data
Description
Add cumulative incidence function to data
Usage
add_cif(newdata, object, ...)
## Default S3 method:
add_cif(
newdata,
object,
ci = TRUE,
overwrite = FALSE,
alpha = 0.05,
nsim = 500L,
cause_var = "cause",
time_var = NULL,
interval_length = "intlen",
...
)
## S3 method for class 'pamm_ic'
add_cif(
newdata,
object,
ci = TRUE,
alpha = 0.05,
nsim = 500L,
cause_var = "cause",
time_var = NULL,
interval_length = "intlen",
...
)
Arguments
newdata |
A data frame or list containing the values of the model covariates at which predictions
are required. If this is not provided then predictions corresponding to the
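original data are returned. If newdata is provided then it should contain all the variables needed for prediction: a warning is generated if not. |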
object |
a fitted |
... |
Further arguments passed to |
ci |
Logical. Indicates if confidence intervals should be calculated. Defaults to TRUE. |
overwrite |
Should hazard columns be overwritten if already present in
the data set? Defaults to FALSE. |
alpha |
Significance level for pooled confidence intervals. |
nsim |
Total number of pooled posterior draws used for the interval. |
cause_var |
Character. Column name of the 'cause' variable. |
time_var |
Name of the variable used for the baseline hazard. Defaults
to |
interval_length |
The variable in newdata containing the interval lengths. Defaults to "intlen". |
Details
When computing cumulative incidence for multiple groups, the input data must
be grouped via group_by() before calling this function. Omitting
group_by() will not produce an error or warning but will return
silently incorrect results, as the cumulative incidence will be accumulated
over the entire dataset rather than within each group.
The returned data contains one boundary row per group at time_var = 0
for plotting cumulative incidence from the time origin. On this row,
cif = 0; if confidence intervals are requested,
cif_lower = cif_upper = 0. If an interval-length column is present,
it is set to 0 on the boundary row. add_cumu_hazard() adds an
analogous boundary row (with cumu_hazard = 0) for continuous-time
models (GAM/SCAM/PAMM), controllable via its boundary argument;
interval-factor models (e.g. PEM via glm) keep the original prediction
grid without a boundary row.
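The accumulation behind a cumulative incidence function can be illustrated with a minimal base-R sketch, assuming piece-wise constant cause-specific hazards on known intervals. The helper cif_pwc() is hypothetical and not necessarily how add_cif() computes the CIF internally. Because it runs cumulative sums over all rows it receives, it has to be called once per group, which is the same reason add_cif() needs grouped input.

```r
# Hypothetical helper: cumulative incidence functions for piece-wise
# constant cause-specific hazards (a single group only).
# h:      matrix with one row per interval and one column per cause
# intlen: interval lengths
cif_pwc <- function(h, intlen) {
  h_all <- rowSums(h)                                 # all-cause hazard
  surv_end <- exp(-cumsum(h_all * intlen))            # S(t_j) at interval ends
  surv_start <- c(1, head(surv_end, -1))              # S(t_{j-1}) at interval starts
  p_event <- surv_start * (1 - exp(-h_all * intlen))  # P(any event in interval j)
  # split each interval's event probability among causes by hazard share,
  # then accumulate over intervals
  apply(h / h_all * p_event, 2, cumsum)
}

h <- cbind(cause1 = c(0.10, 0.20, 0.10), cause2 = c(0.05, 0.05, 0.10))
intlen <- c(1, 1, 2)
cif <- cif_pwc(h, intlen)
cif
```

At every interval end, the CIFs of all causes plus the all-cause survival probability sum to one, which is a convenient sanity check. Intervals with an all-cause hazard of exactly zero would produce NaN in this sketch.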
Examples
if (require("etm")) {
data("fourD", package = "etm")
ped_stacked <- fourD |>
dplyr::select(-medication, -treated) |>
as_ped(Surv(time, status) ~., id = "id") |>
dplyr::mutate(cause = as.factor(cause))
pam <- pamm(
ped_status ~ s(tend, by = cause) + sex + sex:cause + age + age:cause,
data = ped_stacked)
ped_stacked |>
make_newdata(tend = unique(tend), cause = unique(cause)) |>
dplyr::group_by(cause) |>
add_cif(pam)
}
Add counterfactual observations for possible transitions
Description
If data only contains one row per transition that took place, this function adds additional rows for each transition that was possible at that time (for each subject in the data).
Usage
add_counterfactual_transitions(
data,
from_to_pairs = list(),
from_col = "from",
to_col = "to",
transition_col = "transition"
)
Arguments
data |
Data set that only contains rows for transitions that took place. |
from_to_pairs |
A list with one element for each possible initial state. The values of each list element indicate possible transitions from that state. Will be calculated from the data if unspecified. |
from_col |
Name of the column that stores initial state. |
to_col |
Name of the column that stores end state. |
transition_col |
Name of the column that contains the transition identifier (factor variable). |
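The expansion can be sketched in base R for data with one row per observed transition and a named list of possible target states per initial state. The function expand_possible_transitions() and the use of a status column are hypothetical; add_counterfactual_transitions() may differ in details, for example in how it derives from_to_pairs from the data when they are not given.

```r
# Hypothetical helper (not the pammtools implementation): for every observed
# transition, add one row per transition that was possible out of the same
# initial state; only the transition that actually happened keeps status = 1.
expand_possible_transitions <- function(data, from_to_pairs,
                                        from_col = "from", to_col = "to",
                                        transition_col = "transition") {
  rows <- lapply(seq_len(nrow(data)), function(i) {
    row <- data[i, , drop = FALSE]
    targets <- from_to_pairs[[as.character(row[[from_col]])]]
    out <- row[rep(1L, length(targets)), , drop = FALSE]
    out$status <- as.integer(targets == row[[to_col]])  # observed target only
    out[[to_col]] <- targets
    out[[transition_col]] <- paste0(row[[from_col]], "->", targets)
    out
  })
  res <- do.call(rbind, rows)
  rownames(res) <- NULL
  res[[transition_col]] <- factor(res[[transition_col]])
  res
}

obs <- data.frame(id = c(1, 2), from = c(1, 1), to = c(2, 3),
                  tstart = c(0, 0), tstop = c(5, 3), status = c(1, 1))
pairs <- list("1" = c(2, 3), "2" = c(1, 3))
res <- expand_possible_transitions(obs, pairs)
res
```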
Add predicted (cumulative) hazard to data set
Description
Add (cumulative) hazard based on the provided data set and model.
If ci=TRUE, confidence intervals (CIs) are also added. Their width can
be controlled via the se_mult argument. The method by which the
CIs are calculated can be specified by ci_type.
This is a wrapper around
predict.gam. When reference is specified, the
(log-)hazard ratio is calculated. In addition to models fit with
gam/bam or glm,
shape-constrained additive models fit with scam are
supported (e.g., for monotone baseline hazards). For scam models all
calculations (including delta-method and simulation based confidence
intervals) are based on the re-parametrized coefficients and their
covariance matrix, i.e., on the same normal approximation that underlies
the standard errors reported by scam itself.
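The idea of transforming the linear predictor (as described for the default ci_type) can be sketched in base R: form a Wald interval eta ± se_mult · se on the log-hazard scale and exponentiate its limits. The helper hazard_ci() is hypothetical and uses stats::glm instead of mgcv::gam so that the sketch has no extra dependencies; it is not the pammtools implementation.

```r
# Hypothetical helper: hazard with CIs obtained by transforming a Wald
# interval of the linear predictor (log-hazard) to the hazard scale.
hazard_ci <- function(fit, newdata, se_mult = 2) {
  tt <- delete.response(terms(fit))
  mf <- model.frame(tt, newdata, xlev = fit$xlevels)
  X <- model.matrix(tt, mf)                     # offsets are not part of X
  off <- model.offset(mf)
  if (is.null(off)) off <- 0
  eta <- drop(X %*% coef(fit)) + off            # linear predictor
  se <- sqrt(rowSums((X %*% vcov(fit)) * X))    # sqrt(diag(X V X'))
  data.frame(newdata, hazard = exp(eta), se = se,
             ci_lower = exp(eta - se_mult * se),
             ci_upper = exp(eta + se_mult * se))
}

set.seed(1)
n <- 400
d <- data.frame(interval = factor(rep(1:4, length.out = n)),
                x = rnorm(n), intlen = 1)
d$y <- rpois(n, exp(-1 + 0.2 * as.integer(d$interval) + 0.3 * d$x))
fit <- glm(y ~ interval + x + offset(log(intlen)), family = poisson(), data = d)
nd <- data.frame(interval = factor(1:4, levels = levels(d$interval)),
                 x = 0, intlen = 1)
out <- hazard_ci(fit, nd)
out
```

Because exp() is monotone and positive, the transformed limits stay ordered and cannot become negative, which is one reason to build the interval on the link scale.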
Usage
add_hazard(newdata, object, ...)
## Default S3 method:
add_hazard(
newdata,
object,
reference = NULL,
type = c("response", "link"),
ci = TRUE,
se_mult = 2,
ci_type = c("default", "delta", "sim"),
overwrite = FALSE,
time_var = NULL,
nsim = 100L,
alpha = 0.05,
...
)
add_cumu_hazard(newdata, object, ...)
## Default S3 method:
add_cumu_hazard(
newdata,
object,
ci = TRUE,
se_mult = 2,
overwrite = FALSE,
time_var = NULL,
interval_length = "intlen",
boundary = TRUE,
...
)
## S3 method for class 'pamm_ic'
add_hazard(
newdata,
object,
ci = TRUE,
alpha = 0.05,
nsim = 500L,
time_var = NULL,
...
)
## S3 method for class 'pamm_ic'
add_cumu_hazard(
newdata,
object,
ci = TRUE,
alpha = 0.05,
nsim = 500L,
time_var = NULL,
interval_length = "intlen",
...
)
Arguments
newdata |
A data frame or list containing the values of the model covariates at which predictions
are required. If this is not provided then predictions corresponding to the
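original data are returned. If newdata is provided then it should contain all the variables needed for prediction: a warning is generated if not. |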
object |
a fitted |
... |
Further arguments passed to |
reference |
A data frame with number of rows equal to |
type |
Either "response" (default; hazard) or "link" (log-hazard). |
ci |
Logical. Indicates if confidence intervals should be calculated. Defaults to TRUE. |
se_mult |
Factor by which standard errors are multiplied for calculating the confidence intervals. |
ci_type |
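The method by which standard errors/confidence intervals
will be calculated. The default transforms the confidence limits of the linear
predictor at the respective intervals; "delta" and "sim" use the delta method and simulation, respectively. |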
overwrite |
Should hazard columns be overwritten if already present in
the data set? Defaults to FALSE. |
time_var |
Name of the variable used for the baseline hazard. Defaults
to |
nsim |
Total number of pooled posterior draws used for the interval. |
alpha |
Significance level for pooled confidence intervals (a
|
interval_length |
The variable in newdata containing the interval lengths.
Can be either a bare unquoted variable name or a character string. Defaults to "intlen". |
boundary |
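Logical. If TRUE (default), one boundary row per group with cumu_hazard = 0 is added at time_var = 0 for continuous-time models. |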
Details
When computing cumulative hazards or survival probabilities across groups,
the input data must be grouped via group_by() prior to calling
add_cumu_hazard() or add_surv_prob(). Omitting
group_by() will not produce an error or warning but will return
silently incorrect results, as the cumulative hazard will be accumulated
over the entire dataset rather than within each group.
See the workflow vignette
for a worked example.
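Why grouping matters can be seen in a small base-R sketch, assuming the cumulative hazard is the running sum of hazard times interval length. The toy grid below is hypothetical and does not reproduce the internals of add_cumu_hazard().

```r
# Toy prediction grid: two groups with three intervals each.
grid <- data.frame(group  = rep(c("a", "b"), each = 3),
                   intlen = c(1, 1, 2, 1, 1, 2),
                   hazard = c(0.1, 0.2, 0.1, 0.3, 0.3, 0.3))
# Correct: accumulate hazard * interval length separately within each group
grid$cumu_hazard <- ave(grid$hazard * grid$intlen, grid$group, FUN = cumsum)
# What ignoring the grouping amounts to: one running sum over all rows,
# so group "b" silently inherits the cumulative hazard of group "a"
grid$cumu_wrong <- cumsum(grid$hazard * grid$intlen)
grid
```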
See Also
Examples
ped <- tumor[1:50,] %>% as_ped(Surv(days, status)~ age)
pam <- mgcv::gam(ped_status ~ s(tend)+age, data = ped, family=poisson(), offset=offset)
ped_info(ped) %>% add_hazard(pam, type="link")
ped_info(ped) %>% add_hazard(pam, type = "response")
ped_info(ped) %>% add_cumu_hazard(pam)
Turn exact event times into interval-censored observations
Description
Convenience helper to manufacture interval-censored (panel) data from exact
simulated survival times (e.g., the output of sim_pexp), for
coverage studies and examples. Each subject is "inspected" at a sequence of
times; the true event time is then only known to lie between the last clean
and the first positive inspection. The exact time is retained (by default in
column true_time) so that coverage can be scored against the truth.
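The random inspection mechanism can be sketched in base R, assuming inspection times follow a Poisson process with the given rate (exponential gaps) up to max_time. inspect_sketch() is hypothetical: unlike add_inspections(), it has no fixed schedule and no terminal exam, and an event after the last inspection is returned as right-censored with R = Inf.

```r
# Hypothetical helper: inspect each subject at Poisson-process times
# (exponential gaps with the given rate) up to max_time and bracket the
# exact event time between the last clean and the first positive inspection.
inspect_sketch <- function(time, status, rate = 1, max_time = max(time)) {
  bounds <- vapply(seq_along(time), function(i) {
    insp <- numeric(0)
    t_cur <- 0
    repeat {                                    # draw inspection times
      t_cur <- t_cur + rexp(1, rate)
      if (t_cur > max_time) break
      insp <- c(insp, t_cur)
    }
    L <- max(c(0, insp[insp < time[i]]))        # last clean inspection
    later <- insp[insp >= time[i]]
    # event detected at the first later inspection; otherwise right-censored
    R <- if (status[i] == 1 && length(later) > 0) min(later) else Inf
    c(L = L, R = R)
  }, FUN.VALUE = c(L = 0, R = 0))
  data.frame(true_time = time, status = status,
             L = bounds["L", ], R = bounds["R", ])
}

set.seed(42)
obs_time <- rexp(50, rate = 0.3)
obs_status <- rbinom(50, 1, 0.8)
icd <- inspect_sketch(obs_time, obs_status, rate = 1)
head(icd)
```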
Usage
add_inspections(
data,
time_var = "time",
status_var = "status",
mechanism = c("random", "fixed", "mixed"),
rate = 1,
schedule = NULL,
max_time = NULL,
terminal_exam = TRUE,
keep_truth = TRUE,
L = "L",
R = "R"
)
Arguments
data |
A data frame with one row per subject containing an exact event
time and a status indicator (as produced by sim_pexp). |
time_var, status_var |
Names of the (exact) event-time and status columns.
|
mechanism |
Inspection mechanism: |
rate |
Inspection rate for |
schedule |
Numeric vector of inspection times for
|
max_time |
Inspection horizon. Defaults to |
terminal_exam |
Logical; if |
keep_truth |
Logical; keep the exact event time in |
L, R |
Names of the created lower/upper bound columns. |
Value
data augmented with interval bounds in columns L and
R (and true_time). Use
Surv(L, R, type = "interval2") on the result.
See Also
Examples
set.seed(1)
df <- data.frame(x = runif(100, -1, 1))
sdf <- sim_pexp(~ -2 + 0.4 * x, df, cut = seq(0, 10, by = 0.5))
icd <- add_inspections(sdf, rate = 1)
fit <- pamm_ic(Surv(L, R, type = "interval2") ~ x, icd, m = 5)
Add survival probability estimates
Description
Given suitable data (i.e. data with all columns used for estimation of the model),
this function adds a column surv_prob containing survival probabilities
for the specified covariate and follow-up information (and CIs
surv_lower, surv_upper if ci=TRUE).
Usage
add_surv_prob(newdata, object, ...)
## Default S3 method:
add_surv_prob(
newdata,
object,
ci = TRUE,
se_mult = 2,
overwrite = FALSE,
time_var = NULL,
interval_length = "intlen",
boundary = TRUE,
...
)
## S3 method for class 'pamm_ic'
add_surv_prob(
newdata,
object,
ci = TRUE,
alpha = 0.05,
nsim = 500L,
time_var = NULL,
interval_length = "intlen",
...
)
Arguments
newdata |
A data frame or list containing the values of the model covariates at which predictions
are required. If this is not provided then predictions corresponding to the
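original data are returned. If newdata is provided then it should contain all the variables needed for prediction: a warning is generated if not. |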
object |
a fitted |
... |
Further arguments passed to |
ci |
Logical. Indicates if confidence intervals should be calculated. Defaults to TRUE. |
se_mult |
Factor by which standard errors are multiplied for calculating the confidence intervals. |
overwrite |
Should hazard columns be overwritten if already present in
the data set? Defaults to FALSE. |
time_var |
Name of the variable used for the baseline hazard. Defaults
to |
interval_length |
The variable in newdata containing the interval lengths.
Can be either a bare unquoted variable name or a character string. Defaults to "intlen". |
boundary |
Logical. If TRUE (default), one boundary row per group is added at time_var = 0 (see Details). |
alpha |
Significance level for pooled confidence intervals. |
nsim |
Total number of pooled posterior draws used for the interval. |
Details
When computing cumulative hazards or survival probabilities across groups,
the input data must be grouped via group_by() prior to calling
add_cumu_hazard() or add_surv_prob(). Omitting
group_by() will not produce an error or warning but will return
silently incorrect results, as the cumulative hazard will be accumulated
over the entire dataset rather than within each group.
See the workflow vignette
for a worked example.
The returned data contains one boundary row per group at time_var = 0
for plotting cumulative quantities from the time origin. On this row,
surv_prob = 1; if confidence intervals are requested,
surv_lower = surv_upper = 1. If an interval-length column is present,
it is set to 0 on the boundary row.
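The boundary row and the relation S(t) = exp(-H(t)) can be sketched for a single group in base R. surv_with_boundary() is hypothetical and ignores confidence intervals.

```r
# Hypothetical helper: survival probabilities for one group from piece-wise
# constant hazards, with a boundary row at time 0 prepended.
surv_with_boundary <- function(tend, intlen, hazard) {
  cumu <- cumsum(hazard * intlen)              # H(t) at interval ends
  data.frame(tend        = c(0, tend),
             intlen      = c(0, intlen),       # boundary row has length 0
             cumu_hazard = c(0, cumu),
             surv_prob   = exp(-c(0, cumu)))   # S(t) = exp(-H(t)), so S(0) = 1
}

res <- surv_with_boundary(tend = c(1, 2, 4), intlen = c(1, 1, 2),
                          hazard = c(0.1, 0.2, 0.1))
res
```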
See Also
Examples
ped <- tumor[1:50,] %>% as_ped(Surv(days, status)~ age)
pam <- mgcv::gam(ped_status ~ s(tend)+age, data=ped, family=poisson(), offset=offset)
ped_info(ped) %>% add_surv_prob(pam, ci=TRUE)
Add time-dependent covariate to a data set
Description
Given a data set in standard format (with one row per subject/observation),
this function adds a column with the specified exposure time points
and a column with respective exposures, created from rng_fun.
This function should usually only be used to create data sets passed
to sim_pexp.
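The idea of storing exposure times and exposures per subject can be sketched with list columns in base R. add_tdc_sketch() and the column names tz and z are hypothetical, and the sketch assumes rng_fun takes the number of exposure times and returns that many values; the names and layout produced by add_tdc() may differ.

```r
# Hypothetical helper (not pammtools' add_tdc()): attach the exposure times
# tz and one vector of simulated exposures per subject as list columns.
add_tdc_sketch <- function(data, tz, rng_fun) {
  n <- nrow(data)
  data$tz <- rep(list(tz), n)                                # same times for all
  data$z <- lapply(seq_len(n), function(i) rng_fun(length(tz)))  # one draw per time
  data
}

set.seed(3)
df <- data.frame(id = 1:3, x = c(0.5, -0.2, 1))
df_tdc <- add_tdc_sketch(df, tz = 0:4, rng_fun = function(nz) runif(nz))
str(df_tdc)
```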
Usage
add_tdc(data, tz, rng_fun, ...)
Arguments
data |
A data set with variables specified in |
tz |
A numeric vector of exposure times (relative to the
beginning of the follow-up time |
rng_fun |
A random number generating function that creates
the time-dependent covariates at time points |
... |
Currently not used. |
Embeds the data set with the specified (relative) term contribution
Description
Adds the contribution of a specific term to the
linear predictor to the data specified by newdata.
Essentially a wrapper to predict.gam, with type="terms".
Thus most arguments and their documentation below are from predict.gam.
Shape-constrained additive models fit with scam are
supported as well.
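With reference specified, the term contribution is reported relative to a reference covariate value, f(x) - f(x_ref). The idea can be sketched with base R's predict(type = "terms") for an lm fit; using lm and the mtcars data is an assumption for illustration only (add_term() works on gam/scam fits and also returns CIs, which for a difference require the covariance of both evaluations and are not computed here).

```r
# Illustration with base R's lm; the term of interest is poly(hp, 2).
fit <- lm(mpg ~ poly(hp, 2) + wt, data = mtcars)
grid <- data.frame(hp = seq(60, 300, by = 40), wt = mean(mtcars$wt))
ref  <- data.frame(hp = mean(mtcars$hp), wt = mean(mtcars$wt))
# type = "terms" returns each term's (centered) contribution to the linear predictor
term_grid <- predict(fit, newdata = grid, type = "terms")[, "poly(hp, 2)"]
term_ref  <- predict(fit, newdata = ref,  type = "terms")[, "poly(hp, 2)"]
# contribution relative to the reference value: f(hp) - f(hp_ref)
rel <- term_grid - term_ref
data.frame(hp = grid$hp, fit = rel)
```

Subtracting the reference removes the arbitrary centering constant of the term, so the result equals the change in the full prediction when only hp varies.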
Usage
add_term(newdata, object, term, reference = NULL, ci = TRUE, se_mult = 2, ...)
Arguments
newdata |
A data frame or list containing the values of the model covariates at which predictions
are required. If this is not provided then predictions corresponding to the
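original data are returned. If newdata is provided then it should contain all the variables needed for prediction: a warning is generated if not. |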
object |
a fitted |
term |
A character (vector) or regular expression indicating for which term(s) information should be extracted and added to data set. |
reference |
A data frame with number of rows equal to |
ci |
Logical. Indicates if confidence intervals should be calculated. Defaults to TRUE. |
se_mult |
The factor by which standard errors are multiplied to form confidence intervals. |
... |
Further arguments passed to |
Examples
library(ggplot2)
ped <- as_ped(tumor, Surv(days, status)~ age, cut = seq(0, 2000, by = 100))
pam <- mgcv::gam(ped_status ~ s(tend) + s(age), family = poisson(),
offset = offset, data = ped)
#term contribution for sequence of ages
s_age <- ped %>% make_newdata(age = seq_range(age, 50)) %>%
add_term(pam, term = "age")
ggplot(s_age, aes(x = age, y = fit)) + geom_line() +
geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = .3)
# term contribution relative to mean age
s_age2 <- ped %>% make_newdata(age = seq_range(age, 50)) %>%
add_term(pam, term = "age", reference = list(age = mean(ped$age)))
ggplot(s_age2, aes(x = age, y = fit)) + geom_line() +
geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = .3)
Add transition probabilities confidence intervals
Description
Add transition probabilities confidence intervals
Usage
add_trans_ci(newdata, object, nsim = 100L, alpha = 0.05, ...)
Add transition probabilities
Description
add_trans_prob adds transition probabilities to the provided data set, based on the provided model.
Optionally, confidence intervals (CI) are added if ci=TRUE.
The computation builds on the cumulative hazards (cumu_hazard) of mgcv::gam models.
Usage
add_trans_prob(
newdata,
object,
overwrite = FALSE,
ci = FALSE,
alpha = 0.05,
nsim = 100L,
time_var = "tend",
interval_length = "intlen",
transition = "transition",
...
)
Arguments
newdata |
A data frame or list containing the values of the model covariates at which predictions are required. If this is not provided then predictions corresponding to the original data are returned. If newdata is provided then it should contain all the variables needed for prediction: a warning is generated if not. See details for use with linear.functional.terms. |
object |
A fitted |
overwrite |
Should transition probability columns be overwritten if
already present in the data set? Defaults to FALSE. |
ci |
Logical. Indicates if confidence intervals should be calculated. Defaults to FALSE. |
alpha |
Sets the confidence intervals' |
nsim |
Sets the number of iterations for simulated confidence intervals.
Defaults to 100. |
time_var |
Name of the variable used for the baseline hazard. Defaults
to "tend". |
interval_length |
The variable in newdata containing the interval lengths. Defaults to "intlen". |
transition |
Name of the column in newdata that identifies the transition. Defaults to "transition". |
... |
Further arguments passed to underlying methods. |
Details
When computing transition probabilities for multiple groups, the input data must
be grouped via group_by() before calling this function. Omitting
group_by() will not produce an error or warning but will return
silently incorrect results, as the transition probability will be accumulated
over the entire dataset rather than within each group.
The returned data contains one boundary row per group and transition at
time_var = 0 for plotting transition probabilities from the time
origin. On this row, trans_prob = 0; if confidence intervals are
requested, trans_lower = trans_upper = 0. If an interval-length
column is present, it is set to 0 on the boundary row.
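Transition probabilities follow from the cumulative hazards through the product integral P(s, t) = prod(I + dA). A minimal base-R sketch for an illness-death model (states 1 = healthy, 2 = ill, 3 = dead) with piece-wise constant transition hazards uses the first-order factor I + Q * intlen per interval, an Aalen-Johansen-type product. make_Q() and trans_prob_pwc() are hypothetical, and the approximation is assumed here rather than taken from add_trans_prob(). As with the other cumulative quantities, the product has to restart for every group.

```r
# Hypothetical helper: transition-rate matrix of an illness-death model.
make_Q <- function(h12, h13, h23) {
  Q <- matrix(0, 3, 3, dimnames = list(1:3, 1:3))
  Q[1, 2] <- h12; Q[1, 3] <- h13; Q[2, 3] <- h23
  diag(Q) <- -rowSums(Q)          # rows of a rate matrix sum to zero
  Q
}

# Hypothetical helper: product over intervals of (I + Q_j * intlen_j).
trans_prob_pwc <- function(Q_list, intlen) {
  P <- diag(nrow(Q_list[[1]]))
  for (j in seq_along(Q_list)) {
    P <- P %*% (diag(nrow(P)) + Q_list[[j]] * intlen[j])
  }
  P
}

intlen <- rep(0.1, 20)
# illness-to-death hazard slowly increases over time
Q_list <- lapply(seq_along(intlen), function(j) make_Q(0.2, 0.05, 0.3 + 0.01 * j))
P <- trans_prob_pwc(Q_list, intlen)
round(P, 3)
```

Each factor has rows summing to one, so every row of P is a probability distribution over the states at the end of follow-up.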
Examples
library(dplyr)
data("prothr", package = "mstate")
prothr <- prothr |>
mutate(transition = as.factor(paste0(from, "->", to)),
treat = as.factor(treat)) |>
filter(Tstart != Tstop, id <= 100) |>
select(-trans)
ped <- as_ped(data= prothr, formula= Surv(Tstart, Tstop, status)~ .,
transition = "transition", id= "id", timescale = "calendar")
pam <- mgcv::bam(ped_status ~ s(tend, by=transition) + transition * treat,
data = ped, family = poisson(), offset = offset,
method = "fREML", discrete = TRUE)
ndf <- make_newdata(ped, tend = unique(tend),
treat = unique(treat),
transition = unique(transition)) |>
group_by(treat, transition) |> # important!
add_trans_prob(pam)
Transform crps object to data.frame
Description
as.data.frame S3 method for objects of class crps.
Usage
## S3 method for class 'crps'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
Arguments
x |
An object of class |
row.names |
NULL or a character vector giving the row names for the data frame. |
optional |
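logical. If TRUE, setting row names and converting column names (to syntactic names: see make.names) is optional. |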
... |
additional arguments to be passed to or from methods. |
Transform data to Piece-wise Exponential Data (PED)
Description
This is the general data transformation function provided by the
pammtools package. The following main applications must be distinguished:
- Transformation of standard time-to-event data.
- Transformation of left-truncated time-to-event data.
- Transformation of time-to-event data with time-dependent covariates (TDC).
- Transformation of competing risks data (single or stacked data sets).
- Transformation of recurrent events and multi-state data.
For TDC data, the type of effect one wants to estimate is also
important for the data transformation step. In case of TDCs, the
right-hand-side of the formula can contain formula specials
concurrent and cumulative.
Usage
as_ped(data, ...)
## S3 method for class 'data.frame'
as_ped(
data,
formula,
cut = NULL,
max_time = NULL,
tdc_specials = c("concurrent", "cumulative"),
censor_code = 0L,
transition = character(),
timescale = c("gap", "calendar"),
min_events = 1L,
...
)
## S3 method for class 'nested_fdf'
as_ped(data, formula, ...)
## S3 method for class 'list'
as_ped(
data,
formula,
tdc_specials = c("concurrent", "cumulative"),
censor_code = 0L,
...
)
is.ped(x)
## S3 method for class 'ped'
as_ped(data, newdata, ...)
## S3 method for class 'pamm'
as_ped(data, newdata, ...)
as_ped_multistate(
data,
formula,
cut = NULL,
max_time = NULL,
tdc_specials = c("concurrent", "cumulative"),
censor_code = 0L,
transition = character(),
timescale = c("gap", "calendar"),
min_events = 1L,
...
)
Arguments
data |
Either an object inheriting from data frame or in case of time-dependent covariates a list of data frames (of length 2), where the first data frame contains the time-to-event information and static covariates while the second (and potentially further data frames) contain information on time-dependent covariates and the times at which they have been observed. |
... |
Further arguments passed to the |
formula |
A two sided formula with a |
cut |
Split points, used to partition the follow-up into intervals.
If unspecified, all unique event times will be used. For competing risks,
when |
max_time |
If |
tdc_specials |
A character vector of names of potential specials in
|
censor_code |
Specifies the value of the status variable that indicates
censoring. Often this will be |
transition |
Character string. Name of the column in |
timescale |
Character string, either |
x |
any R object. |
newdata |
A new data set ( |
Details
For competing risks data, as_ped can return either:
- A list of cause-specific data sets (combine = FALSE), where each element corresponds to one event type and uses cause-specific interval split points. This is suitable for cause-specific hazards models without shared effects.
- A single stacked data set (combine = TRUE, the default), where all cause-specific data sets are combined with a cause column as covariate. Common split points are derived from all event times. This is required for models with shared covariate effects across causes, estimated via interaction terms (e.g., s(tend, by = cause)).
For multi-state data, as_ped extends the standard PED transformation
to each transition type. The follow-up of each subject is split at all
observed transition times across the entire dataset, and a row is added for
every interval-transition combination the subject is at risk for. Two key
differences arise compared to the single-event case:
- Delayed entry into the risk set is handled automatically, since subjects are only at risk for transitions out of a state after they have entered it.
- Competing events are treated as censoring for all other transitions within the same interval.
In any case, the data transformation is specified by a two-sided formula. See the data-transformation, competing-risks, and recurrent-events vignettes for details.
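For standard right-censored data, the core of the transformation can be sketched in base R: split each subject's follow-up at the cut points, keep the intervals the subject entered, set the event indicator in the interval (tstart, tend] that contains the event time, and store log(time at risk) as offset. ped_sketch() is hypothetical and covers none of the special cases listed above (TDCs, left truncation, competing risks, multi-state).

```r
# Hypothetical helper: piece-wise exponential data for right-censored data.
ped_sketch <- function(data, cut, time_col = "time", status_col = "status") {
  covars <- setdiff(names(data), c(time_col, status_col))
  rows <- lapply(seq_len(nrow(data)), function(i) {
    t_i <- data[[time_col]][i]
    tstart <- head(cut, -1)
    tend <- cut[-1]
    at_risk <- tstart < t_i                   # intervals the subject has entered
    tstart <- tstart[at_risk]
    tend <- tend[at_risk]
    exposure <- pmin(tend, t_i) - tstart      # time at risk within each interval
    event <- data[[status_col]][i] == 1 & tend >= t_i  # event in last interval
    data.frame(id = i, tstart = tstart, tend = tend,
               offset = log(exposure), ped_status = as.integer(event),
               data[rep(i, length(tstart)), covars, drop = FALSE],
               row.names = NULL)
  })
  do.call(rbind, rows)
}

d <- data.frame(time = c(1.5, 3), status = c(1, 0), x = c(0.2, -1))
ped <- ped_sketch(d, cut = c(0, 1, 2, 3))
ped
```

The result can be passed to a Poisson model such as glm(ped_status ~ factor(tstart) + x + offset(offset), family = poisson(), data = ped), which corresponds to a piece-wise constant baseline hazard.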
Value
For standard and left-truncated data, a data frame of class
ped in piece-wise exponential data format. For competing risks data,
either a stacked data frame of class ped_cr (when
combine = TRUE) or a list of cause-specific ped data frames
of class ped_cr_list (when combine = FALSE). For multistate data,
the result is a stacked long-format dataset with one row per subject,
interval, and transition, which can be passed directly to a Poisson
regression model.
Examples
# Standard single-event transformation
tumor[1:3, ]
tumor[1:3, ] %>% as_ped(Surv(days, status) ~ age + sex, cut = c(0, 500, 1000))
tumor[1:3, ] %>% as_ped(Surv(days, status) ~ age + sex)
# Competing risks: stacked data set (combine = TRUE, default)
# Suitable for cause-specific hazards models with shared effects,
# estimated via interaction terms e.g. s(tend, by = cause)
## Not run:
data("fourD", package = "etm")
ped_stacked <- fourD %>%
as_ped(Surv(time, status) ~ ., id = "id")
head(ped_stacked)
# Competing risks: list output (combine = FALSE)
# Suitable for cause-specific hazards models without shared effects
ped_list <- fourD %>%
as_ped(Surv(time, status) ~ ., id = "id", combine = FALSE)
# ped_list[[1]]: data for cause 1 (cardiovascular death)
# ped_list[[2]]: data for cause 2 (death from other causes)
head(ped_list[[1]])
head(ped_list[[2]])
# Multi-state: illness-death model on calendar timescale
# Uses the prothr data (liver cirrhosis patients, n = 488) from mstate.
# Patients can transition between normal (1) and abnormal (2) prothrombin
# levels and death (3): transitions 1->2, 1->3, 2->1, 2->3.
# Calendar timescale is used because hazards depend on overall disease
# duration, not time since last transition.
data("prothr", package = "mstate")
ped_msm <- prothr %>%
dplyr::filter(Tstart != Tstop) %>%
as_ped(
formula = Surv(Tstart, Tstop, status) ~ .,
transition = "trans",
id = "id",
timescale = "calendar"
)
head(ped_msm)
## End(Not run)
## Not run:
data("cgd", package = "frailtyHL")
cgd2 <- cgd %>%
dplyr::select(id, tstart, tstop, enum, status, age) %>%
dplyr::filter(enum %in% c(1:2))
ped_re <- as_ped_multistate(
formula = Surv(tstart, tstop, status) ~ age + enum,
data = cgd2,
transition = "enum",
timescale = "calendar")
## End(Not run)
Competing risks trafo
Description
This is the general data transformation function provided by the
pammtools package. The following main applications must be distinguished:
- Transformation of standard time-to-event data.
- Transformation of left-truncated time-to-event data.
- Transformation of time-to-event data with time-dependent covariates (TDC).
- Transformation of competing risks data (single or stacked data sets).
- Transformation of recurrent events and multi-state data.
For TDC data, the type of effect one wants to estimate is also
important for the data transformation step. In case of TDCs, the
right-hand-side of the formula can contain formula specials
concurrent and cumulative.
Usage
as_ped_cr(
data,
formula,
cut = NULL,
max_time = NULL,
tdc_specials = c("concurrent", "cumulative"),
censor_code = 0L,
combine = TRUE,
...
)
Arguments
data |
Either an object inheriting from data frame or in case of time-dependent covariates a list of data frames (of length 2), where the first data frame contains the time-to-event information and static covariates while the second (and potentially further data frames) contain information on time-dependent covariates and the times at which they have been observed. |
formula |
A two sided formula with a |
cut |
Split points, used to partition the follow-up into intervals.
If unspecified, all unique event times will be used. For competing risks,
when |
max_time |
If |
tdc_specials |
A character vector of names of potential specials in
|
censor_code |
Specifies the value of the status variable that indicates
censoring. Often this will be |
combine |
Logical. If |
... |
Further arguments passed to the |
Details
For competing risks data, as_ped can return either:
- A list of cause-specific data sets (combine = FALSE), where each element corresponds to one event type and uses cause-specific interval split points. This is suitable for cause-specific hazards models without shared effects.
- A single stacked data set (combine = TRUE, the default), where all cause-specific data sets are combined with a cause column as covariate. Common split points are derived from all event times. This is required for models with shared covariate effects across causes, estimated via interaction terms (e.g., s(tend, by = cause)).
For multi-state data, as_ped extends the standard PED transformation
to each transition type. The follow-up of each subject is split at all
observed transition times across the entire dataset, and a row is added for
every interval-transition combination the subject is at risk for. Two key
differences arise compared to the single-event case:
- Delayed entry into the risk set is handled automatically, since subjects are only at risk for transitions out of a state after they have entered it.
- Competing events are treated as censoring for all other transitions within the same interval.
In any case, the data transformation is specified by a two-sided formula. See the data-transformation, competing-risks, and recurrent-events vignettes for details.
Value
For standard and left-truncated data, a data frame of class
ped in piece-wise exponential data format. For competing risks data,
either a stacked data frame of class ped_cr (when
combine = TRUE) or a list of cause-specific ped data frames
of class ped_cr_list (when combine = FALSE). For multistate data,
the result is a stacked long-format dataset with one row per subject,
interval, and transition, which can be passed directly to a Poisson
regression model.
Examples
# Standard single-event transformation
tumor[1:3, ]
tumor[1:3, ] %>% as_ped(Surv(days, status) ~ age + sex, cut = c(0, 500, 1000))
tumor[1:3, ] %>% as_ped(Surv(days, status) ~ age + sex)
# Competing risks: stacked data set (combine = TRUE, default)
# Suitable for cause-specific hazards models with shared effects,
# estimated via interaction terms e.g. s(tend, by = cause)
## Not run:
data("fourD", package = "etm")
ped_stacked <- fourD %>%
as_ped(Surv(time, status) ~ ., id = "id")
head(ped_stacked)
# Competing risks: list output (combine = FALSE)
# Suitable for cause-specific hazards models without shared effects
ped_list <- fourD %>%
as_ped(Surv(time, status) ~ ., id = "id", combine = FALSE)
# ped_list[[1]]: data for cause 1 (cardiovascular death)
# ped_list[[2]]: data for cause 2 (death from other causes)
head(ped_list[[1]])
head(ped_list[[2]])
# Multi-state: illness-death model on calendar timescale
# Uses the prothr data (liver cirrhosis patients, n = 488) from mstate.
# Patients can transition between normal (1) and abnormal (2) prothrombin
# levels and death (3): transitions 1->2, 1->3, 2->1, 2->3.
# Calendar timescale is used because hazards depend on overall disease
# duration, not time since last transition.
data("prothr", package = "mstate")
ped_msm <- prothr %>%
dplyr::filter(Tstart != Tstop) %>%
as_ped(
formula = Surv(Tstart, Tstop, status) ~ .,
transition = "trans",
id = "id",
timescale = "calendar"
)
head(ped_msm)
## End(Not run)
## Not run:
data("cgd", package = "frailtyHL")
cgd2 <- cgd %>%
dplyr::select(id, tstart, tstop, enum, status, age) %>%
dplyr::filter(enum %in% c(1:2))
ped_re <- as_ped_multistate(
formula = Surv(tstart, tstop, status) ~ age + enum,
data = cgd2,
transition = "enum",
timescale = "calendar")
## End(Not run)
Calculate confidence intervals
Description
Given a two-column matrix or data frame, returns a three-column data frame with the coefficient estimate plus the lower and upper bounds of the 95% confidence interval.
Usage
calc_ci(ftab)
Arguments
ftab |
A table with two columns, containing coefficients in the first column and standard-errors in the second column. |
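A minimal sketch of this computation, assuming Wald-type intervals built from normal quantiles. The exact multiplier and the output column names used by calc_ci are not stated here, so calc_ci_sketch and its column names are hypothetical.

```r
# Hypothetical re-implementation for illustration; not the pammtools source.
calc_ci_sketch <- function(ftab, level = 0.95) {
  z   <- stats::qnorm(1 - (1 - level) / 2)  # about 1.96 for level = 0.95
  est <- ftab[, 1]                          # first column: coefficient estimates
  se  <- ftab[, 2]                          # second column: standard errors
  data.frame(
    coef      = est,
    ci_lower  = est - z * se,
    ci_upper  = est + z * se,
    row.names = rownames(ftab)
  )
}

ftab <- cbind(estimate = c(0.5, -1.2), se = c(0.1, 0.4))
calc_ci_sketch(ftab)
```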
Create a data frame from all combinations of data frames
Description
Works like expand.grid but for data frames.
Usage
combine_df(...)
Arguments
... |
Data frames that should be combined into one data frame. Rows of the first data frame vary fastest, rows of the last data frame vary slowest. |
Examples
combine_df(
data.frame(x=1:3, y=3:1),
data.frame(x1=c("a", "b"), x2=c("c", "d")),
data.frame(z=c(0, 1)))
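The "expand.grid for data frames" idea can be sketched by running expand.grid over row indices and then binding the selected rows column-wise. combine_df_sketch is a hypothetical helper for illustration and is not necessarily how combine_df is implemented; it assumes the data frames are passed unnamed.

```r
# Hypothetical sketch: all row combinations of several data frames.
combine_df_sketch <- function(...) {
  dfs <- list(...)
  # One index column per data frame; the first column varies fastest
  idx <- expand.grid(lapply(dfs, function(d) seq_len(nrow(d))))
  # Pick the indexed rows of each data frame, then bind them side by side
  parts <- Map(function(d, i) d[i, , drop = FALSE], dfs, idx)
  out <- do.call(cbind, parts)
  rownames(out) <- NULL
  out
}

combine_df_sketch(
  data.frame(x = 1:3, y = 3:1),
  data.frame(x1 = c("a", "b"), x2 = c("c", "d")),
  data.frame(z = c(0, 1)))
```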
Calculate difference in cumulative hazards and respective standard errors
Description
CIs are calculated by sampling coefficients from their posterior and
calculating the cumulative hazard difference nsim times. The CIs
are obtained from the 2.5% and 97.5% quantiles of the simulated differences.
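The simulation idea can be sketched in base R, assuming a log-linear hazard exp(X beta) that is constant within intervals and a multivariate normal approximation to the coefficient posterior. All inputs (beta_hat, V, X1, X2, intlen) are hypothetical stand-ins for what compute_cumu_diff would derive from model, d1 and d2.

```r
set.seed(1)
# Hypothetical inputs: coefficient estimates and their covariance matrix
beta_hat <- c(-2, 0.3)              # intercept, effect of covariate x
V        <- diag(c(0.01, 0.0025))
intlen   <- rep(1, 5)               # five intervals of length 1
X1 <- cbind(1, rep(1, 5))           # design matrix for d1 (x = 1)
X2 <- cbind(1, rep(0, 5))           # design matrix for d2 (x = 0)

nsim  <- 100L
alpha <- 0.05
# Draw from N(beta_hat, V): beta_hat + t(R) %*% z, where V = t(R) %*% R
R <- chol(V)
beta_sim <- beta_hat + t(R) %*% matrix(rnorm(length(beta_hat) * nsim), ncol = nsim)

# Cumulative hazard per draw: running sum of hazard * interval length
cumu <- function(X) apply(exp(X %*% beta_sim) * intlen, 2, cumsum)
diff_sim <- cumu(X1) - cumu(X2)     # intervals x nsim matrix of differences

# Pointwise quantile intervals, one row per interval end point
ci <- t(apply(diff_sim, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2)))
ci
```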
Usage
compute_cumu_diff(
d1,
d2,
model,
alpha = 0.05,
nsim = 100L,
time_var = "tend",
interval_length = "intlen"
)
Arguments
d1 |
A data set used as |
d2 |
See |
model |
A model object for which a predict method is implemented which
returns the design matrix (e.g., |
Formula specials for defining time-dependent covariates
Description
So far, two specials are implemented. concurrent is used when
the goal is to estimate a concurrent effect of the TDC. cumulative
is used when the goal is to estimate a cumulative effect of the TDC. These
should usually not be called directly but rather as part of the formula
argument to as_ped.
See the vignette on data transformation
for details.
Usage
cumulative(..., tz_var, ll_fun = function(t, tz) t >= tz, suffix = NULL)
concurrent(..., tz_var, lag = 0, suffix = NULL)
has_special(formula, special = "cumulative")
Arguments
... |
For |
tz_var |
The name of the variable that stores information on the times at which the TDCs specified in this term were observed. |
ll_fun |
Function that specifies how the lag-lead matrix should be constructed. The first argument is the follow-up time, the second argument is the time of exposure. |
lag |
A single positive number giving the time lag for
a concurrent effect to occur (i.e., the TDC at time of exposure |
formula |
A two sided formula with a |
special |
The name of the special whose existence in the
|
Time-dependent covariates of the patient data set.
Description
This data set contains the time-dependent covariates (TDCs) for the patient
data set. Note that nutrition was recorded for at most 12 days after
ICU admission. The data set includes:
- CombinedID
Unique patient identifier. Can be used to merge with the patient data.
- Study_Day
The calendar (!) day on which calories (or proteins) were administered.
- caloriesPercentage
The percentage of target calories supplied to the patient by the ICU staff.
- proteinGproKG
The amount of protein supplied to the patient by the ICU staff.
Usage
daily
Format
An object of class tbl_df (inherits from tbl, data.frame) with 18797 rows and 4 columns.
dplyr Verbs for ped-Objects
Description
See the dplyr documentation of the respective functions for
descriptions and examples.
Usage
## S3 method for class 'ped'
arrange(.data, ...)
## S3 method for class 'ped'
group_by(.data, ..., .add = FALSE)
## S3 method for class 'ped'
ungroup(x, ...)
## S3 method for class 'ped'
distinct(.data, ..., .keep_all = FALSE)
## S3 method for class 'ped'
filter(.data, ...)
## S3 method for class 'ped'
sample_n(tbl, size, replace = FALSE, weight = NULL, .env = NULL, ...)
## S3 method for class 'ped'
sample_frac(tbl, size = 1, replace = FALSE, weight = NULL, .env = NULL, ...)
## S3 method for class 'ped'
slice(.data, ...)
## S3 method for class 'ped'
select(.data, ...)
## S3 method for class 'ped'
mutate(.data, ...)
## S3 method for class 'ped'
rename(.data, ...)
## S3 method for class 'ped'
summarise(.data, ...)
## S3 method for class 'ped'
summarize(.data, ...)
## S3 method for class 'ped'
transmute(.data, ...)
## S3 method for class 'ped'
inner_join(
x,
y,
by = NULL,
copy = FALSE,
suffix = c(".x", ".y"),
...,
keep = NULL,
na_matches = c("na", "never"),
multiple = "all",
unmatched = "drop",
relationship = NULL
)
## S3 method for class 'ped'
full_join(
x,
y,
by = NULL,
copy = FALSE,
suffix = c(".x", ".y"),
...,
keep = NULL,
na_matches = c("na", "never"),
multiple = "all",
relationship = NULL
)
## S3 method for class 'ped'
left_join(
x,
y,
by = NULL,
copy = FALSE,
suffix = c(".x", ".y"),
...,
keep = NULL,
na_matches = c("na", "never"),
multiple = "all",
unmatched = "drop",
relationship = NULL
)
## S3 method for class 'ped'
right_join(
x,
y,
by = NULL,
copy = FALSE,
suffix = c(".x", ".y"),
...,
keep = NULL,
na_matches = c("na", "never"),
multiple = "all",
unmatched = "drop",
relationship = NULL
)
Arguments
.data |
an object of class |
... |
see |
x |
an object of class |
tbl |
an object of class |
size |
< |
replace |
Sample with or without replacement? |
weight |
< |
.env |
DEPRECATED. |
by |
A join specification created with If To join on different variables between To join by multiple variables, use a
For simple equality joins, you can alternatively specify a character vector
of variable names to join by. For example, To perform a cross-join, generating all combinations of |
copy |
If |
suffix |
If there are non-joined duplicate variables in |
keep |
Should the join keys from both
|
na_matches |
Should two |
multiple |
Handling of rows in
|
unmatched |
How should unmatched keys that would result in dropped rows be handled?
|
relationship |
Handling of the expected relationship between the keys of
|
Value
a modified ped object (except for do)
A formula special used to handle cumulative effect specifications
Description
Can be used in the second part of the formula specification provided
to sim_pexp and should only be used in this
context.
Usage
fcumu(..., by = NULL, f_xyz, ll_fun)
Extract transition information from different objects
Description
Extract transition information from different objects
Usage
from_to_pairs(t_mat, ...)
from_to_pairs2(t_mat, ...)
## S3 method for class 'data.frame'
from_to_pairs(t_mat, from_col = "from", to_col = "to", ...)
Arguments
t_mat |
an object that contains information about possible transitions. |
from_col |
The name of the column in the data frame that contains "from" states. |
to_col |
The name of the column in the data frame that contains "to" states. |
Examples
## Not run:
df = data.frame(id = c(1,1, 2,2), from = c(1, 1, 2, 2), to = c(2, 3, 2, 2))
from_to_pairs(df)
## End(Not run)
(Cumulative) (Step-) Hazard Plots.
Description
geom_hazard is an extension of geom_line and
is optimized for (cumulative) hazard plots. Essentially, it adds a (0,0)
row to the data if one is not already present. Adapted, with slight
modifications, from RcmdrPlugin.KMggplot2.
Usage
geom_hazard(
mapping = NULL,
data = NULL,
stat = "identity",
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE,
...
)
geom_stephazard(
mapping = NULL,
data = NULL,
stat = "identity",
position = "identity",
direction = "vh",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE,
...
)
geom_surv(
mapping = NULL,
data = NULL,
stat = "identity",
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE,
...
)
Arguments
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
stat |
The statistical transformation to use on the data for this layer.
When using a
|
position |
A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The
|
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
... |
Other arguments passed on to
|
direction |
direction of stairs: 'vh' for vertical then horizontal, 'hv' for horizontal then vertical, or 'mid' for step half-way between adjacent x-values. |
See Also
Examples
library(ggplot2)
library(pammtools)
ped <- tumor[10:50,] %>% as_ped(Surv(days, status)~1)
pam <- mgcv::gam(ped_status ~ s(tend), data=ped, family = poisson(), offset = offset)
ndf <- make_newdata(ped, tend = unique(tend)) %>% add_hazard(pam)
# piece-wise constant hazards
ggplot(ndf, aes(x = tend, y = hazard)) +
geom_vline(xintercept = c(0, ndf$tend[c(1, (nrow(ndf)-2):nrow(ndf))]), lty = 3) +
geom_hline(yintercept = c(ndf$hazard[1:3], ndf$hazard[nrow(ndf)]), lty = 3) +
geom_stephazard() +
geom_step(col=2) +
geom_step(col=2, lty = 2, direction="vh")
# cumulative hazard
ndf <- ndf %>% add_cumu_hazard(pam)
ggplot(ndf, aes(x = tend, y = cumu_hazard)) +
geom_hazard() +
geom_line(col=2) # doesn't start at (0, 0)
# survival probability
ndf <- ndf %>% add_surv_prob(pam)
ggplot(ndf, aes(x = tend, y = surv_prob)) +
geom_surv() +
geom_line(col=2) # doesn't start at (0, 1)
Step ribbon plots.
Description
geom_stepribbon is an extension of geom_ribbon and
is optimized for Kaplan-Meier plots with pointwise confidence intervals
or a confidence band. The default direction argument "hv" is
appropriate for right-continuous step functions, such as the hazard rates
returned by pammtools.
Usage
geom_stepribbon(
mapping = NULL,
data = NULL,
stat = "identity",
position = "identity",
direction = "hv",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE,
...
)
Arguments
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
stat |
The statistical transformation to use on the data for this layer.
When using a
|
position |
A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The
|
direction |
direction of stairs: 'vh' for vertical then horizontal, 'hv' for horizontal then vertical, or 'mid' for step half-way between adjacent x-values. |
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
... |
Other arguments passed on to
|
See Also
geom_ribbon geom_stepribbon
Examples
library(ggplot2)
huron <- data.frame(year = 1875:1972, level = as.vector(LakeHuron))
h <- ggplot(huron, aes(year))
h + geom_stepribbon(aes(ymin = level - 1, ymax = level + 1), fill = "grey70") +
geom_step(aes(y = level))
h + geom_ribbon(aes(ymin = level - 1, ymax = level + 1), fill = "grey70") +
geom_line(aes(y = level))
Extract the (Bayesian) covariance matrix of the model coefficients
Description
Returns the covariance matrix that matches the coefficients returned by
get_coefs. For mgcv models this is the Bayesian
posterior covariance matrix object$Vp; for scam models it is the
covariance matrix of the re-parametrized coefficients, object$Vp.t;
otherwise it is vcov(object).
Usage
get_Vp(object, ...)
## Default S3 method:
get_Vp(object, ...)
## S3 method for class 'gam'
get_Vp(object, ...)
## S3 method for class 'scam'
get_Vp(object, ...)
Arguments
object |
A fitted model object. |
... |
Further arguments passed to methods. |
Calculate CIF for one cause
Description
Internal generic dispatching CIF calculation based on the model class.
Usage
get_cif(newdata, object, ...)
## Default S3 method:
get_cif(
newdata,
object,
ci,
time_var,
interval_length = "intlen",
alpha,
nsim,
cause_var,
...
)
Arguments
newdata |
A data frame of new observations, typically created via
|
object |
A fitted model object. The method is dispatched on this argument. |
... |
Additional arguments passed to the respective method. |
Value
A data frame with CIF estimates appended.
Extract model coefficients on the scale of the design matrix
Description
Returns the coefficient vector coefs such that
make_X(object, newdata) %*% coefs yields the linear predictor.
For most models this is simply coef(object). For scam models,
however, coef() returns the coefficients on the underlying
unconstrained scale, while the linear predictor is calculated from the
re-parametrized (partially exponentiated) coefficients
object$coefficients.t.
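To illustrate how these pieces fit together, here is a sketch of the analytic pointwise interval for the linear predictor, assuming X plays the role of make_X(object, newdata), coefs of get_coefs(object) and Vp of get_Vp(object). The numbers are hypothetical; the multiplier 2 mirrors the se_mult = 2 default shown for get_cumu_hazard.

```r
# Hypothetical design matrix, coefficients and covariance matrix
X     <- cbind(1, c(0, 1, 2))
coefs <- c(-1, 0.5)
Vp    <- diag(c(0.04, 0.01))

eta    <- drop(X %*% coefs)              # linear predictor, one value per row
se_eta <- sqrt(rowSums((X %*% Vp) * X))  # sqrt(diag(X Vp X')) without the full n x n matrix

# Interval on the linear-predictor scale, transformed to the hazard scale
hazard   <- exp(eta)
ci_lower <- exp(eta - 2 * se_eta)
ci_upper <- exp(eta + 2 * se_eta)
```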
Usage
get_coefs(object, ...)
## Default S3 method:
get_coefs(object, ...)
## S3 method for class 'scam'
get_coefs(object, ...)
Arguments
object |
A fitted model object. |
... |
Further arguments passed to methods. |
Extract cumulative coefficients (cumulative hazard differences)
Description
These functions are designed to extract (or mimic) the cumulative coefficients
usually used in additive hazards models (Aalen model) to depict (time-varying)
covariate effects. For PAMMs, these are the differences
between cumulative hazards evaluated at covariate values that are identical
except for one covariate. For a numeric covariate of interest, this calculates
\Lambda(t|x+1) - \Lambda(t|x). For non-numeric covariates,
the cumulative hazard of the reference level is subtracted from
the cumulative hazards evaluated at all non-reference levels. Standard
errors are calculated using the delta method.
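A sketch of the delta-method standard error for \Lambda(t|x+1) - \Lambda(t|x), assuming a piece-wise constant hazard exp(X beta) so that the cumulative hazard is a running sum of hazard times interval length. The gradient of that sum with respect to beta is the running sum of hazard * intlen * X. All inputs are hypothetical; this is not the package code.

```r
beta   <- c(-2, 0.3)              # hypothetical coefficients (intercept, x)
V      <- diag(c(0.01, 0.0025))   # hypothetical covariance matrix
intlen <- rep(1, 4)
X1 <- cbind(1, rep(1, 4))         # one row per interval, covariate at x + 1
X0 <- cbind(1, rep(0, 4))         # one row per interval, covariate at x

h1 <- drop(exp(X1 %*% beta))
h0 <- drop(exp(X0 %*% beta))
cumu_diff <- cumsum((h1 - h0) * intlen)   # Lambda(t|x+1) - Lambda(t|x) at each t_k

# Gradient of the difference w.r.t. beta, accumulated over intervals (row k = t_k)
G  <- apply((h1 * intlen) * X1 - (h0 * intlen) * X0, 2, cumsum)
se <- sqrt(rowSums((G %*% V) * G))        # sqrt(g_k' V g_k) for each t_k
```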
Usage
get_cumu_coef(model, data = NULL, terms, ...)
## S3 method for class 'gam'
get_cumu_coef(
model,
data,
terms,
time_var = "tend",
interval_length = "intlen",
...
)
## S3 method for class 'scam'
get_cumu_coef(
model,
data,
terms,
time_var = "tend",
interval_length = "intlen",
...
)
## S3 method for class 'aalen'
get_cumu_coef(model, data = NULL, terms, ci = TRUE, ...)
## S3 method for class 'cox.aalen'
get_cumu_coef(model, data = NULL, terms, ci = TRUE, ...)
Arguments
model |
Object from which to extract cumulative coefficients. |
data |
Additional data if necessary. |
terms |
A character vector of variables for which the cumulative coefficient should be calculated. |
... |
Further arguments passed to methods. |
time_var |
Name of the evaluation time variable in |
interval_length |
Name of the interval-length variable in |
ci |
Logical. Indicates if confidence intervals should be returned as well. |
Calculate (or plot) cumulative effect for all time-points of the follow-up
Description
Calculate (or plot) cumulative effect for all time-points of the follow-up
Usage
get_cumu_eff(data, model, term, z1, z2 = NULL, se_mult = 2)
gg_cumu_eff(data, model, term, z1, z2 = NULL, se_mult = 2, ci = TRUE)
Arguments
data |
Data used to fit the |
model |
A suitable model object which will be used to estimate the
partial effect of |
term |
A character string indicating the model term for which partial effects should be plotted. |
z1 |
The exposure profile for which to calculate the cumulative effect. Can be either a single number or a vector of the same length as the number of unique observation time points. |
z2 |
If provided, calculated cumulative effect is for the difference between the two exposure profiles (g(z1,t)-g(z2,t)). |
se_mult |
Multiplicative factor used to calculate confidence intervals (e.g., lower = fit - 2*se). |
ci |
Logical. Indicates if confidence intervals for the |
Calculate cumulative hazard
Description
Calculate cumulative hazard
Usage
get_cumu_hazard(
newdata,
object,
ci = TRUE,
ci_type = c("default", "delta", "sim"),
time_var = NULL,
se_mult = 2,
interval_length = "intlen",
nsim = 100L,
...
)
Arguments
newdata |
A data frame or list containing the values of the model covariates at which predictions
are required. If this is not provided then predictions corresponding to the
original data are returned. If |
object |
a fitted |
ci |
|
ci_type |
The method by which standard errors/confidence intervals
will be calculated. The default transforms the linear predictor at
the respective intervals. |
time_var |
Name of the variable used for the baseline hazard. Defaults
to |
se_mult |
Factor by which standard errors are multiplied for calculating the confidence intervals. |
interval_length |
The variable in newdata containing the interval lengths.
Can be either bare unquoted variable name or character. Defaults to |
nsim |
Total number of pooled posterior draws used for the interval. |
... |
Further arguments passed to |
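For piece-wise constant hazards, the cumulative hazard at the end of each interval is the running sum of hazard times interval length. The sketch below applies this within each group of a hypothetical newdata; the hazard values are made up and the grouping column is an assumption for illustration.

```r
newdata <- data.frame(
  group  = rep(c("a", "b"), each = 3),
  tend   = rep(c(1, 3, 6), 2),
  intlen = rep(c(1, 2, 3), 2),
  hazard = c(0.10, 0.05, 0.20, 0.20, 0.10, 0.40)
)
# Running sum restarts in every group
newdata$cumu_hazard <- ave(newdata$hazard * newdata$intlen, newdata$group, FUN = cumsum)
newdata
```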
Expand time-dependent covariates to functionals
Description
Given a formula specification of how time-dependent covariates affect the outcome, creates the respective functional covariates as well as auxiliary matrices for time, latency, etc.
Usage
get_cumulative(data, formula)
expand_cumulative(data, func, n_func)
Arguments
data |
Data frame (or similar) in which variables specified in ... will be looked for |
formula |
A formula containing |
func |
Single evaluated |
Obtain interval break points
Description
The default method works for data frames. The list method applies the default method to each data set within the list.
Usage
get_cut(data, formula, cut = NULL, ...)
## Default S3 method:
get_cut(data, formula, cut = NULL, max_time = NULL, event = 1L, ...)
## S3 method for class 'list'
get_cut(
data,
formula,
cut = NULL,
max_time = NULL,
event = 1L,
timescale = "gap",
...
)
Extract event types
Description
Given a formula that specifies the status variable of the outcome, this function
extracts the different event types (except for censoring, specified by
censor_code).
Usage
get_event_types(data, formula, censor_code)
Arguments
data |
Either an object inheriting from data frame or, in case of time-dependent covariates, a list of data frames (of length 2), where the first data frame contains the time-to-event information and static covariates, while the second (and potentially further) data frames contain information on time-dependent covariates and the times at which they have been observed. |
formula |
A two sided formula with a |
censor_code |
Specifies the value of the status variable that indicates
censoring. Often this will be |
Point hazard predictor (backend primitive)
Description
Returns the predicted hazard (response scale) as a plain numeric vector, one
value per row of newdata. Together with sim_hazard this
is the only primitive a new estimation backend must provide: every derived
quantity (cumulative hazard, survival probability, CIF, transition
probabilities) and its simulation-based confidence intervals are built from
these two. Analytic ("default"/"delta") confidence intervals
additionally use make_X/get_coefs/get_Vp.
Usage
get_hazard(object, newdata, ...)
## Default S3 method:
get_hazard(object, newdata, ...)
Arguments
object |
A fitted model object. |
newdata |
A data frame for which the hazard is predicted. |
... |
Further arguments passed to methods. |
Value
A numeric vector of hazards on the response scale.
See Also
The package website (https://adibender.github.io/pammtools/)
has worked examples of implementing get_hazard and
sim_hazard for new estimation backends: the articles
“Defining a new backend: gradient boosting with xgboost” (a bootstrap
tree ensemble) and “Bayesian Baseline PAMMs” (a brms model,
drawing from the posterior).
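A minimal sketch of the S3 pattern, using a hypothetical constant-hazard backend (class const_hazard and fit_const_hazard are invented for illustration). So that the sketch runs on its own, it defines a stand-in generic; with pammtools attached you would define only the method. A complete backend would also need a sim_hazard method, which is not shown.

```r
# Stand-in generic so this runs without pammtools (it would mask pammtools' own).
get_hazard <- function(object, newdata, ...) UseMethod("get_hazard")

# Hypothetical backend: exponential model with a single constant rate
fit_const_hazard <- function(time, status) {
  structure(list(rate = sum(status) / sum(time)), class = "const_hazard")
}

# Method: one hazard value per row of newdata, on the response scale
get_hazard.const_hazard <- function(object, newdata, ...) {
  rep(object$rate, nrow(newdata))
}

fit <- fit_const_hazard(time = c(2, 5, 3), status = c(1, 0, 1))
get_hazard(fit, data.frame(tend = c(1, 2, 3)))
```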
Information on intervals in which times fall
Description
Information on intervals in which times fall
Usage
get_intervals(x, times, ...)
## Default S3 method:
get_intervals(x, times, left.open = TRUE, rightmost.closed = TRUE, ...)
Arguments
x |
An object from which interval information can be obtained,
see |
times |
A vector of times for which corresponding interval information should be returned. |
... |
Further arguments passed to |
left.open |
logical; if true all the intervals are open at left
and closed at right; in the formulas below, |
rightmost.closed |
logical; if true, the rightmost interval,
|
Value
A data.frame containing information on intervals in which
values of times fall.
See Also
Examples
set.seed(111018)
brks <- c(0, 4.5, 5, 10, 30)
int_info(brks)
x <- runif (3, 0, 30)
x
get_intervals(brks, x)
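The left.open and rightmost.closed arguments share their names with base R's findInterval. The sketch below uses findInterval directly to show the interval convention. That get_intervals builds on it is an assumption, and the output columns here are illustrative rather than those of get_intervals.

```r
brks  <- c(0, 4.5, 5, 10, 30)
times <- c(0, 4.5, 5.2, 30)
# left.open = TRUE gives intervals (a, b]; with left.open, rightmost.closed = TRUE
# closes the leftmost interval, so t = 0 falls into [0, 4.5]
idx <- findInterval(times, brks, left.open = TRUE, rightmost.closed = TRUE)
data.frame(times, interval = idx, tstart = brks[idx], tend = brks[idx + 1])
```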
Construct or extract data that represents a lag-lead window
Description
Constructs lag-lead window data set from raw inputs or from data objects
with suitable information stored in attributes, e.g., objects created
by as_ped.
Usage
get_laglead(x, ...)
## Default S3 method:
get_laglead(x, tz, ll_fun, ...)
## S3 method for class 'data.frame'
get_laglead(x, ...)
Arguments
x |
Either a numeric vector of follow-up cut points or a suitable object. |
... |
Further arguments passed to methods. |
tz |
A vector of exposure times |
ll_fun |
Function that specifies how the lag-lead matrix should be constructed. The first argument is the follow-up time, the second argument is the time of exposure. |
Examples
get_laglead(0:10, tz=-5:5, ll_fun=function(t, tz) { t >= tz + 2 & t <= tz + 2 + 3})
gg_laglead(0:10, tz=-5:5, ll_fun=function(t, tz) { t >= tz + 2 & t <= tz + 2 + 3})
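Conceptually, the lag-lead window is ll_fun evaluated on every pair of follow-up time and exposure time. The base R sketch below builds that logical matrix with outer(), reusing the ll_fun from the example above. It illustrates the idea only; get_laglead returns its own data structure, whose layout may differ.

```r
t_follow <- 0:10     # follow-up cut points
t_exp    <- -5:5     # exposure times
ll_fun <- function(t, tz) t >= tz + 2 & t <= tz + 2 + 3
# Entry [i, j] is TRUE when exposure at t_exp[j] lies in the window of t_follow[i]
ll <- outer(t_follow, t_exp, ll_fun)
dimnames(ll) <- list(t = t_follow, tz = t_exp)
ll[1:4, 1:6]
```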
Extract variables from the left-hand-side of a formula
Description
Extract variables from the left-hand-side of a formula
Extract variables from the right-hand side of a formula
Usage
get_lhs_vars(formula)
get_rhs_vars(formula)
Arguments
formula |
A |
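For a plain two-sided formula, base R's all.vars applied to the two sides of the formula gives the idea. This is a sketch, not the package implementation, which may handle specials or "." differently. A one-sided formula has no form[[3]], so that case would need a guard.

```r
form <- Surv(days, status) ~ age + sex
lhs_vars <- all.vars(form[[2]])   # variables on the left-hand side; function names are dropped
rhs_vars <- all.vars(form[[3]])   # variables on the right-hand side
lhs_vars
rhs_vars
```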
Extract variables from the left-hand-side of a formula
Description
Extract variables from the left-hand-side of a formula
Extract variables from the right-hand side of a formula
Usage
get_ped_form(
formula,
data = NULL,
tdc_specials = c("concurrent", "cumulative")
)
Arguments
formula |
A |
Extract plot information for all special model terms
Description
Given a mgcv gamObject (or a
scam object), returns the information
used for the default plots produced by plot.gam
(plot.scam, respectively).
Usage
get_plotinfo(x, ...)
Arguments
x |
a fitted |
... |
Further arguments passed to |
Calculate simulation based confidence intervals
Description
These helpers draw the simulated hazard trajectories once for the whole
(possibly grouped) newdata via sim_hazard – so one set
of draws is shared across groups – and then summarise them into pointwise
quantile intervals. Cumulative quantities are accumulated within each group
of newdata.
Usage
get_sim_ci(newdata, object, alpha = 0.05, nsim = 100L, ...)
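The summarising step can be sketched in base R. The sketch assumes a matrix of simulated hazards with one row per newdata row and one column per draw; here the draws are made up with rgamma rather than obtained from sim_hazard. The same columns (draws) are shared across groups, and accumulation restarts in each group.

```r
set.seed(2)
nsim   <- 200L
alpha  <- 0.05
group  <- rep(c("a", "b"), each = 3)
intlen <- rep(c(1, 2, 3), 2)
# Hypothetical draws: 6 rows of newdata (2 groups x 3 intervals), nsim columns
hazard_draws <- matrix(rgamma(6 * nsim, shape = 10, rate = 100), nrow = 6)

# Cumulative hazard per draw, accumulated within each group
cumu_draws <- apply(hazard_draws * intlen, 2, function(h) ave(h, group, FUN = cumsum))

# Pointwise quantile intervals per row of newdata
ci <- t(apply(cumu_draws, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2)))
colnames(ci) <- c("ci_lower", "ci_upper")
ci
```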
Enumerate plottable univariate smooth terms of a fitted model
Description
Internal helper. Given the model data and a fitted gam
object, returns a tibble with one row per smooth curve to be drawn by
get_terms / gg_smooth. Only smooths that vary over
exactly one numeric covariate are returned. This includes ordinary 1d
smooths (s(), 1d ti()), by-variable smooths
(s(x, by = z)) and factor-smooth interactions
(s(x, fac, bs = "fs"), s(x, fac, bs = "sz")). Tensor and
multivariate smooths (te(), t2(), 2d ti(), s(x, z))
as well as (correlated) random effects (bs = "re", bs = "mrf")
are excluded – use gg_tensor / gg_re for those.
Usage
get_smooth_terms(data, fit)
Arguments
data |
A data frame containing the variables used to fit the model. |
fit |
A fitted model object. |
Details
Smooths that are indexed by a factor (a factor by-variable or the factor
in an fs/sz interaction) are expanded into one row per factor
level, all sharing the same facet so that gg_smooth can
draw them in a single panel, distinguished by colour/fill.
Returns NULL for models without a $smooth component (e.g.
coxph), in which case get_terms falls back
to label-based extraction.
Value
A tibble with columns facet, level, var,
col and the list-column settings, or NULL.
Calculate survival probabilities
Description
Calculate survival probabilities
Usage
get_surv_prob(
newdata,
object,
ci = TRUE,
ci_type = c("default", "delta", "sim"),
se_mult = 2L,
time_var = NULL,
interval_length = "intlen",
nsim = 100L,
...
)
Arguments
newdata |
A data frame or list containing the values of the model covariates at which predictions
are required. If this is not provided then predictions corresponding to the
original data are returned. If |
object |
a fitted |
ci |
|
se_mult |
Factor by which standard errors are multiplied for calculating the confidence intervals. |
time_var |
Name of the variable used for the baseline hazard. Defaults
to |
interval_length |
The variable in newdata containing the interval lengths.
Can be either bare unquoted variable name or character. Defaults to |
nsim |
Total number of pooled posterior draws used for the interval. |
... |
Further arguments passed to |
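The survival probability follows from the cumulative hazard via S(t) = exp(-\Lambda(t)). The sketch uses hypothetical piece-wise constant hazards and prepends S(0) = 1, the starting point that the geom_surv examples suggest is added for plotting.

```r
tend   <- c(1, 3, 6)
hazard <- c(0.10, 0.05, 0.20)
intlen <- c(1, 2, 3)
surv_prob <- exp(-cumsum(hazard * intlen))   # S(t) = exp(-Lambda(t)) at each tend
# Prepend S(0) = 1 for plotting a curve that starts at (0, 1)
data.frame(tend = c(0, tend), surv_prob = c(1, surv_prob))
```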
Extract variables from the left-hand-side of a formula
Description
Extract variables from the left-hand-side of a formula
Extract variables from the right-hand side of a formula
Usage
get_tdc_form(
formula,
data = NULL,
tdc_specials = c("concurrent", "cumulative"),
invert = FALSE
)
Arguments
formula |
A |
Extract variables from the left-hand-side of a formula
Description
Extract variables from the left-hand-side of a formula
Extract variables from the right-hand side of a formula
Usage
get_tdc_vars(formula, specials = "cumulative", data = NULL)
Arguments
formula |
A |
Extract the partial effect of a single smooth curve
Description
Extract the partial effect of a single smooth curve
Usage
get_term(data, fit, spec, n = 100, conf_level = 0.95, ...)
Arguments
data |
A data frame containing variables used to fit the model. The first row is used as the basis for all covariates other than the one being varied (their values are irrelevant for the term-wise contribution). |
fit |
A fitted object of class |
spec |
A single-row tibble (one row of |
n |
Number of points at which to evaluate the smooth over the range of its covariate. |
conf_level |
The confidence level for the pointwise confidence interval. |
... |
Further arguments (currently unused). |
Extract a partial effect for models without a $smooth component
Description
Fallback used for fits such as coxph that support
predict(type = "terms") but expose no mgcv smooth metadata. Matching is
anchored to the variable name (exact, or as a parenthesised argument such as
pspline(karno)) rather than an unanchored substring.
Usage
get_term_legacy(data, fit, term, n = 100, conf_level = 0.95, ...)
Arguments
data |
A data frame containing variables used to fit the model. The first row is used as the basis for all covariates other than the one being varied (their values are irrelevant for the term-wise contribution). |
fit |
A fitted object of class |
term |
A character string naming the model term/variable. |
n |
Number of points at which to evaluate the smooth over the range of its covariate. |
conf_level |
The confidence level for the pointwise confidence interval. |
... |
Further arguments (currently unused). |
Extract the partial effects of univariate smooth model terms
Description
Creates, for each requested univariate smooth, a sequence over the range of the
smooth's numeric covariate, evaluates the term-wise contribution via
predict(fit, newdata = ., type = "terms") and stacks the results into a
tidy data frame.
Usage
get_terms(data, fit, terms = NULL, ...)
Arguments
data |
A data frame containing variables used to fit the model. The first row is used as the basis for all covariates other than the one being varied (their values are irrelevant for the term-wise contribution). |
fit |
A fitted object of class |
terms |
A character vector (can be length one) specifying the terms for
which partial effects will be returned. If |
... |
Further arguments controlling extraction, passed on per term, e.g.
|
Details
For gam fits the requested terms are matched against
the model's smooths (see get_smooth_terms): a bare variable name
(e.g. "tend") selects every univariate smooth over that variable
– the main effect s(tend) as well as any s(tend, by = ...) or
factor-smooth interaction – while an exact smooth label (e.g. "s(tend)")
selects a single smooth. Names that do not match any smooth (for example
parametric factor main effects) are skipped with a warning; use
gg_fixed for those. For factor-indexed smooths one curve per
factor level is returned, identified by the level column.
For models without mgcv smooth metadata (e.g. coxph)
terms must be supplied and is matched against the columns of
predict(type = "terms").
Value
A tibble with columns term, x, level, eff,
se, ci_lower and ci_upper.
Examples
library(survival)
fit <- coxph(Surv(time, status) ~ pspline(karno) + pspline(age), data=veteran)
terms_df <- veteran %>% get_terms(fit, terms = c("karno", "age"))
head(terms_df)
tail(terms_df)
Forest plot of fixed coefficients
Description
Given a model object, returns a data frame with columns variable,
coef (coefficient), ci_lower (lower bound of the 95% confidence interval) and
ci_upper (upper bound of the 95% confidence interval).
Usage
gg_fixed(x, intercept = FALSE, ...)
Arguments
x |
A model object. |
intercept |
Logical, indicating whether intercept term should be included.
Defaults to |
... |
Currently not used. |
See Also
Examples
g <- mgcv::gam(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
data=iris)
gg_fixed(g, intercept=TRUE)
gg_fixed(g)
Plot Lag-Lead windows
Description
Given data defining a lag-lead window, returns the respective plot as a
ggplot2 object.
Usage
gg_laglead(x, ...)
## Default S3 method:
gg_laglead(x, tz, ll_fun, ...)
## S3 method for class 'LL_df'
gg_laglead(
x,
high_col = "grey20",
low_col = "whitesmoke",
grid_col = "lightgrey",
...
)
## S3 method for class 'nested_fdf'
gg_laglead(x, ...)
Arguments
x |
Either a numeric vector of follow-up cut points or a suitable object. |
... |
Further arguments passed to methods. |
tz |
A vector of exposure times |
ll_fun |
Function that specifies how the lag-lead matrix should be constructed. The first argument is the follow-up time, the second argument is the time of exposure. |
high_col |
Color used to highlight exposure times within the lag-lead window. |
low_col |
Color of exposure times outside the lag-lead window. |
grid_col |
Color of grid lines. |
See Also
get_laglead
Examples
## Example 1: supply t, tz, ll_fun directly
gg_laglead(1:10, tz=-5:5,
ll_fun=function(t, tz) { t >= tz + 2 & t <= tz + 2 + 3})
## Example 2: extract information on t, tz and ll_fun from data with the respective attributes
data("simdf_elra", package = "pammtools")
gg_laglead(simdf_elra)
Visualize effect estimates for specific covariate combinations
Description
Depending on the plot function and input, creates either a 1-dimensional slice, a bivariate surface, or a (1D) cumulative effect.
Usage
gg_partial(data, model, term, ..., reference = NULL, ci = TRUE)
gg_partial_ll(
data,
model,
term,
...,
reference = NULL,
ci = FALSE,
time_var = "tend"
)
get_partial_ll(
data,
model,
term,
...,
reference = NULL,
ci = FALSE,
time_var = "tend"
)
Arguments
data |
Data used to fit the |
model |
A suitable model object which will be used to estimate the
partial effect of |
term |
A character string indicating the model term for which partial effects should be plotted. |
... |
Covariate specifications (expressions) that will be evaluated
by looking for variables in |
reference |
If specified, should be a list with covariate value pairs,
e.g. |
ci |
Logical. Indicates if confidence intervals for the |
time_var |
The name of the variable that was used in |
Plot Normal QQ plots for random effects
Description
Plot Normal QQ plots for random effects
Usage
gg_re(x, ...)
Arguments
x |
a fitted |
... |
Further arguments passed to |
See Also
Examples
library(pammtools)
data("patient")
ped <- patient %>%
dplyr::slice(1:100) %>%
as_ped(Surv(Survdays, PatientDied)~ ApacheIIScore + CombinedicuID, id="CombinedID")
pam <- mgcv::gam(ped_status ~ s(tend) + ApacheIIScore + s(CombinedicuID, bs="re"),
data=ped, family=poisson(), offset=offset)
gg_re(pam)
plot(pam, select = 2)
Plot 1D (smooth) effects
Description
Flexible, high-level plotting function for (non-linear) effects conditional on further covariate specifications and potentially relative to a comparison specification.
Usage
gg_slice(data, model, term, ..., reference = NULL, ci = TRUE)
Arguments
data |
Data used to fit the |
model |
A suitable model object which will be used to estimate the
partial effect of |
term |
A character string indicating the model term for which partial effects should be plotted. |
... |
Covariate specifications (expressions) that will be evaluated
by looking for variables in |
reference |
If specified, should be a list with covariate value pairs,
e.g. |
ci |
Logical. Indicates if confidence intervals for the |
Examples
ped <- tumor[1:200, ] %>% as_ped(Surv(days, status) ~ . )
model <- mgcv::gam(ped_status~s(tend) + s(age, by = complications), data=ped,
family = poisson(), offset=offset)
make_newdata(ped, age = seq_range(age, 20), complications = levels(complications))
gg_slice(ped, model, "age", age=seq_range(age, 20), complications=levels(complications))
gg_slice(ped, model, "age", age=seq_range(age, 20), complications=levels(complications),
ci = FALSE)
gg_slice(ped, model, "age", age=seq_range(age, 20), complications=levels(complications),
reference=list(age = 50))
Plot smooth 1d terms of gam objects
Description
Given a gam model, this convenience function returns a plot of its univariate
smooth terms. If terms is not specified, all univariate smooths are
plotted; otherwise only the requested ones (see get_terms for how
terms are matched). Different smooths are faceted. Smooths that are indexed by a
factor – a factor by-variable or a factor-smooth interaction
(bs = "fs"/"sz") – are drawn in a single facet with one
coloured/filled curve per factor level.
Usage
gg_smooth(x, ...)
## Default S3 method:
gg_smooth(x, fit, ...)
Arguments
x |
A data frame or object of class |
... |
Further arguments passed to |
fit |
A model object. |
Value
A ggplot object.
See Also
get_terms
Examples
g1 <- mgcv::gam(Sepal.Length ~ s(Sepal.Width) + s(Petal.Length), data=iris)
gg_smooth(iris, g1, terms=c("Sepal.Width", "Petal.Length"))
# all univariate smooths (terms omitted)
gg_smooth(iris, g1)
# factor-by smooth: one coloured curve per Species
g2 <- mgcv::gam(Sepal.Length ~ s(Sepal.Width, by = Species), data = iris)
gg_smooth(iris, g2, terms = "Sepal.Width")
Plot State Occupation Probabilities
Description
Creates a stacked area plot of state occupation probabilities over time, computed from transition probability matrices stored as an attribute of the input data. Optionally facets by a grouping variable.
Usage
gg_state_occupation(
newdata,
init_state,
group_var = NULL,
time_var = "tend",
ncol = NULL
)
Arguments
newdata |
A data frame with an attribute |
init_state |
A numeric vector specifying the initial state distribution.
Should sum to 1 and have length equal to the number of states. For example,
|
group_var |
A character string giving the name of the column in
|
time_var |
A character string giving the name of the time variable in
|
ncol |
An integer specifying the number of columns in the facet wrap.
If |
Value
A ggplot object showing stacked-area state occupation
probabilities over time, optionally faceted by group_var.
Plot tensor product effects
Description
Given a gam model, this convenience function returns a ggplot2 object
depicting 2d smooth terms specified in the model as heat/contour plots. If
more than one 2d smooth term is present, individual terms are faceted.
Usage
gg_tensor(x, ci = FALSE, ...)
Arguments
x |
a fitted |
ci |
A logical value indicating whether confidence intervals should be
calculated and returned. Defaults to |
... |
Further arguments passed to |
See Also
Examples
g <- mgcv::gam(Sepal.Length ~ te(Sepal.Width, Petal.Length), data=iris)
gg_tensor(g)
gg_tensor(g, ci=TRUE)
gg_tensor(update(g, .~. + te(Petal.Width, Petal.Length)))
Checks if data contains time-dependent covariates
Description
Checks if data contains time-dependent covariates
Usage
has_tdc(data, id_var)
Arguments
data |
A data frame (potentially) containing time-dependent covariates. |
id_var |
A character indicating the grouping variable. For each covariate,
it is checked whether its values change within a group specified by
|
Value
Logical. TRUE if data contains time-dependent covariates, else FALSE.
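A minimal base-R sketch of the check described above. It assumes that data holds only the id column and covariates (a time column would trivially vary within ids). The name has_tdc_sketch() is hypothetical and this is not the package implementation.

```r
# Hypothetical illustration (not pammtools code): a covariate is treated as
# time-dependent if it takes more than one value within at least one id group.
has_tdc_sketch <- function(data, id_var) {
  covars <- setdiff(names(data), id_var)
  varies <- vapply(covars, function(v) {
    # TRUE if the covariate changes within any group defined by id_var
    any(tapply(data[[v]], data[[id_var]], function(x) length(unique(x)) > 1))
  }, logical(1))
  any(varies)
}

static <- data.frame(id = c(1, 1, 2, 2), age = c(50, 50, 60, 60))
tdc    <- data.frame(id = c(1, 1, 2, 2), age = c(50, 50, 60, 60),
                     dose = c(1, 2, 1, 1))
has_tdc_sketch(static, "id")  # FALSE
has_tdc_sketch(tdc, "id")     # TRUE
```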
Analytic hazard with confidence interval (coefficient models)
Description
Adds a hazard column and, for ci_type = "default" or "delta", the
columns se, ci_lower and ci_upper, computed from the
linear-predictor triplet make_X/get_coefs/get_Vp. This is
the analytic CI path (also used to evaluate reference hazard ratios and
type = "link"). Simulation-based CIs instead use the
get_hazard + sim_hazard primitives.
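The sketch below illustrates, in base R, the kind of analytic interval meant by ci_type = "default": the interval is built on the linear-predictor (log-hazard) scale and then exponentiated. X, beta and V are hypothetical stand-ins for make_X(), get_coefs() and get_Vp(), and reporting se on the link scale is an assumption. A delta-method variant would presumably use se_h = h * se on the hazard scale instead.

```r
# Hypothetical sketch (not pammtools code) of a "transform the linear predictor"
# confidence interval for a log-linear hazard model.
hazard_ci_sketch <- function(X, beta, V, se_mult = 2) {
  eta <- drop(X %*% beta)                # linear predictor (log-hazard)
  se  <- sqrt(rowSums((X %*% V) * X))    # sqrt(diag(X V X^T))
  data.frame(hazard   = exp(eta),
             se       = se,
             ci_lower = exp(eta - se_mult * se),
             ci_upper = exp(eta + se_mult * se))
}

X    <- cbind(1, c(0, 1))                # two hypothetical prediction rows
beta <- c(-2, 0.5)
V    <- diag(c(0.04, 0.01))
res  <- hazard_ci_sketch(X, beta, V)
res
```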
Usage
hazard_ci(
object,
newdata,
reference = NULL,
ci = TRUE,
type = c("response", "link"),
ci_type = c("default", "delta", "sim"),
time_var = NULL,
se_mult = 2,
...
)
Arguments
object |
a fitted |
newdata |
A data frame or list containing the values of the model covariates at which predictions
are required. If this is not provided then predictions corresponding to the
original data are returned. If |
reference |
A data frame with number of rows equal to |
ci |
|
type |
Either |
ci_type |
The method by which standard errors/confidence intervals
will be calculated. Default transforms the linear predictor at
respective intervals. |
time_var |
Name of the variable used for the baseline hazard. Defaults
to |
se_mult |
Factor by which standard errors are multiplied for calculating the confidence intervals. |
... |
Further arguments passed to |
Build the subject-by-interval prediction grid used for IC imputation
Description
Constructs the (subjects \times intervals) grid on the fixed
cut-points and evaluates the lpmatrix of the fitted PAMM once, so that
across imputations only the linear predictor (and hence the hazard) needs to
be recomputed for a new coefficient draw. Rows are subject-major: the first
n_int rows belong to subject 1, the next n_int to subject 2,
etc., so that matrix(h, nrow = n_int) has one column per subject.
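To illustrate the subject-major layout and why only the linear predictor needs to be recomputed per draw, here is a toy base-R sketch. The design matrix X (intercept plus one covariate x) is a hypothetical stand-in for the lpmatrix, and hazard_matrix() is an illustrative name, not a package function.

```r
# Hypothetical toy setting: 3 subjects, 4 intervals.
n_sub <- 3
n_int <- 4
grid <- expand.grid(interval = seq_len(n_int), id = seq_len(n_sub))
# expand.grid varies the first column fastest, so rows are subject-major:
# rows 1-4 belong to subject 1, rows 5-8 to subject 2, and so on.
x <- c(0.2, -0.1, 0.5)[grid$id]
X <- cbind(1, x)   # stands in for the lpmatrix, built once

# Only this cheap step is repeated for every coefficient draw
hazard_matrix <- function(X, beta, n_int) {
  h <- exp(drop(X %*% beta))
  matrix(h, nrow = n_int)   # column j holds the hazards of subject j
}

H <- hazard_matrix(X, beta = c(-1, 0.3), n_int = n_int)
dim(H)   # 4 3
```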
Usage
ic_pred_cache(object, ic, cut, cause_levels = NULL, cause_var = "cause")
Arguments
object |
A fitted |
ic |
A data frame as returned by |
cut |
The fixed vector of interval cut-points (shared across imputations). |
cause_levels |
Optional character vector of competing-risk cause levels.
When supplied, one |
cause_var |
Name of the cause column expected by the model. |
Value
A list with the interval information ii, n_int,
n_sub, and either a single design matrix X or a list
X_list (competing risks).
Draw event times and causes for interval-censored competing-risks subjects
Description
Draws the event time from the all-cause conditional hazard within
(L, R] (as in impute_ic_times) and assigns a cause. If the
cause is observed it is retained (the time is then drawn by a rejection step
so that it follows the cause-specific conditional density); if the cause is
unknown it is sampled with probability h_k(T)/h_\bullet(T) at the imputed
time, mirroring the cause-assignment in sim_pexp and the CIF
increment in get_cif.
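The cause-assignment step for an unknown cause can be sketched as a single categorical draw with probabilities h_k(T)/h_\bullet(T). This sketch omits the rejection step used when the cause is observed. assign_cause() and the cause names are hypothetical.

```r
# Hypothetical sketch (not pammtools code) of assigning a cause at imputed time T.
# h_causes:    named vector of cause-specific hazards h_k(T)
# cause_known: observed cause, or NA if the cause is unknown
assign_cause <- function(h_causes, cause_known = NA) {
  if (!is.na(cause_known)) {
    return(cause_known)   # observed causes are retained
  }
  # P(cause = k) = h_k(T) / sum_j h_j(T)
  sample(names(h_causes), size = 1, prob = h_causes / sum(h_causes))
}

set.seed(42)
draws <- replicate(10000, assign_cause(c(relapse = 0.1, death = 0.3)))
mean(draws == "death")   # close to 0.3 / 0.4 = 0.75
```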
Usage
impute_ic_cr(object, ic, cut, beta = NULL, cache = NULL, cause_known = NULL)
Arguments
object |
A fitted |
ic |
A data frame as returned by |
cut |
The fixed vector of interval cut-points (shared across imputations). |
beta |
Coefficient vector to evaluate the hazard at. Defaults to
|
cache |
Optional pre-built cache from |
cause_known |
Optional vector (length |
Value
A list with numeric time and character cause (both
length nrow(ic); cause is NA for censored rows).
Draw event times for interval-censored subjects from the conditional hazard
Description
For a fitted PAMM with piecewise-constant hazard, draws
T_i \sim p(T \mid L_i < T \le R_i, x_i, \theta) by inverting the
cumulative-hazard increment between L_i and R_i. Exact and
right-censored observations are returned unchanged (right-censored subjects
are not imputed: they contribute correctly as censored at ic_L).
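A minimal base-R sketch of the inversion step, assuming a known piecewise-constant hazard h on the intervals defined by cut, strictly positive hazards, and R <= max(cut). The conditional CDF on (L, R] is F(t) = (1 - exp(-(H(t) - H(L)))) / (1 - exp(-(H(R) - H(L)))), so a uniform draw is mapped to a cumulative-hazard increment and then back to a time. All function names are hypothetical.

```r
# Hypothetical helpers (not pammtools code).
# Piecewise-constant hazard: h[j] applies on (cut[j], cut[j + 1]].
cumhaz_pw <- function(t, cut, h) {
  lo <- cut[-length(cut)]
  hi <- cut[-1]
  vapply(t, function(ti) sum(h * pmax(0, pmin(ti, hi) - lo)), numeric(1))
}

# Inverse of the cumulative hazard (assumes all h > 0)
inv_cumhaz_pw <- function(H_target, cut, h) {
  lo <- cut[-length(cut)]
  hi <- cut[-1]
  H_hi <- cumsum(h * (hi - lo))       # cumulative hazard at interval ends
  H_lo <- c(0, H_hi[-length(H_hi)])   # ... and at interval starts
  j <- which(H_hi >= H_target)[1]     # interval in which the target is reached
  lo[j] + (H_target - H_lo[j]) / h[j]
}

# Draw T ~ p(T | L < T <= R) by inverse-CDF sampling
draw_ic_time <- function(L, R, cut, h, u = runif(1)) {
  HL <- cumhaz_pw(L, cut, h)
  HR <- cumhaz_pw(R, cut, h)
  delta <- -log(1 - u * (1 - exp(-(HR - HL))))   # cumulative-hazard increment
  inv_cumhaz_pw(HL + delta, cut, h)
}

cut <- c(0, 2, 4)
h <- c(0.5, 1)
set.seed(1)
t_imp <- replicate(1000, draw_ic_time(L = 1, R = 3, cut = cut, h = h))
range(t_imp)   # within (1, 3]
```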
Usage
impute_ic_times(object, ic, cut, beta = NULL, cache = NULL)
Arguments
object |
A fitted |
ic |
A data frame as returned by |
cut |
The fixed vector of interval cut-points (shared across imputations). |
beta |
Coefficient vector to evaluate the hazard at. Defaults to
|
cache |
Optional pre-built cache from |
Value
Numeric vector of (possibly imputed) event times, length
nrow(ic).
Create start/end times and interval information
Description
Given interval break points, returns a data frame with information on
interval start time, interval end time, interval length and a factor
variable indicating the interval (left-open intervals). If an object of class
ped is provided, extracts unique interval information from the object.
Usage
int_info(x, ...)
## Default S3 method:
int_info(x, min_time = 0L, ...)
## S3 method for class 'data.frame'
int_info(x, min_time = 0L, ...)
## S3 method for class 'ped'
int_info(x, ...)
## S3 method for class 'pamm'
int_info(x, ...)
Arguments
x |
A numeric vector of cut points in which the follow-up should be
partitioned in or object of class |
... |
Currently ignored. |
min_time |
Only intervals that have lower borders larger than this value will be included in the resulting data frame. |
Value
A data frame containing the start and end times of the
intervals specified by the x argument, as well as the interval
length, the interval mid-point and a factor variable indicating the intervals.
See Also
as_ped ped_info
Examples
## create interval information from cut points
int_info(c(1, 2.3, 5))
## extract interval information used to create ped object
tdf <- data.frame(time=c(1, 2.3, 5), status=c(0, 1, 0))
ped <- tdf %>% as_ped(Surv(time, status)~., id="id")
int_info(ped)
Detect, parse and transform interval-censored survival data
Description
Interval-censored (IC) data record the event time of subject i only up
to an interval (L_i, R_i]. pammtools handles such data via
multiple imputation (MI): exact event times are repeatedly drawn from the
model-based conditional distribution and the resulting (exact) data sets are
transformed and re-fit using the standard right-censored PAMM pipeline (see
pamm_ic). The functions documented here implement the
preprocessing building blocks of that workflow.
Usage
detect_ic(formula, data)
parse_ic_surv(formula, data, id = "id")
resolve_ic_cut(ic, cut = NULL, max_time = NULL)
ic_event_data(ic, t_imp)
drop_zero_followup(evd, warn = TRUE)
as_ped_ic(data, formula, cut = NULL, max_time = NULL, id = "id", ...)
Arguments
formula |
A two-sided formula whose left-hand side is an interval-
censored |
data |
A data frame containing the variables referenced in
|
id |
Name of the subject identifier column. If it does not exist in
|
ic |
A data frame as returned by |
cut |
Optional numeric vector of interval cut-points. If |
max_time |
Optional numeric scalar; cut-points are capped at this value. |
t_imp |
Numeric vector of imputed event times (length |
evd |
A data frame with a |
warn |
Logical; emit a one-time warning when rows are dropped. |
... |
Further arguments passed to the |
Details
IC data are specified through the standard survival interface, i.e. a
three-argument response of the form Surv(L, R, type = "interval2").
The four observation types are encoded as usual:
- exact: L = R (known event time).
- right-censored: R = \infty (event after L).
- left-censored: L = 0 (event in (0, R]).
- interval-censored: 0 < L < R < \infty (event in (L, R]).
Functions
- detect_ic(): Detect whether formula specifies interval-censored data. Returns "interval2" for interval-censored responses and "none" otherwise (right-censored and left-truncated counting-process responses both return "none" and are handled by the standard pipeline).
- parse_ic_surv(): Parse the interval-censored response into a tibble of lower/upper bounds and observation type, augmenting data with the columns ic_L, ic_R and ic_kind (a factor with levels exact, right, left, interval) and, if absent, an id column.
- resolve_ic_cut(): Resolve a fixed vector of interval cut-points for the IC transformation. When cut is supplied, it is sorted and de-duplicated; otherwise the unique finite interval endpoints (the inspection times) are used, capped at max_time. The resolved cut must be shared across all imputations so that the PED interval structure is consistent across refits. Note that mgcv's centering constraints can still make the lpmatrix differ between fits.
- ic_event_data(): Build the subject-level data frame of (exact) event times implied by an imputation. Exact observations keep their event time; right-censored observations are censored at ic_L; left- and interval-censored observations take the imputed time t_imp. Returns a data frame with the response columns .ped_time and .ped_status, ready for split_data via a two-argument Surv.
- drop_zero_followup(): Drop subjects with non-positive follow-up time (e.g. right-censored at time 0 with no observed inspection), which carry no information and would break the interval split. Returns the filtered data.
- as_ped_ic(): Transform interval-censored data into an initial (midpoint-imputed) PED object. Left- and interval-censored event times are initialised at the interval midpoint ((L+R)/2, and R/2 for left-censored observations); this object is only an initialiser for pamm_ic and should not be used for inference on its own. The parsed interval bounds and the resolved cut-points are attached as the "ic" and "breaks" attributes.
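The classification of observation types and the midpoint initialisation can be illustrated in base R. This sketch codes right-censoring as R = Inf, following the list of observation types above; ic_kind() and midpoint_event_data() are hypothetical helpers, not the package's parse_ic_surv() or as_ped_ic().

```r
# Hypothetical helpers (not pammtools code).
ic_kind <- function(L, R) {
  kind <- ifelse(L == R, "exact",
          ifelse(is.infinite(R), "right",
          ifelse(L == 0, "left", "interval")))
  factor(kind, levels = c("exact", "right", "left", "interval"))
}

# Midpoint initialisation: (L + R) / 2 for left-/interval-censored rows
# (which equals R / 2 when L = 0); right-censored rows stay censored at L.
midpoint_event_data <- function(L, R) {
  kind <- as.character(ic_kind(L, R))
  time <- ifelse(kind %in% c("left", "interval"), (L + R) / 2, L)
  status <- as.integer(kind != "right")
  data.frame(time = time, status = status, kind = kind)
}

L <- c(2, 3, 0, 1)
R <- c(2, Inf, 4, 5)
d <- midpoint_event_data(L, R)
d
```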
Create design matrix from a suitable object
Description
Create design matrix from a suitable object
Usage
make_X(object, ...)
## Default S3 method:
make_X(object, newdata, ...)
## S3 method for class 'gam'
make_X(object, newdata, ...)
Arguments
object |
A suitable object from which a design matrix can be generated. Often a model object. |
newdata |
A data frame from which the design matrix will be constructed |
Create design matrix from a suitable object
Description
Create design matrix from a suitable object
Usage
## S3 method for class 'scam'
make_X(object, newdata, ...)
Arguments
object |
A suitable object from which a design matrix can be generated. Often a model object. |
newdata |
A data frame from which the design matrix will be constructed |
Construct a data frame suitable for prediction
Description
This function provides a flexible interface to create a data set that
can be plugged in as newdata argument to a suitable predict
function (or similar).
The function is particularly useful in combination with one of the
add_* functions, e.g., add_term,
add_hazard, etc.
Usage
make_newdata(x, ...)
## Default S3 method:
make_newdata(x, ...)
## S3 method for class 'ped'
make_newdata(x, ...)
## S3 method for class 'fped'
make_newdata(x, ...)
Arguments
x |
A data frame (or object that inherits from |
... |
Covariate specifications (expressions) that will be evaluated
by looking for variables in |
Details
Depending on the type of variables in x, mean or modus values
will be used for variables not specified in the ellipsis
(see also sample_info). If x is an object
that inherits from class ped, useful data set completion will be
attempted depending on the variables specified in the ellipsis. This is especially
useful when creating a data set with different time points, e.g. to
calculate survival probabilities over time (add_surv_prob)
or to calculate time-varying covariate effects (add_term).
To do so, the time variable has to be specified in ..., e.g.,
tend = seq_range(tend, 20). The problem with this specification is that
not all values produced by seq_range(tend, 20) will be actual values
of tend used at the stage of estimation (and in general, it will
often be tedious to specify exact tend values). make_newdata
therefore finds the correct interval and sets tend to the respective
interval endpoint. For example, if the intervals of the PED object are
(0,1], (1,2], then tend = 1.5 will be set to 2.
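The snapping of arbitrary times to interval end points can be sketched with base R's findInterval() and left-open intervals. snap_to_interval_end() is a hypothetical helper and assumes 0 < t <= max(cut); times outside that range are not handled.

```r
# Hypothetical helper (not pammtools code): map times to the end point of the
# left-open interval (cut[j], cut[j + 1]] that contains them.
snap_to_interval_end <- function(t, cut) {
  j <- findInterval(t, cut, left.open = TRUE)   # cut[j] < t <= cut[j + 1]
  cut[j + 1]
}

snap_to_interval_end(c(0.5, 1, 1.5, 2), cut = c(0, 1, 2))   # 1 1 2 2
```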
The returned data frame contains tend, id, the user-supplied
covariates (and cause/transition for competing risks /
multi-state models). Internal PED columns tstart, intlen,
interval, offset, and ped_status are dropped.
Downstream add_* functions reconstruct intlen on demand via
reconstruct_intlen() when needed.
See examples below.
Examples
# General functionality
tumor %>% make_newdata()
tumor %>% make_newdata(age=c(50))
tumor %>% make_newdata(days=seq_range(days, 3), age=c(50, 55))
tumor %>% make_newdata(days=seq_range(days, 3), status=unique(status), age=c(50, 55))
# mean/modus values of unspecified variables are calculated over whole data
tumor %>% make_newdata(sex=unique(sex))
tumor %>% group_by(sex) %>% make_newdata()
# Examples for PED data
ped <- tumor %>% slice(1:3) %>% as_ped(Surv(days, status)~., cut = c(0, 500, 1000))
ped %>% make_newdata(age=c(50, 55))
# if time information is specified, tend is matched to the intervals
# of the PED object
ped %>% make_newdata(tend = c(1000), age = c(50, 55))
ped %>% make_newdata(tend = unique(tend))
ped %>% group_by(sex) %>% make_newdata(tend = unique(tend))
# tend is set to the end point of respective interval:
ped <- tumor %>% as_ped(Surv(days, status)~.)
seq_range(ped$tend, 3)
make_newdata(ped, tend = seq_range(tend, 3))
Create matrix components for cumulative effects
Description
These functions are called internally by get_cumulative and
should usually not be called directly.
Usage
make_time_mat(data, nz)
make_latency_mat(data, tz)
make_lag_lead_mat(data, tz, ll_fun = function(t, tz) t >= tz)
make_z_mat(data, z_var, nz, ...)
Arguments
data |
A data set (or similar) from which meta information on cut-points, interval-specific time, covariates etc. can be obtained. |
z_var |
Name of the variable that should be transformed into functional covariate format
suitable for fitting cumulative effects in |
Calculate the modus
Description
Calculate the modus
Usage
modus(var)
Arguments
var |
An atomic vector |
Create nested data frame from data with time-dependent covariates
Description
Provides methods to nest data with time-dependent covariates (TDCs).
A formula must be provided where the right-hand side (RHS) contains
the structure of the TDCs.
Usage
nest_tdc(data, formula, ...)
## Default S3 method:
nest_tdc(data, formula, ...)
## S3 method for class 'list'
nest_tdc(data, formula, ...)
Arguments
data |
A suitable data structure (e.g. unnested data frame with
concurrent TDCs or a list where each element is a data frame, potentially
containing TDCs as specified in the RHS of |
formula |
A two-sided formula with a two-part RHS, where the second part specifies the structure of the TDCs. |
... |
Further arguments passed to methods. |
Time until nuclear power plant construction in different regions.
Description
This dataset originates from the IAEA and contains 730 power plants. The data contains the following variables:
- months
Construction time
- status
Event indicator (0 = censored, 1 = construction finished).
- region
Continent, Africa/Asia, America, Europe, Soviet Union and Warsaw Pact
Usage
nuclear
Format
An object of class data.frame with 724 rows and 3 columns.
Fit a piece-wise exponential additive model
Description
A thin wrapper around gam in which some arguments are
prespecified:
family=poisson() and offset=data$offset.
These two cannot be overwritten. In many cases it will also be advisable to
set method="REML".
Usage
pamm(formula, data = list(), ..., trafo_args = NULL, engine = "gam")
is.pamm(x)
## S3 method for class 'pamm'
print(x, ...)
## S3 method for class 'pamm'
summary(object, ...)
## S3 method for class 'pamm'
plot(x, ...)
Arguments
formula |
A GAM formula, or a list of formulae (see |
data |
A data frame or list containing the model response variable and
covariates required by the formula. By default the variables are taken
from |
... |
Further arguments passed to |
trafo_args |
Deprecated. A named list passed to |
engine |
Character name of the function that will be called to fit the
model. The intended entries are |
x |
Any R object. |
object |
An object of class |
See Also
Examples
ped <- tumor[1:100, ] %>%
as_ped(Surv(days, status) ~ complications, cut = seq(0, 3000, by = 50))
pam <- pamm(ped_status ~ s(tend) + complications, data = ped)
summary(pam)
## Deprecated: trafo_args inline transformation (use as_ped() instead)
# ped2 <- as_ped(tumor[1:100, ], Surv(days, status) ~ complications)
# pamm(ped_status ~ s(tend) + complications, data = ped2)
Fit a competing-risks PAMM to interval-censored data via multiple imputation
Description
Competing-risks extension of pamm_ic. The event time is drawn
from the all-cause conditional hazard within (L, R] and a cause is
assigned: observed causes are retained (with the time drawn so that it follows
the cause-specific conditional density, via rejection), unknown causes are
sampled with probability proportional to the cause-specific hazards at the
imputed time (see impute_ic_cr). Each completed data set is
transformed with as_ped_cr (cause-specific hazards) and re-fit.
Cf. Delord & Genin (2016) for MI of interval-censored competing-risks data.
Usage
pamm_ic_cr(
formula,
data,
cause,
model_formula = NULL,
cut = NULL,
max_time = NULL,
m = 10L,
iter = 1L,
censor_code = 0L,
id = "id",
engine = "gam",
...
)
Arguments
formula |
A two-sided formula whose left-hand side is an interval-censored
response |
data |
A data frame in standard (one row per subject) format. |
cause |
Name of the column in |
model_formula |
Optional model formula passed to |
cut |
Optional fixed vector of interval cut-points shared across all
imputations. If |
max_time |
Optional cap on the cut-points. |
m |
Number of imputations (default 10). |
iter |
Number of impute-refit iterations per imputation chain (default
|
censor_code |
Value of |
id |
Name of the subject identifier column. |
engine |
Estimation engine passed to |
... |
Further arguments passed to |
Value
An object of class pamm_ic with type = "cr"; fits
are cause-specific (stacked ped_cr) pamm objects and
cause_levels records the competing causes.
See Also
Pooling of multiple-imputation PAMM fits
Description
Inference for interval-censored PAMMs (pamm_ic) pools the
m re-fits by drawing from each fit's empirical-Bayes posterior
N(\hat\beta^{(m)}, V_\beta^{(m)}) and propagating every draw through the
quantity of interest using that fit's own design matrix, then taking
empirical quantiles of the combined draws. Because mgcv's identifiability
constraints make the (centered) spline basis depend on each imputed data set,
the design matrix is not shared across fits, so each fit must be
evaluated with its own lpmatrix. Before empirical quantiles are taken,
the per-fit prediction draws are shifted on the quantity-of-interest scale so
their between-imputation component has Rubin's finite-m variance
(1 + 1/M)B rather than the raw mixture variance
(M - 1)B/M. Point estimates are the average of the per-fit point
estimates (the MI estimate).
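One way to obtain the described shift for a scalar quantity: the empirical mixture of per-fit draws has between-imputation variance (M - 1)B/M, so moving each fit's draws by (k - 1)(Q_m - \bar Q) with k = sqrt((M + 1)/(M - 1)) inflates it to (1 + 1/M)B. This is our reading of the description, not the package code; rubin_shift() and the example values are hypothetical.

```r
# Hypothetical sketch (not pammtools code).
# draws: list of M numeric vectors (per-fit draws of a scalar quantity)
# est:   the M per-fit point estimates
rubin_shift <- function(draws, est) {
  M <- length(est)
  qbar <- mean(est)
  k <- sqrt((M + 1) / (M - 1))   # scales the between variance to (1 + 1/M) B
  Map(function(d, q) d + (k - 1) * (q - qbar), draws, est)
}

set.seed(3)
est <- c(1.0, 1.2, 0.9, 1.1)
draws <- lapply(est, function(q) rnorm(500, mean = q, sd = 0.1))
pooled_draws <- unlist(rubin_shift(draws, est))
# MI point estimate and pooled 95% interval from empirical quantiles
c(estimate = mean(est), quantile(pooled_draws, c(0.025, 0.975)))
```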
Details
These methods are dispatched automatically by add_hazard,
add_cumu_hazard, add_surv_prob and
add_cif when given a pamm_ic object.
Parse the factor level from a factor-by smooth label
Description
Fallback for the rare case where by.level is unavailable. A label such as
"s(tend):metastasesyes" encodes the level ("yes") as the suffix of
the by-variable name ("metastases").
Usage
parse_by_level(label, by, lvls)
Survival data of critically ill ICU patients
Description
A data set containing the survival time (or hospital release time) among other covariates. The full data is available here. The following variables are provided:
- Year
The year of ICU Admission
- CombinedicuID
Intensive Care Unit (ICU) ID
- CombinedID
Patient identifier
- Survdays
Survival time of patients. Here it is assumed that patients survive until t=30 if released from hospital.
- PatientDied
Status indicator; 1=death, 0=censoring
- survhosp
Survival time in hospital. Here it is assumed that patients are censored at time of hospital release (potentially informative)
- Gender
Male or female
- Age
The patient's age at Admission
- AdmCatID
Admission category: medical, surgical elective or surgical emergency
- ApacheIIScore
The patient's Apache II Score at Admission
- BMI
Patient's Body Mass Index
- DiagID2
Diagnosis at admission in 9 categories
Usage
patient
Format
An object of class data.frame with 2000 rows and 12 columns.
Extract interval information and mean/modus values for covariates
Description
Given an object of class ped, returns data frame with one row for each
interval containing interval information, mean values for numerical
variables and modus for non-numeric variables in the data set.
Usage
ped_info(ped)
## S3 method for class 'ped'
ped_info(ped)
Arguments
ped |
An object of class |
Value
A data frame with one row for each unique interval in ped.
See Also
Examples
ped <- tumor[1:4,] %>% as_ped(Surv(days, status)~ sex + age)
ped_info(ped)
Pool a list of (stripped) imputation fits into a pooled summary object
Description
Combines the m imputation fits with Rubin's rules: the pooled
parametric coefficients use \bar Q and
V = \bar W + (1 + 1/m) B. Smooth-term p-values are pooled with the
median-p rule (see references in strip_pamm_fit). Because
mgcv smooth coefficients can use different centered bases across
imputations, this returns a plain pooled summary object rather than a
gam: add_*() methods evaluate every imputation fit with its own
design matrix for predictions.
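Rubin's rules for the parametric coefficients can be sketched in base R as follows. The sketch covers only \bar Q and V = \bar W + (1 + 1/m) B, not the median-p rule for smooth terms. pool_rubin() and the inputs are hypothetical.

```r
# Hypothetical sketch (not pammtools code) of Rubin's rules for a coefficient vector.
# fits_coef: list of m coefficient vectors; fits_vcov: list of m covariance matrices
pool_rubin <- function(fits_coef, fits_vcov) {
  m <- length(fits_coef)
  Q <- do.call(rbind, fits_coef)       # m x p matrix of estimates
  Qbar <- colMeans(Q)                  # pooled point estimate
  Wbar <- Reduce(`+`, fits_vcov) / m   # mean within-imputation covariance
  B <- cov(Q)                          # between-imputation covariance
  list(coef = Qbar, vcov = Wbar + (1 + 1/m) * B)
}

coefs <- list(c(a = 1, b = 2), c(a = 1.2, b = 1.8), c(a = 0.8, b = 2.1))
vcovs <- replicate(3, diag(c(0.01, 0.02)), simplify = FALSE)
pooled <- pool_rubin(coefs, vcovs)
pooled$coef
```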
Usage
pool_pamm_fits(fits, smry, skeleton = NULL)
Arguments
fits |
List of stripped imputation fits. |
smry |
List of |
skeleton |
Optional full (unstripped) fit used only to supply a common training-grid model frame for smooth-term FMI summaries. |
S3 method for pamm objects for compatibility with package pec
Description
S3 method for pamm objects for compatibility with package pec
Usage
## S3 method for class 'pamm'
predictSurvProb(object, newdata, times, ...)
Arguments
object |
A fitted model from which to extract predicted survival probabilities |
newdata |
A data frame containing predictor variable combinations for which to compute predicted survival probabilities. |
times |
A vector of times in the range of the response variable, e.g. times when the response is a survival object, at which to return the survival probabilities. |
... |
Additional arguments that are passed on to the current method. |
Extract information on concurrent effects
Description
Extract information on concurrent effects
Usage
prep_concurrent(x, formula, ...)
## S3 method for class 'list'
prep_concurrent(x, formula, ...)
Arguments
x |
A suitable object from which variables contained in
|
... |
Further arguments passed to methods. |
Fit a PAMM to interval-censored data via multiple imputation
Description
Fits a piecewise exponential additive (mixed) model to interval-censored
time-to-event data using a multiple-imputation (MI) and re-fit strategy: exact
event times are repeatedly drawn from the model-based conditional distribution
p(T \mid L < T \le R, x, \theta) (see impute_ic_times),
with \theta drawn from the imputation model's asymptotic posterior
before each imputation ("proper" MI – this is what makes the pooled
intervals calibrated),
each completed data set is transformed to PED format with the standard
(right-censored) pipeline and re-fit, and the resulting fits are pooled for
inference with the existing add_* family (see add_surv_prob
and the pamm_ic methods).
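The impute, refit and pool loop can be illustrated with a toy model that needs no mgcv. A constant-hazard (exponential) model stands in for the PAMM, and the inspection scheme is simulated. The sketch uses a midpoint initialiser, draws the log-rate from its approximate posterior before each imputation ("proper" MI), imputes from the truncated exponential on (L, R], refits, and pools with Rubin's rules. This is an analogy to pamm_ic with iter = 1, not its implementation; all names are hypothetical.

```r
# Hypothetical toy example (not pammtools code).
set.seed(2)
n <- 200
t_true <- rexp(n, rate = 0.5)
insp <- 0:6                          # hypothetical inspection times
L <- insp[findInterval(t_true, insp)]
R <- L + 1
R[t_true > 6] <- Inf                 # right-censored after the last inspection

# Constant-hazard "model": MLE of the log-rate and its approximate variance
fit_exp <- function(time, status) {
  d <- sum(status)
  c(log_rate = log(d / sum(time)), var = 1 / d)
}

# Draw event times from the truncated exponential on (L, R]
impute_exp <- function(L, R, rate) {
  ic <- is.finite(R)
  time <- L                          # right-censored rows stay censored at L
  u <- runif(sum(ic))
  time[ic] <- L[ic] - log(1 - u * (1 - exp(-rate * (R[ic] - L[ic])))) / rate
  list(time = time, status = as.integer(ic))
}

m <- 10
status0 <- as.integer(is.finite(R))
init <- fit_exp(ifelse(is.finite(R), (L + R) / 2, L), status0)   # midpoint initialiser
est <- w <- numeric(m)
for (k in seq_len(m)) {
  # proper MI: draw the parameter from its approximate posterior first
  rate_k <- exp(rnorm(1, init[["log_rate"]], sqrt(init[["var"]])))
  imp <- impute_exp(L, R, rate_k)
  fit <- fit_exp(imp$time, imp$status)
  est[k] <- fit[["log_rate"]]
  w[k] <- fit[["var"]]
}
# Rubin's rules on the log-rate scale, back-transformed to the rate
qbar <- mean(est)
total_var <- mean(w) + (1 + 1/m) * var(est)
exp(qbar + c(-2, 0, 2) * sqrt(total_var))
```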
Usage
## S3 method for class 'pamm_ic'
print(x, ...)
## S3 method for class 'pamm_ic'
summary(object, ...)
## S3 method for class 'summary.pamm_ic'
print(x, ...)
pamm_ic(
formula,
data,
model_formula = NULL,
cut = NULL,
max_time = NULL,
m = 10L,
iter = 1L,
init = c("midpoint", "uniform"),
id = "id",
engine = "gam",
...
)
Arguments
x, object |
A |
... |
Further arguments passed to |
formula |
A two-sided formula whose left-hand side is an interval-censored
response |
data |
A data frame in standard (one row per subject) format. |
model_formula |
Optional model formula passed to |
cut |
Optional fixed vector of interval cut-points shared across all
imputations. If |
max_time |
Optional cap on the cut-points. |
m |
Number of imputations (default 10). |
iter |
Number of impute-refit iterations per imputation chain (default
|
init |
Initialiser for the first fit: |
id |
Name of the subject identifier column. |
engine |
Estimation engine passed to |
Details
An imputed event time is an exact event time, so once imputation has produced
it, the entire downstream pipeline (split_data -> pamm
-> add_*) is reused unchanged. The interval cut-points are resolved once
and shared across all imputations, but mgcv's smooth bases and
centering constraints can still differ across completed data sets. Pooled
predictions therefore evaluate each fitted imputation model with its own
design matrix; object$pooled is a summary container, not a
gam-like model for direct predict() or plot() calls.
Value
An object of class pamm_ic: a list with
- fits: the m imputation fits, each slimmed (via strip_pamm_fit) to drop per-observation slots so that memory does not scale with the number of imputations; they still support coef, vcov and predict(type = "lpmatrix"), which is all the pooled add_* methods need.
- pooled: a pooled summary container with Rubin-pooled parametric coefficients and covariance, pooled parametric/smooth tables with median-p values ($p.table, $s.table), parametric coefficient FMI diagnostics ($fmi.table) and smooth-term FMI five-number summaries over the training grid ($smooth.fmi).
- init_fit: the (slimmed) initialiser/imputation model.
- unstable_chains: indices of imputation chains flagged as numerically unstable (extreme coefficients or coefficient SEs on the log-hazard scale; also raised as a warning). Degenerate chains can arise silently (without mgcv warnings) when iterating flexible time-varying models on small samples.
- others: the parsed bounds ic, the shared cut, and metadata.
print/summary report the pooled summary; add_* compute
pooled quantities of interest from fits.
See Also
impute_ic_times, add_surv_prob,
strip_pamm_fit
Ensure all breakpoints are present in newdata for cumulative calculations
Description
Checks whether all cut points up to the maximum observed time are present in
newdata. If not, expands the data frame to include the missing
breakpoints via expand_df. In either case the function guarantees that
an interval-length column (default intlen) exists on return.
Existing grouping is preserved after expansion.
Usage
reconstruct_cutpoints(newdata, object, time_var, interval_length)
Arguments
newdata |
A data frame with a time column and, optionally, grouping.
Must carry the |
object |
A fitted PAM/PAMM model object, passed to |
time_var |
Character name of the time variable (e.g. |
interval_length |
Character name of the interval-length column. If
absent from |
Value
A data frame with all required breakpoints present and an
interval_length column guaranteed to exist.
Reconstruct intlen from time variable and stored cut points
Description
Computes interval lengths from the sorted unique values of the time variable
in newdata. This is used by add_* functions that need intlen for cumulative
calculations. If tstart is not available, the first interval length is
taken as the first sorted time value (implicitly assuming a 0-origin time
scale).
Usage
reconstruct_intlen(newdata, time_var = "tend", interval_length = "intlen")
Arguments
newdata |
A data frame with a time column (default |
time_var |
Character name of the time variable. Defaults to
|
interval_length |
Character name of the interval-length column to create. |
Value
The input data frame with an intlen column added.
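The computation described above can be sketched in base R. The sketch only sees the times present in newdata, so if the grid skips breakpoints, a length can span several model intervals (which is why reconstruct_cutpoints first adds missing breakpoints). reconstruct_intlen_sketch() is a hypothetical name.

```r
# Hypothetical sketch (not pammtools code), assuming a 0-origin time scale.
reconstruct_intlen_sketch <- function(newdata, time_var = "tend") {
  t_unique <- sort(unique(newdata[[time_var]]))
  lens <- diff(c(0, t_unique))   # first length equals the first time value
  newdata$intlen <- lens[match(newdata[[time_var]], t_unique)]
  newdata
}

nd <- data.frame(tend = c(5, 2, 10, 2))
reconstruct_intlen_sketch(nd)$intlen   # 3 2 5 2
```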
Resolve requested terms against the model's plottable smooths
Description
Resolve requested terms against the model's plottable smooths
Usage
resolve_terms(smooth_tbl, terms)
Arguments
smooth_tbl |
Output of |
terms |
A character vector of requested terms, or |
Draw random numbers from piece-wise exponential distribution.
Description
This is a copy of the function rpexp from package
msm,
included here to reduce dependencies.
Usage
rpexp(n = 1, rate = 1, t = 0)
Arguments
n |
number of observations. If |
rate |
vector of rates. |
t |
vector of the same length as |
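For intuition, a piece-wise exponential variable can also be drawn by inversion: draw an Exp(1) cumulative-hazard target and find the time at which the piecewise-linear cumulative hazard reaches it. This is an alternative sketch with the same argument conventions (t[1] = 0, rate[j] applying from t[j] onwards). It is not the msm algorithm, and rpexp_inv() is a hypothetical name.

```r
# Hypothetical inversion sampler (not the msm/pammtools code).
rpexp_inv <- function(n = 1, rate = 1, t = 0) {
  stopifnot(length(rate) == length(t), t[1] == 0)
  e <- rexp(n)                                             # Exp(1) targets
  H_start <- c(0, cumsum(rate[-length(rate)] * diff(t)))   # H at each t[j]
  j <- findInterval(e, H_start)                            # piece reaching the target
  t[j] + (e - H_start[j]) / rate[j]
}

set.seed(10)
x <- rpexp_inv(1e5, rate = c(1, 1), t = c(0, 1))     # equivalent to Exp(1)
y <- rpexp_inv(1e5, rate = c(0.5, 2), t = c(0, 1))   # P(T > 1) = exp(-0.5)
c(mean(x), mean(y > 1))
```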
Draw coefficients from their approximate posterior distribution
Description
Simulation based confidence intervals are calculated by drawing coefficient
vectors from their asymptotic (posterior) distribution, a multivariate
normal with mean get_coefs and covariance get_Vp.
For scam models this means that draws are obtained on the scale of
the re-parametrized (partially exponentiated) coefficients, i.e., based on
the same normal approximation that underlies the reported standard errors of
the model (the exact posterior of the constrained coefficients is not
Gaussian, so individual draws may violate the shape constraints slightly).
Usage
sample_coefs(object, nsim, ...)
## Default S3 method:
sample_coefs(object, nsim, ...)
Arguments
object |
A fitted model object. |
nsim |
Number of draws. |
... |
Further arguments passed to methods. |
Value
A matrix with nsim rows, one coefficient vector per row, on
the scale of the design matrix returned by make_X.
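A base-R sketch of drawing from N(beta, V) via a Cholesky factor, returning one draw per row as described above. The last lines show how such draws could be turned into hazard trajectories with one column per draw. beta, V and X are hypothetical stand-ins for get_coefs(), get_Vp() and make_X(), and sample_coefs_sketch() is not the package function.

```r
# Hypothetical sketch (not pammtools code).
sample_coefs_sketch <- function(beta, V, nsim) {
  p <- length(beta)
  Z <- matrix(rnorm(nsim * p), nrow = nsim)   # iid N(0, 1) entries
  draws <- Z %*% chol(V)                      # rows now have covariance V
  sweep(draws, 2, beta, `+`)                  # shift rows to mean beta
}

set.seed(4)
beta <- c(-1, 0.5)
V <- matrix(c(0.04, 0.01, 0.01, 0.09), 2)
B <- sample_coefs_sketch(beta, V, nsim = 20000)
dim(B)   # 20000 2

# Simulated hazards for a hypothetical design matrix: one column per draw
X <- cbind(1, c(0, 1, 2))
H <- exp(X %*% t(B))
```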
Extract information of the sample contained in a data set
Description
Given a data set and grouping variables, this function returns mean values
for numeric variables and the mode for characters and factors. Usually
this function is not called directly but as part of a call to
make_newdata.
Usage
sample_info(x)
## S3 method for class 'data.frame'
sample_info(x)
## S3 method for class 'ped'
sample_info(x)
## S3 method for class 'fped'
sample_info(x)
Arguments
x |
A data frame (or object that inherits from |
Value
A data frame containing sample information (for each group).
If applied to an object of class ped, the sample means of the
original data are returned.
Note: When applied to a ped object that does not contain covariates
(only interval information), a data frame with 0 columns is returned.
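The per-column summary rule (mean for numeric columns, most frequent value for characters and factors) can be sketched in base R. The sketch below is hypothetical and ignores grouping variables. Ties in the mode are resolved by taking the first most frequent level.

```r
# Hypothetical sketch of the summary rule; no grouping, first mode on ties.
mode_value <- function(v) {
  tab <- table(v)
  val <- names(tab)[which.max(tab)]
  # keep factors as factors with their original levels
  if (is.factor(v)) factor(val, levels = levels(v)) else val
}
sample_info_sketch <- function(x) {
  out <- lapply(x, function(v) if (is.numeric(v)) mean(v) else mode_value(v))
  as.data.frame(out, stringsAsFactors = FALSE)
}

dat <- data.frame(age = c(50, 60, 70),
                  sex = factor(c("m", "f", "m")),
                  grp = c("a", "b", "b"))
res <- sample_info_sketch(dat)
res  # one row: age 60, sex "m", grp "b"
```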
Generate a sequence over the range of a vector
Description
Stolen from here
Usage
seq_range(x, n, by, trim = NULL, expand = NULL, pretty = FALSE)
Arguments
x |
A numeric vector |
n, by |
Specify the output sequence either by supplying the
length of the sequence with n, or the spacing between values with by. I recommend that you name these arguments in order to make it clear to the reader. |
trim |
Optionally, trim values off the tails.
|
expand |
Optionally, expand the range by |
pretty |
If |
Examples
x <- rcauchy(100)
seq_range(x, n = 10)
seq_range(x, n = 10, trim = 0.1)
seq_range(x, by = 1, trim = 0.1)
# Make pretty sequences
y <- runif(100)
seq_range(y, n = 10)
seq_range(y, n = 10, pretty = TRUE)
seq_range(y, n = 10, expand = 0.5, pretty = TRUE)
seq_range(y, by = 0.1)
seq_range(y, by = 0.1, pretty = TRUE)
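Without trim, expand or pretty, a sequence over the range of a vector amounts to a regular grid between its minimum and maximum. The base R calls below illustrate this under that assumption. The use of base::pretty() to mimic pretty = TRUE is also an assumption, not a statement about the seq_range internals.

```r
set.seed(1)
y <- runif(100)
# grid of n equally spaced values spanning the observed range
grid_n <- seq(min(y), max(y), length.out = 10)
# grid with fixed spacing `by`, starting at the minimum
grid_by <- seq(min(y), max(y), by = 0.1)
# rounded break points covering the range (assumed analogue of pretty = TRUE)
grid_pretty <- pretty(range(y), n = 10)
```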
Draw hazard trajectories from a model's sampling distribution
Description
Internal seam used by the simulation-based confidence interval helpers
(get_sim_ci, get_sim_ci_cumu, get_sim_ci_surv).
It returns a matrix of nsim draws of the (response-scale) hazard, one
column per draw and one row per row of newdata. The default method
draws coefficient vectors via sample_coefs and evaluates the
linear predictor make_X(object, newdata) %*% z; other backends (e.g.
a bootstrap ensemble that has no coefficient covariance) can provide their own
method to obtain simulation-based intervals from the same machinery.
Usage
sim_hazard(object, newdata, nsim = 100L, ...)
## Default S3 method:
sim_hazard(object, newdata, nsim = 100L, ...)
Arguments
object |
A fitted model object. |
newdata |
A data frame for which hazards are predicted. |
nsim |
Number of draws. |
... |
Further arguments passed to methods. |
Value
A numeric matrix with nrow(newdata) rows and nsim
columns of hazard draws on the response scale. The draws are produced once for
the whole newdata, so callers can share one set of draws across
groups by passing the full (grouped) data.
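The default method described above (coefficient draws combined with a design matrix) can be sketched in base R. Here a Poisson glm stands in for a PAMM, and model.matrix() on the model terms stands in for make_X. Exponentiating the linear predictor assumes a log link. Row-wise quantiles of the resulting nrow(newdata) x nsim matrix give pointwise simulation-based intervals.

```r
# Sketch: hazard draws exp(X %*% t(B)) and pointwise 95% intervals.
# A glm and model.matrix() stand in for a PAMM and make_X.
set.seed(2)
d <- data.frame(tend = rep(1:5, 40), x = rnorm(200))
d$y <- rpois(200, exp(-1 + 0.2 * d$tend + 0.5 * d$x))
fit <- glm(y ~ tend + x, family = poisson, data = d)

newdata <- data.frame(tend = 1:5, x = 0)
X <- model.matrix(delete.response(terms(fit)), data = newdata)

nsim <- 500
L <- chol(vcov(fit))
B <- sweep(matrix(rnorm(nsim * ncol(L)), nsim) %*% L, 2, coef(fit), "+")
haz <- exp(X %*% t(B))  # one row per row of newdata, one column per draw
ci <- t(apply(haz, 1, quantile, probs = c(0.025, 0.975)))
p <- exp(drop(X %*% coef(fit)))  # point estimate of the hazard
```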
See Also
The package website (https://adibender.github.io/pammtools/)
has worked examples of implementing get_hazard and
sim_hazard for new estimation backends: the articles
“Defining a new backend: gradient boosting with xgboost” (a bootstrap
tree ensemble) and “Bayesian Baseline PAMMs” (a brms model,
drawing from the posterior).
Simulate survival times from the piece-wise exponential distribution
Description
Simulate survival times from the piece-wise exponential distribution
Usage
sim_pexp(formula, data, cut)
Arguments
formula |
An extended formula that specifies the linear predictor.
If you want to include a smooth baseline
or time-varying effects, use |
data |
A data set with variables specified in |
cut |
A sequence of time-points starting with 0. |
Examples
library(survival)
library(dplyr)
library(pammtools)
# set number of observations/subjects
n <- 250
# create data set with variables which will affect the hazard rate.
df <- cbind.data.frame(x1 = runif(n, -3, 3), x2 = runif(n, 0, 6)) %>%
as_tibble()
# the formula which specifies how covariates affect the hazard rate
f0 <- function(t) {
dgamma(t, 8, 2) *6
}
form <- ~ -3.5 + f0(t) -0.5*x1 + sqrt(x2)
set.seed(24032018)
sim_df <- sim_pexp(form, df, 1:10)
head(sim_df)
plot(survfit(Surv(time, status)~1, data = sim_df ))
# for control, estimate with Cox PH
mod <- coxph(Surv(time, status) ~ x1 + pspline(x2), data=sim_df)
coef(mod)[1]
layout(matrix(1:2, nrow=1))
termplot(mod, se = TRUE)
# and using PAMs
layout(1)
ped <- sim_df %>% as_ped(Surv(time, status)~., max_time=10)
library(mgcv)
pam <- gam(ped_status ~ s(tend) + x1 + s(x2), data=ped, family=poisson, offset=offset)
coef(pam)[2]
plot(pam, page=1)
## Not run:
# Example 2: Functional covariates/cumulative coefficients
# function to generate one exposure profile with nz values; tz (below) is
# a vector of time points at which the TDC z was observed
rng_z <- function(nz) {
as.numeric(arima.sim(n = nz, list(ar = c(.8, -.6))))
}
# two different exposure times for two different exposures
tz1 <- 1:10
tz2 <- -5:5
# generate exposures and add to data set
df <- df %>%
add_tdc(tz1, rng_z) %>%
add_tdc(tz2, rng_z)
df
# define tri-variate function of time, exposure time and exposure z
ft <- function(t, tmax) {
-1*cos(t/tmax*pi)
}
fdnorm <- function(x) (dnorm(x,1.5,2)+1.5*dnorm(x,7.5,1))
wpeak2 <- function(lag) 15*dnorm(lag,8,10)
wdnorm <- function(lag) 5*(dnorm(lag,4,6)+dnorm(lag,25,4))
f_xyz1 <- function(t, tz, z) {
ft(t, tmax=10) * 0.8*fdnorm(z)* wpeak2(t - tz)
}
f_xyz2 <- function(t, tz, z) {
wdnorm(t-tz) * z
}
# define lag-lead window function
ll_fun <- function(t, tz) {t >= tz}
ll_fun2 <- function(t, tz) {t - 2 >= tz}
# simulate data with cumulative effect
sim_df <- sim_pexp(
formula = ~ -3.5 + f0(t) -0.5*x1 + sqrt(x2)|
fcumu(t, tz1, z.tz1, f_xyz=f_xyz1, ll_fun=ll_fun) +
fcumu(t, tz2, z.tz2, f_xyz=f_xyz2, ll_fun=ll_fun2),
data = df,
cut = 0:10)
## End(Not run)
Simulate data for competing risks scenario
Description
Simulate data for competing risks scenario
Usage
sim_pexp_cr(formula, data, cut)
Simulated data with cumulative effects
Description
This is data simulated using the sim_pexp function.
It contains two time-constant and two time-dependent covariates (observed
on different exposure time grids). The code used for simulation is
contained in the examples of ?sim_pexp.
Usage
simdf_elra
Format
An object of class nested_fdf (inherits from sim_df, tbl_df, tbl, data.frame) with 250 rows and 9 columns.
New basis for penalized lag selection
Description
Originally proposed in Obermeier et al., 2015, Flexible Distributed Lags for Modelling Earthquake Data, Journal of the Royal Statistical Society: Series C (Applied Statistics), doi:10.1111/rssc.12077. Here, the basis is extended to penalize lead times in addition to lag times, so that ideally the lag-lead window can be selected in a data-driven fashion. Treat as experimental.
Usage
## S3 method for class 'fdl.smooth.spec'
smooth.construct(object, data, knots)
Arguments
object |
An object handled by mgcv |
data |
The data set |
knots |
A vector of knots |
Turn a single mgcv smooth into zero or more curve specifications
Description
Turn a single mgcv smooth into zero or more curve specifications
Usage
smooth_term_rows(s, data)
Arguments
s |
A single smooth object from |
data |
A data frame containing the variables used to fit the model. |
Function to transform data without time-dependent covariates into piece-wise exponential data format
Description
Function to transform data without time-dependent covariates into piece-wise exponential data format
Usage
split_data(
formula,
data,
cut = NULL,
max_time = NULL,
multiple_id = FALSE,
...
)
Arguments
formula |
A two-sided formula with a |
data |
Either an object inheriting from data frame or in case of time-dependent covariates a list of data frames (of length 2), where the first data frame contains the time-to-event information and static covariates while the second (and potentially further data frames) contain information on time-dependent covariates and the times at which they have been observed. |
cut |
Split points, used to partition the follow-up into intervals.
If unspecified, all unique event times will be used. For competing risks,
when |
max_time |
If |
multiple_id |
Are multiple occurrences of the same id allowed (per transition)?
Defaults to FALSE. |
... |
Further arguments passed to the |
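The transformation into piece-wise exponential data (PED) can be sketched in base R for one-row-per-subject data. Each subject gets one row per interval (tstart, tend] entered before the end of follow-up. ped_status is 1 only in the interval that contains an observed event, and the offset is the log of the time at risk in each interval. The helper ped_sketch is hypothetical. It assumes that status == 1 codes an event and that all times are positive; follow-up beyond the last cut point is treated as censored at that point.

```r
# Hypothetical sketch of the PED transformation; not split_data() itself.
ped_sketch <- function(data, cut, time = "time", status = "status") {
  cut <- sort(unique(c(0, cut)))
  covars <- setdiff(names(data), c(time, status))
  rows <- lapply(seq_len(nrow(data)), function(i) {
    ti <- data[[time]][i]
    # intervals (cut[k], cut[k + 1]] that the subject enters
    k <- which(cut[-length(cut)] < ti)
    tstart <- cut[k]
    tend <- cut[k + 1]
    data.frame(
      id = i, tstart = tstart, tend = tend,
      # event indicator is 1 only in the interval containing the event time
      ped_status = as.integer(data[[status]][i] == 1 & ti > tstart & ti <= tend),
      # log of the time at risk within each interval
      offset = log(pmin(ti, tend) - tstart),
      data[rep(i, length(k)), covars, drop = FALSE]
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

dat <- data.frame(time = c(2.5, 4), status = c(1, 0), x = c(0.3, -1))
ped <- ped_sketch(dat, cut = c(1, 2, 3, 4))
ped  # 3 rows for subject 1 (event in (2, 3]), 4 rows for subject 2
```

Data in this shape can then be fitted as a Poisson model with the offset, as in the gam(ped_status ~ ..., family = poisson, offset = offset) call in the sim_pexp examples.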
See Also
Split data to obtain recurrent event data in PED format
Description
Currently, the input data must be in start-stop notation for each spell and contain a column that indicates the spell (event number).
Usage
split_data_multistate(
formula,
data,
transition = character(),
cut = NULL,
max_time = NULL,
event = 1L,
min_events = 1L,
timescale = c("gap", "calendar"),
...
)
Arguments
formula |
A two-sided formula with a |
data |
Either an object inheriting from data frame or in case of time-dependent covariates a list of data frames (of length 2), where the first data frame contains the time-to-event information and static covariates while the second (and potentially further data frames) contain information on time-dependent covariates and the times at which they have been observed. |
transition |
A character indicating the column in data that indicates the event/episode number for recurrent events. |
cut |
Split points, used to partition the follow-up into intervals.
If unspecified, all unique event times will be used. For competing risks,
when |
max_time |
If |
event |
The value that encodes the occurrence of an event in the data set. |
min_events |
Minimum number of events for each event number. |
timescale |
Defines the timescale for the recurrent event data transformation.
Defaults to "gap". |
... |
Further arguments passed to the |
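The difference between the two timescale options can be illustrated on start-stop spell data. This assumes the usual meaning of the terms: on the gap timescale the clock restarts at the beginning of each spell, while on the calendar timescale the original start and stop times are kept. The toy data below are made up for illustration, with column names modelled on the staph data set.

```r
# Illustration (assumed meaning of the timescales): gap time resets at the
# start of each spell, calendar time keeps the original start-stop times.
spells <- data.frame(id = c(1, 1, 2), enum = c(1, 2, 1),
                     t.start = c(0, 30, 0), t.stop = c(30, 75, 50),
                     status = c(1, 0, 0))
spells$gap_time <- spells$t.stop - spells$t.start
spells[, c("id", "enum", "t.start", "t.stop", "gap_time")]
```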
See Also
Time until Staphylococcus aureus infection in children, with possible recurrence
Description
This dataset originates from the Drakenstein child health study. The data contains the following variables:
- id
Randomly generated unique child ID
- t.start
The time at which the child enters the risk set for the $k$-th event
- t.stop
Time of $k$-th infection or censoring
.
- enum
Event number. Maximum of 6.
- hiv
Usage
staph
Format
An object of class tbl_df (inherits from tbl, data.frame) with 374 rows and 6 columns.
Slim down a fitted PAMM for storage inside a pamm_ic object
Description
Removes the per-observation slots (model frame, fitted values, residuals,
working weights, ...) and the call (which captures the full PED data),
none of which are needed for the downstream multiple-imputation pooling: the
pooled add_* methods only require each fit's coefficients,
Vp/Ve and the smooth/parametric structure used by
predict(type = "lpmatrix"). Stripping makes the stored size independent
of the data set size, so memory use stays manageable even with many imputations.
Usage
strip_pamm_fit(fit)
Arguments
fit |
A fitted |
Value
The same object with large per-observation slots removed; class and
everything needed for predict/coef/vcov are retained.
Extract fixed coefficient table from model object
Description
Given a model object, returns a data frame with columns variable,
coef (coefficient), ci_lower (lower 95% CI) and
ci_upper (upper 95% CI).
Usage
tidy_fixed(x, ...)
## S3 method for class 'gam'
tidy_fixed(x, intercept = FALSE, ...)
## S3 method for class 'scam'
tidy_fixed(x, intercept = FALSE, ...)
## S3 method for class 'coxph'
tidy_fixed(x, ...)
Arguments
x |
A model object. |
... |
Currently not used. |
intercept |
Should intercept also be returned? Defaults to FALSE. |
Examples
library(survival)
gc <- coxph(Surv(days, status)~age + sex, data = tumor)
tidy_fixed(gc)
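The table described above can be sketched for a Cox model with Wald-type 95% intervals, coef ± qnorm(0.975) × se. This is an illustration on the coefficient (log-hazard) scale and is not the tidy_fixed implementation. The lung data from survival replace the tumor data so that the block runs without pammtools.

```r
library(survival)
# Sketch: fixed-coefficient table with Wald-type 95% intervals;
# not the pammtools implementation.
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
est <- coef(fit)
se <- sqrt(diag(vcov(fit)))
z <- qnorm(0.975)
tab <- data.frame(variable = names(est), coef = unname(est),
                  ci_lower = unname(est - z * se),
                  ci_upper = unname(est + z * se))
tab
```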
Extract random effects in tidy data format.
Description
Extract random effects in tidy data format.
Usage
tidy_re(x, keep = c("fit", "main", "xlab", "ylab"), ...)
Arguments
x |
a fitted |
keep |
A vector of variables to keep. |
... |
Further arguments passed to |
See Also
Extract 1d smooth objects in tidy data format.
Description
Extract 1d smooth objects in tidy data format.
Usage
tidy_smooth(
x,
keep = c("x", "fit", "se", "xlab", "ylab"),
ci = TRUE,
conf_level = 0.95,
...
)
Arguments
x |
a fitted |
keep |
A vector of variables to keep. |
ci |
A logical value indicating whether confidence intervals should be
calculated and returned. Defaults to TRUE. |
conf_level |
Numeric scalar in (0, 1). Confidence level used for the
returned confidence intervals when |
... |
Further arguments passed to |
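Pointwise confidence bands for a 1d smooth at a given conf_level can be sketched with mgcv's predict.gam using type = "terms" and se.fit = TRUE. The bands are fit ± qnorm(1 - (1 - conf_level) / 2) × se. The column names ci_lower and ci_upper are chosen for illustration and are not necessarily those returned by tidy_smooth.

```r
library(mgcv)
# Sketch: tidy 1d smooth with pointwise confidence bands.
set.seed(3)
d <- data.frame(x = runif(300))
d$y <- sin(2 * pi * d$x) + rnorm(300, sd = 0.3)
fit <- gam(y ~ s(x), data = d)

conf_level <- 0.95
grid <- data.frame(x = seq(0, 1, length.out = 50))
pr <- predict(fit, newdata = grid, type = "terms", se.fit = TRUE)
crit <- qnorm(1 - (1 - conf_level) / 2)
smooth_df <- data.frame(x = grid$x,
                        fit = pr$fit[, "s(x)"],
                        se = pr$se.fit[, "s(x)"])
smooth_df$ci_lower <- smooth_df$fit - crit * smooth_df$se
smooth_df$ci_upper <- smooth_df$fit + crit * smooth_df$se
head(smooth_df)
```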
Extract 2d smooth objects in tidy format.
Description
Extract 2d smooth objects in tidy format.
Usage
tidy_smooth2d(
x,
keep = c("x", "y", "fit", "se", "xlab", "ylab", "main"),
ci = FALSE,
conf_level = 0.95,
...
)
Arguments
x |
a fitted |
keep |
A vector of variables to keep. |
ci |
A logical value indicating whether confidence intervals should be
calculated and returned. Defaults to FALSE. |
conf_level |
Numeric scalar in (0, 1). Confidence level used for the
returned confidence intervals when |
... |
Further arguments passed to |
Stomach area tumor data
Description
Information on patients treated for a cancer disease located in the stomach area. The data set includes:
- days
Time from operation until death in days.
- status
Event indicator (0 = censored, 1 = death).
- age
The subject's age.
- sex
The subject's sex (male/female).
- charlson_score
Charlson comorbidity score, 1-6.
- transfusion
Has subject received transfusions (no/yes).
- complications
Did major complications occur during operation (no/yes).
- metastases
Did the tumor develop metastases? (no/yes).
- resection
Was the operation accompanied by a major resection (no/yes).
Usage
tumor
Format
An object of class tbl_df (inherits from tbl, data.frame) with 776 rows and 9 columns.
Warn if new t_j are used
Description
Warn if new t_j are used
Usage
warn_about_new_time_points(object, newdata, ...)
## S3 method for class 'pamm'
warn_about_new_time_points(object, newdata, ...)
Warn if new t_j are used
Description
Warn if new t_j are used
Usage
## S3 method for class 'glm'
warn_about_new_time_points(object, newdata, time_var, ...)
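The check named in the title can be sketched as a comparison between the time points in newdata and those used when fitting. The helper warn_new_times_sketch and its fit_times argument are hypothetical. The sketch only illustrates the idea of warning about new t_j and does not reproduce the package's method logic.

```r
# Hypothetical sketch: warn if newdata contains time points t_j that were
# not used when fitting the model.
warn_new_times_sketch <- function(fit_times, newdata, time_var = "tend") {
  new_t <- setdiff(unique(newdata[[time_var]]), unique(fit_times))
  if (length(new_t) > 0) {
    warning("Time points not used when fitting the model: ",
            paste(sort(new_t), collapse = ", "), call. = FALSE)
  }
  invisible(new_t)
}

msg <- tryCatch(
  warn_new_times_sketch(c(1, 2, 3), data.frame(tend = c(2, 2.5))),
  warning = function(w) conditionMessage(w)
)
msg  # mentions the new time point 2.5
```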