--- title: "Simulate Correlated Progression-Free Survival and Overall Survival Using a Gumbel Copula" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Simulate Correlated Progression-Free Survival and Overall Survival Using a Gumbel Copula} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, cache.path = 'cache/simulatePfsAndOsGumbel/', comment = '#>', dpi = 300, out.width = '100%' ) ``` ```{r setup, echo = FALSE, message = FALSE} library(dplyr) library(knitr) library(ggplot2) library(TrialSimulator) if (!requireNamespace('copula', quietly = TRUE)) { knitr::opts_chunk$set(eval = FALSE) } ``` Progression-free survival (PFS) and overall survival (OS) are common time-to-event endpoints in oncology trials. Because progression or death defines a PFS event, simulated trial data should always satisfy $\mbox{PFS} \leq \mbox{OS}$. At the same time, the two endpoints are often highly correlated, so independent marginal simulation is usually not appropriate. This vignette describes the copula-based generator implemented in `TrialSimulator::CorrelatedPfsAndOs2()`. The function is intended for simulation settings where the user wants to specify three interpretable quantities: - the marginal median PFS, - the marginal median OS, - Kendall's tau between the observed, uncensored PFS and OS times. The method uses a Gumbel survival copula for a latent time to progression (TTP) and OS, and then defines $$ \mbox{PFS} = \min(\mbox{TTP}, \mbox{OS}). $$ This construction guarantees $\mbox{PFS} \leq \mbox{OS}$ for every simulated patient. It also keeps both marginal PFS and OS exponential, so the inputs can be specified directly through their medians. This is a useful distinction from the illness-death model in `TrialSimulator::CorrelatedPfsAndOs3()`. The illness-death model gives a clinically interpretable transition structure, but OS generated from that model can have a time-varying hazard ratio between treatment arms. Therefore, if OS will be analyzed using a proportional hazards Cox model and the simulation should be consistent with that analysis model, the copula-based method described here is often more appropriate. ## Model Let $X$ denote latent TTP and let $Y$ denote OS. The model assumes exponential marginal survival functions $$ \Pr(X > x) = \exp(-\lambda_X x), \qquad \Pr(Y > y) = \exp(-\lambda_Y y), $$ and a Gumbel--Hougaard survival copula $$ \Pr(X > x, Y > y) = \exp\left[ -\left\{(\lambda_X x)^\theta + (\lambda_Y y)^\theta\right\}^{1/\theta} \right], \qquad \theta \geq 1. $$ The observed PFS time is $$ P = \min(X, Y). $$ Under this model, OS is exponential with rate $\lambda_Y$. PFS is also exponential, with survival function $$ \Pr(P > t) = \Pr(X > t, Y > t) = \exp\left[ -\left\{(\lambda_X^\theta + \lambda_Y^\theta)t^\theta\right\}^{1/\theta} \right] = \exp(-\lambda_P t), $$ where $$ \lambda_P = \left(\lambda_X^\theta + \lambda_Y^\theta\right)^{1/\theta}. $$ Therefore, if the target medians are $m_P$ for PFS and $m_Y$ for OS, $$ \lambda_P = \frac{\log(2)}{m_P}, \qquad \lambda_Y = \frac{\log(2)}{m_Y}. $$ For a candidate value of $\theta$, the latent TTP rate is then $$ \lambda_X = \left(\lambda_P^\theta - \lambda_Y^\theta\right)^{1/\theta}. $$ This requires $m_P < m_Y$, as expected for PFS and OS. ## Choosing the Copula Parameter For the Gumbel copula, Kendall's tau between latent TTP and OS is $$ \tau(X,Y) = 1 - \frac{1}{\theta}. $$ However, `CorrelatedPfsAndOs2()` asks for Kendall's tau between the observed, uncensored endpoints $P=\min(X,Y)$ and $Y$, not between latent TTP and OS. These are not the same quantity. The formula below is derived in the appendix. Under the model above, $$ \tau(P,Y) = 1 - \frac{1}{\theta} \left[ 1 - \left(\frac{m_P}{m_Y}\right)^\theta \right]. $$ Equivalently, writing $\tau_X = \tau(X,Y)$ and $\theta = 1/(1-\tau_X)$, $$ \tau(P,Y) = 1 - (1-\tau_X) \left[ 1 - \left(\frac{m_P}{m_Y}\right)^{1/(1-\tau_X)} \right]. $$ `CorrelatedPfsAndOs2()` solves this equation numerically for the latent Kendall's tau $\tau_X$, converts it to $\theta$, and then simulates from the Gumbel copula. For any finite $\theta$, $\tau(P,Y)$ is larger than $\tau(X,Y)$: $$ \tau(P,Y)-\tau(X,Y) = \frac{1}{\theta} \left(\frac{m_P}{m_Y}\right)^\theta > 0. $$ Intuitively, even if latent TTP and OS have a weaker association, observed PFS contains deaths by construction, which increases the association between observed PFS and OS. The requested value of $\tau(P,Y)$ has a lower bound. When $\theta=1$, TTP and OS are independent, but PFS still contains OS deaths through the definition $P=\min(X,Y)$. In this limiting case, $$ \tau(P,Y) = \frac{m_P}{m_Y}. $$ Thus, for fixed medians $m_P$ and $m_Y$, `CorrelatedPfsAndOs2()` can only target $$ \frac{m_P}{m_Y} \leq \tau(P,Y) < 1. $$ If the requested Kendall's tau is too small for the two medians, the function stops with an error. ## Example Suppose we want to simulate PFS with median 5 months and OS with median 11 months, targeting Kendall's tau of 0.6 between observed, uncensored PFS and OS times. ```{r example-endpoint, results = 'asis'} pfs_and_os <- endpoint(name = c('pfs', 'os'), type = c('tte', 'tte'), generator = CorrelatedPfsAndOs2, median_pfs = 5, median_os = 11, kendall = 0.6, pfs_name = 'pfs', os_name = 'os') pfs_and_os ``` For verification only, we can call the generator directly. This is not the recommended way to use `TrialSimulator` for simulation studies; in practice, the generator should usually be supplied to `endpoint()` and used inside the trial simulation workflow. Direct calls are helpful here because they let us check the marginal medians, the observed Kendall's tau, and the ordering $\mbox{PFS} \leq \mbox{OS}$ before adding enrollment, censoring, treatment arms, and analyses. ```{r direct-generator} set.seed(123) dat <- CorrelatedPfsAndOs2(n = 10000, median_pfs = 5, median_os = 11, kendall = 0.6) head(dat, 2) ``` The simulation should approximately recover the requested medians and Kendall's tau between observed, uncensored PFS and OS times. ```{r validation} with(dat, median(pfs)) with(dat, median(os)) with(dat, cor(pfs, os, method = 'kendall')) with(dat, all(pfs <= os)) ``` Because this generator returns event indicators equal to 1 for both endpoints, censoring and staggered enrollment should be handled by the broader trial simulation workflow rather than by this endpoint generator. ## Practical Guidance This generator is most useful when the simulation objective is to specify marginal PFS and OS medians and a rank-based association between the two observed endpoints. Kendall's tau is often more stable and interpretable than Pearson correlation for skewed time-to-event variables, and it is directly connected to the Gumbel copula parameter. There are still modeling assumptions to keep in mind: - PFS and OS are both marginally exponential. - Dependence is induced through a Gumbel survival copula on latent TTP and OS. - The Gumbel copula only allows non-negative dependence. - The requested Kendall's tau is for observed PFS and OS event times before censoring is applied. - The attainable Kendall's tau between observed PFS and OS is constrained by the PFS/OS median ratio. For simulations that require transition-specific hazards or a mechanistic disease process, the illness-death model implemented in `TrialSimulator::CorrelatedPfsAndOs3()` may be more appropriate. For simulations where the main requirement is a direct medians-plus-Kendall's-tau specification, especially when OS will be analyzed under a proportional hazards Cox model, `CorrelatedPfsAndOs2()` provides a compact alternative. ## Appendix: Kendall's Tau for Observed PFS and OS This appendix gives the derivation behind the formula used in `CorrelatedPfsAndOs2()`. Let $X$ and $Y$ be nonnegative random variables with exponential margins $$ X \sim \operatorname{Exp}(\lambda_X), \qquad Y \sim \operatorname{Exp}(\lambda_Y), $$ and joint survival function $$ S_{X,Y}(x,y) = \Pr(X>x, Y>y) = \exp\left[ -\left\{(\lambda_X x)^\theta + (\lambda_Y y)^\theta\right\}^{1/\theta} \right], \qquad \theta \ge 1. $$ Define $P=\min(X,Y)$. We want Kendall's tau between $P$ and $Y$. Standardize the margins by setting $$ A=\lambda_X X, \qquad B=\lambda_Y Y. $$ Then the joint survival function of $(A,B)$ is $$ \Pr(A>a, B>b) = \exp\left\{-(a^\theta+b^\theta)^{1/\theta}\right\}. $$ Use the polar-type transformation $$ R=(A^\theta+B^\theta)^{1/\theta}, \qquad W=\frac{A^\theta}{A^\theta+B^\theta}. $$ A Jacobian calculation gives the joint density $$ f_{R,W}(r,w)=\frac{1}{\theta} e^{-r}(r+\theta-1), \qquad r>0,\; 0 c. \end{cases} $$ Using the survival-copula representation of Kendall's tau, $$ \tau(P,Y)=4E\{S_{P,Y}(P,Y)\}-1. $$ When $W \le c$, $P=X$ and $$ S_{P,Y}(P,Y)=S_{X,Y}(X,Y)=e^{-R}. $$ When $W>c$, $P=Y$. Since $$ S_{P,Y}(y,y)=\Pr(P>y,Y>y)=\Pr(X>y,Y>y), $$ we have, using $Y=B/\lambda_Y=R(1-W)^{1/\theta}/\lambda_Y$, $$ S_{P,Y}(Y,Y) = \exp\left[ -R\left(\frac{1-W}{d}\right)^{1/\theta} \right]. $$ Therefore, $$ E\{S_{P,Y}(P,Y)\} = E\{e^{-R}\mathbf 1(W\le c)\} + E\left[ e^{-R((1-W)/d)^{1/\theta}} \mathbf 1(W>c) \right]. $$ By independence of $R$ and $W$, $$ E\{S_{P,Y}(P,Y)\} = cL_R(1) + \int_c^1 L_R\left\{\left(\frac{1-w}{d}\right)^{1/\theta}\right\} dw, $$ where $L_R(s)=E(e^{-sR})$. From the density of $R$, $$ L_R(s) = \frac{1}{\theta} \int_0^\infty e^{-(1+s)r}(r+\theta-1)\,dr = \frac{\theta+(\theta-1)s}{\theta(1+s)^2}. $$ In particular, $$ L_R(1)=\frac{2\theta-1}{4\theta}. $$ For the integral term, use the substitution $$ t=\left(\frac{1-w}{d}\right)^{1/\theta}, \qquad dw=-d\theta t^{\theta-1}dt. $$ Then $$ \int_c^1 L_R\left\{\left(\frac{1-w}{d}\right)^{1/\theta}\right\} dw = d\theta\int_0^1 t^{\theta-1}L_R(t)\,dt. $$ Substituting $L_R(t)$ gives $$ d\int_0^1 \frac{\theta t^{\theta-1}+(\theta-1)t^\theta}{(1+t)^2} dt. $$ Because $$ \frac{d}{dt}\left(\frac{t^\theta}{1+t}\right) = \frac{\theta t^{\theta-1}+(\theta-1)t^\theta}{(1+t)^2}, $$ the integral is $$ d\left[\frac{t^\theta}{1+t}\right]_0^1 = \frac{d}{2}. $$ Thus, $$ E\{S_{P,Y}(P,Y)\} = c\frac{2\theta-1}{4\theta}+\frac{d}{2} = \frac{2\theta-1+d}{4\theta}, $$ using $c+d=1$. Finally, $$ \tau(P,Y) = 4E\{S_{P,Y}(P,Y)\}-1 = \frac{\theta-1+d}{\theta} = 1-\frac{1}{\theta} \left( \frac{\lambda_X^\theta}{\lambda_X^\theta+\lambda_Y^\theta} \right). $$ Since $P$ has rate $$ \lambda_P = \left(\lambda_X^\theta+\lambda_Y^\theta\right)^{1/\theta}, $$ this can also be written as $$ \tau(P,Y) = 1-\frac{1}{\theta} \left( \frac{\lambda_X^\theta}{\lambda_P^\theta} \right) = 1-\frac{1}{\theta} \left[ 1-\left(\frac{\lambda_Y}{\lambda_P}\right)^\theta \right]. $$ Because $\lambda_Y/\lambda_P = m_P/m_Y$, the formula used by `CorrelatedPfsAndOs2()` is $$ \tau(P,Y) = 1-\frac{1}{\theta} \left[ 1-\left(\frac{m_P}{m_Y}\right)^\theta \right]. $$