G-Computation or standardization for the Cox, Fine-Gray and binomial regression models for survival data

Klaus Holst & Thomas Scheike

2025-01-11

G-computation for the Cox and Fine-Gray models

The standardized estimate (G-computation) based on the Cox or Fine-Gray model is \[ \hat S(t,A=a) = n^{-1} \sum_i S(t,A=a,X_i) \] and this estimator has influence function \[ S(t,A=a,X_i) - S(t,A=a) + E( D_{A_0(t), \beta} S(t,A=a,X_i) ) \epsilon_i(t) \] where \(D_{A_0(t), \beta}\) denotes the derivative with respect to the baseline \(A_0(t)\) and the regression coefficients \(\beta\), and \(\epsilon_i(t)\) is the iid decomposition of \((\hat A_0(t) - A_0(t), \hat \beta- \beta)\).

These estimates have a causal interpretation under the assumption of no unmeasured confounders. Even without the causal assumptions, the standardization can still be a useful summary measure.
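
The contrasts between treatment levels follow from the standardized estimates. As a sketch, assuming the usual delta-method argument and writing \(\phi_i^a(t)\) for the influence function above evaluated at \(A=a\) (a shorthand introduced here, not the package's notation), the difference and the ratio are \[ \hat\Delta(t) = \hat S(t,A=1) - \hat S(t,A=0), \qquad \hat R(t) = \frac{\hat S(t,A=1)}{\hat S(t,A=0)} \] with influence functions \[ \phi_i^1(t) - \phi_i^0(t) \qquad \text{and} \qquad \frac{\phi_i^1(t) - R(t)\,\phi_i^0(t)}{S(t,A=0)}, \] so that a variance estimate is \(n^{-2}\) times the sum of the squared estimated influence functions. The exact computations used by the package may differ in detail.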

We first look at the cumulative incidence via the Fine-Gray model for the two causes, and plot the standardized cumulative incidence for cause 1.

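library(mets)
set.seed(100)

data(bmt)
## add a small random jitter to the event times to break ties
bmt$time <- bmt$time+runif(nrow(bmt))*0.001
## treatment indicator as a factor
dfactor(bmt) <- tcell~tcell
## event indicator: any cause versus censoring
bmt$event <- (bmt$cause!=0)*1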

fg1 <- cifreg(Event(time,cause)~tcell+platelet+age,bmt,cause=1,
          cox.prep=TRUE,propodds=NULL)
summary(survivalG(fg1,bmt,50))
#> risk:
#>       Estimate Std.Err   2.5%  97.5%   P-value
#> risk0   0.4331 0.02749 0.3793 0.4870 6.321e-56
#> risk1   0.2727 0.05863 0.1577 0.3876 3.313e-06
#> 
#> Average Treatment effects (G-estimator) :
#>     Estimate Std.Err   2.5%    97.5% P-value
#> ps0  -0.1605 0.06353 -0.285 -0.03597 0.01153
#> 
#> Average Treatment effect risk-ratio (G-estimator) :
#>        Estimate  Std.Err      2.5%     97.5%    P-value
#> [ps0] 0.6295004 0.139248 0.3565794 0.9024214 0.00779742
#> 
#> Average Treatment effect (1-risk=survival)-ratio (G-estimator) :
#> NULL

fg2 <- cifreg(Event(time,cause)~tcell+platelet+age,bmt,cause=2,
          cox.prep=TRUE,propodds=NULL)
summary(survivalG(fg2,bmt,50))
#> risk:
#>       Estimate Std.Err   2.5%  97.5%   P-value
#> risk0   0.2127 0.02314 0.1674 0.2581 3.757e-20
#> risk1   0.3336 0.06799 0.2003 0.4668 9.281e-07
#> 
#> Average Treatment effects (G-estimator) :
#>     Estimate Std.Err     2.5%  97.5% P-value
#> ps0   0.1208 0.07189 -0.02009 0.2617 0.09285
#> 
#> Average Treatment effect risk-ratio (G-estimator) :
#>       Estimate   Std.Err      2.5%    97.5%   P-value
#> [ps0] 1.567915 0.3627528 0.8569321 2.278897 0.1174496
#> 
#> Average Treatment effect (1-risk=survival)-ratio (G-estimator) :
#> NULL

cif1time <- survivalGtime(fg1,bmt)
plot(cif1time,type="risk")
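
To help read the summaries above, the following sketch reproduces the interval and p-value of the cause 1 treatment effect from its reported estimate and standard error. It assumes the interval is a 95% Wald interval based on asymptotic normality, which is consistent with the printed numbers but is not stated in the output itself. The values are copied from the summary and are therefore rounded.

```r
## values copied from the cause 1 summary ("Average Treatment effects" row)
est <- -0.1605
se  <- 0.06353

## ASSUMPTION: 95% Wald interval and two-sided p-value under normality
ci   <- est + c(-1, 1) * qnorm(0.975) * se
pval <- 2 * pnorm(-abs(est / se))

round(ci, 4)
signif(pval, 3)

## the contrasts are the difference and the ratio of the standardized risks
risk <- c(risk0 = 0.4331, risk1 = 0.2727)
unname(risk["risk1"] - risk["risk0"])
unname(risk["risk1"] / risk["risk0"])
```

Up to rounding, these agree with the reported interval, p-value, difference and risk ratio.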

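Next we look at the survival probability, where the event indicator combines both causes. The summary reports the risk, that is one minus the survival probability, together with the ratio of the survival probabilities.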

ss <- phreg(Surv(time,event)~tcell+platelet+age,bmt)
sss <- survivalG(ss,bmt,50)
summary(sss)
#> risk:
#>       Estimate Std.Err   2.5%  97.5%    P-value
#> risk0   0.6539 0.02709 0.6008 0.7070 9.218e-129
#> risk1   0.5640 0.05971 0.4470 0.6811  3.531e-21
#> 
#> Average Treatment effects (G-estimator) :
#>     Estimate Std.Err    2.5%   97.5% P-value
#> ps0 -0.08992  0.0629 -0.2132 0.03337  0.1529
#> 
#> Average Treatment effect risk-ratio (G-estimator) :
#>        Estimate    Std.Err      2.5%    97.5%   P-value
#> [ps0] 0.8624974 0.09446477 0.6773499 1.047645 0.1455042
#> 
#> Average Treatment effect (1-risk=survival)-ratio (G-estimator) :
#>       Estimate   Std.Err      2.5%    97.5%   P-value
#> [ps0] 1.259836 0.1894627 0.8884963 1.631176 0.1702385

Gtime <- survivalGtime(ss,bmt)
plot(Gtime)
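
The standardization itself can be sketched by hand: fit the model, set the treatment to the same value for every subject, predict, and average. The sketch below uses `coxph` and `survfit` from the survival package rather than `phreg`, with `tcell` kept as a 0/1 numeric and without jittering the times, so the numbers should be close to, but not identical with, the `survivalG` output above. The helper `std_risk` is a hypothetical name introduced for this illustration, and no standard errors are computed.

```r
library(survival)
library(mets)   # only used here for the bmt data

data(bmt)
bmt$event <- (bmt$cause != 0) * 1

## Cox model with treatment (tcell, coded 0/1) and the confounders
fit <- coxph(Surv(time, event) ~ tcell + platelet + age, data = bmt)

## standardized risk at time t when everyone is assigned tcell = a
## (std_risk is a hypothetical helper, not part of mets)
std_risk <- function(a, t = 50) {
  nd <- bmt
  nd$tcell <- a                      # same treatment for all subjects
  sf <- survfit(fit, newdata = nd)   # one survival curve per subject
  surv_t <- summary(sf, times = t)$surv
  1 - mean(surv_t)                   # average over the observed covariates
}

risk <- c(risk0 = std_risk(0), risk1 = std_risk(1))
risk
unname(risk["risk1"] - risk["risk0"])
```

Averaging the predicted survival over the observed covariates corresponds to the sum over \(i\) in the definition of the standardized estimate.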

G-computation for the binomial regression

We compare with the corresponding estimates based on the doubly robust estimating equations, computed with binregATE. The G-computation standardization can also be computed with a specialized function that uses less memory and is faster for large data.


## survival situation
sr1 <- binregATE(Event(time,event)~tcell+platelet+age,bmt,cause=1,
         time=40, treat.model=tcell~platelet+age)
summary(sr1)
#> 
#>    n events
#>  408    241
#> 
#>  408 clusters
#> coeffients:
#>              Estimate   Std.Err      2.5%     97.5% P-value
#> (Intercept)  0.676409  0.137007  0.407880  0.944939  0.0000
#> tcell1      -0.023675  0.346994 -0.703770  0.656420  0.9456
#> platelet    -0.492952  0.246158 -0.975412 -0.010492  0.0452
#> age          0.343939  0.115561  0.117444  0.570434  0.0029
#> 
#> exp(coeffients):
#>             Estimate    2.5%  97.5%
#> (Intercept)  1.96680 1.50363 2.5727
#> tcell1       0.97660 0.49472 1.9279
#> platelet     0.61082 0.37704 0.9896
#> age          1.41049 1.12462 1.7690
#> 
#> Average Treatment effects (G-formula) :
#>             Estimate    Std.Err       2.5%      97.5% P-value
#> treat0     0.6230976  0.0273827  0.5694284  0.6767667  0.0000
#> treat1     0.6177595  0.0731712  0.4743466  0.7611723  0.0000
#> treat:1-0 -0.0053381  0.0783973 -0.1589940  0.1483179  0.9457
#> 
#> Average Treatment effects (double robust) :
#>            Estimate   Std.Err      2.5%     97.5% P-value
#> treat0     0.623337  0.027508  0.569422  0.677253  0.0000
#> treat1     0.644397  0.085942  0.475954  0.812840  0.0000
#> treat:1-0  0.021059  0.090305 -0.155935  0.198054  0.8156

## relative risk effect 
estimate(coef=sr1$riskDR,vcov=sr1$var.riskDR,f=function(p) p[2]/p[1],null=1)
#>          Estimate Std.Err   2.5% 97.5% P-value
#> [treat1]    1.034  0.1453 0.7489 1.319  0.8162
#> 
#>  Null Hypothesis: 
#>   [treat1] = 1

## competing risks 
br1 <- binregATE(Event(time,cause)~tcell+platelet+age,bmt,cause=1,
         time=40,treat.model=tcell~platelet+age)
summary(br1)
#> 
#>    n events
#>  408    157
#> 
#>  408 clusters
#> coeffients:
#>              Estimate   Std.Err      2.5%     97.5% P-value
#> (Intercept) -0.191519  0.130883 -0.448044  0.065007  0.1434
#> tcell1      -0.712880  0.351489 -1.401786 -0.023974  0.0425
#> platelet    -0.531919  0.244495 -1.011119 -0.052718  0.0296
#> age          0.432939  0.107314  0.222607  0.643271  0.0001
#> 
#> exp(coeffients):
#>             Estimate    2.5%  97.5%
#> (Intercept)  0.82570 0.63888 1.0672
#> tcell1       0.49023 0.24616 0.9763
#> platelet     0.58748 0.36381 0.9486
#> age          1.54178 1.24933 1.9027
#> 
#> Average Treatment effects (G-formula) :
#>            Estimate   Std.Err      2.5%     97.5% P-value
#> treat0     0.417746  0.027030  0.364768  0.470724  0.0000
#> treat1     0.267097  0.061849  0.145874  0.388319  0.0000
#> treat:1-0 -0.150649  0.067578 -0.283100 -0.018199  0.0258
#> 
#> Average Treatment effects (double robust) :
#>            Estimate   Std.Err      2.5%     97.5% P-value
#> treat0     0.417320  0.027122  0.364163  0.470478  0.0000
#> treat1     0.231149  0.060651  0.112275  0.350023  0.0001
#> treat:1-0 -0.186171  0.066053 -0.315633 -0.056710  0.0048

The same G-computation estimates can be obtained with the specialized function binregG:

br1 <- binreg(Event(time,cause)~tcell+platelet+age,bmt,cause=1,time=40)
Gbr1 <- binregG(br1,data=bmt)
summary(Gbr1)
#> risk:
#>       Estimate Std.Err   2.5%  97.5%   P-value
#> risk0   0.4177 0.02727 0.3643 0.4712 5.588e-53
#> risk1   0.2671 0.06183 0.1459 0.3883 1.562e-05
#> 
#> Average Treatment effects (G-estimator) :
#>    Estimate Std.Err    2.5%    97.5% P-value
#> p1  -0.1506 0.06759 -0.2831 -0.01817 0.02583
#> 
#> Average Treatment effect risk-ratio (G-estimator) :
#>       Estimate   Std.Err      2.5%     97.5%    P-value
#> [p1] 0.6393758 0.1538101 0.3379136 0.9408381 0.01904716
#> 
#> Average Treatment effect (1-risk=survival)-ratio (G-estimator) :
#> NULL

## contrast average age (0) with age 2 sd above average (2), set via Avalues
Gbr2 <- binregG(br1,data=bmt,varname="age",Avalues=c(0,2))
summary(Gbr2)
#> risk:
#>       Estimate Std.Err   2.5%  97.5%   P-value
#> risk0   0.3932 0.02537 0.3434 0.4429 3.566e-54
#> risk2   0.5997 0.05544 0.4911 0.7084 2.874e-27
#> 
#> Average Treatment effects (G-estimator) :
#>    Estimate Std.Err   2.5%  97.5%   P-value
#> p1   0.2066 0.04998 0.1086 0.3045 3.584e-05
#> 
#> Average Treatment effect risk-ratio (G-estimator) :
#>      Estimate   Std.Err     2.5%    97.5%      P-value
#> [p1] 1.525375 0.1324356 1.265806 1.784945 7.277532e-05
#> 
#> Average Treatment effect (1-risk=survival)-ratio (G-estimator) :
#> NULL

SessionInfo

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: aarch64-apple-darwin24.2.0
#> Running under: macOS Sequoia 15.2
#> 
#> Matrix products: default
#> BLAS:   /Users/klaus/.asdf/installs/R/4.4.2/lib/R/lib/libRblas.dylib 
#> LAPACK: /Users/klaus/.asdf/installs/R/4.4.2/lib/R/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Copenhagen
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] splines   stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] ggplot2_3.5.1  cowplot_1.1.3  mets_1.3.5     timereg_2.0.6  survival_3.8-3
#> 
#> loaded via a namespace (and not attached):
#>  [1] sass_0.4.9          future_1.34.0       generics_0.1.3     
#>  [4] lattice_0.22-6      listenv_0.9.1       digest_0.6.37      
#>  [7] magrittr_2.0.3      evaluate_1.0.1      grid_4.4.2         
#> [10] mvtnorm_1.3-2       fastmap_1.2.0       jsonlite_1.8.9     
#> [13] Matrix_1.7-1        scales_1.3.0        isoband_0.2.7      
#> [16] codetools_0.2-20    numDeriv_2016.8-1.1 jquerylib_0.1.4    
#> [19] lava_1.8.1          cli_3.6.3           rlang_1.1.4        
#> [22] parallelly_1.41.0   future.apply_1.11.3 munsell_0.5.1      
#> [25] withr_3.0.2         cachem_1.1.0        yaml_2.3.10        
#> [28] tools_4.4.2         parallel_4.4.2      ucminf_1.2.2       
#> [31] dplyr_1.1.4         colorspace_2.1-1    globals_0.16.3     
#> [34] vctrs_0.6.5         R6_2.5.1            lifecycle_1.0.4    
#> [37] MASS_7.3-64         pkgconfig_2.0.3     bslib_0.8.0        
#> [40] pillar_1.10.1       gtable_0.3.6        glue_1.8.0         
#> [43] Rcpp_1.0.13-1       xfun_0.50           tibble_3.2.1       
#> [46] tidyselect_1.2.1    knitr_1.49          farver_2.1.2       
#> [49] htmltools_0.5.8.1   labeling_0.4.3      rmarkdown_2.29     
#> [52] compiler_4.4.2