---
title: "Bayesian Functional Cox Regression with `refundBayes::fcox_bayes`"
output: rmarkdown::html_vignette
date: "2026-03-03"
vignette: >
  %\VignetteIndexEntry{Bayesian Functional Cox Regression}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Introduction

This vignette provides a detailed guide to the `fcox_bayes()` function in the `refundBayes` package, which fits Bayesian Functional Cox Regression (FCR) models for time-to-event outcomes using Stan. The function extends the scalar-on-function regression framework to survival analysis with right censoring, and is designed with a syntax similar to `mgcv::gam`.

The methodology follows the tutorial by Jiang, Crainiceanu, and Cui (2025), *Tutorial on Bayesian Functional Regression Using Stan*, published in *Statistics in Medicine*.

# Install the `refundBayes` Package

The `refundBayes` package can be installed from GitHub:

```r
library(remotes)
remotes::install_github("https://github.com/ZirenJiang/refundBayes")
```

# Statistical Model

## The Functional Cox Regression Model

The Functional Cox Regression (FCR) model extends the classical Cox proportional hazards model to settings where one or more predictors are functional (i.e., curves or trajectories observed over a continuum).

For subject $i = 1, \ldots, n$, let $T_i$ be the event time and $C_i$ the censoring time. The observed data consist of $[Y_i, \delta_i, \mathbf{Z}_i, \{W_i(t_m), t_m \in \mathcal{T}\}]$, where:

- $Y_i = \min(T_i, C_i)$ is the observed follow-up time,
- $\delta_i$ is the censoring indicator with $\delta_i = 0$ if the event is observed ($Y_i = T_i$) and $\delta_i = 1$ if the event is censored ($Y_i < T_i$),
- $\mathbf{Z}_i$ is a $p \times 1$ vector of scalar predictors,
- $\{W_i(t_m), t_m \in \mathcal{T}\}$ for $m = 1, \ldots, M$ is a functional predictor observed over a domain $\mathcal{T}$.

The domain $\mathcal{T}$ is **not** restricted to $[0,1]$; it is determined by the actual time points in the data (e.g., $\mathcal{T} = [1, 1440]$ for minute-level 24-hour activity data). See the section on `tmat`, `lmat`, and `wmat` below for details.

The model assumes a proportional hazards structure:

$$h_i(t) = h_0(t) \exp(\eta_i)$$

where $h_0(t)$ is the baseline hazard function, and $\eta_i$ is the linear predictor defined as:

$$\eta_i = \eta_0 + \int_{\mathcal{T}} W_i(s)\beta(s)\,ds + \mathbf{Z}_i^t \boldsymbol{\gamma}$$

Here $\eta_0$ is the intercept, $\beta(\cdot)$ is the functional coefficient, and $\boldsymbol{\gamma}$ is the vector of scalar regression coefficients. The integral is approximated by a Riemann sum using the time point matrix `tmat` and the integration weight matrix `lmat` (see below).

The log-likelihood for this model has the form:

$$\ell(\mathbf{Y}, \boldsymbol{\delta}; h_0, \boldsymbol{\eta}) = \sum_{i=1}^n \left[ (1 - \delta_i)\left\{\log h_0(y_i) + \eta_i - H_0(y_i)\exp(\eta_i)\right\} + \delta_i\left\{-H_0(y_i)\exp(\eta_i)\right\} \right]$$

where $H_0(t) = \int_0^t h_0(u)\,du$ is the cumulative baseline hazard function.

## Modeling the Baseline Hazard with M-Splines

Unlike the classical Cox model, which leaves the baseline hazard unspecified and uses partial likelihood, the Bayesian approach requires a full likelihood specification. The `fcox_bayes()` function models the baseline hazard function $h_0(t)$ using M-splines:

$$h_0(t) = \sum_{l=1}^L c_l M_l(t; \mathbf{k}, \tau)$$

where $M_l(t; \mathbf{k}, \tau)$ denotes the $l$-th M-spline basis function with knots $\mathbf{k}$ and degree $\tau$, and $\mathbf{c} = (c_1, \ldots, c_L)^t$ are the spline coefficients. The constraints $c_l \geq 0$ and $\sum_{l=1}^L c_l > 0$ ensure that the baseline hazard is non-negative and non-trivial. M-splines are a family of non-negative, piecewise polynomial basis functions that integrate to one over their support.

The cumulative baseline hazard function is modeled using I-splines, which are the integrated forms of M-splines:

$$H_0(t) = \sum_{l=1}^L c_l I_l(t; \mathbf{k}, \tau)$$

where $I_l(t; \mathbf{k}, \tau) = \int_0^t M_l(u; \mathbf{k}, \tau)\,du$. Because M-splines are non-negative, the resulting I-spline cumulative hazard is non-decreasing by construction.

## Identifiability Constraint

The I-spline coefficients $\mathbf{c}$ and the intercept $\eta_0$ are not simultaneously identifiable: for any $a > 0$, the parameter pairs $(\mathbf{c}, \eta_0)$ and $(a\mathbf{c}, \eta_0 - \log a)$ yield the same hazard function. To resolve this, the model imposes the constraint $\sum_{l=1}^L c_l = 1$ by assigning a non-informative Dirichlet prior $D(\mathbf{c}; \boldsymbol{\alpha})$ with $\boldsymbol{\alpha} = (1, \ldots, 1)$ to the coefficients. Consequently, the intercept defaults to `FALSE` in `fcox_bayes()`.

## Functional Coefficient via Penalized Splines

As in the SoFR model, the functional coefficient $\beta(s)$ is represented using $K$ spline basis functions $\psi_1(s), \ldots, \psi_K(s)$:

$$\beta(s) = \sum_{k=1}^K b_k \psi_k(s)$$

The spline coefficients $\mathbf{b}$ are reparametrised using the spectral decomposition of the penalty matrix $\mathbf{S}$ (the Wahba-O'Sullivan smoothing penalty) to obtain transformed coefficients $\tilde{\mathbf{b}} = (\tilde{\mathbf{b}}_r^t, \tilde{\mathbf{b}}_f^t)^t$, where:

- $\tilde{\mathbf{b}}_r$ (random effects) receive a $N(\mathbf{0}, \sigma_b^2 \mathbf{I})$ prior,
- $\tilde{\mathbf{b}}_f$ (fixed effects) receive non-informative priors.

This reparametrisation ensures numerical stability and efficient sampling in Stan.

### The Role of `tmat`, `lmat`, and `wmat`

The integral $\int_{\mathcal{T}} W_i(s)\beta(s)\,ds$ is approximated by the Riemann sum $\sum_{m=1}^M L_m W_i(t_m)\psi_k(t_m)$, constructed from three user-supplied matrices:

- **`tmat`** (an $n \times M$ matrix): contains the time points $t_m$ at which the functional predictor is observed. The $(i, m)$-th entry equals $t_m$. The range of values in `tmat` determines the domain of integration $\mathcal{T}$. For example, if the functional predictor is observed at minutes $1, 2, \ldots, 1440$ within a day, then `tmat` has entries ranging from $1$ to $1440$ and $\mathcal{T} = [1, 1440]$. There is no requirement that the domain be rescaled to $[0, 1]$.

- **`lmat`** (an $n \times M$ matrix): contains the integration weights $L_m = t_{m+1} - t_m$ for the Riemann sum approximation. For equally spaced time points with spacing $\Delta t$, every entry of `lmat` equals $\Delta t$. These weights, together with `tmat`, fully specify how the numerical integration is performed and over what domain.

- **`wmat`** (an $n \times M$ matrix): contains the functional predictor values. The $i$-th row contains the $M$ observed values $W_i(t_1), \ldots, W_i(t_M)$ for subject $i$.

In the formula `s(tmat, by = lmat * wmat, bs = "cc", k = 10)`, the `mgcv` infrastructure uses `tmat` to construct the spline basis $\psi_k(t_m)$ at the observed time points, and the `by = lmat * wmat` argument provides the element-wise product $L_m \cdot W_i(t_m)$ that enters the Riemann sum. The basis functions are evaluated on the scale of `tmat`, and the integration weights in `lmat` ensure that the discrete sum correctly approximates the integral over the actual domain $\mathcal{T}$ --- regardless of whether it is $[0, 1]$, $[1, 1440]$, or any other interval.

Note that for all subjects, the time points are assumed to be identical so that $t_{im} = t_m$ for all $i = 1, \ldots, n$. Thus every row of `tmat` is the same, and every row of `lmat` is the same. The matrices are replicated across rows to match the `mgcv` syntax, which expects all terms in the formula to have the same dimensions.

## Full Bayesian Model

The complete Bayesian Functional Cox Regression model is:

$$\begin{cases}
\mathbf{Y} \sim \ell(\mathbf{Y}, \boldsymbol{\delta}; h_0, \boldsymbol{\eta}) \\
\boldsymbol{\eta} = \eta_0 \mathbf{J}_n + \tilde{\mathbf{X}}_r^t \tilde{\mathbf{b}}_r + \tilde{\mathbf{X}}_f^t \tilde{\mathbf{b}}_f + \mathbf{Z}^t \boldsymbol{\gamma} \\
\tilde{\mathbf{b}}_r \sim N(\mathbf{0}, \sigma_b^2 \mathbf{I}) \\
\eta_0 \sim p(\eta_0);\; \tilde{\mathbf{b}}_f \sim p(\tilde{\mathbf{b}}_f);\; \boldsymbol{\gamma} \sim p(\boldsymbol{\gamma});\; \sigma_b^2 \sim p(\sigma_b^2) \\
h_0(t) = \sum_{l=1}^L c_l M_l(t; \mathbf{k}, \tau) \\
\mathbf{c} \sim D(\mathbf{c}; \boldsymbol{\alpha})
\end{cases}$$

where most components are shared with the SoFR model, except for the Cox likelihood (first line) and the baseline hazard specification via M-splines with a Dirichlet prior on $\mathbf{c}$ (last two lines).

## Joint Modeling with FPCA (Optional)

When functional predictors are observed with measurement error, the `fcox_bayes()` function supports a joint modeling approach that simultaneously estimates FPCA scores and fits the Cox regression model. In this case, the model regresses on the latent trajectory $D_i(t)$ rather than the observed (noisy) function $W_i(t) = D_i(t) + \epsilon_i(t)$, properly accounting for the uncertainty in the score estimates.

# The `fcox_bayes()` Function

## Usage

```r
fcox_bayes(
  formula,
  data,
  cens,
  joint_FPCA = NULL,
  intercept = FALSE,
  runStan = TRUE,
  niter = 3000,
  nwarmup = 1000,
  nchain = 3,
  ncores = 1
)
```

## Arguments

| Argument | Description |
|:---------|:------------|
| `formula` | Functional regression formula, using the same syntax as `mgcv::gam`. The left-hand side should be the observed survival time $Y_i = \min(T_i, C_i)$. Functional predictors are specified using the `s()` term with `by = lmat * wmat` to encode the Riemann sum integration. |
| `data` | A data frame containing all scalar and functional variables used in the model, as well as the survival time as the response variable. |
| `cens` | A binary vector indicating censoring status for each subject, following the convention: $\delta_i = 0$ if the event is observed and $\delta_i = 1$ if censored. Must have the same length as the number of observations in `data`. |
| `joint_FPCA` | A logical (`TRUE`/`FALSE`) vector of the same length as the number of functional predictors, indicating whether to jointly model FPCA for each functional predictor. Default is `NULL`, which sets all entries to `FALSE` (no joint FPCA). |
| `intercept` | Logical. Whether to include an intercept term in the linear predictor. Default is `FALSE`, because the intercept is not identifiable simultaneously with the M-spline coefficients for the baseline hazard (see the Identifiability Constraint section above). |
| `runStan` | Logical. Whether to run the Stan program. If `FALSE`, the function only generates the Stan code and data without sampling. Default is `TRUE`. |
| `niter` | Total number of Bayesian posterior sampling iterations (including warmup). Default is `3000`. |
| `nwarmup` | Number of warmup (burn-in) iterations. Default is `1000`. |
| `nchain` | Number of Markov chains for posterior sampling. Default is `3`. |
| `ncores` | Number of CPU cores to use when executing the chains in parallel. Default is `1`. |

## Return Value

The function returns a list of class `"refundBayes"` containing the following elements:

| Element | Description |
|:--------|:------------|
| `stanfit` | The Stan fit object (class `stanfit`). Can be used for convergence diagnostics, traceplots, and additional summaries via the `rstan` package. |
| `spline_basis` | Basis functions used to reconstruct the functional coefficients from the posterior samples. |
| `stancode` | A character string containing the generated Stan model code. |
| `standata` | A list containing the data passed to the Stan model. |
| `int` | A vector of posterior samples for the intercept term $\eta_0$. `NULL` by default since `intercept = FALSE`. |
| `scalar_coef` | A matrix of posterior samples for scalar coefficients $\boldsymbol{\gamma}$, where each row is one posterior sample and each column corresponds to one scalar predictor. `NULL` if no scalar predictors are included. |
| `func_coef` | A list of posterior samples for functional coefficients. Each element is a matrix where each row is one posterior sample and each column corresponds to one location on the functional domain. |
| `baseline_hazard` | A list containing posterior samples of baseline hazard parameters: `bhaz` (baseline hazard values) and `cbhaz` (cumulative baseline hazard values). |
| `family` | The family type: `"Cox"`. |

## Formula Syntax

The formula follows the `mgcv::gam` syntax. The left-hand side is the observed survival time (not a `Surv` object). The key component for specifying functional predictors is:

```r
s(tmat, by = lmat * wmat, bs = "cc", k = 10)
```

where:

- `tmat`: an $n \times M$ matrix of time points defining the domain $\mathcal{T}$ over which the functional predictor is observed and the integral is computed. The domain adapts to the actual values in `tmat` (e.g., $[0, 1]$, $[1, 1440]$, etc.).
- `lmat`: an $n \times M$ matrix of Riemann sum weights ($L_m = t_{m+1} - t_m$), controlling the numerical integration over $\mathcal{T}$.
- `wmat`: an $n \times M$ matrix of functional predictor values.
- `bs`: the type of spline basis (e.g., `"cr"` for cubic regression splines, `"cc"` for cyclic cubic regression splines).
- `k`: the number of basis functions.

Scalar predictors are included as standard formula terms. The censoring information is provided separately via the `cens` argument, not through the formula.

# Example: Bayesian Functional Cox Regression

We demonstrate the `fcox_bayes()` function using a simulated example dataset with a time-to-event outcome and one functional predictor.

## Load and Prepare Data

```{r load-data, eval=F}
## Load the example data
Func_Cox_Data <- readRDS("data/example_data_Cox.rds")

## Prepare the functional predictor and censoring indicator
Func_Cox_Data$wmat <- Func_Cox_Data$MIMS
Func_Cox_Data$cens <- 1 - Func_Cox_Data$event
```

The example dataset contains:

- `survtime`: the observed survival time $Y_i = \min(T_i, C_i)$,
- `event`: a binary event indicator ($1$ = event occurred, $0$ = censored),
- `X1`: a scalar predictor,
- `tmat`: the $n \times M$ time point matrix (defines the domain $\mathcal{T}$),
- `lmat`: the $n \times M$ Riemann sum weight matrix (defines the integration weights over $\mathcal{T}$),
- `MIMS`: the $n \times M$ functional predictor matrix (physical activity data).

Note that the censoring indicator `cens` is constructed as `1 - event`, following the convention where $\delta_i = 0$ indicates an observed event and $\delta_i = 1$ indicates censoring.

## Fit the Bayesian Functional Cox Model

```{r fit-model, eval=F}
library(refundBayes)

refundBayes_FCox <- refundBayes::fcox_bayes(
  survtime ~ X1 + s(tmat, by = lmat * wmat, bs = "cc", k = 10),
  data = Func_Cox_Data,
  cens = Func_Cox_Data$cens,
  runStan = TRUE,
  niter = 1500,
  nwarmup = 500,
  nchain = 3,
  ncores = 3
)
```

In this call:

- The formula specifies the observed survival time `survtime` as the response, with one scalar predictor `X1` and one functional predictor encoded via `s(tmat, by = lmat * wmat, bs = "cc", k = 10)`.
- The spline basis is evaluated at the time points stored in `tmat`, and the integration over the domain $\mathcal{T}$ (determined by `tmat`) is approximated using the weights in `lmat`.
- The censoring vector `cens` is supplied separately, with $0$ = event observed and $1$ = censored.
- `bs = "cc"` uses cyclic cubic regression splines, appropriate for periodic functional predictors (e.g., 24-hour activity patterns).
- `k = 10` specifies 10 basis functions. In practice, 30--40 basis functions are recommended for moderately smooth functional data on dense grids.
- The `intercept` argument is not specified and defaults to `FALSE`, which is appropriate for Cox models due to the identifiability constraint with the baseline hazard.
- The sampler runs 3 chains in parallel, each with 1500 total iterations (500 warmup + 1000 posterior samples).

## Visualization

The `plot()` method for `refundBayes` objects displays the estimated functional coefficient $\hat{\beta}(t)$ along with pointwise 95% credible intervals:

```{r plot-model, eval=F}
library(ggplot2)
plot(refundBayes_FCox)
```

## Extracting Posterior Summaries

Posterior summaries of the functional coefficient can be computed directly from the `func_coef` element:

```{r posterior-summary, eval=F}
## Posterior mean of the functional coefficient
mean_curve <- apply(refundBayes_FCox$func_coef[[1]], 2, mean)

## Pointwise 95% credible interval
upper_curve <- apply(refundBayes_FCox$func_coef[[1]], 2,
                     function(x) quantile(x, prob = 0.975))
lower_curve <- apply(refundBayes_FCox$func_coef[[1]], 2,
                     function(x) quantile(x, prob = 0.025))
```

The posterior samples in `func_coef[[1]]` are stored as a $Q \times M$ matrix, where $Q$ is the number of posterior samples and $M$ is the number of time points on the functional domain.

## Comparison with Frequentist Results

The Bayesian results can be compared with frequentist estimates obtained via `mgcv::gam`, which handles Cox proportional hazards models through its `cox.ph()` family using partial likelihood:

```{r freq-comparison, eval=F}
library(mgcv)

## Fit frequentist functional Cox model using mgcv
fit_freq <- gam(
  survtime ~ s(tmat, by = lmat * wmat, bs = "cc", k = 10) + X1,
  data = Func_Cox_Data,
  family = cox.ph(),
  weights = Func_Cox_Data$event
)

## Extract frequentist estimates
freq_result <- plot(fit_freq)
```

Note that the frequentist approach uses `cox.ph()` family with partial likelihood and a Breslow-type nonparametric estimator for the baseline hazard, while the Bayesian approach models the baseline hazard parametrically using M-splines. Despite this difference, simulation studies in Jiang et al. (2025) show that both approaches achieve comparable performance in terms of estimation accuracy and coverage.

## Inspecting the Generated Stan Code

Setting `runStan = FALSE` allows you to inspect or modify the Stan code before running the model:

```{r inspect-code, eval=F}
## Generate Stan code without running the sampler
fcox_code <- refundBayes::fcox_bayes(
  survtime ~ X1 + s(tmat, by = lmat * wmat, bs = "cc", k = 10),
  data = Func_Cox_Data,
  cens = Func_Cox_Data$cens,
  runStan = FALSE
)

## Print the generated Stan code
cat(fcox_code$stancode)
```

The generated Stan code includes a `functions` block that defines custom log-likelihood functions for the Cox model (`cox_log_lpdf` for observed events and `cox_log_lccdf` for censored observations), in addition to the standard `data`, `parameters`, and `model` blocks.

# Practical Recommendations

- **Number of basis functions (`k`)**: For illustrative purposes, `k = 10` is often used. In practice, 30--40 basis functions are recommended for moderately smooth functional data observed on dense grids.
- **Spline type (`bs`)**: Use `"cr"` (cubic regression splines) for general functional predictors. Use `"cc"` (cyclic cubic regression splines) when the functional predictor is periodic (e.g., 24-hour activity patterns).
- **Intercept**: The intercept defaults to `FALSE` for Cox models because it is not simultaneously identifiable with the baseline hazard M-spline coefficients. In most cases, this default should not be changed.
- **Censoring vector**: Ensure the `cens` vector correctly follows the convention $\delta_i = 0$ for observed events and $\delta_i = 1$ for censored observations. If your data has an event indicator (1 = event, 0 = censored), use `cens = 1 - event`.
- **Number of iterations**: Cox models may require more iterations than standard SoFR models. A common setup is `niter = 3000` with `nwarmup = 1000`. Watch for divergent transitions in the Stan output; if they occur, consider increasing the number of warmup iterations or adjusting the `adapt_delta` parameter.
- **Convergence diagnostics**: After fitting, examine traceplots and $\hat{R}$ statistics using the `rstan` package (e.g., `rstan::traceplot(refundBayes_FCox$stanfit)`) to ensure that the Markov chains have converged.
- **Joint FPCA**: When functional predictors are measured with substantial noise, consider setting `joint_FPCA = TRUE` for the relevant predictor to jointly estimate FPCA scores and regression coefficients within the Cox model.

# References

- Jiang, Z., Crainiceanu, C., and Cui, E. (2025). Tutorial on Bayesian Functional Regression Using Stan. *Statistics in Medicine*, 44(20--22), e70265.
- Crainiceanu, C. M., Goldsmith, J., Leroux, A., and Cui, E. (2024). *Functional Data Analysis with R*. CRC Press.
- Cui, E., Crainiceanu, C. M., and Leroux, A. (2021). Additive Functional Cox Model. *Journal of Computational and Graphical Statistics*, 30(3), 780--793.
- Brilleman, S. L., Elci, E. M., Novik, J. B., and Wolfe, R. (2020). Bayesian Survival Analysis Using the rstanarm R Package. arXiv preprint, arXiv:2002.09633.
- Cox, D. (1972). Regression Models and Life-Tables. *Journal of the Royal Statistical Society, Series B*, 34(2), 187--220.
- Carpenter, B., Gelman, A., Hoffman, M. D., et al. (2017). Stan: A Probabilistic Programming Language. *Journal of Statistical Software*, 76(1), 1--32.