---
title: "Getting started with the `aftPenCDA` package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with the `aftPenCDA` package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%",
  fig.width = 7,
  fig.height = 4,
  fig.align = "center",
  dpi = 150
)
```

## Overview

`aftPenCDA` is an R package for fitting penalized accelerated failure time (AFT) models using induced smoothing. The package supports variable selection for both right-censored and clustered partly interval-censored survival data.

Several penalty functions are implemented, including broken adaptive ridge (BAR), LASSO, adaptive LASSO (ALASSO), and SCAD. For variance estimation, the package provides both a closed-form estimator and a perturbation-based estimator.

Core computational routines are implemented in C++ via Rcpp (with the RcppArmadillo backend), so that fitting remains fast when the number of covariates is large.

## Methodological background

Rank-based estimation for the accelerated failure time (AFT) model relies on estimating functions that are nonsmooth in the regression coefficients, which makes numerical optimization difficult.

Induced smoothing replaces the nonsmooth estimating functions with smooth approximations. Derivative-based methods can then be used instead of direct optimization of the nonsmooth rank-based criterion, which substantially reduces the computational cost.
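As an illustration for right-censored data, and assuming Gehan-type weights (one common choice; the weights used inside `aftPenCDA` may differ), the rank-based estimating function and its induced-smoothed counterpart can be written as follows. Here $y_i$ is the observed time, $\delta_i$ the event indicator, $x_i$ the covariate vector, and $\Phi$ the standard normal distribution function. The identity smoothing matrix implied by $r_{ij}$ is an assumption made for simplicity.

$$
U_n(\beta) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \delta_i \,(x_i - x_j)\, I\{ e_j(\beta) \ge e_i(\beta) \},
\qquad e_i(\beta) = \log y_i - x_i^\top \beta,
$$

$$
\tilde{U}_n(\beta) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \delta_i \,(x_i - x_j)\, \Phi\!\left( \frac{e_j(\beta) - e_i(\beta)}{r_{ij}} \right),
\qquad r_{ij}^2 = \frac{1}{n} (x_i - x_j)^\top (x_i - x_j).
$$

The indicator in $U_n$ jumps whenever two residuals cross, whereas $\tilde{U}_n$ is differentiable in $\beta$, which is what makes derivative-based solvers applicable.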

Smoothing leads to a quadratic approximation of the objective function. Applying a Cholesky decomposition to the matrix of this quadratic form turns the problem into a least-squares-type formulation, which permits efficient coordinate descent updates for penalized estimation, even when the number of covariates is large relative to the sample size.
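The Cholesky step can be sketched in a few lines of R. The example below is a minimal illustration, not the package's internal code: `beta_hat` (an initial estimate), `A` (a positive-definite matrix defining the quadratic form), the helper names `soft()` and `cd_lasso()`, and the choice of the LASSO penalty are all assumptions made for the example.

```{r,eval=FALSE}
## Hypothetical inputs (not produced by aftPenCDA): an initial estimate and a
## positive-definite matrix A defining the quadratic approximation
## 0.5 * t(b - beta_hat) %*% A %*% (b - beta_hat).
p <- 5
beta_hat <- c(1.2, -0.8, 0.05, 0, 0.02)
A <- 0.5^abs(outer(seq_len(p), seq_len(p), "-"))

## Cholesky factor: A = t(R) %*% R, so the quadratic form equals
## 0.5 * sum((y_tilde - x_tilde %*% b)^2) with the pseudo-data below.
R <- chol(A)
x_tilde <- R
y_tilde <- drop(R %*% beta_hat)

## Soft-thresholding operator used by the LASSO coordinate update.
soft <- function(z, thr) sign(z) * pmax(abs(z) - thr, 0)

## Coordinate descent for 0.5 * ||y - x b||^2 + lambda * sum(|b|).
cd_lasso <- function(x, y, lambda, max_iter = 1000, tol = 1e-10) {
  b <- rep(0, ncol(x))
  col_ss <- colSums(x^2)
  for (iter in seq_len(max_iter)) {
    b_old <- b
    for (j in seq_along(b)) {
      ## partial residual that leaves out coordinate j
      r_j <- y - drop(x[, -j, drop = FALSE] %*% b[-j])
      b[j] <- soft(sum(x[, j] * r_j), lambda) / col_ss[j]
    }
    if (max(abs(b - b_old)) < tol) break
  }
  b
}

b_ls    <- cd_lasso(x_tilde, y_tilde, lambda = 0)    ## no penalty
b_lasso <- cd_lasso(x_tilde, y_tilde, lambda = 0.1)  ## LASSO penalty
```

With `lambda = 0` the coordinate descent solution reproduces `beta_hat`, while with `lambda = 0.1` the three small coefficients are set to exactly zero. Each coordinate update has a closed form, which is why the least-squares reformulation keeps the cost per iteration low.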

## Installation

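You can install the development version of `aftPenCDA` from GitHub. Because the package contains C++ code, installing from source requires a working compiler toolchain (for example, Rtools on Windows):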

```{r eval = FALSE}
devtools::install_github("seonsy/aftPenCDA")
```

## Main functions

The main functions in `aftPenCDA` are:

- `aftpen()`: penalized AFT model for right-censored data
- `aftpen_pic()`: penalized AFT model for clustered partly interval-censored data

Both functions support the following penalty types:

- `"BAR"`: broken adaptive ridge penalty (Dai et al., 2018)
- `"LASSO"`: LASSO penalty (Tibshirani, 1996)
- `"ALASSO"`: adaptive LASSO penalty (Zou, 2006)
- `"SCAD"`: smoothly clipped absolute deviation penalty (Fan and Li, 2001)
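
For orientation, the four penalties can be written as simple R functions of a coefficient vector. This is a sketch of the textbook definitions, not the package's internal code: the function names, the SCAD constant `a = 3.7`, the adaptive weights `1 / abs(beta_init)`, and the single BAR reweighting step shown are assumptions for illustration.

```{r,eval=FALSE}
## LASSO: lambda * sum |beta_j|
pen_lasso <- function(beta, lambda) lambda * sum(abs(beta))

## Adaptive LASSO: weights from an initial estimate beta_init (assumed nonzero)
pen_alasso <- function(beta, lambda, beta_init) {
  lambda * sum(abs(beta) / abs(beta_init))
}

## SCAD, evaluated coordinate-wise and summed
pen_scad <- function(beta, lambda, a = 3.7) {
  u <- abs(beta)
  sum(ifelse(
    u <= lambda,
    lambda * u,
    ifelse(u <= a * lambda,
           (2 * a * lambda * u - u^2 - lambda^2) / (2 * (a - 1)),
           lambda^2 * (a + 1) / 2)
  ))
}

## BAR: one reweighted ridge penalty, lambda * sum(beta_j^2 / beta_prev_j^2);
## repeating this reweighting drives small coefficients toward zero
pen_bar_step <- function(beta, lambda, beta_prev) {
  lambda * sum(beta^2 / beta_prev^2)
}

beta <- c(1, 0.05, -0.2)
pen_lasso(beta, lambda = 0.1)
pen_scad(beta, lambda = 0.1)
```

The LASSO penalty grows linearly in every coefficient, whereas SCAD flattens out for large coefficients, so large effects are penalized less heavily.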

## Example 1: Right-censored data

We simulate right-censored survival data from an AFT model with ten covariates, of which only the first three have nonzero coefficients, and then fit the penalized estimator.

```{r,eval=FALSE}
library(aftPenCDA)

set.seed(1)

n <- 100
p <- 10

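beta0 <- c(1, 1, 1, rep(0, p - 3))  ## only the first three covariates are active
x <- matrix(rnorm(n * p), n, p)

## as.vector() drops the n x 1 matrix returned by %*%, so y is a plain vector
T <- as.vector(exp(x %*% beta0 + rnorm(n)))  ## true failure times
C <- rexp(n, rate = 0.5)                     ## censoring times

y <- pmin(T, C)          ## observed time
d <- as.numeric(T <= C)  ## event indicator (1 = observed, 0 = censored)

dt <- data.frame(y = y, d = d, x)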
```

We fit the model using the BAR penalty.

```{r,eval=FALSE}
fit_bar <- aftpen(dt, lambda = 0.1, se = "CF", type = "BAR")
fit_bar$beta
```
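
A quick way to read the result is to compare the estimated support with the true one. The sketch below assumes that `fit_bar$beta` is a numeric vector of length `p`; the helper `selection_summary()` and the coefficient values in `beta_hat` are hypothetical and only stand in for a fitted object.

```{r,eval=FALSE}
## Hypothetical helper: compares an estimated coefficient vector with the truth
selection_summary <- function(beta_hat, beta_true, tol = 1e-8) {
  selected <- abs(beta_hat) > tol
  active   <- beta_true != 0
  c(
    true_positives  = sum(selected & active),
    false_positives = sum(selected & !active),
    false_negatives = sum(!selected & active)
  )
}

beta0 <- c(1, 1, 1, rep(0, 7))
## made-up estimate standing in for fit_bar$beta
beta_hat <- c(0.93, 1.08, 0.97, 0, 0, 0.04, 0, 0, 0, 0)
selection_summary(beta_hat, beta0)
```

With a fitted model, `selection_summary(fit_bar$beta, beta0)` would give the same kind of summary, provided `fit_bar$beta` has the assumed form.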

Other penalties are also available.

```{r,eval=FALSE}
fit_lasso  <- aftpen(dt, lambda = 0.1, se = "CF", type = "LASSO")
fit_alasso <- aftpen(dt, lambda = 0.1, se = "CF", type = "ALASSO")
fit_scad   <- aftpen(dt, lambda = 0.1, se = "CF", type = "SCAD")
```

## Example 2: Clustered partly interval-censored data

We generate clustered partly interval-censored data, in which each failure time is either observed exactly or only known to lie in an interval, and fit the penalized estimator.

```{r,eval=FALSE}
set.seed(1)

## simplified generator for clustered partly interval-censored data
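n <- 100            ## number of clusters
p <- 2              ## number of covariates
beta0 <- c(1, 1)
clu_rate <- 0.5     ## variance of the gamma frailty (mean 1)
exactrates <- 0.8   ## probability that a failure time is observed exactly
left <- 0.001       ## lower bound of the gaps between inspection times
right <- 0.01       ## upper bound of the gaps between inspection times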

## cluster-level frailty and informative cluster sizes
eta <- 1 / clu_rate
v <- rgamma(n, shape = eta, rate = eta)
m <- ifelse(v > median(v), 5, 3)
id <- rep(seq_len(n), m)
vi <- rep(v, m)

## subject-level covariates and failure times
N <- sum(m)
x <- matrix(rnorm(N * p), ncol = p)
colnames(x) <- paste0("x", seq_len(p))

T <- as.vector(exp(x %*% beta0 + vi * log(rexp(N))))

## build (L, R, delta): delta = 1 for exact times (L = R = T), 0 otherwise
L <- R <- delta <- numeric(N)
index <- rbinom(N, 1, exactrates)  ## 1 = failure time observed exactly

for (i in seq_len(N)) {
  if (index[i] == 1) {
    L[i] <- T[i]
    R[i] <- T[i]
    delta[i] <- 1
  } else {
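    ## inspection times for subject i; (LL, RR) are the consecutive intervals
    U <- cumsum(c(1e-8, runif(10, left, right)))
    LL <- U[-length(U)]
    RR <- U[-1]

    if (T[i] < min(LL)) {
      ## failure before the first inspection time: left-censored
      L[i] <- 1e-8
      R[i] <- min(LL)
      delta[i] <- 0
    } else if (T[i] > max(RR)) {
      ## failure after the last inspection time: right-censored
      L[i] <- max(RR)
      R[i] <- 1e8
      delta[i] <- 0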
    } else {
      idd <- which(T[i] > LL & T[i] < RR)

      if (length(idd) == 1) {
        L[i] <- LL[idd]
        R[i] <- RR[idd]
        delta[i] <- 0
      } else {
        L[i] <- T[i]
        R[i] <- T[i]
        delta[i] <- 1
      }
    }
  }
}

dt_pic <- data.frame(
  L = L,
  R = R,
  delta = delta,
  id = id,
  x1 = x[, 1],
  x2 = x[, 2]
)
```
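
Before fitting, it can help to check how many observations fall into each censoring category. The sketch below relies only on the coding used in the generator above (`delta = 1` for exact times, `L = 1e-8` for left-censored and `R = 1e8` for right-censored observations); the helper `censoring_type()` and the small data frame `toy` are hypothetical.

```{r,eval=FALSE}
## Hypothetical helper: labels each observation by its censoring type
censoring_type <- function(L, R, delta, lower = 1e-8, upper = 1e8) {
  type <- ifelse(delta == 1, "exact",
          ifelse(R >= upper, "right",
          ifelse(L <= lower, "left", "interval")))
  factor(type, levels = c("exact", "left", "interval", "right"))
}

## made-up rows, one per censoring type, using the same coding as dt_pic
toy <- data.frame(
  L     = c(0.50, 1e-8, 0.012, 0.055),
  R     = c(0.50, 1e-8, 0.019, 1e8),
  delta = c(1, 0, 0, 0)
)
table(censoring_type(toy$L, toy$R, toy$delta))
```

Applied to the simulated data, the call would be `table(censoring_type(dt_pic$L, dt_pic$R, dt_pic$delta))`.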

We fit the model using the BAR penalty.

```{r,eval=FALSE}
fit_pic <- aftpen_pic(dt_pic, lambda = 0.0005, se = "CF", type = "BAR")
fit_pic$beta
```

Other penalties are also available for partly interval-censored data.

```{r,eval=FALSE}
fit_pic_lasso  <- aftpen_pic(dt_pic, lambda = 0.001, se = "CF", type = "LASSO")
fit_pic_alasso <- aftpen_pic(dt_pic, lambda = 0.001, se = "CF", type = "ALASSO")
fit_pic_scad   <- aftpen_pic(dt_pic, lambda = 0.001, se = "CF", type = "SCAD")
```

## Variance estimation

The argument `se` selects the variance estimation method:

- `"CF"`: closed-form estimator
- `"ZL"`: perturbation-based (resampling) estimator; see Zeng and Lin (2008)

For example:

```{r,eval=FALSE}
fit_zl <- aftpen(dt, lambda = 0.1, se = "ZL", type = "BAR")
```

## References

Wang, You-Gan, and Yudong Zhao (2008). “Weighted Rank Regression for Clustered Data Analysis.” *Biometrics* **64**(1), 39--45.

Dai, L., K. Chen, Z. Sun, Z. Liu, and G. Li (2018). “Broken Adaptive Ridge Regression and Its Asymptotic Properties.” *Journal of Multivariate Analysis* **168**, 334--351.

Zeng, Donglin, and D. Y. Lin (2008). “Efficient Resampling Methods for Nonsmooth Estimating Functions.” *Biostatistics* **9**(2), 355--363.

Tibshirani, Robert (1996). “Regression Shrinkage and Selection via the Lasso.” *Journal of the Royal Statistical Society: Series B* **58**(1), 267--288.

Fan, Jianqing, and Runze Li (2001). “Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties.” *Journal of the American Statistical Association* **96**(456), 1348--1360.

Zou, Hui (2006). “The Adaptive Lasso and Its Oracle Properties.” *Journal of the American Statistical Association* **101**(476), 1418--1429.