--- title: "Mathematical Foundations of Series Systems" output: rmarkdown::html_vignette: toc: true vignette: > %\VignetteIndexEntry{Mathematical Foundations of Series Systems} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4 ) ``` ## Series System Theory A **series system** consists of $m$ independent components, each with lifetime $T_j$ and survival function $S_j(t) = P(T_j > t)$. The system fails when *any* component fails, so the system lifetime is: $$T_{sys} = \min(T_1, \ldots, T_m)$$ Since the components are independent: $$S_{sys}(t) = P(T_{sys} > t) = P(T_1 > t, \ldots, T_m > t) = \prod_{j=1}^{m} S_j(t)$$ Taking the negative logarithm gives the cumulative hazard: $$H_{sys}(t) = -\log S_{sys}(t) = \sum_{j=1}^{m} H_j(t)$$ Differentiating yields the hazard rate: $$h_{sys}(t) = \frac{d}{dt} H_{sys}(t) = \sum_{j=1}^{m} h_j(t)$$ This is the fundamental result: **the system hazard is the sum of component hazards**. ## Key Properties Let's verify these relationships numerically. 
### Property 1: Hazard Sum

```{r hazard-sum}
library(serieshaz)

sys <- dfr_dist_series(list(
  dfr_weibull(shape = 2, scale = 100),
  dfr_exponential(0.01),
  dfr_gompertz(a = 0.001, b = 0.05)
))

h_sys <- hazard(sys)
h1 <- component_hazard(sys, 1)
h2 <- component_hazard(sys, 2)
h3 <- component_hazard(sys, 3)

t_vals <- c(10, 50, 100, 200)
for (t0 in t_vals) {
  lhs <- h_sys(t0)
  rhs <- h1(t0) + h2(t0) + h3(t0)
  cat(sprintf("t=%3d: h_sys=%.6f, sum h_j=%.6f, diff=%.2e\n",
              t0, lhs, rhs, abs(lhs - rhs)))
}
```

### Property 2: Survival Product

```{r surv-product}
S_sys <- surv(sys)
comp1 <- component(sys, 1)
comp2 <- component(sys, 2)
comp3 <- component(sys, 3)
S1 <- surv(comp1)
S2 <- surv(comp2)
S3 <- surv(comp3)

for (t0 in t_vals) {
  lhs <- S_sys(t0)
  rhs <- S1(t0) * S2(t0) * S3(t0)
  cat(sprintf("t=%3d: S_sys=%.6f, prod S_j=%.6f, diff=%.2e\n",
              t0, lhs, rhs, abs(lhs - rhs)))
}
```

### Property 3: Cumulative Hazard Sum

```{r cumhaz-sum}
# The system's cumulative hazard is the sum of component cumulative hazards
H_sys <- cum_haz(sys)
H1 <- cum_haz(comp1)
H2 <- cum_haz(comp2)
H3 <- cum_haz(comp3)

for (t0 in t_vals) {
  lhs <- H_sys(t0)
  rhs <- H1(t0) + H2(t0) + H3(t0)
  cat(sprintf("t=%3d: H_sys=%.6f, sum H_j=%.6f, diff=%.2e\n",
              t0, lhs, rhs, abs(lhs - rhs)))
}
```

### Property 4: Density Formula

The system density is:

$$f_{sys}(t) = h_{sys}(t) \cdot S_{sys}(t) = \left[\sum_{j=1}^m h_j(t)\right] \exp\left[-\sum_{j=1}^m H_j(t)\right]$$

```{r density-formula}
f_sys <- density(sys)

for (t0 in t_vals) {
  from_density <- f_sys(t0)
  from_hS <- h_sys(t0) * S_sys(t0)
  cat(sprintf("t=%3d: f(t)=%.8f, h(t)*S(t)=%.8f, diff=%.2e\n",
              t0, from_density, from_hS, abs(from_density - from_hS)))
}
```

The `density()` method and the manual $h(t) \cdot S(t)$ calculation agree to machine precision.
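The density identity can likewise be checked without the package. For a Weibull + exponential pair, a centered finite difference of $-S_{sys}$ matches $h_{sys}(t) \cdot S_{sys}(t)$ (a base-R sketch; the system hazard is written out by hand):

```{r fd-density-check}
S_pair <- function(t) {
  pweibull(t, 2, 100, lower.tail = FALSE) * pexp(t, 0.01, lower.tail = FALSE)
}
# Weibull hazard (k/lambda) * (t/lambda)^(k-1) plus the constant exponential rate
h_pair <- function(t) (2 / 100) * (t / 100)^(2 - 1) + 0.01

t0 <- 50
eps <- 1e-5
fd <- -(S_pair(t0 + eps) - S_pair(t0 - eps)) / (2 * eps)  # f(t) = -dS/dt
c(finite_diff = fd, h_times_S = h_pair(t0) * S_pair(t0))
```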
Note that at $t = 200$, both the density and survival are effectively zero --- by that time, the combined Weibull + exponential + Gompertz hazard has driven the cumulative hazard so high that virtually all systems have already failed.

## The Parameter Layout

The series system stores parameters from all components as a single flat vector. This is critical for optimization — MLE fitting via `optim()` requires a single parameter vector.

```{r layout-demo}
sys <- dfr_dist_series(list(
  dfr_weibull(shape = 2, scale = 100),  # 2 params
  dfr_exponential(0.05),                # 1 param
  dfr_gompertz(a = 0.01, b = 0.1)       # 2 params
))

# Full parameter vector (5 elements)
params(sys)

# Layout maps global indices to components
param_layout(sys)
# Component 1 uses par[1:2], component 2 uses par[3], component 3 uses par[4:5]
```

When an optimizer proposes a new parameter vector `par = c(p1, p2, p3, p4, p5)`, the series system internally slices it: component 1 gets `par[1:2]`, component 2 gets `par[3]`, and component 3 gets `par[4:5]`.

## Analytical vs. Numerical Cumulative Hazard

The cumulative hazard $H(t) = \int_0^t h(u)\,du$ is needed for survival, CDF, and log-likelihood computations. When all components provide an analytical `cum_haz_rate`, the series system computes $H_{sys}(t) = \sum_j H_j(t)$ exactly. Otherwise, it falls back to numerical integration.

```{r analytical-check}
# All standard distributions provide analytical cumulative hazard
sys_analytical <- dfr_dist_series(list(
  dfr_weibull(shape = 2, scale = 100),
  dfr_exponential(0.05)
))
cat("Has analytical cum_haz:", !is.null(sys_analytical$cum_haz_rate), "\n")

# A custom dfr_dist without cum_haz_rate forces numerical fallback
custom <- dfr_dist(
  rate = function(t, par, ...)
    par[1] * t^(par[1] - 1),
  par = c(1.5)
)
sys_numerical <- dfr_dist_series(list(
  custom,
  dfr_exponential(0.05)
))
cat("Has analytical cum_haz:", !is.null(sys_numerical$cum_haz_rate), "\n")
```

The analytical path is faster and more precise, but the numerical fallback works correctly for any valid hazard function.

## Score Function: Decomposed Gradient

The log-likelihood for a series system with observed data $(t_i, \delta_i)$ is:

$$\ell(\boldsymbol{\theta}) = \sum_{i} \left[\delta_i \log h_{sys}(t_i; \boldsymbol{\theta}) - H_{sys}(t_i; \boldsymbol{\theta})\right]$$

where $\delta_i = 1$ marks an exact observation and $\delta_i = 0$ a right-censored one. (Left-censored observations, marked $\delta_i = -1$, contribute $\log F(t_i)$ instead and do not fit the sum above.)

### The naive approach

A generic gradient computation would apply `numDeriv::grad` to the full log-likelihood, perturbing the entire parameter vector $\boldsymbol{\theta}$ of length $P$. Each perturbation re-evaluates all $m$ component hazards at all $n$ observations, for a cost of $O(P \cdot n \cdot m)$.

### How `serieshaz` decomposes the gradient

The series structure admits a per-component decomposition.
Because $H_{sys} = \sum_j H_j$ and $h_{sys} = \sum_j h_j$, the gradient with respect to component $k$'s parameters $\boldsymbol{\theta}_k$ separates into two terms:

$$\frac{\partial \ell}{\partial \boldsymbol{\theta}_k} = \underbrace{ \sum_{i:\,\delta_i=1} \frac{1}{h_{sys}(t_i)} \frac{\partial h_k(t_i)}{\partial \boldsymbol{\theta}_k} }_{\text{hazard derivative term}} \;-\; \underbrace{ \sum_{i} \frac{\partial H_k(t_i)}{\partial \boldsymbol{\theta}_k} }_{\text{cumulative hazard derivative term}}$$

The two terms are computed differently:

| Term | Method | Cost |
|------|--------|------|
| Hazard derivative | `numDeriv::jacobian` on $h_k(t, \boldsymbol{\theta}_k)$ | $O(p_k \cdot n_{\text{exact}})$ per component |
| Cumulative hazard derivative | **Analytical** via all-censored trick (when the component provides `score_fn`) | $O(n)$ per component |
| Cumulative hazard derivative | `numDeriv::grad` on $\sum_i H_k(t_i)$ (fallback) | $O(p_k \cdot n)$ per component |

Total cost: $O(P \cdot n)$, versus $O(P \cdot n \cdot m)$ for the naive approach --- roughly an $m$-fold speedup.

### The all-censored trick

The key insight for the cumulative hazard derivative is that a component's own `score_fn` computes:

$$\text{score}_k(\boldsymbol{\theta}_k) = \sum_i \delta_i \frac{\partial \log h_k(t_i)}{\partial \boldsymbol{\theta}_k} - \sum_i \frac{\partial H_k(t_i)}{\partial \boldsymbol{\theta}_k}$$

If we call `score_fn` with all observations marked as right-censored ($\delta_i = 0$), the first sum vanishes and we get:

$$\text{score}_k\big|_{\delta=0} = -\sum_i \frac{\partial H_k(t_i)}{\partial \boldsymbol{\theta}_k}$$

This extracts the cumulative hazard derivative analytically --- no numerical differentiation needed. All built-in distributions (exponential, Weibull, Gompertz, log-logistic) provide `score_fn`, so this analytical path is the common case.
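The trick is easiest to see for a single exponential component, where $h(t) = \lambda$, $H(t) = \lambda t$, and hence $\partial H / \partial \lambda = t$. A hand-rolled score function (an illustrative sketch, not the package's `score_fn`) evaluated with every $\delta_i = 0$ returns exactly $-\sum_i t_i$:

```{r exp-score-trick}
# Exponential score: d/dlambda of sum(delta * log(lambda) - lambda * t)
exp_score <- function(t, delta, lambda) sum(delta) / lambda - sum(t)

set.seed(7)
t <- rexp(25, rate = 0.3)

# All-censored call: the log-hazard term vanishes, leaving -sum(dH/dlambda)
exp_score(t, delta = rep(0, 25), lambda = 0.3)
-sum(t)  # identical
```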
### Numerical demonstration

```{r score-decomposed}
sys <- dfr_dist_series(list(
  dfr_weibull(shape = 2, scale = 100),
  dfr_exponential(0.01),
  dfr_gompertz(a = 0.001, b = 0.05)
))

set.seed(42)
times <- sampler(sys)(200)
df <- data.frame(t = times, delta = rep(1L, 200))
par0 <- params(sys)

sc_fn <- score(sys)
ll_fn <- loglik(sys)

# Decomposed score (what serieshaz computes internally)
score_val <- sc_fn(df, par = par0)

# Independent finite-difference verification
eps <- 1e-6
fd_score <- numeric(length(par0))
for (k in seq_along(par0)) {
  par_p <- par_m <- par0
  par_p[k] <- par0[k] + eps
  par_m[k] <- par0[k] - eps
  fd_score[k] <- (ll_fn(df, par = par_p) - ll_fn(df, par = par_m)) / (2 * eps)
}

result <- data.frame(
  Parameter = c("shape", "scale", "rate", "a", "b"),
  Decomposed = round(score_val, 6),
  FiniteDiff = round(fd_score, 6),
  AbsDiff = formatC(abs(score_val - fd_score), format = "e", digits = 1)
)
print(result, row.names = FALSE)
```

The decomposed score and the finite-difference approximation agree to high precision across all five parameters, spanning three different distribution families.

### Score at the MLE

At the maximum likelihood estimate, the score should be approximately zero (the first-order optimality condition). We compare the score magnitude at the true parameters (generally nonzero with finite $n$) versus at the MLE:

```{r score-at-mle}
sys2 <- dfr_dist_series(list(
  dfr_weibull(shape = 2, scale = 100),
  dfr_exponential(0.01)
))
set.seed(42)
df2 <- data.frame(t = sampler(sys2)(300), delta = rep(1L, 300))

sc2 <- score(sys2)
cat("Score at true params:", round(sc2(df2), 4), "\n")

solver <- fit(sys2)
result <- suppressWarnings(solver(df2, par = c(1.5, 80, 0.02)))
cat("MLE:          ", round(coef(result), 4), "\n")
cat("Score at MLE: ", round(sc2(df2, par = coef(result)), 6), "\n")
```

The score at the MLE is orders of magnitude smaller than at the true parameters, confirming the optimizer has found the stationary point.
(The residual is due to finite optimizer tolerance, not a problem with the score computation.)

### Decomposed Hessian

The Hessian also exploits the series structure. It has a natural block decomposition into within-component and cross-component terms.

**Within-component block** ($k = l$):

$$\frac{\partial^2 \ell}{\partial \boldsymbol{\theta}_k \partial \boldsymbol{\theta}_k^\top} = \sum_{i:\,\delta_i=1} \left[ \frac{1}{h_{sys}(t_i)} \frac{\partial^2 h_k}{\partial \boldsymbol{\theta}_k \partial \boldsymbol{\theta}_k^\top} - \frac{1}{h_{sys}(t_i)^2} \frac{\partial h_k}{\partial \boldsymbol{\theta}_k} \frac{\partial h_k^\top}{\partial \boldsymbol{\theta}_k} \right] - \sum_i \frac{\partial^2 H_k}{\partial \boldsymbol{\theta}_k \partial \boldsymbol{\theta}_k^\top}$$

**Cross-component block** ($k \neq l$):

$$\frac{\partial^2 \ell}{\partial \boldsymbol{\theta}_k \partial \boldsymbol{\theta}_l^\top} = -\sum_{i:\,\delta_i=1} \frac{1}{h_{sys}(t_i)^2} \frac{\partial h_k}{\partial \boldsymbol{\theta}_k} \frac{\partial h_l^\top}{\partial \boldsymbol{\theta}_l}$$

The cross-component blocks are free --- they reuse the rate Jacobians already computed for the score. The cumulative hazard Hessian $\partial^2 H_k / \partial \boldsymbol{\theta}_k \partial \boldsymbol{\theta}_k^\top$ is extracted analytically via the all-censored trick on `hess_fn` when available. The rate Hessian $\partial^2 h_k / \partial \boldsymbol{\theta}_k \partial \boldsymbol{\theta}_k^\top$ uses `numDeriv::hessian` per component, per observation.

```{r hessian-demo}
H_fn <- hess_loglik(sys2)
hess_val <- H_fn(df2, par = coef(result))

evals <- round(eigen(hess_val, symmetric = TRUE)$values, 2)
cat("Hessian eigenvalues at MLE:", evals, "\n")
cat("Hessian is symmetric:", all.equal(hess_val, t(hess_val)), "\n")
```

All eigenvalues are negative, confirming the MLE is a local maximum. (For non-identifiable models like exponential-only series, one eigenvalue would be near zero.)
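The non-identifiable case can be worked out by hand. For a two-component exponential series with all-exact data, the log-likelihood $n \log(\lambda_1 + \lambda_2) - (\lambda_1 + \lambda_2) \sum_i t_i$ depends only on the sum of the rates, so its Hessian is a rank-one matrix with a zero eigenvalue (a base-R sketch, independent of `hess_loglik`):

```{r exp-hessian-null}
# d^2 loglik / (dlambda_j dlambda_k) = -n / (lambda1 + lambda2)^2 for every j, k
n <- 100
lam <- c(0.1, 0.2)
H <- -n / sum(lam)^2 * matrix(1, 2, 2)
eigen(H, symmetric = TRUE)$values  # one strictly negative eigenvalue, one zero
```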
## Special Cases

### Exponential Series = Exponential(sum of rates)

When all components are exponential with rates $\lambda_1, \ldots, \lambda_m$, the system hazard is constant: $h_{sys}(t) = \sum_j \lambda_j$. This is equivalent to a single exponential with rate $\lambda = \sum_j \lambda_j$.

```{r exp-special}
rates <- c(0.1, 0.2, 0.3)
sys <- dfr_dist_series(lapply(rates, dfr_exponential))
equiv <- dfr_exponential(sum(rates))

h_sys <- hazard(sys)
h_eq <- hazard(equiv)
S_sys <- surv(sys)
S_eq <- surv(equiv)

for (t0 in c(1, 5, 10, 50)) {
  cat(sprintf("t=%2d: h_sys=%.4f h_eq=%.4f | S_sys=%.6f S_eq=%.6f\n",
              t0, h_sys(t0), h_eq(t0), S_sys(t0), S_eq(t0)))
}
```

### Identical Weibull Components

When all $m$ components are Weibull with the same shape $k$ and scale $\lambda$, the system is also Weibull with shape $k$ and scale $\lambda / m^{1/k}$:

```{r weibull-identical}
m <- 3
shape <- 2
scale <- 100
sys <- dfr_dist_series(replicate(
  m, dfr_weibull(shape = shape, scale = scale), simplify = FALSE
))
equiv <- dfr_weibull(shape = shape, scale = scale / m^(1/shape))

S_sys <- surv(sys)
S_eq <- surv(equiv)
for (t0 in c(10, 30, 50)) {
  cat(sprintf("t=%2d: S_sys=%.6f S_equiv=%.6f diff=%.2e\n",
              t0, S_sys(t0), S_eq(t0), abs(S_sys(t0) - S_eq(t0))))
}
```

### Single Component (Degenerate Case)

A series system with one component is equivalent to that component:

```{r single-component}
single <- dfr_dist_series(list(dfr_weibull(shape = 2, scale = 100)))
direct <- dfr_weibull(shape = 2, scale = 100)

h1 <- hazard(single)
h2 <- hazard(direct)
S1 <- surv(single)
S2 <- surv(direct)
for (t0 in c(10, 50, 100)) {
  cat(sprintf("t=%3d: h=%.6f/%.6f S=%.6f/%.6f\n",
              t0, h1(t0), h2(t0), S1(t0), S2(t0)))
}
```
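Finally, the identical-Weibull identity above can be confirmed with `pweibull` alone, since $S_{sys}(t) = \exp[-m (t/\lambda)^k] = \exp[-(t / (\lambda m^{-1/k}))^k]$:

```{r weibull-scale-check}
m <- 3; k <- 2; lam <- 100; t0 <- 30
c(series = pweibull(t0, k, lam, lower.tail = FALSE)^m,
  equiv  = pweibull(t0, k, lam / m^(1/k), lower.tail = FALSE))
```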