---
title: "Series System Distributions: Overview"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Series System Distributions: Overview}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4
)
```

## The Problem

Consider a server with three independent failure modes:

- **Disk** — mechanical wear follows a Weibull distribution
- **Memory** — random bit-flip failures follow an exponential distribution
- **Power supply** — aging degradation follows a Gompertz distribution

The server fails when *any* component fails. This is a **series system** — like a chain that breaks at its weakest link. The `serieshaz` package lets you compose these failure modes into a single distribution that captures the full system behavior.

## What Is a Series System?

Given $m$ independent components with hazard functions $h_1(t), \ldots, h_m(t)$, the series system hazard is simply the **sum**:

$$h_{sys}(t) = \sum_{j=1}^{m} h_j(t, \theta_j)$$

Equivalently, the system survival is the **product** of component survivals:

$$S_{sys}(t) = \prod_{j=1}^{m} S_j(t, \theta_j)$$

This follows directly from component independence: the probability that the system survives to time $t$ is the probability that *all* components survive to time $t$.

## Quick Start

```{r quick-start}
library(serieshaz)

# Define three failure mode components
disk    <- dfr_weibull(shape = 2, scale = 500)
memory  <- dfr_exponential(0.001)
psu     <- dfr_gompertz(a = 0.0001, b = 0.02)

# Compose into a series system
server <- dfr_dist_series(list(disk, memory, psu))
print(server)
```

The system object has 5 parameters total: 2 from Weibull + 1 from exponential + 2 from Gompertz. You can see how they map to components:

```{r layout}
param_layout(server)
```

## Everything from flexhaz Works

Because `dfr_dist_series` inherits from `dfr_dist`, all standard distribution methods work out of the box — no special code needed.

### Hazard and Survival

```{r hazard-surv}
h <- hazard(server)
S <- surv(server)

# Evaluate at t = 100 hours
cat(sprintf("System hazard at t=100:  %.6f\n", h(100)))
cat(sprintf("System survival at t=100: %.4f\n", S(100)))
```

The hazard is low ($\approx 0.0025$) because the Weibull scale is 500 and the exponential rate is only 0.001. The survival probability of $\approx 84\%$ at $t = 100$ tells us most servers are still running at 100 hours.

### CDF and Density

```{r cdf-density}
F_sys <- cdf(server)
f_sys <- density(server)

cat(sprintf("P(T <= 100) = %.4f\n", F_sys(100)))
cat(sprintf("f(100)      = %.6f\n", f_sys(100)))
```

The CDF and survival sum to 1: $F(100) + S(100) = 1$. The density $f(t) = h(t) \cdot S(t)$ gives the instantaneous failure rate weighted by the probability of surviving to that point.

### Sampling

```{r sampling}
samp <- sampler(server)
set.seed(42)
times <- samp(5)
times
```

### Log-Likelihood

```{r loglik}
ll <- loglik(server)
df <- data.frame(t = c(100, 200, 150, 300, 250), delta = c(1, 1, 0, 1, 0))
ll(df)
```

### Fitting (MLE)

Fitting a 5-parameter model (Weibull + exponential + Gompertz) from system-level data alone is challenging --- the components' hazard contributions overlap, making individual parameters hard to identify. For a cleaner demonstration, we fit a simpler 2-component model:

```{r fit-demo}
# Simpler 2-component model for a clean fit demo
fit_model <- dfr_dist_series(list(
    dfr_weibull(shape = 2, scale = 100),
    dfr_exponential(0.01)
))
true_par <- c(2, 100, 0.01)

set.seed(42)
fit_samp <- sampler(fit_model)
train <- data.frame(t = fit_samp(500), delta = rep(1L, 500))

solver <- fit(fit_model)
result <- suppressWarnings(solver(train, par = c(1.5, 80, 0.02)))

cat("True parameters:  ", true_par, "\n")
cat("Fitted parameters:", round(coef(result), 3), "\n")
```

With 500 observations and a mixed-type model (Weibull + exponential), the MLE recovers the parameters reasonably well. The Weibull's increasing hazard shape is distinguishable from the exponential's constant hazard, making this system identifiable. See `vignette("series-fitting")` for a thorough treatment of identifiability, censoring, and model selection.

## System Introspection

The package provides functions specifically for understanding series systems.

### Component Count and Extraction

```{r introspection}
ncomponents(server)  # 3

# Extract the Weibull (disk) component
disk_comp <- component(server, 1)
params(disk_comp)
```

### Component Hazard Decomposition

```{r hazard-decomp}
# Get per-component hazard closures
h1 <- component_hazard(server, 1)  # disk
h2 <- component_hazard(server, 2)  # memory
h3 <- component_hazard(server, 3)  # PSU

# At t = 200, which component contributes most?
t0 <- 200
hazards <- c(Disk = h1(t0), Memory = h2(t0), PSU = h3(t0))
cat(sprintf("Disk:   %.6f (%.1f%%)\n", hazards[1], 100 * hazards[1] / sum(hazards)))
cat(sprintf("Memory: %.6f (%.1f%%)\n", hazards[2], 100 * hazards[2] / sum(hazards)))
cat(sprintf("PSU:    %.6f (%.1f%%)\n", hazards[3], 100 * hazards[3] / sum(hazards)))
cat(sprintf("System: %.6f\n", h(t0)))
cat(sprintf("Sum:    %.6f (matches system)\n", sum(hazards)))
```

At $t = 200$, the PSU's Gompertz degradation dominates system risk (~68%), with the disk's Weibull wear-out contributing ~20% and the memory's constant exponential hazard only ~12%. This decomposition is the key advantage of series system modeling: it reveals *which* failure mode drives system risk at any given age. Early on, the constant memory hazard is the largest contributor, but the accelerating Gompertz term overtakes it by around $t = 150$.

### Failure Attribution via Sampling

```{r failure-attr}
set.seed(42)
mat <- sample_components(server, n = 10000)

# System lifetime = min across components
t_sys <- apply(mat, 1, min)

# Which component caused each failure?
failing <- apply(mat, 1, which.min)
cat("Failure proportions:\n")
round(table(failing) / length(failing), 3)
```

## The Ecosystem

`serieshaz` builds on three packages:

- **[algebraic.dist](https://github.com/queelius/algebraic.dist)** — Base distribution generics: `hazard`, `surv`, `cdf`, `inv_cdf`, `sampler`, `params`
- **[likelihood.model](https://github.com/queelius/likelihood.model)** — Statistical inference: `loglik`, `score`, `hess_loglik`, `fit`, `assumptions`
- **[flexhaz](https://github.com/queelius/flexhaz)** — DFR distributions: `dfr_dist`, `dfr_exponential`, `dfr_weibull`, `dfr_gompertz`, `dfr_loglogistic`

The inheritance chain is: `dfr_dist_series` → `dfr_dist` → `likelihood_model` → `univariate_dist` → `dist`. Every method defined at any level in this chain works automatically on series systems.

## Where to Go Next

- `vignette("series-math")` — Mathematical foundations and derivations
- `vignette("series-fitting")` — MLE fitting workflows, identifiability, and diagnostics
- `vignette("series-advanced")` — Nested systems, custom components, and failure attribution