---
title: "Getting Started with HHBayes"
subtitle: "A Complete Walkthrough: Simulate, Fit, Analyze, and Visualize"
author: "Ke Li"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with HHBayes}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 10,
  fig.height = 5.5,
  warning = FALSE,
  message = FALSE
)
```

## Overview

**HHBayes** provides an end-to-end pipeline for household infectious disease
transmission research. This vignette walks through a complete analysis — from
defining a study scenario, through simulation, data preparation, Bayesian model
fitting, and visualization — using a single reproducible example.

The pipeline has five stages:

1. **Setup** — Define the study timeline, surveillance data, household structure,
   contact patterns, and interventions.
2. **Simulate** — Generate multi-household outbreak data with realistic viral
   dynamics.
3. **Prepare** — Transform simulated (or observed) data into the format required
   by the Stan model.
4. **Fit** — Run the Bayesian transmission model via Hamiltonian Monte Carlo.
5. **Visualize** — Inspect posteriors, epidemic curves, and transmission chains.

```{r load-packages}
library(HHBayes)
library(dplyr)
library(rstan)
library(ggpubr)
```

---

## Part 1: Study Setup

Before simulating, we define four components: the study timeline, external
surveillance data, the household composition rules, and the within-household
contact structure.

### 1.1 Study Timeline

All simulations are anchored to a calendar start and end date. These dates
define the duration of the simulation in days.

```{r setup-dates}
study_start <- "2024-07-01"
study_end   <- "2025-06-30"

cat("Study duration:",
    as.integer(as.Date(study_end) - as.Date(study_start)) + 1, "days\n")
```

### 1.2 Surveillance Data (Community Forcing)

Community-acquired infections are driven by an external surveillance curve.
Supply a dataframe with two columns: `date` and `cases`. The case counts are
automatically normalized to [0, 1] and interpolated to daily resolution inside
the simulator.

Here we create a synthetic Gaussian-shaped epidemic curve peaking in the middle
of the study:

```{r setup-surveillance}
dates_weekly <- seq(from = as.Date(study_start),
                    to = as.Date(study_end), by = "week")

surveillance_data <- data.frame(
  date  = dates_weekly,
  cases = 0.1 +
    100 * exp(-0.0002 * (as.numeric(dates_weekly - mean(dates_weekly)))^2) +
    abs(rnorm(length(dates_weekly), mean = 0, sd = 10))
)

plot(surveillance_data$date, surveillance_data$cases,
     type = "l", lwd = 2, col = "steelblue",
     xlab = "Date", ylab = "Weekly cases",
     main = "Synthetic Surveillance Curve")
```

If you have real surveillance data, simply replace this dataframe. If no
surveillance data or seasonal forcing is provided, the community force of
infection remains constant throughout the study.

### 1.3 Contact Matrix (Role Mixing Weights)

Within a household, not all members contact each other equally. A
`role_mixing_matrix` defines the relative contact intensity between each pair
of roles. Rows represent the **source** (infector) and columns represent the
**target** (infectee):

```{r setup-contact}
role_mixing_weights <- matrix(c(
# Target:  Infant Toddler Adult Elderly
           0.0,   0.5,    1.0,  0.5,    # Source: Infant
           0.5,   0.9,    0.7,  0.5,    # Source: Toddler
           1.0,   0.7,    0.6,  0.7,    # Source: Adult
           0.5,   0.5,    0.7,  0.0     # Source: Elderly
), nrow = 4, byrow = TRUE,
   dimnames = list(
     c("infant", "toddler", "adult", "elderly"),
     c("infant", "toddler", "adult", "elderly")))

role_mixing_weights
```

**Reading this matrix:**

- **Adult -> Infant = 1.0** (high): Adults are primary caregivers for infants.
- **Toddler -> Toddler = 0.9** (high): Toddler siblings play together
  frequently.
- **Elderly -> Infant = 0.5** (moderate): Some grandparent caregiving.
- **Elderly -> Elderly = 0.0** (none): In this scenario, we assume at most one
  elderly per household, so self-contact is irrelevant.
- **Infant -> Infant = 0.0** (none): Infants do not infect each other directly.

When a household is generated (e.g., with roles `c("adult", "adult", "infant",
"toddler")`), the simulator expands this 4x4 role matrix into a 4x4
individual-level contact matrix by looking up each pair's roles.

### 1.4 Household Composition

Each household is randomly assembled according to a demographic profile. The
`household_profile_list` controls the probability distribution over household
compositions:

```{r setup-household}
household_profile <- list(
  prob_adults   = c(0, 0, 1),        # P(0, 1, 2 adults) — always 2 parents
  prob_infant   = 1.0,               # P(infant present) — always 1 infant
  prob_siblings = c(0, 0.8, 0.2),    # P(0, 1, 2 toddlers) — 80% one, 20% two
  prob_elderly  = c(0.7, 0.1, 0.2)   # P(0, 1, 2 elderly) — 70% none
)
```

With this profile, most households will have **2 adults + 1 infant + 1
toddler** (the most common draw), with some households also including 1-2
elderly members.

You can modify this to represent different demographic settings:

```{r setup-household-examples, eval = FALSE}
# Nuclear Western family
nuclear <- list(
  prob_adults   = c(0.05, 0.15, 0.80),
  prob_infant   = 0.5,
  prob_siblings = c(0.40, 0.45, 0.15),
  prob_elderly  = c(0.95, 0.04, 0.01)
)

# Multi-generational Asian family
multigenerational <- list(
  prob_adults   = c(0, 0, 1),
  prob_infant   = 1.0,
  prob_siblings = c(0.05, 0.30, 0.65),
  prob_elderly  = c(0.20, 0.50, 0.30)
)
```

### 1.5 Intervention Strategies (Covariates)

Interventions are defined via `covariates_config`. Each entry specifies:

- **`name`**: Column name in the output data.
- **`efficacy`**: Fractional reduction (0 to 1). A value of 0.6 means 60%
  reduction.
- **`effect_on`**: What is reduced — `"susceptibility"`, `"infectivity"`, or
  `"both"`.
- **`coverage`**: Role-specific probability of receiving the intervention.

```{r setup-intervention}
sim_config <- list(
  list(
    name      = "vacc_status",
    efficacy  = 0,             # Set to 0 for baseline (no effect)
    effect_on = "both",
    coverage  = list(
      infant  = 0,
      toddler = 0,
      adult   = 0,
      elderly = 0
    )
  )
)
```

In this baseline example, `efficacy = 0` means the vaccination covariate column
is created in the output but has no actual effect on transmission. This is
useful for testing the pipeline before introducing real intervention effects.

**To simulate an actual intervention**, set nonzero efficacy and coverage:

```{r setup-intervention-real, eval = FALSE}
# Real vaccination scenario
vacc_config <- list(
  list(
    name      = "vacc_status",
    efficacy  = 0.5,           # 50% reduction in susceptibility and infectivity
    effect_on = "both",
    coverage  = list(
      infant  = 0.00,          # Not eligible
      toddler = 0.30,
      adult   = 0.80,
      elderly = 0.90
    )
  ),
  # You can add multiple interventions — they stack multiplicatively
  list(
    name      = "masked",
    efficacy  = 0.3,
    effect_on = "both",
    coverage  = list(infant = 0, toddler = 0.1, adult = 0.7, elderly = 0.6)
  )
)
```

**How stacking works:** A vaccinated (efficacy = 0.5) and masked (efficacy =
0.3) adult has their susceptibility multiplied by
`(1 - 0.5) * (1 - 0.3) = 0.35`, i.e., a 65% total reduction.

---

## Part 2: Simulation

### 2.1 Run the Simulator

With all components defined, we run the simulation. Key parameters to note:

- **`viral_testing = "viral load"`**: Uses log10 viral load scale (higher =
  more virus).
- **`model_type = "ODE"`**: Within-host viral dynamics are generated by solving
  a target-cell-limited ODE model (vs `"empirical"` for parametric curves).
- **`infectious_shape/scale`**: Gamma distribution for the infectious period.
  With `shape = 10, scale = 1`, the mean is 10 days.
- **`waning_shape/scale`**: Gamma distribution for post-recovery immunity. With
  `shape = 6, scale = 10`, mean immunity lasts 60 days before waning.
- **`surveillance_interval = 4`**: Routine testing every 4 days.

```{r simulate}
sim_res <- simulate_multiple_households_comm(
  n_households    = 50,
  viral_testing   = "viral load",
  model_type      = "ODE",
  infectious_shape = 10,
  infectious_scale = 1,
  waning_shape    = 6,
  waning_scale    = 10,
  surveillance_interval = 4,
  start_date      = study_start,
  end_date        = study_end,
  surveillance_df = surveillance_data,
  covariates_config      = sim_config,
  household_profile_list = household_profile,
  role_mixing_matrix     = role_mixing_weights,
  seed = 123
)
```

### 2.2 Inspect the Output

The simulation returns a list with two dataframes.

**`hh_df`**: One row per person per infection episode.

```{r inspect-hh}
head(sim_res$hh_df)
```

```{r inspect-hh-str}
cat("Columns:", paste(names(sim_res$hh_df), collapse = ", "), "\n")
cat("Total person-episodes:", nrow(sim_res$hh_df), "\n")
cat("Unique people:", nrow(distinct(sim_res$hh_df, hh_id, person_id)), "\n")
```

**`diagnostic_df`**: Simulated test results at each surveillance time point.

```{r inspect-diag}
head(sim_res$diagnostic_df)
```

```{r inspect-diag-str}
cat("Columns:", paste(names(sim_res$diagnostic_df), collapse = ", "), "\n")
cat("Total test records:", nrow(sim_res$diagnostic_df), "\n")
cat("Positive tests:", sum(sim_res$diagnostic_df$test_result), "\n")
```

### 2.3 Summarize Attack Rates

`summarize_attack_rates()` computes primary attack rates (proportion ever
infected at least once) and reinfection summaries, both overall and by role:

```{r attack-rates}
rates <- summarize_attack_rates(sim_res)
```

**Overall primary attack rate:**

```{r ar-overall}
print(rates$primary_overall)
```

**Primary attack rate by role:**

```{r ar-by-role}
print(rates$primary_by_role)
```

**Reinfection summary:**

```{r ar-reinf}
print(rates$reinf_overall)
print(rates$reinf_by_role)
```

Roles with higher `phi_by_role` (susceptibility) will tend to have higher
attack rates. Reinfections occur when immunity wanes (controlled by
`waning_shape` and `waning_scale`).

### 2.4 Plot the Epidemic Curve

`plot_epidemic_curve()` overlays the simulated infections (stacked bars,
colored by age group) with the external surveillance curve (black line). Both
are aggregated into bins of `bin_width` days:

```{r epi-curve, fig.width=10, fig.height=6}
my_plot <- plot_epidemic_curve(
  sim_res,
  surveillance_data,
  start_date_str = study_start,
  bin_width = 7   # Weekly bins
)
print(my_plot)
```

The left y-axis shows simulated positive samples; the right y-axis shows
surveillance case counts. The two series are independently scaled so their
peaks align visually.

---

## Part 3: Preparing Data for Stan

The Bayesian model requires a specifically formatted input list. The
`prepare_stan_data()` function handles column renaming, infection window
imputation, viral load gap-filling, seasonal forcing construction, covariate
matrix assembly, and prior specification.

### 3.1 Join Covariates to Diagnostic Data

Covariates (e.g., vaccination status) are stored in `hh_df` but need to be
merged into `diagnostic_df` before passing to Stan:

```{r stan-join}
# Extract a 1-row-per-person covariate table
person_covariates <- sim_res$hh_df %>%
  dplyr::select(hh_id, person_id, vacc_status) %>%
  dplyr::distinct()

# Merge into diagnostic data
df_for_stan <- sim_res$diagnostic_df %>%
  dplyr::left_join(person_covariates, by = c("hh_id", "person_id"))

cat("Rows in df_for_stan:", nrow(df_for_stan), "\n")
cat("Columns:", paste(names(df_for_stan), collapse = ", "), "\n")
```

### 3.2 Define Priors

HHBayes supports flexible priors for all key model parameters. Each prior is
specified as a list with `dist` (one of `"normal"`, `"uniform"`, `"lognormal"`)
and `params` (a length-2 vector):

```{r stan-priors}
my_priors <- list(
  beta1      = list(dist = "normal",    params = c(-5, 1)),
  beta2      = list(dist = "normal",    params = c(-5, 1)),
  alpha      = list(dist = "normal",    params = c(-4, 1)),
  covariates = list(dist = "normal",    params = c(0, 2)),
  gen_shape  = list(dist = "lognormal", params = c(1.5, 0.5)),
  gen_rate   = list(dist = "lognormal", params = c(0.0, 0.5)),
  ct50       = list(dist = "normal",    params = c(3, 1)),
  slope      = list(dist = "lognormal", params = c(0.4, 0.5))
)
```

| Prior | Parameter | Distribution | Meaning |
|---|---|---|---|
| `beta1` | Log transmission rate 1 | Normal(-5, 1) | Baseline contact-driven transmission |
| `beta2` | Log transmission rate 2 | Normal(-5, 1) | Viral-load-dependent transmission |
| `alpha` | Log community rate | Normal(-4, 1) | Community importation rate |
| `covariates` | Covariate coefficients | Normal(0, 2) | Effect of interventions (centered at 0 = no effect) |
| `gen_shape` | Generation interval shape | LogNormal(1.5, 0.5) | Shape of the infectiousness profile |
| `gen_rate` | Generation interval rate | LogNormal(0.0, 0.5) | Rate of the infectiousness profile |
| `ct50` | Viral load reference | Normal(3, 1) | Reference point for VL-infectiousness function |
| `slope` | Dose-response slope | LogNormal(0.4, 0.5) | How steeply infectiousness scales with VL |

If any prior is omitted, sensible defaults are used.

### 3.3 Viral Load Imputation Parameters

The Stan data preparation step can fill gaps in observed viral data using
theoretical trajectory curves. These role-specific parameters define the shape
of the double-exponential viral load curve:

```{r stan-vl-params}
VL_params_list <- list(
  adult   = list(v_p = 4.14, t_p = 5.09, lambda_g = 2.31, lambda_d = 2.71),
  infant  = list(v_p = 5.84, t_p = 4.09, lambda_g = 2.82, lambda_d = 1.01),
  toddler = list(v_p = 5.84, t_p = 4.09, lambda_g = 2.82, lambda_d = 1.01),
  elderly = list(v_p = 2.95, t_p = 5.10, lambda_g = 3.15, lambda_d = 0.87)
)
```

| Parameter | Description |
|---|---|
| `v_p` | Peak log10 viral load |
| `t_p` | Time to peak (days post-infection) |
| `lambda_g` | Growth rate (ascending limb) |
| `lambda_d` | Decay rate (descending limb) |

Note that infants reach a higher peak (`v_p = 5.84`) faster (`t_p = 4.09`)
and decay more slowly (`lambda_d = 1.01`) compared to adults.

### 3.4 Build the Stan Input

```{r stan-prepare}
stan_input <- prepare_stan_data(
  df_clean          = df_for_stan,
  surveillance_df   = surveillance_data,
  study_start_date  = as.Date(study_start),
  study_end_date    = as.Date(study_end),
  use_vl_data       = 1,
  model_type        = "ODE",
  ODE_params_list   = NULL,
  covariates_susceptibility = NULL,  # No covariates in this baseline
  covariates_infectivity    = NULL,
  imputation_params = VL_params_list,
  priors            = my_priors,
  role_mixing_matrix = role_mixing_weights
)

cat("Stan data prepared successfully.\n")
cat("Number of individuals (N):", stan_input$N, "\n")
cat("Number of time steps (T):", stan_input$T, "\n")
cat("Number of households (H):", stan_input$H, "\n")
cat("Number of roles (R):", stan_input$R, "\n")
```

The returned `stan_input` is a named list containing all arrays, matrices, and
scalars expected by the Stan model.

---

## Part 4: Fitting the Bayesian Model

### 4.1 Run HMC Sampling

`fit_household_model()` calls `rstan::sampling()` on the pre-compiled Stan
model included with the package. We exclude internal bookkeeping parameters
from the output using `pars` + `include = FALSE`:

```{r fit-model, eval = FALSE}
options(mc.cores = parallel::detectCores())

fit <- fit_household_model(
  stan_input,
  pars    = c("log_phi_by_role_raw",
              "log_kappa_by_role_raw",
              "log_beta1",
              "log_beta2",
              "log_alpha_comm",
              "g_curve_est",
              "V_term_calc"),
  include = FALSE,    # Exclude these internal parameters from saved output
  iter    = 1000,     # For testing; use 2000+ for real analysis
  warmup  = 500,      # For testing; use 1000+ for real analysis
  chains  = 1         # For testing; use 4 for real analysis
)
```

**Recommended settings for publication-quality results:**

```{r fit-model-real, eval = FALSE}
fit <- fit_household_model(
  stan_input,
  pars    = c("log_phi_by_role_raw", "log_kappa_by_role_raw",
              "log_beta1", "log_beta2", "log_alpha_comm",
              "g_curve_est", "V_term_calc"),
  include = FALSE,
  iter    = 2000,
  warmup  = 1000,
  chains  = 4
)
```

### 4.2 View Summary

```{r fit-summary, eval = FALSE}
print(fit, probs = c(0.025, 0.5, 0.975))
```

The summary table shows posterior medians and 95% credible intervals for all
saved parameters. Key parameters to look for:

| Parameter | Interpretation |
|---|---|
| `beta1` | Within-household baseline transmission rate |
| `beta2` | Viral-load-dependent transmission rate |
| `alpha_comm` | Community importation rate |
| `phi_by_role[1..4]` | Susceptibility multipliers (Adult, Infant, Toddler, Elderly) |
| `kappa_by_role[1..4]` | Infectivity multipliers |
| `gen_shape`, `gen_rate` | Generation interval distribution parameters |
| `Ct50`, `slope_ct` | Viral load dose-response curve parameters |

### 4.3 Convergence Diagnostics

Check that all `Rhat` values are close to 1.0 (< 1.05) and that effective
sample sizes (`n_eff`) are adequate:

```{r fit-diagnostics, eval = FALSE}
# Quick check
fit_summary <- rstan::summary(fit)$summary
cat("Max Rhat:", max(fit_summary[, "Rhat"], na.rm = TRUE), "\n")
cat("Min n_eff:", min(fit_summary[, "n_eff"], na.rm = TRUE), "\n")

# Trace plots (requires bayesplot)
if (requireNamespace("bayesplot", quietly = TRUE)) {
  bayesplot::mcmc_trace(fit, pars = c("beta1", "beta2", "alpha_comm"))
}
```

If `Rhat > 1.05` or you see divergent transitions, try increasing `iter`,
`warmup`, or the `adapt_delta` control parameter (default is 0.95 in
`fit_household_model`).

---

## Part 5: Visualization

### 5.1 Posterior Distributions

`plot_posterior_distributions()` shows violin + box plots of the posterior
distributions of role-specific susceptibility (`phi`) and infectivity
(`kappa`) on a log10 scale:

```{r plot-posteriors, eval = FALSE}
p_post <- plot_posterior_distributions(fit)
print(p_post)
```

The dashed horizontal line at 0 represents the reference level (log10(1) = 0).
Roles above the line are more susceptible/infectious than the baseline; roles
below are less.

### 5.2 Covariate Effects (Forest Plot)

If covariates were included in the model, `plot_covariate_effects()` shows
their posterior effect estimates:

```{r plot-covariates, eval = FALSE}
plot_covariate_effects(fit, stan_input)
```

Each row is a covariate. The point shows the posterior median, and the
horizontal bar shows the 95% credible interval. An interval that crosses zero
indicates no statistically clear effect.

### 5.3 Transmission Chain Reconstruction

`reconstruct_transmission_chains()` uses posterior parameter estimates to
compute the probability that each infected person was infected by each
possible source (or from the community):

```{r plot-chains, eval = FALSE}
chains <- reconstruct_transmission_chains(
  fit,
  stan_input,
  min_prob_threshold = 0.001   # Keep links with >= 0.1% probability
)

head(chains)
```

The output has one row per potential link:

| Column | Description |
|---|---|
| `target` | Index of the infected person |
| `hh_id` | Household ID |
| `day` | Day of infection |
| `source` | Index of the infector, or `"Community"` |
| `prob` | Posterior probability of this transmission link |

### 5.4 Household Timeline

`plot_household_timeline()` visualizes the infection history of a single
household. Each person is shown as a horizontal track, colored by role. Infected
periods are highlighted, and arrows show probable transmission links:

```{r plot-timeline, eval = FALSE}
p_hh <- plot_household_timeline(
  chains,
  stan_input,
  target_hh_id = 11     # Plot household #11
)
print(p_hh)
```

You can adjust which links are shown using `prob_cutoff` (e.g., `prob_cutoff
= 0.05` to show only links with at least 5% probability):

```{r plot-timeline-filtered, eval = FALSE}
p_hh_filtered <- plot_household_timeline(
  chains, stan_input,
  target_hh_id = 11,
  prob_cutoff  = 0.05,
  plot_width   = 11,
  plot_height  = 7
)
print(p_hh_filtered)
```

---

## Appendix A: Complete Script

For convenience, here is the entire pipeline as a single copy-pasteable script:

```{r full-script, eval = FALSE}
library(HHBayes)
library(dplyr)
library(rstan)
library(ggpubr)

# ==============================================================================
# 1. SETUP
# ==============================================================================

study_start <- "2024-07-01"
study_end   <- "2025-06-30"

# Surveillance data
dates_weekly <- seq(as.Date(study_start), as.Date(study_end), by = "week")
surveillance_data <- data.frame(
  date  = dates_weekly,
  cases = 0.1 + 100 * exp(-0.0002 * (as.numeric(dates_weekly -
    mean(dates_weekly)))^2) + abs(rnorm(length(dates_weekly), 0, 10))
)

# Contact matrix
role_mixing_weights <- matrix(c(
  0.0, 0.5, 1.0, 0.5,
  0.5, 0.9, 0.7, 0.5,
  1.0, 0.7, 0.6, 0.7,
  0.5, 0.5, 0.7, 0.0
), nrow = 4, byrow = TRUE,
dimnames = list(
  c("infant", "toddler", "adult", "elderly"),
  c("infant", "toddler", "adult", "elderly")))

# Household profile
household_profile <- list(
  prob_adults   = c(0, 0, 1),
  prob_infant   = 1.0,
  prob_siblings = c(0, 0.8, 0.2),
  prob_elderly  = c(0.7, 0.1, 0.2)
)

# Intervention (baseline: no effect)
sim_config <- list(
  list(name = "vacc_status", efficacy = 0, effect_on = "both",
       coverage = list(infant = 0, toddler = 0, adult = 0, elderly = 0))
)

# ==============================================================================
# 2. SIMULATE
# ==============================================================================

sim_res <- simulate_multiple_households_comm(
  n_households = 50, viral_testing = "viral load", model_type = "ODE",
  infectious_shape = 10, infectious_scale = 1,
  waning_shape = 6, waning_scale = 10,
  surveillance_interval = 4,
  start_date = study_start, end_date = study_end,
  surveillance_df = surveillance_data,
  covariates_config = sim_config,
  household_profile_list = household_profile,
  role_mixing_matrix = role_mixing_weights,
  seed = 123
)

rates <- summarize_attack_rates(sim_res)
print(rates$primary_by_role)

plot_epidemic_curve(sim_res, surveillance_data,
                    start_date_str = study_start, bin_width = 7)

# ==============================================================================
# 3. PREPARE FOR STAN
# ==============================================================================

person_covariates <- sim_res$hh_df %>%
  select(hh_id, person_id, vacc_status) %>% distinct()

df_for_stan <- sim_res$diagnostic_df %>%
  left_join(person_covariates, by = c("hh_id", "person_id"))

my_priors <- list(
  beta1 = list(dist = "normal", params = c(-5, 1)),
  beta2 = list(dist = "normal", params = c(-5, 1)),
  alpha = list(dist = "normal", params = c(-4, 1)),
  covariates = list(dist = "normal", params = c(0, 2)),
  gen_shape = list(dist = "lognormal", params = c(1.5, 0.5)),
  gen_rate  = list(dist = "lognormal", params = c(0.0, 0.5)),
  ct50  = list(dist = "normal", params = c(3, 1)),
  slope = list(dist = "lognormal", params = c(0.4, 0.5))
)

VL_params_list <- list(
  adult   = list(v_p = 4.14, t_p = 5.09, lambda_g = 2.31, lambda_d = 2.71),
  infant  = list(v_p = 5.84, t_p = 4.09, lambda_g = 2.82, lambda_d = 1.01),
  toddler = list(v_p = 5.84, t_p = 4.09, lambda_g = 2.82, lambda_d = 1.01),
  elderly = list(v_p = 2.95, t_p = 5.10, lambda_g = 3.15, lambda_d = 0.87)
)

stan_input <- prepare_stan_data(
  df_clean = df_for_stan, surveillance_df = surveillance_data,
  study_start_date = as.Date(study_start),
  study_end_date = as.Date(study_end),
  use_vl_data = 1, model_type = "ODE",
  imputation_params = VL_params_list, priors = my_priors,
  role_mixing_matrix = role_mixing_weights
)

# ==============================================================================
# 4. FIT MODEL
# ==============================================================================

options(mc.cores = parallel::detectCores())

fit <- fit_household_model(
  stan_input,
  pars = c("log_phi_by_role_raw", "log_kappa_by_role_raw",
           "log_beta1", "log_beta2", "log_alpha_comm",
           "g_curve_est", "V_term_calc"),
  include = FALSE,
  iter = 1000, warmup = 500, chains = 1
)

# ==============================================================================
# 5. RESULTS
# ==============================================================================

print(fit, probs = c(0.025, 0.5, 0.975))
plot_posterior_distributions(fit)

chains <- reconstruct_transmission_chains(fit, stan_input,
                                           min_prob_threshold = 0.001)
plot_household_timeline(chains, stan_input, target_hh_id = 11)
```

---

## Session Info

```{r session-info}
sessionInfo()
```