---
title: "Reproducibility: identical inputs, identical outputs"
author: "Package cre.dcf"
output:
  rmarkdown::html_vignette:
    toc: true
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Reproducibility: identical inputs, identical outputs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

library(cre.dcf)
library(yaml)

```

## Purpose

This vignette evaluates the strict reproducibility of `run_case()` under identical inputs. In a deterministic DCF pipeline, two invocations on the same configuration should yield byte-for-byte identical results for core metrics and tabular outputs. The checks proceed in two layers:

1. Exact identity via serialization (byte-level comparison).
2. Numerical guardrail printing (maximum absolute differences), to diagnose any minute deviations that may occur under heterogeneous BLAS/LAPACK builds or platform-specific floating-point behavior.

Passing the first layer establishes computational determinism; the second layer is provided for transparency.
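
The difference between the two layers can be illustrated on toy values before touching the package. The sketch below uses base R only; the vectors and the helper `same_bytes()` are illustrative assumptions, not outputs or functions of `cre.dcf`.

```{r}
# Toy values only; these are not outputs of run_case().
x_ref <- c(npv = 0.3)
x_alt <- c(npv = 0.1 + 0.2)  # differs from 0.3 in the last binary digits

n_int <- c(n = 1L)           # integer storage
n_dbl <- c(n = 1)            # double storage, numerically equal to n_int

# Hypothetical helper: compare the serialized byte streams of two objects.
same_bytes <- function(a, b) identical(serialize(a, NULL), serialize(b, NULL))

# Layer 1: byte-level identity is sensitive to value, type, and attributes.
layer1_values <- same_bytes(x_ref, x_alt)
layer1_types  <- same_bytes(n_int, n_dbl)  # differs although n_int == n_dbl

# Layer 2: the maximum absolute difference quantifies the deviation.
layer2_drift <- max(abs(x_ref - x_alt))

print(c(layer1_values = layer1_values, layer1_types = layer1_types))
print(layer2_drift)
```

Both byte-level comparisons fail here, while the second layer reports a tiny but nonzero difference for the floating-point pair. This is why the byte-level check is treated as the primary assertion and the numerical differences as a diagnostic.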

## Load and validate a canonical configuration

```{r}
# 1) Load a canonical configuration

cfg_path <- system.file("extdata", "preset_default.yml", package = "cre.dcf")
stopifnot(nzchar(cfg_path))

cfg <- yaml::read_yaml(cfg_path)
stopifnot(is.list(cfg), length(cfg) > 0)

# Optional but recommended: validate before use

cfg <- cre.dcf::cfg_validate(cfg)

cat("✓ Configuration loaded and validated.\n")

```

## Two runs under identical conditions

```{r}
# 2) Run twice under identical conditions

case1 <- run_case(cfg)
case2 <- run_case(cfg)

# 3) Build compact metric vectors (named for clarity)

m1 <- c(
  irr_project = case1$all_equity$irr_project,
  npv_project = case1$all_equity$npv_project,
  irr_equity  = case1$leveraged$irr_equity,
  npv_equity  = case1$leveraged$npv_equity
)

m2 <- c(
  irr_project = case2$all_equity$irr_project,
  npv_project = case2$all_equity$npv_project,
  irr_equity  = case2$leveraged$irr_equity,
  npv_equity  = case2$leveraged$npv_equity
)

# 4) Byte-for-byte identity on metrics (primary assertion)

bytes_equal_metrics <- identical(serialize(m1, NULL), serialize(m2, NULL))
stopifnot(bytes_equal_metrics)

# 5) Byte-for-byte identity on key tables (secondary assertions)

# (a) all-equity cashflows

ae1 <- case1$all_equity$cashflows
ae2 <- case2$all_equity$cashflows

# (b) leveraged cashflows

lv1 <- case1$leveraged$cashflows
lv2 <- case2$leveraged$cashflows

# (c) comparison summary (if present)

sm1 <- case1$comparison$summary
sm2 <- case2$comparison$summary

stopifnot(
  identical(serialize(ae1, NULL), serialize(ae2, NULL)),
  identical(serialize(lv1, NULL), serialize(lv2, NULL)),
  identical(serialize(sm1, NULL), serialize(sm2, NULL))
)

cat(
  "\nReproducibility diagnostics (byte-level):\n",
  "  • Core metrics: byte-level identity confirmed.\n",
  "  • All-equity cashflows: byte-level identity confirmed.\n",
  "  • Leveraged cashflows: byte-level identity confirmed.\n",
  "  • Comparison summary: byte-level identity confirmed.\n"
)

```

These assertions show that, within a single R session on the current platform, `run_case()` behaves as a deterministic mapping from the YAML configuration to the computed outputs. They do not, by themselves, establish identity across platforms.
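
When a byte-level assertion fails, it helps to know which component diverged. The following is a minimal sketch of a component-wise report, assuming two output lists with the same names; `bytes_identical()`, `compare_components()`, and the toy lists are hypothetical and not part of `cre.dcf`.

```{r}
# Hypothetical helpers, written for illustration only.
bytes_identical <- function(a, b) {
  identical(serialize(a, NULL), serialize(b, NULL))
}

compare_components <- function(run_a, run_b) {
  stopifnot(identical(names(run_a), names(run_b)))
  # One logical per component, named after the component.
  vapply(
    names(run_a),
    function(nm) bytes_identical(run_a[[nm]], run_b[[nm]]),
    logical(1)
  )
}

# Toy stand-ins for two runs; the second differs in one cashflow entry.
toy_a <- list(
  metrics   = c(irr = 0.08, npv = 1250),
  cashflows = data.frame(year = 1:3, cf = c(-100, 60, 70))
)
toy_b <- toy_a
toy_b$cashflows$cf[3] <- 70 + 1e-12

component_report <- compare_components(toy_a, toy_b)
print(component_report)
```

The report flags `cashflows` as different while `metrics` remains byte-identical, which narrows down where to look before inspecting numerical differences.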

## Numerical drift diagnostics (for transparency)

To document the absence (or presence) of any small numerical differences, we compute maximum absolute differences on numeric columns. If byte-level identity holds, all these diagnostics should be exactly zero; they are nevertheless informative if a future platform breaks strict identity.

```{r}
max_abs_diff <- function(x, y) {
  # Largest non-missing |a - b|; returns 0 when nothing is comparable,
  # instead of the -Inf that max() yields on empty or all-NA input.
  safe_max <- function(a, b) {
    d <- abs(a - b)
    d <- d[!is.na(d)]
    if (length(d) == 0L) 0 else max(d)
  }

  # Vector case
  if (is.null(dim(x)) && is.null(dim(y))) {
    if (is.numeric(x) && is.numeric(y) && length(x) == length(y)) {
      return(safe_max(x, y))
    }
    return(NA_real_)
  }

  # Data frame case: compare numeric columns present in both tables
  if (is.data.frame(x) && is.data.frame(y)) {
    if (nrow(x) != nrow(y)) return(NA_real_)
    common <- intersect(names(x), names(y))
    if (length(common) == 0L) return(NA_real_)

    numeric_cols <- common[
      vapply(x[common], is.numeric, TRUE) &
        vapply(y[common], is.numeric, TRUE)
    ]
    if (length(numeric_cols) == 0L) return(NA_real_)

    return(max(vapply(
      numeric_cols,
      function(nm) safe_max(x[[nm]], y[[nm]]),
      numeric(1)
    )))
  }

  # Fallback: unsupported or mismatched inputs
  NA_real_
}

cat(sprintf("  • Max |Δ| metrics (numeric drift): %s\n",
            formatC(max_abs_diff(m1, m2), format = "g")))
cat(sprintf("  • Max |Δ| AE cashflows: %s\n",
            formatC(max_abs_diff(ae1, ae2), format = "g")))
cat(sprintf("  • Max |Δ| LV cashflows: %s\n",
            formatC(max_abs_diff(lv1, lv2), format = "g")))
cat(sprintf("  • Max |Δ| comparison summary: %s\n",
            formatC(max_abs_diff(sm1, sm2), format = "g")), "\n")

```
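
If strict identity ever fails on some platform, a tolerance-based comparison can serve as a fallback alongside the maximum absolute difference. The sketch below uses base R's `all.equal()` on toy tables; the data and the tolerance of `1e-9` are arbitrary assumptions chosen for illustration, not thresholds recommended by the package.

```{r}
# Toy cashflow tables; drift_alt carries a small relative perturbation.
drift_ref <- data.frame(year = 0:3, cf = c(-1000, 300, 400, 500))
drift_alt <- drift_ref
drift_alt$cf <- drift_alt$cf * (1 + 1e-13)

# Strict identity fails on the perturbed table.
strict_ok <- identical(drift_ref, drift_alt)

# all.equal() returns TRUE or a character description of the differences,
# so the result is wrapped in isTRUE() before being used as a condition.
tolerant_ok <- isTRUE(all.equal(drift_ref, drift_alt, tolerance = 1e-9))

# Size of the deviation, in the units of the cashflows.
drift_size <- max(abs(drift_ref$cf - drift_alt$cf))

print(c(strict_ok = strict_ok, tolerant_ok = tolerant_ok))
print(drift_size)
```

A run that passes the tolerant check but fails the strict one points to floating-point effects rather than a change in model logic, which is the situation the drift diagnostics are meant to expose.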

## Interpretation

The results obtained in this vignette show that:

- Core scalar metrics (`irr_project`, `npv_project`, `irr_equity`, `npv_equity`) are identical at the byte level across two runs with the same configuration.
- Key tabular outputs (all-equity cashflows, leveraged cashflows, comparison summary) are also byte-identical.
- The maximum absolute differences on numeric entries are zero, confirming the absence of hidden numerical drift in this environment.

In this environment, `run_case()` therefore behaves as a deterministic map from a validated YAML configuration to a structured output object, which is a strong form of reproducibility in computational finance:

- identical YAML files ⇒ identical DCF outputs;
- any divergence in valuation metrics must therefore be traced to explicit differences in configuration, not to stochastic elements or unstable numerical routines.
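
Tracing a divergence back to the configuration can be sketched as a key-by-key comparison of two parsed configurations. The example below uses base R only; the field names in `cfg_a` and `cfg_b` are hypothetical and do not reflect the actual schema of `preset_default.yml`.

```{r}
# Hypothetical configuration fragments; field names are illustrative only.
cfg_a <- list(
  discount_rate = 0.07,
  horizon_years = 10L,
  debt = list(ltv = 0.55, rate = 0.04)
)
cfg_b <- list(
  discount_rate = 0.07,
  horizon_years = 10L,
  debt = list(ltv = 0.60, rate = 0.04)
)

# Flatten nested lists into named vectors ("debt.ltv", "debt.rate", ...).
# Note: unlist() coerces all leaves to a common type, so this compares
# coerced values rather than the original storage types.
flat_a <- unlist(cfg_a)
flat_b <- unlist(cfg_b)

# Report every key whose value differs or that exists on one side only.
keys <- union(names(flat_a), names(flat_b))
differs <- vapply(
  keys,
  function(k) !identical(flat_a[k], flat_b[k]),
  logical(1)
)
changed_keys <- keys[differs]
print(changed_keys)
```

Here only `debt.ltv` is reported, so any difference in the resulting metrics would be attributed to that field.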