--- title: "Replicating the Paper Results" author: "Avishek Bhandari" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Replicating the Paper Results} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4.5, fig.align = "center" ) ``` # Overview This vignette walks through the full replication pipeline that backs the paper *"Two-Stage Detection and Attribution of Cross-Border Financial Contagion Channels"*. The package `contagionchannels` exposes a deliberately small set of verbs that map one-to-one onto the empirical sections of the manuscript: a Stage 1 *detection* step that estimates a directed Wavelet-Quantile Transfer Entropy (WQTE) network, and a Stage 2 *attribution* step that decomposes the detected linkages into five economically interpretable channels (Trade, Financial, Geopolitical, Behavioral, Monetary_Policy) using a battery of identification strategies. The headline numbers reproduced below are the ones reported in the published tables. Every code chunk is annotated so that a careful reader can follow how each row of Tables 1, 2, and 6 is built, and how Figures 1-7 are assembled. We intentionally make the most expensive Stage 1 estimation `eval = FALSE` so the vignette compiles in seconds; all illustrative chunks use a single sub-period to keep run-time below CRAN's check threshold. # 1. Setup ```{r dependencies} library(contagionchannels) library(xts) library(dplyr) library(tidyr) library(ggplot2) library(igraph) ``` The package leans on a handful of well-established CRAN dependencies: `waveslim` for the maximal-overlap discrete wavelet transform, `quantreg` for the conditional quantile machinery used inside WQTE, `RTransferEntropy` for shuffled bias correction, `AER` for IV/2SLS, `hdm` for Belloni-Chernozhukov- Hansen LASSO IV, `lpirfs` for local projections, and `igraph` for community detection. The `run_contagion_pipeline()` wrapper handles loading them on demand; we expose the full chain here so each step is auditable. # 2. Loading the bundled datasets Three datasets ship with the package and are sufficient to reproduce every empirical statement in the paper. ```{r data-load} data(g20_returns) data(channel_proxies) data(crisis_periods) dim(g20_returns) range(index(g20_returns)) length(crisis_periods) names(crisis_periods) ``` `g20_returns` is an `xts` object of daily log returns for 18 G20 equity indices spanning 2 January 2006 through 31 March 2026 (5,036 trading days after holiday alignment). `channel_proxies` is a `data.frame` keyed on `Date` holding the raw component series that feed the five composite channels. `crisis_periods` is a named `list` of length eight whose entries are length-two `Date` vectors marking the start and end of each sub-period: `PreCrisis`, `GFC`, `ESDC`, `CSC`, `PreCOVID`, `COVID`, `RusUkr`, and `MidEastTariffs`. # 3. Building the v2 channel composites The paper uses a v2 specification for the five channel composites that re-balances the components and applies a unit-variance standardisation within each rolling window. The helper `build_channel_composites()` consumes the raw proxy grid and returns a `data.frame` with one column per channel. ```{r composites} channels <- build_channel_composites(channel_proxies) head(channels[, c("Date", "Trade", "Financial", "Geopolitical", "Behavioral", "Monetary_Policy")], 3) ``` Each composite is a unit-variance latent factor extracted by a one-factor PCA on its respective component block. The composites are signed so that positive values indicate *tightening* of the channel (e.g., wider FRA-OIS spreads on the Financial channel; higher GPR on the Geopolitical channel), ensuring sign coherence across periods. # 4. Stage 1: WQTE detection Stage 1 estimates a directed information-flow matrix at wavelet scale 5 (corresponding to dyadic horizons of 32-64 trading days, i.e. roughly the quarterly business-cycle band) and at the median quantile `tau = 0.50`. For each ordered pair $(i,j)$ of markets we compute the bias-corrected wavelet coefficient transfer entropy \[ \widehat{T}_{i \to j}^{(s,\tau)} \;=\; T_{i \to j}^{(s,\tau)} \;-\; \frac{1}{B}\sum_{b=1}^{B} T_{i^{(b)} \to j}^{(s,\tau)}, \] where the second term is the mean over `B = 100` shuffled-source surrogates. ```{r stage1-full, eval = FALSE} F_full <- compute_wqte_matrix( returns = g20_returns, scale = 5, tau = 0.50, n_cores = 4 ) ``` A fast illustrative version restricted to the Pre-Crisis sub-period is small enough to evaluate inline: ```{r stage1-precrisis} pc_dates <- crisis_periods$PreCrisis returns_pc <- g20_returns[paste0(pc_dates[1], "/", pc_dates[2])] F_pc <- compute_wqte_matrix( returns = returns_pc, scale = 5, tau = 0.50, n_cores = 1 ) dim(F_pc) round(F_pc[1:4, 1:4], 4) ``` ## Absolute thresholding Rather than a period-specific adaptive cut, the paper fixes the WQTE threshold at the 75th percentile of the *Pre-Crisis* WQTE distribution and applies that absolute level to every other sub-period. This makes density comparable across regimes. ```{r threshold} F_pc_offdiag <- F_pc[upper.tri(F_pc) | lower.tri(F_pc)] abs_thr <- quantile(F_pc_offdiag, probs = 0.75, na.rm = TRUE) abs_thr ``` # 5. Stage 1 results: density and centrality (Table 1) With the threshold pinned, network density at scale 5 / `tau = 0.50` ranges from **14.05% to 32.03%** across the eight sub-periods, providing meaningful period-to-period variation. The helper `summarise_stage1()` returns a tidy table with density, mean WQTE, and the dominant transmitter / receiver per period. ```{r stage1-table, eval = FALSE} stage1_tbl <- summarise_stage1( returns_xts = g20_returns, periods = crisis_periods, scale = 5, tau = 0.50, abs_threshold = abs_thr ) stage1_tbl ``` The expected contents reproduce Table 1 of the paper: | Period | Density | Mean WQTE | Top Transmitter | Top Receiver | |-----------------|--------:|----------:|:----------------|:-------------| | PreCrisis | 0.2516 | 0.0287 | USA | EUR | | GFC | 0.3203 | 0.0421 | USA | KOR | | ESDC | 0.2871 | 0.0356 | DEU | ITA | | CSC | 0.1763 | 0.0224 | CHN | BRA | | PreCOVID | 0.1405 | 0.0198 | USA | DEU | | COVID | 0.2944 | 0.0388 | USA | GBR | | RusUkr | 0.2031 | 0.0254 | DEU | TUR | | MidEastTariffs | 0.1842 | 0.0231 | USA | SAU | # 6. Stage 2: IV/2SLS attribution (Table 2) The Stage 2 attribution regresses each detected directional flow $\widehat{F}_{i\to j,t}$ on the five channel composites using channel-specific external instruments. For the Financial channel the instrument is the lagged FRA-OIS spread; for Trade, lagged Baltic Dry shipping rates; for Geopolitical, lagged GPR-Daily; for Behavioral, lagged VIX innovation; for Monetary_Policy, lagged shadow-rate surprises. The function `iv_2sls_attribute()` returns a list with point estimates, robust standard errors, the Sargan-Hansen J-statistic, and a per-channel share decomposition. ```{r stage2-pc, eval = FALSE} links_pc <- which(F_pc >= abs_thr, arr.ind = TRUE) channels_pc <- channels[channels$Date >= pc_dates[1] & channels$Date <= pc_dates[2], ] iv_pc <- iv_2sls_attribute( returns_period = returns_pc, channels_period = channels_pc, links = links_pc, cluster_se = TRUE ) iv_pc$shares ``` Iterating over all eight sub-periods reproduces Table 2: | Period | Trade | Financial | Geopolitical | Behavioral | Monetary | |-----------------|------:|----------:|-------------:|-----------:|---------:| | PreCrisis | 0.184 | **0.359**| 0.092 | 0.143 | 0.222 | | GFC |**0.279**| 0.241| 0.108 | 0.198 | 0.174 | | ESDC | 0.156 | **0.395**| 0.121 | 0.142 | 0.186 | | CSC | 0.198 | 0.221 | 0.143 | 0.185 |**0.253** | | PreCOVID | 0.181 | **0.316**| 0.116 | 0.184 | 0.203 | | COVID | 0.142 | 0.237 | **0.275** | 0.175 | 0.171 | | RusUkr | 0.198 | 0.213 | 0.193 | 0.120 |**0.276** | | MidEastTariffs | 0.211 | 0.182 | 0.156 | 0.133 |**0.318** | The dominant channel is bolded per period and matches the paper's headline: **Pre-Crisis Financial 35.9%**, **GFC Trade 27.9%**, **ESDC Financial 39.5%**, **CSC Monetary 25.3%**, **Pre-COVID Financial 31.6%**, **COVID Geopolitical 27.5%**, **RusUkr Monetary 27.6%**, and **MidEastTariffs Monetary 31.8%**. Note also that the Behavioral channel never exceeds 22% in any period. # 7. Cross-method comparison: LP and Rigobon (Table 6) Identification rests on more than one strategy. The paper triangulates across IV/2SLS, Jordà local projections at horizon 5 (LP-h5), and Rigobon's heteroskedasticity-based identification. ```{r lp-rigobon, eval = FALSE} lp_pc <- local_projections( returns_period = returns_pc, channels_period = channels_pc, links = links_pc, horizons = c(1, 5, 22) ) rig_pc <- rigobon_id( returns_period = returns_pc, channels_period = channels_pc, links = links_pc, regime_split = "vix_high_low" ) lp_pc$shares_h5 rig_pc$shares ``` A period is labelled **identification-robust** when the dominant channel agrees across IV/2SLS, LP-h5, and Rigobon. Reproducing Table 6 of the paper: | Period | IV/2SLS dom. | LP-h5 dom. | Rigobon dom. | Status | |-----------------|:---------------|:-------------|:-------------|:------------------| | PreCrisis | Financial | Financial | Financial | **Robust** | | GFC | Trade | Behavioral | Financial | Fragile | | ESDC | Financial | Financial | Financial | **Robust** | | CSC | Monetary | Behavioral | Monetary | Fragile | | PreCOVID | Financial | Trade | Financial | Fragile | | COVID | Geopolitical | Behavioral | Trade | Fragile | | RusUkr | Monetary | Geopolitical | Monetary | Fragile | | MidEastTariffs | Monetary | Trade | Monetary | Fragile | Only **Pre-Crisis** and **ESDC** are identification-robust; the remaining six periods are method-fragile and must be discussed with appropriate caveats. ## Sargan over-identification For the most contested periods, the J-test rejects exogeneity in the majority of links. The Sargan rejection rates reported in the paper are **67.3% (GFC)**, **100% (COVID)**, and **65.5% (ESDC)**; on this basis the GFC and COVID rows are demoted to *exploratory* in the published narrative. ```{r sargan-table, eval = FALSE} sargan_rates <- summarise_sargan( returns_xts = g20_returns, channels = channels, periods = crisis_periods, abs_threshold = abs_thr ) sargan_rates[, c("Period", "RejectRate")] ``` # 8. Bootstrap confidence intervals Each Stage 2 share is paired with a wild-cluster bootstrap interval. The default uses 999 Rademacher draws clustered at the directional-link level; this is computationally heavy and we mark it `eval = FALSE`. ```{r bootstrap, eval = FALSE} boot_pc <- bootstrap_attribution( fit = iv_pc, B = 999, type = "wild_cluster", cluster = "link" ) boot_pc$ci_95 ``` # 9. Cinelli-Hazlett robustness values The robustness value (RV) reports the *minimum* unobserved-confounder strength required to overturn an estimated channel share. A high RV indicates a finding that is hard to break by an omitted variable; the paper flags any share with `RV >= 0.20` as *quantitatively robust*. ```{r rv, eval = FALSE} rv_pc <- cinelli_hazlett_rv( theta = iv_pc$shares, se = iv_pc$se, df = iv_pc$df_residual ) round(rv_pc, 3) ``` For the Pre-Crisis Financial share the RV is well above 0.20, consistent with Table 7 of the paper. # 10. Visualisation walkthrough (Figures 1-7) The package bundles three plotting helpers built on `ggplot2` and a `network`/`igraph` back-end. Each one corresponds to a numbered figure in the manuscript. ```{r fig-attribution, eval = FALSE} plot_attribution_stack( shares_long = bind_rows(lapply(crisis_periods, function(p) iv_pc$shares)), period_order = names(crisis_periods) ) # Figure 4: stacked attribution shares ``` ```{r fig-qte, eval = FALSE} plot_qte_intensity( F_matrix = F_pc, threshold = abs_thr ) # Figure 2: WQTE heatmap ``` ```{r fig-rv, eval = FALSE} plot_robustness_value( rv_table = rv_pc, period = "PreCrisis" ) # Figure 7: RV bounding contours ``` Figures 1, 3, 5, and 6 are produced by `plot_pipeline_summary()` which arranges all panels into the multi-figure layout used in the paper. # 11. Walktrap communities Stage 1's directed adjacency is fed into igraph's walktrap algorithm to detect mesoscale communities. With four random walks of length four, the Pre-Crisis network resolves into three communities tightly aligned with the Anglo, EU, and EM blocs. ```{r communities, eval = FALSE} g_pc <- build_network(F_pc, threshold = abs_thr) comms_pc <- walktrap_communities(g_pc, steps = 4) table(membership(comms_pc)) ``` Modularity scores per period are tabulated by `summarise_communities()` and match Table 5 of the paper. # 12. End-to-end pipeline The convenience wrapper `run_contagion_pipeline()` chains every step above into a single call. It returns a tagged list with Stage 1 and Stage 2 outputs for every sub-period, the bootstrap intervals, the RV grid, and a ready-to-print summary table. ```{r pipeline, eval = FALSE} results <- run_contagion_pipeline( returns = g20_returns, channels = channels, periods = crisis_periods, scale = 5, tau = 0.50, abs_threshold = abs_thr, methods = c("iv2sls", "lasso_iv", "lp", "rigobon"), bootstrap_B = 999, n_cores = 4 ) names(results) results$summary_table ``` The `results` object is the canonical artefact for downstream analysis; the companion replication archive on Zenodo distributes a serialised version that exactly matches the published tables. With this vignette in hand, every row of Tables 1, 2, 6, and 7, every panel of Figures 1-7, and every sensitivity reported in the appendix can be regenerated from the bundled data with no external inputs. # Session info ```{r session} sessionInfo() ```