---
title: "Nowcasting revisions using the Jacobs-Van Norden model"
output: rmarkdown::html_vignette
bibliography: references.bib
biblio-style: apalike
link-citations: true
vignette: >
  %\VignetteIndexEntry{Nowcasting revisions using the Jacobs-Van Norden model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

rebuild_vignette_results <- identical(
  tolower(Sys.getenv("REVISER_REBUILD_VIGNETTE_RESULTS")),
  "true"
)

vignette_result_path <- function(name) {
  candidates <- c(
    file.path("vignettes", "precomputed", name),
    file.path("precomputed", name)
  )
  existing <- candidates[file.exists(candidates)]

  if (length(existing) > 0) {
    existing[[1]]
  } else {
    candidates[[1]]
  }
}

load_or_build_vignette_result <- function(name, builder) {
  path <- vignette_result_path(name)

  if (!rebuild_vignette_results) {
    if (!file.exists(path)) {
      stop("Missing precomputed vignette result: ", basename(path))
    }

    return(readRDS(path))
  }

  result <- builder()
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(result, path)
  result
}
```

This vignette describes the Jacobs-Van Norden (JVN) revision model as
implemented in `reviser::jvn_nowcast()`. The presentation follows the same
Durbin-Koopman state-space notation used in the KK vignette: observations are
linked to latent states through \(Z\), state dynamics are governed by \(T\), and
innovations enter through \(R\), \(H\), and \(Q\)
[@durbinTimeSeriesAnalysis2012].

The key idea of the JVN framework is that revision errors are not treated as a
single residual. Instead, they are decomposed into **news** and **noise**.
News corresponds to genuinely new information incorporated by later releases,
whereas noise corresponds to transitory measurement error that is corrected in
subsequent vintages [@jacobsModelingDataRevisions2011].

## Revision decomposition

Let \(l\) denote the number of vintages used in the model and let
\(y_t^{t+j}\) be the estimate for reference period \(t\) available in vintage
\(t+j\). Stack the vintages into

$$
y_t =
\begin{bmatrix}
y_t^{t+1} \\
y_t^{t+2} \\
\vdots \\
y_t^{t+l}
\end{bmatrix}.
$$

Let \(\tilde y_t\) denote the latent "true" value and let \(\iota_l\) be an
\(l \times 1\) vector of ones. The JVN decomposition is

$$
y_t = \iota_l \tilde y_t + \nu_t + \zeta_t,
$$

where \(\nu_t\) is the news component and \(\zeta_t\) is the noise component.

- \(\nu_t\) captures information that was unavailable when early releases were
  produced and is therefore rationally incorporated later.
- \(\zeta_t\) captures transitory measurement error that is eventually revised
  away.

This decomposition is the main attraction of the JVN model: it separates
revisions that reflect learning about the economy from revisions that reflect
mistakes in earlier measurement.
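
The distinction can be made concrete with a small base-R simulation. This is a
minimal sketch under assumed, hypothetical settings (two releases,
unit-variance components, the later release equal to the true value); it does
not use `reviser`. Under pure news the revision should be roughly uncorrelated
with the early release, whereas under pure noise the revision is negatively
correlated with it (theoretically \(-1/\sqrt{2}\) in this setup).

```{r}
set.seed(123)
n_sim <- 20000

# Pure news: the early release is an efficient forecast, so the true value
# equals the early release plus unforecastable news.
news_early <- rnorm(n_sim)
news_true  <- news_early + rnorm(n_sim)
news_rev   <- news_true - news_early

# Pure noise: the early release is the true value plus measurement error.
noise_true  <- rnorm(n_sim)
noise_early <- noise_true + rnorm(n_sim)
noise_rev   <- noise_true - noise_early

# Correlation between the revision and the early release in each case
sim_cor <- c(
  news  = cor(news_rev, news_early),
  noise = cor(noise_rev, noise_early)
)
round(sim_cor, 2)
```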

## Durbin-Koopman state-space form

In the notation of @durbinTimeSeriesAnalysis2012, the generic state-space model
is

$$
y_t = Z \alpha_t + \varepsilon_t,
\qquad
\varepsilon_t \sim N(0, H),
$$

$$
\alpha_{t+1} = T \alpha_t + R \eta_t,
\qquad
\eta_t \sim N(0, Q).
$$

The current `reviser` implementation sets \(H = 0\), so all uncertainty enters
through the transition equation. It also fixes \(Q = I\) and places the scale
parameters directly in the shock-loading matrix \(R\).

## The `reviser` implementation

`jvn_nowcast()` implements a restricted but practical version of the JVN model.
The latent true value follows an AR(\(p\)) process, and the user may include a
news block, a noise block, or both. Optional spillovers are implemented as
diagonal persistence terms in the selected measurement-error blocks.

When both news and noise are included, the state vector is

$$
\alpha_t =
\begin{bmatrix}
\tilde y_t \\
\tilde y_{t-1} \\
\vdots \\
\tilde y_{t-p+1} \\
\nu_t \\
\zeta_t
\end{bmatrix},
$$

where \(\nu_t\) and \(\zeta_t\) are both \(l \times 1\) vectors, so the state
vector has \(p + 2l\) elements.

### Measurement equation

With \(l\) vintages and an AR(\(p\)) latent process, the observation matrix is

$$
Z =
\begin{bmatrix}
\iota_l & 0_{l \times (p - 1)} & I_l & I_l
\end{bmatrix},
$$

so the observation equation is

$$
y_t = Z \alpha_t.
$$

If only news or only noise is included, the identity block of the excluded
component is dropped from \(Z\), along with the corresponding states in
\(\alpha_t\).
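
As a quick check of the dimensions, the following sketch builds \(Z\) in base R
for the combined news-noise case. The values of \(l\) and \(p\) and the entries
of the state vector are hypothetical and chosen only for illustration; the
object names are not part of `reviser`.

```{r}
n_vint <- 4   # l: number of vintages (hypothetical value)
ar_p   <- 2   # p: AR order of the latent true value (hypothetical value)

# Z = [iota_l, 0_{l x (p - 1)}, I_l, I_l]
Z_demo <- cbind(
  matrix(1, n_vint, 1),
  matrix(0, n_vint, ar_p - 1),
  diag(n_vint),
  diag(n_vint)
)
dim(Z_demo)  # l rows and p + 2l columns

# Arbitrary state vector: true value, its lag, news block, noise block
true_now   <- 0.5
true_lag   <- 0.3
news_demo  <- c(-0.4, -0.2, -0.1, 0)
noise_demo <- c(0.05, -0.02, 0.01, 0)
alpha_demo <- c(true_now, true_lag, news_demo, noise_demo)

# Z alpha reproduces iota_l * true value + news + noise
y_demo <- drop(Z_demo %*% alpha_demo)
all.equal(y_demo, true_now + news_demo + noise_demo)
```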

### Transition equation

The true-value block follows the companion-form AR(\(p\)) transition

$$
\Phi =
\begin{bmatrix}
\rho_1 & \rho_2 & \cdots & \rho_{p-1} & \rho_p \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{bmatrix}.
$$

The full transition matrix can therefore be written compactly as

$$
T =
\begin{bmatrix}
\Phi & 0 & 0 \\
0 & T_{\nu} & 0 \\
0 & 0 & T_{\zeta}
\end{bmatrix},
$$

where \(T_{\nu}\) and \(T_{\zeta}\) are diagonal spillover blocks when
spillovers are enabled and zero matrices otherwise.
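
The block structure can be assembled by hand in base R. This is a minimal
sketch with hypothetical AR coefficients and spillover values. It mirrors the
layout described above and is not taken from the `reviser` internals.

```{r}
rho_demo <- c(0.5, 0.2)   # hypothetical AR(2) coefficients
l_demo   <- 3             # hypothetical number of vintages
p_demo   <- length(rho_demo)

# Companion matrix: AR coefficients in the first row, shifted identity below
Phi_demo <- rbind(
  rho_demo,
  cbind(diag(p_demo - 1), matrix(0, p_demo - 1, 1)),
  deparse.level = 0
)

# Hypothetical diagonal spillover (persistence) blocks
T_news_demo  <- diag(0.3, l_demo)
T_noise_demo <- diag(0.1, l_demo)

# Assemble the block-diagonal transition matrix
m_demo    <- p_demo + 2 * l_demo
T_demo    <- matrix(0, m_demo, m_demo)
idx_true  <- seq_len(p_demo)
idx_news  <- p_demo + seq_len(l_demo)
idx_noise <- p_demo + l_demo + seq_len(l_demo)
T_demo[idx_true, idx_true]   <- Phi_demo
T_demo[idx_news, idx_news]   <- T_news_demo
T_demo[idx_noise, idx_noise] <- T_noise_demo

# A largest eigenvalue modulus below one indicates a stationary state process
max(Mod(eigen(T_demo)$values))
```

Setting the two spillover blocks to zero matrices gives the transition matrix
for the case without spillovers.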

### Shock-loading matrix

The implementation uses \(Q = I\) and places the innovation standard deviations
inside \(R\).

- The first structural shock loads on the latent true value with coefficient
  \(\sigma_e\).
- The news shocks load negatively on the true value and positively on the news
  states in the upper-triangular pattern implied by
  `jvn_update_matrices()`. This enforces the idea that later vintages embed
  information unavailable to earlier vintages.
- The noise shocks load independently on the corresponding noise states with
  coefficients \(\sigma_{\zeta,1}, \dots, \sigma_{\zeta,l}\).

This is the main implementation detail that differs from writing every variance
parameter inside \(Q\): in `reviser`, \(Q\) is fixed and \(R\) carries the scale
parameters.
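
The two parameterizations imply the same covariance \(R Q R'\) for the state
disturbance \(R \eta_t\). The sketch below verifies this for a simple diagonal
case. The standard deviations are hypothetical, and the diagonal \(R\) is a
simplification that ignores the triangular news loadings.

```{r}
sigma_demo <- c(0.8, 0.3, 0.1)   # hypothetical innovation standard deviations

# Parameterization A: Q = I, scales carried by R
R_a <- diag(sigma_demo)
Q_a <- diag(length(sigma_demo))

# Parameterization B: R = I, variances carried by Q
R_b <- diag(length(sigma_demo))
Q_b <- diag(sigma_demo^2)

# Both imply the same covariance of the state disturbance
cov_a <- R_a %*% Q_a %*% t(R_a)
cov_b <- R_b %*% Q_b %*% t(R_b)
all.equal(cov_a, cov_b)
```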

## Nested JVN specifications

The function covers the empirically relevant subclasses discussed by
@jacobsModelingDataRevisions2011.

- `include_news = TRUE`, `include_noise = FALSE`: pure news model
- `include_news = FALSE`, `include_noise = TRUE`: pure noise model
- `include_news = TRUE`, `include_noise = TRUE`: combined news-noise model
- `include_spillovers = TRUE`: diagonal persistence in the selected
  measurement-error block(s)

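Because these specifications are nested, they can be compared with information
criteria or likelihood-ratio tests. The usual boundary-value caveat applies to
the latter: dropping a block amounts to setting its scale parameters to zero,
which lies on the boundary of the parameter space, so the standard chi-squared
reference distribution is not reliable.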
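
For reference, information criteria only require the maximized log-likelihood,
the number of estimated parameters, and the sample size. The helper below is a
hypothetical sketch and not part of `reviser`; how these quantities are
extracted from a fitted object is not assumed here, and the inputs are
placeholders rather than estimates from any model.

```{r}
# Hypothetical helper: information criteria from a maximized log-likelihood
ic_from_loglik <- function(loglik, n_par, n_obs) {
  c(
    AIC = -2 * loglik + 2 * n_par,
    BIC = -2 * loglik + log(n_obs) * n_par
  )
}

# Placeholder inputs for illustration only, not results from a fitted model
ic_demo <- rbind(
  pure_news  = ic_from_loglik(loglik = -120, n_par = 7, n_obs = 80),
  news_noise = ic_from_loglik(loglik = -115, n_par = 11, n_obs = 80)
)
ic_demo
```

Smaller values are preferred. Because BIC penalizes additional parameters more
heavily than AIC for all but very small samples, the two criteria can rank
specifications differently.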

## Example: Euro Area GDP revisions

We illustrate the workflow with the first four releases (vintages) of Euro Area
GDP growth from `reviser::gdp`.

```{r warning = FALSE, message = FALSE}
library(reviser)
library(dplyr)
library(tidyr)
library(tsbox)
library(ggplot2)

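# Convert levels to period-on-period percentage changes, then keep Euro Area
# observations with reference periods from the first publication date through
# 2020-01-01
gdp_growth <- reviser::gdp |>
  tsbox::ts_pc() |>
  dplyr::filter(
    id == "EA",
    time >= min(pub_date),
    time <= as.Date("2020-01-01")
  ) |>
  tidyr::drop_na()

# Extract the first four releases (n = 0 is the initial release)
df <- get_nth_release(gdp_growth, n = 0:3)
df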
```

The resulting data frame has one row per reference period and one column per
release, which is the format expected by `jvn_nowcast()`.

```{r warning = FALSE, message = FALSE}
fit_jvn <- load_or_build_vignette_result(
  "nowcasting-revisions-jvn-fit.rds",
  function() {
    jvn_nowcast(
      df = df,
      e = 4,
      ar_order = 2,
      h = 0,
      include_news = TRUE,
      include_noise = TRUE,
      include_spillovers = TRUE,
      spillover_news = TRUE,
      spillover_noise = TRUE,
      method = "MLE",
      standardize = FALSE,
      solver_options = list(
        method = "L-BFGS-B",
        maxiter = 100,
        se_method = "hessian"
      )
    )
  }
)

summary(fit_jvn)
```

The parameter table contains the AR coefficients, the latent-process innovation
scale \(\sigma_e\), the news and noise innovation scales, and, when selected,
the diagonal spillover persistence parameters.

```{r warning = FALSE, message = FALSE}
fit_jvn$params
```

The state named `true_lag_0` is the current latent true value.

```{r warning = FALSE, message = FALSE}
fit_jvn$states |>
  dplyr::filter(
    state == "true_lag_0",
    filter == "smoothed"
  ) |>
  dplyr::slice_tail(n = 8)
```

The default plot method shows the filtered estimate of the latent true value.

```{r warning = FALSE, message = FALSE}
plot(fit_jvn)
```

We can also inspect the smoothed news and noise states directly.

```{r warning = FALSE, message = FALSE}
fit_jvn$states |>
  dplyr::filter(
    filter == "smoothed",
    grepl("news|noise", state)
  ) |>
  ggplot(aes(x = time, y = estimate, color = state)) +
  geom_line() +
  labs(
    title = "Smoothed news and noise states",
    x = NULL,
    y = "State estimate"
  ) +
  theme_minimal()
```

## Other JVN specifications

Pure-news and pure-noise variants are obtained by switching off the unwanted
measurement-error block.

```{r eval = FALSE}
fit_news <- jvn_nowcast(
  df = df,
  e = 4,
  ar_order = 2,
  include_news = TRUE,
  include_noise = FALSE,
  include_spillovers = FALSE
)

fit_noise <- jvn_nowcast(
  df = df,
  e = 4,
  ar_order = 2,
  include_news = FALSE,
  include_noise = TRUE,
  include_spillovers = FALSE
)
```

If desired, the data can be approximately standardized before estimation using
`standardize = TRUE`. In that case, scaling metadata are returned in the
`scale` element of the fitted object.