---
title: "How forrest works"
format: html
vignette: >
%\VignetteIndexEntry{How forrest works}
%\VignetteEngine{quarto::html}
%\VignetteEncoding{UTF-8}
knitr:
opts_chunk:
collapse: true
comment: "#>"
fig.width: 7
fig.height: 4
out.width: "100%"
---
```{r setup}
#| include: false
library(forrest)
```
This vignette walks through the internals of `forrest()` step by step, using
concrete data at each stage so you can see exactly what gets built before
anything is drawn.
---
## Design principles
`forrest` follows three principles:
1. **One function, all use cases.** `forrest()` covers regression tables,
meta-analyses, subgroup analyses, dose-response patterns, and multi-model
comparisons through a uniform column-name-based interface.
2. **Data and structure are separate.** Users supply tidy data (one row = one
estimate). Visual structure — section headers, indentation, spacers — is
derived from grouping columns via `section` / `subsection`, not from
manually inserted NA rows in the data.
3. **Base graphics with a single dependency.** All drawing uses base R
`graphics` functions. The only external dependency is
[tinyplot](https://github.com/grantmcdermott/tinyplot), used solely to
initialise the plot region.
---
## Source files
| File | Purpose |
|------|---------|
| `R/forrest.R` | Exported `forrest()` — validation, section expansion, drawing pipeline |
| `R/save.R` | Exported `save_forrest()` — device dispatch for PDF/PNG/SVG/TIFF |
| `R/utils.R` | Internal helpers: `build_sections()`, `compute_dodge_groups()`, `group_colors()`, `group_shapes()`, `check_col()`, `%||%` |
| `R/draw.R` | Internal drawing helpers: `draw_diamond()`, `draw_text_panel()` |
| `R/theme.R` | Theme infrastructure: `.theme_defaults`, `.themes`, `resolve_theme()` |
---
## Starting data
We will use a small but representative data set throughout. Six studies are
grouped into three geographic regions, and each region has a pooled estimate.
```{r starting-data}
meta <- data.frame(
study = c(
"Chen (2016)", "Ibrahim (2022)",
"Bauer (2015)", "Evans (2018)", "Garcia (2020)", "Jensen (2023)",
"Fuentes (2019)"
),
region = c(
"Asia", "Asia",
"Europe", "Europe", "Europe", "Europe",
"Latin America"
),
or = c(1.081, 1.092, 1.095, 1.057, 1.086, 1.070, 1.116),
lower = c(1.038, 1.052, 1.058, 1.019, 1.050, 1.036, 1.063),
upper = c(1.126, 1.134, 1.134, 1.096, 1.123, 1.105, 1.171),
weight = c(2065, 1736, 816, 1041, 1479, 918, 567),
is_sum = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
or_text = sprintf("%.2f (%.2f\u2013%.2f)",
c(1.081, 1.092, 1.095, 1.057, 1.086, 1.070, 1.116),
c(1.038, 1.052, 1.058, 1.019, 1.050, 1.036, 1.063),
c(1.126, 1.134, 1.134, 1.096, 1.123, 1.105, 1.171))
)
meta
```
Without any structural arguments, all seven rows are drawn as plain study rows:
```{r plain}
forrest(
meta,
estimate = "or",
lower = "lower",
upper = "upper",
label = "study",
weight = "weight",
log_scale = TRUE,
ref_line = 1,
xlab = "OR (95% CI)"
)
```
---
## Step 1b — Section expansion via build_sections()
`build_sections()` is the function that converts the tidy data into the
display-ready expanded frame. Calling it directly shows what `forrest()`
sees before drawing.
```{r build-sections-call}
# build_sections() is an internal function; access via :::
expanded <- forrest:::build_sections(
df = meta,
estimate = "or",
lower = "lower",
upper = "upper",
label = "study",
is_summary = "is_sum",
weight = "weight",
section = "region",
subsection = NULL,
section_indent = TRUE,
section_spacer = TRUE,
cols = "or_text",
section_cols = NULL
)
```
The result is a list with four elements. `$df` is the expanded data frame:
```{r expanded-df}
expanded$df[, c("study", "region", "or", "is_sum", "or_text")]
```
The three flag vectors identify which rows are structural:
```{r expanded-flags}
data.frame(
study = expanded$df$study,
is_section_header = expanded$is_section_header,
is_subsection_hdr = expanded$is_subsection_header,
is_spacer = expanded$is_spacer
)
```
Key observations:
- Row 1 (`"Asia"`) and row 5 (`"Europe"`) and row 11 (`"Latin America"`) are
section header rows — `is_section_header = TRUE`, `or = NA`.
- Data rows within each section are indented by two leading spaces.
- The blank spacer rows (`study = ""`) follow each section.
- `or_text` is `""` for all structural rows.
Passing `section = "region"` to `forrest()` triggers this expansion
automatically:
```{r section-plot}
#| fig-height: 7
forrest(
meta,
estimate = "or",
lower = "lower",
upper = "upper",
label = "study",
section = "region",
weight = "weight",
log_scale = TRUE,
ref_line = 1,
xlab = "OR (95% CI)"
)
```
---
## Subsection expansion
With both `section` and `subsection`, `build_sections()` inserts two levels of
headers. Here each region contains studies from different design types.
```{r subsection-data}
meta2 <- data.frame(
region = c("Europe", "Europe", "Europe", "Europe", "Asia", "Asia"),
design = c("Cohort", "Cohort", "Case-control", "Case-control",
"Cohort", "Case-control"),
study = c("Bauer (2015)", "Evans (2018)",
"Garcia (2020)", "Jensen (2023)",
"Chen (2016)", "Ibrahim (2022)"),
or = c(1.095, 1.057, 1.086, 1.070, 1.081, 1.092),
lower = c(1.058, 1.019, 1.050, 1.036, 1.038, 1.052),
upper = c(1.134, 1.096, 1.123, 1.105, 1.126, 1.134)
)
```
```{r subsection-expanded}
exp2 <- forrest:::build_sections(
df = meta2,
estimate = "or",
lower = "lower",
upper = "upper",
label = "study",
is_summary = NULL,
weight = NULL,
section = "region",
subsection = "design",
section_indent = TRUE,
section_spacer = TRUE
)
data.frame(
study = exp2$df$study,
is_section_header = exp2$is_section_header,
is_subsection_header = exp2$is_subsection_header,
is_spacer = exp2$is_spacer
)
```
```{r subsection-plot}
#| fig-height: 7
forrest(
meta2,
estimate = "or",
lower = "lower",
upper = "upper",
label = "study",
section = "region",
subsection = "design",
log_scale = TRUE,
ref_line = 1,
xlab = "OR (95% CI)"
)
```
---
## Step 3 — Row type classification
After section expansion, `forrest()` classifies every row into one of four
types. Using the first expanded frame:
```{r row-types}
df <- expanded$df
est <- as.numeric(df$or)
is_sum <- as.logical(df$is_sum)
is_struct <- expanded$is_section_header |
expanded$is_subsection_header |
expanded$is_spacer
is_ref <- is.na(est) & !is_sum & !is_struct
is_bold <- (expanded$is_section_header |
expanded$is_subsection_header) &
nchar(trimws(df$study)) > 0L
data.frame(
study = df$study,
is_sum = is_sum,
is_struct = is_struct,
is_ref = is_ref,
is_bold = is_bold,
CI_drawn = !is_sum & !is_struct & !is_ref & !is.na(est)
)
```
The `is_ref` column would be `TRUE` for a reference-category row (user-supplied
`NA` estimate that is not a structural row). For this data there are none.
---
## Step 8 — Dodge layout
`compute_dodge_groups()` assigns visual group IDs. Consecutive rows with the
same label form one group; structural rows are always singletons.
For a non-dodged layout, each row maps to one y slot:
```{r dodge-no-dodge}
lbl <- as.character(expanded$df$study)
group_ids <- forrest:::compute_dodge_groups(lbl, is_struct)
n_vis <- max(group_ids)
# y slot for each row (top = n_vis, bottom = 1)
row_y <- (n_vis + 1L) - group_ids
data.frame(study = lbl, group_id = group_ids, y = row_y)
```
For a dodged layout with two series per label, consecutive rows sharing a label
form one group and are spread around the group centre:
```{r dodge-example-data}
dodge_ex <- data.frame(
label = rep(c("Asia", "Europe"), each = 2),
method = rep(c("Cohort", "Case-control"), 2),
or = c(1.08, 1.05, 1.09, 1.07),
lower = c(1.04, 1.01, 1.05, 1.03),
upper = c(1.13, 1.09, 1.14, 1.11)
)
```
```{r dodge-example-groups}
lbl2 <- as.character(dodge_ex$label)
grp2 <- forrest:::compute_dodge_groups(lbl2, rep(FALSE, nrow(dodge_ex)))
dodge_amt <- 0.25
n_vis2 <- max(grp2)
grp_cy <- (n_vis2 + 1L) - seq_len(n_vis2)
row_y2 <- numeric(nrow(dodge_ex))
for (g in seq_len(n_vis2)) {
idx <- which(grp2 == g)
k <- length(idx)
offsets <- seq(-(k - 1L) / 2, (k - 1L) / 2, length.out = k) * dodge_amt
row_y2[idx] <- grp_cy[g] + offsets
}
data.frame(
label = lbl2,
method = dodge_ex$method,
group_id = grp2,
y = row_y2
)
```
The two "Asia" rows are offset symmetrically around y = 2 (the group centre),
and the two "Europe" rows around y = 1:
```{r dodge-plot}
forrest(
dodge_ex,
estimate = "or",
lower = "lower",
upper = "upper",
label = "label",
group = "method",
dodge = TRUE,
log_scale = TRUE,
ref_line = 1,
xlab = "OR (95% CI)"
)
```
---
## Colour assignment
`group_colors()` maps unique levels to the Okabe-Ito palette (skipping index 1,
which is near-white):
```{r group-colors}
forrest:::group_colors(c("Asia", "Europe", "Latin America"))
```
When `group` is supplied, each row's colour comes from this map:
```{r color-assignment}
grp <- c("Asia", "Asia", "Europe", "Europe", "Latin America")
col_map <- forrest:::group_colors(grp)
col_vec <- unname(col_map[grp])
data.frame(grp, colour = col_vec)
```
---
## Section-level text column annotations
`section_cols` lets specific `cols` columns show a section-level value in the
header row rather than `""`. The value comes from the first non-NA entry of the
named data column within each section.
```{r section-cols-data}
meta$k_text <- c("k = 2", "k = 2",
"k = 4", "k = 4", "k = 4", "k = 4",
"k = 1")
exp_sc <- forrest:::build_sections(
df = meta,
estimate = "or",
lower = "lower",
upper = "upper",
label = "study",
is_summary = "is_sum",
weight = "weight",
section = "region",
section_cols = c(k_text = "k_text"),
cols = c("or_text", "k_text"),
section_spacer = FALSE,
section_indent = FALSE
)
exp_sc$df[, c("study", "or_text", "k_text")]
```
Header rows have `""` in `or_text` (a row-level column) and the section value
in `k_text` (declared in `section_cols`). Data rows keep their original values.
```{r section-cols-plot}
#| fig-height: 7
#| fig-width: 10
forrest(
meta,
estimate = "or",
lower = "lower",
upper = "upper",
label = "study",
section = "region",
section_cols = c("k" = "k_text"),
weight = "weight",
log_scale = TRUE,
ref_line = 1,
header = "Study",
cols = c("OR (95% CI)" = "or_text", "k" = "k_text"),
widths = c(3.5, 3.5, 2.2, 1.0),
xlab = "OR (95% CI)"
)
```
---
## Reference-category rows
A row where `estimate = NA` and which is **not** auto-inserted by
`build_sections()` is a reference category. It produces no CI or point, its
label is rendered in regular (non-bold) font, and `ref_label = TRUE`
appends `" (Ref.)"` automatically.
```{r ref-category-data}
dose <- data.frame(
quartile = c("Q1", "Q2", "Q3", "Q4"),
or = c(NA, 1.21, 1.45, 1.82),
lower = c(NA, 1.08, 1.28, 1.60),
upper = c(NA, 1.36, 1.65, 2.07)
)
dose
```
With `ref_label = TRUE`, the Q1 row's label gets `" (Ref.)"` appended and no
CI is drawn:
```{r ref-category-plot}
forrest(
dose,
estimate = "or",
lower = "lower",
upper = "upper",
label = "quartile",
ref_label = TRUE,
log_scale = TRUE,
ref_line = 1,
xlab = "OR (95% CI)"
)
```
---
## Summary (diamond) rows
Rows with `is_summary = TRUE` are drawn as filled diamonds by `draw_diamond()`.
The diamond's left and right tips are at `lo[i]` and `hi[i]` (the CI bounds),
its horizontal centre is at `est[i]`, and its half-height is `0.38 * cex`.
The diamond is clipped to `xlim` if the CI extends beyond the axis.
```{r diamond-data}
with_pool <- rbind(
meta[, c("study", "region", "or", "lower", "upper", "is_sum")],
data.frame(
study = "Pooled", region = "Overall",
or = 1.082, lower = 1.058, upper = 1.107,
is_sum = TRUE
)
)
```
```{r diamond-plot}
#| fig-height: 8
forrest(
with_pool,
estimate = "or",
lower = "lower",
upper = "upper",
label = "study",
section = "region",
is_summary = "is_sum",
log_scale = TRUE,
ref_line = 1,
xlab = "OR (95% CI)"
)
```
---
## Theme system
`resolve_theme()` merges user overrides with `.theme_defaults`. All six theme
keys and their defaults:
```{r theme-defaults}
forrest:::.theme_defaults
```
Built-in themes are stored as partial override lists:
```{r theme-list}
forrest:::.themes
```
A custom theme overrides only the keys you supply:
```{r custom-theme}
#| fig-height: 3.5
dat <- data.frame(
label = c("A", "B", "C"),
estimate = c(0.2, -0.1, 0.4),
lower = c(0.0, -0.3, 0.2),
upper = c(0.4, 0.1, 0.6)
)
forrest(
dat,
estimate = "estimate",
lower = "lower",
upper = "upper",
label = "label",
theme = list(ref_col = "#e63946", ref_lty = 1L,
grid_col = "#eeeeee", stripe_col = "#fafafa"),
stripe = TRUE,
xlab = "Coefficient (95% CI)"
)
```