---
title: "Introduction to forrest"
format: html
vignette: >
  %\VignetteIndexEntry{Introduction to forrest}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
knitr:
  opts_chunk:
    collapse: true
    comment: "#>"
    fig.width: 7
    fig.height: 4
    out.width: "100%"
---
```{r setup}
#| include: false
library(forrest)
```
`forrest` creates publication-ready forest plots from any data frame that
contains point estimates and confidence intervals. A single function,
`forrest()`, handles the full range of use cases — regression model results,
subgroup analyses, meta-analyses, dose-response patterns, and more.
The only hard dependency is [tinyplot](https://github.com/grantmcdermott/tinyplot).
`forrest` works with base R data frames, tibbles, and data.tables.

---
## Basic forest plot
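The simplest call needs only the names of three columns: the point estimate and
the lower and upper confidence limits (here `estimate`, `lower`, and `upper`).
The optional `label` and `xlab` arguments name the row-label column and the
x-axis title. Here we display adjusted regression coefficients from a linear
model predicting systolic blood pressure (SBP).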
```{r basic}
# Adjusted coefficients with 95% confidence limits, one row per predictor
dat <- data.frame(
  predictor = c("Age (per 10 y)", "Female sex", "BMI (per 5 kg/m\u00b2)",
                "Current smoker", "Physically active"),
  estimate  = c( 0.18, -0.42,  0.11, -0.31,  0.24),
  lower     = c( 0.05, -0.61, -0.04, -0.52,  0.08),
  upper     = c( 0.31, -0.23,  0.26, -0.10,  0.40)
)

forrest(
  dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "predictor",
  xlab     = "Regression coefficient (95% CI)"
)
```
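In practice the three columns usually come from a fitted model rather than being typed by hand. The following is a minimal sketch using only base R; the built-in `mtcars` data and the name `model_dat` are stand-ins chosen for illustration and are not part of the SBP example above. The resulting data frame has the same shape as `dat` and could be passed to `forrest()` in the same way.

```r
# Illustrative only: mtcars stands in for a real study data set
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
ci  <- confint(fit, level = 0.95)

# Drop the intercept, which is rarely shown in a forest plot
keep <- names(coef(fit)) != "(Intercept)"

model_dat <- data.frame(
  predictor = names(coef(fit))[keep],
  estimate  = unname(coef(fit)[keep]),
  lower     = unname(ci[keep, 1]),
  upper     = unname(ci[keep, 2])
)
model_dat
```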
---
## Section headers from a grouping column
Pass a column name to `section` to automatically group rows under bold section
headers. `forrest()` inserts a header row wherever the section value changes,
indents the row labels within each section, and adds a blank spacer row after
each section. No manual data manipulation is required.
```{r section}
#| fig-height: 5
sub_dat <- data.frame(
  subgroup = c("Sex", "Sex",
               "Age group", "Age group", "Age group"),
  label    = c("Female", "Male",
               "30\u201349 years", "50\u201369 years", "70+ years"),
  estimate = c(-0.38,  0.12,  0.22, -0.15, -0.41),
  lower    = c(-0.58, -0.08,  0.02, -0.38, -0.66),
  upper    = c(-0.18,  0.32,  0.42,  0.08, -0.16)
)

forrest(
  sub_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "label",
  section  = "subgroup",
  xlab     = "Regression coefficient (95% CI)",
  header   = "Subgroup"
)
```
Use `section_indent = FALSE` to suppress automatic indentation, and
`section_spacer = FALSE` to suppress the blank row after each section.
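
Because a header is inserted wherever the section value changes, rows belonging to the same section presumably need to sit next to each other in the data. The sketch below shows one base R way to make sections contiguous while keeping their order of first appearance; the `shuffled` data frame is hypothetical.

```r
# Hypothetical input in which the two sections are interleaved
shuffled <- data.frame(
  subgroup = c("Sex", "Age group", "Sex", "Age group", "Age group"),
  label    = c("Female", "30\u201349 years", "Male",
               "50\u201369 years", "70+ years")
)

# Rank each row by the first appearance of its section; order() keeps
# tied rows in their original relative order
section_order <- unique(shuffled$subgroup)
sorted <- shuffled[order(match(shuffled$subgroup, section_order)), ]
rownames(sorted) <- NULL
sorted
```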
### Two-level hierarchy with subsection
For analyses with a nested grouping structure, combine `section` and
`subsection`. `forrest()` inserts top-level bold headers for `section` changes
and indented sub-headers for `subsection` changes within each section.
```{r subsection}
#| fig-height: 7
nested_dat <- data.frame(
  # Top-level grouping (bold section headers)
  domain = c(
    "Physical environment", "Physical environment", "Physical environment",
    "Physical environment", "Physical environment",
    "Social environment", "Social environment",
    "Social environment", "Social environment"
  ),
  # Second-level grouping (indented sub-headers)
  type = c(
    "Air quality", "Air quality",
    "Urban form", "Urban form", "Urban form",
    "Support", "Support",
    "Deprivation", "Deprivation"
  ),
  predictor = c(
    "PM2.5 (per 10 \u03bcg/m\u00b3)", "NO2 (per 10 ppb)",
    "Green space (%)", "Walkability", "Noise (per 10 dB)",
    "Social cohesion", "Social isolation",
    "Area deprivation", "Employment rate"
  ),
  estimate = c(-0.18,  0.12, -0.22,  0.15, -0.08,
               -0.11,  0.09,  0.05, -0.03),
  lower    = c(-0.38, -0.08, -0.42, -0.05, -0.28,
               -0.31, -0.09, -0.12, -0.20),
  upper    = c( 0.02,  0.32, -0.02,  0.35,  0.12,
                0.09,  0.27,  0.22,  0.14)
)

forrest(
  nested_dat,
  estimate   = "estimate",
  lower      = "lower",
  upper      = "upper",
  label      = "predictor",
  section    = "domain",
  subsection = "type",
  xlab       = "Mean difference in SBP (mmHg, 95% CI)"
)
```
---
## Adding a summary (pooled) estimate
Pass `is_summary` the name of a logical column; rows where that column is
`TRUE` are drawn as filled diamonds instead of squares. This is useful for
pooled estimates in meta-analyses or for overall effects after subgroup rows.
```{r summary}
sex_dat <- data.frame(
  label    = c("Female", "Male", "Overall"),
  estimate = c(-0.42, -0.29, -0.36),
  lower    = c(-0.61, -0.48, -0.50),
  upper    = c(-0.23, -0.10, -0.22),
  is_sum   = c(FALSE, FALSE, TRUE)  # TRUE marks the pooled row
)

forrest(
  sex_dat,
  estimate   = "estimate",
  lower      = "lower",
  upper      = "upper",
  label      = "label",
  is_summary = "is_sum",
  xlab       = "Regression coefficient (95% CI)",
  title      = "Association of female sex with SBP by subgroup"
)
```
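If the summary row is not already available, one common way to compute it is fixed-effect inverse-variance pooling. The sketch below is one possible approach, not necessarily how the "Overall" row above was derived. It assumes independent estimates with symmetric 95% Wald intervals, so the standard error can be recovered from the interval width. The names `sex_rows`, `pooled`, and `pooled_dat` are illustrative.

```r
sex_rows <- data.frame(
  label    = c("Female", "Male"),
  estimate = c(-0.42, -0.29),
  lower    = c(-0.61, -0.48),
  upper    = c(-0.23, -0.10)
)

z  <- qnorm(0.975)
se <- (sex_rows$upper - sex_rows$lower) / (2 * z)  # SE from 95% CI width
w  <- 1 / se^2                                     # inverse-variance weights

pooled_est <- sum(w * sex_rows$estimate) / sum(w)
pooled_se  <- sqrt(1 / sum(w))

pooled <- data.frame(
  label    = "Overall",
  estimate = pooled_est,
  lower    = pooled_est - z * pooled_se,
  upper    = pooled_est + z * pooled_se
)

# Append the pooled row and flag it for use with `is_summary`
pooled_dat <- rbind(sex_rows, pooled)
pooled_dat$is_sum <- pooled_dat$label == "Overall"
pooled_dat
```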
---
## Group colouring
Pass a `group` column to colour estimates by a categorical variable. Colours
come from the colourblind-safe Okabe-Ito palette, and a legend is added
automatically.
```{r grouped}
#| fig-height: 5
grp_dat <- data.frame(
  predictor = rep(
    c("Air pollution (PM2.5)", "Noise exposure",
      "Green space access", "Walkability index",
      "Food environment"), 2
  ),
  domain    = rep(c("Physical environment", "Social environment"),
                  each = 5),
  estimate  = c(-0.18,  0.12, -0.22,  0.15, -0.08,
                 0.05, -0.03,  0.09, -0.11,  0.14),
  lower     = c(-0.38, -0.08, -0.42, -0.05, -0.28,
                -0.12, -0.20, -0.09, -0.31, -0.04),
  upper     = c( 0.02,  0.32, -0.02,  0.35,  0.12,
                 0.22,  0.14,  0.27,  0.09,  0.32)
)

forrest(
  grp_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "predictor",
  group    = "domain",
  xlab     = "Mean difference in SBP (mmHg, 95% CI)"
)
```
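The Okabe-Ito palette also ships with base R (version 4.0.0 or later), so its colours can be inspected directly. This sketch only lists the palette; which entries `forrest()` assigns to which group is not stated here and should not be inferred from the order below.

```r
# Named vector of the nine Okabe-Ito colours as hex codes
okabe_ito <- palette.colors(palette = "Okabe-Ito")
okabe_ito
```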
---
## Multiple estimates per row (dodge)
Set `dodge = TRUE` (or a positive number) when consecutive rows share the same
`label` value. The CIs are vertically offset within each label band, and the
label is displayed once at the group centre. Combine with `group` to colour the
series.
```{r dodge}
#| fig-height: 4
#| fig-width: 9
# Long format: two consecutive rows (one per period) for each exposure
dodge_dat <- data.frame(
  exposure = rep(
    c("PM2.5 (per 10 \u03bcg/m\u00b3)", "NO2 (per 10 ppb)",
      "Noise (per 10 dB)", "Green space (%)", "Walkability"),
    each = 2
  ),
  period   = rep(c("Childhood", "Adulthood"), 5),
  estimate = c(
     0.14, -0.05,  0.08,  0.12, -0.19, -0.06,  0.11, -0.03,  0.07,  0.10
  ),
  lower    = c(
    -0.10, -0.26, -0.09, -0.08, -0.40, -0.25, -0.05, -0.12, -0.14, -0.09
  ),
  upper    = c(
     0.38,  0.16,  0.25,  0.32,  0.02,  0.13,  0.27,  0.06,  0.28,  0.29
  )
)

forrest(
  dodge_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "exposure",
  group    = "period",
  dodge    = TRUE,
  header   = "Environmental exposure",
  ref_line = 0,
  xlab     = "Mean difference in SBP (mmHg, 95% CI)"
)
```
A numeric value for `dodge` sets the vertical spacing between rows within a
group directly (in y-axis units). `dodge = TRUE` uses the default of `0.25`.
Structural rows (section and subsection headers, spacers) are always treated as
singleton groups and are not affected by dodging.
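
To build intuition for what a numeric `dodge` means, the sketch below computes offsets centred on the label position for a group of `n` rows. This is only one plausible reading of the description above: `dodge_offsets()` is a hypothetical helper, and the package's internal calculation is not shown in this vignette.

```r
# Hypothetical helper, not part of forrest: evenly spaced offsets that
# are centred on zero, with `dodge` y-axis units between neighbours
dodge_offsets <- function(n, dodge = 0.25) {
  (seq_len(n) - (n + 1) / 2) * dodge
}

dodge_offsets(2)       # two series per label, default spacing
dodge_offsets(4, 0.2)  # four series, tighter spacing
dodge_offsets(1)       # a singleton row is not moved
```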
### Wide-format text columns alongside dodged CIs
By default, `cols` text values appear at each row's dodged y position,
keeping them aligned with their CI whiskers. Set `cols_by_group = TRUE` to
collapse each text column to one value per label group — this produces
a wide table with one row per label and one column per condition, matching the
layout commonly seen in multi-period epidemiology papers.
```{r dodge-cols}
#| fig-height: 4
#| fig-width: 11
# Add per-condition formatted text columns to the long-format data
dodge_dat$est_ci <- sprintf("%.2f (%.2f, %.2f)",
                            dodge_dat$estimate,
                            dodge_dat$lower,
                            dodge_dat$upper)

# Each text column is filled only on the rows of its own period
dodge_dat$text_child <- ifelse(
  dodge_dat$period == "Childhood", dodge_dat$est_ci, ""
)
dodge_dat$text_adult <- ifelse(
  dodge_dat$period == "Adulthood", dodge_dat$est_ci, ""
)

forrest(
  dodge_dat,
  estimate      = "estimate",
  lower         = "lower",
  upper         = "upper",
  label         = "exposure",
  group         = "period",
  dodge         = TRUE,
  cols_by_group = TRUE,
  cols          = c("Childhood (95% CI)" = "text_child",
                    "Adulthood (95% CI)" = "text_adult"),
  widths        = c(2.8, 3.5, 2.2, 2.2),
  header        = "Environmental exposure",
  ref_line      = 0,
  xlab          = "Mean difference in SBP (mmHg, 95% CI)"
)
```
---
## Point shapes
Pass a `shape` column to assign different point characters per category.
Use together with `group` and `dodge` to encode two categorical dimensions
at once — for example, colour = time period and shape = sex.
```{r shape}
#| fig-height: 5
#| fig-width: 9
# Four rows per exposure: two periods crossed with two sexes
shape_dat <- data.frame(
  exposure = rep(
    c("PM2.5 (per 10 \u03bcg/m\u00b3)", "NO2 (per 10 ppb)",
      "Noise (per 10 dB)", "Green space (%)", "Walkability"),
    each = 4
  ),
  period   = rep(rep(c("Childhood", "Adulthood"), each = 2), 5),
  sex      = rep(c("Female", "Male"), 10),
  estimate = c(
     0.22, -0.08, -0.10,  0.18,
     0.11,  0.05,  0.00,  0.14,
    -0.31,  0.06, -0.09, -0.02,
     0.08,  0.12, -0.04,  0.03,
     0.17, -0.06,  0.22,  0.01
  ),
  lower    = c(
    -0.20, -0.38, -0.28, -0.14,
    -0.12, -0.22, -0.22, -0.14,
    -0.56, -0.24, -0.38, -0.30,
    -0.18, -0.14, -0.22, -0.15,
    -0.09, -0.38, -0.04, -0.27
  ),
  upper    = c(
     0.64,  0.22,  0.08,  0.50,
     0.34,  0.32,  0.22,  0.42,
    -0.06,  0.36,  0.20,  0.26,
     0.34,  0.38,  0.14,  0.21,
     0.43,  0.26,  0.48,  0.29
  )
)

forrest(
  shape_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "exposure",
  group    = "period",
  shape    = "sex",
  dodge    = TRUE,
  ref_line = 0,
  xlab     = "Mean difference in SBP (mmHg, 95% CI)"
)
```
The shape legend appears at `legend_shape_pos` (`"bottomright"` by default).
Set `legend_shape_pos = NULL` to suppress it.

---
## Adding text columns
Use the `cols` argument — a named character vector mapping display headers to
column names in `data` — to show formatted statistics alongside the plot.
```{r text-cols}
#| fig-height: 4.5
#| fig-width: 10
tc_dat <- data.frame(
  predictor = c("Age (per 10 y)", "Female sex", "BMI (per 5 kg/m\u00b2)",
                "Current smoker", "Physically active"),
  estimate  = c( 0.18, -0.42,  0.11, -0.31,  0.24),
  lower     = c( 0.05, -0.61, -0.04, -0.52,  0.08),
  upper     = c( 0.31, -0.23,  0.26, -0.10,  0.40),
  # Pre-formatted strings shown as text columns beside the plot
  coef_ci   = c(
    " 0.18 ( 0.05, 0.31)",
    "-0.42 (-0.61, -0.23)",
    " 0.11 (-0.04, 0.26)",
    "-0.31 (-0.52, -0.10)",
    " 0.24 ( 0.08, 0.40)"
  ),
  pval      = c("0.006", "<0.001", "0.148", "0.003", "0.009")
)

forrest(
  tc_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "predictor",
  header   = "Predictor",
  cols     = c("Coef (95% CI)" = "coef_ci", "P-value" = "pval"),
  widths   = c(2.8, 4, 2.5, 1.2),
  xlab     = "Regression coefficient (95% CI)"
)
```
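The text columns above are typed by hand, but they can also be generated from the numeric columns. The sketch below uses base R `sprintf()`; the helpers `fmt_ci()` and `fmt_p()` are illustrative names, not functions from `forrest`. The space flag in `"% .2f"` reserves a leading space for non-negative numbers so that signs line up.

```r
# Hypothetical helpers for building text columns
fmt_ci <- function(est, lo, hi) {
  sprintf("% .2f (% .2f, % .2f)", est, lo, hi)
}
fmt_p <- function(p) {
  ifelse(p < 0.001, "<0.001", sprintf("%.3f", p))
}

fmt_ci(c(0.18, -0.42), c(0.05, -0.61), c(0.31, -0.23))
fmt_p(c(0.006, 0.0004, 0.148))
```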
### Section-level annotations in text columns
When `section` is active, text columns show `""` for section header rows by
default. Use `section_cols` — a named character vector with the same syntax as
`cols` — to populate specific columns in section header rows with a
section-level value (e.g. number of studies, total N). The name must match a
name in `cols`; the value is a column in `data` whose first non-NA entry in
each section is used.
```{r section-cols}
#| fig-height: 5
#| fig-width: 11
sc_dat <- data.frame(
  subgroup = c("Sex", "Sex", "Age group", "Age group", "Age group"),
  label    = c("Female", "Male", "30\u201349 y", "50\u201369 y", "70+ y"),
  estimate = c(-0.38,  0.12,  0.22, -0.15, -0.41),
  lower    = c(-0.58, -0.08,  0.02, -0.38, -0.66),
  upper    = c(-0.18,  0.32,  0.42,  0.08, -0.16),
  coef_ci  = c(
    "-0.38 (-0.58, -0.18)",
    " 0.12 (-0.08, 0.32)",
    " 0.22 ( 0.02, 0.42)",
    "-0.15 (-0.38, 0.08)",
    "-0.41 (-0.66, -0.16)"
  ),
  # Constant within each section — section header will show this value
  k_text   = c("k = 2", "k = 2", "k = 3", "k = 3", "k = 3")
)

forrest(
  sc_dat,
  estimate     = "estimate",
  lower        = "lower",
  upper        = "upper",
  label        = "label",
  section      = "subgroup",
  section_cols = c("k" = "k_text"),
  header       = "Subgroup",
  cols         = c("Coef (95% CI)" = "coef_ci", "k" = "k_text"),
  widths       = c(2.5, 4, 2.5, 1.2),
  xlab         = "Regression coefficient (95% CI)"
)
```
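A section-level column such as `k_text` can be derived rather than typed. The sketch below assumes that `k` simply counts the rows in each section, which matches the values used above but may not be what `k` means in a real analysis.

```r
subgroup <- c("Sex", "Sex", "Age group", "Age group", "Age group")

# ave() returns the per-section row count, repeated on every row
k      <- ave(seq_along(subgroup), subgroup, FUN = length)
k_text <- paste0("k = ", k)
k_text
```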
---
## Alternating row stripes
Set `stripe = TRUE` for a subtle alternating row background that aids
readability with many rows.
```{r stripe}
#| fig-height: 6
stripe_dat <- data.frame(
  label    = c(
    "Age (per 10 y)", "Female sex", "BMI (per 5 kg/m\u00b2)",
    "Current smoker", "Physically active",
    "Alcohol intake", "Sleep duration", "Depressive symptoms"
  ),
  estimate = c( 0.42, -0.18,  0.31, -0.07,  0.25, -0.12,  0.19, -0.34),
  lower    = c( 0.22, -0.38,  0.12, -0.24,  0.06, -0.30,  0.01, -0.52),
  upper    = c( 0.62,  0.02,  0.50,  0.10,  0.44,  0.06,  0.37, -0.16)
)

forrest(
  stripe_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "label",
  stripe   = TRUE,
  xlab     = "Regression coefficient (95% CI)"
)
```
---
## Themes
`forrest()` ships three built-in themes. Pass the theme name as a character
string, or supply a named list of style overrides for full control.
```{r themes-minimal}
#| fig-height: 3.5
theme_dat <- data.frame(
  predictor = c("Age (per 10 y)", "Female sex", "BMI (per 5 kg/m\u00b2)"),
  estimate  = c( 0.18, -0.42,  0.11),
  lower     = c( 0.05, -0.61, -0.04),
  upper     = c( 0.31, -0.23,  0.26)
)

# "minimal" theme: lighter gridlines and softer reference line
forrest(
  theme_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "predictor",
  theme    = "minimal",
  title    = 'theme = "minimal"',
  xlab     = "Coefficient (95% CI)"
)
```
```{r themes-classic}
#| fig-height: 3.5
# "classic" theme: dotted gridlines and solid black reference line
forrest(
  theme_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "predictor",
  theme    = "classic",
  title    = 'theme = "classic"',
  xlab     = "Coefficient (95% CI)"
)
```
```{r themes-custom}
#| fig-height: 3.5
# Custom theme: override individual style keys
forrest(
  theme_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "predictor",
  theme    = list(ref_col = "#e63946", ref_lty = 1L, grid_col = "#eeeeee"),
  title    = "Custom theme (red reference line)",
  xlab     = "Coefficient (95% CI)"
)
```
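The idea of a list of overrides can be pictured with base R `modifyList()`: keys named in the override list replace the defaults, and all other keys keep their default value. This is a conceptual sketch only. The default values below are invented for illustration, and whether `forrest()` merges themes exactly this way is an assumption.

```r
# Hypothetical defaults; forrest's real default values may differ
default_theme <- list(ref_col = "grey40", ref_lty = 2L, grid_col = "grey90")

# Override two keys and leave grid_col untouched
overrides <- list(ref_col = "#e63946", ref_lty = 1L)

merged <- modifyList(default_theme, overrides)
str(merged)
```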
---
## Saving plots
Use `save_forrest()` to export a plot to PDF, PNG, SVG, or TIFF. Pass the
output path as `file` and, as `plot`, a zero-argument function that calls
`forrest()`.
```{r save}
#| eval: false
save_forrest(
  file = "my_forest_plot.pdf",
  # The plot is wrapped in a function so it is drawn on the output device
  plot = function() {
    forrest(
      dat,
      estimate = "estimate",
      lower    = "lower",
      upper    = "upper",
      label    = "predictor",
      xlab     = "Regression coefficient (95% CI)"
    )
  },
  width  = 8,
  height = 5
)
```
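
Passing a function rather than a finished plot lets the drawing happen after the output device has been opened. The sketch below illustrates that pattern with base graphics only. `save_pdf()` is a hypothetical wrapper written for this example and is not the implementation of `save_forrest()`.

```r
# Hypothetical wrapper: open a PDF device, run the drawing function,
# and close the device even if drawing fails
save_pdf <- function(file, draw, width = 8, height = 5) {
  grDevices::pdf(file, width = width, height = height)
  on.exit(grDevices::dev.off(), add = TRUE)
  draw()
  invisible(file)
}

out_file <- tempfile(fileext = ".pdf")
save_pdf(out_file, function() plot(1:10, main = "Deferred drawing"))
file.exists(out_file)
```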