---
title: "Introduction to forrest"
format: html
vignette: >
  %\VignetteIndexEntry{Introduction to forrest}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
knitr:
  opts_chunk:
    collapse: true
    comment: "#>"
    fig.width: 7
    fig.height: 4
    out.width: "100%"
---

```{r setup}
#| include: false
library(forrest)
```

`forrest` creates publication-ready forest plots from any data frame that contains point estimates and confidence intervals. A single function, `forrest()`, covers the common use cases: regression model results, subgroup analyses, meta-analyses, dose-response patterns, and more.

The only hard dependency is [tinyplot](https://github.com/grantmcdermott/tinyplot). `forrest` works with base R data frames, tibbles, and data.tables.

---

## Basic forest plot

The simplest call requires only three arguments, `estimate`, `lower`, and `upper`, each naming a column of the data. Here we also pass `label` to name the rows, and display adjusted regression coefficients from a linear model predicting systolic blood pressure (SBP).

```{r basic}
dat <- data.frame(
  predictor = c("Age (per 10 y)", "Female sex", "BMI (per 5 kg/m\u00b2)",
                "Current smoker", "Physically active"),
  estimate  = c( 0.18, -0.42,  0.11, -0.31,  0.24),
  lower     = c( 0.05, -0.61, -0.04, -0.52,  0.08),
  upper     = c( 0.31, -0.23,  0.26, -0.10,  0.40)
)

forrest(
  dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "predictor",
  xlab     = "Regression coefficient (95% CI)"
)
```

---

## Section headers from a grouping column

Pass a column name to `section` to group rows under bold section headers. `forrest()` inserts a header row wherever the section value changes, indents the row labels within each section, and adds a blank spacer row after each section, so no manual data manipulation is required. Because headers are triggered by changes in the value, rows belonging to the same section should be adjacent in the data.
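Since a header row is inserted wherever the `section` value changes, a section whose rows are scattered through the data would, by that rule, receive more than one header. If your rows are not already grouped, a base R reorder like the sketch below makes each section contiguous, keeping sections in order of first appearance and rows in their original order within a section. `group_sections()` is a hypothetical helper written for this illustration, not a forrest function.

```r
# Hypothetical helper (not part of forrest): reorder rows so that each
# section forms one contiguous block, with sections in order of first
# appearance. order() is stable, so rows keep their relative order
# within a section.
group_sections <- function(data, section) {
  key <- match(data[[section]], unique(data[[section]]))
  out <- data[order(key), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Illustrative input in which the "Sex" rows are split apart
messy <- data.frame(
  subgroup = c("Sex", "Age group", "Sex", "Age group"),
  label    = c("Female", "30-49 years", "Male", "50-69 years")
)

tidy <- group_sections(messy, "subgroup")
tidy$label
```

The reordered data frame can then be passed to `forrest()` with `section = "subgroup"` as in the example that follows.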
```{r section}
#| fig-height: 5
sub_dat <- data.frame(
  subgroup = c("Sex", "Sex", "Age group", "Age group", "Age group"),
  label    = c("Female", "Male", "30\u201349 years", "50\u201369 years", "70+ years"),
  estimate = c(-0.38,  0.12, 0.22, -0.15, -0.41),
  lower    = c(-0.58, -0.08, 0.02, -0.38, -0.66),
  upper    = c(-0.18,  0.32, 0.42,  0.08, -0.16)
)

forrest(
  sub_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "label",
  section  = "subgroup",
  xlab     = "Regression coefficient (95% CI)",
  header   = "Subgroup"
)
```

Use `section_indent = FALSE` to suppress automatic indentation, and `section_spacer = FALSE` to suppress the blank row after each section.

### Two-level hierarchy with subsection

For analyses with a nested grouping structure, combine `section` and `subsection`. `forrest()` inserts top-level bold headers where `section` changes and indented sub-headers where `subsection` changes within a section.

```{r subsection}
#| fig-height: 7
nested_dat <- data.frame(
  domain = c(
    "Physical environment", "Physical environment", "Physical environment",
    "Physical environment", "Physical environment",
    "Social environment", "Social environment",
    "Social environment", "Social environment"
  ),
  type = c(
    "Air quality", "Air quality",
    "Urban form", "Urban form", "Urban form",
    "Support", "Support",
    "Deprivation", "Deprivation"
  ),
  predictor = c(
    "PM2.5 (per 10 \u03bcg/m\u00b3)", "NO2 (per 10 ppb)",
    "Green space (%)", "Walkability", "Noise (per 10 dB)",
    "Social cohesion", "Social isolation",
    "Area deprivation", "Employment rate"
  ),
  estimate = c(-0.18,  0.12, -0.22,  0.15, -0.08, -0.11,  0.09,  0.05, -0.03),
  lower    = c(-0.38, -0.08, -0.42, -0.05, -0.28, -0.31, -0.09, -0.12, -0.20),
  upper    = c( 0.02,  0.32, -0.02,  0.35,  0.12,  0.09,  0.27,  0.22,  0.14)
)

forrest(
  nested_dat,
  estimate   = "estimate",
  lower      = "lower",
  upper      = "upper",
  label      = "predictor",
  section    = "domain",
  subsection = "type",
  xlab       = "Mean difference in SBP (mmHg, 95% CI)"
)
```

---

## Adding a summary (pooled) estimate

Pass a logical column to `is_summary` to draw the rows where it is `TRUE` as filled diamonds instead of squares. This is useful for pooled estimates in meta-analyses or for an overall effect shown after subgroup rows.

```{r summary}
sex_dat <- data.frame(
  label    = c("Female", "Male", "Overall"),
  estimate = c(-0.42, -0.29, -0.36),
  lower    = c(-0.61, -0.48, -0.50),
  upper    = c(-0.23, -0.10, -0.22),
  is_sum   = c(FALSE, FALSE, TRUE)
)

forrest(
  sex_dat,
  estimate   = "estimate",
  lower      = "lower",
  upper      = "upper",
  label      = "label",
  is_summary = "is_sum",
  xlab       = "Regression coefficient (95% CI)",
  title      = "Association of female sex with SBP by subgroup"
)
```

---

## Group colouring

Pass a `group` column to colour estimates by a categorical variable. Colours come from the colour-blind-safe Okabe-Ito palette, and a legend is added automatically.

```{r grouped}
#| fig-height: 5
grp_dat <- data.frame(
  predictor = rep(
    c("Air pollution (PM2.5)", "Noise exposure", "Green space access",
      "Walkability index", "Food environment"),
    2
  ),
  domain   = rep(c("Physical environment", "Social environment"), each = 5),
  estimate = c(-0.18,  0.12, -0.22,  0.15, -0.08,  0.05, -0.03,  0.09, -0.11,  0.14),
  lower    = c(-0.38, -0.08, -0.42, -0.05, -0.28, -0.12, -0.20, -0.09, -0.31, -0.04),
  upper    = c( 0.02,  0.32, -0.02,  0.35,  0.12,  0.22,  0.14,  0.27,  0.09,  0.32)
)

forrest(
  grp_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "predictor",
  group    = "domain",
  xlab     = "Mean difference in SBP (mmHg, 95% CI)"
)
```

---

## Multiple estimates per row (dodge)

Set `dodge = TRUE` (or a positive number) when consecutive rows share the same `label` value. Their CIs are offset vertically within the label's band, and the label is displayed once at the centre of the group. Combine with `group` to colour each series.
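One way to picture the vertical offsets: if k rows share a label and are spaced d apart, centring them on the label position puts row i at `(i - (k + 1) / 2) * d` for i = 1, ..., k. The sketch below computes those offsets, using the default spacing of 0.25 that the vignette documents for `dodge = TRUE`. It illustrates the geometry under this centring assumption only; `dodge_offsets()` is a hypothetical function, not forrest's internal code, and which row forrest draws on top is not assumed here.

```r
# Hypothetical illustration (not forrest's internals): y offsets of k rows
# that share one label, spaced d apart and centred on the label position.
dodge_offsets <- function(k, d = 0.25) {
  (seq_len(k) - (k + 1) / 2) * d
}

dodge_offsets(2)           # two periods per exposure: -0.125  0.125
dodge_offsets(3, d = 0.2)  # three rows, tighter spacing: -0.2  0.0  0.2
dodge_offsets(1)           # a singleton (e.g. a header row) stays at 0
```

With this centring, the total spread of a group is `(k - 1) * d`, so more rows per label call for a smaller spacing to stay inside the label's band.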
```{r dodge}
#| fig-height: 4
#| fig-width: 9
dodge_dat <- data.frame(
  exposure = rep(
    c("PM2.5 (per 10 \u03bcg/m\u00b3)", "NO2 (per 10 ppb)", "Noise (per 10 dB)",
      "Green space (%)", "Walkability"),
    each = 2
  ),
  period   = rep(c("Childhood", "Adulthood"), 5),
  estimate = c( 0.14, -0.05,  0.08,  0.12, -0.19, -0.06,  0.11, -0.03,  0.07,  0.10),
  lower    = c(-0.10, -0.26, -0.09, -0.08, -0.40, -0.25, -0.05, -0.12, -0.14, -0.09),
  upper    = c( 0.38,  0.16,  0.25,  0.32,  0.02,  0.13,  0.27,  0.06,  0.28,  0.29)
)

forrest(
  dodge_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "exposure",
  group    = "period",
  dodge    = TRUE,
  header   = "Environmental exposure",
  ref_line = 0,
  xlab     = "Mean difference in SBP (mmHg, 95% CI)"
)
```

A numeric value for `dodge` sets the vertical spacing between rows within a label group directly, in y-axis units; `dodge = TRUE` uses the default of `0.25`. Structural rows (section and subsection headers, spacers) are always treated as singleton groups and are not affected by dodging.

### Wide-format text columns alongside dodged CIs

By default, `cols` text values appear at each row's dodged y position, keeping them aligned with their CI whiskers. Set `cols_by_group = TRUE` to collapse each text column to one value per label group. Combined with one text column per condition, each blank except on that condition's rows, this produces a wide table with one row per label and one column per condition, the layout commonly seen in multi-period epidemiology papers.
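The example after this sketch builds its per-condition text columns with two `ifelse()` calls. With more conditions, a small loop generalises the same pattern and can also build the matching `cols` vector. `add_condition_cols()` is a hypothetical helper, not part of forrest, and the three-period data below are illustrative values only; the output has the shape the example feeds to `cols_by_group = TRUE`.

```r
# Hypothetical helper (not part of forrest): for each level of `by`, add a
# column "<prefix><level>" holding `text` on that level's rows and ""
# on every other row.
add_condition_cols <- function(data, by, text, prefix = "text_") {
  for (lvl in unique(data[[by]])) {
    data[[paste0(prefix, lvl)]] <- ifelse(data[[by]] == lvl, data[[text]], "")
  }
  data
}

# Illustrative long-format data with three periods per exposure
long <- data.frame(
  exposure = rep(c("PM2.5", "NO2"), each = 3),
  period   = rep(c("Childhood", "Adolescence", "Adulthood"), 2),
  estimate = c( 0.14, -0.05,  0.02,  0.08,  0.12, -0.01),
  lower    = c(-0.10, -0.26, -0.20, -0.09, -0.08, -0.21),
  upper    = c( 0.38,  0.16,  0.24,  0.25,  0.32,  0.19)
)
long$est_ci <- sprintf("%.2f (%.2f, %.2f)", long$estimate, long$lower, long$upper)
long <- add_condition_cols(long, by = "period", text = "est_ci")

# Named vector for `cols`: display header = column name
lvls <- unique(long$period)
cols <- setNames(paste0("text_", lvls), paste(lvls, "(95% CI)"))
cols
```

`cols` can then be passed to `forrest()` together with `group = "period"`, `dodge = TRUE`, and `cols_by_group = TRUE`.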
```{r dodge-cols}
#| fig-height: 4
#| fig-width: 11
# Add per-condition formatted text columns to the long-format data
dodge_dat$est_ci <- sprintf("%.2f (%.2f, %.2f)",
                            dodge_dat$estimate, dodge_dat$lower, dodge_dat$upper)
dodge_dat$text_child <- ifelse(
  dodge_dat$period == "Childhood", dodge_dat$est_ci, ""
)
dodge_dat$text_adult <- ifelse(
  dodge_dat$period == "Adulthood", dodge_dat$est_ci, ""
)

forrest(
  dodge_dat,
  estimate      = "estimate",
  lower         = "lower",
  upper         = "upper",
  label         = "exposure",
  group         = "period",
  dodge         = TRUE,
  cols_by_group = TRUE,
  cols          = c("Childhood (95% CI)" = "text_child",
                    "Adulthood (95% CI)" = "text_adult"),
  widths        = c(2.8, 3.5, 2.2, 2.2),
  header        = "Environmental exposure",
  ref_line      = 0,
  xlab          = "Mean difference in SBP (mmHg, 95% CI)"
)
```

---

## Point shapes

Pass a `shape` column to assign a different point character to each category. Use it together with `group` and `dodge` to encode two categorical dimensions at once, for example colour for time period and shape for sex.

```{r shape}
#| fig-height: 5
#| fig-width: 9
shape_dat <- data.frame(
  exposure = rep(
    c("PM2.5 (per 10 \u03bcg/m\u00b3)", "NO2 (per 10 ppb)", "Noise (per 10 dB)",
      "Green space (%)", "Walkability"),
    each = 4
  ),
  period   = rep(rep(c("Childhood", "Adulthood"), each = 2), 5),
  sex      = rep(c("Female", "Male"), 10),
  estimate = c(
     0.22, -0.08, -0.10,  0.18,
     0.11,  0.05,  0.00,  0.14,
    -0.31,  0.06, -0.09, -0.02,
     0.08,  0.12, -0.04,  0.03,
     0.17, -0.06,  0.22,  0.01
  ),
  lower = c(
    -0.20, -0.38, -0.28, -0.14,
    -0.12, -0.22, -0.22, -0.14,
    -0.56, -0.24, -0.38, -0.30,
    -0.18, -0.14, -0.22, -0.15,
    -0.09, -0.38, -0.04, -0.27
  ),
  upper = c(
     0.64,  0.22,  0.08,  0.50,
     0.34,  0.32,  0.22,  0.42,
    -0.06,  0.36,  0.20,  0.26,
     0.34,  0.38,  0.14,  0.21,
     0.43,  0.26,  0.48,  0.29
  )
)

forrest(
  shape_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "exposure",
  group    = "period",
  shape    = "sex",
  dodge    = TRUE,
  ref_line = 0,
  xlab     = "Mean difference in SBP (mmHg, 95% CI)"
)
```

The shape legend is placed at the position given by `legend_shape_pos` (`"bottomright"` by default). Set `legend_shape_pos = NULL` to suppress it.

---

## Adding text columns

Use the `cols` argument, a named character vector that maps display headers to column names in `data`, to show formatted statistics alongside the plot.

```{r text-cols}
#| fig-height: 4.5
#| fig-width: 10
tc_dat <- data.frame(
  predictor = c("Age (per 10 y)", "Female sex", "BMI (per 5 kg/m\u00b2)",
                "Current smoker", "Physically active"),
  estimate  = c( 0.18, -0.42,  0.11, -0.31,  0.24),
  lower     = c( 0.05, -0.61, -0.04, -0.52,  0.08),
  upper     = c( 0.31, -0.23,  0.26, -0.10,  0.40),
  coef_ci   = c(
    " 0.18 ( 0.05, 0.31)", "-0.42 (-0.61, -0.23)", " 0.11 (-0.04, 0.26)",
    "-0.31 (-0.52, -0.10)", " 0.24 ( 0.08, 0.40)"
  ),
  pval      = c("0.006", "<0.001", "0.148", "0.003", "0.009")
)

forrest(
  tc_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "predictor",
  header   = "Predictor",
  cols     = c("Coef (95% CI)" = "coef_ci", "P-value" = "pval"),
  widths   = c(2.8, 4, 2.5, 1.2),
  xlab     = "Regression coefficient (95% CI)"
)
```

### Section-level annotations in text columns

When `section` is active, text columns are left blank (`""`) in section header rows by default. Use `section_cols`, a named character vector with the same syntax as `cols`, to fill specific columns of the section header rows with a section-level value (for example, the number of studies or the total N). Each name must match a name in `cols`; each value is a column in `data` whose first non-NA entry in each section is shown.
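The formatted strings in the example that follows are typed by hand. The sketch below derives them from the numeric columns instead: `sprintf()` rebuilds `coef_ci` (the `%5.2f` field pads positive numbers with a space so the text lines up), and `ave()` repeats each section's row count on every row, giving the within-section constant that `section_cols` reads. Because only the first non-NA entry per section is used, a column filled on one row per section and `NA` elsewhere should work too; `first_non_na()` is a hypothetical helper that spells out that rule in base R, not forrest's own code.

```r
sc_num <- data.frame(
  subgroup = c("Sex", "Sex", "Age group", "Age group", "Age group"),
  estimate = c(-0.38,  0.12, 0.22, -0.15, -0.41),
  lower    = c(-0.58, -0.08, 0.02, -0.38, -0.66),
  upper    = c(-0.18,  0.32, 0.42,  0.08, -0.16)
)

# "%5.2f" gives positive values a leading space so columns align
sc_num$coef_ci <- sprintf("%5.2f (%5.2f, %.2f)",
                          sc_num$estimate, sc_num$lower, sc_num$upper)

# Rows per section, repeated on every row of that section
n_per_section <- ave(seq_along(sc_num$subgroup), sc_num$subgroup, FUN = length)
sc_num$k_text <- paste("k =", n_per_section)

# Hypothetical helper: the first non-NA value in each section,
# returned as a vector named by section
first_non_na <- function(x, section) {
  sapply(split(x, section), function(v) v[!is.na(v)][1])
}
first_non_na(c("k = 2", NA, "k = 3", NA, NA), sc_num$subgroup)
```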
```{r section-cols}
#| fig-height: 5
#| fig-width: 11
sc_dat <- data.frame(
  subgroup = c("Sex", "Sex", "Age group", "Age group", "Age group"),
  label    = c("Female", "Male", "30\u201349 y", "50\u201369 y", "70+ y"),
  estimate = c(-0.38,  0.12, 0.22, -0.15, -0.41),
  lower    = c(-0.58, -0.08, 0.02, -0.38, -0.66),
  upper    = c(-0.18,  0.32, 0.42,  0.08, -0.16),
  coef_ci  = c(
    "-0.38 (-0.58, -0.18)", " 0.12 (-0.08, 0.32)", " 0.22 ( 0.02, 0.42)",
    "-0.15 (-0.38, 0.08)", "-0.41 (-0.66, -0.16)"
  ),
  # Constant within each section: the section header shows this value
  k_text   = c("k = 2", "k = 2", "k = 3", "k = 3", "k = 3")
)

forrest(
  sc_dat,
  estimate     = "estimate",
  lower        = "lower",
  upper        = "upper",
  label        = "label",
  section      = "subgroup",
  section_cols = c("k" = "k_text"),
  header       = "Subgroup",
  cols         = c("Coef (95% CI)" = "coef_ci", "k" = "k_text"),
  widths       = c(2.5, 4, 2.5, 1.2),
  xlab         = "Regression coefficient (95% CI)"
)
```

---

## Alternating row stripes

Set `stripe = TRUE` to shade alternate rows with a subtle background, which makes plots with many rows easier to read.

```{r stripe}
#| fig-height: 6
stripe_dat <- data.frame(
  label = c(
    "Age (per 10 y)", "Female sex", "BMI (per 5 kg/m\u00b2)",
    "Current smoker", "Physically active", "Alcohol intake",
    "Sleep duration", "Depressive symptoms"
  ),
  estimate = c( 0.42, -0.18,  0.31, -0.07,  0.25, -0.12,  0.19, -0.34),
  lower    = c( 0.22, -0.38,  0.12, -0.24,  0.06, -0.30,  0.01, -0.52),
  upper    = c( 0.62,  0.02,  0.50,  0.10,  0.44,  0.06,  0.37, -0.16)
)

forrest(
  stripe_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "label",
  stripe   = TRUE,
  xlab     = "Regression coefficient (95% CI)"
)
```

---

## Themes

`forrest()` ships three built-in themes; `"minimal"` and `"classic"` are shown below. Pass a theme name as a character string, or supply a named list of style overrides for full control.
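When several figures share a house style, it can help to keep the override list in one object and tweak it per plot with base R's `modifyList()`. The sketch uses only the three style keys that appear in the custom-theme example further below (`ref_col`, `ref_lty`, `grid_col`); no other key names are assumed, and `house_theme` and `dashed_theme` are illustrative object names.

```r
# House style built from the keys used in the custom theme example
house_theme <- list(ref_col = "#e63946", ref_lty = 1L, grid_col = "#eeeeee")

# Variant for one figure: dashed reference line, other keys unchanged.
# modifyList() replaces matching elements and keeps their order.
dashed_theme <- modifyList(house_theme, list(ref_lty = 2L))

str(dashed_theme)
# Then pass it on, e.g. forrest(theme_dat, ..., theme = dashed_theme)
```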
```{r themes-minimal}
#| fig-height: 3.5
theme_dat <- data.frame(
  predictor = c("Age (per 10 y)", "Female sex", "BMI (per 5 kg/m\u00b2)"),
  estimate  = c( 0.18, -0.42,  0.11),
  lower     = c( 0.05, -0.61, -0.04),
  upper     = c( 0.31, -0.23,  0.26)
)

# "minimal" theme: lighter gridlines and a softer reference line
forrest(
  theme_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "predictor",
  theme    = "minimal",
  title    = 'theme = "minimal"',
  xlab     = "Coefficient (95% CI)"
)
```

```{r themes-classic}
#| fig-height: 3.5
# "classic" theme: dotted gridlines and a solid black reference line
forrest(
  theme_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "predictor",
  theme    = "classic",
  title    = 'theme = "classic"',
  xlab     = "Coefficient (95% CI)"
)
```

```{r themes-custom}
#| fig-height: 3.5
# Custom theme: override individual style keys
forrest(
  theme_dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "predictor",
  theme    = list(ref_col = "#e63946", ref_lty = 1L, grid_col = "#eeeeee"),
  title    = "Custom theme (red reference line)",
  xlab     = "Coefficient (95% CI)"
)
```

---

## Saving plots

Use `save_forrest()` to export a plot to PDF, PNG, SVG, or TIFF. The `plot` argument takes a zero-argument function that calls `forrest()`, and `width` and `height` set the size of the output.

```{r save}
#| eval: false
save_forrest(
  file   = "my_forest_plot.pdf",
  plot   = function() {
    forrest(
      dat,
      estimate = "estimate",
      lower    = "lower",
      upper    = "upper",
      label    = "predictor",
      xlab     = "Regression coefficient (95% CI)"
    )
  },
  width  = 8,
  height = 5
)
```
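Because `plot` must be a function with no arguments, saving several plots in a loop needs a little care: an R closure looks up free variables when it is called, so functions created inside a `for` loop would all see the loop variable's final value. A small factory that captures its inputs up front avoids this. `make_forrest_fn()` is a hypothetical helper, not part of forrest; it assumes `forrest()` takes its arguments as in the examples above, and the two data sets are placeholder values.

```r
# Hypothetical helper (not part of forrest): capture the data and the
# forrest() arguments now, and return a zero-argument plotting function.
make_forrest_fn <- function(data, ...) {
  force(data)
  args <- list(...)  # list(...) evaluates the extra arguments immediately
  function() do.call(forrest::forrest, c(list(data), args))
}

# Two placeholder data sets
datasets <- list(
  model_a = data.frame(label = c("x1", "x2"), estimate = c(0.2, -0.1),
                       lower = c(0.0, -0.3), upper = c(0.4, 0.1)),
  model_b = data.frame(label = c("x1", "x2"), estimate = c(0.3, 0.05),
                       lower = c(0.1, -0.15), upper = c(0.5, 0.25))
)

# One zero-argument function per data set
plot_fns <- lapply(datasets, make_forrest_fn,
                   estimate = "estimate", lower = "lower",
                   upper = "upper", label = "label")

# Each element can then be handed to save_forrest() (writes files):
# for (nm in names(plot_fns)) {
#   save_forrest(file = paste0(nm, ".pdf"), plot = plot_fns[[nm]],
#                width = 8, height = 5)
# }
```

Creating the closures does not call `forrest()`; the plot is only drawn when `save_forrest()` invokes the function.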