Markdown Literate Programming

In literate programming, the typical paradigm of source code is reversed: instead of a wall of code with the occasional comment, the user writes human-readable text (like this paragraph) with source code interspersed. In the R language, this is primarily done with the rmarkdown package, which takes a plain text R Markdown file (.Rmd) containing code “chunks” and executes that code when converting to a regular markdown file (.md) and then possibly some other format (.html, .pdf, etc.).

Markdown is a lightweight plain-text language used to format text. Let’s look at the original description of markdown on the website of John Gruber, the creator of the markdown standard. Using the rvest package, we can programmatically scrape Gruber’s blog, extract the HTML paragraph tags, and convert those tags to character vectors.

He continues by outlining why markdown was created, his rationale for its format, and some inspiration for its syntax.

This entire vignette was written in markdown and converted to HTML using pandoc. However, as you may have noticed, we haven’t exactly been conforming to the original goal that markdown be readable as is. We didn’t copy the text from his blog and paste it as text into this vignette. This is where the gluedown package comes in.

The gluedown package eases the transition between the incredibly powerful vector support in R and the readability of markdown. Since this vignette was written in R Markdown (.Rmd), we are able to (1) use the power of packages like rvest to collect, process, and/or analyze some kind of data and then (2) transition that result to the human-readable markdown format.

In the rest of this vignette, we will see some of the various use cases for gluedown. We will see how easy it is to transition between R vectors and readable results in markdown/HTML.

Vector Lists

Printing vectors as markdown lists was the initial inspiration for the package. In R, atomic vectors are the fundamental object type that composes more complex objects like lists and data frames. The state.name vector built into base R is a character vector of all 50 state names.

str(state.name, vec.len = 3)
#>  chr [1:50] "Alabama" "Alaska" "Arizona" ...

If we want to use those state names as text in our markdown document, we can use the cat() function and tell rmarkdown to print the results of that function “as is” (rather than as code output).

cat(state.name[1:3])

Alabama Alaska Arizona
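In knitr, which rmarkdown uses to run chunks, printing “as is” corresponds to the results = "asis" chunk option. The line below is a minimal sketch, assuming the option should apply to every chunk rather than a single one; the requireNamespace() guard is an addition so the line is harmless where knitr is not installed.

```r
# Assumption: set the option globally, e.g. in a setup chunk.
# With results = "asis", chunk output is written to the .md file as raw
# markdown instead of being wrapped in a code block.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(results = "asis")
}
```

For a single chunk, the same option can go in the chunk header, as in {r, results = "asis"}.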

That output obviously isn’t very appealing. We could tweak our use of cat() a little to separate them on new lines.

cat(state.name[1:3], sep = "\n\n")

Alabama

Alaska

Arizona

This is more readable, but with some more work, we can use cat() to print an ordered list.

cat(paste0(1:3, ". ", state.name[1:3]), sep = "\n")

1. Alabama
2. Alaska
3. Arizona

This workflow gets tiresome, although it’s made slightly simpler by the fantastic glue package from Jim Hester.

library(glue)
glue("{1:3}. {state.name[1:3]}")

1. Alabama
2. Alaska
3. Arizona

This is the technique used in this package. Vector inputs are passed to glue::glue() and the appropriate markdown syntax is implemented.
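To make the idea concrete, here is a rough sketch of how such a helper might assemble an ordered list from a vector. The function ordered_sketch() and its arguments are hypothetical and are not the package’s code; base R’s paste0() stands in for glue::glue() so the sketch runs without extra packages.

```r
# Hypothetical helper, not part of gluedown: build ordered-list lines.
ordered_sketch <- function(x, seq = TRUE, pad = TRUE, marker = c(".", ")")) {
  marker <- match.arg(marker)
  # Number items 1..n, or repeat 1 for every item when seq = FALSE
  n <- if (seq) seq_along(x) else rep(1L, length(x))
  if (pad) {
    # Left-pad with zeros to the width of the largest number
    n <- formatC(n, width = nchar(max(n)), flag = "0")
  }
  paste0(n, marker, " ", x)
}

out <- ordered_sketch(state.name[1:10], marker = ")")
cat(out, sep = "\n")
```

The vectorised paste0() call does the same job as the glue() template above: one output string per element of the input vector.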

The md_order() function simplifies the glue::glue() workflow and allows users to more easily customize the appearance of the list in markdown format.

library(gluedown)
# markdown only cares about the first number
md_order(state.name[1:3], seq = FALSE)
#> 1. Alabama
#> 1. Alaska
#> 1. Arizona
# markdown ignores padding and allows the use of parentheses
md_order(state.name[1:10], seq = TRUE, pad = TRUE, marker = ")")
#> 01) Alabama
#> 02) Alaska
#> 03) Arizona
#> 04) Arkansas
#> 05) California
#> 06) Colorado
#> 07) Connecticut
#> 08) Delaware
#> 09) Florida
#> 10) Georgia

Although, as we can see below, all these different options are rendered as the same kind of HTML <ol> fragment.

md_order(state.name[1:3], seq = FALSE)

1. Alabama
2. Alaska
3. Arizona

md_order(state.name[1:10], seq = TRUE, pad = TRUE, marker = ")")

1. Alabama
2. Alaska
3. Arizona
4. Arkansas
5. California
6. Colorado
7. Connecticut
8. Delaware
9. Florida
10. Georgia

This ordered list is a markdown container block. As described in the GitHub Flavored Markdown specification:

We can think of a document as a sequence of blocks—structural elements like paragraphs, block quotations, lists, headings, rules, and code blocks. Some blocks (like block quotes and list items) contain other blocks; others (like headings and paragraphs) contain inline content—text, links, emphasized text, images, code spans, and so on.

We can nest md_*() functions to create inline content within a container block. Let’s use some inline functions to create a new vector named inlines with five states, each formatted with a different syntax. We’ll take a look at what that vector really looks like with a simple str().

inlines <- c(
  md_bold(state.name[4]),
  md_code(state.name[5]),
  md_link(state.name[6], "https://Colorado.gov"),
  md_italic(state.name[7]),
  md_strike(state.name[8])
)

str(inlines, vec.len = 3)
#>  chr [1:5] "**Arkansas**" "`California`" "[Colorado](https://Colorado.gov)" ...
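The strings in that output suggest each inline function is essentially a wrapper that surrounds its input with the relevant markdown delimiters. A minimal sketch under that assumption, using hypothetical *_sketch() helpers written in base R (these are not the package’s own definitions):

```r
# Hypothetical stand-ins for inline wrappers; each is vectorised over x.
bold_sketch <- function(x) paste0("**", x, "**")
code_sketch <- function(x) paste0("`", x, "`")
link_sketch <- function(text, url) paste0("[", text, "](", url, ")")

out <- c(
  bold_sketch(state.name[4]),
  code_sketch(state.name[5]),
  link_sketch(state.name[6], "https://Colorado.gov")
)
print(out)
```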

Using md_bullet() we will print that vector as a bullet point list container block and each list item will be rendered as a separate inline.

md_bullet(inlines)

Arkansas
California
Colorado
Connecticut
~~Delaware~~

These functions demonstrate how gluedown can be used to transition between R vectors, simply formatted markdown text, and beautifully formatted HTML text.

Aside from container blocks and inlines, there is a third type of markdown content: leaf blocks, which cannot contain other blocks. The thematic break is an example of a leaf block.

md_rule(char = "*", n = 80)

Code blocks are another type of leaf block. The code we’ve been writing so far is contained within rmarkdown chunks, which execute the code within. By default, those code chunks are then displayed as regular code blocks in the intermediary .md file. Sometimes we might want to use code blocks to display other types of text. Perhaps we want to show the content of a function. The md_fence() function creates a new code fence from the lines created by deparse().

lines <- deparse(md_bullet)
md_fence(lines)
function (x, marker = c("*", "-", "+")) 
{
    marker <- match.arg(marker)
    glue::glue("{marker} {x}")
}

Or perhaps we want to display some code from another language that isn’t supposed to be executed:

command <- "sudo apt install r-base-dev"
md_fence(paste("$", command), char = "~", info = "bash")
#> ~~~bash
#> $ sudo apt install r-base-dev
#> ~~~
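The fenced output above can be reproduced with a few lines of base R. This is only a sketch of the idea: fence_sketch() is a hypothetical name, and the assumption that a fence is three repeated characters follows the output shown rather than the package source.

```r
# Hypothetical helper: wrap lines of text in a markdown code fence.
fence_sketch <- function(x, char = "`", info = "") {
  fence <- strrep(char, 3)          # three backticks or three tildes
  c(paste0(fence, info), x, fence)  # opening fence, body, closing fence
}

out <- fence_sketch(
  paste("$", "sudo apt install r-base-dev"),
  char = "~",
  info = "bash"
)
cat(out, sep = "\n")
```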

Pipes

The package has been designed to fit well in a traditional R workflow so users can seamlessly create content with their code and display that content with gluedown. In that spirit, all functions are designed to fit within the tidyverse ecosystem by working with pipes. Pipes allow users to pass the results of one function into the beginning of the next. By ending this “pipeline” with md_quote(), we chain together five coding steps:

1. Read the HTML text of a Wikipedia page
2. Extract the first <blockquote> tag
3. Convert that tag to a character vector
4. Remove Wikipedia’s bracketed note
5. Print that vector as a markdown block quote

library(rvest)   # read_html(), html_element(), html_text(), %>%
library(stringr) # str_remove()

read_html("https://w.wiki/A58") %>% # 1
  html_element("blockquote") %>%    # 2
  html_text(trim = TRUE) %>%        # 3
  str_remove("\\[(.*)\\]") %>%      # 4
  md_quote()                        # 5

We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.
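The same pipeline shape can be tried offline. The sketch below assumes R 4.1 or later for the native |> pipe, replaces the web request with a hard-coded string, and uses two hypothetical helpers, strip_note() and quote_sketch(), in place of str_remove() and md_quote().

```r
# Hypothetical helpers standing in for the stringr and gluedown steps.
strip_note <- function(x) sub("\\[.*\\]", "", x)  # drop a bracketed note
quote_sketch <- function(x) paste(">", x)          # block quote prefix

text <- "We the People of the United States [note 1]"

out <- text |>
  strip_note() |>   # remove the bracketed note
  trimws() |>       # trim the space left behind
  quote_sketch()    # format as a markdown block quote
print(out)
```

Each function takes the text as its first argument and returns a character vector, which is what lets the steps chain without intermediate variables.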

Extensions

The package primarily uses GitHub Flavored Markdown (GFM), a site-specific version of the CommonMark specification, which is itself an unambiguous implementation of John Gruber’s original Markdown. With this flavor, some useful extensions like task lists are supported on GitHub. Elsewhere, as in this HTML vignette, a task list will just render as a bullet list. You can learn more about how GFM is implemented in this package’s other vignette.

legislation <- c("House passes", "Senate concurs", "President signs")
md_task(legislation, check = 1:2)

House passes
Senate concurs
President signs
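For readers who want to see the raw syntax, GFM writes a task list item as a list marker followed by [x] or [ ]. A minimal base R sketch, using a hypothetical task_sketch() helper and made-up items rather than the package’s implementation:

```r
# Hypothetical helper: mark the items whose positions appear in `check`.
task_sketch <- function(x, check = integer(0)) {
  box <- ifelse(seq_along(x) %in% check, "[x]", "[ ]")
  paste("*", box, x)
}

out <- task_sketch(c("Draft", "Review", "Publish"), check = 1:2)
cat(out, sep = "\n")
```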

Markdown tables are another extremely useful extension. The md_table() function wraps the much more powerful knitr::kable() function, which allows data frames to be printed in a number of alternative formats. Printing data frames is a very typical use case when documenting the process of data science. With small summary tables like the one below, a markdown table is much more readable than the plain text tibble or data frame printed by default.

print(head(state.x77))
#>            Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
#> Alabama          3615   3624        2.1    69.05   15.1    41.3    20  50708
#> Alaska            365   6315        1.5    69.31   11.3    66.7   152 566432
#> Arizona          2212   4530        1.8    70.55    7.8    58.1    15 113417
#> Arkansas         2110   3378        1.9    70.66   10.1    39.9    65  51945
#> California      21198   5114        1.1    71.71   10.3    62.6    20 156361
#> Colorado         2541   4884        0.7    72.06    6.8    63.9   166 103766

md_table(head(state.x77), digits = 2)

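|            | Population | Income | Illiteracy | Life Exp | Murder | HS Grad | Frost |   Area |
|:-----------|-----------:|-------:|-----------:|---------:|-------:|--------:|------:|-------:|
| Alabama    |       3615 |   3624 |        2.1 |    69.05 |   15.1 |    41.3 |    20 |  50708 |
| Alaska     |        365 |   6315 |        1.5 |    69.31 |   11.3 |    66.7 |   152 | 566432 |
| Arizona    |       2212 |   4530 |        1.8 |    70.55 |    7.8 |    58.1 |    15 | 113417 |
| Arkansas   |       2110 |   3378 |        1.9 |    70.66 |   10.1 |    39.9 |    65 |  51945 |
| California |      21198 |   5114 |        1.1 |    71.71 |   10.3 |    62.6 |    20 | 156361 |
| Colorado   |       2541 |   4884 |        0.7 |    72.06 |    6.8 |    63.9 |   166 | 103766 |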
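The pipe-table syntax itself is simple enough to assemble by hand, which may help clarify what a table wrapper has to produce. The sketch below is a bare-bones illustration in base R with a hypothetical table_sketch() helper and a made-up two-row data frame; it skips the alignment, rounding, and row-name handling that knitr::kable() provides.

```r
# Hypothetical helper: turn a data frame into pipe-table lines.
table_sketch <- function(df) {
  df <- as.data.frame(df)
  header <- paste0("| ", paste(names(df), collapse = " | "), " |")
  rule <- paste0("|", paste(rep("---", ncol(df)), collapse = "|"), "|")
  # Convert every column to character, then paste the columns row by row
  cells <- unname(lapply(df, as.character))
  rows <- paste0("| ", do.call(paste, c(cells, sep = " | ")), " |")
  c(header, rule, rows)
}

df <- data.frame(state = c("Alabama", "Alaska"), income = c(3624, 6315))
out <- table_sketch(df)
cat(out, sep = "\n")
```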
