--- title: "ring" author: "Rich FitzJohn" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 2 vignette: > %\VignetteIndexEntry{ring} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ``` {r echo = FALSE, results = "hide"} knitr::opts_chunk$set( error = FALSE, fig.width = 7, fig.height = 5) set.seed(1) ``` This package implements [ring buffers](https://en.wikipedia.org/wiki/Ring_buffer). A ring buffer can be used as a first-in-first-out (FIFO) buffer where the maximum size is known ahead of time. Because they do not grow in size, they are useful to avoid using more and more memory as a process runs. Because the data reading and writing happens in an (apparently) circular way, once data is added to the buffer it is not copied (in contrast if you used a vector then every time data is consumed you'd have to shuffle the vector around). `ring` implements two different ring buffers that will likely suit different applications. * `ring_buffer_bytes` is implemented as a byte array in C, possibly with a "stride" indicating a set of bytes that go together. Once the data reaches the end of the array we start writing to the beginning again. * `ring_buffer_env` is implemented as a doubly linked list using R's environments. This buffer can hold arbitrary R objects at each position. The target audience for this package is either other package developers who need a ring buffer in their package, or modellers who have decided that a ring buffer is the right data structure to help with their simulation model. For all buffers, `head` will refer to the next bit of the buffer to be written to, and `tail` will refer to the next bit of the buffer to be read. That is, elements are pushed onto the `head` of the buffer and retrieved from the `tail`. (There is no direct analogy between these terms and the R functions `head` and `tail` which operate on fixed-size vectors.) # The environment buffer `ring_buffer_env` This is the simplest buffer to understand because we don't have to deal with raw vectors. To create a buffer that can hold up to 100 elements, use the `ring_buffer_env` function: ``` {r } buf <- ring::ring_buffer_env(100) ``` This is an [R6](https://github.com/r-lib/R6) class, with several methods: ``` {r } buf ``` Operations on the class happen by running methods using `$`. So the size of the buffer: ``` {r } buf$size() ``` ...the number of elements free and used: ``` {r } buf$free() buf$used() ``` ...whether the buffer is empty or full: ``` {r } buf$is_empty() buf$is_full() ``` To start using the buffer we need to put some data in it. There are two main functions for adding data: * `buf$set(data, n)` sets `n` elements to be the value `data` * `buf$push(data, iterate)` pushes `data` into the buffer, with the `iterate` argument indicating if we should iterate over `data` or treat it as a single element So to set the first 5 elements to be "a", "b", ..., "e", use: ``` {r } buf$push(letters[1:5]) ``` The buffer is no longer empty ``` {r } buf$is_empty() ``` ...having 5 elements: ``` {r } buf$used() ``` ...and room for 95 more: ``` {r } buf$free() ``` To read the content of the buffer without modifying it, use `read(n)` where `n` is the number of elements to read. This *always* returns a list of length `n`: ``` {r } buf$read(1) buf$read(2) ``` If you try to read too far, then the buffer will underflow and you will get an error: ``` {r error = TRUE} buf$read(20) ``` If you just want the the first element, use `tail()` ``` {r } buf$tail() ``` The tail returns the first element in (so the buffer naturally operates as a first-in-first-out queue). You can also read the most recently added element with `head()` ``` {r } buf$head() ``` And you can offset these by an integer number of steps. So moving one position into the buffer from the tail gets the second element added: ``` {r } buf$tail_offset(1) ``` or moving three elements into the buffer from the head (most recently added element) gets the same bit of data ``` {r } buf$head_offset(3) ``` The above operations are all nondestructive -- they leave the buffer unchanged. To consume elements, use `take(n)` which operates the same way as `read` but it also moves the buffer `tail`; it consumes elements leaving space for more. ``` {r } buf$free() buf$take(1) buf$free() ``` Now we have consumed an element the tail has moved along, so `tail` contains "b" and "a" is removed from the buffer: ``` {r } buf$tail() ``` To reset the buffer, use `reset()`. This empties the buffer of all data: ``` {r } buf$reset() buf$used() buf$is_empty() ``` While the ring buffer is fixed in size in typical use, you can grow it explicitly. To add additional space, use the `grow` method: ``` {r } buf$size() buf$grow(20) buf$size() ``` ## Application: simulation with recent history The whole point of the ring buffer though is that we can push things onto it and pull the most recent out, even when the number of things pushed *overall* is larger than the buffer size. So imagine a simulation where we need to keep track of the last 5 steps. The simulation is a random walk. ``` {r } step <- function(x) { if (runif(1) < 0.5) x - 1L else x + 1L } x <- 0L buf <- ring::ring_buffer_env(5) h <- integer(20) buf$push(x) h[1L] <- x set.seed(1) for (i in seq_len(length(h) - 1L)) { x <- step(x) buf$push(x) h[i + 1L] <- x } ``` The whole history: ``` {r } h ``` The last 5 steps: ``` {r } unlist(buf$read(5)) ``` So we could rewrite the simulation so that the random walk tends up if the last few steps have been increases and tends down if the last few steps have been decreases: ``` {r } step <- function(x) { if (length(x) > 1) { p <- mean(diff(x)) / 2 + 0.5 } else { p <- 0.5 } if (runif(1) < p) x[length(x)] - 1L else x[length(x)] + 1L } x <- 0L buf <- ring::ring_buffer_env(5) h <- integer(100) buf$push(x) h[1L] <- x set.seed(1) for (i in seq_len(length(h) - 1L)) { x <- step(unlist(buf$read(buf$used()))) buf$push(x) h[i + 1L] <- x } ``` Now we have a simulation with a strong mean reverting tendency: ``` {r } par(mar = c(4, 4, .5, .5)) plot(h, type = "l", xlab = "step", ylab = "y", las = 1) ``` Because the buffer always holds the last 5 (or fewer) elements the book-keeping involved in working with the last few elements out is simplified. Ignoring the fact that we hold the entire history in the fixed size vector `h`, only the last few elements need to be retained which may be useful if the simulation generates a lot of data. A downside of this implementation is that `buf$read()` returns a list that must be turned into a vector with `unlist`, even though we know in this case that the simulation will always produce an integer vector. The ring buffers described below can help with that problem. # The bytes buffer `ring_buffer_bytes` This is the classical implementation of a ring buffer, and the implementation is broadly based on the one [here](https://github.com/dhess/c-ringbuf), by [\@dhess](https://github.com/dhess). This operates basically the same way as `ring_buffer_env`, and presents a very similar interface to R, but with a few key differences: * The contents of the buffer are raw bytes (using R's raw vectors). These are a bit fiddly to work with but can be very powerful. * The `iterate` distinction of `push` disappears because there is no ambiguity with R objects To construct a buffer of 1000 bytes: ``` {r } buf <- ring::ring_buffer_bytes(1000) ``` Most of the same methods apply directly: ``` {r } buf$free() buf$used() buf$is_full() buf$is_empty() ``` Generate a byte sequence: ``` {r } bytes <- as.raw(0:255) ``` ...and push them into the buffer: ``` {r } buf$push(bytes) ``` ...read from the buffer ``` {r } buf$read(10) ``` ...destructively take the oldest elements ``` {r } buf$used() buf$take(20) buf$used() ``` ## Striding Single bytes can hold only values 0 to 255 (or character equivalents, such as `a` becomes `r charToRaw("a")` via `charToRaw("a")`. But if you want to hold a full integer, that (usually) takes 4 bytes, a double (usually) takes 8. To allow this, a bytes buffer can be "strided"; this indicates the number of consecutive bytes that should together make up one logical entry. The buffer then contains `size` of these. So to create a buffer of 100 entries, each of `8` bytes you could do: ``` {r } buf <- ring::ring_buffer_bytes(100, 8) ``` Each element pushed onto the buffer must have the correct size. So to push the byte sequence 1..8 onto the buffer: ``` {r } buf$push(as.raw(1:8)) ``` but if you pushed more or less it would be an error: ``` {r error = TRUE} buf$push(as.raw(1:4)) ``` Reading happens in *logical* units, not bytes: ``` {r } buf$read(1) ``` and you can get the number of elements used: ``` {r } buf$used() ``` or the number of bytes ``` {r } buf$used(bytes = TRUE) ``` ## The typed bytes buffer `ring_buffer_bytes_typed` If 8 bytes is a double, it should be possible to make a bytes buffer that holds one (or more) doubles per entry. That is what the `ring_buffer_bytes_typed` buffer does, with a few corner cases dealt with. To use, you decide what the *R interpretation* of an entry is, it will determine the size per entry and appropriate encoding and decoding functions and you can ignore that it is storing bytes. For performance reasons this does not use R's serialisation and simply copies the data stored in vectors. For example, to make a buffer of 10 elements, each of which is a single real number (double), use: ``` {r } buf <- ring::ring_buffer_bytes_typed(10, double(1)) ``` onto which real numbers can be pushed: ``` {r } buf$push(pi) ``` And retrieve the data. ``` {r } buf$take(1) ``` Entries can contain more than one number; to make a buffer of length 10, each element of which is a vector of 5 doubles: ``` {r } buf <- ring::ring_buffer_bytes_typed(10, double(5)) buf$push(rnorm(5)) buf$read(1) ``` Because this is just implemented as a byte array, we can just push a bunch of numbers straight into the buffer: ``` {r } buf$push(rnorm(5 * 10)) ``` With elements in the buffer, we can request them. The integer argument of `take` indicates the number of groups of 5 doubles we would like back: ``` {r } buf$take(1) ``` If you try to take more than is in the buffer it is an error: ``` {r error = TRUE} buf$take(10) ``` ## The translating bytes buffer `ring_buffer_bytes_translate` The `ring_buffer_bytes_typed` function is implemented by translating R objects to bytes (when storing with `$set()`, `$push()`, etc). and from bytes back to R objects (when retrieving with `$read()`, `$take()`, etc). `ring_buffer_bytes_translate` exposes this interface. The "typed" buffers do not allow storing strings because they can be any number of bytes long (the bytes buffers require a fixed "stride" within a buffer). But we can store fixed length strings. To convert a string to a byte sequence, use `charToRaw` (or `as.raw(utf8ToInt(x))`, but then multi-byte sequences might start being difficult). ``` {r } (bytes <- charToRaw("hello world")) ``` The inverse transformation is `rawToChar` (or `intToUtf8(as.integer(x))`): ``` {r } rawToChar(bytes) ``` The function `ring_buffer_bytes_translate` takes these functions as its 3rd and fourth arguments. So to make a buffer that will hold up to 100 strings, each of 8 bytes: ``` {r } b <- ring::ring_buffer_bytes_translate(100, 8, charToRaw, rawToChar) ``` We can now store 8 character strings: ``` {r } b$push("abcdefgh") b$tail() ``` But other length strings cannot be added: ``` {r error = TRUE} b$push("hello!") ``` Probably this would be most useful storing just single characters as then it would make a buffer of *text*. # The C API The `ring` package can be used in other R packages using the `LinkingTo` mechanism. To do so: * In your `DESCRIPTION`, add a line `LinkingTo: ring` (you do not need to include `ring` in `Depends` or `Imports` as we need it only for the package build). * In your `src/` directory, add a file `ring.c` containing just the line `#include ` (but see the note in the documentation for `ring_buffer_create` below). * Anywhere in your code you want to use the ring buffer, include the line `#include ` to include the prototypes and use the interface as described below. (I am not sure what the best practice way of doing this with a standalone shared library compiled with `R CMD SHLIB` is though; probably best to make a package.) The C API is documented only in the header file, and it should be fairly straightforward to use (with reference to the docs above; this is the code underlying the `ring_buffer_bytes` interface). ``` {r echo = FALSE, results = "asis"} writeLines(c("```c", readLines(system.file("include/ring/ring.h", package = "ring")), "```")) ``` For a complete real-world example of use, see [dde](https://github.com/mrc-ide/dde), which uses a ring buffer to hold the history of a set of differential equations, and uses that to implement delay equations. Here, the ring buffer means that the memory requirements don't grow with the length of running the simulation (as it only cares about fairly recent history, the natural overflow from the ring buffer is well suited). The memory is only allocated at the beginning of the simulation so there is no additional memory allocations. And because `ring` returns (const) pointers to the appropriate place in memory there is little copying. A simple application that implements the same mean-reverting simulation from above: ``` {r echo = FALSE, results = "asis"} writeLines(c("```c", readLines(system.file("examples/example.c", package = "ring")), "```")) ``` ## A nontrivial example In the [`dde`](https://github.com/mrc-ide/dde) package (not yet on CRAN), I use ring buffers to solve delay differential equations (DDEs). To solve these, we need to know the state of the system at a series of points in the past. So at every time step we push the state of the system onto a ring buffer. Then, as the solver moves forward in time we can get the system at some previous point in time by looking back through the ring buffer until the time in question is found. In this application a ring buffer is the ideal data structure because we often want to solve equations where the time we look back is a small fraction of the total time. Without a ring buffer we'd either have to store the _entire_ history (with a large memory cost, most of which is not needed) or periodically copy the history around. To use `ring` within the `dde` package: * In the `DESCRIPTION` we [declare a link to `ring`](https://github.com/mrc-ide/dde/blob/7ebaefd/DESCRIPTION#L14) using the `LinkingTo:` field. * In the `src` directory, [the contents of `` are included](https://github.com/mrc-ide/dde/blob/7ebaefd/src/ring.c); this is possible because of the `LinkingTo` field. This file now includes all the actual ring buffer implementation. * In [src/dopri.h](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri.h#L8) we include `` which allows the ring buffer code to be used in any file that includes `dopri.h`. There is a [data structure in this header](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri.h#L77-L111) that includes within itself a ring buffer to hold the history. * In [src/dopri.c](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri.c) the ring buffer code is actually used: - [initialisation](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri.c#L48-L50) - [a time is found within the ring buffer](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri.c#L694-L719) - [the ring buffer is advanced](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri.c#L417) - [the data is freed](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri.c#L220) * In [src/dopri_5.c](https://github.com/mrc-ide/dde/blob/7ebaefd/src/dopri_5.c#L109-L119) new data is written to the head of the ring buffer, being the state of the system at the end of the step. The history head is treated as a big block of contiguous doubles. Used this way, the programmer can focus on simply writing to the application and do as little work on bookkeeping as possible. # The C++ API If you're using C++ you may find the [Boost circular buffer](https://www.boost.org/doc/libs/1_61_0/doc/html/circular_buffer.html) is likely to be far better; you can use this by `LinkingTo:` the `BH` package and using `#include ` in your code. Alternatively, the `ring` C code can be directly used in C++ as above. Or, there is a class-based approach available: * In your `src/` directory, add a file `ring.cpp` containing just the line `#include ` * Anywhere in your code you want to use the ring buffer, include the line `#include ` to include the class definition: ``` {r echo = FALSE, results = "asis"} writeLines(c("```cpp", readLines(system.file("include/ring/ring.hpp", package = "ring")), "```")) ```