The *fastverse* is a suite of complementary high-performance
packages for statistical computing and data manipulation in R. Developed
independently by various people, *fastverse* packages jointly
contribute to the objectives of:

- Speeding up R through heavy use of compiled code (C, C++, Fortran)
- Enabling more complex statistical and data manipulation operations in R
- Reducing the number of dependencies required for advanced computing in R

The `fastverse` package is a meta-package providing
utilities for easy installation, loading and management of these
packages. It is an extensible framework that allows users to
(permanently) add or remove packages to create a ‘verse’ of packages
suiting their general needs, or even create separate ‘verses’ of their
own.

*fastverse* packages are jointly attached with `library(fastverse)`,
and several functions starting with `fastverse_` help manage
dependencies, detect namespace conflicts, add/remove packages from the
*fastverse* and update packages. The **vignette** provides a concise
overview of the package.
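As a minimal sketch of this workflow, assuming the package and its core dependencies are already installed, a session might start as follows. Only a few of the `fastverse_` helpers are shown, and the calls that need internet access are left commented out.

```r
# Attach all core fastverse packages with a single call
library(fastverse)

# Character vector of the packages that currently make up the fastverse
pkgs <- fastverse_packages()
print(pkgs)

# Report functions that are masked between attached packages
fastverse_conflicts()

# The following helpers query package repositories, so they are not run here:
# fastverse_deps()     # dependencies of fastverse packages
# fastverse_update()   # check for and install newer versions
```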

The *fastverse* installs with 4 core packages^{1} (5
dependencies in total) which provide broad C/C++ based statistical and
data manipulation functionality and have carefully managed APIs.

- **data.table**: Enhanced data frame class with concise data manipulation framework offering powerful aggregation, flexible split-apply-combine computing, reshaping, (rolling) joins, rolling statistics, set operations on tables, fast csv read/write, and various utilities such as transposition of data.
- **collapse**: Fast grouped and weighted statistical computations, time series and panel data transformations, list-processing, data manipulation functions, summary statistics and various utilities such as support for variable labels. Class-agnostic framework designed to work with vectors, matrices, data frames, lists and related classes including *xts*, *data.table*, *tibble*, *plm*, *sf*.
- **kit**: Parallel (row-wise) statistical functions, vectorized and nested switches, and some utilities such as efficient partial sorting.
- **magrittr**: Efficient pipe operators and aliases for enhanced R programming and code un-nesting.

```
# Install the CRAN version
install.packages("fastverse")
# Install (Windows/Mac binaries) from R-universe
install.packages("fastverse", repos = "https://fastverse.r-universe.dev")
# Install from GitHub (requires compilation)
remotes::install_github("fastverse/fastverse")
```

*Note* that the GitHub/r-universe version is not a development
version; development takes place in the ‘development’ branch.

Users can, via the `fastverse_extend()` function, freely
attach extension packages. Setting `permanent = TRUE` adds
these packages to the core *fastverse*. Another option is adding
a `.fastverse` config file with packages to the project
directory. Separate verses can be created with `fastverse_child()`.
See the **vignette** for details.
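A short sketch of extending the *fastverse*, assuming *xts* and *roll* are installed. The choice of packages is illustrative, and the verse name `tsverse` in the commented-out call is hypothetical.

```r
library(fastverse)

# Attach two extension packages for the current session only
fastverse_extend(xts, roll)

# The extensions are now reported as part of the fastverse
ext_pkgs <- fastverse_packages()
print(ext_pkgs)

# Make the extension persist across sessions (not run here):
# fastverse_extend(xts, roll, permanent = TRUE)

# Create a separate verse as its own meta-package (not run; the name
# "tsverse" is a made-up example):
# fastverse_child(name = "tsverse", xts, roll)
```

Without `permanent = TRUE` the extension only lasts for the session, so a project can experiment with additional packages without changing the core set.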

High-performing packages for different data manipulation and statistical computing topics are suggested below. The total (recursive) dependency count is indicated for each package.

- **xts** and **zoo**: Fast and reliable matrix-based time series classes providing fully identified ordered observations and various utilities for plotting and computations (1 dependency).
- **roll**: Fast rolling and expanding window functions for vectors and matrices (3 dependencies).

*Notes*: *xts*/*zoo* objects are preserved by *roll* functions and by *collapse*’s time series and data transformation functions^{2}. As *xts*/*zoo* objects are matrices, all *matrixStats* functions apply to them as well. *xts* objects can also easily be converted to and from *data.table*, which also has some fast rolling functions like `frollmean` and `frollapply`.
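To illustrate the *data.table* rolling functions mentioned in the notes, here is a small sketch on a toy vector. The window length of 3 is an arbitrary choice; both functions use a right-aligned window by default and fill incomplete windows with `NA`.

```r
library(data.table)

x <- c(1, 2, 3, 4, 5)

# Rolling mean over a window of 3 observations
roll_mean <- frollmean(x, 3)
print(roll_mean)

# Rolling application of an arbitrary function (here: the window maximum)
roll_max <- frollapply(x, 3, max)
print(roll_max)
```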

- **lubridate**: Facilitates ‘POSIX-’ and ‘Date’ based computations (2 dependencies).
- **anytime**: Anything to ‘POSIXct’ or ‘Date’ converter (2 dependencies).
- **fasttime**: Fast parsing of strings to ‘POSIXct’ (0 dependencies).
- **nanotime**: Provides a coherent set of temporal types and functions with nanosecond precision, based on the ‘integer64’ class (7 dependencies).
- **clock**: Comprehensive library for date-time manipulations using a new family of orthogonal date-time classes (durations, time points, zoned-times, and calendars) (6 dependencies).
- **timechange**: Efficient manipulation of date-times accounting for time zones and daylight saving times (1 dependency).

*Notes*: Date and time variables are preserved in many *data.table* and *collapse* operations. *data.table* additionally offers an efficient integer based date class ‘IDate’ with some supporting functionality. *xts* and *zoo* also provide various functions to transform dates, and *zoo* provides classes ‘yearmon’ and ‘yearqtr’ for convenient computation with monthly and quarterly data. Package *mondate* also provides a class ‘mondate’ for monthly data.

- **stringi**: Main R package for fast, correct, consistent, and convenient string/text manipulation (backend to *stringr* and *snakecase*) (0 dependencies).
- **stringr**: Simple, consistent wrappers for common string operations, based on *stringi* (3 dependencies).
- **snakecase**: Convert strings into any case, based on *stringi* and *stringr* (4 dependencies).
- **stringfish**: Fast computation of common (base R) string operations using the ALTREP system (2 dependencies).
- **stringdist**: Fast computation of string distance metrics, matrices, and fuzzy matching (0 dependencies).

- **matrixStats**: Efficient row- and column-wise (weighted) statistics on matrices and vectors, including computations on subsets of rows and columns (0 dependencies).
- **Rfast** and **Rfast2**: Heterogeneous sets of fast functions for statistics, estimation and data manipulation operating on vectors and matrices. Missing values and object attributes are not (consistently) supported (4-5 dependencies).
- **vctrs** provides many basic programming functions for R vectors (including lists and data frames) implemented in C (such as sorting, matching, replicating, unique values, concatenating, splitting etc. of vectors). These are often significantly faster than base R equivalents, but generally not as aggressively optimized as equivalent functions found in *collapse*, *kit*, *Rfast* or *data.table* (4 dependencies).
- **parallelDist**: Multi-threaded distance matrix computation (3 dependencies).
- **coop**: Fast implementations of the covariance, correlation, and cosine similarity (0 dependencies).
- **rsparse**: Implements many algorithms for statistical learning on sparse matrices: matrix factorizations, matrix completion, elastic net regressions, factorization machines (8 dependencies). See also package **MatrixExtra**.
- **fastmatrix** provides a small set of functions written in C or Fortran providing fast computation of some matrices and operations useful in statistics (0 dependencies).
- **matrixTests** provides efficient execution of multiple statistical hypothesis tests on rows and columns of matrices (1 dependency).
- **rrapply**: The `rrapply()` function extends base `rapply()` by including a condition or predicate function for the application of functions and diverse options to prune or aggregate the result (0 dependencies).
- **dqrng**: Fast uniform, normal or exponential random numbers and random sampling (i.e. faster `runif`, `rnorm`, `rexp`, `sample` and `sample.int` functions) (3 dependencies).
- **fastmap**: Fast implementation of data structures based on C++, including a key-value store (`fastmap`), stack (`faststack`), and queue (`fastqueue`) (0 dependencies).
- **fastmatch**: A faster `match()` function (drop-in replacement for `base::match` and `base::%in%`) that keeps the hash table in memory for much faster repeated lookups (0 dependencies).
- **hutilscpp** provides C++ implementations of some frequently used utility functions in R (4 dependencies).

*Notes*: *Rfast* has a number of functions named like those in *matrixStats*. These are simpler but typically faster and support multi-threading. Some highly efficient statistical functions can also be found scattered across various other packages; notable among them are *Hmisc* (60 dependencies) and *DescTools* (17 dependencies).
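As a small illustration of the drop-in matching functions in *fastmatch*, the sketch below assumes the package is installed and uses a made-up lookup table. The benefit of the cached hash table only shows up when the same table is searched repeatedly.

```r
library(fastmatch)

# Lookup table that will be searched repeatedly (toy data)
keys <- c("a", "b", "c")

# fmatch() has the same interface as base::match(); the hash table
# built on the first call is kept with `keys` and reused afterwards
pos <- fmatch(c("c", "a", "z"), keys)
print(pos)

# %fin% is the counterpart of base::`%in%`
found <- c("c", "z") %fin% keys
print(found)
```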

- **sf**: Leading framework for geospatial computing and manipulation in R, offering a simple and flexible spatial data frame and supporting functionality (12 dependencies).
- **geos**: Provides an R API to the Open Source Geometry Engine (GEOS) C-library and a vector format with which to efficiently store ‘GEOS’ geometries, functions to extract information from, calculate relationships between, and transform geometries, and facilities to import/export geometry vectors to other spatial formats (2 dependencies).
- **stars**: Spatiotemporal data (raster and vector) in the form of dense arrays, with space and time being array dimensions (16 dependencies).
- **terra**: Methods for spatial data analysis with raster and vector data. Processing of very large (out of memory) files is supported (1 dependency).

*Notes*: *collapse* can be used for efficient manipulation and computations on *sf* data frames. *sf* also offers tight integration with *dplyr*.

- **dygraphs**: Interface to ‘Dygraphs’ interactive time series charting library (12 dependencies).
- **lattice**: Trellis graphics for R (0 dependencies).
- **grid**: The grid graphics package (0 dependencies).
- **ggplot2**: Create elegant data visualizations using the Grammar of Graphics (27 dependencies).
- **scales**: Scale functions for visualizations (11 dependencies).

*Notes*: *latticeExtra* provides extra graphical utilities based on *lattice*. *gridExtra* provides miscellaneous functions for *grid* graphics (and consequently for *ggplot2*, which is based on *grid*). *gridtext* provides improved text rendering support for *grid* graphics. Many packages offer *ggplot2* extensions (typically starting with ‘gg’), such as *ggExtra*, *ggalt*, *ggforce*, *ggmap*, *ggtext*, *ggthemes*, *ggrepel*, *ggridges*, *ggfortify*, *ggstatsplot*, *ggeffects*, *ggsignif*, *GGally*, *ggcorrplot*, *ggdendro*, etc. Users in desperate need of greater performance may also find the (unmaintained) lwplot package useful, which provides a faster and lighter version of *ggplot2* with a *data.table* backend.

- **tidytable**: A tidy interface to *data.table* that is *rlang* compatible. Quite comprehensive implementation of *dplyr*, *tidyr* and *purrr* functions. The package uses a class *tidytable* that inherits from *data.table*. The `dt()` function makes *data.table* syntax pipeable (12 total dependencies).
- **dtplyr**: A tidy interface to *data.table* built around lazy evaluation, i.e. users need to call `as.data.table()`, `as.data.frame()` or `as_tibble()` to access the results. Lazy evaluation holds the potential of generating more performant *data.table* code (20 dependencies).
- **tidyfst**: Tidy verbs for fast data manipulation. Covers *dplyr* and some *tidyr* functionality. Functions have a `_dt` suffix and preserve the *data.table* object. A cheatsheet is provided (7 dependencies).
- **tidyft**: Tidy verbs for fast data operations by reference. Best for big data manipulation on out of memory data using facilities provided by *fst* (7 dependencies).
- **tidyfast**: Fast tidying of data. Covers *tidyr* functionality, uses a `dt_` prefix, and preserves the *data.table* object (2 dependencies).
- **maditr**: Fast data aggregation, modification, and filtering with pipes and *data.table*. Minimal implementation with functions `let()` and `take()` for most common data manipulation tasks. Also provides Excel-like lookup functions (2 dependencies).
- **table.express** also builds *data.table* expressions from *dplyr* verbs, without executing them eagerly. Similar to *dtplyr* but less mature (17 dependencies).
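The lazy evaluation described for *dtplyr* can be sketched as follows, assuming *data.table*, *dtplyr* and *dplyr* are installed. The table and column names are made up for illustration.

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Wrap a data.table in a lazy object; nothing is computed yet
lazy <- lazy_dt(data.table(g = c("a", "a", "b"), x = c(1, 2, 3)))

# dplyr verbs only record the steps to perform
query <- lazy %>%
  group_by(g) %>%
  summarise(mean_x = mean(x))

# Inspect the data.table code that dtplyr generated
show_query(query)

# Calling as.data.table() triggers the actual computation
res <- as.data.table(query)
print(res)
```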

- **fst**: A compressed data file format that is very fast to read and write. Full random access in both rows and columns allows reading subsets from a ‘.fst’ file (2 dependencies).
- **qs** provides a lightning-fast and complete replacement for the `saveRDS` and `readRDS` functions in R. It supports general R objects with attributes and references, at similar speeds to *fst*, but does not provide on-disk random access to data subsets like *fst* (4 dependencies).
- **arrow** provides a low-level interface to the Apache Arrow C++ library (a multi-language toolbox for accelerated data interchange and in-memory processing), including fast reading / writing of delimited files, efficient storage of data as `.parquet` or `.feather` files, efficient (lazy) queries and computations, and sharing data between R and Python (14 dependencies). It provides methods for several *dplyr* functions, allowing highly efficient data manipulation on arrow datasets. Check out the useR2022 workshop on working with larger than memory data with Apache Arrow in R, and the Apache Arrow R cookbook as well as the awesome-arrow-r repository.
- **duckdb**: DuckDB is a high-performance analytical database system that can be used on in-memory or out-of-memory data (including csv, `.parquet` files, arrow datasets, and its own `.duckdb` format), and that provides a rich SQL dialect and optimized query execution for data analysis (1 dependency). It can also be used with the *dbplyr* package that translates *dplyr* code to SQL. This article by Christophe Nicault (October 2022) demonstrates the integration of *duckdb* with R and *arrow*. Also see the official docs.
- **vroom** provides fast reading of delimited files (23 dependencies).

*Notes*: *data.table* provides `fread` and `fwrite` for fast reading and writing of delimited files.
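A minimal round trip with the *data.table* functions from the notes might look like this. The data are made up, and the file is written to a temporary path so nothing in the project directory is touched.

```r
library(data.table)

dt <- data.table(id = 1:3, value = c(0.5, 1.5, 2.5))

# Write the table to a temporary csv file
path <- tempfile(fileext = ".csv")
fwrite(dt, path)

# Read it back; column types are detected automatically
dt2 <- fread(path)
print(dt2)

unlink(path)
```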

- **nCompiler**: Compiles R functions to C++, and covers basic math, distributions, vectorized math and linear algebra, as well as basic control flow. R and compiled C++ functions can also be jointly utilized in a class ‘nClass’ that inherits from R6. An in-progress user manual provides an overview of the package.
- **ast2ast**: Also compiles R functions to C++, and is very straightforward to use (it has a single function `translate()` to compile R functions), but less flexible than nCompiler (e.g. it currently does not support linear algebra). Available on CRAN (6 dependencies).
- **odin**: Implements R to C translation and compilation, but specialized for differential equation solving problems. Available on CRAN (8 dependencies).
- **armacmp** translates linear algebra code written in R to C++ using the Armadillo Template Library. The package can also be used to write mathematical optimization routines that are translated and optimized in C++ using *RcppEnsmallen*.
- **r2c** provides compilation of R functions to be applied over many groups (e.g. grouped bivariate linear regression etc.).
- **FastR** is a high-performance implementation of the entire R programming language that can JIT compile R code to run on the Graal VM.
- **inline** allows users to write C, C++ or Fortran functions and compile them directly to an R function for use within the R session. Available on CRAN (0 dependencies).

*Notes*: Many of these projects are experimental and not available as CRAN packages.

- **R’s C API** is the most natural way to extend R and does not require additional packages. It is further documented in the Writing R Extensions Manual, the R Internals Manual, the **r-internals** repository and sometimes referred to in the R Blog (and some other blogs on the web). Users willing to extend R in this way should familiarize themselves with R’s garbage collection and PROTECT errors.
- **Rcpp** provides seamless R and C++ integration, and is widely used to extend R with C++. Compared to the C API, compile time is slower and object files are larger, but users don’t need to worry about garbage collection and can use modern C++ as well as a rich set of R-flavored functions and classes (0 dependencies).
- **cpp11** provides a simpler, header-only R binding to C++ that allows faster compile times and several other enhancements (0 dependencies).
- **tidyCpp** provides a tidy C++ wrapping of the C API of R, to make the C API more amenable to C++ programmers (0 dependencies).
- **JuliaCall** provides an R interface to the Julia programming language (11 dependencies). Other interfaces are provided by XRJulia (2 dependencies) and JuliaConnectoR (0 dependencies).
- **rextendr** provides an R interface to the Rust programming language (29 dependencies).
- **rJava** provides an R interface to Java (11 dependencies).

*Notes*: There are many Rcpp extension packages binding R to powerful C++ libraries, such as linear algebra through *RcppArmadillo* and *RcppEigen*, thread-safe parallelism through *RcppParallel* etc.
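As a rough sketch of what extending R through *Rcpp* looks like, the example below compiles a small C++ function from within an R session. It assumes *Rcpp* and a working C++ toolchain are available; the function `sum_sq` is a made-up example and not part of any package.

```r
library(Rcpp)

# Compile a C++ function and expose it to R under the same name.
# No manual PROTECT calls are needed: Rcpp manages the R objects.
cppFunction("
double sum_sq(NumericVector x) {
  double total = 0;
  for (int i = 0; i < x.size(); ++i) {
    total += x[i] * x[i];
  }
  return total;
}
")

result <- sum_sq(c(1, 2, 3))
print(result)
```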

- See the High-Performance and Parallel Computing Task View and the futureverse.

Please notify me of any other packages you think should be included
here. Such packages should be well designed, top-performing,
low-dependency, and, with few exceptions, provide their own compiled code.
Please note that the *fastverse* focuses on general purpose
statistical computing and data manipulation, thus I won’t include fast
packages to estimate specific kinds of models here (of which R also has
a great many).