---
title: "ympes"
output:
  litedown::html_format:
    options:
      embed_resources: ["all"]
vignette: >
  %\VignetteEngine{litedown::vignette}
  %\VignetteIndexEntry{ympes}
  %\VignetteEncoding{UTF-8}
---
ympes provides a collection of lightweight helper functions (imps) both for
interactive use and for inclusion within other packages. It's my attempt to save
some functionality that would otherwise get lost in a script somewhere on my
computer. To that end it's a bit of a hodgepodge of things that I've
found useful at one time or another and, more importantly, remembered to include
here!
```{r}
library(ympes)
```
## Visualising palettes
I often want to quickly see what a palette looks like to ensure I can
distinguish the different colours. The imaginatively named `plot_palette()`
thus provides a quick overview:
```{r}
#| fig.alt = "A plot with 3 rectangular regions, coloured green, red and black."
plot_palette(c("#5FE756", "red", "black"))
```
We can make the plot square(ish) by setting the argument `square = TRUE`. A nice
side effect of this is the automatic adjusting of labels to account for the
underlying colour:
```{r}
#| fig.alt = "A plot of the 8 colours that define the 'R4' palette. The plot is
#| divided in to a 3 by 3 square (one square is left blank)."
plot_palette(palette.colors(palette = "R4"), square = TRUE)
```
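How the labels are adjusted is not described here. One common way to pick a readable label colour, shown purely as an assumption and not as what `plot_palette()` actually does, is to threshold the perceived brightness of each fill colour. The helper name `label_colour_sketch()` and the 0.5 cut-off are hypothetical.

```{r}
# Hypothetical helper: choose black or white text for each background colour.
# This is an illustrative assumption, not the ympes implementation.
label_colour_sketch <- function(cols, cutoff = 0.5) {
    rgb <- grDevices::col2rgb(cols)  # 3 x n matrix of values in 0-255
    # weighted sum of red, green and blue (Rec. 601 luma weights), scaled to 0-1
    luma <- colSums(rgb * c(0.299, 0.587, 0.114)) / 255
    ifelse(luma > cutoff, "black", "white")
}
label_colour_sketch(c("#5FE756", "red", "black"))
```

Light fills get dark text and dark fills get light text, which is the effect visible in the square plot.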
## Finding strings
Sometimes you just want to find rows of a data frame where a particular string
occurs. `greprows()` searches for pattern matches within a data frame's columns
and returns the matching rows or row indices. It is a thin wrapper around a
`grep()`-based approach that combines subsetting, `lapply()` and a reduce step.
```{r}
dat <- data.frame(
    first = letters,
    second = factor(rev(LETTERS)),
    third = "Q"
)
greprows(dat, "A|b")
```
`grepvrows()` is identical to `greprows()` except that it defaults to `value = TRUE`.
```{r}
grepvrows(dat, "A|b")
greprows(dat, "A|b", value = TRUE)
```
`greplrows()` returns a logical vector indicating, for each row of `dat`, whether a match was found.
```{r}
greplrows(dat, "A|b", ignore.case = TRUE)
```
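For the curious, the "subset, lapply and reduce" idea behind these functions can be sketched in base R. This is a minimal illustration under my own assumptions, not the ympes source; `grep_rows_sketch()` and `df_sketch` are hypothetical names.

```{r}
# Hypothetical helper illustrating the approach; not the ympes implementation
grep_rows_sketch <- function(dat, pattern, ...) {
    # one logical vector per column: does the pattern match in that row?
    hits <- lapply(dat, function(col) grepl(pattern, as.character(col), ...))
    # a row is kept if any of its columns matched
    keep <- Reduce(`|`, hits)
    dat[keep, , drop = FALSE]
}
df_sketch <- data.frame(
    first = letters,
    second = factor(rev(LETTERS)),
    third = "Q"
)
grep_rows_sketch(df_sketch, "A|b")
```

Extra arguments such as `ignore.case = TRUE` are simply passed through to `grepl()`.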
## Capturing strings
One of my favourite functions in R is `strcapture()`. This function allows you
to extract the captured elements of a regular expression into a tabular data
structure. Being able to parse input strings from a file to correctly split
columns in a data frame in a single function call feels so elegant.
To illustrate this, we generate some synthetic movement data which we pretend
to have loaded in from a file. Each entry has the form "Name-Direction-Value"
with the first two entries representing character strings and, the last entry,
an integer value.
```{r}
movements <- function(length) {
    x <- lapply(
        list(c("Bob", "Mary", "Rose"), c("Up", "Down", "Right", "Left"), 1:10),
        sample,
        size = length,
        replace = TRUE
    )
    do.call(paste, c(x, sep = "-"))
}
# just a small sample to begin with
(dat <- movements(3))
pattern <- "([[:alpha:]]+)-([[:alpha:]]+)-([[:digit:]]+)"
proto <- data.frame(Name = "", Direction = "", Value = 1L)
strcapture(pattern, dat, proto = proto, perl = TRUE)
```
For small (define as you wish) data sets this works fine. Unfortunately, as the
number of entries increases, the performance decays (see
<https://bugs.r-project.org/show_bug.cgi?id=18728> for a more detailed analysis).
`fstrcapture()` attempts to improve upon this by utilising an approach I saw
implemented by Toby Hocking in the [nc](https://cran.r-project.org/package=nc)
package's `nc::capture_first_vec()` function.
```{r}
# Now a larger number of strings
dat <- movements(1e5)
(t <- system.time(r <- strcapture(pattern, dat, proto = proto, perl = TRUE)))
(t2 <- system.time(r2 <- fstrcapture(dat, pattern, proto = proto)))
t[["elapsed"]] / t2[["elapsed"]]
```
As well as the improved performance you will notice two other differences
between the two function signatures. Firstly, to make things more pipeable, the
data parameter `x` appears before the `pattern` parameter. Secondly,
`fstrcapture()` works only with Perl-compatible regular expressions.
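To give a flavour of why a Perl-only approach can be quick, here is a rough base R sketch that reads the capture group positions that `regexpr(perl = TRUE)` attaches to its result and cuts the pieces out with `substring()`. It is an illustration under my own assumptions, not the code used by `fstrcapture()` or nc, and `capture_sketch()` is a hypothetical name.

```{r}
# Hypothetical helper illustrating the idea; not the ympes or nc implementation
capture_sketch <- function(x, pattern, proto) {
    m <- regexpr(pattern, x, perl = TRUE)
    start <- attr(m, "capture.start")   # one column per capture group
    len <- attr(m, "capture.length")
    cols <- lapply(seq_along(proto), function(j) {
        s <- substring(x, start[, j], start[, j] + len[, j] - 1L)
        s[m == -1L] <- NA_character_    # no match for this element
        # coerce to the storage type of the corresponding prototype column
        as.vector(s, mode = storage.mode(proto[[j]]))
    })
    names(cols) <- names(proto)
    as.data.frame(cols, stringsAsFactors = FALSE)
}
capture_sketch(
    c("Bob-Up-3", "Mary-Left-10"),
    "([[:alpha:]]+)-([[:alpha:]]+)-([[:digit:]]+)",
    data.frame(Name = "", Direction = "", Value = 1L)
)
```

Everything here is vectorised over the input, so there is a single regular expression pass and one `substring()` call per capture group.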
## Combining values for lazy people
`cc()` is for those of us that get fed up typing quotation marks. It accepts
either comma-separated, unquoted names that you wish to quote, or a
length-one character vector that you wish to split by whitespace. It is intended
mainly for interactive use, and an example is likely more enlightening than
my description:
```{r}
cc(dale, audrey, laura, hawk)
cc("dale audrey laura hawk")
```
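If you are wondering how unquoted names can be turned into strings, the usual trick is to capture the arguments unevaluated with `substitute()`. The following is a minimal sketch under that assumption, not the ympes source; `cc_sketch()` is a hypothetical name.

```{r}
# Hypothetical helper illustrating the idea; not the ympes implementation
cc_sketch <- function(...) {
    # capture the arguments as unevaluated expressions
    args <- as.list(substitute(list(...)))[-1L]
    # a single string is split on whitespace instead
    if (length(args) == 1L && is.character(args[[1L]])) {
        return(strsplit(trimws(args[[1L]]), "[[:space:]]+")[[1L]])
    }
    # otherwise turn each bare name into a string
    vapply(args, deparse, character(1))
}
cc_sketch(dale, audrey, laura, hawk)
cc_sketch("dale audrey laura hawk")
```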
## Avoid overwriting data frame columns
Sometimes I find myself needing to add a temporary variable to a data frame
without kiboshing a variable already present. `new_name()` provides a simple
wrapper around `tempfile()` that generates random column names and checks for
their suitability. Not normally the sort of thing I'd wrap, but I find myself
writing the same code a lot, so here we are:
```{r}
new_name(mtcars)
new_name(mtcars, 3L)
```
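The underlying idea is simple enough to sketch in base R: ask `tempfile()` for some random names, keep only the file name part, and try again in the unlikely event of a clash. This is my own illustration, assuming a "new" prefix, and not the ympes source; `new_name_sketch()` is a hypothetical name.

```{r}
# Hypothetical helper illustrating the idea; not the ympes implementation
new_name_sketch <- function(x, n = 1L) {
    repeat {
        # tempfile() returns full paths, so keep only the random file name
        out <- basename(tempfile(pattern = rep("new", n)))
        # accept only if the names are distinct and unused in x
        if (!anyDuplicated(out) && !any(out %in% names(x))) {
            return(out)
        }
    }
}
new_name_sketch(mtcars, 3L)
```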
## Assertions (Experimental)
What better place for yet another implementation of bespoke assertion functions
than a small helper package! They are motivated by `vctrs::vec_assert()` but with lower
overhead at a cost of less flexibility. The assertion functions in ympes are
designed to make it easy to identify the top level calling function whether used
within a user facing function or internally. They are somewhat experimental in
nature and should be treated accordingly!
Currently implemented are:
`assert_character()`, `assert_chr()`,
`assert_character_not_na()`, `assert_chr_not_na()`,
`assert_scalar_character()`, `assert_scalar_chr()`,
`assert_scalar_character_not_na()`, `assert_scalar_chr_not_na()`,
`assert_string()`, `assert_string_not_na()`,
`assert_double()`, `assert_dbl()`,
`assert_double_not_na()`, `assert_dbl_not_na()`,
`assert_scalar_double()`, `assert_scalar_dbl()`,
`assert_scalar_double_not_na()`, `assert_scalar_dbl_not_na()`,
`assert_integer()`, `assert_int()`,
`assert_integer_not_na()`, `assert_int_not_na()`,
`assert_scalar_integer()`, `assert_scalar_int()`,
`assert_scalar_integer_not_na()`, `assert_scalar_int_not_na()`,
`assert_integerish()`, `assert_whole()`,
`assert_scalar_whole()`, `assert_scalar_integerish()`,
`assert_logical()`, `assert_lgl()`,
`assert_logical_not_na()`, `assert_lgl_not_na()`,
`assert_scalar_logical()`, `assert_scalar_lgl()`,
`assert_scalar_logical_not_na()`, `assert_scalar_lgl_not_na()`,
`assert_bool()`, `assert_boolean()`,
`assert_list()`,
`assert_data_frame()`,
`assert_negative()`, `assert_negative_or_na()`,
`assert_positive()`, `assert_positive_or_na()`,
`assert_non_negative()`, `assert_non_negative_or_na()`,
`assert_non_positive()`, `assert_non_positive_or_na()`,
`assert_numeric()`, `assert_num()`,
`assert_numeric_not_na()`, `assert_num_not_na()`,
`assert_scalar_numeric()`, `assert_scalar_num()`,
`assert_scalar_numeric_not_na()`, `assert_scalar_num_not_na()`,
`assert_between()`
Hopefully most of these are self-explanatory but there is some opinionated
(currently undocumented) handling of NA so care should be taken to inspect
the underlying source code before using.
Currently these assertions return `NULL` if the assertion succeeds. Otherwise they
signal an error of class "ympes-error" (with an optional subclass if one is
supplied when calling the assertion).
```{r}
# Use in a user facing function
fun <- function(i, d, l, chr, b) {
    assert_scalar_int(i)
    TRUE
}
fun(i = 1L)
try(fun(i = "cat"))
# Use in an internal function
internal_fun <- function(a) {
    assert_string(
        a,
        # report the expression supplied by the caller (here `b`)
        .arg = deparse(substitute(a)),
        .call = sys.call(-1L),
        .subclass = "example_error"
    )
    TRUE
}
external_fun <- function(b) {
    internal_fun(a = b)
}
external_fun(b = "cat")
try(external_fun(b = letters))
tryCatch(external_fun(b = letters), error = class)
```
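To show roughly how an assertion can report the calling function and carry a custom class, here is a small base R sketch. It builds a condition object by hand and signals it with `stop()`. It is an illustration under my own assumptions rather than the ympes source; `assert_scalar_int_sketch()`, `sketch_fun()` and the "sketch_error" class are hypothetical names.

```{r}
# Hypothetical assertion; the class name "sketch_error" is made up for this example
assert_scalar_int_sketch <- function(
    x,
    .arg = deparse(substitute(x)),
    .call = sys.call(-1L),
    .subclass = NULL
) {
    # evaluate the defaults up front, in this function's own frame
    force(.arg)
    force(.call)
    if (is.integer(x) && length(x) == 1L) {
        return(invisible(NULL))
    }
    msg <- sprintf("`%s` must be an integer vector of length 1.", .arg)
    cond <- structure(
        list(message = msg, call = .call),
        class = c(.subclass, "sketch_error", "error", "condition")
    )
    stop(cond)
}
sketch_fun <- function(i) {
    assert_scalar_int_sketch(i)
    TRUE
}
sketch_fun(i = 1L)
tryCatch(sketch_fun(i = "cat"), error = class)
```

Because the condition stores the caller's call rather than the assertion's own, the error message points at the function the user actually called.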