ympes provides a collection of lightweight helper functions (imps) both for interactive use and for inclusion within other packages. It’s my attempt to save some functionality that would otherwise get lost in a script somewhere on my computer. To that end it’s a bit of a hodgepodge of things that I’ve found useful at one time or another and, more importantly, remembered to include here!
library(ympes)
I often want to quickly see what a palette looks like to ensure I can
distinguish the different colours. The imaginatively named plot_palette()
provides a quick overview:
plot_palette(c("#5FE756", "red", "black"))
We can make the plot square(ish) by setting the argument square = TRUE. A nice
side effect of this is the automatic adjustment of labels to account for the
underlying colour.
plot_palette(palette.colors(palette = "R4"), square = TRUE)
Sometimes you just want to find rows of a data frame where a particular string
occurs. greprows() searches for pattern matches within a data frame's columns
and returns the related rows or row indices. It is a thin wrapper around an
approach that combines subsetting, lapply, a reduce step and grep().
dat <- data.frame(
    first = letters,
    second = factor(rev(LETTERS)),
    third = "Q"
)
greprows(dat, "A|b")
#> [1] 2 26
grepvrows() is identical to greprows() except with the default value = TRUE.
grepvrows(dat, "A|b")
first | second | third |
---|---|---|
b | Y | Q |
z | A | Q |
greprows(dat, "A|b", value = TRUE)
first | second | third |
---|---|---|
b | Y | Q |
z | A | Q |
greplrows() returns a logical vector (match or not for each row of dat).
greplrows(dat, "A|b", ignore.case = TRUE)
#> [1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [25] TRUE TRUE
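To make the "lapply and reduce" idea concrete, here is a minimal base R sketch of how such a row search could be put together. The function name grep_rows_sketch() is hypothetical and this is not the ympes source, just one way the pieces might fit.

```r
# Hypothetical helper for illustration only; not the ympes implementation.
grep_rows_sketch <- function(dat, pattern, value = FALSE, ...) {
    # For each column, find the row indices whose value matches the pattern.
    hits <- lapply(dat, function(col) grep(pattern, as.character(col), ...))
    # Reduce the per-column indices to a single sorted set of row indices.
    idx <- sort(Reduce(union, hits, integer()))
    # Return either the matching rows or their indices.
    if (value) dat[idx, , drop = FALSE] else idx
}

dat <- data.frame(
    first = letters,
    second = factor(rev(LETTERS)),
    third = "Q"
)
idx <- grep_rows_sketch(dat, "A|b")
rows <- grep_rows_sketch(dat, "A|b", value = TRUE)
```

Extra arguments such as ignore.case = TRUE are simply passed on to grep() in this sketch.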
One of my favourite functions in R is strcapture(). This function allows you
to extract the captured elements of a regular expression into a tabular data
structure. Being able to parse input strings from a file and correctly split
them into the columns of a data frame in a single function call feels so elegant.
To illustrate this, we generate some synthetic movement data which we pretend to have loaded in from a file. Each entry has the form “Name-Direction-Value”, where the first two components are character strings and the last is an integer value.
movements <- function(length) {
    x <- lapply(
        list(c("Bob", "Mary", "Rose"), c("Up", "Down", "Right", "Left"), 1:10),
        sample,
        size = length,
        replace = TRUE
    )
    do.call(paste, c(x, sep = "-"))
}
# just a small sample to begin with
(dat <- movements(3))
#> [1] "Bob-Up-1" "Bob-Right-7" "Bob-Up-1"
pattern <- "([[:alpha:]]+)-([[:alpha:]]+)-([[:digit:]]+)"
proto <- data.frame(Name = "", Direction = "", Value = 1L)
strcapture(pattern, dat, proto = proto, perl = TRUE)
Name | Direction | Value |
---|---|---|
Bob | Up | 1 |
Bob | Right | 7 |
Bob | Up | 1 |
For small (define as you wish) data sets this works fine. Unfortunately, as the
number of entries increases the performance decays (see
https://bugs.r-project.org/show_bug.cgi?id=18728 for a more detailed analysis).
fstrcapture() attempts to improve upon this by utilising an approach I saw
implemented by Toby Hocking in the nc package, in the function
nc::capture_first_vec().
# Now a larger number of strings
dat <- movements(1e5)
(t <- system.time(r <- strcapture(pattern, dat, proto = proto, perl = TRUE)))
#> user system elapsed
#> 0.819 0.022 0.845
(t2 <- system.time(r2 <- fstrcapture(dat, pattern, proto = proto)))
#> user system elapsed
#> 0.020 0.000 0.021
t[["elapsed"]] / t2[["elapsed"]]
#> [1] 40.2381
As well as the improved performance you will notice two other differences
between the two function signatures. Firstly, to make things more pipeable, the
data parameter x appears before the pattern parameter. Secondly, fstrcapture()
works only with Perl-compatible regular expressions.
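For the curious, here is a rough base R sketch of one vectorised way to pull out capture groups. It relies on the capture.start and capture.length attributes that regexpr() returns when perl = TRUE. The name capture_sketch() is hypothetical, and I am assuming (not asserting) that this resembles what fstrcapture() or nc do internally.

```r
# Hypothetical helper for illustration only; not the fstrcapture() source.
capture_sketch <- function(x, pattern, proto) {
    # A single perl-mode regexpr() call records where every group matched.
    m <- regexpr(pattern, x, perl = TRUE)
    start <- attr(m, "capture.start")
    len <- attr(m, "capture.length")
    no_match <- as.integer(m) == -1L
    # One vectorised substring() call per capture group (i.e. per column).
    out <- lapply(seq_len(ncol(start)), function(j) {
        s <- substring(x, start[, j], start[, j] + len[, j] - 1L)
        s[no_match] <- NA_character_
        # Coerce to the type of the corresponding prototype column.
        methods::as(s, class(proto[[j]]))
    })
    names(out) <- names(proto)
    as.data.frame(out)
}

x <- c("Bob-Up-1", "Mary-Left-10")
pattern <- "([[:alpha:]]+)-([[:alpha:]]+)-([[:digit:]]+)"
proto <- data.frame(Name = "", Direction = "", Value = 1L)
res <- capture_sketch(x, pattern, proto)
```

The sketch assumes the pattern has exactly one capture group per column of proto and does no input checking.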
cc() is for those of us that get fed up typing quotation marks. It accepts
either comma-separated, unquoted names that you wish to quote or a
length-one character vector that you wish to split by whitespace. It is intended
mainly for interactive use, and an example is likely more enlightening than
my description:
cc(dale, audrey, laura, hawk)
#> [1] "dale" "audrey" "laura" "hawk"
cc("dale audrey laura hawk")
#> [1] "dale" "audrey" "laura" "hawk"
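If you are wondering how unquoted names can be turned into strings, the following is a minimal sketch using substitute(). The name cc_sketch() is hypothetical and the real cc() may well differ in its details and checks.

```r
# Hypothetical helper for illustration only; not the ympes implementation.
cc_sketch <- function(...) {
    # Capture the arguments without evaluating them.
    args <- as.list(substitute(list(...)))[-1L]
    # A single string: split it on runs of whitespace.
    if (length(args) == 1L && is.character(args[[1L]])) {
        return(strsplit(trimws(args[[1L]]), "[[:space:]]+")[[1L]])
    }
    # Otherwise turn each bare name into a string.
    unname(vapply(args, deparse, character(1L)))
}

quoted <- cc_sketch(dale, audrey, laura, hawk)
split <- cc_sketch("dale audrey laura hawk")
```

Because the arguments are never evaluated, names like dale do not need to exist as objects.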
Sometimes I find myself needing to add a temporary variable to a data frame
without kiboshing a variable already present. new_name() provides a simple
wrapper around tempfile() that generates random column names and checks for
their suitability. Not normally the sort of thing I’d wrap, but I find myself
writing the same code a lot so here we are:
new_name(mtcars)
#> [1] "new127033b9b8f63"
new_name(mtcars, 3L)
#> [1] "new127039fcc7e0" "new12703448de49f" "new1270319e254a2"
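The tempfile() trick is easy to reproduce. Below is a minimal sketch, assuming that "suitability" simply means the names are distinct and do not clash with existing columns. new_name_sketch() is a hypothetical name and the real function may check more than this.

```r
# Hypothetical helper for illustration only; not the ympes implementation.
new_name_sketch <- function(dat, n = 1L) {
    repeat {
        # tempfile() builds random file names; keep only the final component.
        candidate <- basename(tempfile(pattern = rep("new", n)))
        # Accept only if the names are distinct and unused in the data frame.
        ok <- !anyDuplicated(candidate) && !any(candidate %in% names(dat))
        if (ok) return(candidate)
    }
}

nm <- new_name_sketch(mtcars, 3L)
```

No files are created here; tempfile() only returns names.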
What better place for yet another implementation of bespoke assertion functions
than a small helper package! They are motivated by vctrs::vec_assert() but have
lower overhead at the cost of less flexibility. The assertion functions in ympes
are designed to make it easy to identify the top-level calling function, whether
used within a user-facing function or internally. They are somewhat experimental
in nature and should be treated accordingly!
Currently implemented are:
assert_character(), assert_chr(),
assert_character_not_na(), assert_chr_not_na(),
assert_scalar_character(), assert_scalar_chr(),
assert_scalar_character_not_na(), assert_scalar_chr_not_na(),
assert_string(), assert_string_not_na(),
assert_double(), assert_dbl(),
assert_double_not_na(), assert_dbl_not_na(),
assert_scalar_double(), assert_scalar_dbl(),
assert_scalar_double_not_na(), assert_scalar_dbl_not_na(),
assert_integer(), assert_int(),
assert_integer_not_na(), assert_int_not_na(),
assert_scalar_integer(), assert_scalar_int(),
assert_scalar_integer_not_na(), assert_scalar_int_not_na(),
assert_integerish(), assert_whole(),
assert_scalar_whole(), assert_scalar_integerish(),
assert_logical(), assert_lgl(),
assert_logical_not_na(), assert_lgl_not_na(),
assert_scalar_logical(), assert_scalar_lgl(),
assert_scalar_logical_not_na(), assert_scalar_lgl_not_na(),
assert_bool(), assert_boolean(),
assert_list(),
assert_data_frame(),
assert_negative(), assert_negative_or_na(),
assert_positive(), assert_positive_or_na(),
assert_non_negative(), assert_non_negative_or_na(),
assert_non_positive(), assert_non_positive_or_na(),
assert_numeric(), assert_num(),
assert_numeric_not_na(), assert_num_not_na(),
assert_scalar_numeric(), assert_scalar_num(),
assert_scalar_numeric_not_na(), assert_scalar_num_not_na(),
assert_between().
Hopefully most of these are self-explanatory but there is some opinionated (currently undocumented) handling of NA so care should be taken to inspect the underlying source code before using.
Currently these assertions return NULL if the assertion succeeds. Otherwise they signal an error of class “ympes-error” (with an optional subclass if one is supplied when calling the assertion).
# Use in a user facing function
fun <- function(i, d, l, chr, b) {
    assert_scalar_int(i)
    TRUE
}
fun(i=1L)
#> [1] TRUE
try(fun(i="cat"))
# Use in an internal function
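internal_fun <- function(a) {
    assert_string(
        a,
        # report the expression the caller passed in (here `b`)
        .arg = deparse(substitute(a)),
        # attribute the error to the user-facing caller
        .call = sys.call(-1L),
        .subclass = "example_error"
    )
    TRUE
}
external_fun <- function(b) {
    internal_fun(a = b)
}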
external_fun(b="cat")
#> [1] TRUE
try(external_fun(b = letters))
tryCatch(external_fun(b = letters), error = class)
#> [1] "example_error" "ympes-error" "error" "condition"
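To show how an assertion can point at the calling function, here is a minimal sketch of the pattern using a classed condition. The names check_string_sketch(), outer_sketch() and the class "sketch_error" are all hypothetical, and the type check is deliberately simplistic; the ympes functions themselves may differ, particularly in how they treat NA.

```r
# Hypothetical assertion for illustration only; not the ympes implementation.
check_string_sketch <- function(
    x,
    .arg = deparse(substitute(x)),
    .call = sys.call(-1L),
    .subclass = NULL
) {
    # Resolve the defaults up front so they describe our immediate caller.
    force(.arg)
    force(.call)
    if (is.character(x) && length(x) == 1L) {
        return(invisible(NULL))
    }
    # Build a classed condition whose call is the caller, not this function.
    cond <- structure(
        list(
            message = sprintf("`%s` must be a character vector of length 1.", .arg),
            call = .call
        ),
        class = c(.subclass, "sketch_error", "error", "condition")
    )
    stop(cond)
}

outer_sketch <- function(b) {
    check_string_sketch(b)
    TRUE
}

ok <- outer_sketch("cat")
classes <- tryCatch(outer_sketch(letters), error = class)
caught <- tryCatch(outer_sketch(letters), error = function(e) e)
```

The caught condition carries the call to outer_sketch() and names the argument b, which is usually what a user wants to see in an error message.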