ympes

ympes provides a collection of lightweight helper functions (imps) both for interactive use and for inclusion within other packages. It’s my attempt to save some functionality that would otherwise get lost in a script somewhere on my computer. To that end it’s a bit of a hodgepodge of things that I’ve found useful at one time or another and, more importantly, remembered to include here!

library(ympes)

Visualising palettes

I often want to quickly see what a palette looks like to ensure I can distinguish the different colours. The imaginatively named plot_palette() provides a quick overview.

plot_palette(c("#5FE756", "red", "black"))

A plot with 3 rectangular regions, coloured green, red and black.

We can make the plot square(ish) by setting the argument square = TRUE. A nice side effect of this is that the labels are automatically adjusted to account for the underlying colour.

plot_palette(palette.colors(palette = "R4"), square = TRUE)

A plot of the 8 colours that define the ‘R4’ palette. The plot is divided into a 3 by 3 grid (one square is left blank).

Finding strings

Sometimes you just want to find rows of a data frame where a particular string occurs. greprows() searches for pattern matches within a data frame’s columns and returns the related rows or row indices. It is a thin wrapper around a subset, lapply and reduce approach built on grep().

dat <- data.frame(
    first = letters,
    second = factor(rev(LETTERS)),
    third = "Q"
)
greprows(dat, "A|b")
#> [1]  2 26

grepvrows() is identical to greprows() except with the default value = TRUE.

grepvrows(dat, "A|b")
#>    first second third
#> 2      b      Y     Q
#> 26     z      A     Q
greprows(dat, "A|b", value = TRUE)
#>    first second third
#> 2      b      Y     Q
#> 26     z      A     Q

greplrows() returns a logical vector (match or not for each row of dat).

greplrows(dat, "A|b", ignore.case = TRUE)
#>  [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [25]  TRUE  TRUE
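
For the curious, the subset, lapply and reduce idea mentioned above can be sketched in a few lines of base R. This is only an illustration and not the code used by greprows(): the helper name grep_rows_sketch() is made up, and it uses grepl() rather than grep() to keep things short.

```r
# Hypothetical helper for illustration only; not the ympes implementation.
grep_rows_sketch <- function(dat, pattern, ...) {
    # one logical vector per column: does the pattern match in that row?
    hits <- lapply(dat, function(col) grepl(pattern, as.character(col), ...))
    # a row counts as a match if any of its columns matched
    which(Reduce(`|`, hits))
}

dat <- data.frame(
    first = letters,
    second = factor(rev(LETTERS)),
    third = "Q"
)
idx <- grep_rows_sketch(dat, "A|b")
idx
#> [1]  2 26

# subsetting with the indices gives the matching rows
dat[idx, ]
```

Extra arguments such as ignore.case = TRUE are simply passed on to grepl() through the dots.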

Capturing strings

One of my favourite functions in R is strcapture(). This function allows you to extract the captured elements of a regular expression into a tabular data structure. Being able to parse input strings from a file and correctly split them into the columns of a data frame in a single function call feels so elegant.

To illustrate this, we generate some synthetic movement data which we pretend to have loaded in from a file. Each entry has the form “Name-Direction-Value”, where the first two fields are character strings and the last is an integer value.

movements <- function(length) {
    x <- lapply(
        list(c("Bob", "Mary", "Rose"), c("Up", "Down", "Right", "Left"), 1:10),
        sample,
        size = length,
        replace = TRUE
    )
    do.call(paste, c(x, sep = "-"))
}

# just a small sample to begin with
(dat <- movements(3))
#> [1] "Bob-Up-4"     "Mary-Right-3" "Mary-Right-9"
pattern <- "([[:alpha:]]+)-([[:alpha:]]+)-([[:digit:]]+)"
proto   <- data.frame(Name = "", Direction = "", Value = 1L)
strcapture(pattern, dat, proto = proto, perl = TRUE)
#>   Name Direction Value
#> 1  Bob        Up     4
#> 2 Mary     Right     3
#> 3 Mary     Right     9

For small (define as you wish) data sets this works fine. Unfortunately, as the number of entries increases, performance degrades (see https://bugs.r-project.org/show_bug.cgi?id=18728 for a more detailed analysis). fstrcapture() attempts to improve upon this by utilising an approach I saw implemented by Toby Hocking in the nc package, specifically in the function nc::capture_first_vec().

# Now a larger number of strings
dat <- movements(1e5)
(t  <- system.time(r <- strcapture(pattern, dat, proto = proto, perl = TRUE)))
#>    user  system elapsed 
#>   0.829   0.035   0.868 
(t2 <- system.time(r2 <- fstrcapture(dat, pattern, proto = proto)))
#>    user  system elapsed 
#>   0.021   0.000   0.021 
t[["elapsed"]] / t2[["elapsed"]]
#> [1] 41.33333

As well as the improved performance, you will notice two other differences between the two function signatures. Firstly, to make things more pipeable, the data parameter x appears before the pattern parameter. Secondly, fstrcapture() works only with Perl-compatible regular expressions.
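
The Perl-only restriction fits with how this kind of speed-up can be had in base R: regexpr() reports the position of each capture group only when perl = TRUE. The following is a minimal sketch of that idea, assuming this is roughly the approach taken. It is not the implementation used by fstrcapture() or nc, and capture_sketch() is a hypothetical name.

```r
# Hypothetical helper for illustration only; not the ympes or nc implementation.
capture_sketch <- function(x, pattern, proto) {
    m <- regexpr(pattern, x, perl = TRUE)
    # with perl = TRUE, regexpr() records where each capture group starts
    # and how long it is, as matrices with one column per group
    start <- attr(m, "capture.start")
    len   <- attr(m, "capture.length")
    # a single vectorised substring() call extracts every group of every string
    groups <- substring(x, start, start + len - 1L)
    dim(groups) <- dim(start)
    # convert each column to the class of the corresponding proto column
    out <- lapply(seq_along(proto), function(i) {
        methods::as(groups[, i], class(proto[[i]])[1L])
    })
    names(out) <- names(proto)
    as.data.frame(out)
}

x       <- c("Bob-Up-4", "Mary-Right-3", "Mary-Right-9")
pattern <- "([[:alpha:]]+)-([[:alpha:]]+)-([[:digit:]]+)"
proto   <- data.frame(Name = "", Direction = "", Value = 1L)
res     <- capture_sketch(x, pattern, proto)
res
```

The sketch makes no attempt to handle strings that fail to match, which a real implementation would need to consider.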

Combining values for lazy people

cc() is for those of us that get fed up typing quotation marks. It accepts either comma-separated, unquoted names that you wish to quote or a length one character vector that you wish to split by whitespace. It is intended mainly for interactive use, and an example is likely more enlightening than my description.

cc(dale, audrey, laura, hawk)
#> [1] "dale"   "audrey" "laura"  "hawk"  
cc("dale audrey laura hawk")
#> [1] "dale"   "audrey" "laura"  "hawk"
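
If you are wondering how a function can accept unquoted names, the trick is non-standard evaluation. Below is a rough sketch of the idea using substitute(). cc_sketch() is a hypothetical stand-in and the real cc() may well differ, for example in how it validates its input.

```r
# Hypothetical stand-in for illustration only; not the ympes implementation.
cc_sketch <- function(...) {
    # capture the arguments as unevaluated expressions
    args <- as.list(substitute(list(...)))[-1L]
    # a single string is split on runs of whitespace
    if (length(args) == 1L && is.character(args[[1L]])) {
        return(strsplit(trimws(args[[1L]]), "[[:space:]]+")[[1L]])
    }
    # otherwise turn each unquoted name into a string
    vapply(args, deparse, character(1))
}

cc_sketch(dale, audrey, laura, hawk)
#> [1] "dale"   "audrey" "laura"  "hawk"
cc_sketch("dale audrey laura hawk")
#> [1] "dale"   "audrey" "laura"  "hawk"
```

Because the arguments are never evaluated, the names do not need to exist as objects, which is also why this style is best kept to interactive use.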