With strings encoded as a vector of characters, we can perform vector operations over the actual characters. All {charcuterie} functions aim to return a new object of class “chars”, so the result both prints as a string and can be passed to other vector-handling functions.
library(charcuterie)
#>
#> Attaching package: 'charcuterie'
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, union
To convert a regular string into a chars object, use chars(). This prints as a string, but is actually a vector:
chars("string")
#> [1] "string"
# but it's a vector
unclass(chars("string"))
#> [1] "s" "t" "r" "i" "n" "g"
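The underlying representation can be approximated in base R with strsplit(). This is only a sketch of the vector a chars object holds, not necessarily how chars() is implemented; the variable name pieces is made up for illustration.

```r
# Split a single string into one element per character (base R only)
pieces <- strsplit("string", "")[[1]]
pieces
#> [1] "s" "t" "r" "i" "n" "g"

# One element per character, so length() counts characters
length(pieces)
#> [1] 6
```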
Only a single string can be converted this way, so to produce more than one of these, I suggest using lapply():
many_chars <- lapply(c("foo", "bar", "baz"), chars)
many_chars
#> [[1]]
#> [1] "foo"
#>
#> [[2]]
#> [1] "bar"
#>
#> [[3]]
#> [1] "baz"
unclass(many_chars[[2]])
#> [1] "b" "a" "r"
A regular string can be recovered using string(), which pastes the characters back together; this can optionally take a separator.
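As a minimal sketch of what recovering a string involves, the following uses only base R on a plain character vector. It does not call string() itself, whose exact argument names are not shown here; char_vec, joined and dashed are illustrative names.

```r
# Plain character vector standing in for unclass(chars("string"))
char_vec <- strsplit("string", "")[[1]]

# Paste the characters back together with no separator
joined <- paste(char_vec, collapse = "")
joined
#> [1] "string"

# The same, with a separator placed between the characters
dashed <- paste(char_vec, collapse = "-")
dashed
#> [1] "s-t-r-i-n-g"
```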
Because the chars object is a vector, we can do vector things, such as indexing:
"string"[3] # doesn't work
#> [1] NA
chars("string")[3]
#> [1] "r"
chars("banana")[seq(2, 6, 2)]
#> [1] "aaa"
subsetting
head("string", 3) # doesn't work
#> [1] "string"
head(chars("string"), 3)
#> [1] "str"
tail(chars("string"), 3)
#> [1] "ing"
substituting
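Substitution by position works on any character vector through indexed assignment. The sketch below uses a plain base R vector in place of a chars object; that a chars object keeps its class after such an assignment is an assumption not checked here.

```r
# Plain character vector standing in for a chars object
word <- strsplit("string", "")[[1]]

# Replace the second character in place
word[2] <- "p"

# Collapse back to a single string to inspect the result
respelled <- paste(word, collapse = "")
respelled
#> [1] "spring"
```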
tabulating
table("mississippi") # doesn't work
#>
#> mississippi
#> 1
table(chars("mississippi"))
#>
#> i m p s
#> 4 1 2 4
sorting
sort("string") # doesn't work
#> [1] "string"
sort(chars("string"))
#> [1] "ginrst"
sort(chars("string"), decreasing = TRUE)
#> [1] "tsrnig"
reversing
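Reversal is ordinary rev() on the vector of characters. A base R sketch on a plain character vector (the names word and reversed are illustrative):

```r
# Plain character vector standing in for a chars object
word <- strsplit("string", "")[[1]]

# Reverse the order of the elements, then collapse to one string
reversed <- paste(rev(word), collapse = "")
reversed
#> [1] "gnirts"
```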
Since these are vectors, we no longer need nchar() to determine the length:
length("string") # just the one 'string'
#> [1] 1
length(chars("string")) == nchar("string")
#> [1] TRUE
Membership tests can now determine if a given character is in the ‘string’
"i" %in% "rhythm" # doesn't work
#> [1] FALSE
"y" %in% "rhythm" # doesn't work
#> [1] FALSE
"i" %in% chars("rhythm")
#> [1] FALSE
"y" %in% chars("rhythm")
#> [1] TRUE
is.element("y", "rhythm") # doesn't work
#> [1] FALSE
is.element("y", chars("rhythm"))
#> [1] TRUE
chars objects can be concatenated; combining two strings produces a longer string:
c("butter", "fly") # doesn't work in the character sense
#> [1] "butter" "fly"
c(chars("butter"), chars("fly"))
#> [1] "butterfly"
c(chars("butter"), chars("fly"))[c(1, 9)]
#> [1] "by"
Set operations can be useful
setdiff(chars("javascript"), chars("script"))
#> [1] "jav"
union(chars("bunny"), chars("rabbit"))
#> [1] "bunyrait"
intersect(chars("bob"), chars("rob"))
#> [1] "bo"
setequal(chars("stop"), chars("post"))
#> [1] TRUE
setequal(chars("stop"), chars("posit"))
#> [1] FALSE
unique(chars("mississippi"))
#> [1] "misp"
Since chars objects are regular vectors, they continue to work with other vectorised operations:
rev(toupper(chars("string")))
#> [1] "GNIRTS"
toString(chars("abc"))
#> [1] "a, b, c"
Filter(\(x) x != "a", "banana")
#> [1] "banana"
Filter(\(x) x != "a", chars("banana"))
#> [1] "bnn"
This last example motivates a non-set-wise way to exclude some characters, so this package introduces a new except() function:
except(chars("javascript"), chars("script"))
#> [1] "java"
except(chars("carpet"), chars("car"))
#> [1] "pet"
except(chars("banana"), "a")
#> [1] "bnn"
except(chars("banana"), chars("a"))
#> [1] "bnn"
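The difference from setdiff() is that duplicates among the kept characters survive. The base R sketch below shows the distinction with a hypothetical helper, drop_chars(), which is not the package's except() and may differ from its actual implementation.

```r
# Hypothetical helper: keep every element of x that does not occur in y,
# without de-duplicating what remains
drop_chars <- function(x, y) x[!x %in% y]

js <- strsplit("javascript", "")[[1]]
sc <- strsplit("script", "")[[1]]

# Set-wise difference collapses the repeated "a"
set_wise <- paste(base::setdiff(js, sc), collapse = "")
set_wise
#> [1] "jav"

# Non-set-wise exclusion keeps both "a"s
kept <- paste(drop_chars(js, sc), collapse = "")
kept
#> [1] "java"
```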
Anywhere a vector of individual characters works, a chars object should also work.