Important Miscellany

2024-01-31

The importance of this miscellany

The features of strex that were deemed the most interesting have been given their own vignettes. However, the package was intended as a miscellany of useful functions, so the functions demonstrated here encapsulate the spirit of this package, i.e. functions that save R string manipulators time.

library(strex)

Could this be numeric?

Sometimes you don’t want to know whether something is numeric, just whether or not it could be. Now you can find out with str_can_be_numeric().

str_can_be_numeric(c("1a", "abc", "5", "2e7", "seven"))
#> [1] FALSE FALSE  TRUE  TRUE FALSE
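
For comparison, here is a rough base R sketch of the same idea: a string could be numeric if as.numeric() manages to convert it. The helper name could_be_numeric() is made up for illustration, and this is not claimed to be how strex implements the check, so edge cases such as NA may be handled differently.

```r
# Hypothetical helper (not part of strex): TRUE where as.numeric() succeeds.
could_be_numeric <- function(x) {
  # as.numeric() warns and returns NA for strings it cannot convert,
  # so silence the warning and test for NA.
  !is.na(suppressWarnings(as.numeric(x)))
}

could_be_numeric(c("1a", "abc", "5", "2e7", "seven"))
#> [1] FALSE FALSE  TRUE  TRUE FALSE
```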

Currency

To get the currencies and amounts mentioned in strings, use str_first_currency(), str_last_currency(), str_nth_currency() and str_extract_currencies(). str_first_currency() returns the first currency amount in each string and str_last_currency() returns the last. str_nth_currency() lets you pick out the second, third and so on. str_extract_currencies() returns every currency amount mentioned in a string.

string <- c("Alan paid £5", "Joe paid $7")
str_first_currency(string)
#>   string_num       string curr_sym amount
#> 1          1 Alan paid £5        £      5
#> 2          2  Joe paid $7        $      7
string <- c("€1 is $1.17", "£1 is $1.29")
str_nth_currency(string, n = c(1, 2))
#>   string_num      string curr_sym amount
#> 1          1 €1 is $1.17        €   1.00
#> 2          2 £1 is $1.29        $   1.29
str_last_currency(string) # only gets the last mentioned
#>   string_num      string curr_sym amount
#> 1          1 €1 is $1.17        $   1.17
#> 2          2 £1 is $1.29        $   1.29
str_extract_currencies(string)
#>   string_num      string curr_sym amount
#> 1          1 €1 is $1.17        €   1.00
#> 2          1 €1 is $1.17        $   1.17
#> 3          2 £1 is $1.29        £   1.00
#> 4          2 £1 is $1.29        $   1.29

Extract a single element of a string

This is a simple wrapper around stringr::str_sub().

string <- "abcdefg"
str_sub(string, 3, 3)
#> [1] "c"
str_elem(string, 3) # simpler and more expressive
#> [1] "c"
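
The same idea can be sketched in base R with substr(), which takes a start and an end position. The helper name elem() is hypothetical; this is only an illustration of what such a wrapper looks like, not the strex source.

```r
# Hypothetical helper: extract the single character at position `index`
# by using the same value for the start and end of the substring.
elem <- function(string, index) {
  substr(string, index, index)
}

elem("abcdefg", 3)
#> [1] "c"
elem(c("abc", "xyz"), 2) # vectorised over strings
#> [1] "b" "y"
```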

Extract numbers and non-numeric elements

string <- c("aa1bbb2ccc3", "xyz7ayc8jzk99elephant")
str_extract_numbers(string)
#> [[1]]
#> [1] 1 2 3
#> 
#> [[2]]
#> [1]  7  8 99
str_extract_non_numerics(string)
#> [[1]]
#> [1] "aa"  "bbb" "ccc"
#> 
#> [[2]]
#> [1] "xyz"      "ayc"      "jzk"      "elephant"
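
To see what this saves you, here is a minimal base R sketch that does something similar by hand with gregexpr() and regmatches(). It assumes the numbers are plain unsigned integers, so decimals, signs and the like are out of scope, and it is not meant to mirror how strex works internally.

```r
string <- c("aa1bbb2ccc3", "xyz7ayc8jzk99elephant")

# Runs of digits, converted from character to numeric.
digit_runs <- regmatches(string, gregexpr("[0-9]+", string))
numbers <- lapply(digit_runs, as.numeric)
# numbers[[1]] is c(1, 2, 3); numbers[[2]] is c(7, 8, 99)

# Runs of anything that is not a digit.
non_numerics <- regmatches(string, gregexpr("[^0-9]+", string))
# non_numerics[[1]] is c("aa", "bbb", "ccc")
```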

Split a string by its numbers

string <- c("aa1bbb2ccc3", "xyz7ayc8jzk99elephant")
str_split_by_numbers(string)
#> [[1]]
#> [1] "aa"  "1"   "bbb" "2"   "ccc" "3"  
#> 
#> [[2]]
#> [1] "xyz"      "7"        "ayc"      "8"        "jzk"      "99"       "elephant"
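
A hand-rolled approximation, assuming only unsigned integers count as numbers, is to match alternating runs of digits and non-digits with a single pattern. The variable name pieces is just for illustration.

```r
string <- c("aa1bbb2ccc3", "xyz7ayc8jzk99elephant")

# Each match is either a run of digits or a run of non-digits,
# so the matches tile the whole string in order.
pieces <- regmatches(string, gregexpr("[0-9]+|[^0-9]+", string))
pieces[[1]]
#> [1] "aa"  "1"   "bbb" "2"   "ccc" "3"
```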

Force a file name to have an extension

We can make sure file names have a particular extension, leaving them alone if they already have it.

string <- c("spreadsheet1.csv", "spreadsheet2")
str_give_ext(string, "csv")
#> [1] "spreadsheet1.csv" "spreadsheet2.csv"

If the file already has a different extension, we can append the new one or replace the old one.

str_give_ext(string, "xls") # append
#> [1] "spreadsheet1.csv.xls" "spreadsheet2.xls"
str_give_ext(string, "xls", replace = TRUE) # replace
#> [1] "spreadsheet1.xls" "spreadsheet2.xls"
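
The default "append unless already present" behaviour can be sketched in a few lines of base R. The helper name give_ext() is hypothetical and it only covers the simple case shown here; it is an illustration under those assumptions rather than the strex implementation.

```r
# Hypothetical helper: add `.ext` to each path unless it already ends with it.
give_ext <- function(path, ext) {
  has_ext <- endsWith(path, paste0(".", ext))
  ifelse(has_ext, path, paste0(path, ".", ext))
}

string <- c("spreadsheet1.csv", "spreadsheet2")
give_ext(string, "csv")
#> [1] "spreadsheet1.csv" "spreadsheet2.csv"
give_ext(string, "xls")
#> [1] "spreadsheet1.csv.xls" "spreadsheet2.xls"
```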

Strip away a file extension

string <- c("spreadsheet1.csv", "spreadsheet2")
str_before_last_dot(string)
#> [1] "spreadsheet1" "spreadsheet2"
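
Without a helper, you would typically reach for a regular expression. The sketch below removes the last dot and everything after it, and leaves strings without a dot untouched. It is a plain base R approximation, so its handling of unusual file names may differ from strex.

```r
string <- c("spreadsheet1.csv", "spreadsheet2")

# "\\.[^.]*$" matches a literal dot followed by non-dot characters
# up to the end of the string, i.e. the final extension.
sub("\\.[^.]*$", "", string)
#> [1] "spreadsheet1" "spreadsheet2"
```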

Remove quoted bits from a string

string <- "I hate having these \"quotes\" in the middle of my strings."
cat(string)
#> I hate having these "quotes" in the middle of my strings.
str_remove_quoted(string)
#> [1] "I hate having these  in the middle of my strings."
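
For double quotes only, a rough base R equivalent is a single gsub() call. This is a sketch under that assumption; it does not try to handle other quote styles or unbalanced quotes.

```r
string <- "I hate having these \"quotes\" in the middle of my strings."

# Remove an opening double quote, any non-quote characters, and the
# closing double quote. The spaces on either side are left in place.
gsub('"[^"]*"', "", string)
#> [1] "I hate having these  in the middle of my strings."
```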

Split camel case

I’m not mad on CamelCase; I often want to deconstruct it.

string <- c("CamelVar1", "CamelVar2")
str_split_camel_case(string)
#> [[1]]
#> [1] "Camel" "Var1" 
#> 
#> [[2]]
#> [1] "Camel" "Var2"
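
To show the kind of regex this spares you, here is a base R sketch that marks each boundary between a lower-case letter or digit and an upper-case letter, then splits on the marker. It assumes the input contains no spaces (a space is used as the marker) and only ASCII letters, and it is an approximation rather than the strex algorithm.

```r
string <- c("CamelVar1", "CamelVar2")

# Insert a space wherever a lower-case letter or digit is immediately
# followed by an upper-case letter (zero-width lookbehind and lookahead).
marked <- gsub("(?<=[a-z0-9])(?=[A-Z])", " ", string, perl = TRUE)

# Split on the inserted spaces.
parts <- strsplit(marked, " ", fixed = TRUE)
parts[[1]]
#> [1] "Camel" "Var1"
```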

Convert a string to a vector

This is something I did a lot to avoid using regular expressions. Don’t do it for that purpose. Learn regex. https://regexone.com/ is a very good start.

string <- "R is good."
str_to_vec(string)
#>  [1] "R" " " "i" "s" " " "g" "o" "o" "d" "."
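
In base R, the usual way to get the same vector of single characters is strsplit() with an empty pattern. Note that strsplit() returns a list with one element per input string, hence the [[1]].

```r
string <- "R is good."

# An empty split pattern breaks the string into individual characters.
chars <- strsplit(string, "")[[1]]
chars
#>  [1] "R" " " "i" "s" " " "g" "o" "o" "d" "."
```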

Trim anything, not just whitespace

What if something is needlessly surrounded by parentheses and we want to get rid of them?

string <- "(((Why all the parentheses?)))"
string %>%
  str_trim_anything(coll("("), side = "left") %>%
  str_trim_anything(coll(")"), side = "right")
#> [1] "Why all the parentheses?"
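
Done by hand with a regular expression, the parentheses have to be escaped because they are special characters in regex; in the pipeline above, coll() is what makes them be treated literally. A minimal base R sketch of the same trimming:

```r
string <- "(((Why all the parentheses?)))"

# "^\\(+" matches one or more "(" at the start of the string,
# "\\)+$" matches one or more ")" at the end.
gsub("^\\(+|\\)+$", "", string)
#> [1] "Why all the parentheses?"
```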

Remove duplicated bits of strings

string <- c("I often write the word *my* twice in a row in my my sentences.")
str_singleize(string, " my")
#> [1] "I often write the word *my* twice in a row in my sentences."
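
For a fixed pattern like this one, the regex you would otherwise write collapses any run of consecutive repeats into a single copy. This is a base R sketch for this specific pattern; a pattern containing regex special characters would need escaping first.

```r
string <- "I often write the word *my* twice in a row in my my sentences."

# "( my)+" matches one or more consecutive copies of " my";
# each such run is replaced by a single " my".
gsub("( my)+", " my", string)
#> [1] "I often write the word *my* twice in a row in my sentences."
```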