--- title: "Introduction to osum" output: # word_document: # toc: true # pdf_document: # toc: true html_document: toc: true toc_float: true theme: flatly vignette: > %\VignetteIndexEntry{Introduction to osum} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r init,include=FALSE,purl=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Motivation Before ***R*** became available, I was a heavy user of the S-PLUS™ software, a commercial declination of ***R***'s ancestor ***S***. It had a very practical function called `objects.summary()`, which would list objects from an environment in a tabular form (basically as a `data.frame`) with some interesting attributes including class, mode, dimensions, and size. I couldn't find its equivalent in ***R***, so I wrote one 😊 ## Installation ### Stable version You can install the current stable version of `osum` from [CRAN](https://cran.r-project.org/): ```{r install.CRAN,eval=FALSE,purl=FALSE} install.packages("osum") ``` Windows and macOS binary packages are available from here. ### Development version You can install the development version of `osum` including latest features from [GitHub](https://github.com/zivankaraman/osum): ```{r install.GitHub,eval=FALSE,purl=FALSE} require(remotes) install_github("zivankaraman/osum") ``` ```{r setup,include=FALSE} library(osum) ``` ## Basic Usage First, we need to populate the session environment with a few objects. ```{r populate} a <- month.name b <- sample(c("FALSE", "TRUE"), size = 5, replace = TRUE) cars <- mtcars .hidden <- -1L .secret <- "Shhht!" x1 <- rnorm(n = 10) x2 <- runif(n = 20) x3 <- rbinom(n = 30, size = 10, prob = 0.5) lst <- list(first = x1, second = x2, third = x3) fun <- function(x) {sqrt(x)} ``` By default, the environment of the call to `objects.summary` is used, here `.GlobalEnv`. ```{r defaul} objects.summary() ``` The hidden objects are not shown by default. One has to provide argument `all.objects=TRUE` to see them (not unlike the `all.names` argument to the `ls` function) ```{r hidden} objects.summary(all.objects = TRUE) ``` If the `objects.summary` is called inside the function, it is the calling function's environment that is used by default. ```{r inside.function} # shows an empty list because inside myfunc no variables are defined myfunc <- function() {objects.summary()} myfunc() # define a local variable inside myfunc myfunc <- function() {y <- 1; objects.summary()} myfunc() ``` ### Restricting the Objects List We can limit the output to objects with names matching the regular expression provided as the `pattern` argument. Alternatively, we can provide a character vector naming objects to summarize in the `names` argument. ```{r pattern} objects.summary(pattern = "^x") objects.summary(names = c("a", "b")) ``` ## Where to Look for Objects? We can list the objects from any environment, not just the current environment. The environment can be provided as an integer indicating the position in the search list or a character giving the name of an environment in the search list. ```{r where} idx <- grep("package:graphics", search()) objects.summary(idx, pattern = "^plot") objects.summary("package:graphics", pattern = "^plot") ``` We can also explicitly provide an environment. ```{r environment} e <- new.env() e$a <- 1:10 e$b <- rnorm(25) e$df <- iris e$arr <- iris3 objects.summary(e) ``` ```{r clean.environment,include=FALSE} rm(e, myfunc) ``` Unless an explicit environment is provided, `where` argument should designate an element of the search list. However, if it is a character of the form "*package:pkg_name*" and if the package named "*pkg_name*" is installed, it is silently loaded, its objects retrieved, and then it is unloaded when the function exits. Depending on the time it takes to load the package, the execution might be slower than getting the information about an attached package. ```{r package} # check if the package foreign is attached length(grep("package:foreign", search())) > 0L objects.summary("package:foreign", pattern = "^write") # check if the package foreign is attached length(grep("package:foreign", search())) > 0L ``` ## Selecting Information to Display We don't need to display all the attributes, the `what` argument controls which information is returned. Partial matching is used, so only enough initial letters of each string element are needed to guarantee unique recognition. For example, "`data[.class]`", "`stor[age.mode]`", "`ext[ent]`", "`obj[ect.size]`". ```{r what} objects.summary(what = c("data.class", "storage.mode", "extent", "object.size")) objects.summary(what = c("data", "stor", "ext", "obj")) ``` In fact, just providing the first letter is sufficient, since all the possible values start with a different letter. The order of columns in the summary respects the order in which their names are listed in the `what` argument. ```{r what.order} objects.summary(what = c("m", "s", "t", "o", "d", "e")) ``` It should be noted that attributes `storage.mode`, `mode`, and `typeof` are somewhat redundant, so you can select only those that are relevant to you. You can set your personal preferences using the `osum.options` function, as explained in [Options]. ## Filtering Objects The subset of objects from the environment `where` which should be selected for summary is specified with either an explicit vector of names provided in argument `names`, or with some combination of the subsetting criteria `pattern` (as seen in [Restricting the Objects List]), `data.class`, `storage.mode`, `mode`, and `typeof`. If argument `names` is given, the other criteria are ignored. If more than one criterion is given, only objects which satisfy all of them are selected. In the absence of both `names` and criteria, all objects in `where` are selected. ```{r filter} objects.summary("package:datasets", pattern = "^[sU]", what = c("dat", "typ", "ext", "obj"), data.class = c("data.frame", "matrix")) ``` Objects can have more than one class, but only the first class element is used by default. Specifying `all.classes=TRUE` allows to consider the entire class vector of an object, both in selection based on argument `data.class` and in the returned summary. ```{r filter.all.classes} objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj"), data.class = "array") objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj"), all.classes = TRUE, data.class = "array") ``` Besides simple filtering criteria by values of attributes, we can also filter on logical expression indicating elements (rows) to keep. The expression is evaluated in the data frame with object attributes, so columns should be referred to (by unquoted attribute name) as variables in the expression (not unlike the `select` argument of the base `subset` function). This can be particularly helpful when we want to exclude some values, avoiding explicit listing of all other (possible) values, as shown in the example below. ```{r filter.expression} objects.summary("package:grDevices", filter = mode != "function") ``` The filter expression can involve more than one attribute. ```{r filter.expression2} objects.summary("package:datasets", filter = mode != storage.mode)[1:10, ] ``` It can also be quite complex, as long as it yields a logical value for every object (row). ```{r filter.expression3} objects.summary("package:datasets", all.classes = TRUE, filter = sapply(data.class, length) > 2L) ``` ## Sorting Objects By default, the object entries (printed as rows) in the summary are sorted alphabetically by object name. By providing the `order` argument, they can be sorted on any other column(s). The `order` argument should be (unquoted) column names. For numeric columns, one can precede the name by "-" to sort in descending order, with the expression enclosed in parentheses (see examples). To sort on more than one column, the expression must be provided as a vector `c(., .)` (again see examples). Feature inspired by the standard ***R*** `order` function. ```{r sort} # filter on 'mode' and sort on 'data.class' objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj"), mode = "numeric", order = data.class)[1:10, ] # filter on 'mode' and sort (descending) on 'object.size' objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj"), mode = "numeric", order = (-object.size))[1:10, ] objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj"), order = c(data.class, -object.size))[1:10, ] ``` It should be noted that although the `extent` is by default printed (by the specific print method for objects of class `objects.summary`) as a product of dimensions (d1 x d2), it is internally stored as a list, which allows sorting on a number of rows or columns, for example. ```{r sort.extent} # get all two-dimensional objects of from the datasets package, with more than 7 columns, # sorted by number on columns (ascending) and then on number of rows (descending) objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj"), filter = sapply(extent, length) == 2L & sapply(extent, "[", 2L) > 7L, order = c(sapply(extent, "[", 2L), -sapply(extent, "[", 1L))) ``` The entries are sorted in ascending order by default. They can be sorted in descending order by specifying `reverse=TRUE`. ```{r sort.reverse} # get five biggest objects from package datasets objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj"), reverse=TRUE)[1:10, ] ``` *It should be noted that the objects in the summary can be filtered and/or sorted by the columns that will **not** be part of the summary (i.e. are not listed in the `what` argument).* ```{r sort.cols} objects.summary("package:datasets", what = c("dat", "typ", "ext"), pattern = "st", filter = mode %in% c("list", "numeric"), order = object.size) ``` ## Printing and Summarizing The `objects.summary` function creates an object of class `objects.summary`, which is an extension of the `data.frame` class. The purpose of this class is being able to propose custom `print` and `summary` methods. The number of rows printed can be limited by the `max.rows` argument, which allows more straightforward control than the `max` argument of the `print.data.frame`. When `all.classes` argument is set to `TRUE`, the entire class vector is returned, and the `data.class` column is a list of character vectors. When such data is printed, the output is limited to a fixed number of characters (12 by default), longer strings being shown as e.g. "matrix, ..." or "nfnGroup....". The `data.class.width` argument to the `print` method allows users to change this value (probably to increase it), in order to see (almost) all the classes. ```{r print} os <- objects.summary("package:datasets", what = c("dat", "ext", "obj"), all.classes = TRUE, order = object.size, reverse = TRUE) print(os, data.class.width = 25, max.rows = 12) multi_class_objects <- row.names(objects.summary("package:datasets", all.classes = TRUE, filter = sapply(data.class, length) > 1L)) os <- objects.summary("package:datasets", names = multi_class_objects, all.classes = TRUE, what = c("dat", "ext", "obj")) print(os, data.class.width = 32, max.rows = 12) ``` As already mentioned in [Sorting Objects], the `extent` column is internally stored as a list, and we can explicitly control how it is printed by the `format.extent` argument. ```{r print.extent} multi_dim_objects <- row.names(objects.summary("package:datasets", all.classes = TRUE, data.class = c("array", "table"))) os <- objects.summary("package:datasets", names = multi_dim_objects, what = c("dat", "ext", "obj")) print(os[rev(order(sapply(os$extent, length))), ], format.extent = TRUE, max.rows = 12) # default print(os[rev(order(sapply(os$extent, length))), ], format.extent = FALSE, max.rows = 12) ``` Other options can be passed down to the `print.data.frame` function (not necessarily very useful). ```{r print.pass.down} print(objects.summary("package:datasets", what = c("dat", "typ", "ext", "obj")), format.extent = TRUE, max.rows = 12, right = FALSE, quote = TRUE) ``` The `summary` method shares the same specific arguments as the `print` except for `max.rows`. ```{r summary} os <- objects.summary("package:datasets", all.classes = TRUE, what = c("dat", "ext", "obj"), filter = sapply(data.class, length) > 1L) summary(os, data.class.width = 32, format.extent = FALSE) ``` Again, other options can be passed down to the `summary.data.frame` function. ```{r summary.pass.down} summary(os, data.class.width = 32, maxsum = 10, quantile.type = 5) ``` ## Options There are a few custom options dedicated to the package. The function `osum.options`, crafted after the `base` package `options`, allows the user to set and examine them. The custom options mainly allow for providing the default values for the specific arguments to the `print` and `summary` methods (`data.class.width`, `format.extent`, and `max.rows`), as seen in [Printing and Summarizing]. ```{r options} # see all current options osum.options() ``` ```{r options.set} # set some values old_opt <- osum.options(osum.data.class.width = 12, osum.max.rows = 25) # previous values of the changed 'osum' options old_opt ``` It is also possible to select what information will be returned by default by the function `objects.summary`. It must be a subset of `c("data.class", "storage.mode", "mode", "typeof", "extent", "object.size")`, partial matching is allowed. ```{r options.info} # set which attributes are retrieved by default osum.options(osum.information = c("dat", "mod", "ext", "obj")) # get the current value of the option osum.options("osum.information") # if the argument 'what' is not specified, the new default values are used objects.summary("package:base", filter = data.class != "function") ```



*Created on `r format(Sys.Date(), "%Y-%m-%d")`.*