--- title: "Logging" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Logging} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} library(crew) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` `crew` facilitates computationally demanding workloads that consume a lot of memory and CPU resources, and workloads that exhaust these resources may stall or crash. Oftentimes, a crash is silent and there are no clues for troubleshooting. But if you proactively monitor `crew` processes in log files, you can keep track of resource usage and generated prospective data which is useful in diagnosing problems. This vignette describes best practices for logging. # Processes `crew` runs in 3 different kinds of R processes: 1. The local R session from which you call `crew_controller_local()` or similar. 2. The `mirai` dispatcher which sends tasks to workers. The dispatcher starts on `controller$start()` and terminates on `controller$terminate()` or when the local R session closes. The dispatcher always runs on the same machine as (1). 3. The workers that accept tasks from (2). These workers may run on different machines, e.g. on a cluster or the cloud, depending on the [launcher plugin](https://wlandau.github.io/crew/articles/plugins.html) you are using. This vignette first covers logging for local the R session and dispatcher (1 and 2), then logging for workers (3). # Logging worker processes Each [launcher plugin](https://wlandau.github.io/crew/articles/plugins.html) has its own way to turn on logging for workers. In the case of the local plugin (`crew_controller_local()`), the `crew_options_local()` has arguments to configure how messages are written to log file, and each worker instance gets its own log file in a directory you specify. Other plugins in packages such as [`crew.cluster`](https://wlandau.github.io/crew.cluster/) and [`crew.aws.batch`](https://wlandau.github.io/crew.aws.batch/) have different arguments to configure platform-specific logging mechanisms (e.g. SLURM logs or AWS CloudWatch logs). Example for the local plugin: ```r library(crew) log_directory <- tempfile() controller <- crew_controller_local( workers = 2, options_local = crew_options_local(log_directory = log_directory) ) controller$start() controller$push(print("task 1")) controller$push(print("task 2")) controller$wait() controller$terminate() log_files <- list.files(log_directory, full.names = TRUE) writeLines(readLines(log_files[1])) #> R version 4.4.0 (2024-04-24) #> ... # more session info... #> [1] "task 1" writeLines(readLines(log_files[2])) #> R version 4.4.0 (2024-04-24) #> ... # more session info... #> [1] "task 2" ``` To add resource metrics to existing log files, regardless of your platform, use the `options_metrics` argument and set `path = "/dev/stdout"`.^[Windows does not have `/dev/stdout`, but `path` can be a directory on disk as well. In that case, `crew` will write the log files there, instead of to the existing log files specified in `options_local`.] ```r library(crew) log_directory <- tempfile() controller <- crew_controller_local( name = "a", workers = 2, options_local = crew_options_local(log_directory = log_directory), options_metrics = crew_options_metrics( path = "/dev/stdout", seconds_interval = 1 ) ) controller$start() controller$push({ Sys.sleep(2.5) print("task 1") Sys.sleep(2.5) }) controller$push({ Sys.sleep(5) print("task 2") }) controller$wait() controller$terminate() log_files <- list.files(log_directory, full.names = TRUE) writeLines(readLines(log_files[1])) #> R version 4.4.0 (2024-04-24) #> ... # more session info... #> __AUTOMETRIC__|0.0.5|43098|crew_worker_a_1_c7839a65097de181ee1b2939|0|1728410013.006|2.100|0.210|116834304|420885200896|__AUTOMETRIC__ #> __AUTOMETRIC__|0.0.5|43098|crew_worker_a_1_c7839a65097de181ee1b2939|0|1728410014.018|0.000|0.000|116899840|420894638080|__AUTOMETRIC__ #> [1] "task 1" #> __AUTOMETRIC__|0.0.5|43098|crew_worker_a_1_c7839a65097de181ee1b2939|0|1728410015.020|0.000|0.000|116932608|420903026688|__AUTOMETRIC__ #> __AUTOMETRIC__|0.0.5|43098|crew_worker_a_1_c7839a65097de181ee1b2939|0|1728410016.024|0.000|0.000|116932608|420903026688|__AUTOMETRIC__ writeLines(readLines(log_files[2])) #> R version 4.4.0 (2024-04-24) #> ... # more session info... #> __AUTOMETRIC__|0.0.5|43099|crew_worker_a_2_3647db437624830b7b0491af|0|1728410013.006|2.100|0.210|115671040|420750983168|__AUTOMETRIC__ #> __AUTOMETRIC__|0.0.5|43099|crew_worker_a_2_3647db437624830b7b0491af|0|1728410014.014|0.000|0.000|115736576|420760420352|__AUTOMETRIC__ #> __AUTOMETRIC__|0.0.5|43099|crew_worker_a_2_3647db437624830b7b0491af|0|1728410015.020|0.000|0.000|115736576|420760420352|__AUTOMETRIC__ #> __AUTOMETRIC__|0.0.5|43099|crew_worker_a_2_3647db437624830b7b0491af|0|1728410016.025|0.000|0.000|115736576|420760420352|__AUTOMETRIC__ #> [1] "task 2" ``` The lines with `__AUTOMETRIC__` record resource usage metrics over the life cycle of each worker. Use [`autometric::log_read()`](https://wlandau.github.io/autometric/reference/log_read.html) to read these metrics and [`autometric::log_plot()`](https://wlandau.github.io/autometric/reference/log_plot.html) to visualize them. Below, the `time` column is elapsed time in seconds since logging began, and `resident` is memory usage in megabytes. More details are at [`autometric::log_read()`](https://wlandau.github.io/autometric/reference/log_read.html). ```r data <- autometric::log_read(log_directory) data$name <- substr(data$name, 0, 20) data #> version pid name status time core cpu resident virtual #> 1 0.0.5 43098 crew_worker_a_1_c783 0 0.000 2.1 0.21 116.8343 420885.2 #> 2 0.0.5 43098 crew_worker_a_1_c783 0 1.012 0.0 0.00 116.8998 420894.6 #> 3 0.0.5 43098 crew_worker_a_1_c783 0 2.014 0.0 0.00 116.9326 420903.0 #> 4 0.0.5 43098 crew_worker_a_1_c783 0 3.018 0.0 0.00 116.9326 420903.0 #> 5 0.0.5 43099 crew_worker_a_2_3647 0 0.000 2.1 0.21 115.6710 420751.0 #> 6 0.0.5 43099 crew_worker_a_2_3647 0 1.008 0.0 0.00 115.7366 420760.4 #> 7 0.0.5 43099 crew_worker_a_2_3647 0 2.014 0.0 0.00 115.7366 420760.4 #> 8 0.0.5 43099 crew_worker_a_2_3647 0 3.019 0.0 0.00 115.7366 420760.4 ``` ```r autometric::log_plot(data, pid = 43098, metric = "resident") ``` ![](./figures/memory.png) # Logging local processes You can use the [`autometric`](https://wlandau.github.io/autometric/) package directly to monitor local `crew` processes (including the `mirai` dispatcher). To begin, you need the process IDs of the local processes. As of `crew` version `0.9.5.9012`, every controller has a `pids()` method to get the process IDs of all the local processes. ```r library(crew) controller <- crew_controller_local() controller$start() controller$pids() #> local dispatcher-e07cd32eeac18cb9e1dd59fb #> 2104319 2104731 ``` You can pass this named vector of process IDs directly into [`autometric::log_start()`](https://wlandau.github.io/autometric/reference/log_start.html), which starts a background thread which logs resource usage metrics to the text file of your choice.^[On Mac OS, you may not get usable metrics for the dispatcher because of security restrictions on low-level system calls.] Remember to call [`autometric::log_stop()`](https://wlandau.github.io/autometric/reference/log_stop.html) when you are done (or restart R) to stop logging. ```r path <- tempfile() autometric::log_start( path = path, seconds = 1, pids = controller$pids() ) controller$push(Sys.sleep(5)) controller$wait() autometric::log_stop() controller$terminate() ``` Use [`autometric::log_read()`](https://wlandau.github.io/autometric/reference/log_read.html) to read these metrics and [`autometric::log_plot()`](https://wlandau.github.io/autometric/reference/log_plot.html) to visualize them. Below, the `time` column is elapsed time in seconds since logging began, and `resident` is memory usage in megabytes. More details are at [`autometric::log_read()`](https://wlandau.github.io/autometric/reference/log_read.html). ```r data <- autometric::log_read(path) data$name <- substr(data$name, 0, 10) data #> version pid name status time core cpu resident virtual #> 1 0.0.1 2104319 local 0 0.000 0.000 0.000 195.1007 9708.073 #> 2 0.0.1 2104731 dispatcher 0 0.000 0.000 0.000 31.7399 9514.222 #> 3 0.0.1 2104319 local 0 1.000 4.998 0.078 195.1007 9842.328 #> 4 0.0.1 2104731 dispatcher 0 1.000 0.000 0.000 31.7399 9849.901 #> 5 0.0.1 2104319 local 0 2.000 0.000 0.000 195.1007 9909.436 #> 6 0.0.1 2104731 dispatcher 0 2.000 0.000 0.000 31.7399 9984.119 #> 7 0.0.1 2104319 local 0 3.001 0.000 0.000 195.1007 9909.436 #> 8 0.0.1 2104731 dispatcher 0 3.001 0.000 0.000 31.7399 10118.337 #> 9 0.0.1 2104319 local 0 4.001 0.000 0.000 195.1007 9976.545 #> 10 0.0.1 2104731 dispatcher 0 4.001 0.000 0.000 31.7399 10185.445 ``` # In `targets` As explained at , `crew` helps `targets` orchestrate pipelines. For a `targets` pipeline, to log all local processes as in the previous section, you will need to pass an already started controller to `tar_option_set()`. To do this safely: 1. Only start the controller if you are actually running a pipeline. You can use [`targets::tar_active()`](https://docs.ropensci.org/targets/reference/tar_active.html) in `_targets.R` to check if [`tar_make()`](https://docs.ropensci.org/targets/reference/tar_make.html) is running. 2. Do not run [`tar_make()`](https://docs.ropensci.org/targets/reference/tar_make.html) with `callr_function = NULL`. This ensures the logging thread quits when the [`tar_make()`](https://docs.ropensci.org/targets/reference/tar_make.html) isolated `callr` R process quits. Example `_targets.R` file: ```r library(autometric) library(crew) library(targets) library(tarchetypes) if (tar_active()) { controller <- crew_controller_local(workers = 2) controller$start() log_start( path = "log.txt", seconds = 1, pids = controller$pids() ) } tar_option_set(controller = controller) list( tar_target(name = sleep1, command = Sys.sleep(5)), tar_target(name = sleep2, command = Sys.sleep(5)) ) ```