--- title: "Logging" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Logging} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} library(crew) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` `crew` facilitates computationally demanding workloads that consume a lot of memory and CPU resources, and workloads that exhaust these resources may stall or crash. Oftentimes, a crash is silent and there are no clues for troubleshooting. But if you proactively monitor `crew` processes in log files, you can keep track of resource usage and generated prospective data which is useful in diagnosing problems. This vignette describes best practices for logging. # Logging worker processes Each [launcher plugin](https://wlandau.github.io/crew/articles/plugins.html) has its own way to turn on logging for workers. In the case of the local plugin (`crew_controller_local()`), the `crew_options_local()` has arguments to configure how messages are written to log file, and each worker instance gets its own log file in a directory you specify. Other plugins in packages such as [`crew.cluster`](https://wlandau.github.io/crew.cluster/) and [`crew.aws.batch`](https://wlandau.github.io/crew.aws.batch/) have different arguments to configure platform-specific logging mechanisms (e.g. SLURM logs or AWS CloudWatch logs). Example for the local plugin: ```r library(crew) log_directory <- tempfile() controller <- crew_controller_local( workers = 2, options_local = crew_options_local(log_directory = log_directory) ) controller$start() controller$push(print("task 1")) controller$push(print("task 2")) controller$wait() controller$terminate() log_files <- list.files(log_directory, full.names = TRUE) writeLines(readLines(log_files[1])) #> R version 4.4.0 (2024-04-24) #> ... # more session info... #> [1] "task 1" writeLines(readLines(log_files[2])) #> R version 4.4.0 (2024-04-24) #> ... # more session info... #> [1] "task 2" ``` To add resource metrics to existing log files, regardless of your platform, use the `options_metrics` argument and set `path = "/dev/stdout"`.^[Windows does not have `/dev/stdout`, but `path` can be a directory on disk as well. In that case, `crew` will write the log files there, instead of to the existing log files specified in `options_local`.] ```r library(crew) log_directory <- tempfile() controller <- crew_controller_local( name = "a", workers = 2, options_local = crew_options_local(log_directory = log_directory), options_metrics = crew_options_metrics( path = "/dev/stdout", seconds_interval = 1 ) ) controller$start() controller$push( name = "task1", command = { Sys.sleep(2.5) print("task 1") Sys.sleep(2.5) } ) controller$push({ Sys.sleep(5) print("task 2") }) controller$wait() controller$terminate() log_files <- list.files(log_directory, full.names = TRUE) ``` ```r writeLines(readLines(log_files[1])) #> R version 4.4.0 (2024-04-24) #> ... # more session info... #> __AUTOMETRIC__|0.0.5.9001|61269|crew_worker_a_1_61f738a4efc23071f87a1697|0|1729624627.805|2.500|0.250|113688576|420894687232|task1|__AUTOMETRIC__ #> __AUTOMETRIC__|0.0.5.9001|61269|crew_worker_a_1_61f738a4efc23071f87a1697|0|1729624628.809|0.000|0.000|113721344|420903075840|task1|__AUTOMETRIC__ #> [1] "task 1" #> __AUTOMETRIC__|0.0.5.9001|61269|crew_worker_a_1_61f738a4efc23071f87a1697|0|1729624629.815|0.000|0.000|113754112|420911464448|task1|__AUTOMETRIC__ #> __AUTOMETRIC__|0.0.5.9001|61269|crew_worker_a_1_61f738a4efc23071f87a1697|0|1729624630.818|0.000|0.000|113754112|420911464448|task1|__AUTOMETRIC__ #> __AUTOMETRIC__|0.0.5.9001|61269|crew_worker_a_1_61f738a4efc23071f87a1697|0|1729624631.818|0.000|0.000|113786880|420911464448|__DEFAULT__|__AUTOMETRIC__ ``` ```r writeLines(readLines(log_files[2])) #> R version 4.4.0 (2024-04-24) #> ... # more session info... #> __AUTOMETRIC__|0.0.5.9001|61278|crew_worker_a_2_4d762c8805d675b1e1626a0e|0|1729624627.820|1.700|0.170|114835456|420760469504|unnamed_task_ef5ee3df7ae|__AUTOMETRIC__ #> __AUTOMETRIC__|0.0.5.9001|61278|crew_worker_a_2_4d762c8805d675b1e1626a0e|0|1729624628.821|0.000|0.000|114900992|420769906688|unnamed_task_ef5ee3df7ae|__AUTOMETRIC__ #> __AUTOMETRIC__|0.0.5.9001|61278|crew_worker_a_2_4d762c8805d675b1e1626a0e|0|1729624629.826|0.000|0.000|114900992|420769906688|unnamed_task_ef5ee3df7ae|__AUTOMETRIC__ #> __AUTOMETRIC__|0.0.5.9001|61278|crew_worker_a_2_4d762c8805d675b1e1626a0e|0|1729624630.828|0.000|0.000|114900992|420769906688|unnamed_task_ef5ee3df7ae|__AUTOMETRIC__ #> [1] "task 2" ``` The lines with `__AUTOMETRIC__` record resource usage metrics over the life cycle of each worker. Use [`autometric::log_read()`](https://wlandau.github.io/autometric/reference/log_read.html) to read these metrics and [`autometric::log_plot()`](https://wlandau.github.io/autometric/reference/log_plot.html) to visualize them. Below, the `time` column is elapsed time in seconds since logging began, and `resident` is memory usage in megabytes. The `phase` column shows which task was running at the time the log was recorded. A phase of `__DEFAULT__` means the worker was idle (not running a task). More details are at [`autometric::log_read()`](https://wlandau.github.io/autometric/reference/log_read.html). ```r data <- autometric::log_read(log_directory) data$name <- substr(data$name, 0, 20) data #> version phase pid name status time core cpu resident virtual #> 1 0.0.5.9001 task1 61269 crew_worker_a_1_61f7 0 0.000 2.5 0.25 113.6886 420894.7 #> 2 0.0.5.9001 task1 61269 crew_worker_a_1_61f7 0 1.004 0.0 0.00 113.7213 420903.1 #> 3 0.0.5.9001 task1 61269 crew_worker_a_1_61f7 0 2.010 0.0 0.00 113.7541 420911.5 #> 4 0.0.5.9001 task1 61269 crew_worker_a_1_61f7 0 3.013 0.0 0.00 113.7541 420911.5 #> 5 0.0.5.9001 __DEFAULT__ 61269 crew_worker_a_1_61f7 0 4.013 0.0 0.00 113.7869 420911.5 #> 6 0.0.5.9001 unnamed_task_ef5ee3df7ae 61278 crew_worker_a_2_4d76 0 0.000 1.7 0.17 114.8355 420760.5 #> 7 0.0.5.9001 unnamed_task_ef5ee3df7ae 61278 crew_worker_a_2_4d76 0 1.001 0.0 0.00 114.9010 420769.9 #> 8 0.0.5.9001 unnamed_task_ef5ee3df7ae 61278 crew_worker_a_2_4d76 0 2.006 0.0 0.00 114.9010 420769.9 #> 9 0.0.5.9001 unnamed_task_ef5ee3df7ae 61278 crew_worker_a_2_4d76 0 3.008 0.0 0.00 114.9010 420769.9 ``` ```r autometric::log_plot(data, pid = 43098, metric = "resident") ``` ![](./figures/memory.png) # In `targets` As explained at , `crew` helps `targets` orchestrate pipelines. In addition to the `crew` workers above, `targets` has its own central process for managing the pipeline. To monitor this central process, simply write `if (tar_active()) log_start()` in `_targets.R`. This will write a separate file to record the resource usage statistics of the central process. Example: ```r library(autometric) library(crew) library(targets) library(tarchetypes) tar_option_set( controller = crew_controller_local( workers = 2, options_metrics = crew_options_metrics( path = "worker_log_directory/", # Worker logs live here. seconds_interval = 1 ) ) ) if (tar_active()) { log_start( path = "main_process.txt", # Statistics on the main process go here. seconds = 1 ) } list( tar_target(name = sleep1, command = Sys.sleep(5)), tar_target(name = sleep2, command = Sys.sleep(5)) ) ```