---
title: "Using nlmixr2save"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using nlmixr2save}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

`nlmixr2save` focuses on two related problems:

1. Saving `nlmixr2` results in a format that stays readable outside of `.rds`.
2. Reusing expensive fits or simulations when nothing important has changed.

## Why save a fit this way?

`saveFit()` writes a saved fit as a collection of files and then optionally zips them together. In practice, most of the saved object is reconstructed from:

- `.R` files for model definitions and objects that can be recreated as source.
- `.csv` files for tabular fit components and datasets.

That means the saved fit is largely inspectable outside of R, and it is not tied to the binary serialization format used by a specific version of `nlmixr2` or `rxode2`.

```{r, eval = FALSE}
library(nlmixr2est)
library(nlmixr2data)
library(nlmixr2save)

fit <- nlmixr2(one.cmt, theo_sd, est = "focei")

saveFit(fit, "fit")
restored_fit <- loadFit("fit")
```

When `loadFit("fit")` runs, it recreates the object by sourcing the generated `.R` files (for `fit`, this would be `fit.R`) and reading the generated `.csv` files back in. This is the main protection against a saved fit becoming unreadable simply because an internal serialization format changes.

## What gets restored?

For deterministic estimation methods, the restored object includes the full saved fit, including the original model, fit results, and `origData`.

This is especially useful for long-running estimation jobs because the saved fit can be rebuilt without repeating the estimation itself.

## Dataset-aware caching

The same `saveFit()` step can be performed and cached automatically with a new operator, `:=`, which is the quickest way to save `nlmixr2` fits (and other items).
For example:

```{r, eval = FALSE}
fit := nlmixr2(one.cmt, theo_sd, est = "focei")
```

For `nlmixr2` fits, the cache key is based on:

- the normalized model definition,
- the estimation method,
- the control and table options, and
- a simplified version of the dataset that keeps the standard estimation columns, covariates, and requested `table$keep` columns.

This has two important consequences.

### 1. Irrelevant dataset changes do not force a refit

If the new dataset only changes columns that are not used for estimation, `nlmixr2save` restores the existing fit instead of rerunning the estimation.

```{r, eval = FALSE}
fit := nlmixr2(one.cmt, theo_sd, est = "focei")

theo_sd_extra <- theo_sd
theo_sd_extra$.ignored <- "notes"

fit := nlmixr2(one.cmt, theo_sd_extra, est = "focei")
```

The expensive estimation is skipped, but the restored object still updates `fit$origData` to the new dataset. In other words, the cached fit is reused only when the meaningful estimation inputs match, while the saved object still tracks the latest original dataset you supplied.

### 2. Meaningful dataset changes do force a refit

If you change a column that matters to the fit, such as `DV`, time, dosing information, a covariate, or a `table$keep` column, the cache key changes and the estimation is run again.

This is the intended safety boundary: harmless dataset changes are absorbed, but real estimation changes invalidate the cache.

## Using `:=` for long-running fits and simulations

The `:=` operator caches the result under the object name on the left-hand side. If the saved result matches the current call, it restores the cached object instead of rerunning the call.
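
What counts as a match is easiest to see with a toy cache key. The sketch below is not the `nlmixr2save` implementation: `cache_key()` and its `used_cols` argument are hypothetical names, and the key covers only the estimation method and the columns a fit would read (the real key also covers the model and the control and table options). It uses base R only and assumes an MD5 of a temporary `.csv` is an acceptable stand-in for whatever hashing the package uses.

```r
# Hypothetical helper (not part of nlmixr2save): build a cache key from the
# estimation method plus only the columns that matter for estimation.
cache_key <- function(data, est, used_cols) {
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path), add = TRUE)
  con <- file(path, open = "w")
  writeLines(est, con)
  # Drop columns the fit never reads before hashing
  utils::write.csv(data[, used_cols, drop = FALSE], con, row.names = FALSE)
  close(con)
  unname(tools::md5sum(path))
}

dat <- data.frame(
  ID = c(1, 1, 2, 2),
  TIME = c(0, 1, 0, 1),
  DV = c(0, 2.5, 0, 3.1)
)
used <- c("ID", "TIME", "DV")

dat_notes <- dat
dat_notes$notes <- "ignored by the fit" # irrelevant column

dat_dv <- dat
dat_dv$DV[2] <- 9.9 # meaningful change

key_base <- cache_key(dat, "focei", used)
key_notes <- cache_key(dat_notes, "focei", used)
key_dv <- cache_key(dat_dv, "focei", used)
key_saem <- cache_key(dat, "saem", used)
```

Here `key_base` and `key_notes` are identical, so a cache built on this key would be reused, while `key_dv` and `key_saem` differ from `key_base` and would trigger a rerun. With `:=`, the match-or-rerun decision is made inside the operator: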

```{r, eval = FALSE}
fit := nlmixr2(one.cmt, theo_sd, est = "focei") # creates fit.zip

# Same call: restore from cache
fit := nlmixr2(one.cmt, theo_sd, est = "focei")

# Different estimation method: rerun
fit := nlmixr2(one.cmt, theo_sd, est = "saem") # overwrites fit.zip
```

For deterministic `nlmixr2` fits, the cached form is the text-and-csv-based fit bundle described above. For other functions, the cached form is usually an `.rds` file.

## Seed-aware restores for stochastic work

Some calculations depend on the random-number stream. For those, `:=` stores both the result and random-state metadata. When the result is restored, the seed is advanced to the same post-run state so downstream code sees the same random stream it would have seen if the expensive call had actually run.

This is what makes `:=` useful for long-running simulations and stochastic estimation methods: you can restore the result without silently changing the reproducibility of the rest of the script.

## Integrating `nlmixr2save` into your package

There are two common integration paths.

### Deterministic estimation

If your package returns standard `nlmixr2` fit objects through a deterministic estimation method, users can already write:

```{r, eval = FALSE}
fit := nlmixr2(model, data, est = "focei") # saves fit
```

and get zip-based save/restore behavior automatically.

### Stochastic estimation or simulation

If your package provides a stochastic workflow, use the seed-aware path instead. For plain simulation functions, register the function name with `saveFitRandom()`. For `nlmixr2` estimation methods, mark the estimator as random so `:=` knows to use the seed-aware cache path.

The companion vignette `vignette("register-simulation-functions", package = "nlmixr2save")` shows both patterns.

## Limitations

`nlmixr2save` is intentionally conservative, and a few limitations are worth keeping in mind:

1. The saved fit is primarily `.R` and `.csv`, but not exclusively.
Some fit components still fall back to `.rds` when they cannot be safely recreated as text.
2. Cache reuse only works when `:=` sees the expensive call directly. Wrapping the call inside something like `suppressMessages(nlmixr2(...))` forces the call to run before caching can intercept it.
3. If dataset simplification cannot be computed, caching falls back to hashing the full dataset. That is safe, but it can cause more reruns than strictly necessary.
4. Seed-aware restores require the same starting random state. If the seed is different, the cached stochastic result is discarded and rerun.
5. The cache files are named from the left-hand-side object name and written in the current working directory, so project-level file management still matters.
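
The seed requirement in limitation 4 follows from how a seed-aware restore has to behave. As a rough base-R illustration (not the `nlmixr2save` implementation), the sketch below stores the random state before and after a stochastic call and, on a matching starting state, restores the result and jumps the generator to the stored post-run state. `cached_random()`, `slow_sim()`, and the in-memory `cache` list are hypothetical; the package itself caches to files.

```r
# Hypothetical sketch: cache a stochastic result together with the RNG state
# before and after the call. Not the nlmixr2save implementation.
cached_random <- function(cache, fun) {
  # Make sure .Random.seed exists (it does not until the RNG is first used)
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) runif(1)
  seed_before <- get(".Random.seed", envir = globalenv())
  if (!is.null(cache$value) && identical(cache$seed_before, seed_before)) {
    # Same starting state: skip the work, then move the RNG to where it
    # would have been had the call actually run.
    assign(".Random.seed", cache$seed_after, envir = globalenv())
    return(cache)
  }
  # Different (or no) starting state: run the call and record both states
  value <- fun()
  list(
    value = value,
    seed_before = seed_before,
    seed_after = get(".Random.seed", envir = globalenv())
  )
}

calls <- 0
slow_sim <- function() {
  calls <<- calls + 1 # count how often the expensive work really runs
  mean(rnorm(1000))
}

set.seed(42)
first <- cached_random(list(), slow_sim) # runs the simulation
after_first <- runif(1)

set.seed(42)
second <- cached_random(first, slow_sim) # same seed: restored, not rerun
after_second <- runif(1)

set.seed(7)
third <- cached_random(first, slow_sim) # different seed: rerun
```

In this sketch `second` reuses the value from `first`, and `after_second` equals `after_first` because the generator was advanced to the stored post-run state. The call under `set.seed(7)` starts from a different state, so the cached value is not trusted and `slow_sim()` runs again.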