--- title: "Solver modes: cubic, trust-region fallback, and quasi-Newton polish" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Solver modes: cubic, trust-region fallback, and quasi-Newton polish} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(arcopt) ``` `arcopt` solves its subproblem in one of three modes, chosen adaptively at runtime: | Mode | Subproblem | Targeted regime | |---|---|---| | `cubic` | Eigendecomposition cubic regularization (default) | Indefinite Hessians, saddle points | | `tr` | Trust-region with adaptive radius | Flat ridges where the cubic penalty has decayed to its floor | | `qn_polish` | Wolfe-safeguarded BFGS line search (opt-in) | Strongly convex healthy basins where `hess()` calls are wasted | The default configuration runs in `cubic` mode and switches *only* if its detector signals trigger. The trust-region fallback is enabled by default but dormant on well-behaved problems. Polish mode is opt-in via `control = list(qn_polish_enabled = TRUE)`. Detector thresholds are documented in `?arcopt_advanced_controls`. The `result$diagnostics` sublist reports which mode the run finished in and how many transitions occurred: * `solver_mode_final` -- `"cubic"`, `"tr"`, or `"qn_polish"` * `ridge_switches` -- count of cubic-to-TR fallback firings * `qn_polish_switches`, `qn_polish_reverts` -- bidirectional polish transition counts * `hess_evals_at_polish_switch` -- Hessian-evaluation count at the first polish transition ## Mode 1: cubic regularization (default) The cubic subproblem eigendecomposes the true Hessian and writes its step in the eigenbasis. A single negative eigenvalue is enough to produce a step in the negative-curvature direction, so saddle escape with the true Hessian is automatic. As a small example, consider a two-component Gaussian mixture likelihood started from a symmetric near-saddle point: ```{r mixture-saddle} set.seed(42) y <- c(rnorm(100, -2, 1), rnorm(100, 2, 1)) mixture_nll <- function(theta) { -sum(log(0.5 * dnorm(y, theta[1], 1) + 0.5 * dnorm(y, theta[2], 1))) } mixture_gr <- function(theta) { p1 <- 0.5 * dnorm(y, theta[1], 1) p2 <- 0.5 * dnorm(y, theta[2], 1) w1 <- p1 / (p1 + p2) w2 <- p2 / (p1 + p2) -c(sum(w1 * (y - theta[1])), sum(w2 * (y - theta[2]))) } mixture_hess <- function(theta, h = 1e-5) { d <- 2L h_mat <- matrix(0, d, d) for (i in seq_len(d)) { e_i <- numeric(d) e_i[i] <- h h_mat[, i] <- (mixture_gr(theta + e_i) - mixture_gr(theta - e_i)) / (2 * h) } 0.5 * (h_mat + t(h_mat)) } res <- arcopt(c(0.01, 0.01), mixture_nll, mixture_gr, mixture_hess, control = list(trace = 0)) res$par # near (-2, 2) modulo label permutation res$diagnostics$solver_mode_final min(eigen(res$hessian, only.values = TRUE)$values) # > 0: local min ``` Started from the symmetric point near the origin, BFGS-class methods land at the saddle and report convergence; arcopt's cubic mode steps in the negative-curvature direction and recovers the class means. ## Mode 2: trust-region fallback (default-on, dormant) Cubic regularization can stall in *flat-ridge* regimes -- iterations where the regularization parameter `sigma_k` has been driven to its floor, the step-acceptance ratio `rho_k` is near 1, and yet the gradient is bounded away from zero because the Hessian is positive definite but near-singular in some weakly identified subspace. arcopt detects this with a sliding-window check on four signals and, on a hold of ten consecutive iterations, swaps the cubic subproblem for a trust-region subproblem. The transition is one-way and default-on. For well-conditioned problems the detector simply does not fire, and the run terminates in cubic mode: ```{r tr-dormant} rosen_fn <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2 rosen_gr <- function(x) { c(-2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2), 200 * (x[2] - x[1]^2)) } rosen_hess <- function(x) { matrix(c(1200 * x[1]^2 - 400 * x[2] + 2, -400 * x[1], -400 * x[1], 200), 2, 2) } res <- arcopt(c(-1.2, 1), rosen_fn, rosen_gr, rosen_hess, control = list(trace = 0)) res$diagnostics$solver_mode_final res$diagnostics$ridge_switches ``` The fallback's value shows up on problems with weakly identified parameter subspaces -- e.g., Dirichlet-process-style mixtures with unused stick-breaking segments, where the cubic-only configuration exhausts its iteration budget at a strongly indefinite saddle but the trust-region fallback fires once and reaches a true local minimum. That regime is too data- and dependency-heavy for an inline example; see the JSS manuscript referenced from the package README for a worked case. If you want to disable the fallback explicitly: ```{r, eval = FALSE} arcopt(x0, fn, gr, hess, control = list(tr_fallback_enabled = FALSE)) ``` ## Mode 3: quasi-Newton polish (opt-in) In a strongly convex healthy basin where the cubic regularization parameter has dropped to its floor and the Hessian is essentially constant across iterates, every `hess()` call recomputes second derivatives that contribute negligibly to step quality. Polish mode recognizes this regime via a five-signal detector and replaces the cubic subproblem with a Wolfe line search along a BFGS-approximated Newton direction, suspending further `hess()` calls until either convergence or a revert. Polish is *opt-in* (`qn_polish_enabled = FALSE` by default). Enable it for long-running smooth problems with expensive Hessians (analytic AD via Stan, finite differences, or any pipeline where each second derivative takes appreciable wall-clock time): ```{r polish} fn <- function(x) sum(0.5 * x^2 + x^4 / 12) gr <- function(x) x + x^3 / 3 hess <- function(x) diag(1 + x^2) x0 <- rep(10, 5) res_default <- arcopt(x0, fn, gr, hess, control = list(maxit = 50, trace = 0)) res_polish <- arcopt(x0, fn, gr, hess, control = list(maxit = 50, trace = 0, qn_polish_enabled = TRUE)) c(default_hess = res_default$evaluations$hess, polish_hess = res_polish$evaluations$hess) res_polish$diagnostics$solver_mode_final # "qn_polish" res_polish$diagnostics$qn_polish_switches # 1 ``` On this five-dimensional smooth-convex problem the polish detector fires once after the initial cubic-Newton approach steps, and the remaining iterations consume zero `hess()` calls -- typically a saving of 30-50% of the Hessian budget at no loss of accuracy. ## Choosing modes The defaults are designed so most users do not need to touch the mode switches. The cubic mode handles indefiniteness and saddle escape; the trust-region fallback handles flat-ridge stagnation; and polish is reserved for problems where `hess()` is genuinely expensive and the basin is large enough for the detector window to fill. A practical heuristic: * If the iterate enters a saddle and stalls -- you have the right default. Make sure `hess()` returns the *true* Hessian, not a positive-definite proxy. * If `arcopt` exhausts its iteration budget on a problem that should be locally well-conditioned -- inspect `result$diagnostics$ridge_switches`. If it is 0 and the gradient is bounded away from zero, you may be in a flat-ridge regime; the TR-fallback should already be on by default. * If a Hessian-supplied run is fast in clock time but the per-call Hessian is expensive (Stan AD, FD), enable `qn_polish_enabled = TRUE` and check `result$diagnostics$qn_polish_switches`. For any of these cases the diagnostic fields tell you which mode actually carried the run, and the per-mode controls in `?arcopt_advanced_controls` provide the tuning surface.