---
title: "Tolerance"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Tolerance}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(discretes)
```

Many operations in the discretes package involve comparing a *queried* numeric value to the discrete values in a numeric series. Because most decimal fractions are not represented exactly in floating point, values that "should" match can differ by a tiny amount after arithmetic.

In this package, `tol` is an **absolute tolerance** used when deciding whether a query value is "close enough" to a discrete value in the series.

- Setting **`tol = 0`** requests **exact** comparisons.
- The default is `sqrt(.Machine$double.eps)`, which is small but usually enough to ignore round-off noise.

## Where is `tol` actually used?

Tolerance is not applied to the series you see at the top level. It is **passed down** through any nested or transformed series and used only when we check whether a value is in the series at the underlying level. The two places where it is used are:

- **Arithmetic progressions** (`arithmetic()`): we decide if a value is in the series using the implied step index \((x - representative) / spacing\). Because of floating point, this index is rarely an exact integer (e.g. `(0.3 - 0) / 0.1`). A value is treated as a discrete value if that index is within `tol` of an integer.
- **Numeric-vector-based series** (plain numeric vectors or `as_discretes()`): we decide if a value is in the series by checking whether it is within `tol` of any stored discrete value.

For transformed series (e.g. `1 / arithmetic()`), the queried value is mapped back to the underlying series and the same logic is applied there with the same `tol`; no separate tolerance is applied to the transformed values.

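The two checks described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper names `in_arithmetic()` and `in_values()` are hypothetical, and the real code may treat edge cases (such as infinite values) differently.

```{r}
# Hypothetical helpers for illustration only; they are not part of discretes.
default_tol <- sqrt(.Machine$double.eps)

# Arithmetic progression: is the implied step index within `tol` of an integer?
in_arithmetic <- function(x, representative, spacing, tol = default_tol) {
  index <- (x - representative) / spacing
  abs(index - round(index)) <= tol
}

# Numeric vector: is `x` within `tol` of any stored value?
in_values <- function(x, values, tol = default_tol) {
  any(abs(values - x) <= tol)
}

# The step index (0.3 - 0) / 0.1 is not exactly 3 in floating point.
in_arithmetic(0.3, representative = 0, spacing = 0.1)
in_arithmetic(0.3, representative = 0, spacing = 0.1, tol = 0)

# 0.1 * 3 is not stored as exactly 0.3.
in_values(0.3, c(0, 0.1, 0.2, 0.1 * 3))
in_values(0.3, c(0, 0.1, 0.2, 0.1 * 3), tol = 0)
```

In both sketches the comparison uses `<=`, so `tol = 0` reduces to an exact match.
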
## Example: `tol` with `arithmetic()`

Without a tolerance, values can be hard to recognize as discrete values in an arithmetic series because the implied index might be `2.999999999` instead of `3`.

```{r}
x <- arithmetic(representative = 0, spacing = 0.1)
has_discretes(x, 0.3)
has_discretes(x, 0.3, tol = 0)
```

## Example: `tol` with an explicit numeric series

Even if a series is represented as a numeric vector, you can still run into tiny floating-point mismatches.

```{r}
v <- c(0, 0.1, 0.2, 0.1 * 3) # last entry is not exactly 0.3
has_discretes(v, 0.3)
has_discretes(v, 0.3, tol = 0)
```

If you want "mathematical series" behaviour for an explicit series, use `tol = 0`. If you want "numerical computing" behaviour (robust to round-off), keep `tol` positive.

## Example: preserving numeric vectors

When a numeric vector is converted to a "discretes" object via `as_discretes()`, the root numeric vector is preserved, even if the series gets transformed. Compare transforming a numeric vector directly with transforming that same vector expressed as a "discretes" object.

```{r}
raw <- 10^(-5:5)
raw
preserved <- 10^as_discretes(-5:5)
preserved
```

The plain numeric vector is transformed right away, whereas the "discretes" object keeps the original vector as its base series. This matters because errors propagate differently. For example, an exponent that is off by `1e-10` produces a much larger error after transformation, one that may not be within the default `tol`:

```{r}
# Exponent slightly off from `5`
q <- 10^(5 - 1e-10)
# Error magnitude after transformation
10^5 - q
```

As a result, the "is this value in the series?" check fails for the eagerly transformed vector but succeeds when the vector is preserved, because the tolerance is applied to the root series.

```{r}
has_discretes(raw, q)
has_discretes(preserved, q)
```

## Choosing a `tol`

- **Default is usually fine**: `sqrt(.Machine$double.eps)` is intended to smooth over tiny floating-point noise.
- **Use `tol = 0` for exactness**: when you want explicit sets to behave exactly, which may be useful when working with 'integer' storage mode.
- **Increase `tol` for noisy values**: if your values come from computation or measurement and you notice values incorrectly treated as not in the series, increase `tol` accordingly.
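
To judge whether the default is adequate, it can help to compare the size of the discrepancy you expect with the tolerance itself. The sketch below uses only base R and mirrors the "within `tol` of a stored value" check for numeric-vector-based series; the variable names are illustrative, and the `1e-3` measurement error is an assumed example value.

```{r}
default_tol <- sqrt(.Machine$double.eps)
default_tol

# Round-off from simple arithmetic is far below the default tolerance.
roundoff <- abs(0.1 * 3 - 0.3)
roundoff < default_tol

# A measurement error of 1e-3 (assumed example) is far above the default,
# so a larger `tol` would be needed to treat such a value as a match.
measured <- 0.3 + 1e-3
abs(measured - 0.3) < default_tol
abs(measured - 0.3) < 1e-2
```
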