--- title: "Interval Indices" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Interval Indices} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") if(requireNamespace("pkgload", quietly = TRUE)) { pkgload::load_all(".", quiet = TRUE) } else if(requireNamespace("Immutables", quietly = TRUE)) { library(Immutables) } else { stop("Need either installed 'Immutables' or the 'pkgload' package to render this vignette.") } ``` ## Interval index basics `interval_index` is a persistent structure that associates elements with interval endpoints and supports fast interval-relation queries. Intervals are stored sorted by start endpoint, with stable first-in-first-out (FIFO) ordering for ties. All operations are persistent, and return modified copies. Peeking returns the stored element (which may be any type), popping returns a list with `$value` containing the stored element, `$start` and `$end` the interval endpoints, and `$remaining` the rest of the index without that element. ```{r} ix <- interval_index( "A", "B", "C", start = c(1, 2, 4), end = c(3, 4, 5) ) ix ``` The `as_interval_index()` variant builds an index from a vector or list of elements paired with start/end vectors, useful when endpoints are already in separate vectors. ```{r} ix2 <- as_interval_index(c("phase1", "phase2", "phase3"), start = c(1, 3, 2), end = c(4, 5, 6)) ix2 ``` `as_flexseq(ix)` returns a payload-only `flexseq`; interval bounds and any custom monoids are dropped (see `as.list` to convert and preserve interval metadata). ## Point and interval queries `peek_point()` returns the *first* (in FIFO order) element whose interval contains a query point; `peek_all_point()` returns all matches as an `interval_index` slice. ```{r pointops} peek_point(ix, point = 2) peek_all_point(ix, point = 2) ``` `pop_point()` removes and returns the first match as `$value`/`$start`/`$end`/`$remaining`. Its "all" counterpart `pop_all_point()` returns `$elements` (an interval_index of removed matches) and `$remaining`. ```{r} res <- pop_point(ix, 2) res$value res$start res$end res$remaining ``` By default the four `*_point()` functions test interval containment. The `match_at` parameter lets them match instead on coordinate equality at the entry's start, end, or either endpoint. This is useful for sweep-line patterns where you need to identify intervals arriving or retiring at a specific event point. `bounds` is ignored for these modes (coordinate equality is structural). ```{r} # Entries whose start coordinate equals 2 peek_all_point(ix, point = 2, match_at = "start") # Entries whose end coordinate equals 3 — the sweep-line retirement query peek_all_point(ix, point = 3, match_at = "end") # Either endpoint equals 3 peek_all_point(ix, point = 3, match_at = "either") ``` Interval-vs-interval queries select overlaps of different kinds. First we'll build a new index to query: ```{r} ix3 <- interval_index( "narrow", "wide", "right", "inner", start = c(2, 1, 4, 3), end = c(3, 6, 5, 4) ) ix3 ``` `peek_overlaps()` returns any interval that shares at least one point with the query range. `peek_containing()` returns intervals that fully contain the query range. `peek_within()` returns intervals fully contained by the query range. ```{r} # overlaps [2, 5]: any shared point peek_all_overlaps(ix3, start = 2, end = 5) # containing [3, 4]: index interval must enclose query peek_all_containing(ix3, start = 3, end = 4) # within [1, 5]: index interval must fit inside query peek_all_within(ix3, start = 1, end = 5) ``` All relation queries have symmetric `pop_*` and `pop_all_*` variants following the same return contract as `pop_point()` and `pop_all_point()`. ## Boundary modes Endpoint inclusion is controlled by two related parameters. The constructors take `default_query_bounds` — one of `"[)"` (default, left-closed right-open), `"[]"` (closed), `"()"` (open), or `"(]"` (left-open right-closed) — which sets the convention used when queries on this index don't specify their own. Query functions (`peek_*` / `pop_*`) then accept a `bounds` override for the duration of a single call. ```{r} edge <- interval_index("E", start = 1, end = 3, default_query_bounds = "[)") # right endpoint excluded by default peek_point(edge, 3) # override to closed bounds for this query peek_point(edge, 3, bounds = "[]") ``` Note that bounds affect what counts as a valid interval: under `"[)"`, an interval with `start == end` like [2, 2) is empty and will never match any query. Use `"[]"` if you need point intervals where `start == end`. ```{r} # [2, 2) contains nothing under half-open bounds pt_open <- interval_index("X", start = 2, end = 2, default_query_bounds = "[)") peek_point(pt_open, point = 2) # [2, 2] contains exactly the point 2 pt_closed <- interval_index("X", start = 2, end = 2, default_query_bounds = "[]") peek_point(pt_closed, point = 2) ``` ## Insertion and endpoint extrema `insert()` adds an element preserving interval start order. ```{r} ix4 <- insert(ix, "D", start = 2, end = 6) ix4 ``` `min_endpoint()` and `max_endpoint()` return the current smallest start and largest end endpoints (not the stored elements). ```{r} min_endpoint(ix) max_endpoint(ix) min_endpoint(interval_index()) ``` ## Empty indices Query and endpoint helpers are non-throwing on empty indices, and `length()` allows for empty checking. ```{r} empty_ix <- interval_index() length(empty_ix) peek_point(empty_ix, point = 1) pop_overlaps(empty_ix, start = 1, end = 5) ``` ## Named interval indices Interval indices can carry names, set either at construction or through `as_interval_index()` on a named list. Names support read-only indexing via `[`, `[[`, and `$`; all replacement forms (`[<-`, `[[<-`, `$<-`) error, because index-based assignment may break the ordering invariant. Named and unnamed elements cannot be mixed within one index. ```{r} ix_named <- as_interval_index( setNames(list("alice", "bob", "carol"), c("a", "b", "c")), start = c(1, 3, 2), end = c(4, 5, 6) ) ix_named ix_named[["b"]] ix_named[c("a", "c")] ix_named[1] try(ix_named$a <- "!!") ``` ## Transforming, iterating, merging `fapply()` maps a function over elements while preserving intervals and order. The function receives `(value, start, end)`, or `(value, start, end, name)` if it accepts a fourth argument. Endpoints and names are passed in read-only, and the return value replaces the stored element (similar to `lapply()` for named R lists). ```{r} fapply(ix, function(value, start, end) paste0(value, "[", start, ",", end, "]")) ``` `loop()` (re-exported from the **coro** package) walks the index in interval start-position order, yielding bare values. Interval endpoints are dropped from each yield; use `fapply()` if your callback needs `(value, start, end)`. ```{r} loop(for (v in ix) print(v)) ``` Plain `for (v in ix)` (without `loop()`) does *not* dispatch to the iteration protocol, it walks the underlying structure and yields internal nodes rather than elements. Always wrap with `loop()`. `merge(x, y)` combines two interval indices into a new one in start-position order, preserving left-biased FIFO on tied starts. The general case runs in O(m + n); disjoint start ranges collapse to O(log(min(m, n))) via concat. The reserved `.ivx_min_end` / `.ivx_max_end` monoids recompute automatically, so interval-relation queries work immediately on the result. ```{r} a <- as_interval_index(c("A1", "A2"), start = c(1, 5), end = c(4, 8)) b <- as_interval_index(c("B1", "B2"), start = c(3, 7), end = c(6, 10)) m <- merge(a, b) peek_all_point(m, 3) ``` Both indices must share the same endpoint type, the same `bounds` convention, and the same monoid set; mismatches error. Both inputs are left unmodified.