---
title: "Segments: the atoms of geometry"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Segments: the atoms of geometry}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## What is a segment?

Two connected vertices. 

A segment is the atom of geometry. Every line is a sequence of segments. Every
polygon boundary is a cycle of segments. Every track step between consecutive
GPS fixes is a segment with a duration. It's the simplest spatial primitive that
encodes *connection* rather than just *position*.

Simple Features --- a dominant spatial data model --- hides segments inside
sealed coordinate sequences. A polygon is a ring of coordinates. A line is a
list of coordinates. The connections between consecutive vertices are implicit.
To discover them you extract coordinates, pair consecutive points, and rebuild
the structure. 

wkpool makes segments explicit.

## The cost of sealed geometry

Consider two adjacent polygons sharing a boundary:

```{r setup}
library(wkpool)
library(wk)

two_squares <- c(
  wkt("POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"),
  wkt("POLYGON ((1 0, 2 0, 2 1, 1 1, 1 0))")
)
```

In Simple Features, these are two independent geometries. Each owns its
coordinates. The shared edge from (1,0) to (1,1) is duplicated --- encoded
separately in both polygons with no structural record that they share a
boundary. To discover the shared edge, you'd need `sf::st_intersection()` or
equivalent, which runs the full GEOS machinery on every coordinate.

With wkpool, decompose to segments and the structure is *visible*:

```{r decompose}
pool <- establish_topology(two_squares)
pool
```

At this stage every coordinate has been minted as a vertex with an integer ID,
and every consecutive pair within a ring has become a segment. But the vertices
from different features are still separate --- the shared edge exists as
duplicate coordinate pairs. `topology_report()` shows the raw state:

```{r report-raw}
topology_report(pool)
```

Now merge coincident vertices:

```{r merge}
pool <- merge_coincident(pool)
topology_report(pool)
```

After merging, the two copies of (1,0) are now the same vertex ID, and the
two copies of (1,1) are the same vertex ID. The shared boundary is a structural
fact --- two segments in different features that reference the same vertex pair.

## What becomes easy

With segments as explicit objects, operations that are expensive on Simple
Features become table lookups.

**Shared boundaries** --- segments that appear in more than one feature:

```{r shared}
find_shared_edges(pool)
```

In Simple Features this requires geometric intersection tests. Here it's a
grouped count on vertex-pair indices.

**Neighbours** --- which features share structure:

```{r neighbours}
find_neighbours(pool)
```

The adjacency graph falls out of a self-join on shared segments.

**Internal boundaries** --- shared edges with opposite winding, the defining
signature of a true polygon boundary (as opposed to a self-touching ring):

```{r internal}
find_internal_boundaries(pool)
```

**Topology-preserving simplification** --- reduce vertex count while keeping
shared boundaries aligned. With wkpool, shared vertices are identified by index.
Simplify the vertex pool, and every feature that references those vertices
updates together. In Simple Features you simplify each geometry independently,
and shared boundaries drift apart.

## The full topology

wkpool provides the arc-node decomposition.   

**Vertex degree** tells you where topology is interesting:

```{r degree}
vertex_degree(pool)
```

Degree-2 vertices are interior to an arc --- they're the "pass-through" points.
Degree != 2 means a node: a branch point, an endpoint, or a junction where
multiple features meet.

**Nodes** --- the vertices where topology happens:

```{r nodes}
find_nodes(pool)
```

**Arcs** --- maximal sequences of segments between nodes:

```{r arcs}
find_arcs(pool)
arc_node_summary(pool)
```

This is the arc-node model from computational geometry (de Berg et al. 2008),
expressed as data frames rather than linked-list pointer structures. The same
decomposition that PostGIS topology provides, and Arc/INFO is probably the most
famous implementation, remnants exist in .e00 format and some other softwares. This
linear-only topology also highlights clearly how polygons are *just lines*, they aren't
composed of 2D topology (triangles) in many modern geo software. 

## Cycles and winding

Cycles are closed loops in the segment graph. For polygon data, they correspond
to rings:

```{r cycles}
cycles <- find_cycles(pool)
cycles
```

Signed area from the shoelace formula distinguishes outer rings from holes
--- negative area is an outer ring in the SF convention:

```{r classify}
classify_cycles(pool)
```

This is *intrinsic* --- we observe winding from the coordinate order, we don't
declare it via metadata. The geometry tells you what it is.

## The round trip

wkpool decomposes Simple Features into vertices and segments. It goes back too:

```{r roundtrip}
# Arcs as linestrings (the topological boundary representation)
arcs_to_wkt(pool)

# Cycles as polygons
cycles_to_wkt(pool)
```

This is the "SF at the edges" principle. Read geometry in any format wk can
handle (WKT, WKB, sf, geos, s2). Decompose to vertices and segments for
analysis. Recompose to Simple Features when you need to write, plot, or hand
off to another tool.

Simple Features is the interchange format. Segments are the working format.

## Integration: triangulation

The vertex pool and segment table map directly to constrained triangulation
inputs --- no coordinate extraction or reformatting needed:

RTriangle PSLG format (without classing)

```{r interch, include = FALSE}
pslg <- as_pslg(pool)
str(pslg)
#RTriangle::pslg(P = pslg$P, S = pslg$S)

```
The pool *is* the PSLG. The mapping is structural, not a conversion.

## Segments in movement data

wkpool applies beyond polygons. Track data --- animal movement, vessel
tracks, GPS logs --- is inherently a sequence of segments. Each step between
consecutive fixes carries a bearing, distance, speed, and duration.

The traipse package works directly on this segment view:

```{r tracks, eval = requireNamespace("traipse", quietly = TRUE)}
library(traipse)

# 5 GPS fixes from a Southern Ocean track
x <- c(147.0, 147.5, 148.1, 148.3, 148.0)
y <- c(-42.0, -42.3, -42.5, -42.2, -41.9)

# Each function operates on the implicit segment between consecutive points
track_bearing(x, y)
track_distance(x, y)
```

traipse doesn't construct a LINESTRING and decompose it. It works directly
on the consecutive vertex pairs. The segments *are* the data. The trip package
builds on this: a trip is a grouped tibble of ordered coordinates, and every
analytical operation runs on the segment view via `dplyr::lead()` and
`dplyr::lag()` within groups. No geometry column, no format conversion, no
decomposition/reconstruction cycle.

## Why this matters now

Apache Arrow is becoming the universal columnar memory format, and GeoArrow is
defining how geometry lives in Arrow's layout. A GeoArrow polygon is already
stored as coordinate arrays with offset indices --- structurally closer to a
vertex pool than to a sealed WKB blob.

wkpool is a proof of concept for what GeoArrow-native analysis could look like:
work on decomposed vertices and segments directly, without round-tripping through
Simple Features semantics. The same principle applies in any language.

## The lineage

The ideas in wkpool trace back to experiences with Arc/INFO, Myriax Eonfusion, and the
silicate package
[10.32614/CRAN.package.silicate](https://CRAN.R-project.org/package=silicate),
which explored topological data models for spatial data in R using PATH/PATH0,
ARC/ARC0, SC/SC0, TRI/TRI0 (segment-and-constraint) models. wkpool is leaner:
the vertex pool and segment table, built on wk handlers, without the full
ontology of silicate. Every network edge, every polygon boundary, every
track step reduces to a segment. 

## Further reading

- de Berg, M. et al. (2008). *Computational Geometry: Algorithms and
  Applications*. Springer. (Arc-node topology, DCEL)