Help for package geokmeans

Type: Package

Title: A Collection of Fast, Exact and Eco-Friendly k-Means Clustering Algorithms

Version: 0.1.0

Description: A collection of fast k-means clustering algorithms under a single, uniform interface. The core method is Geometric-k-means, a bound-free algorithm of Sharma et al. (2026) <doi:10.1007/s10994-025-06891-1> that uses geometry to restrict computation to the data points able to change clusters, substantially reducing distance computations and runtime while returning the same result as standard k-means. Also included are Lloyd's algorithm, Elkan, Hamerly, Annulus, Exponion, and Ball k-means. All algorithms are implemented in 'C++' via 'Rcpp' and 'RcppEigen' and return the final centroids, optional per-point cluster assignments, and computational statistics.

License: GPL-3

Encoding: UTF-8

Imports: Rcpp

LinkingTo: Rcpp, RcppEigen

SystemRequirements: C++17

Suggests: testthat (≥ 3.0.0), knitr, rmarkdown

Config/testthat/edition:

VignetteBuilder: knitr

URL: https://github.com/parichit/Geometric-k-means

BugReports: https://github.com/parichit/Geometric-k-means/issues

NeedsCompilation: yes

Packaged: 2026-06-17 16:23:38 UTC; parichit

Author: Parichit Sharma [aut, cre, cph], Hasan Kurban [aut]

Maintainer: Parichit Sharma <parishar@iu.edu>

Config/roxygen2/version: 8.0.0

Repository: CRAN

Date/Publication: 2026-06-22 16:10:02 UTC

geokmeans: Fast and Eco-Friendly k-Means Clustering Algorithms

Description

Fast C++ implementations of several k-means clustering algorithms (Lloyd's algorithm, Elkan, Hamerly, Annulus, Exponion, Ball k-means, and the bound-free Geometric-k-means method), exposed to R through a uniform interface.

Details

The main entry points are geo_kmeans(), lloyd_kmeans(), elkan_kmeans(), hamerly_kmeans(), annulus_kmeans(), exponion_kmeans(), ball_kmeans(), and the dispatcher kmeans_dc().

Author(s)

Maintainer: Parichit Sharma <parishar@iu.edu> [copyright holder]

Authors:

Parichit Sharma <parishar@iu.edu> [copyright holder]
Hasan Kurban

References

Sharma, P., Stanislaw, M., Kurban, H., Kulekci, O., and Dalkilic, M. (2026). Geometric-k-means: A Bound Free Approach to Fast and Eco-Friendly k-means. doi:10.1007/s10994-025-06891-1

k-Means clustering algorithms

Description

Run one of the bundled k-means variants on a numeric data matrix. All functions share the same interface and return value; they differ only in the acceleration strategy used internally. geo_kmeans() runs the bound-free Geometric-k-means method.

Usage

geo_kmeans(
  data,
  centers,
  iter_max = 100L,
  threshold = 0.001,
  init = c("random", "sequential"),
  seed = NULL,
  with_labels = TRUE,
  verbose = FALSE,
  drop_empty = TRUE
)

lloyd_kmeans(
  data,
  centers,
  iter_max = 100L,
  threshold = 0.001,
  init = c("random", "sequential"),
  seed = NULL,
  with_labels = TRUE,
  verbose = FALSE,
  drop_empty = TRUE
)

elkan_kmeans(
  data,
  centers,
  iter_max = 100L,
  threshold = 0.001,
  init = c("random", "sequential"),
  seed = NULL,
  with_labels = TRUE,
  verbose = FALSE,
  drop_empty = TRUE
)

hamerly_kmeans(
  data,
  centers,
  iter_max = 100L,
  threshold = 0.001,
  init = c("random", "sequential"),
  seed = NULL,
  with_labels = TRUE,
  verbose = FALSE,
  drop_empty = TRUE
)

annulus_kmeans(
  data,
  centers,
  iter_max = 100L,
  threshold = 0.001,
  init = c("random", "sequential"),
  seed = NULL,
  with_labels = TRUE,
  verbose = FALSE,
  drop_empty = TRUE
)

exponion_kmeans(
  data,
  centers,
  iter_max = 100L,
  threshold = 0.001,
  init = c("random", "sequential"),
  seed = NULL,
  with_labels = TRUE,
  verbose = FALSE,
  drop_empty = TRUE
)

ball_kmeans(
  data,
  centers,
  iter_max = 100L,
  threshold = 0.001,
  init = c("random", "sequential"),
  seed = NULL,
  with_labels = TRUE,
  verbose = FALSE,
  drop_empty = TRUE
)

Arguments

data

A numeric matrix or data frame with observations in rows and features in columns. Missing values are not allowed.

centers

Either a single positive integer giving the number of clusters k, or a numeric matrix of initial cluster centres (one centroid per row, with ncol(centers) == ncol(data)).

iter_max

Maximum number of iterations.

threshold

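Convergence threshold on centroid movement: iteration stops before iter_max is reached once the centroids move less than this amount between successive iterations.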

init

Initialisation strategy when centers is a number: "random" (random observations) or "sequential" (the first k observations). Ignored when centers is a matrix.

seed

Optional integer seed for the random initialisation, or NULL (the default). Initialisation uses R's random number generator: supplying a seed sets it via set.seed() so the result is reproducible, while NULL leaves the RNG untouched, so the ambient stream (e.g. a preceding set.seed() in your session) is honoured.

with_labels

Logical; if TRUE (default) the result includes a per-observation cluster assignment computed from the final centroids.

verbose

Logical; if TRUE, print the algorithm's convergence message.

drop_empty

Logical; if TRUE (default), clusters that end up with no assigned observations are removed from the result and the remaining cluster labels are renumbered, with a message. Requesting more clusters than the number of distinct rows in data is always an error.
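
The meaning of centers, init and seed described above can be illustrated with a short base-R sketch. The helper pick_initial_centers() is hypothetical and not part of geokmeans; it only mimics the documented semantics and assumes that "random" draws k distinct rows with R's own generator, which may not match the package's sampling exactly.

```r
# Hypothetical helper, NOT part of geokmeans: mimics the documented
# meaning of `centers`, `init` and `seed` using base R only.
pick_initial_centers <- function(data, centers,
                                 init = c("random", "sequential"),
                                 seed = NULL) {
  data <- as.matrix(data)
  # A matrix of starting centroids is used as supplied; `init` is ignored.
  if (is.matrix(centers)) {
    stopifnot(ncol(centers) == ncol(data))
    return(centers)
  }
  init <- match.arg(init)
  k <- as.integer(centers)
  stopifnot(k >= 1L, k <= nrow(data))
  # Only touch the RNG when a seed is supplied; NULL keeps the ambient stream.
  if (!is.null(seed)) set.seed(seed)
  rows <- if (init == "sequential") seq_len(k) else sample.int(nrow(data), k)
  data[rows, , drop = FALSE]
}

X <- matrix(rnorm(40), ncol = 2)
a <- pick_initial_centers(X, 3, seed = 42)
b <- pick_initial_centers(X, 3, seed = 42)
identical(a, b)                      # same seed, same starting rows
s <- pick_initial_centers(X, 3, init = "sequential")
```

With seed = NULL the sketch draws from whatever state the session's generator is in, so a preceding set.seed() call still controls the outcome.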

Value

An object of class "geokmeans": a list with components

centroids: A k x ncol(data) matrix of final cluster centres.
cluster: Integer vector of cluster ids (1-based), if with_labels = TRUE.
iterations: Number of iterations performed.
distance_calculations: Total number of point-to-centroid distance computations.
method: The algorithm used.
k: The number of clusters.
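
To make the components above concrete, here is a plain Lloyd iteration in base R that returns a list with the same component names. It is a teaching sketch, not the package's C++ implementation. Two assumptions are made for illustration: centroid movement is measured as the largest Euclidean shift of any centroid, and every point-to-centroid distance evaluated counts once. The name lloyd_sketch() is hypothetical.

```r
# Hypothetical base-R sketch of Lloyd's algorithm; NOT the geokmeans code.
lloyd_sketch <- function(data, centroids, iter_max = 100L, threshold = 0.001) {
  data <- as.matrix(data)
  n <- nrow(data)
  k <- nrow(centroids)
  iterations <- 0L
  distance_calculations <- 0
  cluster <- integer(n)
  while (iterations < iter_max) {
    iterations <- iterations + 1L
    # n x k matrix of squared distances from every point to every centroid
    d2 <- vapply(seq_len(k), function(j) {
      rowSums((data - matrix(centroids[j, ], n, ncol(data), byrow = TRUE))^2)
    }, numeric(n))
    d2 <- matrix(d2, nrow = n)
    distance_calculations <- distance_calculations + n * k
    # Assign each point to its nearest centroid (1-based ids)
    cluster <- max.col(-d2, ties.method = "first")
    # Recompute centroids; a cluster with no members keeps its old centroid
    new_centroids <- centroids
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) {
        new_centroids[j, ] <- colMeans(data[members, , drop = FALSE])
      }
    }
    # Assumed convergence measure: largest centroid shift in this iteration
    shift <- max(sqrt(rowSums((new_centroids - centroids)^2)))
    centroids <- new_centroids
    if (shift < threshold) break
  }
  list(centroids = centroids, cluster = cluster, iterations = iterations,
       distance_calculations = distance_calculations,
       method = "lloyd-sketch", k = k)
}

set.seed(1)
X <- rbind(matrix(rnorm(100, 0), ncol = 2),
           matrix(rnorm(100, 5), ncol = 2))
fit <- lloyd_sketch(X, X[c(1, 51), ])
fit$iterations
fit$distance_calculations
```

In this sketch every iteration costs n * k distance evaluations, which is the baseline the accelerated variants aim to reduce. Note that the labels here come from the last assignment step, whereas the package documents labels computed from the final centroids.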

References

Sharma, P., Stanislaw, M., Kurban, H., Kulekci, O., and Dalkilic, M. (2026). Geometric-k-means: A Bound Free Approach to Fast and Eco-Friendly k-means. doi:10.1007/s10994-025-06891-1

Examples

set.seed(1)
X <- rbind(matrix(rnorm(100, 0), ncol = 2),
           matrix(rnorm(100, 5), ncol = 2))
fit <- geo_kmeans(X, centers = 2)
fit$centroids
table(fit$cluster)

# Supplying explicit starting centroids:
geo_kmeans(X, centers = X[c(1, 51), ])
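
# Sketch of the drop_empty = TRUE behaviour:

The removal and renumbering described for drop_empty can be pictured with the base-R sketch below. The helper drop_empty_clusters() and the wording of its message are hypothetical; the package's own message and internals may differ.

```r
# Hypothetical helper, NOT part of geokmeans: removes centroids that have
# no assigned observations and renumbers the remaining labels from 1.
drop_empty_clusters <- function(centroids, cluster) {
  k <- nrow(centroids)
  counts <- tabulate(cluster, nbins = k)   # observations per cluster id
  keep <- which(counts > 0L)               # ids of non-empty clusters
  if (length(keep) < k) {
    message(k - length(keep), " empty cluster(s) removed")
  }
  list(centroids = centroids[keep, , drop = FALSE],
       cluster = match(cluster, keep),     # old id -> new consecutive id
       k = length(keep))
}

centroids <- rbind(c(0, 0), c(9, 9), c(5, 5))
cluster <- c(1L, 1L, 3L, 3L, 3L)           # cluster 2 has no observations
res <- drop_empty_clusters(centroids, cluster)
res$cluster
res$k
```

After the call, the former cluster 3 is relabelled as cluster 2 and the result holds two centroids.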

Run a k-means variant by name

Description

A thin dispatcher over the individual algorithm functions.

Usage

kmeans_dc(
  data,
  centers,
  method = c("geokmeans", "lloyd", "elkan", "hamerly", "annulus", "exponion", "ball"),
  ...
)

Arguments

data

A numeric matrix or data frame with observations in rows and features in columns. Missing values are not allowed.

centers

Either a single positive integer giving the number of clusters k, or a numeric matrix of initial cluster centres (one centroid per row, with ncol(centers) == ncol(data)).

method

The algorithm to use. One of "geokmeans", "lloyd", "elkan", "hamerly", "annulus", "exponion", "ball".

...

Further arguments passed to the chosen algorithm.

Value

An object of class "geokmeans"; see geo_kmeans().

Examples

set.seed(1)
X <- rbind(matrix(rnorm(100, 0), ncol = 2),
           matrix(rnorm(100, 5), ncol = 2))
kmeans_dc(X, centers = 2, method = "elkan")
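
# Comparing every method from identical starting centroids:

Because the dispatcher takes the method by name, it lends itself to looping over all variants. The sketch below uses only the arguments and result components documented here, and is guarded so that it does nothing when the package is not installed. What the comparison prints is not asserted; it is meant as a way to check the equal-result claim and the distance counts on your own data.

```r
methods <- c("geokmeans", "lloyd", "elkan", "hamerly",
             "annulus", "exponion", "ball")

if (requireNamespace("geokmeans", quietly = TRUE)) {
  set.seed(1)
  X <- rbind(matrix(rnorm(100, 0), ncol = 2),
             matrix(rnorm(100, 5), ncol = 2))
  start <- X[c(1, 51), ]   # identical starting centroids for every method

  fits <- lapply(methods, function(m) {
    geokmeans::kmeans_dc(X, centers = start, method = m)
  })
  names(fits) <- methods

  # Point-to-centroid distance computations reported by each method
  print(sapply(fits, function(f) f$distance_calculations))

  # Whether each method's centroids match Lloyd's up to numerical tolerance
  print(sapply(fits, function(f) {
    isTRUE(all.equal(f$centroids, fits$lloyd$centroids))
  }))
}
```

Passing a matrix of starting centroids, rather than a cluster count, removes the random initialisation from the comparison.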
