Type: | Package |
Title: | Creating, Manipulating, and Subsetting "dist" Objects |
Version: | 0.3.0 |
Maintainer: | Minh Long Nguyen <edelweiss611428@gmail.com> |
Description: | Efficiently creates, manipulates, and subsets "dist" objects, commonly used in cluster analysis. Designed to minimise unnecessary conversions and computational overhead while enabling seamless interaction with distance matrices. |
License: | CC BY 4.0 |
Encoding: | UTF-8 |
URL: | https://github.com/edelweiss611428/dissimilarities |
BugReports: | https://github.com/edelweiss611428/dissimilarities/issues |
Imports: | Rcpp, microbenchmark, proxy, stats |
LinkingTo: | Rcpp |
RoxygenNote: | 7.3.2 |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | yes |
Packaged: | 2025-06-28 15:28:22 UTC; edelweiss |
Author: | Minh Long Nguyen [aut, cre] |
Repository: | CRAN |
Date/Publication: | 2025-06-28 15:40:01 UTC |
Dist2Mat conversion
Description
Efficiently converts a "dist" object into a symmetric distance "matrix".
Usage
Dist2Mat(dist)
Arguments
dist |
A "dist" object, which can be computed via the stats::dist function, representing pairwise distances between observations. |
Details
Converts a "dist" object, typically created using the stats::dist function, into a symmetric matrix form. This implementation is optimised for speed and performs significantly faster than base::as.matrix or proxy::as.matrix when applied to "dist" objects.
Row names are retained. If the "dist" object has no labels (i.e., they are NULL), as.character(1:nObs) is used as the row and column names of the resulting matrix instead.
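The conversion can be illustrated with a minimal base-R sketch. This is a reference version for checking results, not the package's compiled implementation; the name dist_to_matrix_sketch is hypothetical.

```r
# Hypothetical reference version in base R (not the package's Rcpp code).
dist_to_matrix_sketch <- function(d) {
  n <- attr(d, "Size")                 # number of observations
  m <- matrix(0, n, n)
  m[lower.tri(m)] <- as.vector(d)      # "dist" stores the lower triangle column by column
  m <- m + t(m)                        # mirror into the upper triangle; diagonal stays 0
  labs <- attr(d, "Labels")
  if (is.null(labs)) labs <- as.character(seq_len(n))  # fallback names when labels are NULL
  dimnames(m) <- list(labs, labs)
  m
}

x <- matrix(rnorm(20), nrow = 5)
d <- stats::dist(x)
all.equal(dist_to_matrix_sketch(d), as.matrix(d))
```

The sketch relies only on the "Size" and "Labels" attributes that stats::dist attaches to its result.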
Value
A distance "matrix".
Author(s)
Minh Long Nguyen edelweiss611428@gmail.com
Examples
library("microbenchmark")
x = matrix(rnorm(200), nrow = 50)
dx = dist(x)
#Dist2Mat conversion
microbenchmark(base::as.matrix(dx),
proxy::as.matrix(dx),
Dist2Mat(dx))
#Check if equal
v1 = as.vector(base::as.matrix(dx))
v2 = as.vector(Dist2Mat(dx))
all.equal(v1, v2)
Expanding a distance matrix given new data
Description
Efficiently appends new "rows" to an existing "dist" object without explicitly recomputing a full pairwise distance matrix.
Usage
expandDist(distA, A, B, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
Arguments
distA |
A "dist" object holding the pairwise distances between the rows of matrix A. It should have been computed with the same distance metric as the one given in method; this is not checked automatically and must be verified manually. |
A |
A numeric matrix. |
B |
A numeric matrix. |
method |
A character string specifying the distance metric to use. Supported methods include
|
diag |
A boolean value, indicating whether to display the diagonal entries. |
upper |
A boolean value, indicating whether to display the upper triangular entries. |
p |
A positive integer, required for computing Minkowski distance; by default p = 2 (i.e., Euclidean). |
Details
Expands an existing distance matrix of class "dist" for matrix A, given new data B, without explicitly computing the distance matrix of rbind(A,B). This supports multiple commonly used distance measures and is optimised for speed.
Row names are retained. If either rownames(A) or rownames(B) is null, as.character(1:(nrow(A)+nrow(B))) will be used as row names instead.
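The block structure behind the expansion can be sketched in base R for the Euclidean case. This is only an illustration under stated assumptions: the name expand_dist_sketch is hypothetical, and, unlike the package function, the sketch assembles a dense intermediate matrix.

```r
# Hypothetical reference version (Euclidean only); it builds a dense
# intermediate matrix, which the package function is designed to avoid.
expand_dist_sketch <- function(distA, A, B) {
  nA <- nrow(A); nB <- nrow(B); n <- nA + nB
  full <- matrix(0, n, n)
  full[seq_len(nA), seq_len(nA)] <- as.matrix(distA)   # reuse the known A-A block
  cross <- matrix(0, nB, nA)                           # B-A block, computed once
  for (i in seq_len(nA)) {
    cross[, i] <- sqrt(rowSums(sweep(B, 2, A[i, ])^2))
  }
  full[nA + seq_len(nB), seq_len(nA)] <- cross
  full[seq_len(nA), nA + seq_len(nB)] <- t(cross)
  full[nA + seq_len(nB), nA + seq_len(nB)] <- as.matrix(stats::dist(B))  # new B-B block
  stats::as.dist(full)
}

A <- matrix(rnorm(100), nrow = 20)
B <- matrix(rnorm(250), nrow = 50)
v1 <- as.vector(expand_dist_sketch(stats::dist(A), A, B))
v2 <- as.vector(stats::dist(rbind(A, B)))
all.equal(v1, v2)
```

Only the B-A and B-B blocks are computed; the A-A block is copied from distA, which is why distA must have been produced with the same metric.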
Value
A distance matrix of class "dist" for rbind(A,B).
Author(s)
Minh Long Nguyen edelweiss611428@gmail.com
Examples
A = matrix(rnorm(100), nrow = 20)
B = matrix(rnorm(250), nrow = 50)
AB = rbind(A,B)
distA = fastDist(A)
v1 = as.vector(expandDist(distA, A, B))
v2 = as.vector(fastDist(AB))
all.equal(v1, v2)
"dist" object computation
Description
Efficiently computes a "dist" object from a numeric matrix using various distance metrics.
Usage
fastDist(X, method = "euclidean", diag = FALSE, upper = FALSE, p = 2L)
Arguments
X |
A numeric matrix. |
method |
A character string specifying the distance metric to use. Supported methods include "euclidean", "manhattan", "maximum", "minkowski", "cosine", and "canberra". |
diag |
A boolean value, indicating whether to display the diagonal entries. |
upper |
A boolean value, indicating whether to display the upper triangular entries. |
p |
A positive integer, required for computing Minkowski distance; by default p = 2 (i.e., Euclidean). |
Details
Calculates pairwise distances between rows of a numeric matrix and returns the result as a compact "dist" object, which stores the lower-triangular entries of a complete distance matrix. Supports multiple distance measures, including "euclidean", "manhattan", "maximum", "minkowski", "cosine", and "canberra". This implementation is optimised for speed, especially on large matrices.
Row names are retained. If rownames(X) is NULL, as.character(1:nrow(X)) is used as row names instead.
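The compact storage and the role of p can be checked with a short sketch. It uses stats::dist as a stand-in, on the assumption that a "dist" object has the same layout whichever function produced it.

```r
# Sketch using stats::dist as a stand-in for any function returning a "dist" object.
x <- matrix(rnorm(200), nrow = 50)      # 50 observations, 4 variables
d <- stats::dist(x)                     # Euclidean by default
n <- attr(d, "Size")

# Only the lower triangle is stored: n * (n - 1) / 2 values.
length(d) == n * (n - 1) / 2

# Minkowski distance with p = 2 coincides with the Euclidean distance.
d_mink <- stats::dist(x, method = "minkowski", p = 2)
all.equal(as.vector(d), as.vector(d_mink))
```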
Value
A distance matrix of class "dist".
Author(s)
Minh Long Nguyen edelweiss611428@gmail.com
Examples
library("microbenchmark")
x = matrix(rnorm(200), nrow = 50)
microbenchmark(stats::dist(x, "minkowski", p = 5),
fastDist(x, "minkowski", p = 5))
v1 = as.vector(stats::dist(x, "minkowski", p = 5))
v2 = as.vector(fastDist(x, "minkowski", p = 5))
all.equal(v1, v2)
Computing pairwise distances between rows of two matrices
Description
Efficiently computes pairwise distances between the rows of two numeric matrices using various distance metrics.
Usage
fastDistAB(A, B, method = "euclidean", p = 2L)
Arguments
A |
A numeric matrix. |
B |
A numeric matrix. |
method |
A character string specifying the distance metric to use. Supported methods include
|
p |
A positive integer, required for computing Minkowski distance; by default p = 2 (i.e., Euclidean). |
Details
This function computes the full pairwise distance matrix between the rows of matrices A and B, without forming a concatenated matrix or performing unnecessary intermediate conversions. It supports multiple commonly used distance measures and is optimised for speed.
Row names in A and B are retained. If either rownames(A) or rownames(B) is null, as.character(1:nrow(A)) and as.character(1:nrow(B)) will be used as row and column names of the resulting matrix instead.
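For checking results, the cross-distance matrix can also be obtained the slow way, by concatenating A and B and taking the off-diagonal block. The sketch below does exactly that in base R (Euclidean only); the name cross_dist_sketch is hypothetical, and the sketch drops dimnames rather than reproducing the naming rule described here.

```r
# Hypothetical reference version: takes the concatenation route that the
# package function avoids, so it is suitable for validation only.
cross_dist_sketch <- function(A, B) {
  nA <- nrow(A); nB <- nrow(B)
  full <- as.matrix(stats::dist(rbind(A, B)))              # (nA + nB) x (nA + nB)
  out <- full[seq_len(nA), nA + seq_len(nB), drop = FALSE] # rows of A vs rows of B
  dimnames(out) <- NULL
  out
}

X <- matrix(rnorm(200), nrow = 50)
A <- X[1:25, ]
B <- X[26:50, ]
D <- cross_dist_sketch(A, B)
dim(D)
all.equal(D[3, 7], sqrt(sum((A[3, ] - B[7, ])^2)))
```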
Value
A numeric matrix of dimensions nrow(A) by nrow(B), where each entry represents the distance between a row in A and a row in B.
Author(s)
Minh Long Nguyen edelweiss611428@gmail.com
Examples
library("microbenchmark")
X = matrix(rnorm(200), nrow = 50)
A = X[1:25,]
B = X[26:50,]
microbenchmark(proxy::dist(A,B, "minkowski", p = 5),
fastDistAB(A,B, "minkowski", p = 5L))
#Check if equal
v1 = as.vector(proxy::dist(A,B, "minkowski", p = 5))
v2 = as.vector(fastDistAB(A,B, "minkowski", p = 5L))
all.equal(v1, v2)
2D-indexing to 1D-indexing
Description
Efficiently converts a 2D index (a row-column pair) into the corresponding 1D index of a "dist" object.
Usage
get1dFrom2d(i,j, N)
Arguments
i |
An integer specifying the row index |
j |
An integer specifying the column index. It must differ from i, as a "dist" object does not store the diagonal entries. |
N |
The number of observations in the original data matrix |
Details
Converts 2D indexing (a row-column pair) into 1D indexing (as used in R's "dist" objects), given the number of observations N.
Currently, name-based indexing is not supported.
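The mapping follows the storage order documented for stats::dist: for i < j, the distance between observations i and j sits at position N*(i-1) - i*(i-1)/2 + j - i. A minimal base-R sketch is given below; the name idx_1d_sketch is hypothetical, and swapping the two indices when i > j is an assumption of the sketch, not a documented behaviour of get1dFrom2d.

```r
# Hypothetical helper; assumes 1-based indices with i != j.
idx_1d_sketch <- function(i, j, N) {
  lo <- pmin(i, j)                       # the smaller index selects the column block
  hi <- pmax(i, j)                       # the larger index is the offset within it
  N * (lo - 1) - lo * (lo - 1) / 2 + hi - lo
}

x <- matrix(rnorm(20), nrow = 5)
d <- stats::dist(x)
m <- as.matrix(d)

first <- idx_1d_sketch(1, 2, 5)          # first stored entry
last  <- idx_1d_sketch(4, 5, 5)          # last stored entry, N * (N - 1) / 2
as.vector(d)[idx_1d_sketch(2, 4, 5)] == m[2, 4]
```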
Value
An integer specifying the 1d index
Author(s)
Minh Long Nguyen edelweiss611428@gmail.com
Examples
N = 5
for(i in 1:4){
  for(j in (i+1):5){
    print(get1dFrom2d(i,j,N))
  }
}
1D-indexing to 2D-indexing
Description
Efficiently converts 1D indexes of a "dist" object into the corresponding 2D indexes (row-column pairs).
Usage
get2dFrom1d(idx1d, N)
Arguments
idx1d |
An integer vector of 1D indexes |
N |
The number of observations in the original data matrix |
Details
Converts 1D indexing (as used in R's "dist" objects) into 2D indexing (row-column pairs) for a distance matrix of size N × N.
Currently, name-based indexing is not supported.
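The inverse mapping can be sketched in base R by enumerating the lower triangle in column-major order, which is the order in which a "dist" object stores its entries. The name idx_2d_sketch is hypothetical, and the layout of its result (columns "row" and "col", with row > col) is a choice of this sketch that may differ from what get2dFrom1d returns.

```r
# Hypothetical helper: looks up (row, col) pairs of the lower triangle.
idx_2d_sketch <- function(idx1d, N) {
  # which(..., arr.ind = TRUE) lists lower-triangle cells column by column,
  # i.e. in the same order as the entries of a "dist" object.
  pairs <- which(lower.tri(matrix(0, N, N)), arr.ind = TRUE)
  pairs[idx1d, , drop = FALSE]
}

p <- idx_2d_sketch(c(1, 6, 10), 5)
p
```

This lookup table costs O(N^2) memory, so it is meant for small checks only.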
Value
An integer matrix storing the corresponding 2D indexes.
Author(s)
Minh Long Nguyen edelweiss611428@gmail.com
Examples
get2dFrom1d(1:10, 5)
Subsetting a "dist" object by columns
Description
Efficiently extracts a column-wise subset of a "dist" object, returning the corresponding submatrix of pairwise distances.
Usage
subCols(dist, idx)
Arguments
dist |
A "dist" object, which can be computed via the stats::dist function, representing pairwise distances between observations. |
idx |
An integer vector, specifying the column indices of the subsetted matrix. |
Details
This function extracts specified columns from a "dist" object without explicit conversion to a dense distance "matrix", resulting in better performance and reduced memory overhead. Particularly useful when only a subset of distances is needed for downstream tasks.
Row names are retained. If the "dist" object has no labels (i.e., they are NULL), as.character(1:nObs) and as.character(idx) are used as row and column names of the resulting matrix instead.
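One way to read columns straight from the compact storage is to compute the 1D position of every requested cell and index the underlying vector. The base-R sketch below does this; the name sub_cols_sketch is hypothetical, the result carries no dimnames, and nothing here is claimed about how subCols is implemented internally.

```r
# Hypothetical reference version: never builds the full n x n matrix.
sub_cols_sketch <- function(d, idx) {
  n <- attr(d, "Size")
  v <- as.vector(d)
  i <- rep(seq_len(n), times = length(idx))   # row index of every output cell
  j <- rep(idx, each = n)                     # column index of every output cell
  lo <- pmin(i, j); hi <- pmax(i, j)
  k <- n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo   # position in the "dist" vector
  out <- numeric(length(k))                   # diagonal cells (i == j) stay 0
  off <- i != j
  out[off] <- v[k[off]]
  matrix(out, nrow = n, ncol = length(idx))
}

x <- matrix(rnorm(200), nrow = 50)
dx <- stats::dist(x)
idx <- c(7, 3, 42)
S <- sub_cols_sketch(dx, idx)
all.equal(S, unname(as.matrix(dx)[, idx]))
```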
Value
A numeric "matrix" containing the pairwise distances between all rows and the specified columns.
Author(s)
Minh Long Nguyen edelweiss611428@gmail.com
Examples
library("microbenchmark")
x = matrix(rnorm(200), nrow = 50)
dx = dist(x)
#Randomly subsetting a 50x10 matrix
idx = sample(1:50, 10)
microbenchmark(base::as.matrix(dx)[1:50,idx],
proxy::as.matrix(dx)[1:50,idx],
subCols(dx, idx))
#Check if equal
v1 = as.vector(base::as.matrix(dx)[1:50,idx])
v2 = as.vector(subCols(dx, idx))
all.equal(v1, v2)
Dist2Dist subsetting
Description
Efficiently extracts a subset of observations from a "dist" object and returns a new "dist" object representing only the selected distances.
Usage
subDist2Dist(dist, idx, diag = FALSE, upper = FALSE)
Arguments
dist |
A "dist" object, which can be computed via the stats::dist function, representing the full pairwise distance matrix between observations. |
idx |
An integer vector, specifying the indices of the observations to retain. |
diag |
A boolean value, indicating whether to display the diagonal entries. |
upper |
A boolean value, indicating whether to display the upper triangular entries. |
Details
This function subsets a "dist" object directly without explicit conversion to a dense distance "matrix". It extracts only the relevant distances corresponding to the selected indices, improving both performance and memory efficiency. The result is returned as a subsetted "dist" object, preserving compatibility with downstream functions that accept this class.
Row names are retained. If the "dist" object has no labels (i.e., they are NULL), as.character(idx) is used as row names instead.
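The direct "dist"-to-"dist" subsetting can be sketched in base R by enumerating the selected pairs in storage order and picking their positions from the original vector. The name sub_dist_sketch is hypothetical; the sketch assumes at least two distinct indices and always labels the result with as.character(idx).

```r
# Hypothetical reference version; assumes length(idx) >= 2 and distinct indices.
sub_dist_sketch <- function(d, idx) {
  n <- attr(d, "Size")
  v <- as.vector(d)
  pairs <- utils::combn(length(idx), 2)   # positions (a < b), already in "dist" order
  i <- idx[pairs[1, ]]
  j <- idx[pairs[2, ]]
  lo <- pmin(i, j); hi <- pmax(i, j)
  k <- n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo   # position in the original vector
  structure(v[k], Size = length(idx), Labels = as.character(idx),
            Diag = FALSE, Upper = FALSE, class = "dist")
}

x <- matrix(rnorm(200), nrow = 50)
dx <- stats::dist(x)
idx <- c(10, 2, 33, 5)
s1 <- as.vector(sub_dist_sketch(dx, idx))
s2 <- as.vector(stats::as.dist(as.matrix(dx)[idx, idx]))
all.equal(s1, s2)
```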
Value
A "dist" object storing pairwise distances between the selected observations.
Author(s)
Minh Long Nguyen edelweiss611428@gmail.com
Examples
library("microbenchmark")
x = matrix(rnorm(200), nrow = 50)
dx = dist(x)
#Subsetting the first 10 units
microbenchmark(as.dist(base::as.matrix(dx)[1:10,1:10]),
as.dist(proxy::as.matrix(dx)[1:10,1:10]),
subDist2Dist(dx, 1:10))
#Check if equal
v1 = as.vector(as.dist(base::as.matrix(dx)[1:10,1:10]))
v2 = as.vector(subDist2Dist(dx, 1:10))
all.equal(v1, v2)
Dist2Mat subsetting
Description
Efficiently extracts a 2d submatrix of pairwise distances from a "dist" object.
Usage
subDist2Mat(dist, idx1, idx2)
Arguments
dist |
A "dist" object, which can be computed via the stats::dist function, representing the full pairwise distance matrix between observations. |
idx1 |
An integer vector, specifying the row indices of the subsetted matrix. |
idx2 |
An integer vector, specifying the column indices of the subsetted matrix. |
Details
This function efficiently subsets a "dist" object by row and column indices, returning the corresponding rectangular section as a numeric matrix. It avoids explicit conversion from the "dist" object to a dense "matrix", improving memory efficiency and computational speed, especially with large datasets.
Row names are retained. If the "dist" object has no labels (i.e., they are NULL), as.character(idx1) and as.character(idx2) are used as row and column names of the resulting matrix instead.
Value
A numeric matrix storing pairwise distances between observations, row-indexed by idx1 and column-indexed by idx2.
Author(s)
Minh Long Nguyen edelweiss611428@gmail.com
Examples
library("microbenchmark")
x = matrix(rnorm(200), nrow = 50)
dx = dist(x)
#Randomly subsetting a 10x10 matrix
idx1 = sample(1:50, 10)
idx2 = sample(1:50, 10)
microbenchmark(base::as.matrix(dx)[idx1,idx2],
proxy::as.matrix(dx)[idx1,idx2],
subDist2Mat(dx, idx1, idx2))
#Check if equal
v1 = as.vector(base::as.matrix(dx)[idx1,idx2])
v2 = as.vector(subDist2Mat(dx, idx1, idx2))
all.equal(v1, v2)