Help for package fdacluster

Title:

Joint Clustering and Alignment of Functional Data

Version:

0.4.1

Description:

Implementations of the k-means, hierarchical agglomerative and DBSCAN clustering methods for functional data which allows for jointly aligning and clustering curves. It supports functional data defined on one-dimensional domains but possibly evaluating in multivariate codomains. It supports functional data defined in arrays but also via the 'fd' and 'funData' classes for functional data defined in the 'fda' and 'funData' packages respectively. It currently supports shift, dilation and affine warping functions for functional data defined on the real line and uses the SRVF framework to handle boundary-preserving warping for functional data defined on a specific interval. Main reference for the k-means algorithm: Sangalli L.M., Secchi P., Vantini S., Vitelli V. (2010) "k-mean alignment for curve clustering" <doi:10.1016/j.csda.2009.12.008>. Main reference for the SRVF framework: Tucker, J. D., Wu, W., & Srivastava, A. (2013) "Generative models for functional data using phase and amplitude separation" <doi:10.1016/j.csda.2012.12.001>.

License:

GPL (≥ 3)

Encoding:

UTF-8

LazyData:

true

LazyDataCompression:

LinkingTo:

Rcpp, RcppArmadillo, nloptr

RoxygenNote:

7.3.2

Suggests:

fda, funData, future, knitr, rmarkdown, testthat (≥ 3.0.0), withr

Imports:

cli, cluster, dbscan, fdasrvf, future.apply, ggplot2, lpSolve, nloptr, progressr, Rcpp, rlang, tibble

Depends:

R (≥ 3.5.0)

URL:

https://astamm.github.io/fdacluster/, https://github.com/astamm/fdacluster

Config/testthat/edition:

VignetteBuilder:

knitr

NeedsCompilation:

yes

Packaged:

2025-01-14 15:17:40 UTC; stamm-a

Author:

Aymeric Stamm [aut, cre], Laura Sangalli [ctb], Piercesare Secchi [ctb], Simone Vantini [ctb], Valeria Vitelli [ctb], Alessandro Zito [ctb]

Maintainer:

Aymeric Stamm <aymeric.stamm@cnrs.fr>

Repository:

CRAN

Date/Publication:

2025-01-14 16:50:09 UTC

fdacluster: Joint Clustering and Alignment of Functional Data

Description

Implementations of the k-means, hierarchical agglomerative and DBSCAN clustering methods for functional data which allow for jointly aligning and clustering curves. The package supports functional data defined on one-dimensional domains but possibly taking values in multivariate codomains. It supports functional data defined in arrays but also via the 'fd' and 'funData' classes for functional data defined in the 'fda' and 'funData' packages respectively. It currently supports shift, dilation and affine warping functions for functional data defined on the real line and uses the SRVF framework to handle boundary-preserving warping for functional data defined on a specific interval. Main reference for the k-means algorithm: Sangalli L.M., Secchi P., Vantini S., Vitelli V. (2010) "k-mean alignment for curve clustering" doi:10.1016/j.csda.2009.12.008. Main reference for the SRVF framework: Tucker, J. D., Wu, W., & Srivastava, A. (2013) "Generative models for functional data using phase and amplitude separation" doi:10.1016/j.csda.2012.12.001.

Author(s)

Maintainer: Aymeric Stamm aymeric.stamm@cnrs.fr (ORCID)

Other contributors:

Laura Sangalli laura.sangalli@polimi.it [contributor]
Piercesare Secchi piercesare.secchi@polimi.it [contributor]
Simone Vantini simone.vantini@polimi.it [contributor]
Valeria Vitelli valeria.vitelli@medisin.uio.no [contributor]
Alessandro Zito zito.ales@gmail.com [contributor]

References

Arthur, D., and S. Vassilvitskii. 2007. “K-Means++: The Advantages of Careful Seeding.” In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 1027–35.

Marron, J. S., J. O. Ramsay, L. M. Sangalli, and A. Srivastava. 2014. “Statistics of Time Warpings and Phase Variations.”

Marron, J. S., J. O. Ramsay, L. M. Sangalli, and A. Srivastava. 2015. “Functional Data Analysis of Amplitude and Phase Variation.” Statistical Science, 468–84.

Ramsay, J., and B. W. Silverman. 2005. Functional Data Analysis. Springer Series in Statistics. Springer.

Sangalli, L. M., P. Secchi, S. Vantini, and V. Vitelli. 2010. “K-Mean Alignment for Curve Clustering.” Computational Statistics & Data Analysis 54 (5): 1219–33.

Tucker, J. D., W. Wu, and A. Srivastava. 2013. “Generative Models for Functional Data Using Phase and Amplitude Separation.” Computational Statistics & Data Analysis 61: 50–66.

Vantini, S. 2012. “On the Definition of Phase and Amplitude Variability in Functional Data Analysis.” Test 21 (4): 676–96. https://doi.org/10.1007/s11749-011-0268-9.

Visualizes the result of a clustering strategy stored in a `caps` object with ggplot2

Description

This function creates a visualization of the result of the k-mean alignment algorithm and invisibly returns the corresponding ggplot2::ggplot object, which enables further customization of the plot. The user can choose to visualize either the amplitude information, in which case the original and aligned curves are shown, or the phase information, in which case the estimated warping functions are shown.

Usage

## S3 method for class 'caps'
autoplot(object, type = c("amplitude", "phase"), ...)

Arguments

object

An object of class caps.

type

A string specifying the type of information to display. Choices are "amplitude" for plotting the original and aligned curves which represent amplitude information data or "phase" for plotting the corresponding warping functions which represent phase information data. Defaults to "amplitude".

...

Not used.

Value

A ggplot2::ggplot object invisibly.

Examples


ggplot2::autoplot(sim30_caps, type = "amplitude")
ggplot2::autoplot(sim30_caps, type = "phase")

Visualizes results of multiple clustering strategies using ggplot2

Description

This is an S3 method implementation of the ggplot2::autoplot() generic for objects of class mcaps to visualize the performances of multiple caps objects applied on the same data sets either in terms of WSS or in terms of silhouette values.

Usage

## S3 method for class 'mcaps'
autoplot(
  object,
  validation_criterion = c("wss", "silhouette"),
  what = c("mean", "distribution"),
  ...
)

Arguments

object

An object of class mcaps.

validation_criterion

A string specifying the validation criterion to be used for the comparison. Choices are "wss" or "silhouette". Defaults to "wss".

what

A string specifying the kind of information to display about the validation criterion. Choices are "mean" (which plots the mean values) or "distribution" (which plots the boxplots). Defaults to "mean".

...

Other arguments passed to specific methods.

Value

An object of class ggplot2::ggplot.

Examples


p <- ggplot2::autoplot(sim30_mcaps)

Class for clustering with amplitude and phase separation

Description

The k-means algorithm with joint amplitude and phase separation produces a number of outputs. This class is meant to encapsulate them into a single object for providing dedicated S3 methods for e.g. plotting, summarizing, etc. The name of the class stems from Clustering with Amplitude and Phase Separation.

Usage

as_caps(x)

is_caps(x)

Arguments

x

A list coercible into an object of class caps.

Details

An object of class caps is a list with the following components:

original_curves: A numeric array of shape N × L × M storing a sample of N L-dimensional original curves observed on grids of size M;
original_grids: A numeric matrix of size N × M storing the grids of size M on which original curves are evaluated;
aligned_grids: A numeric matrix of size N × M storing the grids of size M on which original curves must be evaluated to be aligned;
center_curves: A numeric array of shape K × L × M storing the K centers which are L-dimensional curves observed on a grid of size M;
center_grids: A numeric matrix of size K × M storing the grids of size M on which center curves are evaluated;
warpings: A numeric matrix of shape N × M storing the estimated warping functions for each of the N curves evaluated on the within-cluster common grids of size M;
n_clusters: An integer value storing the number of clusters;
memberships: An integer vector of length N storing the ID of the cluster to which each curve belongs;
distances_to_center: A numeric vector of length N storing the distance of each curve to the center of its cluster;
silhouettes: A numeric vector of length N storing the silhouette values of each observation;
amplitude_variation: A numeric value storing the fraction of total variation explained by amplitude variability;
total_variation: A numeric value storing the amount of total variation;
n_iterations: An integer value storing the number of iterations performed until convergence;
call_name: A string storing the name of the function that was used to produce the k-means alignment results;
call_args: A list containing the exact arguments that were passed to the function call_name that produced this output.

Value

The function as_caps() returns an object of class caps. The function is_caps() returns a boolean which evaluates to TRUE if the input object is of class caps.
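As a rough illustration of the layout described in Details, the sketch below builds a toy list holding a few of the documented components and summarises it per cluster with base R. All values are made up for the example and are not produced by the package.

```r
# Toy stand-in for a few `caps` components; all values are made up.
N <- 4L  # number of curves
L <- 1L  # dimension of the codomain
M <- 5L  # grid size
toy <- list(
  original_curves = array(rnorm(N * L * M), dim = c(N, L, M)),
  original_grids = matrix(seq(0, 1, length.out = M), nrow = N, ncol = M, byrow = TRUE),
  n_clusters = 2L,
  memberships = c(1L, 1L, 2L, 2L),
  distances_to_center = c(0.1, 0.3, 0.2, 0.4)
)

# Cluster sizes and mean distance to center within each cluster.
cluster_sizes <- table(toy$memberships)
mean_dtc <- tapply(toy$distances_to_center, toy$memberships, mean)
```

The same kind of per-cluster summary should apply to a real caps object, since memberships and distances_to_center are both documented as vectors of length N.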

Examples

as_caps(sim30_caps)
is_caps(sim30_caps)

Generates results of multiple clustering strategies

Description

This function searches for clusters in the input data set using different strategies and generates an object of class mcaps which stores multiple objects of class caps. This is a helper function to facilitate comparison of clustering methods and choice of an optimal one.

Usage

compare_caps(
  x,
  y,
  n_clusters = 1:5,
  is_domain_interval = FALSE,
  transformation = c("identity", "srvf"),
  metric = c("l2", "normalized_l2", "pearson"),
  clustering_method = c("kmeans", "hclust-complete", "hclust-average", "hclust-single",
    "dbscan"),
  warping_class = c("none", "shift", "dilation", "affine", "bpd"),
  centroid_type = c("mean", "medoid", "median", "lowess", "poly"),
  cluster_on_phase = FALSE
)

Arguments

x

A numeric vector of length M or a numeric matrix of shape N × M or an object of class funData::funData. If a numeric vector or matrix, it specifies the grid(s) of size M on which each of the N curves has been observed. If an object of class funData::funData, it contains the whole functional data set and the y argument is not used.

y

Either a numeric matrix of shape N × M or a numeric array of shape N × L × M or an object of class fda::fd. If a numeric matrix or array, it specifies the N-sample of L-dimensional curves observed on grids of size M. If an object of class fda::fd, it contains all the necessary information about the functional data set to be able to evaluate it on user-defined grids.
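The shapes expected for x and y can be sketched in base R as follows. The curves are arbitrary sine and cosine waves chosen only to fill the array; the variable names are illustrative.

```r
N <- 3L   # number of curves
L <- 2L   # dimension of the codomain
M <- 50L  # grid size

# Common grid: a numeric vector of length M.
x_common <- seq(0, 1, length.out = M)

# Curve-specific grids: an N x M matrix, one grid per row.
x_grids <- matrix(x_common, nrow = N, ncol = M, byrow = TRUE)

# Curve values: an N x L x M array; y[i, l, ] is component l of curve i.
y <- array(NA_real_, dim = c(N, L, M))
for (i in seq_len(N)) {
  y[i, 1, ] <- sin(2 * pi * x_grids[i, ] + i / 10)
  y[i, 2, ] <- cos(2 * pi * x_grids[i, ] + i / 10)
}
```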

n_clusters

An integer vector specifying a set of clustering partitions to create. Defaults to 1:5.

is_domain_interval

A boolean specifying whether the sample of curves is defined on a fixed interval. Defaults to FALSE.

transformation

A string specifying the transformation to apply to the original sample of curves. Choices are no transformation (transformation = "identity") or square-root velocity function (transformation = "srvf"). Defaults to "identity".
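For intuition, the square-root velocity function of a curve f is q = sign(f') sqrt(|f'|). The sketch below is a minimal finite-difference version for a univariate curve; it is an illustration only, and the package relies on fdasrvf for the actual computations.

```r
# Discrete SRVF of a curve sampled on a grid (forward differences).
srvf <- function(f, t) {
  df <- diff(f) / diff(t)       # finite-difference derivative
  sign(df) * sqrt(abs(df))      # q = sign(f') * sqrt(|f'|)
}

t <- seq(0, 1, length.out = 101)
f <- t^2
q <- srvf(f, t)

# For a monotone curve, the squared L2 norm of q equals f(1) - f(0).
sq_norm <- sum(q^2 * diff(t))
```

This is why the SRVF of an absolutely continuous function is square-integrable: the squared norm of q is the total variation of f.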

metric

A string specifying the metric used to compare curves. Choices are "l2", "normalized_l2" or "pearson". If transformation == "srvf", the metric must be "l2" because the SRVF transform maps absolutely continuous functions to square-integrable functions. If transformation == "identity" and warping_class is either "dilation" or "affine", the metric can be either "normalized_l2" or "pearson" because the L2 distance is neither dilation-invariant nor affine-invariant. The metric can also be "l2" if warping_class == "shift". Defaults to "l2".
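The lack of dilation invariance of the L2 distance can be checked numerically. In the sketch below, the same dilation is applied to two curves: the L2 distance shrinks by a factor sqrt(a), whereas a normalized version is unchanged. The normalization used here (dividing by the sum of the norms) is an assumption made for illustration and is not necessarily the definition behind "normalized_l2".

```r
# Two curves on the real line, sampled on a wide uniform grid.
dx <- 0.001
x <- seq(-20, 20, by = dx)
f <- function(s) exp(-s^2)
g <- function(s) exp(-(s - 1)^2)

# Riemann-sum approximations of the L2 norm and distance.
l2_norm <- function(u) sqrt(sum(u^2) * dx)
l2_dist <- function(u, v) l2_norm(u - v)
# Hypothetical normalization, for illustration only.
nl2_dist <- function(u, v) l2_dist(u, v) / (l2_norm(u) + l2_norm(v))

a <- 2  # dilation factor applied to both curves
d_l2 <- c(before = l2_dist(f(x), g(x)), after = l2_dist(f(a * x), g(a * x)))
d_nl2 <- c(before = nl2_dist(f(x), g(x)), after = nl2_dist(f(a * x), g(a * x)))
```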

clustering_method

A character vector specifying one or more clustering methods to be fit. Choices are "kmeans", "hclust-complete", "hclust-average", "hclust-single" or "dbscan". Defaults to all of them.

warping_class

A character vector specifying one or more classes of warping functions to use for curve alignment. Choices are "affine", "dilation", "none", "shift" or "bpd". Defaults to all of them.

centroid_type

A character vector specifying one or more ways to compute centroids. Choices are "mean", "medoid", "median", "lowess" or "poly". Defaults to all of them.

cluster_on_phase

A boolean specifying whether clustering should be based on phase variation or amplitude variation. Defaults to FALSE which implies amplitude variation.

Value

An object of class mcaps which is a tibble::tibble storing the objects of class caps in correspondence of each combination of possible choices from the input arguments.
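To get a feel for how many strategies a call generates, the grid of combinations can be enumerated with base R. This sketch assumes one row per combination of the supplied choices, as stated above; whether the package drops some incompatible combinations is not covered here.

```r
# Enumerate the combinations implied by a set of argument values.
strategies <- expand.grid(
  n_clusters = 1:5,
  clustering_method = "kmeans",
  warping_class = c("none", "shift", "dilation", "affine"),
  centroid_type = "mean",
  stringsAsFactors = FALSE
)
n_strategies <- nrow(strategies)  # 5 * 1 * 4 * 1 = 20
```

Since each row corresponds to a full clustering run, restricting the choices as in the example below keeps the computation time manageable.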

Examples

#----------------------------------
# Compare k-means results with k = 1, 2, 3, 4, 5 using mean centroid and
# various warping classes.
## Not run: 
sim30_mcaps <- compare_caps(
  x = simulated30_sub$x,
  y = simulated30_sub$y,
  warping_class = c("none", "shift", "dilation", "affine"),
  clustering_method = "kmeans",
  centroid_type = "mean"
)

## End(Not run)

#----------------------------------
# Then visualize the results
# Either with ggplot2 via ggplot2::autoplot(sim30_mcaps)
# or using graphics::plot()
# You can visualize the WSS values:
plot(sim30_mcaps, validation_criterion = "wss", what = "mean")
plot(sim30_mcaps, validation_criterion = "wss", what = "distribution")
# Or the average silhouette values:
plot(sim30_mcaps, validation_criterion = "silhouette", what = "mean")
plot(sim30_mcaps, validation_criterion = "silhouette", what = "distribution")

Diagnostic plot for the result of a clustering strategy stored in a `caps` object

Description

This function plots the values of the distance to center and silhouette for each observation. Observations are ordered within cluster by decreasing value of silhouette.

Usage

diagnostic_plot(x)

Arguments

x

An object of class caps.

Value

An object of class ggplot2::ggplot.

Examples

diagnostic_plot(sim30_caps)

Performs density-based clustering for functional data with amplitude and phase separation

Description

This function extends DBSCAN to functional data. It includes the possibility to separate amplitude and phase information.

Usage

fdadbscan(
  x,
  y,
  is_domain_interval = FALSE,
  transformation = c("identity", "srvf"),
  warping_class = c("none", "shift", "dilation", "affine", "bpd"),
  centroid_type = "mean",
  metric = c("l2", "normalized_l2", "pearson"),
  cluster_on_phase = FALSE,
  use_verbose = FALSE,
  warping_options = c(0.15, 0.15),
  maximum_number_of_iterations = 100L,
  number_of_threads = 1L,
  parallel_method = 0L,
  distance_relative_tolerance = 0.001,
  use_fence = FALSE,
  check_total_dissimilarity = TRUE,
  compute_overall_center = FALSE
)

Arguments

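x

A numeric vector of length M or a numeric matrix of shape N × M or an object of class funData::funData. If a numeric vector or matrix, it specifies the grid(s) of size M on which each of the N curves has been observed. If an object of class funData::funData, it contains the whole functional data set and the y argument is not used.

y

Either a numeric matrix of shape N × M or a numeric array of shape N × L × M or an object of class fda::fd. If a numeric matrix or array, it specifies the N-sample of L-dimensional curves observed on grids of size M. If an object of class fda::fd, it contains all the necessary information about the functional data set to be able to evaluate it on user-defined grids.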

is_domain_interval

A boolean specifying whether the sample of curves is defined on a fixed interval. Defaults to FALSE.

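transformation

A string specifying the transformation to apply to the original sample of curves. Choices are no transformation (transformation = "identity") or square-root velocity function (transformation = "srvf"). Defaults to "identity".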

warping_class

A string specifying the class of warping functions. Choices are no warping (warping_class = "none"), shift y = x + b (warping_class = "shift"), dilation y = ax (warping_class = "dilation"), affine y = ax + b (warping_class = "affine") or boundary-preserving diffeomorphism (warping_class = "bpd"). Defaults to "none".
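The parametric warping classes can be pictured with a few lines of base R. The sketch below uses a made-up bump as template and a shifted copy as the observed curve; evaluating the observed curve on the warped grid brings it back onto the template. The function names are illustrative and not part of the package.

```r
# Affine family y = a * x + b; shift (a = 1) and dilation (b = 0) are special cases.
warp_affine <- function(x, a = 1, b = 0) a * x + b

x <- seq(0, 10, length.out = 201)
template <- function(s) exp(-(s - 5)^2)

# A curve whose peak is at s = 6, i.e. the template shifted by 1.
observed <- function(s) template(s - 1)

# Evaluating the observed curve on the warped grid x + 1 realigns it with the template.
aligned <- observed(warp_affine(x, a = 1, b = 1))
max_gap <- max(abs(aligned - template(x)))
```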

centroid_type

A string specifying the type of centroid to compute. Choices are "mean", "median", "medoid", "lowess" or "poly". Defaults to "mean". If LOWESS approximation is chosen, the user can append an integer between 0 and 100 as in "lowess20". This number will be used as the smoother span. This gives the proportion of points in the plot which influence the smooth at each value. Larger values give more smoothness. The default value is 10%. If polynomial approximation is chosen, the user can append a positive integer as in "poly3". This number will be used as the degree of the polynomial model. The default value is 4L.
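The suffix convention can be sketched as follows. The helper parse_centroid_type() is hypothetical and only mimics the documented behaviour (span in percent for LOWESS, degree for polynomials, with the documented defaults); the smoothing itself uses stats::lowess(), whose f argument is the span expressed as a proportion.

```r
# Hypothetical helper: splits e.g. "lowess20" into a type and a numeric suffix.
parse_centroid_type <- function(centroid_type) {
  type <- sub("[0-9]+$", "", centroid_type)
  suffix <- sub("^[a-z]+", "", centroid_type)
  extra <- if (nzchar(suffix)) as.numeric(suffix) else NA_real_
  # Documented defaults: span of 10% for LOWESS, degree 4 for polynomials.
  if (type == "lowess" && is.na(extra)) extra <- 10
  if (type == "poly" && is.na(extra)) extra <- 4
  list(type = type, extra = extra)
}

spec <- parse_centroid_type("lowess20")

# Smooth noisy values with the requested span (converted to a proportion).
set.seed(1)
x <- seq(0, 1, length.out = 100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.1)
smoothed <- stats::lowess(x, y, f = spec$extra / 100)
```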

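metric

A string specifying the metric used to compare curves. Choices are "l2", "normalized_l2" or "pearson". If transformation == "srvf", the metric must be "l2" because the SRVF transform maps absolutely continuous functions to square-integrable functions. If transformation == "identity" and warping_class is either "dilation" or "affine", the metric can be either "normalized_l2" or "pearson" because the L2 distance is neither dilation-invariant nor affine-invariant. The metric can also be "l2" if warping_class == "shift". Defaults to "l2".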

cluster_on_phase

A boolean specifying whether clustering should be based on phase variation or amplitude variation. Defaults to FALSE which implies amplitude variation.

use_verbose

A boolean specifying whether the algorithm should output details of the steps to the console. Defaults to FALSE.

warping_options

A numeric vector supplied as a helper to the chosen warping_class to decide on warping parameter bounds. This is used only when warping_class != "srvf".

maximum_number_of_iterations

An integer specifying the maximum number of iterations before the algorithm stops if no other convergence criterion was met. Defaults to 100L.

number_of_threads

An integer value specifying the number of threads used for parallelization. Defaults to 1L. This is used only when warping_class != "srvf".

parallel_method

An integer value specifying the type of desired parallelization for template computation. If 0L, templates are computed in parallel. If 1L, parallelization occurs within a single template computation (only for the medoid method as of now). Defaults to 0L. This is used only when warping_class != "srvf".

distance_relative_tolerance

A numeric value specifying a relative tolerance on the distance update between two iterations. If all observations have not sufficiently improved in that sense, the algorithm stops. Defaults to 1e-3. This is used only when warping_class != "srvf".

use_fence

A boolean specifying whether the fence algorithm should be used to robustify the algorithm against outliers. Defaults to FALSE. This is used only when warping_class != "srvf".

check_total_dissimilarity

A boolean specifying whether an additional stopping criterion based on improvement of the total dissimilarity should be used. Defaults to TRUE. This is used only when warping_class != "srvf".

compute_overall_center

A boolean specifying whether the overall center should be also computed. Defaults to FALSE. This is used only when warping_class != "srvf".

Value

An object of class caps.

Examples

#----------------------------------
# Extracts 10 out of the 30 simulated curves in `simulated30_sub` data set
idx <- c(1:5, 11:15)
x <- simulated30_sub$x[idx, ]
y <- simulated30_sub$y[idx, , ]

#----------------------------------
# Runs DBSCAN with affine alignment
out <- fdadbscan(
  x = x,
  y = y,
  warping_class = "affine",
  metric = "normalized_l2"
)

#----------------------------------
# Then visualize the results
# Either with ggplot2 via ggplot2::autoplot(out)
# or using graphics::plot()
# You can visualize the original and aligned curves with:
plot(out, type = "amplitude")
# Or the estimated warping functions with:
plot(out, type = "phase")

Computes the distance matrix for functional data with amplitude and phase separation

Description

This function computes the matrix of pairwise distances between the curves of a functional data sample. This can be achieved with or without phase and amplitude separation, which can be done using a variety of warping classes.

Usage

fdadist(
  x,
  y = NULL,
  is_domain_interval = FALSE,
  transformation = c("identity", "srvf"),
  warping_class = c("none", "shift", "dilation", "affine", "bpd"),
  metric = c("l2", "normalized_l2", "pearson"),
  cluster_on_phase = FALSE,
  labels = NULL
)

Arguments

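x

A numeric vector of length M or a numeric matrix of shape N × M or an object of class funData::funData. If a numeric vector or matrix, it specifies the grid(s) of size M on which each of the N curves has been observed. If an object of class funData::funData, it contains the whole functional data set and the y argument is not used.

y

Either a numeric matrix of shape N × M or a numeric array of shape N × L × M or an object of class fda::fd. If a numeric matrix or array, it specifies the N-sample of L-dimensional curves observed on grids of size M. If an object of class fda::fd, it contains all the necessary information about the functional data set to be able to evaluate it on user-defined grids.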

is_domain_interval

A boolean specifying whether the sample of curves is defined on a fixed interval. Defaults to FALSE.

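transformation

A string specifying the transformation to apply to the original sample of curves. Choices are no transformation (transformation = "identity") or square-root velocity function (transformation = "srvf"). Defaults to "identity".

warping_class

A string specifying the class of warping functions. Choices are no warping (warping_class = "none"), shift y = x + b (warping_class = "shift"), dilation y = ax (warping_class = "dilation"), affine y = ax + b (warping_class = "affine") or boundary-preserving diffeomorphism (warping_class = "bpd"). Defaults to "none".

metric

A string specifying the metric used to compare curves. Choices are "l2", "normalized_l2" or "pearson". If transformation == "srvf", the metric must be "l2" because the SRVF transform maps absolutely continuous functions to square-integrable functions. If transformation == "identity" and warping_class is either "dilation" or "affine", the metric can be either "normalized_l2" or "pearson" because the L2 distance is neither dilation-invariant nor affine-invariant. The metric can also be "l2" if warping_class == "shift". Defaults to "l2".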

cluster_on_phase

A boolean specifying whether clustering should be based on phase variation or amplitude variation. Defaults to FALSE which implies amplitude variation.

labels

A character vector specifying curve labels. Defaults to NULL which uses sequential numbers as labels.

Value

A stats::dist object storing the distance matrix between the input curves using the metric specified through the argument metric and the warping class specified by the argument warping_class.
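Since the result is a regular stats::dist object, the usual base R tools apply. The sketch below uses three points on a line as a stand-in for curves, with made-up labels, to show how such an object can be inspected.

```r
# Stand-in for a distance matrix between three labelled curves.
pts <- matrix(c(0, 1, 10), ncol = 1, dimnames = list(c("c1", "c2", "c3"), NULL))
D <- stats::dist(pts)

# A dist object stores only the lower triangle; convert it to index by label.
Dm <- as.matrix(D)
d13 <- Dm["c1", "c3"]

# Labels and sample size are kept as attributes.
curve_labels <- attr(D, "Labels")
n_curves <- attr(D, "Size")
```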

Examples

idx <- c(1:5, 11:15, 21:25)
D <- fdadist(simulated30_sub$x[idx, ], simulated30_sub$y[idx, , ])

Performs hierarchical clustering for functional data with amplitude and phase separation

Description

This function extends hierarchical agglomerative clustering to functional data. It includes the possibility to separate amplitude and phase information.

Usage

fdahclust(
  x,
  y = NULL,
  n_clusters = 1L,
  is_domain_interval = FALSE,
  transformation = c("identity", "srvf"),
  warping_class = c("none", "shift", "dilation", "affine", "bpd"),
  centroid_type = "mean",
  metric = c("l2", "normalized_l2", "pearson"),
  cluster_on_phase = FALSE,
  linkage_criterion = c("complete", "average", "single", "ward.D2"),
  use_verbose = FALSE,
  warping_options = c(0.15, 0.15),
  maximum_number_of_iterations = 100L,
  number_of_threads = 1L,
  parallel_method = 0L,
  distance_relative_tolerance = 0.001,
  use_fence = FALSE,
  check_total_dissimilarity = TRUE,
  compute_overall_center = FALSE
)

Arguments

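x

A numeric vector of length M or a numeric matrix of shape N × M or an object of class funData::funData. If a numeric vector or matrix, it specifies the grid(s) of size M on which each of the N curves has been observed. If an object of class funData::funData, it contains the whole functional data set and the y argument is not used.

y

Either a numeric matrix of shape N × M or a numeric array of shape N × L × M or an object of class fda::fd. If a numeric matrix or array, it specifies the N-sample of L-dimensional curves observed on grids of size M. If an object of class fda::fd, it contains all the necessary information about the functional data set to be able to evaluate it on user-defined grids.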

n_clusters

An integer value specifying the number of clusters. Defaults to 1L.

is_domain_interval

A boolean specifying whether the sample of curves is defined on a fixed interval. Defaults to FALSE.

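transformation

A string specifying the transformation to apply to the original sample of curves. Choices are no transformation (transformation = "identity") or square-root velocity function (transformation = "srvf"). Defaults to "identity".

warping_class

A string specifying the class of warping functions. Choices are no warping (warping_class = "none"), shift y = x + b (warping_class = "shift"), dilation y = ax (warping_class = "dilation"), affine y = ax + b (warping_class = "affine") or boundary-preserving diffeomorphism (warping_class = "bpd"). Defaults to "none".

centroid_type

A string specifying the type of centroid to compute. Choices are "mean", "median", "medoid", "lowess" or "poly". Defaults to "mean". If LOWESS approximation is chosen, the user can append an integer between 0 and 100 as in "lowess20". This number will be used as the smoother span. This gives the proportion of points in the plot which influence the smooth at each value. Larger values give more smoothness. The default value is 10%. If polynomial approximation is chosen, the user can append a positive integer as in "poly3". This number will be used as the degree of the polynomial model. The default value is 4L.

metric

A string specifying the metric used to compare curves. Choices are "l2", "normalized_l2" or "pearson". If transformation == "srvf", the metric must be "l2" because the SRVF transform maps absolutely continuous functions to square-integrable functions. If transformation == "identity" and warping_class is either "dilation" or "affine", the metric can be either "normalized_l2" or "pearson" because the L2 distance is neither dilation-invariant nor affine-invariant. The metric can also be "l2" if warping_class == "shift". Defaults to "l2".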

cluster_on_phase

A boolean specifying whether clustering should be based on phase variation or amplitude variation. Defaults to FALSE which implies amplitude variation.

linkage_criterion

A string specifying which linkage criterion should be used to compute distances between sets of curves. Choices are "complete" for complete linkage, "average" for average linkage, "single" for single linkage and "ward.D2" for Ward's criterion. See stats::hclust() for more details. Defaults to "complete".

use_verbose

A boolean specifying whether the algorithm should output details of the steps to the console. Defaults to FALSE.

warping_options

A numeric vector supplied as a helper to the chosen warping_class to decide on warping parameter bounds. This is used only when warping_class != "srvf".

maximum_number_of_iterations

An integer specifying the maximum number of iterations before the algorithm stops if no other convergence criterion was met. Defaults to 100L.

number_of_threads

An integer value specifying the number of threads used for parallelization. Defaults to 1L. This is used only when warping_class != "srvf".

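parallel_method

An integer value specifying the type of desired parallelization for template computation. If 0L, templates are computed in parallel. If 1L, parallelization occurs within a single template computation (only for the medoid method as of now). Defaults to 0L. This is used only when warping_class != "srvf".

distance_relative_tolerance

A numeric value specifying a relative tolerance on the distance update between two iterations. If all observations have not sufficiently improved in that sense, the algorithm stops. Defaults to 1e-3. This is used only when warping_class != "srvf".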

use_fence

A boolean specifying whether the fence algorithm should be used to robustify the algorithm against outliers. Defaults to FALSE. This is used only when warping_class != "srvf".

check_total_dissimilarity

A boolean specifying whether an additional stopping criterion based on improvement of the total dissimilarity should be used. Defaults to TRUE. This is used only when warping_class != "srvf".

compute_overall_center

A boolean specifying whether the overall center should be also computed. Defaults to FALSE. This is used only when warping_class != "srvf".

Details

The number of clusters is required as input because, with functional data, once hierarchical clustering is performed, curves within clusters need to be aligned to their corresponding centroid.
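The first two steps (building the dendrogram and cutting it at the requested number of clusters) can be sketched with base R on a plain distance matrix. Four points on a line stand in for curves here; the subsequent within-cluster alignment to the centroid is specific to the package and is not shown.

```r
# Stand-in for pairwise distances between four curves.
D <- stats::dist(c(0, 1, 10, 11))

# Build the dendrogram, then cut it at the requested number of clusters.
tree <- stats::hclust(D, method = "complete")
memberships <- stats::cutree(tree, k = 2)
```

Without a fixed number of clusters there would be no partition, hence no centroids to align the curves to.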

Value

An object of class caps.

Examples

#----------------------------------
# Extracts 15 out of the 30 simulated curves in `simulated30_sub` data set
idx <- c(1:5, 11:15, 21:25)
x <- simulated30_sub$x[idx, ]
y <- simulated30_sub$y[idx, , ]

#----------------------------------
# Runs an HAC with affine alignment, searching for 2 clusters
out <- fdahclust(
  x = x,
  y = y,
  n_clusters = 2,
  warping_class = "affine",
  metric = "normalized_l2"
)

#----------------------------------
# Then visualize the results
# Either with ggplot2 via ggplot2::autoplot(out)
# or using graphics::plot()
# You can visualize the original and aligned curves with:
plot(out, type = "amplitude")
# Or the estimated warping functions with:
plot(out, type = "phase")

Performs k-means clustering for functional data with amplitude and phase separation

Description

This function provides implementations of the k-means clustering algorithm for functional data, with possible joint amplitude and phase separation. A number of warping classes are implemented to achieve this separation.

Usage

fdakmeans(
  x,
  y = NULL,
  n_clusters = 1L,
  seeds = NULL,
  seeding_strategy = c("kmeans++", "exhaustive-kmeans++", "exhaustive", "hclust"),
  is_domain_interval = FALSE,
  transformation = c("identity", "srvf"),
  warping_class = c("none", "shift", "dilation", "affine", "bpd"),
  centroid_type = "mean",
  metric = c("l2", "normalized_l2", "pearson"),
  cluster_on_phase = FALSE,
  use_verbose = FALSE,
  warping_options = c(0.15, 0.15),
  maximum_number_of_iterations = 100L,
  number_of_threads = 1L,
  parallel_method = 0L,
  distance_relative_tolerance = 0.001,
  use_fence = FALSE,
  check_total_dissimilarity = TRUE,
  compute_overall_center = FALSE,
  add_silhouettes = TRUE
)

Arguments

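x

A numeric vector of length M or a numeric matrix of shape N × M or an object of class funData::funData. If a numeric vector or matrix, it specifies the grid(s) of size M on which each of the N curves has been observed. If an object of class funData::funData, it contains the whole functional data set and the y argument is not used.

y

Either a numeric matrix of shape N × M or a numeric array of shape N × L × M or an object of class fda::fd. If a numeric matrix or array, it specifies the N-sample of L-dimensional curves observed on grids of size M. If an object of class fda::fd, it contains all the necessary information about the functional data set to be able to evaluate it on user-defined grids.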

n_clusters

An integer value specifying the number of clusters. Defaults to 1L.

seeds

An integer value or vector specifying the indices of the initial centroids. If an integer vector, it is interpreted as the indices of the initial centroids and should therefore be of length n_clusters. If an integer value, it is interpreted as the index of the first initial centroid and subsequent centroids are chosen according to the k-means++ strategy. It can be NULL in which case the argument seeding_strategy is used to automatically provide suitable indices. Defaults to NULL.

seeding_strategy

A character string specifying the strategy for choosing the initial centroids in case the argument seeds is set to NULL. Choices are "kmeans++", "exhaustive-kmeans++" which performs an exhaustive search over the choice of the first centroid, "exhaustive" which tries all combinations of initial centroids or "hclust" which first performs hierarchical clustering using Ward's linkage criterion to identify initial centroids. Defaults to "kmeans++", which is the fastest strategy.
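The k-means++ seeding rule of Arthur and Vassilvitskii (2007) can be sketched on a precomputed distance matrix: the first seed is drawn uniformly (or supplied), and each subsequent seed is drawn with probability proportional to the squared distance to the nearest seed chosen so far. The function kmeanspp_seeds() below is a hypothetical base R illustration, not the package's internal implementation.

```r
# k-means++ seeding on a distance matrix D (n x n); returns k seed indices.
kmeanspp_seeds <- function(D, k, first = NULL) {
  n <- nrow(D)
  seeds <- if (is.null(first)) sample.int(n, 1L) else first
  while (length(seeds) < k) {
    # Squared distance of each point to its nearest seed so far.
    d2 <- apply(D[, seeds, drop = FALSE], 1, min)^2
    seeds <- c(seeds, sample.int(n, 1L, prob = d2))
  }
  seeds
}

set.seed(42)
D <- as.matrix(stats::dist(c(0, 1, 10, 11, 20, 21)))
seeds <- kmeanspp_seeds(D, k = 3)
```

Already selected points have zero weight, so they cannot be drawn again. Passing first mirrors the case where seeds is a single integer.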

is_domain_interval

A boolean specifying whether the sample of curves is defined on a fixed interval. Defaults to FALSE.

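transformation

A string specifying the transformation to apply to the original sample of curves. Choices are no transformation (transformation = "identity") or square-root velocity function (transformation = "srvf"). Defaults to "identity".

warping_class

A string specifying the class of warping functions. Choices are no warping (warping_class = "none"), shift y = x + b (warping_class = "shift"), dilation y = ax (warping_class = "dilation"), affine y = ax + b (warping_class = "affine") or boundary-preserving diffeomorphism (warping_class = "bpd"). Defaults to "none".

centroid_type

A string specifying the type of centroid to compute. Choices are "mean", "median", "medoid", "lowess" or "poly". Defaults to "mean". If LOWESS approximation is chosen, the user can append an integer between 0 and 100 as in "lowess20". This number will be used as the smoother span. This gives the proportion of points in the plot which influence the smooth at each value. Larger values give more smoothness. The default value is 10%. If polynomial approximation is chosen, the user can append a positive integer as in "poly3". This number will be used as the degree of the polynomial model. The default value is 4L.

metric

A string specifying the metric used to compare curves. Choices are "l2", "normalized_l2" or "pearson". If transformation == "srvf", the metric must be "l2" because the SRVF transform maps absolutely continuous functions to square-integrable functions. If transformation == "identity" and warping_class is either "dilation" or "affine", the metric can be either "normalized_l2" or "pearson" because the L2 distance is neither dilation-invariant nor affine-invariant. The metric can also be "l2" if warping_class == "shift". Defaults to "l2".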

cluster_on_phase

A boolean specifying whether clustering should be based on phase variation or amplitude variation. Defaults to FALSE which implies amplitude variation.

use_verbose

A boolean specifying whether the algorithm should output details of the steps to the console. Defaults to FALSE.

warping_options

A numeric vector supplied as a helper to the chosen warping_class to decide on warping parameter bounds. This is used only when warping_class != "srvf".

maximum_number_of_iterations

An integer specifying the maximum number of iterations before the algorithm stops if no other convergence criterion was met. Defaults to 100L.

number_of_threads

An integer value specifying the number of threads used for parallelization. Defaults to 1L. This is used only when warping_class != "srvf".

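parallel_method

An integer value specifying the type of desired parallelization for template computation. If 0L, templates are computed in parallel. If 1L, parallelization occurs within a single template computation (only for the medoid method as of now). Defaults to 0L. This is used only when warping_class != "srvf".

distance_relative_tolerance

A numeric value specifying a relative tolerance on the distance update between two iterations. If all observations have not sufficiently improved in that sense, the algorithm stops. Defaults to 1e-3. This is used only when warping_class != "srvf".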

use_fence

A boolean specifying whether the fence algorithm should be used to robustify the algorithm against outliers. Defaults to FALSE. This is used only when warping_class != "srvf".

check_total_dissimilarity

A boolean specifying whether an additional stopping criterion based on improvement of the total dissimilarity should be used. Defaults to TRUE. This is used only when warping_class != "srvf".

compute_overall_center

A boolean specifying whether the overall center should be also computed. Defaults to FALSE. This is used only when warping_class != "srvf".

add_silhouettes

A boolean specifying whether silhouette values should be computed for each observation for internal validation of the clustering structure. Defaults to TRUE.
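For reference, the silhouette of an observation compares its mean distance a to the other members of its own cluster with its smallest mean distance b to the members of another cluster, as (b - a) / max(a, b). The sketch below is a plain base R version working on a distance matrix with at least two clusters; it is an illustration under these assumptions and not the code path used by the package, which imports the cluster package.

```r
# Silhouette values from a distance matrix and a membership vector.
silhouette_values <- function(D, memberships) {
  D <- as.matrix(D)
  clusters <- unique(memberships)
  vapply(seq_len(nrow(D)), function(i) {
    own <- memberships == memberships[i]
    if (sum(own) == 1L) return(0)  # convention for singleton clusters
    # Mean distance to the other members of the same cluster.
    a <- sum(D[i, own]) / (sum(own) - 1)
    # Smallest mean distance to the members of any other cluster.
    b <- min(vapply(
      setdiff(clusters, memberships[i]),
      function(k) mean(D[i, memberships == k]),
      numeric(1)
    ))
    (b - a) / max(a, b)
  }, numeric(1))
}

D <- stats::dist(c(0, 1, 10, 11))
s <- silhouette_values(D, c(1L, 1L, 2L, 2L))
```

Values close to 1 indicate observations that sit well inside their cluster, which is what the diagnostic plots of the package display per observation.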

Value

An object of class caps.

Examples

#----------------------------------
# Extracts 15 out of the 30 simulated curves in `simulated30_sub` data set
idx <- c(1:5, 11:15, 21:25)
x <- simulated30_sub$x[idx, ]
y <- simulated30_sub$y[idx, , ]

#----------------------------------
# Runs a k-means clustering with affine alignment, searching for 2 clusters
out <- fdakmeans(
  x = x,
  y = y,
  n_clusters = 2,
  warping_class = "affine",
  metric = "normalized_l2"
)

#----------------------------------
# Then visualize the results
# Either with ggplot2 via ggplot2::autoplot(out)
# or using graphics::plot()
# You can visualize the original and aligned curves with:
plot(out, type = "amplitude")
# Or the estimated warping functions with:
plot(out, type = "phase")

Linear and integer programming

Description

Interface to lp_solve linear/integer programming system.

Usage

lp(
  direction = "min",
  objective.in,
  const.mat,
  const.dir,
  const.rhs,
  transpose.constraints = TRUE,
  int.vec,
  presolve = 0,
  compute.sens = 0,
  binary.vec,
  all.int = FALSE,
  all.bin = FALSE,
  scale = 196,
  dense.const,
  num.bin.solns = 1,
  use.rw = FALSE,
  timeout = 0L
)

Arguments

direction

Character string giving direction of optimization: "min" (default) or "max".

objective.in

Numeric vector of coefficients of objective function

const.mat

Matrix of numeric constraint coefficients, one row per constraint, one column per variable (unless transpose.constraints = FALSE; see below).

const.dir

Vector of character strings giving the direction of the constraint: each value should be one of "<", "<=", "=", "==", ">", or ">=". (In each pair the two values are identical.)

const.rhs

Vector of numeric values for the right-hand sides of the constraints.

transpose.constraints

By default each constraint occupies a row of const.mat, and that matrix needs to be transposed before being passed to the optimizing code. For very large constraint matrices it may be wiser to construct the constraints in a matrix column-by-column. In that case set transpose.constraints to FALSE.

int.vec

Numeric vector giving the indices of variables that are required to be integer. The length of this vector will therefore be the number of integer variables.

presolve

Numeric: presolve? Default 0 (no); any non-zero value means "yes." Currently ignored.

compute.sens

Numeric: compute sensitivity? Default 0 (no); any non-zero value means "yes."

binary.vec

Numeric vector like int.vec giving the indices of variables that are required to be binary.

all.int

Logical: should all variables be integer? Default: FALSE.

all.bin

Logical: should all variables be binary? Default: FALSE.

scale

Integer: value for lpSolve scaling. Details can be found in the lpSolve documentation. Set to 0 for no scaling. Default: 196.

dense.const

Three column dense constraint array. This is ignored if const.mat is supplied. Otherwise the columns are constraint number, column number, and value; there should be one row for each non-zero entry in the constraint matrix.

num.bin.solns

Integer: if all.bin=TRUE, the user can request up to num.bin.solns optimal solutions to be returned.

use.rw

Logical: if TRUE and num.bin.solns > 1, write the lp out to a file and read it back in for each solution after the first. This is just to defeat a bug somewhere. Although the default is FALSE, we recommend you set this to TRUE if you need num.bin.solns > 1, until the bug is found.

timeout

Integer: timeout variable in seconds, defaults to 0L which means no limit is set.

Details

This function calls the lp_solve 5.5 solver. That system has many options not supported here. The current version is maintained at https://lpsolve.sourceforge.net/5.5/.

Note that every variable is assumed to be >= 0!

Value

An lpSolve::lp.object object.

Plots the result of a clustering strategy stored in a `caps` object

Description

This function creates a visualization of the result of the k-mean alignment algorithm without returning the plot data as an object. The user can choose to visualize either the amplitude information data in which case original and aligned curves are shown or the phase information data in which case the estimated warping functions are shown.

Usage

## S3 method for class 'caps'
plot(x, type = c("amplitude", "phase"), ...)

Arguments

x

An object of class caps.

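type

A string specifying the type of information to display. Choices are "amplitude" for plotting the original and aligned curves which represent amplitude information data or "phase" for plotting the corresponding warping functions which represent phase information data. Defaults to "amplitude".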

...

Not used.

Examples

plot(sim30_caps, type = "amplitude")
plot(sim30_caps, type = "phase")

Plots results of multiple clustering strategies

Description

This is an S3 method implementation of the graphics::plot() generic for objects of class mcaps to visualize the performances of multiple caps objects applied on the same data sets either in terms of WSS or in terms of silhouette values.

Usage

## S3 method for class 'mcaps'
plot(
  x,
  validation_criterion = c("wss", "silhouette"),
  what = c("mean", "distribution"),
  ...
)

Arguments

x

An object of class mcaps.

validation_criterion

A string specifying the validation criterion to be used for the comparison. Choices are "wss" or "silhouette". Defaults to "wss".

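what

A string specifying which summary of the validation criterion to display. Choices are "mean" or "distribution". Defaults to "mean".
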
...

Other arguments passed to specific methods.

Examples

plot(sim30_mcaps)

A `caps` object from simulated data for examples

Description

An object of class caps storing the result of the fdakmeans() function applied on the data set simulated30 using the affine warping class and the Pearson metric and searching for 2 clusters.

Usage

sim30_caps

Format

An object of class caps.

An `mcaps` object from simulated data for examples

Description

An object of class mcaps storing the result of the compare_caps() function applied on the data set simulated30_sub for comparing the clustering structures found by the fdakmeans() function with mean centroid type used with various classes of warping functions and varying number of clusters.

Usage

sim30_mcaps

Format

An object of class mcaps which is effectively a tibble::tibble with 5 columns and as many rows as there are clustering strategies to compare. The 5 column-variables are:

n_clusters: The number of clusters;
clustering_method: The clustering method;
warping_class: The class of warping functions used for curve alignment;
centroid_type: The type of centroid used to compute a cluster representative;
caps_obj: The result of the corresponding clustering strategy as objects of class caps.

Simulated data for examples

Description

A data set containing 30 simulated uni-dimensional curves.

Usage

simulated30

Format

A list with abscissas x and values y:

x: Matrix 30x200;
y: Array 30x1x200.
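
To make the layout concrete, here is a minimal base R sketch that builds a list with the same shapes from synthetic sine curves. The values, the shifts and the name toy are illustrative assumptions; this is not how simulated30 was generated.

```r
n <- 30   # number of curves
d <- 1    # dimension of the codomain
m <- 200  # number of evaluation points per curve

# Abscissas: one row per curve (here every curve shares the same grid).
grid <- seq(0, 2 * pi, length.out = m)
x <- matrix(grid, nrow = n, ncol = m, byrow = TRUE)

# Values: curve index x codomain dimension x evaluation point.
shifts <- seq(-0.5, 0.5, length.out = n)
y <- array(NA_real_, dim = c(n, d, m))
for (i in seq_len(n)) {
  y[i, 1, ] <- sin(x[i, ] + shifts[i])
}

toy <- list(x = x, y = y)
str(toy)
```

The middle dimension of y has length 1 because the curves are uni-dimensional; multivariate curves would use a larger d.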

Simulated data for examples

Description

A data set containing 30 simulated uni-dimensional curves.

Usage

simulated30_sub

Format

A list with abscissas x and values y:

x: Matrix 30x30;
y: Array 30x1x30.

Simulated data from the CSDA paper

Description

A data set containing 90 simulated uni-dimensional curves.

Usage

simulated90

Format

A list with abscissas x and values y:

x: Vector of size 100;
y: Matrix of size 90x100.
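
This layout (a common grid plus a curve matrix) carries the same information as the per-curve grid and three-dimensional array layout used by simulated30. A minimal base R sketch of the conversion follows, using synthetic values rather than the packaged data; the names x_mat and y_arr are illustrative assumptions.

```r
# Synthetic stand-in with the same shapes as simulated90.
x <- seq(0, 1, length.out = 100)                 # common grid, length 100
y <- matrix(rnorm(90 * 100), nrow = 90, ncol = 100)  # one curve per row

# Replicate the common grid so that each curve has its own row of abscissas.
x_mat <- matrix(x, nrow = nrow(y), ncol = length(x), byrow = TRUE)

# Insert a codomain dimension of length 1 between curves and evaluation points.
y_arr <- array(y, dim = c(nrow(y), 1, ncol(y)))

dim(x_mat)
dim(y_arr)
```

Because R stores arrays in column-major order, y_arr[i, 1, j] equals y[i, j] for every curve i and grid point j.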
