Clustering: Tomato

This clustering algorithm needs a neighborhood graph on the points, and an estimation of the density at each point. A few possible graph constructions and density estimators are provided for convenience, but it is perfectly natural to provide your own.

Author

Marc Glisse

Super class

rgudhi::PythonClass -> Tomato

Methods

Method `new()`

The Tomato constructor.

Usage

Tomato$new(
  graph_type = c("knn", "radius", "manual"),
  density_type = c("logDTM", "DTM", "logKDE", "KDE", "manual"),
  n_clusters = NULL,
  merge_threshold = NULL,
  ...
)

Arguments

graph_type: A string specifying the method used to compute the neighborhood graph. Choices are "knn", "radius" or "manual". Defaults to "knn".
density_type: A string specifying the choice of density estimator. Choices are "logDTM", "DTM", "logKDE", "KDE" or "manual". When you have many points, "KDE" and "logKDE" tend to be slower. Defaults to "logDTM".
n_clusters: An integer value specifying the number of clusters. Defaults to NULL, i.e. no merging occurs and we get the maximal number of clusters.
merge_threshold: A numeric value specifying the minimum prominence of a cluster so it doesn’t get merged. Defaults to NULL, i.e. no merging occurs and we get the maximal number of clusters.
...: Extra parameters passed to KNearestNeighbors and DTMDensity.

Returns

An object of class Tomato.

Method `fit()`

Runs the Tomato algorithm on the provided data.

Usage

Tomato$fit(X, y = NULL, weights = NULL)

Arguments

X: One of the following: a numeric matrix with one point per row and one coordinate per column; a full distance matrix if metric == "precomputed"; or a list of neighbors for each point if graph_type == "manual". The number of points is currently limited to about 2 billion.
y: Not used; present only for API consistency with scikit-learn.
weights: A numeric vector specifying a density estimate at each point. Used only if density_type == "manual".
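The "manual" inputs described above can be prepared in base R. The following is a minimal sketch, not part of rgudhi: `knn_neighbors()` and `knn_density()` are hypothetical helpers, the inverse mean neighbor distance is only a crude stand-in for a real density estimator, and the 0-based neighbor indices are an assumption based on the Python backend, so check which convention your installation expects before relying on it.

```r
# Hypothetical helpers (not part of rgudhi), base R only.

# k nearest neighbors of each row of X, excluding the point itself.
# index_base = 0L is an ASSUMPTION: the Python backend counts from 0.
knn_neighbors <- function(X, k, index_base = 0L) {
  D <- as.matrix(dist(X))                 # brute-force Euclidean distances
  lapply(seq_len(nrow(D)), function(i) {
    ord <- order(D[i, ])                  # points sorted by distance to i
    ord <- ord[ord != i][seq_len(k)]      # drop i itself, keep the k closest
    as.integer(ord - 1L + index_base)
  })
}

# Crude density estimate: inverse of the mean distance to the k nearest neighbors.
knn_density <- function(X, k) {
  D <- as.matrix(dist(X))
  unname(apply(D, 1, function(d) 1 / mean(sort(d)[2:(k + 1)])))
}

set.seed(1)
X <- rbind(matrix(rnorm(40), ncol = 2), matrix(rnorm(40, mean = 5), ncol = 2))
nb <- knn_neighbors(X, k = 5)   # list with one integer vector per point
w  <- knn_density(X, k = 5)     # one density value per point

# Only runs when rgudhi and the Python gudhi module are available.
if (requireNamespace("rgudhi", quietly = TRUE) &&
    reticulate::py_module_available("gudhi")) {
  cl <- rgudhi::Tomato$new(graph_type = "manual", density_type = "manual")
  labels <- cl$fit_predict(nb, weights = w)
}
```

Both helpers build a full distance matrix, so they are only suitable for small data sets; for larger inputs the built-in "knn" graph and "logDTM" density are the more practical choice.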

Returns

The updated Tomato object itself, invisibly.

Method `fit_predict()`

Runs the Tomato algorithm on the provided data and returns the class memberships.

Usage

Tomato$fit_predict(X, y = NULL, weights = NULL)

Arguments

X: One of the following: a numeric matrix with one point per row and one coordinate per column; a full distance matrix if metric == "precomputed"; or a list of neighbors for each point if graph_type == "manual". The number of points is currently limited to about 2 billion.
y: Not used; present only for API consistency with scikit-learn.
weights: A numeric vector specifying a density estimate at each point. Used only if density_type == "manual".

Returns

An integer vector storing the class memberships.

Method `set_n_clusters()`

Sets the number of clusters, which automatically adjusts the class memberships.

Usage

Tomato$set_n_clusters(n_clusters)

Arguments

n_clusters: An integer value specifying the number of clusters.

Returns

The updated Tomato object itself, invisibly.

Method `get_n_clusters()`

Gets the number of clusters.

Usage

Tomato$get_n_clusters()

Returns

The number of clusters.

Method `set_merge_threshold()`

Sets the threshold for merging clusters, which automatically adjusts the class memberships.

Usage

Tomato$set_merge_threshold(merge_threshold)

Arguments

merge_threshold: A numeric value specifying the threshold for merging clusters.

Returns

The updated Tomato object itself, invisibly.

Method `get_merge_threshold()`

Gets the threshold for merging clusters.

Usage

Tomato$get_merge_threshold()

Returns

The threshold for merging clusters.
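To make the role of the merge threshold concrete, here is a simplified base R sketch of prominence-based merging on a neighbor graph. It is an illustration of the idea only, not the implementation used by rgudhi or GUDHI: `tomato_sketch()` is a hypothetical function, it uses 1-based indices, and it assumes that the prominence of a cluster is the density at its peak minus the density at the point where it meets a cluster with a higher peak.

```r
# Illustrative sketch only (hypothetical function, not rgudhi's code).
# neighbors: list of 1-based index vectors; density: numeric vector.
tomato_sketch <- function(neighbors, density, merge_threshold) {
  n <- length(density)
  root <- rep(NA_integer_, n)          # NA means "not visited yet"
  find <- function(i) {                # follow links up to the cluster's peak
    while (root[i] != i) i <- root[i]
    i
  }
  # Visit points from highest to lowest density.
  for (i in order(density, decreasing = TRUE)) {
    nb <- as.integer(neighbors[[i]])
    nb <- nb[!is.na(root[nb])]         # neighbors already visited (higher density)
    if (length(nb) == 0L) {            # local density peak: start a new cluster
      root[i] <- i
      next
    }
    roots <- unique(vapply(nb, find, integer(1)))
    best <- roots[which.max(density[roots])]
    root[i] <- best                    # attach to the cluster with the highest peak
    for (r in setdiff(roots, best)) {
      # Point i joins two clusters; merge the lower one if it is not prominent enough.
      if (density[r] - density[i] < merge_threshold) root[r] <- best
    }
  }
  vapply(seq_len(n), find, integer(1)) # label each point by its cluster's peak
}

# Five points on a path 1 - 2 - 3 - 4 - 5 with two density peaks (points 2 and 4).
nb <- list(2, c(1, 3), c(2, 4), c(3, 5), 4)
dens <- c(1, 3, 2, 5, 1)

labels_fine   <- tomato_sketch(nb, dens, merge_threshold = 0.5)  # 2 2 4 4 4
labels_coarse <- tomato_sketch(nb, dens, merge_threshold = 2)    # 4 4 4 4 4
```

The cluster around point 2 has prominence 3 - 2 = 1, so it survives a threshold of 0.5 and is absorbed into the cluster of point 4 once the threshold exceeds 1.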

Method `get_labels()`

Gets the class memberships.

Usage

Tomato$get_labels()

Returns

An integer vector storing the class memberships.

Method `plot_diagram()`

Computes the persistence diagram of the merge tree of the initial clusters. This is a convenient graphical tool to help decide how many clusters we want.

Usage

Tomato$plot_diagram()
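A typical way to use the diagram, sketched with only the methods documented on this page: fit once, inspect the diagram, then merge down to the number of clusters it suggests. The two-blob data set is made up for illustration, and the guard assumes that rgudhi and the Python gudhi module may not be installed.

```r
# Synthetic data: two well-separated Gaussian blobs, 100 points each.
set.seed(42)
X <- rbind(
  matrix(rnorm(200, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(200, mean = 3, sd = 0.3), ncol = 2)
)

# Only runs when rgudhi and the Python gudhi module are available.
if (requireNamespace("rgudhi", quietly = TRUE) &&
    reticulate::py_module_available("gudhi")) {
  cl <- rgudhi::Tomato$new()   # defaults: "knn" graph, "logDTM" density
  cl$fit(X)                    # no merging yet: maximal number of clusters
  cl$plot_diagram()            # look for points that stand clearly apart
  cl$set_n_clusters(2)         # or cl$set_merge_threshold() with a value read off the diagram
  table(cl$get_labels())       # cluster sizes after merging
}
```

Because the setters adjust the memberships without refitting, several values of `n_clusters` or `merge_threshold` can be tried on the same fitted object.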

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Tomato$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

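library(rgudhi)
# Run only when the Python gudhi module is available.
if (reticulate::py_module_available("gudhi")) {
  X <- seq_circle(100)   # 100 points on a circle
  cl <- Tomato$new()
  cl$fit_predict(X)      # fit and return the initial class memberships
  cl$set_n_clusters(2)   # merge down to two clusters
  cl$get_labels()        # class memberships after merging
}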
