This clustering algorithm needs a neighborhood graph on the points and an estimate of the density at each point. A few graph constructions and density estimators are provided for convenience, but it is perfectly natural to provide your own.

Author

Marc Glisse

Super class

rgudhi::PythonClass -> Tomato

Methods


Method new()

The Tomato constructor.

Usage

Tomato$new(
  graph_type = c("knn", "radius", "manual"),
  density_type = c("logDTM", "DTM", "logKDE", "KDE", "manual"),
  n_clusters = NULL,
  merge_threshold = NULL,
  ...
)

Arguments

graph_type

A string specifying the method used to compute the neighborhood graph. Choices are "knn", "radius" or "manual". Defaults to "knn".

density_type

A string specifying the choice of density estimator. Choices are "logDTM", "DTM", "logKDE", "KDE" or "manual". When you have many points, "KDE" and "logKDE" tend to be slower. Defaults to "logDTM".

n_clusters

An integer value specifying the number of clusters. Defaults to NULL, i.e. no merging occurs and we get the maximal number of clusters.

merge_threshold

A numeric value specifying the minimum prominence a cluster needs in order not to be merged. Defaults to NULL, i.e. no merging occurs and we get the maximal number of clusters.

...

Extra parameters passed to KNearestNeighbors and DTMDensity.

Returns

An object of class Tomato.


Method fit()

Runs the Tomato algorithm on the provided data.

Usage

Tomato$fit(X, y = NULL, weights = NULL)

Arguments

X

Either a numeric matrix with one point per row and one coordinate per column, a full distance matrix if metric == "precomputed", or a list of neighbors for each point if graph_type == "manual". The number of points is currently limited to about 2 billion.

y

Not used; present only for API consistency with scikit-learn.

weights

A numeric vector specifying a density estimate at each point. Used only if density_type == "manual".

Returns

The updated Tomato object itself, invisibly.
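
The two "manual" inputs described for X and weights can be prepared in base R. The following is only a sketch under assumptions: the neighbor rule (k nearest points) and the density proxy (inverse mean distance to those neighbors) are illustrative choices, not the estimators shipped with the package, and the names neighbors, weights and k are hypothetical. The indices produced here are 1-based as usual in R; this page does not state which convention the wrapper expects, so check that before passing them to fit().

```r
set.seed(1)

# Toy data: two well-separated groups of 20 points in the plane
X <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.2), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.2), ncol = 2)
)
n <- nrow(X)
k <- 5L

# Full pairwise Euclidean distance matrix
D <- as.matrix(dist(X))

# Neighbor list: for each point, the indices of its k nearest other points
neighbors <- lapply(seq_len(n), function(i) {
  setdiff(order(D[i, ]), i)[seq_len(k)]
})

# Density proxy: inverse of the mean distance to those k neighbors
weights <- vapply(
  seq_len(n),
  function(i) 1 / mean(D[i, neighbors[[i]]]),
  numeric(1)
)
```

With graph_type = "manual" the list would play the role of X, and with density_type = "manual" the numeric vector would play the role of weights.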


Method fit_predict()

Runs the Tomato algorithm on the provided data and returns the class memberships.

Usage

Tomato$fit_predict(X, y = NULL, weights = NULL)

Arguments

X

Either a numeric matrix with one point per row and one coordinate per column, a full distance matrix if metric == "precomputed", or a list of neighbors for each point if graph_type == "manual". The number of points is currently limited to about 2 billion.

y

Not used; present only for API consistency with scikit-learn.

weights

A numeric vector specifying a density estimate at each point. Used only if density_type == "manual".

Returns

An integer vector storing the class memberships.


Method set_n_clusters()

Sets the number of clusters, which automatically adjusts the class memberships.

Usage

Tomato$set_n_clusters(n_clusters)

Arguments

n_clusters

An integer value specifying the number of clusters.

Returns

The updated Tomato object itself, invisibly.


Method get_n_clusters()

Gets the number of clusters.

Usage

Tomato$get_n_clusters()

Returns

The number of clusters.


Method set_merge_threshold()

Sets the threshold for merging clusters, which automatically adjusts the class memberships.

Usage

Tomato$set_merge_threshold(merge_threshold)

Arguments

merge_threshold

A numeric value specifying the threshold for merging clusters.

Returns

The updated Tomato object itself, invisibly.
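
To make the role of the threshold concrete, here is a toy base R sketch of prominence-based merging on a path graph, where each point is linked only to its left and right neighbor. It is not the code run by the package: tomato_path and the density values in f are hypothetical, and ties in density are not handled. Points are visited by decreasing density; a point with no already-visited neighbor starts a cluster, and two clusters meeting at a point are merged when the lower peak rises less than tau above that point.

```r
tomato_path <- function(f, tau) {
  n <- length(f)
  parent <- rep(NA_integer_, n)   # union-find forest; NA means "not visited yet"
  find <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }
  for (i in order(f, decreasing = TRUE)) {
    nb <- c(i - 1L, i + 1L)
    nb <- nb[nb >= 1L & nb <= n]
    nb <- nb[!is.na(parent[nb])]  # neighbors already visited, i.e. with higher density
    if (length(nb) == 0L) {
      parent[i] <- i              # local maximum: start a new cluster
      next
    }
    # Attach the point to the cluster of its highest-density neighbor
    e <- find(nb[which.max(f[nb])])
    parent[i] <- e
    # Merge neighboring clusters whose lower peak has prominence below tau
    for (j in nb) {
      r <- find(j)
      if (r != e && min(f[r], f[e]) - f[i] < tau) {
        lo <- if (f[r] < f[e]) r else e
        hi <- if (f[r] < f[e]) e else r
        parent[lo] <- hi
        e <- hi
      }
    }
  }
  # Label each point by the index of its cluster's peak
  vapply(seq_len(n), find, integer(1))
}

# Toy density with peaks at positions 2, 4 and 6
f <- c(1, 5, 2, 4, 1.5, 3, 0.5)
labels_none <- tomato_path(f, tau = 0)    # no merging
labels_mid  <- tomato_path(f, tau = 1.8)  # the peak at position 6 is absorbed
labels_high <- tomato_path(f, tau = 2.5)  # everything joins the highest peak
```

Raising tau only ever merges clusters, which is why a single fit can be followed by several threshold adjustments.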


Method get_merge_threshold()

Gets the threshold for merging clusters.

Usage

Tomato$get_merge_threshold()

Returns

The threshold for merging clusters.


Method get_labels()

Gets the class memberships.

Usage

Tomato$get_labels()

Returns

An integer vector storing the class memberships.


Method plot_diagram()

Computes the persistence diagram of the merge tree of the initial clusters. This is a convenient graphical tool to help decide how many clusters we want.

Usage

Tomato$plot_diagram()
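
One way to read such a diagram is to treat each initial cluster as a point (birth, death), where birth is the density at its peak and death is the density at which it merges into a cluster with a higher peak. The sketch below uses invented values and hypothetical names (diagram, n_surviving) to show how a threshold on the prominence, birth minus death, translates into a number of clusters; it does not call the package.

```r
# Invented (birth, death) pairs; the cluster with the highest peak never dies
diagram <- data.frame(
  birth = c(5, 4, 3),
  death = c(-Inf, 2, 1.5)
)
diagram$prominence <- diagram$birth - diagram$death

# Number of clusters whose prominence reaches a given threshold
n_surviving <- function(tau) sum(diagram$prominence >= tau)

counts <- vapply(c(0, 1.8, 2.5), n_surviving, integer(1))
```

A large gap between consecutive prominences suggests a natural place to put the threshold, or equivalently a natural number of clusters.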


Method clone()

The objects of this class are cloneable with this method.

Usage

Tomato$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

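if (reticulate::py_module_available("gudhi")) {
  # Example point cloud from seq_circle()
  X <- seq_circle(100)
  # Default constructor: "knn" graph and "logDTM" density
  cl <- Tomato$new()
  # Fit and return the initial class memberships
  cl$fit_predict(X)
  # Merge down to 2 clusters, then read the adjusted memberships
  cl$set_n_clusters(2)
  cl$get_labels()
}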