This clustering algorithm needs a neighborhood graph on the points and an estimate of the density at each point. A few graph constructions and density estimators are provided for convenience, but it is perfectly natural to provide your own.
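As an illustration of what "providing your own" graph and density could look like, here is a minimal base R sketch. It builds a k-nearest-neighbor list and a distance-to-measure (DTM) style log-density from a distance matrix. The names `n_per_blob`, `k`, `neighbors`, `dtm` and `weights` are chosen for this example, and the exact formula and neighbor convention used internally by the package may differ (for instance, whether a point counts as its own neighbor).

```r
# Sketch only: a hand-made neighborhood graph and density estimate in base R.
set.seed(1)
n_per_blob <- 30
k <- 5  # number of neighbors (an assumption for this example)

# Two well-separated Gaussian blobs; points in rows, coordinates in columns.
X <- rbind(
  matrix(rnorm(2 * n_per_blob, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(2 * n_per_blob, mean = 3, sd = 0.3), ncol = 2)
)

# Full pairwise Euclidean distance matrix.
D <- as.matrix(dist(X))

# For each point, the (1-based) indices of its k nearest neighbors,
# excluding the point itself.
neighbors <- lapply(seq_len(nrow(D)), function(i) {
  setdiff(order(D[i, ]), i)[seq_len(k)]
})

# DTM-style quantity: root mean squared distance to the k nearest neighbors.
dtm <- vapply(seq_len(nrow(D)), function(i) {
  sqrt(mean(D[i, neighbors[[i]]]^2))
}, numeric(1))

# A small DTM means a dense region, so use -log(DTM) as a log-density proxy.
weights <- -log(dtm)
```

Under these assumptions, `weights` plays the role of the density estimate expected when `density_type == "manual"`, and `neighbors` the role of the neighbor list expected when `graph_type == "manual"`. Whether the neighbor indices should be 0-based or 1-based when handed to the class is not stated here and should be checked before use.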
Methods
Method new()
The Tomato constructor.
Arguments
graph_type: A string specifying the method used to compute the neighborhood graph. Choices are "knn", "radius" or "manual". Defaults to "knn".
density_type: A string specifying the choice of density estimator. Choices are "logDTM", "DTM", "logKDE" or "manual". When you have many points, "KDE" and "logKDE" tend to be slower. Defaults to "logDTM".
n_clusters: An integer value specifying the number of clusters. Defaults to NULL, i.e. no merging occurs and we get the maximal number of clusters.
merge_threshold: A numeric value specifying the minimum prominence a cluster must have in order not to be merged. Defaults to NULL, i.e. no merging occurs and we get the maximal number of clusters.
...: Extra parameters passed to KNearestNeighbors and DTMDensity.
Method fit()
Runs the Tomato algorithm on the provided data.
Arguments
X: Either a numeric matrix specifying the coordinates (in columns) of each point (in rows), or a full distance matrix if metric == "precomputed", or a list of neighbors for each point if graph_type == "manual". The number of points is currently limited to about 2 billion.
y: Not used; present here by convention, for API consistency with scikit-learn.
weights: A numeric vector specifying a density estimate at each point. Used only if density_type == "manual".
Method fit_predict()
Runs the Tomato algorithm on the provided data and returns the class memberships.
Arguments
X: Either a numeric matrix specifying the coordinates (in columns) of each point (in rows), or a full distance matrix if metric == "precomputed", or a list of neighbors for each point if graph_type == "manual". The number of points is currently limited to about 2 billion.
y: Not used; present here by convention, for API consistency with scikit-learn.
weights: A numeric vector specifying a density estimate at each point. Used only if density_type == "manual".
Method set_merge_threshold()
Sets the prominence threshold for merging clusters, which automatically updates the class memberships.
Method plot_diagram()
Computes the persistence diagram of the merge tree of the initial clusters. This is a convenient graphical tool to help decide how many clusters we want.
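To make the link between the persistence diagram, cluster prominence and the merge threshold more concrete, here is a minimal base R sketch on a toy example. It is not the package's implementation: it is a simplified union-find pass over points sorted by decreasing density, and the names `merge_diagram_sketch`, `count_clusters` and the toy `density` vector are hypothetical. Each initial cluster is born at the density of its peak and dies at the density level where it meets a cluster with a higher peak; its prominence is the difference between the two.

```r
# Sketch only: birth/death pairs of the merge tree on a small graph.
merge_diagram_sketch <- function(density, neighbors) {
  n <- length(density)
  parent <- rep(NA_integer_, n)  # NA means "not processed yet"
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    as.integer(i)
  }
  births <- numeric(0)
  deaths <- numeric(0)
  # Visit points from the densest to the least dense.
  for (i in order(density, decreasing = TRUE)) {
    nb <- neighbors[[i]]
    seen <- nb[!is.na(parent[nb])]  # neighbors that are already processed
    roots <- unique(vapply(seen, find_root, integer(1)))
    if (length(roots) == 0) {
      parent[i] <- i  # local maximum of the density: a new cluster is born
      next
    }
    # Attach i to the neighboring cluster with the highest peak.
    top <- roots[which.max(density[roots])]
    parent[i] <- top
    # Every other neighboring cluster dies here and merges into `top`.
    for (r in setdiff(roots, top)) {
      births <- c(births, density[r])
      deaths <- c(deaths, density[i])
      parent[r] <- top
    }
  }
  data.frame(birth = births, death = deaths, prominence = births - deaths)
}

# Clusters that survive a threshold: those with prominence at least the
# threshold, plus the cluster of the global maximum, which never dies.
count_clusters <- function(diagram, threshold) {
  1L + sum(diagram$prominence >= threshold)
}

# Toy data: densities on a chain graph where point i is linked to i - 1 and i + 1.
density <- c(1, 3, 2, 5, 1, 4, 2)
n <- length(density)
neighbors <- lapply(seq_len(n), function(i) {
  nb <- c(i - 1, i + 1)
  nb[nb >= 1 & nb <= n]
})

diagram <- merge_diagram_sketch(density, neighbors)
print(diagram)
```

On this toy chain the three density peaks give two finite pairs, with prominences 1 and 3, so `count_clusters(diagram, 2)` keeps two clusters. A large gap between prominences in the diagram is the kind of visual cue that helps choose the threshold or the number of clusters.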
Examples
if (FALSE) { # reticulate::py_module_available("gudhi")
X <- seq_circle(100)
cl <- Tomato$new()
cl$fit_predict(X)
cl$set_n_clusters(2)
cl$get_labels()
}