This clustering algorithm needs a neighborhood graph on the points and an estimate of the density at each point. A few possible graph constructions and density estimators are provided for convenience, but it is perfectly natural to provide your own.
Methods
Method new()
The Tomato constructor.
Arguments
graph_type
A string specifying the method used to compute the neighborhood graph. Choices are "knn", "radius" or "manual". Defaults to "knn".
density_type
A string specifying the choice of density estimator. Choices are "logDTM", "DTM", "KDE", "logKDE" or "manual". When you have many points, "KDE" and "logKDE" tend to be slower. Defaults to "logDTM".
n_clusters
An integer value specifying the number of clusters. Defaults to NULL, i.e. no merging occurs and we get the maximal number of clusters.
merge_threshold
A numeric value specifying the minimum prominence a cluster needs in order not to be merged. Defaults to NULL, i.e. no merging occurs and we get the maximal number of clusters.
...
Extra parameters passed to KNearestNeighbors and DTMDensity.
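As a sketch of how these arguments combine, the call below requests a k-nearest-neighbor graph, a log-DTM density and two final clusters. The argument values are illustrative only. The guard assumes that the class is exported by the rgudhi package and that it relies on the Python module gudhi (as the Examples section suggests); construction is skipped when either is missing.

```r
# Illustrative argument values; only argument names documented above are used.
tomato_args <- list(
  graph_type   = "knn",     # or "radius", "manual"
  density_type = "logDTM",  # or "DTM", "KDE", "logKDE", "manual"
  n_clusters   = 2L         # NULL would keep the maximal number of clusters
)

# Assumption: Tomato is exported by the rgudhi package and needs the Python
# module gudhi; skip construction when any dependency is unavailable.
can_run <- requireNamespace("rgudhi", quietly = TRUE) &&
  requireNamespace("reticulate", quietly = TRUE) &&
  reticulate::py_module_available("gudhi")

if (can_run) {
  cl <- do.call(rgudhi::Tomato$new, tomato_args)
}
```

Keeping the arguments in a list makes it easy to try another graph_type or density_type without rewriting the call.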
Method fit()
Runs the Tomato algorithm on the provided data.
Arguments
X
Either a numeric matrix with one point per row and one coordinate per column, or a full distance matrix if metric == "precomputed", or a list of neighbors for each point if graph_type == "manual". The number of points is currently limited to about 2 billion.
y
Not used; present for API consistency with scikit-learn conventions.
weights
A numeric vector specifying a density estimate at each point. Used only if density_type == "manual".
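The weights argument can be illustrated with a hand-made density estimate. The sketch below is not part of the package: knn_density is a hypothetical helper that uses the inverse of the mean distance to the k nearest neighbors as a crude density proxy. The final call assumes the class is available from the rgudhi package and is skipped otherwise.

```r
set.seed(1)
# Toy data: 50 points in the plane, one point per row.
X <- matrix(rnorm(100), ncol = 2)

# Hypothetical helper (not part of the package): a crude density proxy,
# the inverse of the mean distance to the k nearest neighbors.
knn_density <- function(X, k = 5) {
  D <- as.matrix(dist(X))            # full pairwise Euclidean distances
  dens <- apply(D, 1, function(d) {
    nearest <- sort(d)[2:(k + 1)]    # drop the zero distance to the point itself
    1 / mean(nearest)
  })
  unname(dens)
}

w <- knn_density(X, k = 5)

# Assumption: Tomato is exported by rgudhi and needs the Python module gudhi.
can_run <- requireNamespace("rgudhi", quietly = TRUE) &&
  requireNamespace("reticulate", quietly = TRUE) &&
  reticulate::py_module_available("gudhi")

if (can_run) {
  cl <- rgudhi::Tomato$new(density_type = "manual")
  labels <- cl$fit_predict(X, weights = w)
}
```

Any vector with one value per point can play the same role, as long as larger values mean denser regions.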
Method fit_predict()
Runs the Tomato algorithm on the provided data and returns the class memberships.
Arguments
X
Either a numeric matrix with one point per row and one coordinate per column, or a full distance matrix if metric == "precomputed", or a list of neighbors for each point if graph_type == "manual". The number of points is currently limited to about 2 billion.
y
Not used; present for API consistency with scikit-learn conventions.
weights
A numeric vector specifying a density estimate at each point. Used only if density_type == "manual".
Method set_merge_threshold()
Sets the threshold for merging clusters; class memberships are adjusted automatically.
Method plot_diagram()
Computes the persistence diagram of the merge tree of the initial clusters. This is a convenient graphical tool to help decide how many clusters we want.
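To illustrate how such a diagram can be read, the sketch below uses made-up birth and death density values for four initial clusters; they are not output of the package. It assumes the usual convention that a cluster's prominence is its birth value minus its death value, with the cluster that never merges given infinite prominence, and counts how many clusters would survive a given merge threshold.

```r
# Made-up values for illustration: the density at which each initial cluster
# appears (birth) and at which it merges into a more prominent one (death).
birth <- c(1.00, 0.90, 0.35, 0.30)
death <- c(-Inf, 0.20, 0.30, 0.28)   # -Inf: this cluster never merges

prominence <- birth - death           # roughly Inf, 0.70, 0.05, 0.02

# Clusters whose prominence exceeds the threshold are kept.
clusters_kept <- function(threshold) sum(prominence > threshold)

clusters_kept(0.10)  # 2
clusters_kept(0.01)  # 4
```

A large gap between the prominent clusters and the rest, as between 0.70 and 0.05 here, suggests a natural place for the threshold.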
Examples
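if (reticulate::py_module_available("gudhi")) {
  # 100 points on a circle
  X <- seq_circle(100)
  # Clustering object with the default graph and density settings
  cl <- Tomato$new()
  # Run the algorithm and return the class memberships
  cl$fit_predict(X)
  # Ask for 2 clusters, then read the updated labels
  cl$set_n_clusters(2)
  cl$get_labels()
}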