
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds core samples of high density and expands clusters from them. It is well suited to data that contains clusters of similar density. This class is a wrapper around the Python class sklearn.cluster.DBSCAN.

References

  • Ester, M., H. P. Kriegel, J. Sander, and X. Xu (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, pp. 226-231.

  • Schubert, E., Sander, J., Ester, M., Kriegel, H. P., & Xu, X. (2017). DBSCAN revisited, revisited: why and how you should (still) use DBSCAN, ACM Transactions on Database Systems (TODS), 42(3), p. 19.

Super classes

rgudhi::PythonClass -> rgudhi::SKLearnClass -> rgudhi::BaseClustering -> DBSCAN

Methods



Method new()

The DBSCAN class constructor.

Usage

DBSCAN$new(
  eps = 0.5,
  min_samples = 5L,
  metric = "euclidean",
  metric_params = NULL,
  algorithm = c("auto", "ball_tree", "kd_tree", "brute"),
  leaf_size = 30L,
  p = 2L,
  n_jobs = 1L
)

Arguments

eps

A numeric value specifying the maximum distance between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function. Defaults to 0.5.
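To see why eps is not a bound on distances within a cluster, the following base R sketch uses made-up data (six points on a line, 0.4 apart) and does not call the wrapper at all. Each point lies within eps of its neighbor, so with a small enough min_samples (for instance 2L) every point is a core point and DBSCAN would chain them into one cluster, even though the two ends are much further apart than eps.

```r
# Illustrative data (not from the package): six points on a line, 0.4 apart.
x <- seq(0, 2, by = 0.4)
eps <- 0.5

# Full matrix of pairwise Euclidean distances.
d <- as.matrix(dist(x))

# Distances between consecutive points: each one is within eps.
consecutive <- d[cbind(1:5, 2:6)]
chain_is_linked <- all(consecutive <= eps)

# Largest distance between any two points of the chain.
widest_gap <- max(d)

chain_is_linked    # TRUE
widest_gap > eps   # TRUE
```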

min_samples

An integer value specifying the minimum number of samples (or total weight) required in a neighborhood for a point to be considered a core point. This count includes the point itself. Defaults to 5L.
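The core-point rule can be sketched in base R as follows. This is only an illustration of the definition on made-up data, assuming Euclidean distances and unit sample weights; it is not how the wrapped Python class is implemented.

```r
# Illustrative data: five tightly packed points and one isolated point.
X <- rbind(
  c(0.00, 0.00),
  c(0.10, 0.00),
  c(0.00, 0.10),
  c(0.10, 0.10),
  c(0.05, 0.05),
  c(3.00, 3.00)
)
eps <- 0.5
min_samples <- 5L

# Pairwise Euclidean distances; the diagonal is zero.
d <- as.matrix(dist(X))

# Size of each eps-neighborhood. Because the diagonal is zero,
# every point is counted in its own neighborhood.
n_neighbors <- rowSums(d <= eps)

# A point is a core point when its neighborhood is large enough.
is_core <- n_neighbors >= min_samples
unname(is_core)
```

With these values the five packed points each have a neighborhood of size 5 and qualify as core points, while the isolated point has a neighborhood of size 1 and does not.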

metric

Either a string or an object coercible into a function via rlang::as_function() specifying the metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn.metrics.pairwise_distances for its metric parameter. If metric is "precomputed", X is assumed to be a distance matrix and must be square. X may be a sparse graph, in which case only nonzero elements may be considered neighbors for DBSCAN. Defaults to "euclidean".
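As a rough illustration of what a "precomputed" input looks like, the base R sketch below builds a square distance matrix from made-up data. The choice of the Manhattan distance is arbitrary, and the constructor call is left commented out because it needs a working Python installation with scikit-learn.

```r
# Illustrative data: three points in the plane.
X <- rbind(c(0, 0), c(1, 2), c(4, 6))

# A full square matrix of pairwise Manhattan distances.
D <- as.matrix(dist(X, method = "manhattan"))

# Sanity checks on the shape expected of a distance matrix.
is_square <- nrow(D) == ncol(D)
is_symmetric <- isSymmetric(D)
zero_diag <- all(diag(D) == 0)

# The clusterer would then be created with the precomputed metric:
# cl <- DBSCAN$new(metric = "precomputed")
```

The matrix D, rather than X, is then what gets handed to the fitting methods inherited from the parent classes.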

metric_params

A named list specifying additional parameters to be passed on to the metric function. Defaults to NULL.

algorithm

A string specifying the algorithm to be used by the sklearn.neighbors.NearestNeighbors module to compute pointwise distances and find nearest neighbors. Choices are "auto", "ball_tree", "kd_tree" or "brute". Defaults to "auto".

leaf_size

An integer value specifying the leaf size passed to sklearn.neighbors.BallTree or sklearn.neighbors.KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem. Defaults to 30L.

p

An integer value specifying the power of the Minkowski metric to be used to calculate distance between points. Defaults to 2L.
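The effect of the power can be checked with base R's dist(), used here only as a stand-in for the distance computation done on the Python side. On the two made-up points below, p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.

```r
# Illustrative data: two points forming a 3-4-5 right triangle.
X <- rbind(c(0, 0), c(3, 4))

# Minkowski distance with power 1 (Manhattan) and power 2 (Euclidean).
d_p1 <- as.numeric(dist(X, method = "minkowski", p = 1))  # 7
d_p2 <- as.numeric(dist(X, method = "minkowski", p = 2))  # 5

# The same distances obtained through the named methods.
d_manhattan <- as.numeric(dist(X, method = "manhattan"))
d_euclidean <- as.numeric(dist(X, method = "euclidean"))
```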

n_jobs

An integer value specifying the number of parallel jobs to run. Defaults to 1L.

Returns

An object of class DBSCAN.


Method clone()

The objects of this class are cloneable with this method.

Usage

DBSCAN$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

if (FALSE) { # reticulate::py_module_available("sklearn.cluster")
cl <- DBSCAN$new()
}