Performs clustering according to the DBSCAN algorithm

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds core samples of high density and expands clusters from them. It works well for data containing clusters of similar density. This class is a wrapper around the Python class sklearn.cluster.DBSCAN.
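
The "find core samples, then expand clusters" idea can be sketched in a few lines of base R. This is only an illustration of the principle, not the code that sklearn.cluster.DBSCAN runs; the function name `dbscan_sketch` and the toy data are hypothetical, and the sketch assumes the Euclidean metric and the sklearn convention that noise points are labeled -1.

```r
# Minimal base-R sketch of the DBSCAN idea (illustrative only; this is not
# the sklearn implementation and `dbscan_sketch` is a hypothetical name).
dbscan_sketch <- function(x, eps = 0.5, min_samples = 5L) {
  d <- as.matrix(dist(x))  # pairwise Euclidean distances
  n <- nrow(d)
  # eps-neighborhood of each point; it contains the point itself (distance 0)
  neighbors <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbors) >= min_samples
  labels <- rep(-1L, n)    # -1 marks noise until a cluster claims the point
  cluster <- -1L
  for (i in seq_len(n)) {
    # Only unlabeled core points can start a new cluster
    if (labels[i] != -1L || !is_core[i]) next
    cluster <- cluster + 1L
    labels[i] <- cluster
    queue <- neighbors[[i]]
    while (length(queue) > 0L) {
      j <- queue[1L]
      queue <- queue[-1L]
      if (labels[j] != -1L) next  # already assigned to a cluster
      labels[j] <- cluster
      # Only core points propagate the cluster further
      if (is_core[j]) queue <- c(queue, neighbors[[j]])
    }
  }
  labels
}

# Toy data: two tight groups and one isolated point
x <- rbind(
  c(0, 0), c(0, 0.1), c(0.1, 0),
  c(5, 5), c(5, 5.1), c(5.1, 5),
  c(10, 10)
)
labels <- dbscan_sketch(x, eps = 0.5, min_samples = 3L)
print(labels)
```

On this toy data the two groups receive labels 0 and 1, and the isolated point keeps the noise label -1.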

References

Ester, M., H. P. Kriegel, J. Sander, and X. Xu (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, pp. 226-231.
Schubert, E., Sander, J., Ester, M., Kriegel, H. P., & Xu, X. (2017). DBSCAN revisited, revisited: why and how you should (still) use DBSCAN, ACM Transactions on Database Systems (TODS), 42(3), p. 19.

Super classes

rgudhi::PythonClass -> rgudhi::SKLearnClass -> rgudhi::BaseClustering -> DBSCAN

Methods

Method `new()`

The DBSCAN class constructor.

Usage

DBSCAN$new(
  eps = 0.5,
  min_samples = 5L,
  metric = "euclidean",
  metric_params = NULL,
  algorithm = c("auto", "ball_tree", "kd_tree", "brute"),
  leaf_size = 30L,
  p = 2L,
  n_jobs = 1L
)

Arguments

eps: A numeric value specifying the maximum distance between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function. Defaults to 0.5.
min_samples: An integer value specifying the number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself. Defaults to 5L.
metric: Either a string or an object coercible into a function via rlang::as_function() specifying the metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn.metrics.pairwise_distances for its metric parameter. If metric is "precomputed", X is assumed to be a distance matrix and must be square. X may be a sparse graph, in which case only nonzero elements may be considered neighbors for DBSCAN. Defaults to "euclidean".
metric_params: A named list specifying additional parameters to be passed on to the metric function. Defaults to NULL.
algorithm: A string specifying the algorithm to be used by the sklearn.neighbors.NearestNeighbors module to compute pointwise distances and find nearest neighbors. Choices are "auto", "ball_tree", "kd_tree" or "brute". Defaults to "auto".
leaf_size: An integer value specifying the leaf size passed to sklearn.neighbors.BallTree or sklearn.neighbors.KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem. Defaults to 30L.
p: An integer value specifying the power of the Minkowski metric to be used to calculate distance between points. Defaults to 2L.
n_jobs: An integer value specifying the number of parallel jobs to run. Defaults to 1L.
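
To make the roles of `p`, `eps` and `min_samples` more concrete, here is a small base-R sketch. It does not call the DBSCAN class; it only uses stats::dist() on hypothetical toy data to show how the Minkowski power changes distances, what a square distance matrix of the kind expected with metric = "precomputed" looks like, and how a neighborhood count that includes the point itself decides whether a point is a core point.

```r
# Hypothetical toy data: four points in the plane
x <- rbind(c(0, 0), c(3, 4), c(0, 1), c(1, 0))

# Minkowski distance matrices for p = 1 (Manhattan) and p = 2 (Euclidean)
d1 <- as.matrix(dist(x, method = "minkowski", p = 1))
d2 <- as.matrix(dist(x, method = "minkowski", p = 2))

# The same pair of points is at distance 7 for p = 1 and 5 for p = 2
print(c(p1 = d1[1, 2], p2 = d2[1, 2]))

# A precomputed distance matrix must be square (here also symmetric)
stopifnot(nrow(d2) == ncol(d2), isSymmetric(d2))

# Neighborhood sizes for a given eps; the diagonal is 0, so each point
# counts itself, which matches the definition of min_samples above
eps <- 1
min_samples <- 3L
n_neighbors <- rowSums(d2 <= eps)
is_core <- n_neighbors >= min_samples
print(is_core)
```

Only the first point has three samples within `eps` (itself and two others), so it is the only core point here for `min_samples = 3L`.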

Returns

An object of class DBSCAN.

Method `clone()`

The objects of this class are cloneable with this method.

Usage

DBSCAN$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

if (reticulate::py_module_available("sklearn.cluster")) {
  cl <- DBSCAN$new()
}
