DBSCAN - Density-Based Spatial Clustering of Applications with Noise. Finds core samples of high density and expands clusters from them. Good for data which contains clusters of similar density. This is a wrapper around the Python class sklearn.cluster.DBSCAN.
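The idea of finding core samples and expanding clusters from them can be illustrated with a minimal pure-Python sketch. This is not the scikit-learn implementation wrapped by this class: the helper names region_query and dbscan_sketch are hypothetical, the metric is fixed to the Euclidean distance, and neighbors are found by a brute-force scan, which is only reasonable for small inputs.

```python
import math

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan_sketch(points, eps=0.5, min_samples=5):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # points[i] is a core sample: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is core too: keep expanding
    return labels


# Two tight groups of three points and one isolated point (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (10.0, 10.0)]
labels = dbscan_sketch(points, eps=0.5, min_samples=3)
print(labels)
```

In this toy data set each tight group forms its own cluster, while the isolated point has too few neighbors within eps and keeps the noise label.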
References
Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, pp. 226-231.
Schubert, E., Sander, J., Ester, M., Kriegel, H. P., & Xu, X. (2017). DBSCAN revisited, revisited: why and how you should (still) use DBSCAN. ACM Transactions on Database Systems (TODS), 42(3), p. 19.
Super classes
rgudhi::PythonClass -> rgudhi::SKLearnClass -> rgudhi::BaseClustering -> DBSCAN
Methods
Method new()
The DBSCAN class constructor.
Arguments
eps
A numeric value specifying the maximum distance between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function. Defaults to 0.5.
min_samples
An integer value specifying the number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself. Defaults to 5L.
metric
Either a string or an object coercible into a function via rlang::as_function() specifying the metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn.metrics.pairwise_distances for its metric parameter. If metric is "precomputed", X is assumed to be a distance matrix and must be square. X may be a sparse graph, in which case only nonzero elements may be considered neighbors for DBSCAN. Defaults to "euclidean".
metric_params
A named list specifying additional parameters to be passed on to the metric function. Defaults to NULL.
algorithm
A string specifying the algorithm to be used by the sklearn.neighbors.NearestNeighbors module to compute pointwise distances and find nearest neighbors. Choices are "auto", "ball_tree", "kd_tree" or "brute". Defaults to "auto".
leaf_size
An integer value specifying the leaf size passed to sklearn.neighbors.BallTree or sklearn.neighbors.KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem. Defaults to 30L.
p
An integer value specifying the power of the Minkowski metric to be used to calculate distance between points. Defaults to 2L.
n_jobs
An integer value specifying the number of parallel jobs to run. Defaults to 1L.
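To make the roles of p and of a precomputed metric more concrete, the following pure-Python sketch computes Minkowski distances of order p and assembles them into a square distance matrix of the shape that metric = "precomputed" expects for X. The function names minkowski and pairwise_matrix are hypothetical helpers introduced for illustration only; they are not part of rgudhi or scikit-learn.

```python
def minkowski(u, v, p=2):
    """Minkowski distance of order p between two equal-length sequences."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def pairwise_matrix(points, p=2):
    """Square, symmetric matrix of pairwise Minkowski distances."""
    return [[minkowski(u, v, p) for v in points] for u in points]


# Made-up sample data for illustration.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

manhattan = minkowski(points[0], points[1], p=1)  # p = 1: sum of absolute differences
euclidean = minkowski(points[0], points[1], p=2)  # p = 2: usual straight-line distance
dist = pairwise_matrix(points, p=2)

print(manhattan, euclidean)
for row in dist:
    print(row)
```

With p = 1 the distance reduces to the Manhattan distance and with p = 2 to the Euclidean distance, which matches the default of 2L.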