This is a wrapper around the Python class sklearn.cluster.OPTICS.
Super classes
rgudhi::PythonClass -> rgudhi::SKLearnClass -> rgudhi::BaseClustering -> OPTICS
Methods
Method new()
The OPTICS class constructor.
Usage
OPTICS$new(
min_samples = 5L,
max_eps = Inf,
metric = c("minkowski", "cityblock", "cosine", "euclidean", "l1", "l2", "manhattan",
"braycurtis", "canberra", "chebyshev", "correlation", "dice", "hamming", "jaccard",
"kulsinski", "mahalanobis", "rogerstanimoto", "russellrao", "seuclidean",
"sokalmichener", "sokalsneath", "sqeuclidean", "yule"),
p = 2L,
metric_params = NULL,
cluster_method = c("xi", "dbscan"),
eps = NULL,
xi = 0.05,
predecessor_correction = TRUE,
min_cluster_size = NULL,
algorithm = c("auto", "ball_tree", "kd_tree", "brute"),
leaf_size = 30L,
memory = NULL,
n_jobs = 1L
)
Arguments
min_samples
Either an integer value greater than 1 or a numeric value between 0 and 1 specifying the number of samples in a neighborhood for a point to be considered as a core point, expressed as an absolute number or as a fraction of the number of samples (rounded to be at least 2). Also, up and down steep regions can’t have more than min_samples consecutive non-steep points. Defaults to 5L.
max_eps
A numeric value specifying the maximum distance between two samples for one to be considered as in the neighborhood of the other. The default, Inf, identifies clusters across all scales; reducing max_eps results in shorter run times. Defaults to Inf.
metric
Either a string or an object coercible into a function via rlang::as_function() specifying the metric to use for distance computation. If metric is a function, it is called on each pair of instances (rows) and the resulting value is recorded. The function should take two numeric vectors as input and return one numeric value indicating the distance between them. This works for SciPy’s metrics, but is less efficient than passing the metric name as a string. If metric is "precomputed", X is assumed to be a distance matrix and must be square. Valid string values for metric are:
from sklearn.metrics: "cityblock", "cosine", "euclidean", "l1", "l2", "manhattan";
from scipy.spatial.distance: "braycurtis", "canberra", "chebyshev", "correlation", "dice", "hamming", "jaccard", "kulsinski", "mahalanobis", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean", "yule".
Defaults to "minkowski".
p
An integer value specifying the power for the Minkowski metric. When p = 1, this is equivalent to using the Manhattan distance (\(\ell_1\)). When p = 2, this is equivalent to using the Euclidean distance (\(\ell_2\)). For arbitrary \(p\), the Minkowski distance (\(\ell_p\)) is used. Defaults to 2L.
metric_params
A named list specifying additional arguments for the metric function. Defaults to NULL.
cluster_method
A string specifying the method used to extract clusters from the computed reachability distances and ordering. Possible values are "xi" and "dbscan". Defaults to "xi".
eps
A numeric value specifying the maximum distance between two samples for one to be considered as in the neighborhood of the other. If NULL (the default), the value of max_eps is used. Used only when cluster_method == "dbscan".
xi
A numeric value in \([0,1]\) specifying the minimum steepness on the reachability plot that constitutes a cluster boundary. For example, a steep upward point in the reachability plot is defined by the ratio from one point to its successor being at most 1 - xi. Used only when cluster_method == "xi". Defaults to 0.05.
predecessor_correction
A boolean value specifying whether to correct clusters according to the predecessors calculated by OPTICS (Schubert and Gertz 2018). This parameter has minimal effect on most data sets. Used only when cluster_method == "xi". Defaults to TRUE.
min_cluster_size
Either an integer value \(> 1\) or a numeric value in \([0,1]\) specifying the minimum number of samples in an OPTICS cluster, expressed as an absolute number or a fraction of the number of samples (rounded to be at least 2). If NULL, the value of min_samples is used instead. Used only when cluster_method == "xi". Defaults to NULL.
algorithm
A string specifying the algorithm used to compute the nearest neighbors. Choices are "auto", "ball_tree", "kd_tree" and "brute". Defaults to "auto", which attempts to decide the most appropriate algorithm based on the values passed to the fit method. Note: fitting on sparse input overrides this setting and uses algorithm = "brute".
leaf_size
An integer value specifying the leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem. Defaults to 30L.
memory
A string specifying the path to the caching directory in which the output of the tree computation is cached. Defaults to NULL, in which case no caching is done.
n_jobs
An integer value specifying the number of parallel jobs to run for the neighbors search. Defaults to 1L. A value of -1L means using all processors.
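min_samples and min_cluster_size can both be given as a fraction of the data set, which has to be turned into an absolute count before fitting. The base-R sketch below assumes scikit-learn's rule: a value of at most 1 is multiplied by the number of samples, truncated towards zero, and then raised to at least 2. A NULL min_cluster_size falls back to the resolved min_samples. The helpers resolve_size() and resolve_min_cluster_size() are hypothetical and are not part of rgudhi.

```r
# Hypothetical helper (not part of rgudhi): turn a size given either as an
# absolute count (> 1) or as a fraction of the data set (<= 1) into a count.
# Assumed rule (from scikit-learn): truncate the fraction, then use at least 2.
resolve_size <- function(size, n_samples) {
  if (size <= 1) {
    max(2L, as.integer(size * n_samples))
  } else {
    as.integer(size)
  }
}

# Hypothetical helper: a NULL min_cluster_size falls back to the resolved
# min_samples, as documented for the min_cluster_size argument.
resolve_min_cluster_size <- function(min_cluster_size, min_samples, n_samples) {
  min_samples <- resolve_size(min_samples, n_samples)
  if (is.null(min_cluster_size)) {
    return(min_samples)
  }
  resolve_size(min_cluster_size, n_samples)
}

resolve_size(5L, 200)                      # 5
resolve_size(0.05, 200)                    # 10
resolve_size(0.001, 200)                   # 2 (raised to the minimum)
resolve_min_cluster_size(NULL, 0.05, 200)  # 10
```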
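To see how min_samples and max_eps interact, recall that OPTICS builds on each point's core distance: the distance to its min_samples-th nearest neighbor. Here the point itself counts as its own first neighbor, which is assumed from scikit-learn's nearest-neighbor query. When that distance exceeds max_eps the point is not a core point, and its core distance is set to Inf. The brute-force sketch below assumes the default Euclidean metric and computes every pairwise distance, roughly what algorithm = "brute" does. The helper core_distances() is hypothetical.

```r
# Hypothetical helper: brute-force core distances for OPTICS (Euclidean only).
# The point itself is its own first neighbor (distance 0), so the core
# distance is the min_samples-th smallest entry of its distance row.
core_distances <- function(X, min_samples = 5L, max_eps = Inf) {
  D <- as.matrix(dist(X))                                # all pairwise distances
  core <- apply(D, 1, function(d) sort(d)[min_samples])
  core[core > max_eps] <- Inf                            # not a core point within max_eps
  unname(core)
}

X <- matrix(c(0, 0,
              0, 1,
              1, 0,
              10, 10), ncol = 2, byrow = TRUE)
core_distances(X, min_samples = 2L)               # 1 1 1 13.45 (= sqrt(181))
core_distances(X, min_samples = 2L, max_eps = 2)  # 1 1 1 Inf
```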
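When metric is a function, it receives two numeric vectors (two rows of X) and must return a single number. The base-R sketch below defines such a function, a Chebyshev distance chosen only for illustration. It then applies the function to every pair of rows to show the calling convention and checks the result against stats::dist(). The helper pairwise_dist() is hypothetical and does not reflect how rgudhi or scikit-learn evaluate a callable metric internally. Because the argument goes through rlang::as_function(), a formula such as ~ max(abs(.x - .y)) should also be accepted, though that path is not exercised here.

```r
# Example user-supplied metric: two numeric vectors in, one number out.
chebyshev <- function(x, y) max(abs(x - y))

# Hypothetical helper: evaluate a metric on every pair of rows of X.
pairwise_dist <- function(X, metric) {
  n <- nrow(X)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      D[i, j] <- metric(X[i, ], X[j, ])
    }
  }
  D
}

X <- matrix(c(0, 0,
              1, 3,
              4, 1), ncol = 2, byrow = TRUE)
D <- pairwise_dist(X, chebyshev)
D[1, 3]  # 4
```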
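For the p argument, the Minkowski distance of order \(p\) is \(d_p(x, y) = \left(\sum_i |x_i - y_i|^p\right)^{1/p}\). The base-R sketch below checks the two special cases named in the description of p. The helper minkowski() is hypothetical.

```r
# Minkowski distance of order p: (sum |x_i - y_i|^p)^(1/p)
minkowski <- function(x, y, p = 2) sum(abs(x - y)^p)^(1 / p)

x <- c(0, 0)
y <- c(3, 4)
minkowski(x, y, p = 1)  # 7, the Manhattan (l1) distance
minkowski(x, y, p = 2)  # 5, the Euclidean (l2) distance
minkowski(x, y, p = 3)  # a general l_p distance
```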
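metric_params matters for metrics that need extra inputs. For example, SciPy's "seuclidean" takes a vector V of per-coordinate variances. Passing it as metric_params = list(V = V) is assumed usage and has not been verified against rgudhi's conversion to Python. The base-R sketch below only shows what V does in the standardized Euclidean distance. The helper seuclidean() and the values of V are illustrative.

```r
# Standardized Euclidean distance (SciPy's "seuclidean"): each squared
# coordinate difference is divided by the matching variance in V.
seuclidean <- function(x, y, V) sqrt(sum((x - y)^2 / V))

x <- c(0, 0)
y <- c(2, 3)
V <- c(4, 9)          # illustrative per-coordinate variances
seuclidean(x, y, V)   # sqrt(4/4 + 9/9) = sqrt(2), about 1.414

# Assumed (not run): OPTICS$new(metric = "seuclidean", metric_params = list(V = V))
```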
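With cluster_method = "dbscan", clusters are read off the reachability plot at the fixed threshold eps. The base-R sketch below transcribes the logic assumed from scikit-learn's cluster_optics_dbscan(). Walking through the points in OPTICS order, a new cluster starts at every point whose reachability exceeds eps but whose core distance is at most eps. Points where both values exceed eps are labeled noise (-1). The helper dbscan_labels() is hypothetical, the inputs are made-up values, and ordering is assumed to be 1-based.

```r
# Hypothetical helper: DBSCAN-style labels from OPTICS output.
# reachability and core_dist are indexed by point; ordering is the
# 1-based OPTICS visiting order.
dbscan_labels <- function(reachability, core_dist, ordering, eps) {
  far_reach <- reachability > eps   # not reachable from earlier points within eps
  near_core <- core_dist <= eps     # core point at scale eps
  labels <- integer(length(reachability))
  # In OPTICS order, each far-reach core point opens a new cluster (0-based ids)
  labels[ordering] <- cumsum(far_reach[ordering] & near_core[ordering]) - 1L
  # Far-reach points that are not core points are noise
  labels[far_reach & !near_core] <- -1L
  labels
}

reachability <- c(Inf, 0.5, 0.6, 5.0, 0.4, 5.0)
core_dist    <- c(0.5, 0.5, 0.6, 0.4, 0.4, 3.0)
dbscan_labels(reachability, core_dist, ordering = 1:6, eps = 1)  # 0 0 0 1 1 -1
```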
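The xi criterion compares consecutive values \(r_i\) of the reachability plot, that is, the reachability distances taken in OPTICS order. Following the description of xi and assuming scikit-learn's convention, point \(i\) is steep upward when \(r_i / r_{i+1} \le 1 - \xi\) and steep downward when \(r_i / r_{i+1} \ge 1 / (1 - \xi)\). The hypothetical helper steep_points() below flags both kinds of point on a small made-up plot. It does not perform the full cluster extraction.

```r
# Hypothetical helper: flag steep points in a reachability plot r
# (reachability distances in OPTICS order) using the xi criterion.
steep_points <- function(r, xi = 0.05) {
  ratio <- r[-length(r)] / r[-1]             # r[i] / r[i + 1]
  data.frame(
    index          = seq_along(ratio),
    steep_upward   = ratio <= 1 - xi,        # next value >= r[i] / (1 - xi)
    steep_downward = ratio >= 1 / (1 - xi)   # next value <= r[i] * (1 - xi)
  )
}

r <- c(2.0, 1.0, 0.98, 1.0, 2.5)
steep_points(r, xi = 0.05)
```

Only the drop from 2.0 to 1.0 counts as steep downward and only the rise from 1.0 to 2.5 as steep upward. The small wiggles around 1.0 stay below the 5% threshold.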