This is a wrapper around the Python class sklearn.cluster.OPTICS.
Super classes
rgudhi::PythonClass -> rgudhi::SKLearnClass -> rgudhi::BaseClustering -> OPTICS
Methods
Method new()
The OPTICS class constructor.
Usage
OPTICS$new(
  min_samples = 5L,
  max_eps = Inf,
  metric = c("minkowski", "cityblock", "cosine", "euclidean", "l1", "l2", "manhattan",
    "braycurtis", "canberra", "chebyshev", "correlation", "dice", "hamming", "jaccard",
    "kulsinski", "mahalanobis", "rogerstanimoto", "russellrao", "seuclidean",
    "sokalmichener", "sokalsneath", "sqeuclidean", "yule"),
  p = 2L,
  metric_params = NULL,
  cluster_method = c("xi", "dbscan"),
  eps = NULL,
  xi = 0.05,
  predecessor_correction = TRUE,
  min_cluster_size = NULL,
  algorithm = c("auto", "ball_tree", "kd_tree", "brute"),
  leaf_size = 30L,
  memory = NULL,
  n_jobs = 1L
)
Arguments
min_samples
Either an integer value greater than 1 or a numeric value between 0 and 1 specifying the number of samples in a neighborhood for a point to be considered as a core point. Also, up and down steep regions can’t have more than min_samples consecutive non-steep points. Expressed as an absolute number or a fraction of the number of samples (rounded to be at least 2). Defaults to 5L.

max_eps
A numeric value specifying the maximum distance between two samples for one to be considered as in the neighborhood of the other. Reducing max_eps will result in shorter run times. Defaults to Inf.

metric
Either a string or an object coercible into a function via rlang::as_function() specifying the metric to use for distance computation. If metric is a function, it is called on each pair of instances (rows) and the resulting value is recorded. The function should take two numeric vectors as input and return one numeric value indicating the distance between them. This works for SciPy’s metrics, but is less efficient than passing the metric name as a string. If metric is "precomputed", X is assumed to be a distance matrix and must be square. Valid string values for metric are:
from sklearn.metrics: "cityblock", "cosine", "euclidean", "l1", "l2", "manhattan";
from scipy.spatial.distance: "braycurtis", "canberra", "chebyshev", "correlation", "dice", "hamming", "jaccard", "kulsinski", "mahalanobis", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean", "yule".
Defaults to "minkowski".

p
An integer value specifying the power for the Minkowski metric. When p = 1, this is equivalent to using the Manhattan distance (\(\ell_1\)). When p = 2, this is equivalent to using the Euclidean distance (\(\ell_2\)). For arbitrary \(p\), the Minkowski distance (\(\ell_p\)) is used. Defaults to 2L.

metric_params
A named list specifying additional arguments for the metric function. Defaults to NULL.

cluster_method
A string specifying the extraction method used to extract clusters using the calculated reachability and ordering. Possible values are "xi" and "dbscan". Defaults to "xi".

eps
A numeric value specifying the maximum distance between two samples for one to be considered as in the neighborhood of the other. Defaults to NULL, in which case the value of max_eps is used. Used only when cluster_method == "dbscan".

xi
A numeric value in \([0,1]\) specifying the minimum steepness on the reachability plot that constitutes a cluster boundary. For example, an upwards point in the reachability plot is defined by the ratio from one point to its successor being at most 1 - xi. Used only when cluster_method == "xi". Defaults to 0.05.

predecessor_correction
A boolean value specifying whether to correct clusters according to the predecessors calculated by OPTICS (Schubert and Gertz 2018). This parameter has minimal effect on most data sets. Used only when cluster_method == "xi". Defaults to TRUE.

min_cluster_size
Either an integer value \(> 1\) or a numeric value in \([0,1]\) specifying the minimum number of samples in an OPTICS cluster, expressed as an absolute number or a fraction of the number of samples (rounded to be at least 2). If NULL, the value of min_samples is used instead. Used only when cluster_method == "xi". Defaults to NULL.

algorithm
A string specifying the algorithm used to compute the nearest neighbors. Choices are c("auto", "ball_tree", "kd_tree", "brute"). Defaults to "auto", which attempts to choose the most appropriate algorithm based on the values passed to the fit method. Note: fitting on sparse input overrides this setting and uses algorithm = "brute".

leaf_size
An integer value specifying the leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem. Defaults to 30L.

memory
A string specifying the path to the directory in which the output of the tree computation is cached. Defaults to NULL, in which case no caching is done.

n_jobs
An integer value specifying the number of parallel jobs to run for the neighbors search. Defaults to 1L. A value of -1L means using all processors.
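
Several arguments apply to only one extraction method, so a short illustration of how they pair up may help. The following is a minimal sketch, not taken from the package documentation: it assumes the rgudhi package is installed and that its Python backend (with scikit-learn) is reachable. The argument values are arbitrary, and the object names cl_xi and cl_dbscan are hypothetical.

```r
# Assumption: rgudhi is installed and its Python backend with
# scikit-learn is available.
library(rgudhi)

# "xi" extraction (the default): xi, predecessor_correction and
# min_cluster_size are the arguments that take effect.
cl_xi <- OPTICS$new(
  min_samples = 10L,
  cluster_method = "xi",
  xi = 0.1,
  min_cluster_size = 20L
)

# "dbscan" extraction: eps takes effect instead. Here eps is kept
# below max_eps, which bounds the neighborhood search.
cl_dbscan <- OPTICS$new(
  min_samples = 10L,
  max_eps = 2,
  cluster_method = "dbscan",
  eps = 0.5
)
```

Both calls only construct the wrapper objects. Arguments tied to the other extraction method can be left at their defaults, since they are documented as unused in that case.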