
This is a wrapper around the Python class sklearn.cluster.OPTICS.

Super classes

rgudhi::PythonClass -> rgudhi::SKLearnClass -> rgudhi::BaseClustering -> OPTICS

Methods


Method new()

The OPTICS class constructor.

Usage

OPTICS$new(
  min_samples = 5L,
  max_eps = Inf,
  metric = c("minkowski", "cityblock", "cosine", "euclidean", "l1", "l2", "manhattan",
    "braycurtis", "canberra", "chebyshev", "correlation", "dice", "hamming", "jaccard",
    "kulsinski", "mahalanobis", "rogerstanimoto", "russellrao", "seuclidean",
    "sokalmichener", "sokalsneath", "sqeuclidean", "yule"),
  p = 2L,
  metric_params = NULL,
  cluster_method = c("xi", "dbscan"),
  eps = NULL,
  xi = 0.05,
  predecessor_correction = TRUE,
  min_cluster_size = NULL,
  algorithm = c("auto", "ball_tree", "kd_tree", "brute"),
  leaf_size = 30L,
  memory = NULL,
  n_jobs = 1L
)

Arguments

min_samples

Either an integer value greater than 1 or a numeric value between 0 and 1 specifying the number of samples in a neighborhood for a point to be considered as a core point. Also, up and down steep regions can’t have more than min_samples consecutive non-steep points. Expressed as an absolute number or a fraction of the number of samples (rounded to be at least 2). Defaults to 5L.
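The two ways of expressing min_samples can be compared with a small base R sketch. The helper resolve_min_samples() is hypothetical and not part of rgudhi; it assumes that a fraction is multiplied by the number of samples, truncated to an integer, and then raised to at least 2, which is one reading of "rounded to be at least 2".

```r
# Hypothetical helper (not part of rgudhi): turn min_samples into an absolute count.
resolve_min_samples <- function(min_samples, n_samples) {
  if (min_samples <= 1) {
    # Assumption: a fraction is scaled by the sample size, truncated, and floored at 2.
    return(max(2L, as.integer(min_samples * n_samples)))
  }
  # Values above 1 are already absolute counts.
  as.integer(min_samples)
}

resolve_min_samples(5L, 200L)     # absolute count, used as is
resolve_min_samples(0.25, 200L)   # a quarter of the 200 samples
resolve_min_samples(0.001, 200L)  # very small fraction, raised to the minimum of 2
```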

max_eps

A numeric value specifying the maximum distance between two samples for one to be considered as in the neighborhood of the other. Reducing max_eps will result in shorter run times. Defaults to Inf.

metric

Either a string or an object coercible into a function via rlang::as_function() specifying the metric to use for distance computation. If metric is a function, it is called on each pair of instances (rows) and the resulting value is recorded. The function should take two numeric vectors as input and return a single numeric value giving the distance between them. This works for SciPy’s metrics, but is less efficient than passing the metric name as a string. If metric is "precomputed", X is assumed to be a distance matrix and must be square. Valid string values for metric are:

  • from sklearn.metrics: "cityblock", "cosine", "euclidean", "l1", "l2", "manhattan";

  • from scipy.spatial.distance: "braycurtis", "canberra", "chebyshev", "correlation", "dice", "hamming", "jaccard", "kulsinski", "mahalanobis", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean", "yule".

Defaults to "minkowski".
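To make the contract for a function-valued metric concrete, here is a minimal base R sketch. Both manhattan() and pairwise_dist() are illustrative names, not part of rgudhi, and the explicit double loop only mimics "called on each pair of rows"; it is not how the wrapped Python code computes distances.

```r
# A metric function: two numeric vectors in, one numeric value out.
manhattan <- function(u, v) sum(abs(u - v))

# Illustrative helper: apply the metric to every pair of rows of X.
pairwise_dist <- function(X, metric_fn) {
  n <- nrow(X)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      D[i, j] <- metric_fn(X[i, ], X[j, ])
    }
  }
  D
}

X <- rbind(c(0, 0), c(1, 2), c(4, 6))
D <- pairwise_dist(X, manhattan)
D
```

Since the argument is coerced with rlang::as_function(), a formula such as ~ sum(abs(.x - .y)) should describe the same metric, although passing the string "manhattan" is expected to be faster.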

p

An integer value specifying the power for the Minkowski metric. When p = 1, this is equivalent to using the Manhattan distance (\(\ell_1\)). When p = 2, this is equivalent to using the Euclidean distance (\(\ell_2\)). For arbitrary \(p\), the Minkowski distance (\(\ell_p\)) is used. Defaults to 2L.
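The role of p can be checked numerically with a short base R sketch. The function minkowski_dist() is written here for illustration only and is not exported by rgudhi.

```r
# Minkowski distance of order p between two numeric vectors.
minkowski_dist <- function(x, y, p = 2) {
  sum(abs(x - y)^p)^(1 / p)
}

x <- c(0, 0)
y <- c(3, 4)

minkowski_dist(x, y, p = 1)  # 7, the Manhattan distance |3| + |4|
minkowski_dist(x, y, p = 2)  # 5, the Euclidean distance sqrt(9 + 16)
minkowski_dist(x, y, p = 3)  # a general l_p distance
```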

metric_params

A named list specifying additional arguments for the metric function. Defaults to NULL.

cluster_method

A string specifying the extraction method used to extract clusters using the calculated reachability and ordering. Possible values are "xi" and "dbscan". Defaults to "xi".

eps

A numeric value specifying the maximum distance between two samples for one to be considered as in the neighborhood of the other. Used only when cluster_method == "dbscan". Defaults to NULL, in which case the value of max_eps is used.

xi

A numeric value in \([0,1]\) specifying the minimum steepness on the reachability plot that constitutes a cluster boundary. For example, an upwards point in the reachability plot is defined by the ratio from one point to its successor being at most 1 - xi. Used only when cluster_method == "xi". Defaults to 0.05.
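The steepness criterion can be sketched in base R on a toy reachability sequence. The helper steep_points() is hypothetical; the upward rule follows the description above, while the downward rule is assumed to be its mirror image (ratio at least 1 / (1 - xi)).

```r
# Hypothetical helper: flag steep points between consecutive reachability values.
steep_points <- function(reachability, xi = 0.05) {
  n <- length(reachability)
  # Ratio from each point to its successor.
  ratio <- reachability[-n] / reachability[-1]
  list(
    upward = ratio <= 1 - xi,          # successor is higher by a factor of at least 1 / (1 - xi)
    downward = ratio >= 1 / (1 - xi)   # assumption: mirrored rule for downward points
  )
}

r <- c(1.0, 1.0, 2.0, 2.05, 1.0)
steep <- steep_points(r, xi = 0.05)
steep$upward    # FALSE TRUE FALSE FALSE
steep$downward  # FALSE FALSE FALSE TRUE
```

Only the jump from 1.0 to 2.0 counts as steep upward; the small rise from 2.0 to 2.05 stays below the threshold set by xi.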

predecessor_correction

A boolean value specifying whether to correct clusters according to the predecessors calculated by OPTICS (Schubert and Gertz 2018). This parameter has minimal effect on most data sets. Used only when cluster_method == "xi". Defaults to TRUE.

min_cluster_size

Either an integer value \(> 1\) or a numeric value in \([0,1]\) specifying the minimum number of samples in an OPTICS cluster, expressed as an absolute number or a fraction of the number of samples (rounded to be at least 2). If NULL, the value of min_samples is used instead. Used only when cluster_method == "xi". Defaults to NULL.

algorithm

A string specifying the algorithm used to compute the nearest neighbors. Choices are c("auto", "ball_tree", "kd_tree", "brute"). Defaults to "auto", which attempts to choose the most appropriate algorithm based on the values passed to the fit method. Note: fitting on sparse input overrides this setting and uses algorithm == "brute".

leaf_size

An integer value specifying the leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem. Defaults to 30L.

memory

A string specifying the path to the directory in which to cache the output of the tree computation. Defaults to NULL, in which case no caching is done.

n_jobs

An integer value specifying the number of parallel jobs to run for neighbors search. Defaults to 1L. A value of -1L means using all processors.

Returns

An object of class OPTICS.
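As an illustration of how the arguments combine, the following sketch builds an argument list for DBSCAN-style extraction and only calls the constructor when the required software is available. It assumes that OPTICS is exported by the rgudhi package and that scikit-learn is reachable through reticulate; the chosen values are arbitrary.

```r
# Arguments for DBSCAN-style cluster extraction; eps is only used with "dbscan".
optics_args <- list(
  min_samples = 10L,
  max_eps = 2,
  metric = "euclidean",
  cluster_method = "dbscan",
  eps = 0.5
)

# Guard: run only when rgudhi, reticulate and scikit-learn can all be found.
can_run <- requireNamespace("rgudhi", quietly = TRUE) &&
  requireNamespace("reticulate", quietly = TRUE) &&
  reticulate::py_module_available("sklearn.cluster")

if (can_run) {
  # Assumption: the class generator is exported as rgudhi::OPTICS.
  cl <- do.call(rgudhi::OPTICS$new, optics_args)
  class(cl)
}
```

Keeping eps at or below max_eps is a sensible choice here, since max_eps bounds the neighborhoods explored when the reachability ordering is computed.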

References

Schubert E, Gertz M (2018). “Improving the cluster structure extracted from OPTICS plots.” In LWDA.


Method clone()

The objects of this class are cloneable with this method.

Usage

OPTICS$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Run only when scikit-learn is available through reticulate and R is recent enough.
if (reticulate::py_module_available("sklearn.cluster") && getRversion() >= "4.2") {
  cl <- OPTICS$new()
}