This is a wrapper around the Python class sklearn.cluster.SpectralClustering.

Super classes

rgudhi::PythonClass -> rgudhi::SKLearnClass -> rgudhi::BaseClustering -> SpectralClustering

Methods

Method new()

The SpectralClustering class constructor.

Usage

SpectralClustering$new(
  n_clusters = 8L,
  eigen_solver = c("arpack", "lobpcg", "amg"),
  n_components = NULL,
  random_state = NULL,
  n_init = 10L,
  gamma = 1,
  affinity = c("rbf", "nearest_neighbors", "precomputed",
    "precomputed_nearest_neighbors"),
  n_neighbors = 10L,
  eigen_tol = "auto",
  assign_labels = c("kmeans", "discretize", "cluster_qr"),
  degree = 3L,
  coef0 = 1,
  kernel_params = NULL,
  n_jobs = 1L,
  verbose = FALSE
)

Arguments

n_clusters

An integer value specifying the number of clusters to form, which is also the dimension of the projection subspace unless n_components is set. Defaults to 8L.

eigen_solver

A string specifying the eigenvalue decomposition strategy to use. Choices are c("arpack", "lobpcg", "amg"). The "amg" solver requires the Python package pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities. Defaults to "arpack".

n_components

An integer value specifying the number of eigenvectors to use for the spectral embedding. Defaults to NULL, in which case, n_clusters is used.

random_state

An integer value used to seed the pseudo-random number generator for the initialization of the lobpcg eigenvector decomposition when eigen_solver == "amg", and for the k-means initialization. Defaults to NULL, which uses clock time.

n_init

An integer value specifying the number of times the k-means algorithm will be run with different centroid seeds. The final result is the best output of n_init consecutive runs in terms of inertia. Only used if assign_labels == "kmeans". Defaults to 10L.
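The "best of n_init runs" rule can be illustrated in base R. This is only a sketch of the selection principle using stats::kmeans() on made-up data; it is not the code that scikit-learn runs internally, and the data and variable names are assumptions for illustration.

```r
# Toy data (assumption): two Gaussian blobs in the plane.
set.seed(1)
X <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 5), ncol = 2)
)

n_init <- 10L

# Run k-means n_init times, each with its own random initialization.
runs <- lapply(seq_len(n_init), function(i) {
  kmeans(X, centers = 2L, nstart = 1L)
})

# Inertia is the total within-cluster sum of squares of each run.
inertia <- vapply(runs, function(r) r$tot.withinss, numeric(1))

# Keep the run with the lowest inertia.
best <- runs[[which.min(inertia)]]
```

In base R the same effect is obtained directly with kmeans(X, centers = 2L, nstart = n_init).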

gamma

A numeric value specifying the kernel coefficient for rbf, poly, sigmoid, laplacian and chi2 kernels. Ignored for affinity == "nearest_neighbors". Defaults to 1.0.

affinity

Either a string or an object coercible to a function via rlang::as_function() specifying how to construct the affinity matrix:

  • "nearest_neighbors": construct the affinity matrix by computing a graph of nearest neighbors;

  • "rbf": construct the affinity matrix using a radial basis function (RBF) kernel;

  • "precomputed": interpret X as a precomputed affinity matrix, where larger values indicate greater similarity between instances;

  • "precomputed_nearest_neighbors": interpret X as a sparse graph of precomputed distances, and construct a binary affinity matrix from the n_neighbors nearest neighbors of each instance;

  • one of the kernels supported by pairwise_kernels.

Only kernels that produce similarity scores (non-negative values that increase with similarity) should be used. This property is not checked by the clustering algorithm.

Defaults to "rbf".
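As a rough illustration of what the "rbf" option computes, the RBF affinity between two observations x and y is exp(-gamma * ||x - y||^2). The base R sketch below builds such a matrix by hand on made-up data; the variable names are assumptions, and the snippet does not call the package.

```r
# Toy data (assumption): four points forming two well-separated pairs.
X <- rbind(c(0, 0), c(0, 1), c(5, 5), c(5, 6))
gamma <- 1

# Squared Euclidean distances between all pairs of rows.
D2 <- as.matrix(dist(X))^2

# RBF affinity: close points get values near 1, distant points near 0.
A <- exp(-gamma * D2)
```

A matrix built this way is symmetric and non-negative, with larger values for more similar observations, which is the property expected of a matrix supplied with affinity = "precomputed".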

n_neighbors

An integer value specifying the number of neighbors to use when constructing the affinity matrix using the nearest neighbors method. Ignored for affinity == "rbf". Defaults to 10L.

eigen_tol

Either a numeric value or the string "auto" specifying the stopping criterion for the eigen-decomposition of the Laplacian matrix. If eigen_tol == "auto", the tolerance that is passed depends on eigen_solver:

  • if eigen_solver == "arpack", then eigen_tol = 0.0;

  • if eigen_solver == "lobpcg" or eigen_solver == "amg", then eigen_tol = NULL, which lets the underlying lobpcg solver choose the value according to its own heuristics.

Note that when using eigen_solver == "lobpcg" or eigen_solver == "amg", values of eigen_tol < 1e-5 may lead to convergence issues and should be avoided.

Defaults to "auto".
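The "auto" rule described above can be written as a small helper. The function resolve_eigen_tol() below is hypothetical and only restates the documented rule in R; it is not part of rgudhi or scikit-learn.

```r
# Hypothetical helper (not part of rgudhi): restates the documented rule
# for turning eigen_tol = "auto" into the tolerance passed to the solver.
resolve_eigen_tol <- function(eigen_tol = "auto",
                              eigen_solver = c("arpack", "lobpcg", "amg")) {
  eigen_solver <- match.arg(eigen_solver)
  if (identical(eigen_tol, "auto")) {
    # "arpack" gets 0; "lobpcg" and "amg" get NULL so that the
    # underlying solver picks its own tolerance.
    if (eigen_solver == "arpack") {
      return(0)
    }
    return(NULL)
  }
  # An explicit numeric tolerance is passed through unchanged.
  eigen_tol
}

tol_arpack <- resolve_eigen_tol("auto", "arpack")
tol_lobpcg <- resolve_eigen_tol("auto", "lobpcg")
tol_custom <- resolve_eigen_tol(1e-4, "amg")
```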

assign_labels

A string specifying the strategy for assigning labels in the embedding space. There are three ways to assign labels after the Laplacian embedding. k-means ("kmeans") is a popular choice, but it can be sensitive to initialization. Discretization ("discretize") is another approach, which is less sensitive to random initialization. The "cluster_qr" method extracts clusters directly from the eigenvectors. In contrast to k-means and discretization, cluster_qr has no tuning parameters and runs no iterations, yet may outperform k-means and discretization in terms of both quality and speed. Defaults to "kmeans".

degree

An integer value specifying the degree of the polynomial kernel. Ignored by other kernels. Defaults to 3L.

coef0

A numeric value specifying the value of the zero coefficient for polynomial and sigmoid kernels. Ignored by other kernels. Defaults to 1.0.
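To show how gamma, degree and coef0 enter the kernels, the sketch below evaluates the usual polynomial kernel (gamma * <x, y> + coef0)^degree and sigmoid kernel tanh(gamma * <x, y> + coef0) in base R. The data and variable names are assumptions for illustration, and the snippet does not call the package.

```r
# Toy data (assumption): two orthogonal unit vectors.
X <- rbind(c(1, 0), c(0, 1))
gamma <- 1
degree <- 3L
coef0 <- 1

# Matrix of inner products <x_i, x_j>.
G <- tcrossprod(X)

# Polynomial kernel: (gamma * <x, y> + coef0)^degree.
K_poly <- (gamma * G + coef0)^degree

# Sigmoid kernel: tanh(gamma * <x, y> + coef0).
K_sigmoid <- tanh(gamma * G + coef0)
```

Both kernels can return negative values when the inner products are sufficiently negative, which is why the requirement that the affinity produce non-negative similarity scores matters for these choices.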

kernel_params

A named list specifying extra arguments passed to the kernel when affinity is supplied as a function. Ignored by other kernels. Defaults to NULL.

n_jobs

An integer value specifying the number of parallel jobs to run for neighbors search. Defaults to 1L. A value of -1L means using all processors.

verbose

A boolean value specifying the verbosity mode. Defaults to FALSE.

Returns

An object of class SpectralClustering.


Method clone()

The objects of this class are cloneable with this method.

Usage

SpectralClustering$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

if (reticulate::py_module_available("sklearn.cluster")) {
  # Construct a spectral clustering model with all default settings.
  cl <- SpectralClustering$new()
}
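A constructor call with non-default settings might look like the sketch below. It uses only arguments documented above. The particular values are arbitrary, and the guards are an assumption so that the snippet does nothing when rgudhi, reticulate or scikit-learn is unavailable.

```r
# Arbitrary illustrative settings, all taken from the documented arguments.
args <- list(
  n_clusters = 2L,
  affinity = "nearest_neighbors",
  n_neighbors = 5L,
  assign_labels = "discretize",
  random_state = 42L
)

# Only build the object when the R packages and the Python module exist.
if (requireNamespace("rgudhi", quietly = TRUE) &&
    requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_module_available("sklearn.cluster")) {
  cl <- do.call(rgudhi::SpectralClustering$new, args)
}
```

With affinity = "nearest_neighbors", gamma is ignored and n_neighbors controls the graph; with assign_labels = "discretize", n_init is not used.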