Performs clustering according to the spectral clustering algorithm
Source: R/sklearn-cluster.R
This is a wrapper around the Python class sklearn.cluster.SpectralClustering.
Super classes
rgudhi::PythonClass -> rgudhi::SKLearnClass -> rgudhi::BaseClustering -> SpectralClustering
Methods
Method new()
The SpectralClustering class constructor.
Usage
SpectralClustering$new(
  n_clusters = 8L,
  eigen_solver = c("arpack", "lobpcg", "amg"),
  n_components = NULL,
  random_state = NULL,
  n_init = 10L,
  gamma = 1,
  affinity = c("rbf", "nearest_neighbors", "precomputed",
    "precomputed_nearest_neighbors"),
  n_neighbors = 10L,
  eigen_tol = "auto",
  assign_labels = c("kmeans", "discretize", "cluster_qr"),
  degree = 3L,
  coef0 = 1,
  kernel_params = NULL,
  n_jobs = 1L,
  verbose = FALSE
)
Arguments
n_clusters
An integer value specifying the number of clusters to form, which is also the dimension of the projection subspace when n_components is NULL. Defaults to 8L.
eigen_solver
A string specifying the eigenvalue decomposition strategy to use. Choices are c("arpack", "lobpcg", "amg"). The "amg" solver requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities. Defaults to "arpack".
n_components
An integer value specifying the number of eigenvectors to use for the spectral embedding. Defaults to NULL, in which case n_clusters is used.
random_state
An integer value used to seed the pseudo-random number generator for the initialization of the lobpcg eigenvector decomposition when eigen_solver == "amg", and for the k-means initialization. Defaults to NULL, which uses clock time.
n_init
An integer value specifying the number of times the k-means algorithm will be run with different centroid seeds. The final result is the best output of n_init consecutive runs in terms of inertia. Only used if assign_labels == "kmeans". Defaults to 10L.
gamma
A numeric value specifying the kernel coefficient for the rbf, poly, sigmoid, laplacian and chi2 kernels. Ignored for affinity == "nearest_neighbors". Defaults to 1.0.
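To make the role of gamma concrete, here is a minimal base R sketch of an RBF affinity matrix. It assumes the usual form exp(-gamma * ||x - y||^2) that scikit-learn documents for its rbf kernel. The helper name rbf_affinity and the toy data are illustrative only and are not part of rgudhi.

```r
# Hypothetical helper (not part of rgudhi): RBF affinity between the rows of X,
# assuming the form exp(-gamma * squared Euclidean distance).
rbf_affinity <- function(X, gamma = 1) {
  sq_dist <- as.matrix(dist(X))^2  # pairwise squared Euclidean distances
  exp(-gamma * sq_dist)
}

# Three toy points: two close together, one far away.
X <- rbind(c(0, 0), c(0, 1), c(3, 4))

A_wide <- rbf_affinity(X, gamma = 0.1)   # small gamma: affinity decays slowly
A_narrow <- rbf_affinity(X, gamma = 10)  # large gamma: affinity decays quickly
```

Under this assumption, a larger gamma makes distant points look less similar, so the affinity matrix becomes closer to the identity.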
affinity
Either a string or an object coercible to a function via rlang::as_function() specifying how to construct the affinity matrix:
"nearest_neighbors": construct the affinity matrix by computing a graph of nearest neighbors;
"rbf": construct the affinity matrix using a radial basis function (RBF) kernel;
"precomputed": interpret X as a precomputed affinity matrix, where larger values indicate greater similarity between instances;
"precomputed_nearest_neighbors": interpret X as a sparse graph of precomputed distances, and construct a binary affinity matrix from the n_neighbors nearest neighbors of each instance;
one of the kernels supported by pairwise_kernels.
Only kernels that produce similarity scores (non-negative values that increase with similarity) should be used. This property is not checked by the clustering algorithm.
Defaults to "rbf".
n_neighbors
An integer value specifying the number of neighbors to use when constructing the affinity matrix using the nearest neighbors method. Ignored for affinity == "rbf". Defaults to 10L.
eigen_tol
Either a numeric value or the string "auto" specifying the stopping criterion for the eigen-decomposition of the Laplacian matrix. If eigen_tol == "auto", then the passed tolerance depends on the eigen_solver:
If eigen_solver == "arpack", then eigen_tol = 0.0;
If eigen_solver == "lobpcg" or eigen_solver == "amg", then eigen_tol = NULL, which configures the underlying lobpcg solver to automatically resolve the value according to its heuristics.
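The "auto" rule above can be sketched as a small base R function. The name resolve_eigen_tol is hypothetical and does not exist in rgudhi or scikit-learn. The sketch only mirrors the rule as documented here and says nothing about how the wrapped Python class implements it.

```r
# Hypothetical helper (not part of rgudhi or scikit-learn): mirrors the
# documented "auto" rule for eigen_tol.
resolve_eigen_tol <- function(eigen_tol = "auto",
                              eigen_solver = c("arpack", "lobpcg", "amg")) {
  eigen_solver <- match.arg(eigen_solver)
  if (!identical(eigen_tol, "auto")) {
    return(eigen_tol)  # an explicit tolerance is passed through unchanged
  }
  # "arpack" gets 0.0; "lobpcg" and "amg" get NULL (solver picks its own value)
  if (eigen_solver == "arpack") 0.0 else NULL
}

tol_arpack <- resolve_eigen_tol("auto", "arpack")
tol_lobpcg <- resolve_eigen_tol("auto", "lobpcg")
tol_manual <- resolve_eigen_tol(1e-4, "amg")
```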
Note that when using eigen_solver == "lobpcg" or eigen_solver == "amg", values of tol < 1e-5 may lead to convergence issues and should be avoided.
Defaults to "auto".
assign_labels
A string specifying the strategy for assigning labels in the embedding space. There are three ways to assign labels after the Laplacian embedding. k-means is a popular choice ("kmeans"), but it can be sensitive to initialization. Discretization is another approach which is less sensitive to random initialization ("discretize"). The cluster_qr method directly extracts clusters from eigenvectors in spectral clustering. In contrast to k-means and discretization, cluster_qr has no tuning parameters and runs no iterations, yet may outperform k-means and discretization in terms of both quality and speed. Defaults to "kmeans".
degree
An integer value specifying the degree of the polynomial kernel. Ignored by other kernels. Defaults to 3L.
coef0
A numeric value specifying the value of the zero coefficient for polynomial and sigmoid kernels. Ignored by other kernels. Defaults to 1.0.
kernel_params
A named list specifying extra arguments to the kernels passed as functions. Ignored by other kernels. Defaults to NULL.
n_jobs
An integer value specifying the number of parallel jobs to run for neighbors search. Defaults to 1L. A value of -1L means using all processors.
verbose
A boolean value specifying the verbosity mode. Defaults to FALSE.
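As a usage illustration, the following sketch instantiates the class with a few non-default arguments taken from the signature above. The chosen values are arbitrary. The guard is an assumption about a typical setup: it skips construction when rgudhi or the Python module sklearn.cluster is unavailable. Fitting methods inherited from rgudhi::BaseClustering are not described in this section, so they are not shown.

```r
# Stays NULL when rgudhi or scikit-learn is unavailable.
cl <- NULL

if (requireNamespace("rgudhi", quietly = TRUE) &&
    reticulate::py_module_available("sklearn.cluster")) {
  # Illustrative values only: a k-nearest-neighbors affinity with
  # discretization-based label assignment and a fixed seed.
  cl <- rgudhi::SpectralClustering$new(
    n_clusters = 2L,
    affinity = "nearest_neighbors",
    n_neighbors = 5L,
    assign_labels = "discretize",
    random_state = 1234L
  )
}
```

Integer arguments are written with the L suffix to match the defaults shown in the usage block.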