# Performs clustering according to the agglomerative algorithm

Source: `R/sklearn-cluster.R`

`AgglomerativeClustering.Rd`

Recursively merges pairs of clusters of sample data, using the linkage distance. This is a wrapper around the Python class `sklearn.cluster.AgglomerativeClustering`.

## Super classes

`rgudhi::PythonClass` -> `rgudhi::SKLearnClass` -> `rgudhi::BaseClustering` -> `AgglomerativeClustering`

## Methods

### Method `new()`

The AgglomerativeClustering class constructor.

#### Usage

```
AgglomerativeClustering$new(
  n_clusters = 2L,
  affinity = c("euclidean", "l1", "l2", "manhattan", "cosine", "precomputed"),
  memory = NULL,
  connectivity = NULL,
  compute_full_tree = "auto",
  linkage = c("ward", "complete", "average", "single"),
  distance_threshold = NULL,
  compute_distances = FALSE
)
```

#### Arguments

`n_clusters`

An integer value specifying the number of clusters to find. It must be `NULL` if `distance_threshold` is not `NULL`. Defaults to `2L`.

`affinity`

A string specifying the metric used to compute the linkage. Can be `"euclidean"`, `"l1"`, `"l2"`, `"manhattan"`, `"cosine"` or `"precomputed"`. If `linkage` is `"ward"`, only `"euclidean"` is accepted. If `"precomputed"`, a distance matrix (instead of a similarity matrix) is needed as input for the `$fit()` method. Defaults to `"euclidean"`.

`memory`

A string specifying the path to the caching directory. Defaults to `NULL`, in which case no caching is done.

`connectivity`

Either a numeric matrix, an object of class `stats::dist`, or an object coercible into a function by `rlang::as_function()`, specifying for each sample the neighboring samples following a given structure of the data. This can be a connectivity matrix itself or a function that transforms the data into a connectivity matrix. Defaults to `NULL`, i.e., the hierarchical clustering algorithm is unstructured.

`compute_full_tree`

Either a boolean value or the string `"auto"`, specifying whether to compute the full tree or to stop its construction early at `n_clusters`. Stopping early is useful to decrease computation time if the number of clusters is not small compared to the number of samples. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree. It must be `TRUE` if `distance_threshold` is not `NULL`. Defaults to `"auto"`, which is equivalent to `TRUE` when `distance_threshold` is not `NULL` or when `n_clusters` is less than the maximum of `100` and `0.02 * n_samples`. Otherwise, `"auto"` is equivalent to `FALSE`.

`linkage`

A string specifying which linkage criterion to use. The linkage criterion determines which distance to use between sets of observations. The algorithm will merge the pairs of clusters that minimize this criterion.

- `ward`: minimizes the variance of the clusters being merged;
- `average`: uses the average of the distances of each observation of the two sets;
- `complete`: uses the maximum of the distances between all observations of the two sets;
- `single`: uses the minimum of the distances between all observations of the two sets.

Defaults to `"ward"`.
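To make the four linkage criteria concrete, the following base-R sketch computes each of them for two small, made-up clusters. It does not call `rgudhi` or scikit-learn; the data and the helper names `cross_dist()` and `ward_cost()` are hypothetical, and the Ward quantity shown is the textbook increase in within-cluster sum of squares, which may be scaled differently from the value scikit-learn reports internally.

```r
# Two small clusters of 2-D points (made-up data for illustration).
a <- rbind(c(0, 0), c(0, 1))
b <- rbind(c(3, 0), c(4, 0))

# Hypothetical helper: Euclidean distances between every point of `a`
# (rows) and every point of `b` (columns).
cross_dist <- function(a, b) {
  d <- as.matrix(dist(rbind(a, b)))
  d[seq_len(nrow(a)), nrow(a) + seq_len(nrow(b)), drop = FALSE]
}

d <- cross_dist(a, b)

linkage_single   <- min(d)   # smallest pairwise distance
linkage_complete <- max(d)   # largest pairwise distance
linkage_average  <- mean(d)  # mean of all pairwise distances

# Hypothetical helper: Ward's merge cost, i.e. the increase in the
# within-cluster sum of squares if `a` and `b` were merged.
ward_cost <- function(a, b) {
  na <- nrow(a)
  nb <- nrow(b)
  (na * nb / (na + nb)) * sum((colMeans(a) - colMeans(b))^2)
}

linkage_ward <- ward_cost(a, b)
```

At each step, the algorithm merges the pair of clusters for which the chosen quantity is smallest.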

`distance_threshold`

A numeric value specifying the linkage distance threshold above which clusters will not be merged. If not `NULL`, `n_clusters` must be `NULL` and `compute_full_tree` must be `TRUE`. Defaults to `NULL`.

`compute_distances`

A boolean value specifying whether to compute distances between clusters even if `distance_threshold` is not used. This is useful for dendrogram visualization, but introduces a computational and memory overhead. Defaults to `FALSE`.
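The interplay between `n_clusters`, `distance_threshold` and `compute_full_tree` described above can be summarized in a few lines of base R. This is only a sketch of the documented rules, not the code used by `rgudhi` or scikit-learn: the function `resolve_compute_full_tree()` and its `n_samples` argument are hypothetical, and the check that at least one of `n_clusters` and `distance_threshold` is set is an assumption added to keep the sketch well defined.

```r
# Hypothetical helper: returns the boolean that `compute_full_tree`
# resolves to, following the rules stated in the argument descriptions.
resolve_compute_full_tree <- function(n_samples,
                                      n_clusters = 2L,
                                      distance_threshold = NULL,
                                      compute_full_tree = "auto") {
  # `n_clusters` must be NULL when `distance_threshold` is not NULL.
  if (!is.null(distance_threshold) && !is.null(n_clusters)) {
    stop("`n_clusters` must be NULL when `distance_threshold` is set.")
  }
  # Assumption of this sketch: at least one of the two must be given.
  if (is.null(distance_threshold) && is.null(n_clusters)) {
    stop("Set either `n_clusters` or `distance_threshold`.")
  }
  if (identical(compute_full_tree, "auto")) {
    # "auto" means TRUE when a threshold is used or when `n_clusters`
    # is less than max(100, 0.02 * n_samples), and FALSE otherwise.
    return(!is.null(distance_threshold) ||
             n_clusters < max(100, 0.02 * n_samples))
  }
  # An explicit value must be TRUE when a threshold is used.
  if (!is.null(distance_threshold) && !isTRUE(compute_full_tree)) {
    stop("`compute_full_tree` must be TRUE when `distance_threshold` is set.")
  }
  isTRUE(compute_full_tree)
}

few_clusters   <- resolve_compute_full_tree(n_samples = 1000, n_clusters = 2L)
many_clusters  <- resolve_compute_full_tree(n_samples = 1000, n_clusters = 150L)
with_threshold <- resolve_compute_full_tree(
  n_samples = 1000, n_clusters = NULL, distance_threshold = 1.5
)
```

With 1000 samples the cutoff is `max(100, 20) = 100`, so 2 clusters resolve to the full tree while 150 clusters allow early stopping.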