# Performs clustering according to the feature agglomeration algorithm

Source: `R/sklearn-cluster.R`

`FeatureAgglomeration.Rd`

Recursively merges pairs of clusters of features. This is a wrapper around the Python class `sklearn.cluster.FeatureAgglomeration`.

## Super classes

`rgudhi::PythonClass` -> `rgudhi::SKLearnClass` -> `rgudhi::BaseClustering` -> `FeatureAgglomeration`

## Methods

## Inherited methods

### Method `new()`

The FeatureAgglomeration class constructor.

#### Usage

```
FeatureAgglomeration$new(
  n_clusters = 2L,
  affinity = c("euclidean", "l1", "l2", "manhattan", "cosine", "precomputed"),
  memory = NULL,
  connectivity = NULL,
  compute_full_tree = "auto",
  linkage = c("ward", "complete", "average", "single"),
  pooling_func = rowMeans,
  distance_threshold = NULL,
  compute_distances = FALSE
)
```

#### Arguments

`n_clusters`

An integer value specifying the number of clusters to find. Defaults to `2L`.

`affinity`

A string or an object coercible into a function via `rlang::as_function()` specifying the metric used to compute the linkage. If a string, choices are `"euclidean"`, `"l1"`, `"l2"`, `"manhattan"`, `"cosine"` or `"precomputed"`. If `linkage` is `"ward"`, only `"euclidean"` is accepted. Defaults to `"euclidean"`.

`memory`

A string specifying the path to the caching directory for storing the computation of the tree. Defaults to `NULL`, in which case no caching is done.

`connectivity`

A numeric matrix or an object coercible into a function via `rlang::as_function()` specifying the connectivity matrix. Defines for each feature the neighboring features following a given structure of the data. This can be a connectivity matrix itself or a function that transforms the data into a connectivity matrix, such as one derived from `sklearn.neighbors.kneighbors_graph()`. Defaults to `NULL`, in which case the hierarchical clustering algorithm is unstructured.

`compute_full_tree`

The string `"auto"` or a boolean value specifying whether to stop the construction of the tree early at `n_clusters`. This is useful to decrease computation time if the number of clusters is not small compared to the number of features. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree. It must be `TRUE` if `distance_threshold` is not `NULL`. Defaults to `"auto"`, which is equivalent to `TRUE` when `distance_threshold` is not `NULL` or when `n_clusters` is less than `max(100, 0.02 * n_samples)`, and to `FALSE` otherwise.

`linkage`

A string specifying which linkage criterion to use. The linkage criterion determines which distance to use between sets of features. The algorithm will merge the pairs of clusters that minimize this criterion:

`"ward"`: minimizes the variance of the clusters being merged;

`"complete"`: maximum linkage uses the maximum distances between all features of the two sets;

`"average"`: uses the average of the distances of each feature of the two sets;

`"single"`: uses the minimum of the distances between all features of the two sets.
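As a rough illustration of the three distance-based criteria above, the following base R sketch computes the `"complete"`, `"average"` and `"single"` linkage values between two sets of features. The toy matrix `x` and the sets `set_a` and `set_b` are made up for the example, Euclidean distance is assumed, and `"ward"` is left out because it is defined through cluster variance rather than pairwise distances. It does not call rgudhi or scikit-learn.

```r
# Toy data: 5 samples (rows) by 4 features (columns); matrix() fills by column.
x <- matrix(c(1, 2, 3, 4, 5,
              1, 2, 3, 4, 6,
              5, 4, 3, 2, 1,
              9, 9, 9, 9, 9), nrow = 5)

# Features are what gets clustered, so distances are taken between columns.
d <- as.matrix(dist(t(x)))  # pairwise Euclidean distances between features

# Two hypothetical sets of features, given by column index.
set_a <- c(1, 2)
set_b <- c(3, 4)

# All distances between a feature of set_a and a feature of set_b.
between <- d[set_a, set_b]

linkage_values <- c(
  complete = max(between),   # largest pairwise distance
  average  = mean(between),  # mean pairwise distance
  single   = min(between)    # smallest pairwise distance
)
print(linkage_values)
```

By construction the single value is never larger than the average value, which in turn is never larger than the complete value.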

`pooling_func`

An object coercible into a function via `rlang::as_function()` specifying the aggregation method to combine the values of agglomerated features into a single value. It should take as input an array of shape \(M \times N\) and the optional argument `axis = 1`, and reduce it to an array of shape \(M\). Defaults to `base::rowMeans`.

`distance_threshold`

A numeric value specifying the linkage distance threshold above which clusters will not be merged. If not `NULL`, `n_clusters` must be `NULL` and `compute_full_tree` must be `TRUE`. Defaults to `NULL`.

`compute_distances`

A boolean value specifying whether to compute distances between clusters even if `distance_threshold` is not used. This can be used to produce dendrogram visualizations, but introduces a computational and memory overhead. Defaults to `FALSE`.
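The rules linking `compute_full_tree`, `n_clusters` and `distance_threshold` in the argument descriptions above can be restated as a small base R sketch. The helper `resolve_full_tree()` is hypothetical: it is not part of rgudhi or scikit-learn and only mirrors the documented behaviour, taking `n_samples` as an explicit argument for illustration.

```r
# Hypothetical helper, not part of rgudhi or scikit-learn: it only restates
# the rules given in the argument descriptions.
resolve_full_tree <- function(compute_full_tree = "auto",
                              n_clusters = 2L,
                              distance_threshold = NULL,
                              n_samples) {
  if (!is.null(distance_threshold)) {
    # A distance threshold excludes a fixed number of clusters ...
    if (!is.null(n_clusters)) {
      stop("`n_clusters` must be NULL when `distance_threshold` is set.")
    }
    # ... and requires the full tree.
    if (isFALSE(compute_full_tree)) {
      stop("`compute_full_tree` must be TRUE when `distance_threshold` is set.")
    }
    return(TRUE)
  }
  if (identical(compute_full_tree, "auto")) {
    # "auto": full tree only when n_clusters is below the documented bound.
    return(n_clusters < max(100, 0.02 * n_samples))
  }
  isTRUE(compute_full_tree)
}

resolve_full_tree(n_clusters = 2L, n_samples = 1000)
resolve_full_tree(n_clusters = 150L, n_samples = 1000)
resolve_full_tree(n_clusters = NULL, distance_threshold = 0.5, n_samples = 1000)
```

With 1000 samples the bound is `max(100, 20) = 100`, so the first call resolves to `TRUE`, the second to `FALSE`, and the third to `TRUE` because a distance threshold is set.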