# Performs clustering according to the mini-batch k-means algorithm

Source: `R/sklearn-cluster.R`

`MiniBatchKMeans.Rd`

This is a wrapper around the Python class `sklearn.cluster.MiniBatchKMeans`.
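For intuition about the underlying algorithm, here is a simplified sketch in plain R: repeatedly sample a mini-batch, assign each sampled point to its nearest centre, and move that centre towards the point with a per-centre learning rate. This is an illustration only, not the actual implementation (which lives in `sklearn.cluster.MiniBatchKMeans`), and it omits the k-means++ initialization and convergence checks described below.

```r
# Simplified mini-batch k-means sketch (illustration only).
mini_batch_kmeans <- function(X, k, batch_size = 32L, n_iter = 100L) {
  centers <- X[sample(nrow(X), k), , drop = FALSE]  # random initialization
  counts <- rep(0L, k)                              # per-centre update counts
  for (i in seq_len(n_iter)) {
    batch <- X[sample(nrow(X), batch_size), , drop = FALSE]
    # distances between the k centres (rows) and the batch points (columns)
    d <- as.matrix(dist(rbind(centers, batch)))[seq_len(k), -seq_len(k)]
    nearest <- apply(d, 2, which.min)  # nearest centre for each batch point
    for (j in seq_len(batch_size)) {
      c <- nearest[j]
      counts[c] <- counts[c] + 1L
      eta <- 1 / counts[c]  # learning rate decays as the centre is updated
      centers[c, ] <- (1 - eta) * centers[c, ] + eta * batch[j, ]
    }
  }
  centers
}
```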

## Super classes

`rgudhi::PythonClass` -> `rgudhi::SKLearnClass` -> `rgudhi::BaseClustering` -> `MiniBatchKMeans`

## Methods


### Method `new()`

The MiniBatchKMeans class constructor.

#### Usage

```
MiniBatchKMeans$new(
  n_clusters = 2L,
  init = c("k-means++", "random"),
  n_init = 10L,
  max_iter = 300L,
  tol = 1e-04,
  verbose = 0L,
  random_state = NULL,
  batch_size = 1024L,
  compute_labels = TRUE,
  max_no_improvement = 10L,
  init_size = NULL,
  reassignment_ratio = 0.01
)
```

#### Arguments

`n_clusters`
: An integer value specifying the number of clusters to form as well as the number of centroids to generate. Defaults to `2L`.

`init`
: Either a string or a numeric matrix of shape \(n_{\mathrm{clusters}} \times n_{\mathrm{features}}\) specifying the method for initialization. If a string, choices are:

    `"k-means++"`
    : selects initial cluster centroids using sampling based on an empirical probability distribution of the points' contribution to the overall inertia. This technique speeds up convergence and is theoretically proven to be \(\mathcal{O}(\log(k))\)-optimal. See the description of `n_init` for more details;

    `"random"`
    : chooses `n_clusters` observations (rows) at random from the data for the initial centroids.

    Defaults to `"k-means++"`.

`n_init`
: An integer value specifying the number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of `n_init` consecutive runs in terms of inertia. Defaults to `10L`.

`max_iter`
: An integer value specifying the maximum number of iterations of the k-means algorithm for a single run. Defaults to `300L`.

`tol`
: A numeric value specifying the relative tolerance, with regard to the Frobenius norm of the difference in the cluster centers of two consecutive iterations, used to declare convergence. Defaults to `1e-4`.

`verbose`
: An integer value specifying the level of verbosity. Defaults to `0L`, which disables verbose output.

`random_state`
: An integer value specifying the initial seed of the random number generator. Defaults to `NULL`, which uses the current timestamp.

`batch_size`
: An integer value specifying the size of the mini-batches. For faster computations, you can set `batch_size` to a value greater than 256 times the number of cores to enable parallelism on all cores. Defaults to `1024L`.

`compute_labels`
: A boolean value specifying whether to compute label assignments and inertia for the complete dataset once the mini-batch optimization has converged in fit. Defaults to `TRUE`.

`max_no_improvement`
: An integer value specifying the number of consecutive mini-batches that may fail to yield an improvement on the smoothed inertia before the algorithm is stopped early. To disable convergence detection based on inertia, set `max_no_improvement` to `NULL`. Defaults to `10L`.

`init_size`
: An integer value specifying the number of samples to randomly sample for speeding up the initialization (sometimes at the expense of accuracy): the algorithm is initialized by running a batch k-means on a random subset of the data. This needs to be larger than `n_clusters`. If `NULL`, the heuristic is `init_size = 3 * batch_size` if `3 * batch_size < n_clusters`, else `init_size = 3 * n_clusters`. Defaults to `NULL`.

`reassignment_ratio`
: A numeric value specifying the fraction of the maximum number of counts for a center to be reassigned. A higher value means that low-count centers are more easily reassigned, which means that the model will take longer to converge, but should converge to a better clustering. However, too high a value may cause convergence issues, especially with a small batch size. Defaults to `0.01`.
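As a minimal sketch of how the constructor above might be used, assuming `rgudhi` and its Python backend (`gudhi`/`scikit-learn`) are installed. The `$fit()` call and `$labels_` field below are assumptions based on the scikit-learn interface inherited from `BaseClustering`; check the package reference if the inherited method names differ.

```r
# Assumes rgudhi and its Python dependencies are available.
library(rgudhi)

X <- matrix(rnorm(200), ncol = 2)  # 100 points in the plane

cl <- MiniBatchKMeans$new(
  n_clusters = 3L,
  batch_size = 32L,       # small batches for a small dataset
  random_state = 1234L    # fix the seed for reproducibility
)
cl$fit(X)      # assumed scikit-learn-style fit, inherited from BaseClustering
cl$labels_     # cluster assignments, if exposed as in scikit-learn
```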