This function prepares the input quaternion time series (QTS) and passes them to the k-means alignment algorithm, which jointly clusters and aligns the QTS.
Usage
kmeans(x, n_clusters, ...)
# S3 method for default
kmeans(
x,
n_clusters = 1,
iter_max = 10,
nstart = 1,
algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen"),
trace = FALSE,
...
)
# S3 method for qts_sample
kmeans(
x,
n_clusters = 1L,
seeds = NULL,
seeding_strategy = c("kmeans++", "exhaustive-kmeans++", "exhaustive", "hclust"),
warping_class = c("affine", "dilation", "none", "shift", "srsf"),
centroid_type = "mean",
metric = c("l2", "pearson"),
cluster_on_phase = FALSE,
use_fence = FALSE,
...
)
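The default method's arguments mirror those of stats::kmeans, with n_clusters and iter_max standing in for centers and iter.max, and its return value is documented as a stats::kmeans object. The sketch below shows the corresponding direct call on simulated data. It assumes that the default method forwards its arguments to stats::kmeans, which this page does not state explicitly.

```r
# Simulated data: two well-separated groups of 20 bivariate points each
set.seed(1234)
x <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 5), ncol = 2)
)

# Assumed equivalent of kmeans(x, n_clusters = 2, iter_max = 10, nstart = 1)
# for a numeric matrix: a direct call to stats::kmeans()
fit <- stats::kmeans(
  x,
  centers = 2,
  iter.max = 10,
  nstart = 1,
  algorithm = "Hartigan-Wong",
  trace = FALSE
)

# Cluster sizes
table(fit$cluster)
```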
Arguments
- x
Either a numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns), or an object of class qts_sample.
- n_clusters
An integer value specifying the number of clusters to look for.
- ...
Not used.
- iter_max
An integer value specifying the maximum number of iterations after which the k-means algorithm terminates. Defaults to 10L.
- nstart
How many random sets of initial centers should be chosen.
- algorithm
A character string, which may be abbreviated. Note that "Lloyd" and "Forgy" are alternative names for the same algorithm.
- trace
A logical or integer value, currently only used with the default algorithm ("Hartigan-Wong"): if positive (or TRUE), tracing information on the progress of the algorithm is produced. Higher values may produce more tracing information.
- seeds
An integer value or vector specifying the indices of the initial centroids. If an integer vector, it is interpreted as the indices of the initial centroids and should therefore be of length n_clusters. If an integer value, it is interpreted as the index of the first initial centroid, and subsequent centroids are chosen according to the k-means++ strategy. It can be NULL, in which case the argument seeding_strategy is used to automatically provide suitable indices. Defaults to NULL.
- seeding_strategy
A character string specifying the strategy for choosing the initial centroids when the argument seeds is set to NULL. Choices are "kmeans++"; "exhaustive-kmeans++", which performs an exhaustive search over the choice of the first centroid; "exhaustive", which tries all combinations of initial centroids; and "hclust", which first performs hierarchical clustering using Ward's linkage criterion to identify initial centroids. Defaults to "kmeans++", which is the fastest strategy.
- warping_class
A string specifying the warping class. Choices are "affine", "dilation", "none", "shift" or "srsf". Defaults to "affine". The SRSF class is the only boundary-preserving class.
- centroid_type
A string specifying the type of centroid to compute. Choices are "mean", "median", "medoid", "lowess" or "poly". Defaults to "mean". If LOWESS approximation is chosen, the user can append an integer between 0 and 100, as in "lowess20". This number is used as the smoother span, expressed as a percentage: it gives the proportion of points which influence the smooth at each value. Larger values give more smoothness. The default value is 10%. If polynomial approximation is chosen, the user can append a positive integer, as in "poly3". This number is used as the degree of the polynomial model. The default value is 4L.
- metric
A string specifying the metric used to compare curves. Choices are "l2" or "pearson". Defaults to "l2". Used only when warping_class != "srsf". For the boundary-preserving warping class, the L2 distance between the SRSFs of the original curves is used.
- cluster_on_phase
A boolean specifying whether clustering should be based on phase variation or amplitude variation. Defaults to FALSE, which means clustering is based on amplitude variation.
- use_fence
A boolean specifying whether the fence algorithm should be used to robustify the algorithm against outliers. Defaults to FALSE. Used only when warping_class != "srsf".
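The seeds and seeding_strategy descriptions rely on k-means++ seeding. After the first centroid is fixed, each new initial centroid is drawn with probability proportional to its squared distance to the nearest centroid already chosen. The sketch below implements generic k-means++ in base R on a precomputed distance matrix. It is not the package's implementation: kmeanspp_seeds() is a hypothetical helper, and plain Euclidean distances stand in for the dissimilarity that the alignment algorithm actually uses.

```r
# Hypothetical helper (not part of the package): k-means++ seeding on a
# precomputed distance matrix. `first` mimics passing a single integer to
# `seeds`; when NULL, the first seed is drawn uniformly at random.
kmeanspp_seeds <- function(d, k, first = NULL) {
  d <- as.matrix(d)
  n <- nrow(d)
  stopifnot(k >= 1, k <= n)
  seeds <- if (is.null(first)) sample.int(n, 1) else first
  while (length(seeds) < k) {
    # Squared distance from each observation to its nearest chosen seed
    # (zero for the seeds themselves, so they cannot be drawn again)
    dmin2 <- apply(d[, seeds, drop = FALSE], 1, min)^2
    # Draw the next seed with probability proportional to that squared distance
    seeds <- c(seeds, sample.int(n, 1, prob = dmin2))
  }
  seeds
}

# Usage on simulated bivariate points, forcing observation 6 as the first seed
set.seed(42)
pts <- matrix(rnorm(20), ncol = 2)
s <- kmeanspp_seeds(dist(pts), k = 3, first = 6)
s
```

Passing a single index as first mirrors giving seeds a single integer. Repeating the call with first set to each observation in turn and keeping the best resulting clustering is presumably close in spirit to the exhaustive search over the first centroid performed by "exhaustive-kmeans++".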
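The suffix convention of centroid_type can be made concrete with a small parser. parse_centroid_type() below is a hypothetical helper, not the package's code. It only encodes the rules stated for that argument: a LOWESS span given as a percentage between 0 and 100 with a 10% default, and a polynomial degree with a default of 4. The last lines pass the parsed span to stats::lowess(), whose f argument is the smoother span, applied to one simulated noisy coordinate rather than to aligned QTS.

```r
# Hypothetical parser for centroid_type strings such as "lowess20" or "poly3".
# Defaults follow the argument description: 10% span for LOWESS, degree 4 for poly.
parse_centroid_type <- function(centroid_type) {
  type <- sub("[0-9]+$", "", centroid_type)   # strip trailing digits
  digits <- sub("^[a-z]+", "", centroid_type) # keep trailing digits only
  value <- if (nzchar(digits)) as.integer(digits) else NA_integer_
  switch(type,
    mean = , median = , medoid = list(type = type),
    lowess = {
      span <- if (is.na(value)) 10L else value
      stopifnot(span >= 0, span <= 100)
      list(type = "lowess", span = span / 100) # percentage -> proportion
    },
    poly = {
      degree <- if (is.na(value)) 4L else value
      stopifnot(degree >= 1)
      list(type = "poly", degree = degree)
    },
    stop("unknown centroid type: ", centroid_type)
  )
}

spec <- parse_centroid_type("lowess20")

# Smooth one noisy simulated coordinate with the parsed span
set.seed(1)
time_grid <- seq(0, 1, length.out = 101)
y <- sin(2 * pi * time_grid) + rnorm(101, sd = 0.2)
sm <- stats::lowess(time_grid, y, f = spec$span)
```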
Value
An object of class stats::kmeans or stats::hclust or dbscan_fast if the input x is NOT of class qts_sample. Otherwise, an object of class qtsclust, which is effectively a list with five components:
- qts_aligned: An object of class qts_sample storing the sample of aligned QTS;
- qts_centers: A list of objects of class qts representing the centers of the clusters;
- best_clustering: An object of class fdacluster::caps storing the results of the best k-means alignment among all initializations that were tried;
- call_name: A string storing the name of the function that was used to produce the clustering structure;
- call_args: A list containing the exact arguments that were passed to the function call_name that produced this output.
Examples
out <- kmeans(vespa64$igp[1:10], n_clusters = 2)
#> ℹ Computing initial centroids using kmeans++ strategy...
#> Information about the data set:
#> - Number of observations: 10
#> - Number of dimensions: 3
#> - Number of points: 101
#>
#> Information about cluster initialization:
#> - Number of clusters: 2
#> - Initial seeds for cluster centers: 6 7
#>
#> Information about the methods used within the algorithm:
#> - Warping method: affine
#> - Center method: mean
#> - Dissimilarity method: l2
#> - Optimization method: bobyqa
#>
#> Information about warping parameter bounds:
#> - Warping options: 0.1500 0.1500
#>
#> Information about convergence criteria:
#> - Maximum number of iterations: 100
#> - Distance relative tolerance: 0.001
#>
#> Information about parallelization setup:
#> - Number of threads: 1
#> - Parallel method: 0
#>
#> Other information:
#> - Use fence to robustify: 0
#> - Check total dissimilarity: 1
#> - Compute overall center: 0
#>
#> Running k-centroid algorithm:
#> - Iteration #1
#> * Size of cluster #0: 6
#> * Size of cluster #1: 4
#> - Iteration #2
#> * Size of cluster #0: 6
#> * Size of cluster #1: 4
#>
#> Active stopping criteria:
#> - Memberships did not change.