This is a collection of functions that provide test statistics for use in the permutation scheme for two-sample testing. These test statistics fall into two categories: traditional statistics that rely on empirical moments, and inter-point statistics that rely only on pairwise dissimilarities between data points.
stat_welch(data, indices1, ...)
stat_student(data, indices1, ...)
stat_t(data, indices1, ...)
stat_fisher(data, indices1, ...)
stat_f(data, indices1, ...)
stat_mean(data, indices1, ...)
stat_hotelling(data, indices1, ...)
stat_bs(data, indices1, ...)
stat_student_ip(data, indices1, ...)
stat_t_ip(data, indices1, ...)
stat_fisher_ip(data, indices1, ...)
stat_f_ip(data, indices1, ...)
stat_bg_ip(data, indices1, ...)
stat_energy_ip(data, indices1, alpha = 1L, ...)
stat_cq_ip(data, indices1, ...)
stat_mod_ip(data, indices1, ...)
stat_dom_ip(data, indices1, standardize = TRUE, ...)
data: Either a list of the n1 + n2 concatenated observations, with the original n1 observations from the first sample on top and the original n2 observations from the second sample below, or a dissimilarity matrix stored as a dist object for all inter-point statistics, whose function names end with _ip().
indices1: An integer vector specifying the indices in data that are considered to belong to the first sample.
...: Extra parameters specific to some statistics.
alpha: A scalar value specifying the power to which the dissimilarities are raised in the computation of the inter-point energy statistic. Default is 1L.
standardize: A boolean specifying whether the distance between medoids in the stat_dom_ip function should be normalized by the pooled corresponding variances. Default is TRUE.
A real scalar giving the value of the test statistic for the permutation specified by the integer vector indices1.
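How a function with the (data, indices1, ...) signature plugs into a permutation scheme can be sketched in base R. This is only an illustration: my_stat_mean and perm_pvalue are hypothetical helpers, not functions of the package, and the Monte Carlo p-value shown here is one common choice rather than the package's own procedure.

```r
# Hypothetical statistic following the (data, indices1, ...) convention:
# mean of the first sample minus mean of the second sample.
my_stat_mean <- function(data, indices1, ...) {
  values <- unlist(data)
  mean(values[indices1]) - mean(values[-indices1])
}

# Hypothetical two-sided Monte Carlo permutation p-value.
# Each permutation is encoded by the indices assigned to the first sample.
perm_pvalue <- function(data, n1, stat_fun, B = 999L, ...) {
  n <- length(data)
  observed <- stat_fun(data, seq_len(n1), ...)
  permuted <- replicate(B, stat_fun(data, sample.int(n, n1), ...))
  (1 + sum(abs(permuted) >= abs(observed))) / (B + 1)
}

set.seed(1234)
x <- as.list(rnorm(10, mean = 0))
y <- as.list(rnorm(10, mean = 10))
p <- perm_pvalue(c(x, y), n1 = 10L, stat_fun = my_stat_mean)
```

The statistic never sees the group labels directly; it only receives the pooled data and the positions assigned to the first sample, which is why the same function can be evaluated on the observed split and on every permuted split.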
stat_hotelling implements Hotelling's \(T^2\) statistic for multivariate data with \(p < n\).
stat_student or stat_t implements Student's statistic (originally assuming equal variances and thus using the pooled empirical variance estimator). See t.test for details.
stat_welch implements the Student-Welch statistic, which is essentially a modification of Student's statistic that accounts for unequal variances. See t.test for details.
stat_fisher or stat_f implements Fisher's variance ratio statistic. See var.test for details.
stat_mean implements a statistic that computes the difference between the means.
stat_bs implements the statistic proposed by Bai & Saranadasa (1996) for high-dimensional multivariate data.
stat_student_ip or stat_t_ip implements a Student-like test statistic based on inter-point distances only, as described in Lovato et al. (2020).
stat_fisher_ip or stat_f_ip implements a Fisher-like test statistic based on inter-point distances only, as described in Lovato et al. (2020).
stat_bg_ip implements the statistic proposed by Biswas & Ghosh (2014).
stat_energy_ip implements the class of energy-based statistics as described in Székely & Rizzo (2013).
stat_cq_ip implements the statistic proposed by Chen & Qin (2010).
stat_mod_ip implements a statistic that computes the mean of inter-point distances.
stat_dom_ip implements a statistic that computes the distance between the medoids of the two samples, possibly standardized by the pooled corresponding variances.
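To illustrate how an inter-point statistic can be computed from a dist object and indices1 alone, here is a minimal base R sketch of an energy-type statistic with power alpha. The helper my_energy_ip is hypothetical, and its normalization (between-sample mean minus the average of the two within-sample means) is an assumption that may differ from the one used by stat_energy_ip.

```r
# Hypothetical energy-type statistic computed from dissimilarities only.
# Assumes each sample contains at least two observations.
my_energy_ip <- function(D, indices1, alpha = 1L, ...) {
  M <- as.matrix(D)^alpha                      # raise dissimilarities to alpha
  indices2 <- setdiff(seq_len(nrow(M)), indices1)
  M11 <- M[indices1, indices1, drop = FALSE]   # within first sample
  M22 <- M[indices2, indices2, drop = FALSE]   # within second sample
  between <- mean(M[indices1, indices2])       # between-sample dissimilarities
  within1 <- mean(M11[upper.tri(M11)])
  within2 <- mean(M22[upper.tri(M22)])
  between - (within1 + within2) / 2
}

D <- dist(c(0, 1, 10, 11))
e <- my_energy_ip(D, indices1 = 1:2)
e
# 9
```

Because only the dissimilarity matrix is indexed, the same code applies to any object type for which a distance can be defined, which is the point of the _ip() family.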
Bai, Z., & Saranadasa, H. (1996). Effect of high dimension: by an example of a two sample problem. Statistica Sinica, 311-329.
Lovato, I., Pini, A., Stamm, A., & Vantini, S. (2020). Model-free two-sample test for network-valued data. Computational Statistics & Data Analysis, 144, 106896.
Biswas, M., & Ghosh, A. K. (2014). A nonparametric two-sample test applicable to high dimensional data. Journal of Multivariate Analysis, 123, 160-171.
Székely, G. J., & Rizzo, M. L. (2013). Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference, 143(8), 1249-1272.
Chen, S. X., & Qin, Y. L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38(2), 808-835.
n <- 10L
mx <- 0
sigma <- 1
delta <- 10
my <- mx + delta
x <- rnorm(n = n, mean = mx, sd = sigma)
y <- rnorm(n = n, mean = my, sd = sigma)
D <- dist(c(x, y))
x <- as.list(x)
y <- as.list(y)
stat_welch(c(x, y), 1:n)
#> [1] -21.58756
stat_t(c(x, y), 1:n)
#> [1] 21.58756
stat_f(c(x, y), 1:n)
#> [1] 1.625402
stat_mean(c(x, y), 1:n)
#> [1] -10.28706
stat_hotelling(c(x, y), 1:n)
#> [1] 93.20457
stat_bs(c(x, y), 1:n)
#> [1] 105.6193
stat_t_ip(D, 1:n)
#> [1] 465.0228
stat_f_ip(D, 1:n)
#> [1] 1.625402
stat_bg_ip(D, 1:n)
#> [1] 165.0684
stat_energy_ip(D, 1:n)
#> [1] 9.204518
stat_cq_ip(D, 1:n)
#> [1] -18.16847
stat_mod_ip(D, 1:n)
#> [1] 10.28706
stat_dom_ip(D, 1:n)
#> [1] 9.705128
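As a cross-check of the moment-based statistics against the base R functions cited above, the following sketch uses t.test and var.test from the stats package. It does not call the package functions, and the sign convention of the package statistics may differ from that of t.test. With equal sample sizes, the Student and Welch statistics coincide, which is consistent with stat_welch and stat_t having the same magnitude in the output above.

```r
set.seed(42)
n <- 10L
x <- rnorm(n, mean = 0, sd = 1)
y <- rnorm(n, mean = 10, sd = 1)

# Student's statistic (pooled variance) and Welch's statistic (unpooled).
t_student <- unname(t.test(x, y, var.equal = TRUE)$statistic)
t_welch <- unname(t.test(x, y, var.equal = FALSE)$statistic)

# Fisher's variance ratio statistic.
f_ratio <- unname(var.test(x, y)$statistic)

# With n1 == n2 the pooled and unpooled standard errors are identical.
isTRUE(all.equal(t_student, t_welch))
```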