DiscrimTwoSample

class hyppo.discrim.DiscrimTwoSample(is_dist=False, remove_isolates=True)
Two-sample discriminability test statistic and p-value.
The two-sample test measures whether the discriminability of one dataset differs from that of another. More details can be found in [1].
Let \(\hat D_{x_1}\) denote the sample discriminability of one approach, and \(\hat D_{x_2}\) denote the sample discriminability of another approach. Then,
\[\begin{split}H_0: D_{x_1} &= D_{x_2} \\ H_A: D_{x_1} &> D_{x_2}\end{split}\]
Alternatively, tests can be done for \(D_{x_1} < D_{x_2}\) and \(D_{x_1} \neq D_{x_2}\).
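To make the quantity being compared concrete, here is a minimal sketch of one way to estimate sample discriminability: the fraction of comparisons in which a distance between two samples sharing a label is smaller than a distance to a sample with a different label. This is an illustration, not hyppo's implementation. The function name `sample_discriminability`, the use of Euclidean distance, and counting ties as one half are all assumptions made for this sketch.

```python
import math


def sample_discriminability(points, labels):
    """Hypothetical helper: share of (within, between) distance pairs in
    which the within-label distance is the smaller one. Ties count as 0.5
    (an assumption of this sketch)."""
    n = len(points)
    score, total = 0.0, 0
    for i in range(n):
        for j in range(n):
            # j must be a different sample with the same label as i
            if j == i or labels[j] != labels[i]:
                continue
            d_within = math.dist(points[i], points[j])
            for k in range(n):
                # k must carry a different label than i
                if labels[k] == labels[i]:
                    continue
                d_between = math.dist(points[i], points[k])
                if d_within < d_between:
                    score += 1.0
                elif d_within == d_between:
                    score += 0.5
                total += 1
    return score / total if total else float("nan")


labels = [0, 0, 1, 1]
x1 = [(1.0, 1.0)] * 4                      # constant data: every distance ties
x2 = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (1.0, 1.0)]  # labels fully separated
d1 = sample_discriminability(x1, labels)
d2 = sample_discriminability(x2, labels)
```

Under these assumptions, constant data gives a score of 0.5 and perfectly separated labels give a score of 1.0, which is the kind of contrast the two-sample test is designed to detect.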
Methods Summary
DiscrimTwoSample.statistic(x, y): Helper function that calculates the discriminability test statistic.
DiscrimTwoSample.test(x1, x2, y[, reps, alt, ...]): Calculates the test statistic and p-value for a two-sample test for discriminability.

DiscrimTwoSample.statistic(x, y)
Helper function that calculates the discriminability test statistic.
 Parameters
x, y (ndarray): Input data matrices. x and y must have the same number of samples. That is, the shapes must be (n, p) and (n, q) where n is the number of samples and p and q are the number of dimensions. Alternatively, x and y can be distance matrices, where the shapes must both be (n, n).
 Returns
stat (float): The computed two-sample discriminability statistic.

DiscrimTwoSample.test(x1, x2, y, reps=1000, alt='neq', workers=1, random_state=None)
Calculates the test statistic and p-value for a two-sample test for discriminability.
 Parameters
x1, x2 (ndarray): Input data matrices. x1 and x2 must have the same number of samples. That is, the shapes must be (n, p) and (n, q) where n is the number of samples and p and q are the number of dimensions. Alternatively, x1 and x2 can be distance matrices, where the shapes must both be (n, n), and is_dist must be set to True in this case.
y (ndarray): A vector containing the sample ids for the n samples. It should be matched to the inputs such that y[i] is the corresponding label for x1[i, :] and x2[i, :].
reps (int, optional (default: 1000)): The number of replications used to estimate the null distribution in the permutation test that produces the p-value.
alt ({"greater", "less", "neq"} (default: "neq")): The alternative hypothesis for the test. Can test that the first dataset is more discriminable (alt = "greater"), less discriminable (alt = "less"), or that the two differ in discriminability (alt = "neq").
workers (int, optional (default: 1)): The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the Process.
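 Returns
The sample discriminability of x1 (float), the sample discriminability of x2 (float), and the p-value of the two-sample test (float), in that order.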
Examples
>>> import numpy as np
>>> from hyppo.discrim import DiscrimTwoSample
>>> x1 = np.ones((100,2), dtype=float)
>>> x2 = np.concatenate([np.zeros((50, 2)), np.ones((50, 2))], axis=0)
>>> y = np.concatenate([np.zeros(50), np.ones(50)], axis=0)
>>> discrim1, discrim2, pvalue = DiscrimTwoSample().test(x1, x2, y)
>>> '%.1f, %.1f, %.2f' % (discrim1, discrim2, pvalue)
'0.5, 1.0, 0.00'
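The reps and alt parameters describe a permutation test: reps replicated differences approximate the null distribution, and alt decides which tail is counted. The sketch below shows one generic way to turn such a null sample into a p-value. It is not hyppo's internal procedure. The function name `permutation_pvalue`, the definition of the difference as the discriminability of x1 minus that of x2, and the add-one correction are assumptions made for illustration.

```python
def permutation_pvalue(observed_diff, null_diffs, alt="neq"):
    """Hypothetical helper: p-value from a list of null differences.

    observed_diff is assumed to be D(x1) - D(x2); null_diffs holds the
    same difference recomputed under the null, one value per replication.
    """
    if alt == "greater":
        # x1 more discriminable: count null values at least as large
        hits = sum(d >= observed_diff for d in null_diffs)
    elif alt == "less":
        # x1 less discriminable: count null values at least as small
        hits = sum(d <= observed_diff for d in null_diffs)
    elif alt == "neq":
        # two-sided: compare magnitudes
        hits = sum(abs(d) >= abs(observed_diff) for d in null_diffs)
    else:
        raise ValueError("alt must be 'greater', 'less', or 'neq'")
    # Add-one correction (an assumption here) keeps the estimate above zero
    return (hits + 1) / (len(null_diffs) + 1)


null = [-0.2, -0.1, 0.0, 0.1, 0.2]
p_neq = permutation_pvalue(0.15, null, alt="neq")
p_greater = permutation_pvalue(0.15, null, alt="greater")
p_less = permutation_pvalue(0.15, null, alt="less")
```

With a one-sided alternative, only differences in the stated direction count as evidence against the null, so the same observed difference can yield quite different p-values under "greater", "less", and "neq".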