DiscrimOneSample

class hyppo.discrim.DiscrimOneSample(is_dist=False, remove_isolates=True)

One Sample Discriminability test statistic and p-value.

The discriminability index measures how well a data acquisition and preprocessing pipeline distinguishes between different subjects. The key insight is that repeated measurements of the same item should be more similar to one another than to measurements of different items. The one sample test assesses whether the discriminability of a dataset is greater than would be expected by random chance. More details are in [1].

With \(D_x\) as the sample discriminability of \(x\), the one sample test evaluates the following hypotheses:

\[\begin{split}H_0: D_x &= D_0 \\ H_A: D_x &> D_0\end{split}\]

where \(D_0\) is the discriminability that would be observed by random chance.
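To make the quantity \(D_x\) concrete, the following is a minimal sketch of one common way to compute a discriminability score: the fraction of comparisons in which a within-item distance is smaller than a between-item distance, with ties counted as one half. It is written in plain Python for illustration and is not hyppo's implementation; the function name `discriminability` and the use of Euclidean distance are assumptions made for this example.

```python
import math


def discriminability(points, labels):
    """Illustrative score: how often a same-item pair is closer than a
    different-item pair (ties count as one half).

    Assumes at least two items and at least one item with repeated
    measurements. Hypothetical helper, not part of hyppo.
    """
    n = len(points)
    scores = []
    for i in range(n):
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        other = [k for k in range(n) if labels[k] != labels[i]]
        if not same or not other:
            continue  # nothing to compare for this measurement
        for j in same:
            d_within = math.dist(points[i], points[j])
            wins = 0.0
            for k in other:
                d_between = math.dist(points[i], points[k])
                if d_within < d_between:
                    wins += 1.0
                elif d_within == d_between:
                    wins += 0.5
            scores.append(wins / len(other))
    return sum(scores) / len(scores)


# Two items, two repeated measurements each, well separated.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels = [0, 0, 1, 1]
print(discriminability(points, labels))  # 1.0
```

A score of 1.0 means every repeated measurement is closer to its own item than to any other item. Scores near the chance level \(D_0\) suggest the measurements do not separate the items.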

Parameters
  • is_dist (bool, default: False) -- Whether inputs are distance matrices.

  • remove_isolates (bool, default: True) -- Whether to remove measurements whose sample id occurs only once (isolates).
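As a rough illustration of what removing isolates means, the sketch below drops every measurement whose sample id appears only once, since such a measurement has no repeated partner to compare against. This is an interpretation of the parameter description above, not hyppo's code; `drop_isolates` is a hypothetical helper name.

```python
from collections import Counter


def drop_isolates(points, labels):
    """Keep only measurements whose label occurs more than once.

    Hypothetical helper for illustration; not part of hyppo.
    """
    counts = Counter(labels)
    keep = [i for i, lab in enumerate(labels) if counts[lab] > 1]
    return [points[i] for i in keep], [labels[i] for i in keep]


points = [(0.0,), (0.1,), (9.0,)]
labels = ["a", "a", "b"]  # "b" has a single measurement
kept_points, kept_labels = drop_isolates(points, labels)
print(kept_labels)  # ['a', 'a']
```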

Methods Summary

DiscrimOneSample.statistic(x, y)

Helper function that calculates the discriminability test statistic.

DiscrimOneSample.test(x, y[, reps, workers, ...])

Calculates the test statistic and p-value for Discriminability one sample test.


DiscrimOneSample.statistic(x, y)

Helper function that calculates the discriminability test statistic.

Parameters

x (ndarray of float) -- Input data matrix of shape (n, p), where n is the number of samples and p is the number of dimensions. Alternatively, x can be a distance matrix of shape (n, n).

y (ndarray of float) -- A vector containing the sample ids for the n samples.

Returns

stat (float) -- The computed discriminability statistic.

DiscrimOneSample.test(x, y, reps=1000, workers=1, random_state=None)

Calculates the test statistic and p-value for Discriminability one sample test.

Parameters
  • x (ndarray of float) -- Input data matrix. x must have shape (n, p), where n is the number of samples and p is the number of dimensions. Alternatively, x can be a distance matrix of shape (n, n), in which case is_dist must be set to True.

  • y (ndarray of float) -- A vector containing the sample ids for our n samples.

  • reps (int, default: 1000) -- The number of replications used to estimate the null distribution in the permutation test that computes the p-value.

  • workers (int, default: 1) -- The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the Process.

Returns

  • stat (float) -- The computed discriminability statistic.

  • pvalue (float) -- The computed one sample test p-value.
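The role of reps can be illustrated with a generic permutation test: the sample ids are shuffled reps times, the statistic is recomputed on each shuffle to form a null distribution, and the p-value is the share of null statistics at least as large as the observed one. The sketch below is a plain-Python illustration under stated assumptions, not hyppo's implementation. `permutation_pvalue` is a hypothetical helper, and `separation` is a simple stand-in statistic (mean between-item distance minus mean within-item distance) used only to keep the example short.

```python
import math
import random


def permutation_pvalue(stat_fn, points, labels, reps=1000, seed=None):
    """One-sided permutation p-value for stat_fn (hypothetical helper)."""
    rng = random.Random(seed)
    observed = stat_fn(points, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(reps):
        rng.shuffle(shuffled)  # break the link between data and sample ids
        if stat_fn(points, shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one permutation.
    return observed, (exceed + 1) / (reps + 1)


def separation(points, labels):
    """Stand-in statistic: mean between-item minus mean within-item distance."""
    within, between = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            (within if labels[i] == labels[j] else between).append(d)
    return sum(between) / len(between) - sum(within) / len(within)


# Two items with five repeated measurements each, five units apart.
base = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
points = base + [(a + 5.0, b + 5.0) for a, b in base]
labels = [0] * 5 + [1] * 5
stat, pvalue = permutation_pvalue(separation, points, labels, reps=200, seed=0)
print(stat > 0, pvalue < 0.05)
```

Larger values of reps give a finer-grained null distribution at the cost of more computation; the smallest p-value this scheme can report is 1 / (reps + 1).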

Examples

>>> import numpy as np
>>> from hyppo.discrim import DiscrimOneSample
>>> x = np.concatenate([np.zeros((50, 2)), np.ones((50, 2))], axis=0)
>>> y = np.concatenate([np.zeros(50), np.ones(50)], axis=0)
>>> '%.1f, %.2f' % DiscrimOneSample().test(x, y) 
'1.0, 0.00'
