D-Variate Independence Testing
Here, we consider joint independence testing of \(d\) random variables. This is a harder task than pairwise independence testing, but it is useful when asking whether three or more groups affect one another. Joint independence can be tested by combining pairwise independence tests, but a \(d\)-variate independence test is generally faster.
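To see why joint independence is a stronger requirement than independence of every pair, the following minimal sketch enumerates a standard textbook construction: two independent fair coins and their XOR. It uses only the Python standard library; the helper `p` is a hypothetical name introduced for this illustration and is not part of hyppo.

```python
from fractions import Fraction
from itertools import product

# Sample space: X and Y are independent fair coins, Z = X XOR Y.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
prob = Fraction(1, len(outcomes))  # each (x, y) pair is equally likely
index = {"x": 0, "y": 1, "z": 2}


def p(**fixed):
    """Exact probability that the named coordinates take the given values."""
    return sum(
        prob
        for o in outcomes
        if all(o[index[name]] == value for name, value in fixed.items())
    )


# Every pair factorizes: P(a, b) == P(a) * P(b) for all value combinations.
pairwise_independent = all(
    p(**{a: va, b: vb}) == p(**{a: va}) * p(**{b: vb})
    for a, b in (("x", "y"), ("x", "z"), ("y", "z"))
    for va, vb in product((0, 1), repeat=2)
)

# The joint distribution does not factorize into the three marginals.
jointly_independent = all(
    p(x=vx, y=vy, z=vz) == p(x=vx) * p(y=vy) * p(z=vz)
    for vx, vy, vz in product((0, 1), repeat=3)
)

print(pairwise_independent, jointly_independent)  # True False
```

Each pair of variables looks independent, yet any two of them determine the third, so a test that only ever compares single variables two at a time cannot detect this kind of dependence.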
The \(d\)-variate independence test can be found in hyppo.d_variate, and will be explained in detail below. Like all the other tests within hyppo, each test has a statistic method and a test method. The test method is the one that returns the test statistic and p-value, among other outputs, and is the one that is used most often in the examples and tutorials. The p-value returned is calculated with a permutation test using hyppo.tools.multi_perm_test.
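The general idea of a permutation test for joint independence can be sketched as follows. This is a standard-library illustration under stated assumptions, not the implementation of hyppo.tools.multi_perm_test: the names `perm_pvalue` and `abs_product_moment` are hypothetical, the toy statistic is not dHsic, and the choice to keep the first variable fixed while shuffling the others is an assumption of this sketch.

```python
import random


def abs_product_moment(*samples):
    """Toy d-variate statistic (hypothetical, not dHsic):
    absolute mean of the product of the centered samples."""
    n = len(samples[0])
    means = [sum(s) / n for s in samples]
    total = 0.0
    for i in range(n):
        term = 1.0
        for s, m in zip(samples, means):
            term *= s[i] - m
        total += term
    return abs(total / n)


def perm_pvalue(stat_fn, samples, reps=200, seed=0):
    """Permutation p-value for the null hypothesis of joint independence."""
    rng = random.Random(seed)
    observed = stat_fn(*samples)
    exceed = 0
    for _ in range(reps):
        # Keep the first variable fixed and shuffle the others independently,
        # which breaks any dependence between the variables.
        shuffled = [list(samples[0])]
        shuffled += [rng.sample(s, len(s)) for s in samples[1:]]
        if stat_fn(*shuffled) >= observed:
            exceed += 1
    # Counting the observed statistic itself keeps the p-value above zero.
    return observed, (1 + exceed) / (1 + reps)


# Four identical variables: an extreme case of joint dependence.
x = [float(i) for i in range(30)]
stat, pvalue = perm_pvalue(abs_product_moment, [x, x, x, x], reps=200)
print(stat, pvalue)
```

With this form of the estimate, the smallest attainable p-value is 1 / (reps + 1), so the number of permutations bounds how small a reported p-value can be.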
Specifics about how the statistic is calculated in hyppo.d_variate can be found in the docstring of the test. Here, we give an overview of the \(d\)-variate independence test we offer in hyppo and some of its properties compared to those in hyppo.independence.
D-variable Hilbert-Schmidt Independence Criterion (dHsic)
dHsic is an extension of hyppo.independence.Hsic, and it uses a reproducing kernel Hilbert space to test for the joint independence of \(d\) random variables. More details can be found in hyppo.d_variate.dHsic.
Note that unlike hyppo.independence.Hsic, there is no fast version of the test. It always uses the permutation method to compute its p-value.
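As a rough illustration of how a kernel-based joint independence statistic can be computed, the sketch below implements a commonly used biased (V-statistic) form of dHSIC for one-dimensional samples with a Gaussian kernel. It is written in plain Python for readability and is an assumption-laden sketch: the function names `gaussian_gram` and `dhsic_biased` are hypothetical, and the details may differ from what hyppo.d_variate.dHsic does internally, so the docstring remains the reference.

```python
import math


def gaussian_gram(sample, gamma=0.5):
    """Gram matrix with entries exp(-gamma * (a - b)**2) for a 1D sample."""
    return [[math.exp(-gamma * (a - b) ** 2) for b in sample] for a in sample]


def dhsic_biased(*samples, gamma=0.5):
    """Biased (V-statistic) dHSIC estimate for d one-dimensional samples."""
    n = len(samples[0])
    grams = [gaussian_gram(s, gamma) for s in samples]

    # Term 1: mean over (i, j) of the product of all d kernels.
    term1 = sum(
        math.prod(k[i][j] for k in grams) for i in range(n) for j in range(n)
    ) / n**2

    # Term 2: product over variables of each kernel's overall mean.
    term2 = math.prod(sum(map(sum, k)) / n**2 for k in grams)

    # Term 3: mean over i of the product of each kernel's i-th row mean.
    term3 = sum(
        math.prod(sum(k[i]) / n for k in grams) for i in range(n)
    ) / n

    return term1 + term2 - 2 * term3


x = [0.1 * i for i in range(20)]
constant = [1.0] * 20

# Three copies of the same variable: strongly dependent, statistic is positive.
print(dhsic_biased(x, x, x))

# Two constant variables carry no information: statistic is zero up to rounding.
print(dhsic_biased(x, constant, constant))
```

The nested sums make this sketch quadratic in the sample size for each variable, which is fine for small examples but not how one would write it for real data.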
Note
- Pros
  - Highly accurate independence test for \(d\) random variables
  - Much faster than constructing a joint independence test from multiple pairwise independence tests
- Cons
  - Not always more powerful than pairwise Hsic; this depends on the simulation and the dependence structure of the variables
dHsic is often computationally less expensive than pairwise Hsic, and if the number of variables \(d\) is too large, a pairwise Hsic approach may fail to reject the null hypothesis.
The following is a general use case of dHsic using data points that simulate a 1D linear relationship between random variables \(X\), \(Y\), \(U\), and \(V\). Note that here we use the default Gaussian kernel with a gamma value of 0.5. For a full list of parameters, see hyppo.d_variate.dHsic.
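from hyppo.d_variate import dHsic
from hyppo.tools import linear

# simulate two linear relationships, each with 100 samples in 1 dimension
x, y = linear(100, 1)
u, v = linear(100, 1)

# test the joint independence of all four variables
stat, pvalue = dHsic(gamma=0.5).test(x, y, u, v)
print(stat, pvalue)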
Out:
0.05440564414052551 0.000999000999000999