KCI
- class hyppo.conditional.KCI(**kwargs)
Kernel Conditional Independence Test Statistic and P-Value.
This is a conditional independence test that uses a radial basis function (RBF) kernel to compute the kernel matrices of two datasets. The test statistic is the trace of the product of the normalized kernel matrices. The p-value is then calculated from a gamma distribution, given the statistic and the approximate mean and variance of the trace values of the independent kernel matrices. This test is consistent against similar tests.
Notes
The statistic is computed as follows [1]:
Let \(x\) be an \((n, p)\) sample of the random variables \(X\) and let \(y\) be an \((n, 1)\) vector of sample class labels for \(Y\). We can then generate the kernel matrices \(K^x\) and \(K^y\) for the respective samples. Normalizing these kernel matrices, multiplying them, and taking the trace of the product gives the test statistic. The p-value and null distribution for the corrected statistic are calculated using a gamma distribution approximation.
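The steps described above can be sketched in NumPy. This is a minimal illustration, not hyppo's implementation: the helper names (`rbf_kernel`, `centered`, `kernel_trace_statistic`), the fixed bandwidth `sigma`, and the reading of "normalizing" as double-centering with \(H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\) are all assumptions made for the example. The gamma parameters are obtained by moment matching against an approximate mean and variance of the trace.

```python
import numpy as np


def rbf_kernel(a, sigma):
    """Gaussian (RBF) kernel matrix for the rows of a; sigma is an assumed bandwidth."""
    sq_norms = np.sum(a ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * a @ a.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def centered(k):
    """Double-center a kernel matrix: H K H with H = I - 11^T / n."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ k @ h


def kernel_trace_statistic(x, y, sigma=1.0):
    """Return the trace statistic and moment-matched gamma shape and scale."""
    n = x.shape[0]
    kx = centered(rbf_kernel(x, sigma))
    ky = centered(rbf_kernel(y, sigma))
    stat = np.trace(kx @ ky)
    # Approximate mean and variance of the statistic under independence.
    mean_appr = np.trace(kx) * np.trace(ky) / n
    var_appr = 2.0 * np.trace(kx @ kx) * np.trace(ky @ ky) / n ** 2
    shape = mean_appr ** 2 / var_appr
    scale = var_appr / mean_appr
    return stat, shape, scale


# Illustrative data: y depends on the first column of x.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = x[:, :1] + 0.1 * rng.normal(size=(50, 1))
stat, shape, scale = kernel_trace_statistic(x, y)
```

Given these values, a p-value could then be read off the upper tail of the fitted gamma distribution, for instance with `scipy.stats.gamma.sf(stat, shape, scale=scale)` if SciPy is available.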
References
- [1]
Kun Zhang, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Kernel-based conditional independence test and application in causal discovery. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, UAI'11, 804–813. Arlington, Virginia, USA, 2011. AUAI Press.
Methods Summary
| KCI.compute_kern(x, y) | |
| KCI.statistic(x, y) | Calculates the conditional independence test statistic. |
| KCI.test(x, y) | Calculates the Kernel Conditional Independence test statistic and p-value. |
- KCI.compute_kern(x, y)
- KCI.statistic(x, y)
Calculates the conditional independence test statistic.
- KCI.test(x, y)
Calculates the Kernel Conditional Independence test statistic and p-value.
- Parameters
x, y (ndarray of float) -- Input data matrices. x and y must have the same number of rows. That is, the shapes must be (n, p) and (n, 1), where n is the number of samples and p is the number of dimensions.
- Returns
Example
>>> import numpy as np
>>> from hyppo.conditional import KCI
>>> from hyppo.tools.indep_sim import linear
>>> np.random.seed(123456789)
>>> x, y = linear(100, 1)
>>> stat, pvalue = KCI().test(x, y)
>>> '%.1f, %.2f' % (stat, pvalue)
'544.7, 0.00'