# FCIT¶

`class hyppo.conditional.FCIT(model=DecisionTreeRegressor(), cv_grid={'min_samples_split': [2, 8, 64, 512, 0.01, 0.2, 0.4]}, num_perm=8, prop_test=0.1, discrete=(False, False))`

Fast Conditional Independence test statistic and p-value

The Fast Conditional Independence Test is a non-parametric conditional independence test [1].

Parameters
• model (Sklearn regressor) -- Regressor used to predict the input data $$Y$$.

• cv_grid (dict) -- Dictionary of parameters to cross-validate over when training the regressor.

• num_perm (int) -- Number of data permutations to estimate the p-value from marginal stats.

• prop_test (float) -- Proportion of data to evaluate test stat on.

• discrete (tuple of bool) -- Whether $$X$$ or $$Y$$ is discrete.

Notes

The motivation for the test rests on the assumption that if $$X \not\!\perp\!\!\!\perp Y \mid Z$$, then $$Y$$ should be predicted more accurately using both $$X$$ and $$Z$$ as covariates than using $$Z$$ alone. Likewise, if $$X \perp \!\!\! \perp Y \mid Z$$, then $$Y$$ should be predicted just as accurately using $$Z$$ alone [1]. Thus, the test uses a regressor (a decision tree by default) to predict the input $$Y$$ twice: once from both $$X$$ and $$Z$$, and once from $$Z$$ alone [1]. The accuracy of both predictions is then measured via mean squared error (MSE). $$X \perp \!\!\! \perp Y \mid Z$$ if and only if the MSE of the regressor trained on both $$X$$ and $$Z$$ is not smaller than the MSE of the regressor trained on $$Z$$ alone [1].
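The MSE comparison above can be sketched in a few lines. This is an illustration only, not hyppo's implementation: it substitutes ordinary least squares for the cross-validated decision tree, uses a single train/test split, and all variable names (`holdout_mse`, `err_z`, `err_xz`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# Construct data where Y depends only on Z, so X ⫫ Y | Z holds.
z = rng.normal(size=(n, 2))
x = z @ rng.normal(size=(2, 2)) + rng.normal(size=(n, 2))  # X depends on Z
y = z @ rng.normal(size=(2, 1)) + rng.normal(size=(n, 1))  # Y depends only on Z

def holdout_mse(features, target, prop_test=0.1):
    """Fit least squares on a train split; return per-sample squared
    errors on the held-out split (analogue of the test's prop_test)."""
    n_test = int(prop_test * len(target))
    X_tr, X_te = features[n_test:], features[:n_test]
    y_tr, y_te = target[n_test:], target[:n_test]
    coef, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return ((X_te @ coef - y_te) ** 2).mean(axis=1)

err_z = holdout_mse(z, y)                    # predict Y from Z alone
err_xz = holdout_mse(np.hstack([x, z]), y)   # predict Y from X and Z

# FCIT-style comparison: a one-sided test of whether adding X lowers the
# error. Under conditional independence the difference should hover near 0.
diff = err_xz - err_z
t = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))
print(f"MSE(Z)={err_z.mean():.3f}  MSE(X,Z)={err_xz.mean():.3f}  t={t:.2f}")
```

Because $$Y$$ here is generated from $$Z$$ alone, the two held-out errors come out nearly equal; in the dependent case, the `err_xz` errors would be systematically smaller and the statistic markedly negative.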

References

[1]

Krzysztof Chalupka, Pietro Perona, and Frederick Eberhardt. Fast conditional independence test for vector variables with large sample sizes. arXiv:1804.02747 [math, stat], 2018.

Methods Summary

FCIT.statistic(x, y, z=None)

Calculates the FCIT test statistic.

Parameters

x,y,z (ndarray of float) -- Input data matrices.

Returns

stat (float) -- The computed FCIT statistic.

FCIT.test(x, y, z=None)

Calculates the FCIT test statistic and p-value.

Parameters

x,y,z (ndarray of float) -- Input data matrices.

Returns

stat (float) -- The computed FCIT statistic.

pvalue (float) -- The computed FCIT p-value.

Examples

>>> import numpy as np
>>> from hyppo.conditional import FCIT
>>> from sklearn.tree import DecisionTreeRegressor
>>> np.random.seed(1234)
>>> dim = 2
>>> n = 100000
>>> z1 = np.random.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim), size=(n))
>>> A1 = np.random.normal(loc=0, scale=1, size=dim * dim).reshape(dim, dim)
>>> B1 = np.random.normal(loc=0, scale=1, size=dim * dim).reshape(dim, dim)
>>> x1 = (A1 @ z1.T + np.random.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim), size=(n)).T)
>>> y1 = (B1 @ z1.T + np.random.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim), size=(n)).T)
>>> model = DecisionTreeRegressor()
>>> cv_grid = {"min_samples_split": [2, 8, 64, 512, 1e-2, 0.2, 0.4]}
>>> stat, pvalue = FCIT(model=model, cv_grid=cv_grid).test(x1.T, y1.T, z1)
>>> '%.1f, %.2f' % (stat, pvalue)
'-3.6, 1.00'