ConditionalDcorr
- class hyppo.conditional.ConditionalDcorr(compute_distance='euclidean', use_cov=True, bandwidth=None, **kwargs)
Conditional Distance Covariance/Correlation (CDcov/CDcorr) test statistic and p-value.
CDcorr is a measure of dependence between two paired random matrices given a third random matrix, where the three matrices need not have equal dimensions [1]. The coefficient is 0 if and only if the two matrices are conditionally independent given the third matrix.
- Parameters
compute_distance (str, callable, or None, default: "euclidean") -- A function that computes the distance among the samples within each data matrix. Valid strings for compute_distance are, as defined in sklearn.metrics.pairwise_distances:
From scikit-learn: ["euclidean", "cityblock", "cosine", "l1", "l2", "manhattan"]. See the documentation for scipy.spatial.distance for details on these metrics.
From scipy.spatial.distance: ["braycurtis", "canberra", "chebyshev", "correlation", "dice", "hamming", "jaccard", "kulsinski", "mahalanobis", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean", "yule"]. See the documentation for scipy.spatial.distance for details on these metrics.
Set to None or "precomputed" if x and y are already distance matrices. To call a custom function, either create the distance matrix beforehand or create a function of the form metric(x, **kwargs), where x is the data matrix for which pairwise distances are calculated and **kwargs are extra arguments to send to your custom function.
use_cov (bool, default: True) -- If True, the statistic computes the covariance (CDcov) rather than the correlation (CDcorr).
bandwidth (str, scalar, or 1d-array, default: None) -- The method used to calculate the bandwidth for the kernel density estimate of the conditional matrix. This can be 'scott', 'silverman', a scalar constant, or a 1d-array of length r, where r is the number of dimensions of the conditional matrix. If None (default), 'scott' is used.
**kwargs -- Arbitrary keyword arguments for compute_distance.
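The custom-function form metric(x, **kwargs) described for compute_distance can be sketched as follows. This is a minimal illustration, not part of hyppo: the name weighted_euclidean and its weights keyword are hypothetical, and the only property taken from the text above is that the function receives an (n, p) data matrix and returns the (n, n) matrix of pairwise distances.

```python
import numpy as np


def weighted_euclidean(x, weights=None, **kwargs):
    """Hypothetical custom metric: pairwise weighted Euclidean distances.

    x is an (n, p) data matrix; the result is an (n, n) distance matrix.
    `weights` is an illustrative keyword, not a hyppo parameter.
    """
    x = np.asarray(x, dtype=float)
    if weights is None:
        w = np.ones(x.shape[1])
    else:
        w = np.asarray(weights, dtype=float)
    # Broadcast to an (n, n, p) array of coordinate differences.
    diffs = x[:, None, :] - x[None, :, :]
    # Weight each squared coordinate difference, sum over p, take the root.
    return np.sqrt((w * diffs**2).sum(axis=-1))


x = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
dist_plain = weighted_euclidean(x)
dist_first_only = weighted_euclidean(x, weights=[1.0, 0.0])
print(dist_plain)
print(dist_first_only)
```

Under the description above, such a function would be supplied as compute_distance=weighted_euclidean, with weights=... passed through the constructor's **kwargs.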
References
[1] Xueqin Wang, Wenliang Pan, Wenhao Hu, Yuan Tian, and Heping Zhang. Conditional distance correlation. Journal of the American Statistical Association, 110(512):1726–1734, 2015.
Methods Summary
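| ConditionalDcorr.statistic(x, y, z) | Helper function that calculates the CDcov/CDcorr test statistic. |
| ConditionalDcorr.test(x, y, z[, reps, workers, random_state]) | Calculates the CDcov/CDcorr test statistic and p-value. |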
- ConditionalDcorr.statistic(x, y, z)
Helper function that calculates the CDcov/CDcorr test statistic.
- Parameters
x, y, z (ndarray of float) -- Input data matrices. x, y and z must have the same number of samples. That is, the shapes must be (n, p), (n, q) and (n, r), where n is the number of samples and p, q, and r are the number of dimensions. Alternatively, x and y can be distance matrices and z can be a similarity matrix, in which case the shapes must all be (n, n).
- Returns
stat (float) -- The computed CDcov/CDcorr statistic.
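As a rough illustration of the shape requirements above, the sketch below builds x, y and z with shapes (n, p), (n, q) and (n, r) and passes them to statistic. The simulated data and variable names are assumptions made for the example. The import is guarded so that the data-construction part still runs where hyppo is not installed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q, r = 100, 2, 3, 1

# z drives both x and y: the two are dependent marginally, but by
# construction independent once z is given.
z = rng.normal(size=(n, r))
x = z + 0.5 * rng.normal(size=(n, p))  # (n, 1) broadcasts against (n, p)
y = z + 0.5 * rng.normal(size=(n, q))  # (n, 1) broadcasts against (n, q)

# All three inputs must share the same number of samples n.
assert x.shape[0] == y.shape[0] == z.shape[0] == n

try:
    from hyppo.conditional import ConditionalDcorr
except ImportError:
    # Assumption: hyppo may be absent in the environment running this sketch.
    ConditionalDcorr = None

if ConditionalDcorr is not None:
    # With the default use_cov=True the statistic is the covariance (CDcov).
    stat = ConditionalDcorr().statistic(x, y, z)
    print(f"CDcov statistic: {stat:.4f}")
```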
- ConditionalDcorr.test(x, y, z, reps=1000, workers=1, random_state=None)
Calculates the CDcov/CDcorr test statistic and p-value.
- Parameters
x, y, z (ndarray of float) -- Input data matrices. x, y and z must have the same number of samples. That is, the shapes must be (n, p), (n, q) and (n, r), where n is the number of samples and p, q, and r are the number of dimensions. Alternatively, x and y can be distance matrices and z can be a similarity matrix, in which case the shapes must all be (n, n).
reps (int, default: 1000) -- The number of replications used to estimate the null distribution when the p-value is computed with the permutation test.
workers (int, default: 1) -- The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the process.
random_state (int, default: None) -- The seed for the permutation test. Fix it to make the p-value reproducible.
- Returns