compute_dist
- hyppo.tools.compute_dist(x, y, z=None, metric='euclidean', workers=1, **kwargs)
Distance matrices for the inputs.
- Parameters
x, y, z (ndarray of float) -- Input data matrices. x, y, and z must have the same number of samples. That is, the shapes must be (n, p), (n, q), and (n, r), where n is the number of samples and p, q, and r are the numbers of dimensions of x, y, and z, respectively. Alternatively, x, y, and z can be distance matrices, in which case the shapes must be (n, n). z is optional.
metric (str, callable, or None, default: "euclidean") -- A function that computes the distance among the samples within each data matrix. Valid strings for metric are those defined in sklearn.metrics.pairwise_distances:
From scikit-learn: ["euclidean", "cityblock", "cosine", "l1", "l2", "manhattan"]. See the documentation for sklearn.metrics.pairwise_distances for details on these metrics.
From scipy.spatial.distance: ["braycurtis", "canberra", "chebyshev", "correlation", "dice", "hamming", "jaccard", "kulsinski", "mahalanobis", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean", "yule"]. See the documentation for scipy.spatial.distance for details on these metrics.
Set to None or "precomputed" if x, y, and z are already distance matrices. To call a custom function, either create the distance matrix beforehand or create a function of the form metric(x, **kwargs), where x is the data matrix for which pairwise distances are calculated and **kwargs are extra arguments to send to your custom function.
workers (int, default: 1) -- The number of cores to parallelize the distance computation over. Supply -1 to use all cores available to the process.
**kwargs -- Arbitrary keyword arguments provided to sklearn.metrics.pairwise_distances or a custom distance function.
- Returns
distx, disty, distz (ndarray of float) -- Distance matrices based on the metric provided by the user. distz is only returned if z is provided.
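
The behavior described above can be illustrated with a minimal sketch. It is not hyppo's implementation: sketch_compute_dist and scaled_l1 are hypothetical names introduced here, and the sketch only assumes that string metrics are forwarded to sklearn.metrics.pairwise_distances (with workers mapped to its n_jobs argument), that None is treated like "precomputed", and that a custom callable receives the whole data matrix as metric(x, **kwargs).

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def sketch_compute_dist(x, y, z=None, metric="euclidean", workers=1, **kwargs):
    """Hypothetical stand-in that mirrors the documented behavior only."""
    if metric is None:
        # Assumption: None is handled the same way as "precomputed".
        metric = "precomputed"

    def _dist(data):
        if callable(metric):
            # Custom functions take the full data matrix: metric(x, **kwargs).
            return metric(data, **kwargs)
        # Assumption: workers is passed on as scikit-learn's n_jobs.
        return pairwise_distances(data, metric=metric, n_jobs=workers, **kwargs)

    dists = [_dist(x), _dist(y)]
    if z is not None:
        # The third matrix is only produced when z is supplied.
        dists.append(_dist(z))
    return tuple(dists)


def scaled_l1(data, scale=1.0):
    """Example custom metric: scaled Manhattan distance between all row pairs."""
    diffs = np.abs(data[:, None, :] - data[None, :, :])
    return scale * diffs.sum(axis=-1)


rng = np.random.default_rng(0)
n = 20
x = rng.normal(size=(n, 3))  # shape (n, p)
y = rng.normal(size=(n, 2))  # shape (n, q)
z = rng.normal(size=(n, 4))  # shape (n, r)

# Two inputs give two (n, n) matrices; a third input adds a third matrix.
distx, disty = sketch_compute_dist(x, y)
distx_l1, disty_l1, distz_l1 = sketch_compute_dist(x, y, z, metric="cityblock")

# A custom callable, with extra keyword arguments forwarded to it.
custx, custy = sketch_compute_dist(x, y, metric=scaled_l1, scale=2.0)

# Inputs that are already distance matrices pass through unchanged.
prex, prey = sketch_compute_dist(distx, disty, metric=None)
```

Each returned matrix has shape (n, n) regardless of the number of dimensions of the corresponding input, which is why x, y, and z only need to agree on the number of samples.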