{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n\n# Time Series Testing\n\nA common problem, especially in the realm of data such as fMRI images, is identifying\ncausality within time series data. This is difficult and sometimes impossible to do\nfor multivariate and nonlinear data.\n\nIf you are interested in questions of this mold, this module of the package is for you!\nAll our tests can be found in :mod:`hyppo.time_series`, and will be elaborated in\ndetail below. But before that, let's look at the mathematical formulations:\n\nLet $\\mathbb{N}$ be the non-negative integers $\\{0, 1, 2, \\ldots\\}$, and\n$\\mathbb{R}$ be the real line $(-\\infty, \\infty)$. Let $F_X$,\n$F_Y$, and $F_{XY}$ represent the marginal and joint distributions of\nrandom variables $X$ and $Y$, whose realizations exist in\n$\\mathcal{X}$ and $\\mathcal{Y}$, respectively. Similarly, let $F_{X_t}$,\n$F_{Y_s}$, and $F_{(X_t, Y_s)}$ represent the marginal and joint distributions\nof the time-indexed random variables $X_t$ and $Y_s$ at timesteps $t$\nand $s$. For this work, assume $\\mathcal{X} = \\mathbb{R}^p$ and\n$\\mathcal{Y} = \\mathbb{R}^q$ for $p, q > 0$. Finally, let\n$\\{(X_t, Y_t)\\}_{t = -\\infty}^\\infty$ represent the full, jointly-sampled time\nseries, structured as a countably long list of observations $(X_t, Y_t)$. Consider\na strictly stationary time series $\\{(X_t, Y_t)\\}_{t = -\\infty}^\\infty$, with the\nobserved sample $\\{(X_1, Y_1), \\ldots, (X_n, Y_n)\\}$. Choose some\n$M \\in \\mathbb{N}$, the `max_lag` hyperparameter. 
We test the independence of\ntwo series via the following hypotheses.\n\n\\begin{align}H_0: F_{(X_t,Y_{t-j})} &= F_{X_t} F_{Y_{t-j}}\n \\text{ for each } j \\in \\{0, 1, \\ldots, M\\} \\\\\n H_A: F_{(X_t,Y_{t-j})} &\\neq F_{X_t} F_{Y_{t-j}}\n \\text{ for some } j \\in \\{0, 1, \\ldots, M\\}\\end{align}\n\nThe null hypothesis implies that for any $(M + 1)$-length stretch in the time\nseries, $X_t$ is pairwise independent of present and past values $Y_{t - j}$\nspaced $j$ timesteps away (including $j = 0$). A corresponding test for\nwhether $Y_t$ is dependent on past values of $X_t$ is available by swapping\nthe labels of each time series. Finally, the hyperparameter $M$ governs the\nmaximum number of timesteps in the past for which we check the influence of\n$Y_{t - j}$ on $X_t$. $M$ can be chosen for computational reasons or on\nsubject-matter grounds; e.g., if a signal from one region of the brain can only\ninfluence another within 20 timesteps, set $M = 20$.\n\nLike all the other tests within hyppo, each test class has a :func:`statistic` and\n:func:`test` method. The :func:`test` method returns the test statistic\nand p-value, among other outputs, and is the one used most often in the\nexamples, tutorials, etc.\nThe p-value returned is calculated using a permutation test.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Cross multiscale graph correlation (MGCX)** and\n**cross distance correlation (DcorrX)** are time series tests of independence. They\nare more powerful alternatives to the pairwise Pearson's correlation comparisons that\nare standard in the literature, and they are multivariate and distance-based.\nMore details can be found in :class:`hyppo.time_series.MGCX` and\n:class:`hyppo.time_series.DcorrX`.\n\n

#### Note

:Pros: - Very accurate\n - Operates on multivariate data\n :Cons: - Slower than pairwise Pearson's correlation

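To make the lagged comparison in the hypotheses concrete, here is a minimal sketch of how $X_t$ can be paired with $Y_{t - j}$ for each lag $j \in \{0, \ldots, M\}$ and the per-lag results combined. It is not hyppo's implementation: `lagged_pairs` and `cross_lag_stat` are hypothetical helper names, squared Pearson correlation is only a univariate stand-in for distance correlation or MGC, and the $(n - j)/n$ weighting is an assumption about how lags might be combined.

```python
import numpy as np


def lagged_pairs(x, y, lag):
    # Pair x_t with y_{t - lag}: drop the first `lag` values of x
    # and the last `lag` values of y so the two arrays line up.
    n = len(x)
    return x[lag:], y[: n - lag]


def cross_lag_stat(x, y, max_lag):
    # Hypothetical helper: sum a per-lag dependence measure over
    # lags 0..max_lag, down-weighting lags that use fewer pairs.
    # Squared Pearson correlation is a stand-in, NOT distance correlation.
    n = len(x)
    terms = []
    for lag in range(max_lag + 1):
        xs, ys = lagged_pairs(x, y, lag)
        r = np.corrcoef(xs, ys)[0, 1]
        terms.append((n - lag) / n * r**2)
    # Return the combined statistic and the lag with the largest term.
    return sum(terms), int(np.argmax(terms))


rng = np.random.default_rng(0)
y = rng.standard_normal(200)
x = np.concatenate([[0.0], y[:-1]])  # x_t = y_{t-1}, so dependence sits at lag 1
stat, opt_lag = cross_lag_stat(x, y, max_lag=2)
print(stat, opt_lag)
```

Because `x` here is just `y` shifted by one step, the lag-1 term dominates the sum; with `max_lag=0` the same data would look nearly independent, which is why the choice of $M$ matters.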
\n\nEach class has a `max_lag` parameter that indicates the maximum number of lags to\ncheck between the input `x` and the shifted `y`.\nBoth statistic functions return `opt_lag`, and MGCX additionally returns the optimal scale\n(see :class:`hyppo.independence.MGC` for more info).\n\nAs an example, let's generate some simulated data using :class:`hyppo.tools.indep_ar`:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from hyppo.tools import indep_ar\n\n# 40 samples, Independence AR, lag = 1\nx, y = indep_ar(40)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data simulate two independent AR(1) processes, and the function returns the realizations\nas :class:`numpy.ndarray`:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\nimport seaborn as sns\n\n# make plots look pretty\nsns.set(color_codes=True, style=\"white\", context=\"talk\", font_scale=1)\n\n# look at the simulation\nn = x.shape[0]  # number of timesteps\nt = range(1, n + 1)\nfig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 7.5))\nax1.plot(t, x)\nax1.plot(t, y)\nax2.scatter(x, y)\nax1.legend([\"X_t\", \"Y_t\"], loc=\"upper left\", prop={\"size\": 12})\nax1.set_xlabel(r\"$t$\")\nax2.set_xlabel(r\"$X_t$\")\nax2.set_ylabel(r\"$Y_t$\")\nfig.suptitle(\"Independent AR (lag=1)\")\nplt.axis(\"equal\")\nplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the simulations are independent, we would expect to see a high p-value.\nWe can check whether there is any dependence in the data by\nrunning a time series independence test. 
Let's use DcorrX.\nWe have to import it, and then run the test.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from hyppo.time_series import DcorrX\n\nstat, pvalue, _ = DcorrX(max_lag=0).test(x, y)\nprint(stat, pvalue)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, we verify that we get a high p-value! Let's repeat this process for\nsimulations that are correlated. We would expect a low p-value in this case.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from hyppo.tools import cross_corr_ar\n\n# 40 samples, Cross Correlation AR, lag = 1\nx, y = cross_corr_ar(40)\n\n# stuff to make the plot and make it look nice\nn = x.shape[0]  # number of timesteps\nt = range(1, n + 1)\nfig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 7.5))\nax1.plot(t[1:n], x[1:n])\nax1.plot(t[0 : (n - 1)], y[0 : (n - 1)])\nax2.scatter(x, y)\nax1.legend([\"X_t\", \"Y_t\"], loc=\"upper left\", prop={\"size\": 12})\nax1.set_xlabel(r\"$t$\")\nax2.set_xlabel(r\"$X_t$\")\nax2.set_ylabel(r\"$Y_t$\")\nfig.suptitle(\"Cross Correlation AR (lag=1)\")\nplt.axis(\"equal\")\nplt.show()\n\nstat, pvalue, _ = DcorrX(max_lag=1).test(x, y)\nprint(stat, pvalue)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }