--- title: "Getting Started with NNS: Correlation and Dependence" author: "Fred Viole" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started with NNS: Correlation and Dependence} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r setup, include=FALSE, message=FALSE} knitr::opts_chunk$set(echo = TRUE) library(NNS) library(data.table) data.table::setDTthreads(2L) options(mc.cores = 1) Sys.setenv("OMP_THREAD_LIMIT" = 2) ``` ```{r setup2,message=FALSE,warning = FALSE} library(NNS) library(data.table) require(knitr) require(rgl) ``` # Correlation and Dependence The limitations of linear correlation are well known. Often one uses correlation, when dependence is the intended measure for defining the relationship between variables. NNS dependence **`NNS.dep`** is a signal:noise measure robust to nonlinear signals. Below are some examples comparing NNS correlation **`NNS.cor`** and **`NNS.dep`** with the standard Pearson's correlation coefficient `cor`. ## Linear Equivalence Note the fact that all observations occupy the co-partial moment quadrants. ```{r linear,fig.width=5,fig.height=3,fig.align = "center"} x = seq(0, 3, .01) ; y = 2 * x ``` ```{r linear1,fig.width=5,fig.height=3,fig.align = "center", results='hide', echo=FALSE} NNS.part(x, y, Voronoi = TRUE, order = 3) ``` ```{r res1} cor(x, y) NNS.dep(x, y) ``` ## Nonlinear Relationship Note the fact that all observations occupy the co-partial moment quadrants. ```{r nonlinear,fig.width=5,fig.height=3,fig.align = "center", results='hide'} x = seq(0, 3, .01) ; y = x ^ 10 ``` ```{r nonlinear1,fig.width=5,fig.height=3,fig.align = "center", results='hide', echo=FALSE} NNS.part(x, y, Voronoi = TRUE, order = 3) ``` ```{r res2a} cor(x, y) NNS.dep(x, y) ``` ## Cyclic Relationship Even the difficult inflection points, which span both the co- and divergent partial moment quadrants, are properly compensated for in **`NNS.dep`**. ```{r nonlinear_sin,fig.width=5,fig.height=3,fig.align = "center", results='hide'} x = seq(0, 12*pi, pi/100) ; y = sin(x) ``` ```{r nonlinear1_sin,fig.width=5,fig.height=3,fig.align = "center", results='hide', echo=FALSE} NNS.part(x, y, Voronoi = TRUE, order = 3, obs.req = 0) ``` ```{r res2_sin} cor(x, y) NNS.dep(x, y) ``` ## Asymmetrical Analysis The asymmetrical analysis is critical for further determining a causal path between variables which should be identifiable, i.e., it is asymmetrical in causes and effects. The previous cyclic example visually highlights the asymmetry of dependence between the variables, which can be confirmed using **`NNS.dep(..., asym = TRUE)`**. ```{r asym1} cor(x, y) NNS.dep(x, y, asym = TRUE) ``` ```{r asym2} cor(y, x) NNS.dep(y, x, asym = TRUE) ``` ## Dependence Note the fact that all observations occupy only co- or divergent partial moment quadrants for a given subquadrant. ```{r dependence,fig.width=5,fig.height=3,fig.align = "center"} set.seed(123) df <- data.frame(x = runif(10000, -1, 1), y = runif(10000, -1, 1)) df <- subset(df, (x ^ 2 + y ^ 2 <= 1 & x ^ 2 + y ^ 2 >= 0.95)) ``` ```{r circle1,fig.width=5,fig.height=3,fig.align = "center", results='hide', echo=FALSE} NNS.part(df$x, df$y, Voronoi = TRUE, order = 3, obs.req = 0) ``` ```{r res3} NNS.dep(df$x, df$y) ``` # p-values for `NNS.dep()` p-values and confidence intervals can be obtained from sampling random permutations of $y \rightarrow y_p$ and running **`NNS.dep(x,$y_p$)`** to compare against a null hypothesis of 0 correlation, or independence between $(x, y)$. Simply set **`NNS.dep(..., p.value = TRUE, print.map = TRUE)`** to run 100 permutations and plot the results. ```{r permutations} ## p-values for [NNS.dep] set.seed(123) x <- seq(-5, 5, .1); y <- x^2 + rnorm(length(x)) ``` ```{r perm1,fig.width=5,fig.height=3,fig.align = "center", results='hide', echo=FALSE} NNS.part(x, y, Voronoi = TRUE, order = 3) ``` ```{r permutattions_res,fig.width=5,fig.height=3,fig.align = "center"} NNS.dep(x, y, p.value = TRUE, print.map = TRUE) ``` # Multivariate Dependence `NNS.copula()` These partial moment insights permit us to extend the analysis to multivariate instances and deliver a dependence measure $(D)$ such that $D \in [0,1]$. This level of analysis is simply impossible with Pearson or other rank based correlation methods, which are restricted to bivariate cases. ```{r multi, warning=FALSE} set.seed(123) x <- rnorm(1000); y <- rnorm(1000); z <- rnorm(1000) NNS.copula(cbind(x, y, z), plot = TRUE, independence.overlay = TRUE) ``` # References If the user is so motivated, detailed arguments and proofs are provided within the following: * [Nonlinear Nonparametric Statistics: Using Partial Moments](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/index.md) * [Nonlinear Correlation and Dependence Using NNS](https://doi.org/10.2139/ssrn.3010414) * [Deriving Nonlinear Correlation Coefficients from Partial Moments](https://doi.org/10.2139/ssrn.2148522) * [Beyond Correlation: Using the Elements of Variance for Conditional Means and Probabilities](https://doi.org/10.2139/ssrn.2745308) ```{r threads, echo = FALSE} Sys.setenv("OMP_THREAD_LIMIT" = "") ```