- Copy "Correlation.ipynb" to google drive
- Open the file with google colab
- Chage x and y value by your data
- Run the code
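For example, the x and y variables near the top of the notebook could be replaced like this (a minimal sketch; the actual variable names in your copy of the notebook may differ):

```python
import numpy as np

# Replace these sample values with your own paired observations.
# x and y must have the same length.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
```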
In this repository, four well-known correlation algorithms are implemented: Pearson, Spearman, Chatterjee, and MIC.
In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statistics it normally refers to the degree to which a pair of variables are linearly related. Familiar examples of dependent phenomena include the correlation between the height of parents and their offspring, and the correlation between the price of a good and the quantity the consumers are willing to purchase, as it is depicted in the so-called demand curve.
Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example, there is a causal relationship, because extreme weather causes people to use more electricity for heating or cooling. However, in general, the presence of a correlation is not sufficient to infer the presence of a causal relationship (i.e., correlation does not imply causation).
All four correlation algorithms, Pearson, Spearman, Chatterjee, and MIC, are implemented in "Correlation.ipynb".
In statistics, the Pearson correlation coefficient, also called the Pearson product-moment correlation coefficient (PPMCC), the bivariate correlation, or colloquially simply the correlation coefficient, is a measure of linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables and ignores many other types of relationships or correlations.
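Since the coefficient is just the covariance normalized by the two standard deviations, it can be computed directly or with scipy.stats.pearsonr; a minimal sketch with made-up data:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

# Direct formula: covariance divided by the product of the standard deviations.
r_manual = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

# scipy returns the coefficient together with a two-sided p-value.
r_scipy, p_value = stats.pearsonr(x, y)

print(r_manual, r_scipy)  # both close to 1.0 for this nearly linear data
```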
In statistics, Spearman's rank correlation coefficient or Spearman's ρ, named after Charles Spearman, is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described using a monotonic function.
The Spearman correlation between two variables is equal to the Pearson correlation between the rank values of those two variables; while Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not). If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.
Intuitively, the Spearman correlation between two variables will be high when observations have a similar (or identical for a correlation of 1) rank (i.e. relative position label of the observations within the variable: 1st, 2nd, 3rd, etc.) between the two variables, and low when observations have a dissimilar (or fully opposed for a correlation of −1) rank between the two variables.
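Because Spearman's ρ is defined as the Pearson correlation of the ranks, computing it by ranking first or calling scipy.stats.spearmanr gives the same value; a minimal sketch with made-up data:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = x ** 3  # monotonic but not linear

# Spearman = Pearson applied to the ranks of each variable.
rho_manual, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))

rho_scipy, p_value = stats.spearmanr(x, y)

print(rho_manual, rho_scipy)  # both exactly 1.0: y is a perfect monotone function of x
```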
Chatterjee's correlation coefficient (CCC: Chatterjee, 2021) is a rank-based correlation method with a very simple and understandable formula and fast computation, yet it is robust across data types and makes no assumptions about the variables' distributions.
Article: https://arxiv.org/abs/1909.10140?source=techstories.org
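The paper's formula for the no-ties case is short enough to implement directly: sort the pairs by x, take the ranks of y in that order, and sum the absolute differences of consecutive ranks. A minimal sketch, assuming the data contain no ties:

```python
import numpy as np

def chatterjee_xi(x, y):
    """No-ties formula from Chatterjee (2021):
    xi = 1 - 3 * sum(|r[i+1] - r[i]|) / (n^2 - 1),
    where r are the ranks of y after sorting the pairs by x."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    order = np.argsort(x)                          # sort the pairs by x
    ranks = np.argsort(np.argsort(y[order])) + 1   # 1-based ranks of y in that order
    return 1.0 - 3.0 * np.sum(np.abs(np.diff(ranks))) / (n * n - 1.0)

x = np.linspace(0.0, 2.0 * np.pi, 200)
y = np.sin(x)  # strongly dependent on x, but neither linear nor monotonic

print(chatterjee_xi(x, y))  # close to 1: xi picks up the non-monotonic dependence
```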
In statistics, the maximal information coefficient (MIC) is a measure of the strength of the linear or non-linear association between two variables X and Y.
The MIC belongs to the maximal information-based nonparametric exploration (MINE) class of statistics. In a simulation study, MIC outperformed some selected low-power tests; however, concerns have been raised regarding reduced statistical power in detecting some associations in settings with low sample size when compared to powerful methods such as distance correlation and Heller–Heller–Gorfine (HHG). Comparisons with these methods, in which MIC was outperformed, were made in Simon and Tibshirani and in Gorfine, Heller, and Heller. It is claimed that MIC approximately satisfies a property called equitability, which is illustrated by selected simulation studies. It was later proved that no non-trivial coefficient can exactly satisfy the equitability property as defined by Reshef et al., although this result has been challenged. Some criticisms of MIC are addressed by Reshef et al. in further studies published on arXiv.
How to use MIC in Python: https://minepy.readthedocs.io/en/latest/python.html
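With minepy installed (pip install minepy), MIC is computed through the MINE class; a minimal sketch using the parameter values shown in the minepy documentation:

```python
import numpy as np
from minepy import MINE

x = np.linspace(0.0, 2.0 * np.pi, 200)
y = np.sin(x)  # a non-linear relationship

# alpha and c are the parameter values used in the minepy examples.
mine = MINE(alpha=0.6, c=15)
mine.compute_score(x, y)

print(mine.mic())  # close to 1.0 for this noiseless functional relationship
```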
- Chatterjee Correlation: https://www.sciencedirect.com/science/article/pii/S0169136822002621#:~:text=Chatterjee%20CC%20(CCC%3A%20Chatterjee%2C,assumptions%20for%20the%20variables'%20distributions.
- Chatterjee Correlation article: https://arxiv.org/abs/1909.10140?source=techstories.org
- Install minepy in Python: https://minepy.readthedocs.io/en/latest/python.html
- Implementing Pearson correlation with scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html
- Implementing Spearman's correlation with scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html
- Implementing Pearson and Spearman coefficients in Python: https://realpython.com/numpy-scipy-pandas-correlation-python/
- Efficient test for nonlinear dependence of two continuous variables: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0697-7#:~:text=The%20two%20most%20common%20non,statistical%20dependence%20between%20two%20variables