When starting to work with a new dataset, it is useful to quickly pinpoint which pairs of variables appear to be strongly related. It helps you spot data issues, make better modeling decisions, and ultimately arrive at better answers.
The correlation coefficient is used widely for this purpose but it has two shortcomings:
- It can’t detect non-linear relationships
- It doesn't work with categorical variables
We propose an alternative - the x2y
metric - that remedies these shortcomings.
Please see this article for more detail on what it is and how it works, along with examples.