Normality test #482

Open
8ctopus opened this issue Nov 11, 2024 · 2 comments

@8ctopus

8ctopus commented Nov 11, 2024

First of all, thank you for this amazing library! I also apologize if I overlooked something, as I'm not a math genius.

I'm wondering whether any normality tests have been implemented yet.

The idea is, given a set of data (for example, the heights of students in a college), to check whether the data follows a normal distribution (Gaussian curve).

@markrogoyski
Owner

Hi @8ctopus,

Thank you for your interest in MathPHP.

We have the χ² (chi-squared) test in Statistics\Significance, which could be used as a building block to get at what you are asking, but I don't think we have any ready-made normality tests that return a true/false answer or a probability.
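
As a rough sketch of how a χ² test can be turned into a normality check: bin the data, compute how many values a normal distribution with the sample mean and standard deviation would put in each bin, and sum (O − E)²/E. The function names `normalCdf` and `chiSquaredNormality` are hypothetical and not part of MathPHP; the normal CDF uses the Abramowitz–Stegun erfc approximation because PHP has no built-in `erf`, and the default of eight equal-width bins is an arbitrary assumption.

```php
<?php

declare(strict_types=1);

/**
 * Standard normal CDF Phi(z), using the Abramowitz-Stegun 7.1.26
 * approximation of erfc (absolute error below about 1.5e-7).
 */
function normalCdf(float $z): float
{
    $x = abs($z) / M_SQRT2;
    $t = 1.0 / (1.0 + 0.3275911 * $x);
    $erfc = $t * (0.254829592 + $t * (-0.284496736 + $t * (1.421413741
        + $t * (-1.453152027 + $t * 1.061405429)))) * exp(-$x * $x);

    // Compute the lower tail directly so small probabilities keep precision.
    return $z >= 0 ? 1.0 - 0.5 * $erfc : 0.5 * $erfc;
}

/**
 * Chi-squared goodness-of-fit statistic against a normal distribution whose
 * mean and standard deviation are estimated from the data (hypothetical helper).
 *
 * @param float[] $data at least two distinct values
 * @param int $bins at least 4, so that df stays positive
 * @return array{chi2: float, df: int}
 */
function chiSquaredNormality(array $data, int $bins = 8): array
{
    $n = count($data);
    if ($n < 2 || min($data) == max($data)) {
        throw new InvalidArgumentException('need at least two distinct values');
    }
    $min = min($data);
    $max = max($data);

    $mean = array_sum($data) / $n;
    $sumSquares = 0.0;
    foreach ($data as $x) {
        $sumSquares += ($x - $mean) ** 2;
    }
    $sd = sqrt($sumSquares / ($n - 1)); // sample standard deviation

    // Observed counts in equal-width bins; the maximum is clamped into the last bin.
    $width = ($max - $min) / $bins;
    $observed = array_fill(0, $bins, 0);
    foreach ($data as $x) {
        $observed[min((int) floor(($x - $min) / $width), $bins - 1)]++;
    }

    $chi2 = 0.0;
    for ($i = 0; $i < $bins; $i++) {
        // The outer bins are open-ended so that the expected counts sum to $n.
        $lower = $i === 0 ? 0.0 : normalCdf(($min + $i * $width - $mean) / $sd);
        $upper = $i === $bins - 1 ? 1.0 : normalCdf(($min + ($i + 1) * $width - $mean) / $sd);
        $expected = $n * ($upper - $lower);
        $chi2 += ($observed[$i] - $expected) ** 2 / $expected;
    }

    // One degree of freedom is lost to the total, two to the estimated mean and sd.
    return ['chi2' => $chi2, 'df' => $bins - 3];
}
```

With eight bins the statistic has 8 − 1 − 2 = 5 degrees of freedom, and the 5% critical value of χ² with 5 degrees of freedom is about 11.07, so a larger statistic suggests the data are not normal (the df count is itself an approximation when the mean and sd come from the raw data). The usual rule of thumb that each expected count should be at least about 5 still applies, so small samples need fewer bins.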

I think it is a good feature to add. The Wikipedia article on normality tests lists many of them. If you had to pick only one to implement, which one would be the most useful to have?

Thanks again for your suggestions and feedback.
Mark

@8ctopus
Author

8ctopus commented Nov 12, 2024

@markrogoyski Hello Mark,

I have used the chi-squared test before in medical statistics, and it works great provided the data is normally distributed (if not, you can't use it, as you already know).

So far, I have roughly tested normality in two ways:

  • creating a histogram and plotting it (if the curve looks normal, the data most likely is)
  • using a not-so-bad approximation:
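/**
 * Approximate normality test
 *
 * Compares the mean with the median: for symmetric data such as a normal
 * distribution they are close, so a small value is consistent with (but
 * does not prove) normality.
 *
 * @param array $data
 *
 * @return float - relative difference between mean and median (e.g. 0.05 = 5%)
 *
 * @note found here https://www.paulstephenborile.com/2018/03/code-benchmarks-can-measure-fast-software-make-faster/
 */
public static function testNormality(array $data) : float
{
    $mean = self::mean($data);
    $median = self::median($data);

    // absolute values keep the result non-negative when the data contains negative numbers
    $scale = max(abs($mean), abs($median));

    // guard against division by zero when both mean and median are 0
    return $scale == 0 ? 0.0 : abs($mean - $median) / $scale;
}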
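
The first bullet, a histogram, can also be eyeballed without a plotting tool. A minimal sketch, assuming equal-width bins and a plain-text bar chart; `histogram` and `renderHistogram` are hypothetical helpers, not library functions.

```php
<?php

declare(strict_types=1);

/**
 * Counts the values in equal-width bins between min($data) and max($data).
 *
 * @param float[] $data at least two distinct values
 * @return int[] one count per bin, lowest bin first
 */
function histogram(array $data, int $bins = 10): array
{
    $min = min($data);
    $max = max($data);
    if ($max == $min) {
        throw new InvalidArgumentException('need at least two distinct values');
    }
    $width = ($max - $min) / $bins;

    $counts = array_fill(0, $bins, 0);
    foreach ($data as $x) {
        // The maximum would fall into bin $bins, so clamp it into the last bin.
        $counts[min((int) floor(($x - $min) / $width), $bins - 1)]++;
    }

    return $counts;
}

/**
 * Renders the histogram as horizontal text bars, one row per bin, each row
 * labelled with the bin's lower edge and followed by its count.
 */
function renderHistogram(array $data, int $bins = 10, int $maxBar = 50): string
{
    $counts = histogram($data, $bins);
    $min = min($data);
    $width = (max($data) - $min) / $bins;
    $peak = max($counts);

    $rows = [];
    foreach ($counts as $i => $count) {
        // Scale bars so the fullest bin is $maxBar characters wide.
        $bar = str_repeat('#', (int) round($count / $peak * $maxBar));
        $rows[] = sprintf('%10.2f | %s %d', $min + $i * $width, $bar, $count);
    }

    return implode(PHP_EOL, $rows) . PHP_EOL;
}
```

Calling `echo renderHistogram($heights);` on a sample prints one bar per bin; a single, roughly symmetric peak with tails on both sides is what a normal sample tends to look like, though this remains a visual check rather than a formal test.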

Both approaches are empirical, so I don't think they fit into your library.

Going back to the Wikipedia article, it says:

A 2011 study concludes that Shapiro–Wilk has the best power for a given significance, followed closely by Anderson–Darling when comparing the Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling tests.[1]
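
For reference, the Anderson–Darling statistic mentioned in the quote needs only the sorted sample and a normal CDF. The sketch below assumes the mean and standard deviation are estimated from the sample (the usual situation) and applies the common small-sample correction A*² = A²(1 + 0.75/n + 2.25/n²); the function names are hypothetical, and Φ again uses the Abramowitz–Stegun approximation.

```php
<?php

declare(strict_types=1);

/** Standard normal CDF via the Abramowitz-Stegun 7.1.26 erfc approximation. */
function normalCdf(float $z): float
{
    $x = abs($z) / M_SQRT2;
    $t = 1.0 / (1.0 + 0.3275911 * $x);
    $erfc = $t * (0.254829592 + $t * (-0.284496736 + $t * (1.421413741
        + $t * (-1.453152027 + $t * 1.061405429)))) * exp(-$x * $x);

    // Compute the lower tail directly so Phi(-z) keeps precision for large z.
    return $z >= 0 ? 1.0 - 0.5 * $erfc : 0.5 * $erfc;
}

/**
 * Anderson-Darling statistic A*^2 for normality, mean and sd estimated from
 * the sample (hypothetical helper).
 *
 * @param float[] $data at least two distinct values
 */
function andersonDarling(array $data): float
{
    $n = count($data);
    if ($n < 2 || min($data) == max($data)) {
        throw new InvalidArgumentException('need at least two distinct values');
    }
    sort($data); // ascending; also reindexes the keys to 0..n-1

    $mean = array_sum($data) / $n;
    $sumSquares = 0.0;
    foreach ($data as $x) {
        $sumSquares += ($x - $mean) ** 2;
    }
    $sd = sqrt($sumSquares / ($n - 1));

    $sum = 0.0;
    for ($i = 1; $i <= $n; $i++) {
        $zLow = ($data[$i - 1] - $mean) / $sd;  // i-th smallest value
        $zHigh = ($data[$n - $i] - $mean) / $sd; // (n+1-i)-th smallest value
        // ln Phi(z_i) + ln(1 - Phi(z_{n+1-i})), using 1 - Phi(z) = Phi(-z)
        $sum += (2 * $i - 1) * (log(normalCdf($zLow)) + log(normalCdf(-$zHigh)));
    }
    $a2 = -$n - $sum / $n;

    return $a2 * (1.0 + 0.75 / $n + 2.25 / ($n * $n));
}
```

The commonly quoted 5% critical value for A*² in this case is 0.752; larger values suggest the sample is not normal. Turning A*² into a p-value needs further approximation formulas that this sketch leaves out.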

So my best guess, though I have no experience with these tests, would be the Shapiro–Wilk test. I found an article that explains really well how it works:

https://medium.com/@austinej86/understanding-the-shapiro-wilk-test-a-key-tool-for-testing-normality-14ae5107b6b5
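
The full Shapiro–Wilk test needs coefficients derived from the covariances of normal order statistics, which implementations usually approximate (for example with Royston's algorithm) and which are too long to reproduce here. As a swapped-in, simpler relative, the sketch below computes the Shapiro–Francia statistic W′: the squared correlation between the sorted data and approximate expected normal order statistics (Blom scores, m_i = Φ⁻¹((i − 3/8)/(n + 1/4))). The function names are hypothetical, Φ⁻¹ is found by plain bisection, and no p-value is computed.

```php
<?php

declare(strict_types=1);

/** Standard normal CDF via the Abramowitz-Stegun 7.1.26 erfc approximation. */
function normalCdf(float $z): float
{
    $x = abs($z) / M_SQRT2;
    $t = 1.0 / (1.0 + 0.3275911 * $x);
    $erfc = $t * (0.254829592 + $t * (-0.284496736 + $t * (1.421413741
        + $t * (-1.453152027 + $t * 1.061405429)))) * exp(-$x * $x);

    return $z >= 0 ? 1.0 - 0.5 * $erfc : 0.5 * $erfc;
}

/** Inverse of normalCdf by bisection on [-10, 10]: slow but simple. */
function inverseNormalCdf(float $p): float
{
    $lo = -10.0;
    $hi = 10.0;
    for ($k = 0; $k < 100; $k++) {
        $mid = 0.5 * ($lo + $hi);
        // normalCdf is increasing, so keep the half that still brackets $p.
        if (normalCdf($mid) < $p) {
            $lo = $mid;
        } else {
            $hi = $mid;
        }
    }

    return 0.5 * ($lo + $hi);
}

/**
 * Shapiro-Francia W' (hypothetical helper): values close to 1 are
 * consistent with normality.
 *
 * @param float[] $data at least two distinct values
 */
function shapiroFrancia(array $data): float
{
    $n = count($data);
    if ($n < 2 || min($data) == max($data)) {
        throw new InvalidArgumentException('need at least two distinct values');
    }
    sort($data);
    $mean = array_sum($data) / $n;

    $sumMX = 0.0; // sum of m_i * (x_(i) - mean)
    $sumMM = 0.0; // sum of m_i^2
    $sumXX = 0.0; // sum of (x_i - mean)^2
    foreach ($data as $index => $x) {
        $m = inverseNormalCdf(($index + 1 - 0.375) / ($n + 0.25)); // Blom score
        $sumMX += $m * ($x - $mean);
        $sumMM += $m * $m;
        $sumXX += ($x - $mean) ** 2;
    }

    return $sumMX ** 2 / ($sumMM * $sumXX);
}
```

By the Cauchy–Schwarz inequality W′ never exceeds 1; deciding how close to 1 is close enough depends on n and needs published tables or an approximation formula, which this sketch does not include.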
