Variance calculation gives biased results for samples #2
I wasn't familiar with Bessel's correction, though reading the Wikipedia article I saw:
How do you suggest we handle this? By adding additional methods for subsets, or perhaps by creating a subset-only version of this library?
Yeah, you only need to apply the correction if you're dealing with a sample out of a larger population and you don't know the population mean. One caveat: Bessel's correction gives you an unbiased estimate of the variance, but not of the standard deviation. Taking the square root reintroduces a bias, and there is no general method for calculating an unbiased sd in the first place. It does, however, remove part of the bias. There's also the question of which correction factor to use, but n-1 is good enough for most cases (and if someone needs something more sophisticated, it'll probably fall out of scope for stats-lite anyhow). A simple, backwards-compatible way of implementing this could be:

```js
// Variance = average squared deviation from the mean.
// If `sample` is true, `vals` is a sample of a larger population, so Bessel's
// correction (n / (n - 1)) is applied. `numbers` and `mean` are assumed to be
// stats-lite's existing helpers.
function variance(vals, sample) {
  vals = numbers(vals)
  var avg = mean(vals)
  var diffs = []
  for (var i = 0; i < vals.length; i++) {
    diffs.push(Math.pow(vals[i] - avg, 2))
  }
  var res = mean(diffs)
  if (sample) {
    // (sum / n) * n / (n - 1) = sum / (n - 1).
    // With a single value this yields NaN (0 * Infinity): sample variance is undefined for n = 1.
    res *= vals.length / (vals.length - 1)
  }
  return res
}

// Standard deviation = square root of the variance.
// If `sample` is true, Bessel's correction is applied to the variance first.
function stdev(vals, sample) {
  return Math.sqrt(variance(vals, sample))
}
```
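To check the proposed `sample` flag numerically, here is a self-contained sketch. The `numbers` and `mean` functions below are minimal stand-ins written for this example (assumed to behave like stats-lite's helpers: keep finite numbers, then average them), not stats-lite's actual code.

```javascript
// Hypothetical stand-in for stats-lite's `numbers`: keep only finite numbers.
function numbers(vals) {
  return vals.filter(function (v) { return typeof v === "number" && isFinite(v) })
}

// Hypothetical stand-in for stats-lite's `mean`: arithmetic mean (NaN if empty).
function mean(vals) {
  vals = numbers(vals)
  if (vals.length === 0) return NaN
  var sum = 0
  for (var i = 0; i < vals.length; i++) sum += vals[i]
  return sum / vals.length
}

// The proposed variance: population variance by default,
// Bessel-corrected (effectively an n - 1 denominator) when `sample` is true.
function variance(vals, sample) {
  vals = numbers(vals)
  var avg = mean(vals)
  var diffs = []
  for (var i = 0; i < vals.length; i++) {
    diffs.push(Math.pow(vals[i] - avg, 2))
  }
  var res = mean(diffs)
  if (sample) {
    res *= vals.length / (vals.length - 1)
  }
  return res
}

function stdev(vals, sample) {
  return Math.sqrt(variance(vals, sample))
}

var data = [2, 4, 4, 4, 5, 5, 7, 9]  // mean 5, squared deviations sum to 32
console.log(variance(data))          // population: 32 / 8 = 4
console.log(variance(data, true))    // sample: 32 / 7, about 4.5714
console.log(stdev(data))             // 2
console.log(stdev(data, true))       // about 2.138
```

For this data set the squared deviations are 9, 1, 1, 1, 0, 0, 4, 16, so the only difference between the two modes is dividing their sum by 8 or by 7.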
Usually not a huge fan of polymorphic functions in Node where optimization matters, due to the way V8 deoptimizes them. That said, I don't know how much of a concern it is in this case, because in the same application the code would have to call it like … Will think about it. In other news, I just published v2.0.0 of this module with support for multi-modal
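If a flag argument is the worry, one alternative in the spirit of "adding additional methods" is to leave `variance`/`stdev` unchanged and add separately named sample versions. This is a sketch only: `sampleVariance`, `sampleStdev`, and `sumSquaredDeviations` are illustrative names, not a confirmed stats-lite API, and the input is assumed to already be an array of finite numbers (no `numbers` filtering).

```javascript
// Hypothetical helper: sum of squared deviations from the mean.
function sumSquaredDeviations(vals) {
  var sum = 0
  for (var i = 0; i < vals.length; i++) sum += vals[i]
  var avg = sum / vals.length
  var ss = 0
  for (var j = 0; j < vals.length; j++) {
    var d = vals[j] - avg
    ss += d * d
  }
  return ss
}

// Population variance: divide by n (NaN for an empty array).
function variance(vals) {
  return sumSquaredDeviations(vals) / vals.length
}

// Sample variance with Bessel's correction: divide by n - 1.
// Undefined for fewer than two values, so return NaN.
function sampleVariance(vals) {
  if (vals.length < 2) return NaN
  return sumSquaredDeviations(vals) / (vals.length - 1)
}

function stdev(vals) { return Math.sqrt(variance(vals)) }
function sampleStdev(vals) { return Math.sqrt(sampleVariance(vals)) }

console.log(variance([2, 4, 4, 4, 5, 5, 7, 9]))        // 4
console.log(sampleVariance([2, 4, 4, 4, 5, 5, 7, 9]))  // 32 / 7
```

Each function keeps a fixed signature, so existing callers of `variance(vals)` are unaffected, and the sample versions divide the sum of squares by n - 1 directly instead of rescaling the population result.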
The current method of calculating variance (and, by extension, standard deviation) assumes the values form the whole population. When dealing with a sample, i.e. you pick n elements out of k and you don't know the mean of the whole population, you need to apply Bessel's correction: divide the sum of squared deviations by n-1 instead of n.
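In symbols, with sample mean $\bar{x}$, the population formula and the Bessel-corrected sample formula differ only in the denominator, which is why multiplying the population result by $n/(n-1)$ gives the same value. For independent draws from a population with finite variance $\sigma^2$, the expectations show where the bias comes from:

```math
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma_n^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{n}{n-1}\,\sigma_n^2

\mathbb{E}\left[\sigma_n^2\right] = \frac{n-1}{n}\,\sigma^2, \qquad
\mathbb{E}\left[s^2\right] = \sigma^2
```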