Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FR: Mann-Whitney-U test #182

Open
wants to merge 3 commits into
base: developer
Choose a base branch
from
Open

FR: Mann-Whitney-U test #182

wants to merge 3 commits into from

Conversation

omaus
Copy link
Contributor

@omaus omaus commented Jan 28, 2022

Please reference the issue(s) this PR is related to

Tackles #117

Please list the changes introduced in this PR

  • U test added

Description

Mann-Whitney-U test added to the Testing portfolio. Corresponding unit test as well.

  • The project builds without problems on your machine
  • Added unit tests regarding the added features

open FSharp.Stats
open FSharp.Stats.Testing

// TO DO: Bergmann et al. (2000) showed that there are different implementations of this test that lead to different results.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check if your computations are correct before PR can be merged. Additionally, check your results against established packages in R, SPSS or puplications rather than volatile wikipedia chapters.

Comment on lines +18 to +25
// let abundance = // method for equal ranks instead of mean ranks when identical values occur.
// sortedMerge
// |> Array.map (
// fun v -> Array.filter (fun v2 -> v2 = v) sortedMerge
// >> Array.length
// )
// let myMap = sortedMerge |> Array.mapi (fun i x -> x, i + 2 - Array.item i abundance) |> Map // wrong: must return mean of ranksums with equal ranks, not always the same rank!
// let rankedMerge = sortedMerge |> Array.map (fun (v,group) -> float myMap.[(v,group)],v,group)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove comments

let uTestTests =
// taken from https://de.wikipedia.org/wiki/Wilcoxon-Mann-Whitney-Test#Beispiel
let testList1 =
([0;400;500;550;600;650;750;800;900;950;1000;1100;1200;1500;1600;1800;1900;2000;2200;3500 ],["M";"W";"M";"W";"M";"W";"M";"M";"W";"W";"M";"M";"W";"M";"W";"M";"M";"M";"M";"M"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add duplicates to check ranking procedure

Comment on lines +41 to +42
let u1 = seq1Length * seq2Length + (seq1Length * (seq1Length + 1.) / 2.) - rankSumSeq1
let u2 = seq1Length * seq2Length + (seq2Length * (seq2Length + 1.) / 2.) - rankSumSeq2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

temporary results should not be calculated twice

Comment on lines +35 to +36
|> Array.filter (fun (rank,v,group') -> group' = group)
|> Array.fold (fun state (rank,v,group') -> state + rank) 0.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace with countBy

Comment on lines +35 to +36
|> Array.filter (fun (rank,v,group') -> group' = group)
|> Array.fold (fun state (rank,v,group') -> state + rank) 0.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace with countBy

let sortedMerge =
(seq1 |> Seq.map (fun v -> float v, 0), seq2 |> Seq.map (fun v -> float v, 1)) // 0 = first group; 1 = second group
||> Seq.append
|> Seq.sortByDescending (fun (v,groupIndex) -> v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary that the sequences are sorted? The ranking is order-independent and as I can see there is no need for sorting.

Copy link
Member

@bvenn bvenn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check and correct the comments I suggested.

}
let createUTest statistic : UTestTestStatistics =
let cdf = Distributions.Continuous.Normal.CDF 0. 1. statistic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approximation via the z-Distribution is only valid for large samples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the distribution is discrete, I cannot find any other (continuous) approximation than the one via normal distribution.
https://stats.stackexchange.com/a/251734

So, what do you suggest? Delete the whole test because of normal approximation might be too inaccurate at low sample sizes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should aim to implement the exact distribution for n+m<100 and the approximation for larger values

see #213 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants