-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] FDR: Calculate FDR using numpy #3625
Conversation
At first I thought this will be a great fix for Text add-on, but then I realized Text implements its own FDR! 😮 I don't think this is optimal. Would you care to check that implementation as well and perhaps replace it with this one? https://github.com/biolab/orange3-text/blob/master/orangecontrib/text/stats.py |
Codecov Report
@@ Coverage Diff @@
## master #3625 +/- ##
==========================================
+ Coverage 84.22% 84.23% +0.01%
==========================================
Files 370 370
Lines 67482 67468 -14
==========================================
- Hits 56834 56832 -2
+ Misses 10648 10636 -12 |
I think those implementations were the same. So Text add-on should probably use the new one in Orange core, instead of implementing its own. |
|
||
fdrs = (p_values * m / np.arange(1, len(p_values) + 1))[::-1] | ||
fdrs = np.array(np.minimum.accumulate(fdrs)[::-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kudos.
def test_FDR_dependent(self): | ||
p_values = np.array([0.0002, 0.0004, 0.00001, 0.0003, 0.0001]) | ||
np.testing.assert_almost_equal( | ||
np.array([0.00076, 0.00091, 0.00011, 0.00086, 0.00057]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you gotten these number independently, that is, from another source or calculated manually, not from the code itself? (I'm asking because I used to be a big sinner myself.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I calculated those using the old function.
Agree. |
ordered = is_sorted(p_values) | ||
if p_values is None or len(p_values) == 0 or \ | ||
(m is not None and m <= 0): | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cover this line in tests to get approval by codecov (and to make sure we don't forget these cases at any future refactoring).
FDR in Text add-on replaced with this implementation: biolab/orange3-text#416 |
Issue
Current implementation of False discovery rate is using python list and for loops and is therefore inefficient.
Description of changes
Implement FDR using numpy.
Includes