HWP output issues: missing common genotypes and incorrect Guo & Thompson stats #17
adjust existing tests accordingly; add new test using .ini and .pop file from @sjmack; move process call into base test module to make command-line easier
@sjmack: the problem had a common origin: generating the genotype output table, the fix I just pushed to In any case, good catch! More test data files like this will improve our overall test coverage and catch more issues like this. I would encourage @kosoegawa and others to please open issues and add files like this.
The dash version is only useful for testing against older versions of PyPop, but that is necessary for validation, so .... I'll run some tests!
I'll add to the test suite as well then, since it should handle dashes as well as colons. Right now, we would have problems only if you used the genotype separator (which I've hardcoded as a tilde
@sjmack is this issue fixed from your POV? If so, I'll close.
The missing common genotypes and incorrect stats issues are resolved; however, Issue #19 discusses an additional issue with the common genotypes (which I should have noted using the *_dash.pop version of the data, but didn't).
I'll respin the tests as per my comment in #4 and if that works, I'll close this particular issue.
Hi @sjmack, let me know if the tests look OK and I'll close.
py.test: 11 passed, 1 skipped in 75.75 seconds. Looks good.
Transferring from a comment on issue #4, originally by @sjmack:
However, I have constructed a test data file (the controls from the BIGDAWG synthetic datafile) that reveals several issues with the current HW implementations (vs version 0.7.0).
I'm attaching two versions of this test file:
BIGDAWG_SynthControl_Data.pop.txt
and
BIGDAWG_SynthControl_Data_dash.pop.txt
And the associated .ini file:
WS_BDCtrl_Test_HW.ini.txt
Be sure to remove the .txt suffixes.
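For anyone who wants to script the renaming, here is a minimal Python sketch. It is not part of PyPop; the function name `strip_txt_suffix` is made up for illustration, and it assumes the directory holds only the downloaded input attachments (every `*.txt` file in it gets renamed).

```python
from pathlib import Path


def strip_txt_suffix(directory: str) -> list[Path]:
    """Rename every '*.txt' file in `directory` by dropping the trailing '.txt'.

    Hypothetical helper for illustration only, e.g.
    'BIGDAWG_SynthControl_Data.pop.txt' -> 'BIGDAWG_SynthControl_Data.pop'.
    """
    renamed = []
    for path in sorted(Path(directory).glob("*.txt")):
        target = path.with_suffix("")  # removes only the last suffix (.txt)
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling `strip_txt_suffix("attachments")` on a folder containing the two .pop.txt files and the .ini.txt file should leave the .pop and .ini files that PyPop expects.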
The difference between the two datasets is that the *_dash.pop file has the colons converted to dashes. I did this so that I could compare the current developmental version of PyPop on my Mac to v0.7.0 running on my PC. I could only run the *_dash.pop file on my PC.
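The conversion itself is simple enough to sketch in Python. This is only an illustration of what I did, not a PyPop utility: the function names are made up, and it assumes colons appear nowhere in the file except inside allele names, so a blanket replacement is safe.

```python
from pathlib import Path


def colons_to_dashes(text: str) -> str:
    """Replace every colon with a dash, e.g. '01:01' -> '01-01'.

    Assumption: colons occur only inside allele names in the .pop file.
    """
    return text.replace(":", "-")


def write_dash_version(src: str, dst: str) -> None:
    """Read the colon-delimited .pop file `src` and write a dashed copy to `dst`."""
    Path(dst).write_text(colons_to_dashes(Path(src).read_text()))
```

For example, `write_dash_version("BIGDAWG_SynthControl_Data.pop", "BIGDAWG_SynthControl_Data_dash.pop")` would produce the second file from the first under that assumption.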
I have three sets of results. The .git. versions were generated using this development version of PyPop, and the .070. versions with the current release version.
First of all, you will notice that there are no Common Genotypes being generated with the development version. The results below show the dash datasets, but the same happens when colons are included for the developmental version.
Compare (Git):
With (v0.7.0):
In addition, the stats being reported for the developmental version include errors; especially for the Chen and Diff tests, where obs and exp values are 0. The differences in the p-values for the mcmc results probably stem from the Markov-Chain, but I only did each one once, so I'm not certain.
Compare (Git):
to (version 0.7.0):
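As a sanity check on why obs and exp both being 0 looks wrong: under Hardy-Weinberg proportions, any genotype built from alleles that are present in the sample has a positive expected count (n·p² for a homozygote, 2·n·p·q for a heterozygote). The sketch below is plain Python for illustration, not PyPop's implementation; the function name and the input format (a list of two-allele tuples) are assumptions.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwp_observed_expected(genotypes):
    """Return {(a, b): (observed, expected)} for every possible genotype.

    `genotypes` is a list of 2-tuples of allele names, one per individual.
    Expected counts follow Hardy-Weinberg proportions:
    n * p_a**2 for homozygotes, 2 * n * p_a * p_b for heterozygotes.
    """
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    # Sort alleles within each genotype so ('B', 'A') and ('A', 'B') match.
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    table = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        if a == b:
            expected = n * freqs[a] ** 2
        else:
            expected = 2 * n * freqs[a] * freqs[b]
        table[(a, b)] = (observed.get((a, b), 0), expected)
    return table
```

With four individuals typed as A/A, A/B, A/B and B/B, both allele frequencies are 0.5, so the expected counts are 1, 2 and 1, matching the observed counts. An expected value of 0 can only arise for a genotype involving an allele absent from the sample.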
Here are the results:
BIGDAWG_SynthControl_Data-out.git.txt
BIGDAWG_SynthControl_Data-out.git.xml.txt
BIGDAWG_SynthControl_Data_dash-out.git.txt
BIGDAWG_SynthControl_Data_dash-out.git.xml.txt
BIGDAWG_SynthControl_Data_dash-out.070.txt
BIGDAWG_SynthControl_Data_dash-out.070.xml.txt
Again, remove the .txt from the .XML filenames.