-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
switch to scientific notation when frequencies can't be displayed as decimals #104
Comments
hey @sjmack I had a little time to work on this again. It's almost impossible to fix in XSLT directly, but I was able to create a Python extension for xslt that implements the rules we want for this. I've created a new branch for this: https://github.com/alexlancaster/pypop/tree/104-preserve-scientific-notation-when-truncating-viewable-haplotype-frequencies I had to fix this more globally - not just within haplotype frequencies, but everwhere we have space with limited characters. I currently have a special pypop/src/PyPop/xslt/__init__.py Lines 52 to 65 in 032c6a0
So now if I set the "places" to, say 5, it will look like the following (there are now unit tests that check these examples and more):
This also affects the HW tables etc, which isn't so elegant because of the extra space that 11:04P 1/1.1
13:01P 2/1.6 1/0.6
-13:02P 0/0.4 0/0.3 0/4e-2
-13:03P 0/0.2 0/0.1 0/3e-2 0/6e-3
+13:02P 0/0.4 0/0.3 0/0.0
+13:03P 0/0.2 0/0.1 0/0.0 0/0.0
14:01P 3/2.8 2/2.0 1/0.5 0/0.2 2/1.7
15:01P 12/12.1 9/8.7 2/2.4 1/0.9 15/14.9 32/32.5
11:04P 13:01P 13:02P 13:03P 14:01P 15:01P So this will mean that a lot of unit and pipeline test outputs will need to be changed. How does it sound to you? |
Thanks for addressing this @alexlancaster! Is it possible to have "places" 'float' depending on the number of decimal places, for the formatted output to fill in the space/give the user/reviewer the most information when reviewing the .txt outputs (see below)? For example:
Even if not, this new implementation is definitely an improvement over the previous version. And I agree about not doing actually analyses with the .txt outputs; that's what the TSV outputs are there for. I agree that the HW matrix is less elegant, but it is definitely more informative. Are the + and - in the far left column of the matrix supposed to be old (+) and new (-)? That seems counter intuitive. How difficult would it be to dynamically space the columns in the HWE matrix in these .txt outputs to accommodate the maximum width of an o/e value pair or the allele name? |
Yes, this is the intention. It did a pretty good job of this before, but I just rejiggered it a bit so that it now maximizes the use of any given number of character "places". It's tricky because the exponent length can also vary. So it has to be dynamic: we calculate the precision based on the fixed size of the # input output places
('0.032', '0.03200', 5), # pad out with leading zeros
('0.0433', '0.04330', 5),
('0.04333', '0.04333', 5),
('0.000004333', '4.33e-6', 5), # converts to scientific notation to fit in the 5 character ('places') limit
('0.0000000004333', '4.3e-10', 5),
('0.00000433', '0.000004', 6), # does not convert to scientific notation, because we have 6 characters
('0.00000491', '0.000005', 6), # check rounding!
('0.000000433', '4.33e-7', 6), # again need scientific notation to fit
('0.000000', '0.0000', 4), # handle zero as float, not sci notation Feel free to add additional corner cases so we can make sure we capture all the variations you want to cover, and I'll put them in the unit test.
I will add some text to the documentation as part of the PR.
Oh that's just an artifact of the order in which I did the
It's fiddly - it already does this partially, but it was tuned before we added scientific notation. I'd rather get this feature in, and maybe you can spawn another feature request for that? So long as it's readable and doesn't overlap other characters, that's what I care about most now. |
@sjmack I've created PR #192 that should fix most things, please take a moment to review it. For the HW tables, I added an extra space to all cells to handle the exponential notation You can see all the differences in the diffs for the PR. If it looks good, I'll merge it either in the next patch release (1.0.1), or if you think it's different enough to be considered a "minor" release - i.e. user visible changes, it might be 1.1.0. At one level it's a just bug fix, but it's also an enhancement and somewhat of a change in behaviour. |
Issue #38 fixed the issue of truncating frequencies that eliminate the exponent when shown in scientific notation by forcing decimal notation. However, this means for extremely low frequency haplotypes, they might be shown as zero (e.g.,
0.00
). The better solution would be to truncate but preserve the scientific notation. This is non-trivial using XSLT, so opening up a new issue. Because the current temporary solution, at least makes the output is no longer incorrect, making this an enhancment.Cc @sjmack
The text was updated successfully, but these errors were encountered: