Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch to scientific notation when frequencies can't be displayed as decimals #192

Conversation

alexlancaster
Copy link
Owner

No description provided.

* use for the formatting if 'use_python_extensions' parameter is set,
  otherwise fallback to the previous processing (which skips scientific
  notation for small numbers and just truncates them)
…ic-notation-when-truncating-viewable-haplotype-frequencies
…ic-notation-when-truncating-viewable-haplotype-frequencies
* account for variable length exponents to maximize use of allocated
  space
note that text output has limited precision compared with the TSV output
@alexlancaster alexlancaster added the enhancement New feature or request label Feb 7, 2024
@alexlancaster alexlancaster self-assigned this Feb 7, 2024
* we `break` during the loop, rather than `return`, this allows the
  `with` file handling to close the file
* update all unit test output and MD5 where appropriate

* pad the HW tables by an additional space in each cell to account for
  scientific notation
@sjmack
Copy link
Collaborator

sjmack commented Feb 8, 2024

Some of the examples have non-standard allele names, like 02:025 and 03:012. I think it would be used to use "real world" examples for tests like these to avoid confusion downstream, e.g., 02:125 and 03:212.

However, in general, this looks great. I doubt that we're going to see/need to care about anything with a frequency approaching X*10^-9 (unless you've really increased the 2N capacity), so this looks good.

@alexlancaster
Copy link
Owner Author

Some of the examples have non-standard allele names, like 02:025 and 03:012. I think it would be used to use "real world" examples for tests like these to avoid confusion downstream, e.g., 02:125 and 03:212.

I think most of those derive from the various original synthetic files you generated with the USAFEL and BIGDAWG prefixes. We could fix them by changing the specific alleles globally in the .pop and unit test output.

But functionally, they work. Except for the files that are "included" in the documentation and run as unit tests, they aren't really officially documented examples. But I take your point that it's a good idea for those examples to be real as possible, without being any actual individuals - and we do mention the availability of unit tests serving as templates, in our paper.

However, in general, this looks great. I doubt that we're going to see/need to care about anything with a frequency approaching X*10^-9 (unless you've really increased the 2N capacity), so this looks good.

Yes, and it would still handle that case. Could you click the "review" button on this PR, and "approve"? Then I can think about the merge timing.

@alexlancaster
Copy link
Owner Author

@sjmack do you think this is a big enough change to warrant making this next release a 1.1.0 (i.e. incrementing the "minor" version number), or could it be part of the next "patch" release 1.0.1?

I'm trying to follow "semantic versioning": https://semver.org/ which defines:

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible API changes
MINOR version when you add functionality in a backward compatible manner
PATCH version when you make backward compatible bug fixes

Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.

I'm not 100% clear about whether this might be considered "bug fix" (incorrectly rounded ) or added functionality (more precision is now shown)? I'm leaning towards the patch (1.0.1) release.

@rsingle
Copy link
Collaborator

rsingle commented Feb 9, 2024 via email

@sjmack
Copy link
Collaborator

sjmack commented Feb 9, 2024 via email

@alexlancaster alexlancaster merged commit 6126a29 into main Feb 9, 2024
68 checks passed
@alexlancaster alexlancaster deleted the 104-preserve-scientific-notation-when-truncating-viewable-haplotype-frequencies branch February 9, 2024 17:58
@alexlancaster alexlancaster added bug Something isn't working, should not be used for new features: use "enhancement" for those python Pull requests that update Python code and removed enhancement New feature or request labels Feb 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working, should not be used for new features: use "enhancement" for those python Pull requests that update Python code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

switch to scientific notation when frequencies can't be displayed as decimals
3 participants