-
Notifications
You must be signed in to change notification settings - Fork 28
Add code to compute coverage and all sites AN #692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
mike-w-wilson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good to me at first pass -- I have some initial feedback. I know we still need to update to AoU for real testing and also add in downsamplings/histogram merge. Most of my comments are pretty minor/style/org stuff.
| ) | ||
| hl.default_reference("GRCh38") | ||
|
|
||
| # TODO: Remove this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a suggestion from Dan King during v4 prod but it may have become the default -- we can ask Chris in our next meeting, or dive into the hail codebase
| # TODO: Add support for subtracting gnomAD v4 samples. | ||
| aou_ht = _rename_fields(aou_ht, "aou") | ||
| gnomad_ht = _rename_fields(gnomad_ht, "gnomad") | ||
| ht = aou_ht.join(gnomad_ht, "outer") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as above where coverage used a "left" join. I cant think of a reason why they should be different so just looking for why we specify and why specify differently per stat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm. 🤔 not sure...I'll move this to "left" but with the above, happy to use "inner"
|
this is finally ready for re-review, though I've added some discussion questions into the code as TODOs (e.g., should we only merge the overall adj AN when merging the gnomAD + AoU ANs?) Jobs from gnomAD code tests:
AoU testing: https://workbench.researchallofus.org/workspaces/aou-rw-4b0bd63f/gnomadproduction/analysis/z_PR_692_test_run.ipynb |
gnomad_qc/v5/resources/release.py
Outdated
| :param environment: Environment to use. Default is "rwb". | ||
| :return: Coverage TSV path. | ||
| """ | ||
| return f"{_release_root(release_version, test=test, extension='tsv', environment=environment)}/gnomad.genomes.v{release_version}.coverage.all.tsv.bgz" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this ever not be "all" (looks like "all" was removed in release_coverage_path cause no longer needed)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no this should always be all. I think I forgot to remove this here after porting the paths from the v4 resources
|
back to you @klaricch ! New tests:
New AoU tests also in https://workbench.researchallofus.org/workspaces/aou-rw-4b0bd63f/gnomadproduction/analysis/preview/z_PR_692_test_run.ipynb |
Also adds release resources and constants
Code is currently written to use v4 for testing; testing done here https://workbench.researchallofus.org/workspaces/aou-rw-4b0bd63f/gnomadproduction/analysis/preview/z_PR_692_test_run.ipynb