feat: preparation for somatic cnv #590
Draft
ericblanc20 wants to merge 29 commits into main from 587-preparation-for-somatic-cnv
…of Rscript, for more verbose logs
…to normal sample mapping
… BAQ, changed skip on flags), protect against None model values, changed regions option name
5d0043e to 9660c7c
tedil requested changes Jan 21, 2025
Phew, this is a lot! Thank you for all the work and effort that went into it:
Here are some remarks, questions & feedback, in no particular order:
- make sure `VCF_TAG_PATTERN` and `ANNOTATION_VCF_TAG_PATTERN` cover the definitions of the VCF specification
- try to use the model and its attributes where applicable instead of dict access (i.e. `wf.config.property.…` instead of `config["step_config"][…]`)
- use functions for getting resources and params/args, so we can more easily add dynamic resource estimation later on
- I think "last" is an unfortunate name for an action, I'd prefer something akin to "finalize" or "gather_final_result" or …
- make use of `dictify` and `listify` consistently
- should we produce a little illustration for the documentation on how the steps introduced here interact/intertwine/support one another?
- avoid code duplication:
  - I think `_collapsed_arg_value` and `collapse_args` appear multiple times with basically the same code; in this case, we may also find a cleaner way to do it, but it's fine for now!
  - `get_args` is often the same 2 lines I think, is that not part of Base/AbstractStepXYZ already?
- hardcoded `extra_args` seem weird to me, can they not be part of the default config instead?
- instead of using the `do_md5` argument, we could add an automatic check for this
- TODO: check regular expressions for correctness
- use named groups in regexes
- for even more comments, see the respective line comments ;)
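To make the two regex-related remarks concrete, here is a minimal sketch of a tag pattern with named groups. It is not the PR's `VCF_TAG_PATTERN`: the name `TAG_PATTERN`, the helper `parse_tag` and the bcftools-style `INFO/KEY` / `FORMAT/KEY` input format are assumptions made for illustration, and the key syntax follows my reading of the VCFv4.3 specification, which should be double-checked.

```python
import re

# Hypothetical pattern (NOT the PR's VCF_TAG_PATTERN): a bcftools-style
# "INFO/KEY" or "FORMAT/KEY" reference, written with named groups.
# Assumption: keys start with a letter or underscore, followed by letters,
# digits, underscores or dots (my reading of VCFv4.3). The spec also
# allows the legacy INFO key "1000G", which this sketch does not handle.
TAG_PATTERN = re.compile(
    r"^(?P<field>INFO|FORMAT)/(?P<key>[A-Za-z_][0-9A-Za-z_.]*)$"
)


def parse_tag(tag: str) -> tuple[str, str]:
    """Split a tag reference into (field, key); raise ValueError if invalid."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a valid tag reference: {tag!r}")
    # Named groups keep the call site readable and robust to reordering.
    return match.group("field"), match.group("key")


print(parse_tag("INFO/DP"))
print(parse_tag("FORMAT/AD"))
```

Named groups mean that adding or reordering groups later does not silently shift positional indices at the call sites.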
snappy_wrappers/wrappers/bcftools/merge_germline_and_somatic/wrapper.py
Co-authored-by: Till Hartmann <[email protected]>
…variants_for_cnv steps
…aside (because it's not quite complete)
… workflow config attributes & removed unnecessary resource allocation
Adds several (fairly simple & simple-minded) steps required for proper CNV calling:
- `guess_sex`: simple inference of sex from autosome & sex chromosome coverage
- `germline_snvs`: simple identification of well-supported germline SNPs. The `variant_calling` step unfortunately cannot be used for this task, as it is designed for trios.
- `somatic_variants_for_cnv`: creates input for CNV tools using B-allele fractions to improve/verify CNV calls based on coverage alone. The `somatic_variant_calling` step cannot be used, as the somatic variants from `mutect2` differ greatly when germline variants are included or not.

The current code is OK, but can certainly be improved:
- `germline_snvs/__init__.py` `snappy_wrapper` is probably possible. Also, the derived `BcftoolsWrapper` is a first attempt at streamlining UNIX-like tools (such as `bcftools`, `bedtools`, `bedops`, `samtools`, `rnaqc`, ...). Its design should be critically reviewed before similar wrappers are built.
- `ignored_chroms` should also be seen as a first attempt to be critically reviewed. The code in `genome_windows` is exercised in the `ignored_chroms` wrapper, called from the `germline_snvs` & `somatic_variants_for_cnv` snakefiles.
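
For readers unfamiliar with coverage-based sex inference, the idea behind a step like `guess_sex` can be sketched as follows. This is an illustrative heuristic only, not the code in this PR: the function signature, the chromosome names `X` and `Y` (no `chr` prefix), and both thresholds are assumptions.

```python
from statistics import median


def guess_sex(
    coverage: dict[str, float],
    y_threshold: float = 0.1,   # assumed cut-off, not taken from the PR
    x_threshold: float = 0.75,  # assumed cut-off, not taken from the PR
) -> str:
    """Guess sex from mean coverage per chromosome (illustrative heuristic).

    Sex chromosome coverage is normalised by the median autosome coverage.
    One X copy and one Y copy are expected to give ratios near 0.5 each,
    two X copies a ratio near 1.0 for X and near 0 for Y.
    """
    autosomes = [cov for name, cov in coverage.items() if name not in ("X", "Y")]
    if not autosomes:
        raise ValueError("no autosome coverage provided")
    baseline = median(autosomes)
    if baseline <= 0:
        raise ValueError("autosome coverage must be positive")

    x_ratio = coverage.get("X", 0.0) / baseline
    y_ratio = coverage.get("Y", 0.0) / baseline

    if y_ratio >= y_threshold and x_ratio < x_threshold:
        return "male"
    if y_ratio < y_threshold and x_ratio >= x_threshold:
        return "female"
    # Conflicting signals (e.g. aneuploidy or low-quality data).
    return "unknown"


print(guess_sex({"1": 30.0, "2": 32.0, "X": 15.0, "Y": 14.0}))
```

Returning `"unknown"` for conflicting ratios keeps downstream CNV steps from silently assuming a wrong sex chromosome ploidy.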