-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use gene reference files to generate E gene trees #48
Conversation
Until we have a set of E gene serotype and genotype defining mutations, skip the augur clades call for E gene builds. This was originally discussed for commit: 2f48ddb Used in measles: nextstrain/measles@ab92a1b And diagrammed in: #17 (comment)
Add ncbi_serotype annotation to phylogeny, since this can be used as a source-of-truth when adding E gene defining serotypes in subsequent PRs.
Mirroring the change from measles commit: * nextstrain/measles@862dbaf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for breaking down the steps @j23414! I've left some non-blocking comments for consideration, but this looks good to merge to me 👍
@@ -42,7 +42,7 @@ rule prepare_auspice_config: | |||
output: | |||
auspice_config="results/config/{gene}/auspice_config_{serotype}.json", | |||
params: | |||
replace_clade_key="clade_membership", | |||
replace_clade_key=lambda wildcard: r"clade_membership" if wildcard.gene in ['genome'] else r"nextclade_subtype", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the clade_membership
is automatically used to create the clade
branch label in augur export
, this change means the E gene build will not have the automatic clade
branch label.
It's possible to create custom branch labels since nextstrain/augur#728, but this uses augur clades
to create the labels and the workflow is skipping augur clades
for the E gene build 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the nextclade_subtype
field is being added through augur traits
, maybe augur traits
should be updated to support adding branch labels as well...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know! I'm currently planning to explore serotype and genotype-defining E mutations (which would create clade_membership
for E gene builds) in a future PR.
But if that doesn't work out, I'll explore "create custom branch labels ..." route you've referenced.
@@ -42,7 +42,7 @@ rule prepare_auspice_config: | |||
output: | |||
auspice_config="results/config/{gene}/auspice_config_{serotype}.json", | |||
params: | |||
replace_clade_key="clade_membership", | |||
replace_clade_key=lambda wildcard: r"clade_membership" if wildcard.gene in ['genome'] else r"nextclade_subtype", | |||
replace_clade_title=lambda wildcard: r"Serotype" if wildcard.serotype in ['all'] else r"DENV genotype", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To mirror the title NCBI serotype
added below, maybe the default clade_membership
title should be updated to Nextstrain serotype
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for flagging! I'm considering renaming NCBI serotype
toSerotype (NCBI)
to better match naming convention used in measle's Genotype (NCBI)
title. Then, I agree that the default title for clade_membership
should be renamed from Serotype
to Serotype (Nextstrain)
.
To recap:
NCBI serotype
->Serotype (NCBI)
: indicating that denv1-4 assignment is based on NCBI GenBank record annotationSerotype
->Serotype (Nextstrain)
: indicating that denv1-4 assignment is based on augur clades call using full-genome-level-serotype-defining amino acid mutationsNextclade genotype
->Genotype (Nextclade)
: indicating genotype level assignment within the serotype (e.g.DENV1/S
) based on Nextclade call
This naming adjustment leaves space for a potential Genotype (NCBI)
if we develop a script for parsing genotype annotations from the GenBank data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, actually I think I'll merge this pull request as it is, and handle the renaming mentioned above in a later PR. That way, we can address it along with fixing this issue: #41.
Description of proposed changes
Following up on the Generating gene reference files PR, use the E gene reference files to create E gene trees.
Visual summary (view whole pipeline plan so far)
Related issue(s)
Checklist
Links to resulting trees below
A manual run of the dengue serotype 4 E gene tree might look like:
# from dengue repo top level nextstrain build phylogenetic auspice/dengue_denv4_E.json