Use gene reference files to generate E gene trees #48

Merged
merged 8 commits into from
May 13, 2024
2 changes: 1 addition & 1 deletion phylogenetic/rules/export.smk
@@ -42,7 +42,7 @@ rule prepare_auspice_config:
output:
auspice_config="results/config/{gene}/auspice_config_{serotype}.json",
params:
- replace_clade_key="clade_membership",
+ replace_clade_key=lambda wildcard: r"clade_membership" if wildcard.gene in ['genome'] else r"nextclade_subtype",
Contributor

Because the clade_membership is automatically used to create the clade branch label in augur export, this change means the E gene build will not have the automatic clade branch label.

It's possible to create custom branch labels since nextstrain/augur#728, but this uses augur clades to create the labels and the workflow is skipping augur clades for the E gene build 😅
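A minimal sketch of how the new `replace_clade_key` param resolves per build, written as plain Python outside Snakemake. The `SimpleNamespace` objects are hypothetical stand-ins for Snakemake's wildcards object, and the wildcard values `"E"` and `"denv1"` are assumptions for illustration only.

```python
from types import SimpleNamespace


def replace_clade_key(wildcards):
    """Mirror the lambda in the diff: only genome builds keep clade_membership."""
    return "clade_membership" if wildcards.gene in ["genome"] else "nextclade_subtype"


# Hypothetical stand-ins for Snakemake wildcards (not the real Wildcards class).
genome_build = SimpleNamespace(gene="genome", serotype="all")
e_gene_build = SimpleNamespace(gene="E", serotype="denv1")

for build in (genome_build, e_gene_build):
    print(build.gene, "->", replace_clade_key(build))
```

Because the E gene build resolves to `nextclade_subtype`, its tree has no `clade_membership` node attribute, which is why the automatic clade branch label described above is absent for that build.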

Contributor

Since the nextclade_subtype field is being added through augur traits, maybe augur traits should be updated to support adding branch labels as well...

Contributor Author

Good to know! I'm currently planning to explore serotype and genotype-defining E mutations (which would create clade_membership for E gene builds) in a future PR.

But if that doesn't work out, I'll explore the "create custom branch labels ..." route you've referenced.

replace_clade_title=lambda wildcard: r"Serotype" if wildcard.serotype in ['all'] else r"DENV genotype",
Contributor

To mirror the title NCBI serotype added below, maybe the default clade_membership title should be updated to Nextstrain serotype?

Contributor Author

Thank you for flagging! I'm considering renaming NCBI serotype to Serotype (NCBI) to better match the naming convention used in measles' Genotype (NCBI) title. Then, I agree that the default title for clade_membership should be renamed from Serotype to Serotype (Nextstrain).

To recap:

  • NCBI serotype -> Serotype (NCBI): indicating that denv1-4 assignment is based on NCBI GenBank record annotation
  • Serotype -> Serotype (Nextstrain): indicating that denv1-4 assignment is based on augur clades call using full-genome-level-serotype-defining amino acid mutations
  • Nextclade genotype -> Genotype (Nextclade): indicating genotype level assignment within the serotype (e.g. DENV1/S) based on Nextclade call

This naming adjustment leaves space for a potential Genotype (NCBI) if we develop a script for parsing genotype annotations from the GenBank data.
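The recap above can be sketched as a title mapping applied to the `colorings` list of an Auspice config. This is only an illustration of the proposed naming, not code from this PR: the helper `rename_coloring_titles`, the example config, and the key `hypothetical_ncbi_key` are all assumptions.

```python
import copy

# Proposed renames from the recap (old title -> new title).
TITLE_RENAMES = {
    "NCBI serotype": "Serotype (NCBI)",
    "Serotype": "Serotype (Nextstrain)",
    "Nextclade genotype": "Genotype (Nextclade)",
}


def rename_coloring_titles(config, renames=TITLE_RENAMES):
    """Return a copy of the config with coloring titles renamed by exact match."""
    updated = copy.deepcopy(config)
    for coloring in updated.get("colorings", []):
        title = coloring.get("title")
        if title in renames:
            coloring["title"] = renames[title]
    return updated


# Hypothetical config fragment; only the clade_membership/Serotype pairing
# comes from the discussion, the other key is made up for illustration.
example_config = {
    "colorings": [
        {"key": "clade_membership", "title": "Serotype", "type": "categorical"},
        {"key": "hypothetical_ncbi_key", "title": "NCBI serotype", "type": "categorical"},
    ]
}

renamed = rename_coloring_titles(example_config)
for coloring in renamed["colorings"]:
    print(coloring["key"], "->", coloring["title"])
```

Matching on the exact old title keeps the mapping idempotent: an already-renamed title such as Serotype (NCBI) is not touched on a second pass.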

Contributor Author

Ah, actually I think I'll merge this pull request as it is, and handle the renaming mentioned above in a later PR. That way, we can address it along with fixing this issue: #41.

run:
data = {