Add workflow for producing the Nextclade dengue dataset #25
8c7755d to e8b059b
https://github.com/nextstrain/dengue/tree/75d9c5fc01e48d1d8385b11fd8cf295ec5b995c2/phylogenetic Subsequent commits will reuse the phylogenetic config and bin directories to avoid duplication.
e8b059b to 5717648
5717648 to d1fef70
This PR so far creates a Nextclade dataset: `nextstrain build nextclade test_output/all`
6c778db to 9705755
Co-authored-by: Jover Lee <[email protected]>
Since dengue sequences seem to contain many mutations (too many for the browser SVG engine to render efficiently in Nextclade's sequence views), we set the default CDS to display to the E gene as the "main" gene of interest. The full genome and other gene/CDS regions can still be displayed by selecting them from the dropdown menu at the top. Flagged by the following comment: nextstrain/nextclade_data#203 (comment)
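As a rough illustration of the change described above, the following Python sketch sets a `defaultCds` entry on a dictionary shaped like a `pathogen.json` file. It assumes `defaultCds` is a top-level key, as the commit wording suggests. The helper name `set_default_cds` and the example dictionary contents are hypothetical and are not taken from this PR.

```python
import json


def set_default_cds(pathogen: dict, cds_name: str = "E") -> dict:
    """Return a shallow copy of a pathogen.json-style dict with `defaultCds` set.

    Assumption: `defaultCds` lives at the top level of pathogen.json.
    """
    updated = dict(pathogen)  # copy so the caller's dict is left unchanged
    updated["defaultCds"] = cds_name
    return updated


# Hypothetical, minimal stand-in for the real pathogen.json contents.
example = {"attributes": {"name": "dengue/all"}}
result = set_default_cds(example)
print(json.dumps(result, indent=2))
```

In the actual dataset the value would be edited directly in `pathogen.json`; the function form is only meant to make the intent of the setting explicit.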
Applies fixes to the dataset so far:
1. GFF coordinate fixup
2. Adding the example sequences
3. Set defaultCds to the E gene
9705755 to baf0263
After some discussion with a few people, I may move the commits that fine-tune the "dengue/all" dataset to a new draft PR, since we are still testing solutions. This approach lets us merge a functional workflow for assembling a Nextclade dataset, providing a base from which we can test different solutions. @joverlee521, this scoped PR is ready for review.
The added workflow makes sense to me. This looks good to merge, leaving the fine-tuning of the `all` dataset for another PR.
I do wonder if we can just drop `nextclade/datasets/` since the datasets are being officially added in nextstrain/nextclade_data#203? There's no need to maintain the datasets in two places.
Yes, I wondered that as well, but decided to keep it as a foundation for a "fine-tuning" PR, or for others who might want to create separate branches to explore different solutions from the existing dataset. My plan is to delete it when nextstrain/nextclade_data#203 is finalized and merged.
Description of proposed changes
Introduce a workflow dedicated to generating the Nextclade dataset for dengue serotypes and genotypes. This workflow will be housed in a designated `nextclade` folder, aligning with the pathogen-repo-guide/nextclade. This workflow is for streamlined dataset creation, testing, and debugging.

The changes can be summarized as follows:
- `nextclade` directory to adhere to the pathogen-repo-guide/nextclade. Start with a copy of the Nextclade README from the `pathogen-repo-guide/nextclade` repository.
- `tree.json` files.
- `pathogen.json`). Rules copied from mpox.

Related issue(s)
Checklist