From f75390b1f71988ab239615f1b0b2c557bd82d5c4 Mon Sep 17 00:00:00 2001
From: John SJ Anderson
Date: Tue, 20 Aug 2024 16:45:21 -0700
Subject: [PATCH 1/3] Relabel clades numerically, rather than geographically [#12]

Also update nextclade build README to give mapping back to literature genotypes.
---
 nextclade/README.md           | 15 ++++++-
 nextclade/defaults/clades.tsv | 78 +++++++++++++++++------------------
 nextclade/defaults/colors.tsv | 14 +++----
 3 files changed, 60 insertions(+), 47 deletions(-)

diff --git a/nextclade/README.md b/nextclade/README.md
index 0fe50bf..fbd5ced 100644
--- a/nextclade/README.md
+++ b/nextclade/README.md
@@ -1,7 +1,7 @@
 # Yellow Fever Virus Nextclade Dataset Tree
 
 This workflow creates a phylogenetic tree that can be used as part of
-a Nextclade dataset to assign genotypes to yellow fever virus samples
+a Nextclade dataset to assign clades to yellow fever virus samples
 based on [Mutebi et al.][] (J Virol. 2001 Aug;75(15):6999-7008) and
 [Bryant et al.][] (PLoS Pathog. 2007 May 18;3(5):e75).
 
@@ -14,6 +14,19 @@ based on [Mutebi et al.][] (J Virol. 2001 Aug;75(15):6999-7008) and
 * Provide the following coloring options on the tree:
   * Genotype assignment from `augur clades`
 
+The clades we annotate (Clade I-VII) are roughly equivalent with the
+following genotypes as described in the aforementioned two papers:
+
+| Clade     | Genotype            |
+|-----------|---------------------|
+| Clade I   | Angola              |
+| Clade II  | East Africa         |
+| Clade III | East Central/Africa |
+| Clade IV  | West Africa I       |
+| Clade V   | West Africa II      |
+| Clade VI  | South America I     |
+| Clade VII | South America II    |
+
 ## How to create a new tree
 
 * Run the workflow: `nextstrain build .`
diff --git a/nextclade/defaults/clades.tsv b/nextclade/defaults/clades.tsv
index 72af86e..39117a7 100644
--- a/nextclade/defaults/clades.tsv
+++ b/nextclade/defaults/clades.tsv
@@ -1,40 +1,40 @@
 clade	gene	site	alt
-Angola	nuc	111	G
-Angola	nuc	219	T
-Angola	nuc	240	C
-Angola	nuc	246	A
-Angola	nuc	252	A
-Angola	nuc	255	A
-Angola	nuc	291	G
-Angola	nuc	294	A
-Angola	nuc	300	A
-Angola	nuc	315	G
-Angola	nuc	327	G
-Angola	nuc	372	A
-Angola	nuc	420	A
-Angola	nuc	432	A
-Angola	nuc	453	T
-Angola	nuc	492	G
-Angola	nuc	651	T
-Angola	nuc	72	A
-Angola	nuc	81	G
-Angola	nuc	88	C
-Angola	nuc	90	A
-Angola	nuc	99	T
-East Africa	nuc	171	G
-East Africa	nuc	438	G
-East Africa	nuc	45	A
-East Africa	nuc	468	T
-East/Central Africa	nuc	228	G
-South America I	nuc	219	A
-South America I	nuc	532	A
-South America II	nuc	114	C
-South America II	nuc	193	T
-South America II	nuc	249	A
-South America II	nuc	639	G
-West Africa I	nuc	183	G
-West Africa I	nuc	255	C
-West Africa II	nuc	270	A
-West Africa II	nuc	321	T
-West Africa II	nuc	477	A
-West Africa II	nuc	93	T
+Clade I	nuc	111	G
+Clade I	nuc	219	T
+Clade I	nuc	240	C
+Clade I	nuc	246	A
+Clade I	nuc	252	A
+Clade I	nuc	255	A
+Clade I	nuc	291	G
+Clade I	nuc	294	A
+Clade I	nuc	300	A
+Clade I	nuc	315	G
+Clade I	nuc	327	G
+Clade I	nuc	372	A
+Clade I	nuc	420	A
+Clade I	nuc	432	A
+Clade I	nuc	453	T
+Clade I	nuc	492	G
+Clade I	nuc	651	T
+Clade I	nuc	72	A
+Clade I	nuc	81	G
+Clade I	nuc	88	C
+Clade I	nuc	90	A
+Clade I	nuc	99	T
+Clade II	nuc	171	G
+Clade II	nuc	438	G
+Clade II	nuc	45	A
+Clade II	nuc	468	T
+Clade III	nuc	228	G
+Clade VI	nuc	219	A
+Clade VI	nuc	532	A
+Clade VII	nuc	114	C
+Clade VII	nuc	193	T
+Clade VII	nuc	249	A
+Clade VII	nuc	639	G
+Clade IV	nuc	183	G
+Clade IV	nuc	255	C
+Clade V	nuc	270	A
+Clade V	nuc	321	T
+Clade V	nuc	477	A
+Clade V	nuc	93	T
diff --git a/nextclade/defaults/colors.tsv b/nextclade/defaults/colors.tsv
index 75226a7..6b8036c 100644
--- a/nextclade/defaults/colors.tsv
+++ b/nextclade/defaults/colors.tsv
@@ -1,8 +1,8 @@
 # genotypes assigned by augur clades
-clade_membership	Angola	#3F63CF
-clade_membership	East Africa	#529AB6
-clade_membership	East/Central Africa	#75B681
-clade_membership	South America I	#A6BE55
-clade_membership	South America II	#D4B13F
-clade_membership	West Africa I	#E68133
-clade_membership	West Africa II	#DC2F24
+clade_membership	Clade I	#3F63CF
+clade_membership	Clade II	#529AB6
+clade_membership	Clade III	#75B681
+clade_membership	Clade IV	#A6BE55
+clade_membership	Clade V	#DC2F24
+clade_membership	Clade VI	#E68133
+clade_membership	Clade VII	#D4B13F

From 6fbb5fec6e1b08dcb91e1e361780b50d075cef3b Mon Sep 17 00:00:00 2001
From: John SJ Anderson
Date: Thu, 22 Aug 2024 10:27:48 -0700
Subject: [PATCH 2/3] Add TSV version of clade-to-genotype mapping, per PR feedback [#12]

---
 nextclade/README.md                      | 3 +++
 nextclade/defaults/clade-to-genotype.tsv | 8 ++++++++
 2 files changed, 11 insertions(+)
 create mode 100644 nextclade/defaults/clade-to-genotype.tsv

diff --git a/nextclade/README.md b/nextclade/README.md
index fbd5ced..991cabc 100644
--- a/nextclade/README.md
+++ b/nextclade/README.md
@@ -27,6 +27,9 @@ following genotypes as described in the aforementioned two papers:
 | Clade VI  | South America I     |
 | Clade VII | South America II    |
 
+(N.b., this table is available as a TSV in this repo, at
+`nextclade/defaults/clade-to-genotype.tsv`.)
+
 ## How to create a new tree
 
 * Run the workflow: `nextstrain build .`
diff --git a/nextclade/defaults/clade-to-genotype.tsv b/nextclade/defaults/clade-to-genotype.tsv
new file mode 100644
index 0000000..ff69d0a
--- /dev/null
+++ b/nextclade/defaults/clade-to-genotype.tsv
@@ -0,0 +1,8 @@
+Clade	Genotype
+Clade I	Angola
+Clade II	East Africa
+Clade III	East Central/Africa
+Clade IV	West Africa I
+Clade V	West Africa II
+Clade VI	South America I
+Clade VII	South America II

From 5051b9603ab26f7bfd20851eb357d781100449c7 Mon Sep 17 00:00:00 2001
From: John SJ Anderson
Date: Thu, 22 Aug 2024 10:39:22 -0700
Subject: [PATCH 3/3] Update nextclade dataset README to reflect clade changes [#12]

---
 nextclade/defaults/nextclade-dataset/README.md | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/nextclade/defaults/nextclade-dataset/README.md b/nextclade/defaults/nextclade-dataset/README.md
index 409611a..10bb176 100644
--- a/nextclade/defaults/nextclade-dataset/README.md
+++ b/nextclade/defaults/nextclade-dataset/README.md
@@ -10,7 +10,7 @@
 
 ## Scope of this dataset
 
-This dataset assigns genotypes to yellow fever virus samples based on
+This dataset assigns clades to yellow fever virus samples based on
 strain and genotype information from [Mutebi et al.][] (J Virol. 2001
 Aug;75(15):6999-7008) and [Bryant et al.][] (PLoS Pathog. 2007 May
 18;3(5):e75)
@@ -21,6 +21,19 @@ comprises the 3' end of the pre-membrane protein (prM) gene, the
 entire membrane protein (M) gene, and the 5' end of the envelope protein
 (E) gene.
 
+The clades we annotate (Clade I-VII) are roughly equivalent with the
+following genotypes as described in the aforementioned two papers:
+
+| Clade     | Genotype            |
+|-----------|---------------------|
+| Clade I   | Angola              |
+| Clade II  | East Africa         |
+| Clade III | East Central/Africa |
+| Clade IV  | West Africa I       |
+| Clade V   | West Africa II      |
+| Clade VI  | South America I     |
+| Clade VII | South America II    |
+
 (N.b., the reference sequence used in this data set is actually 672nt
 long, from bases 641-1312 of the genome reference. The 2 extra bases
 make the reference an complete open reading frame.)