Initial upload of CVA16 dataset #412
Merged
14 commits
- `4dd28c8` Initial upload of CVA16 dataset (nneune)
- `93532e3` fix: rename recombinant branch to "RFs" (nneune)
- `2a711df` fix: pathogen issues (nneune)
- `848a275` fix links in dataset description (nneune)
- `101c14a` fix author links (nneune)
- `8dd7fb8` fix typos and missing line endings (nneune)
- `81eecc6` fix: clade B2 was replaced by C (nneune)
- `7ded68a` fix: remove nucMutLabelMapReverse and divergence qc (nneune)
- `b9678f9` Increase minSeedCover, and privateMutations thresholds and use ancest… (nneune)
- `4f142f8` Update data/enpen/enterovirus/cva16/README.md (nneune)
- `c46b281` update README: dataset uses ancestral sequence as reference. Remove r… (nneune)
- `ba6cb38` update README: clarify reference terminology for ancestral sequence (nneune)
- `3d8de68` update README: mention ongoing improvements for multiple virus assign… (nneune)
- `e9b6248` chore: rebuild [skip ci] (nextstrain-bot)
```diff
@@ -23,6 +23,7 @@
     ]
   },
   "dataset_order": [
-    "enpen/enterovirus/ev-d68"
+    "enpen/enterovirus/ev-d68",
+    "enpen/enterovirus/cva16"
   ]
 }
```
`@@ -0,0 +1,5 @@`

## Unreleased

Initial release of a Coxsackievirus A16 dataset for lineage classification!

Read more about Nextclade datasets in the documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
`@@ -0,0 +1,66 @@`

# Coxsackievirus A16 dataset

| Key | Value |
|----------------------|-----------------------------------------------------------------------|
| authors | [Nadia Neuner-Jehle](https://eve-lab.org/people/nadia-neuner-jehle/), [Alejandra González-Sánchez](https://www.vallhebron.com/en/professionals/alejandra-gonzalez-sanchez), [Emma B. Hodcroft](https://eve-lab.org/people/emma-hodcroft/), [ENPEN](https://escv.eu/european-non-polio-enterovirus-network-enpen/) |
| name | Coxsackievirus A16 |
| reference | [Static Inferred Ancestor](https://github.com/enterovirus-phylo/nextclade_a16/blob/master/resources/inferred-root.fasta) |
| workflow | https://github.com/enterovirus-phylo/nextclade_a16 |
| path | `enpen/enterovirus/cva16` |
| clade definitions | A–F |

## Scope of this dataset

This dataset uses the [Static Inferred Ancestor](https://github.com/enterovirus-phylo/nextclade_a16/blob/master/resources/inferred-root.fasta) as its reference sequence instead of the historical G-10 prototype sequence ([U05876.1](https://www.ncbi.nlm.nih.gov/nuccore/U05876)). It is intended for broad subgenogroup classification, mutation-based quality control, and phylogenetic analysis of CVA16 diversity.

*Note: The G-10 reference differs substantially from currently circulating strains.* Such a divergent, historical reference is common for enterovirus datasets, in contrast to datasets for some other viruses (e.g., seasonal influenza), where the reference is updated more frequently to reflect recent lineages.

To address this, the dataset is *rooted* on a Static Inferred Ancestor, a phylogenetically reconstructed ancestral sequence near the tree root. It provides a stable reference point and serves as an alternative to G-10 for mutation calling.

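To illustrate why the choice of reference matters for mutation calling, here is a minimal Python sketch that lists substitutions of an already-aligned query relative to a reference, in REF-position-ALT notation with 1-based positions. The sequences and the function name `call_substitutions` are hypothetical toy stand-ins for G-10, the inferred ancestor and a circulating strain; Nextclade itself performs the alignment, handles gaps and ambiguity codes, and reports far more than this.

```python
def call_substitutions(ref: str, query: str) -> list[str]:
    """Return substitutions such as 'T6A' (reference base, 1-based position, query base).

    Assumes ref and query are already aligned, gap-free and of equal length.
    Ambiguous 'N' bases in the query are skipped rather than reported.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{r}{pos}{q}"
        for pos, (r, q) in enumerate(zip(ref, query), start=1)
        if r != q and q != "N"
    ]


# Hypothetical toy sequences (not real CVA16 data).
prototype = "ATGGCTACGTTAGCA"  # stands in for the historical G-10 prototype
ancestor = "ATGGCAACGTCAGGA"   # stands in for the Static Inferred Ancestor
sample = "ATGGCAACGTCAGGG"     # stands in for a recently sampled strain

print(call_substitutions(prototype, sample))  # ['T6A', 'T11C', 'C14G', 'A15G']
print(call_substitutions(ancestor, sample))   # ['A15G']
```

In this toy example the same sample carries four substitutions against the divergent prototype but only one against the ancestor-like sequence, so mutation lists against a reference closer to the tree root are shorter and easier to compare.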
## Features

This dataset supports:

- Assignment of subgenogroups
- Phylogenetic placement
- Sequence quality control (QC)

## Subgenogroups of Coxsackievirus A16

Subgenogroups B1a, B1b and B1c represent the major phylogenetic divisions of CVA16 and are commonly used in virological surveillance and the literature. They are defined based on phylogenetic clustering and do not necessarily reflect antigenic differences.

In recent years, additional recombinant forms have been identified and labeled C–F (also referred to as B2, B3, and D). These recombinant forms cluster with the prototype strain (clade A).

Overall, these designations are based on phylogenetic structure and characteristic mutations, and are widely used in molecular epidemiology, similar to subgenotype systems for other enteroviruses. Unlike influenza (H1N1, H3N2) or SARS-CoV-2, there is no universally standardized global lineage nomenclature for enteroviruses; naming instead follows conventions established in published studies and surveillance practices.

## Related Enteroviruses

CVA16 is closely related to other Enterovirus A (EV-A) viruses, including EV-A71, EV-A120, and CVA5. If you are not certain that your sequences contain only CVA16, we recommend using the "[Multiple Datasets](https://docs.nextstrain.org/projects/nextclade/en/stable/user/nextclade-web/getting-started.html#multi-dataset-mode)" tab instead of "Single Dataset".

In multi-dataset mode, Nextclade does not force every sequence onto the CVA16 reference tree. In single-dataset mode, by contrast, EV-A71 sequences, for example, may still align and receive a clade assignment (often near the recombinant forms). We are currently working on improving multiple virus assignment.

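The benefit of multi-dataset mode is that each sequence is first matched to a dataset instead of being forced onto one tree. The Python sketch below illustrates that general idea with a simple k-mer overlap (Jaccard) score against one representative sequence per dataset, returning `None` when nothing matches well enough. It is a simplified stand-in, not Nextclade's implementation; the dataset names, toy sequences, `k` and `min_score` are all hypothetical.

```python
from typing import Optional


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two k-mer sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_dataset(query: str, references: dict[str, str],
                   k: int = 5, min_score: float = 0.3) -> Optional[str]:
    """Return the best-scoring dataset name, or None if no score reaches min_score."""
    query_kmers = kmers(query, k)
    scores = {name: jaccard(query_kmers, kmers(ref, k))
              for name, ref in references.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None


# Hypothetical toy representatives, one per dataset (not real sequences).
references = {
    "hypothetical/cva16": "ATGGGCTCACAGGTTTCCAC",
    "hypothetical/ev-a71": "TTGACCGAATCGTAGGCATA",
}

print(assign_dataset("GGGCTCACAGGTTTCC", references))  # hypothetical/cva16
print(assign_dataset("CCCCCCCCCC", references))        # None
```

The rejection threshold is the key design choice: a sequence that resembles no dataset is left unassigned rather than receiving a misleading clade call.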
Please be cautious when working with short genes or genome fragments (e.g., 5'UTR sequences). These regions can be highly conserved across EV-A viruses, which makes genogroup and subgenogroup assignment error-prone. Such fragments may also originate from recombinant genomes: recombination is common in enteroviruses and may go undetected when only a fragment is analyzed.

If you are unsure how to proceed, please contact us. We are happy to assist.

## Reference types

This dataset includes several reference points used in analyses:

- *Static Inferred Ancestor:* Reconstructed ancestral sequence inferred with an outgroup, representing the likely founder of CVA16. Serves as a stable reference.
- *Parent:* The nearest ancestral node of a sample in the tree, used to infer branch-specific mutations.
- *Clade founder:* The inferred ancestral node defining a clade (e.g., B1a, B2). Mutations "since clade founder" are the changes a sample has accumulated since that founder, i.e., within its clade.
- *Reference:* RefSeq or a similarly established prototype sequence; here, G-10 (U05876.1).
- *Tree root:* The root of the reference tree. It may change in future updates as more data become available.

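The reference types above differ only in which sequence a sample is compared to. A minimal Python sketch, assuming pre-aligned, equal-length sequences and a made-up root-to-tip path (the node sequences and the helper `diff_positions` are hypothetical, not taken from the dataset), shows how the same sample yields different mutation sets relative to the tree root, its clade founder and its parent.

```python
def diff_positions(base: str, seq: str) -> list[str]:
    """Substitutions of seq relative to base, e.g. 'A5G' (1-based positions)."""
    if len(base) != len(seq):
        raise ValueError("toy sketch expects pre-aligned, equal-length sequences")
    return [f"{b}{i}{s}" for i, (b, s) in enumerate(zip(base, seq), start=1) if b != s]


# Hypothetical nodes along one root-to-tip path of a toy tree.
root = "ACGTACGTAC"           # stands in for the tree root / inferred ancestor
clade_founder = "ACGTGCGTAC"  # A5G occurs on the branch leading to the founder
parent = "ACGTGCGTTC"         # A9T arises within the clade
sample = "CCGTGCGTTC"         # A1C occurs on the sample's own branch

for label, base in [("root", root), ("clade founder", clade_founder), ("parent", parent)]:
    print(label, diff_positions(base, sample))
# root ['A1C', 'A5G', 'A9T']
# clade founder ['A1C', 'A9T']
# parent ['A1C']
```

Each comparison drops the changes already present in the chosen base sequence, so the parent comparison isolates the sample's own branch.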
All references use the coordinate system of the G-10 sequence.

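Where two sequences differ by insertions or deletions, positions must be translated before they can be reported in G-10 coordinates. A minimal Python sketch, assuming a pairwise alignment with G-10 is already available as two equal-length gapped rows (the toy rows and the function `map_to_reference` are hypothetical), maps each base of the other sequence to its 1-based G-10 coordinate, or to `None` where it is an insertion relative to G-10.

```python
from typing import Optional


def map_to_reference(ref_aln: str, other_aln: str) -> list[Optional[int]]:
    """For each base of the other sequence, return its 1-based reference coordinate.

    ref_aln and other_aln are the two rows of a pairwise alignment ('-' = gap).
    Bases inserted relative to the reference map to None.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("alignment rows must have equal length")
    mapping: list[Optional[int]] = []
    ref_pos = 0
    for r, o in zip(ref_aln, other_aln):
        if r != "-":
            ref_pos += 1  # the reference coordinate advances only on reference bases
        if o != "-":
            mapping.append(ref_pos if r != "-" else None)
    return mapping


# Hypothetical toy alignment: the other sequence lacks reference base 3
# and carries one extra base after reference position 5.
ref_aln = "ACGTA-CG"
other_aln = "AC-TAGCG"
print(map_to_reference(ref_aln, other_aln))  # [1, 2, 4, 5, None, 6, 7]
```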
## Issues & Contact
- For questions or suggestions, please [open an issue](https://github.com/enterovirus-phylo/nextclade_a16/issues) or email: eve-group[at]swisstph.ch

## What is a Nextclade dataset?

A Nextclade dataset includes the reference sequence, genome annotations, tree, clade definitions, and QC rules. Learn more in the [Nextclade documentation](https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html).
`@@ -0,0 +1,17 @@`

```
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
##sequence-region U05876.1 1 7413
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=31704
U05876.1 Genbank region 1 7413 . + . ID=U05876.1:1..7413;Dbxref=taxon:31704;gb-acronym=CV-A16;gbkey=Src;mol_type=genomic RNA;nat-host=Homo sapiens;strain=G-10
U05876.1 Genbank CDS 751 957 . + . Name=VP4;gbkey=Prot;product=VP4;ID=id-AAA50478.1:1..69
U05876.1 Genbank CDS 958 1719 . + . Name=VP2;gbkey=Prot;product=VP2;ID=id-AAA50478.1:70..323
U05876.1 Genbank CDS 1720 2445 . + . Name=VP3;gbkey=Prot;product=VP3;ID=id-AAA50478.1:324..565
U05876.1 Genbank CDS 2446 3336 . + . Name=VP1;gbkey=Prot;product=VP1;ID=id-AAA50478.1:566..862
U05876.1 Genbank CDS 3337 3786 . + . Name=2A;product=2A;gbkey=Prot;ID=id-AAA50478.1:863..1012
U05876.1 Genbank CDS 3787 4083 . + . Name=2B;product=2B;gbkey=Prot;ID=id-AAA50478.1:1013..1111
U05876.1 Genbank CDS 4084 5070 . + . Name=2C;product=2C;gbkey=Prot;ID=id-AAA50478.1:1112..1440
U05876.1 Genbank CDS 5071 5328 . + . Name=3A;product=3A;gbkey=Prot;ID=id-AAA50478.1:1441..1526
U05876.1 Genbank CDS 5329 5394 . + . Name=3B;product=3B;gbkey=Prot;ID=id-AAA50478.1:1527..1548
U05876.1 Genbank CDS 5395 5943 . + . Name=3C;product=3C;gbkey=Prot;ID=id-AAA50478.1:1549..1731
U05876.1 Genbank CDS 5944 7329 . + . Name=3D;product=3D;gbkey=Prot;ID=id-AAA50478.1:1732..2193
```
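The eleven CDS features annotate consecutive segments of the G-10 polyprotein. A minimal Python sketch can sanity-check that: it parses the CDS rows (splitting on any whitespace, since the original column separators are not preserved in this view), then checks that each length is a multiple of three and that each CDS starts right after the previous one. The helper names `parse_cds` and `check_tiling` are hypothetical.

```python
# CDS rows copied from the annotation above.
GFF_CDS = """\
U05876.1 Genbank CDS 751 957 . + . Name=VP4;gbkey=Prot;product=VP4;ID=id-AAA50478.1:1..69
U05876.1 Genbank CDS 958 1719 . + . Name=VP2;gbkey=Prot;product=VP2;ID=id-AAA50478.1:70..323
U05876.1 Genbank CDS 1720 2445 . + . Name=VP3;gbkey=Prot;product=VP3;ID=id-AAA50478.1:324..565
U05876.1 Genbank CDS 2446 3336 . + . Name=VP1;gbkey=Prot;product=VP1;ID=id-AAA50478.1:566..862
U05876.1 Genbank CDS 3337 3786 . + . Name=2A;product=2A;gbkey=Prot;ID=id-AAA50478.1:863..1012
U05876.1 Genbank CDS 3787 4083 . + . Name=2B;product=2B;gbkey=Prot;ID=id-AAA50478.1:1013..1111
U05876.1 Genbank CDS 4084 5070 . + . Name=2C;product=2C;gbkey=Prot;ID=id-AAA50478.1:1112..1440
U05876.1 Genbank CDS 5071 5328 . + . Name=3A;product=3A;gbkey=Prot;ID=id-AAA50478.1:1441..1526
U05876.1 Genbank CDS 5329 5394 . + . Name=3B;product=3B;gbkey=Prot;ID=id-AAA50478.1:1527..1548
U05876.1 Genbank CDS 5395 5943 . + . Name=3C;product=3C;gbkey=Prot;ID=id-AAA50478.1:1549..1731
U05876.1 Genbank CDS 5944 7329 . + . Name=3D;product=3D;gbkey=Prot;ID=id-AAA50478.1:1732..2193
"""


def parse_cds(gff_text: str) -> list[tuple[str, int, int]]:
    """Return (name, start, end) for every CDS row, in 1-based inclusive coordinates."""
    rows = []
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split(maxsplit=8)  # the 9th column (attributes) stays intact
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        rows.append((attrs["Name"], int(cols[3]), int(cols[4])))
    return rows


def check_tiling(cds: list[tuple[str, int, int]]) -> list[str]:
    """Report CDS whose length is not a multiple of 3 or that leave a gap or overlap."""
    problems = []
    for i, (name, start, end) in enumerate(cds):
        length = end - start + 1
        if length % 3:
            problems.append(f"{name}: length {length} is not a multiple of 3")
        if i > 0 and start != cds[i - 1][2] + 1:
            problems.append(f"{name}: does not start right after {cds[i - 1][0]}")
    return problems


cds = parse_cds(GFF_CDS)
print(sum((end - start + 1) // 3 for _, start, end in cds))  # 2193 codons in total
print(check_tiling(cds))                                     # []
```

For this annotation the total of 2193 codons matches the `1..2193` range in the last `ID` attribute, and no gaps or overlaps are reported.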
ivan-aksamentov marked this conversation as resolved.