Update SC2 datasets #444
Pull request overview
This PR performs a periodic update of the Nextstrain/Nextclade SARS‑CoV‑2 datasets, adding refreshed “unreleased” dataset artifacts and updating dataset index metadata and changelogs to reflect newly designated Pango lineages.
Changes:
- Add/refresh `unreleased/` dataset files (reference FASTA, GFF3 annotations, READMEs) for Wuhan-Hu-1 (ORFs + proteins) and BA.2.86.
- Update `data_output/index.json` to reflect updated Pango lineage counts and expose an `unreleased` tag in dataset versions.
- Update dataset changelogs under `data/` to include newly designated Pango lineages (but currently duplicated multiple times; see comments).
Reviewed changes
Copilot reviewed 14 out of 34 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| data_output/nextstrain/sars-cov-2/wuhan-hu-1/proteins/unreleased/reference.fasta | Adds/refreshes Wuhan-Hu-1 reference sequence for the proteins “unreleased” dataset artifact. |
| data_output/nextstrain/sars-cov-2/wuhan-hu-1/proteins/unreleased/genome_annotation.gff3 | Adds/refreshes NCBI-derived genome/protein feature annotation for the proteins “unreleased” artifact. |
| data_output/nextstrain/sars-cov-2/wuhan-hu-1/proteins/unreleased/README.md | Adds/refreshes dataset README describing scope/capabilities for proteins “unreleased”. |
| data_output/nextstrain/sars-cov-2/wuhan-hu-1/orfs/unreleased/genome_annotation.gff3 | Adds/refreshes ORF-level genome annotations for Wuhan-Hu-1 ORFs “unreleased”. |
| data_output/nextstrain/sars-cov-2/wuhan-hu-1/orfs/unreleased/README.md | Adds/refreshes dataset README for Wuhan-Hu-1 ORFs “unreleased”. |
| data_output/nextstrain/sars-cov-2/BA.2.86/unreleased/reference.fasta | Adds/refreshes BA.2.86 reference sequence for “unreleased” dataset artifact. |
| data_output/nextstrain/sars-cov-2/BA.2.86/unreleased/genome_annotation.gff3 | Adds/refreshes ORF-level annotations for BA.2.86 “unreleased”. |
| data_output/nextstrain/sars-cov-2/BA.2.86/unreleased/README.md | Adds/refreshes dataset README for BA.2.86 “unreleased”. |
| data_output/index.json | Updates dataset index metadata (Pango counts + unreleased version entries/tags). |
| data/nextstrain/sars-cov-2/wuhan-hu-1/proteins/CHANGELOG.md | Records newly added Pango lineages for the update (currently duplicated multiple times). |
| data/nextstrain/sars-cov-2/wuhan-hu-1/orfs/CHANGELOG.md | Records newly added Pango lineages for the update (currently duplicated multiple times). |
| data/nextstrain/sars-cov-2/BA.2.86/CHANGELOG.md | Records newly added Pango lineages for the update (currently duplicated multiple times). |
> ## Unreleased
>
> - Add all 202 Pango lineages newly designated between 2025-12-24 and 2026-04-15. Unfold below to see a list of all newly included lineages with their designation dates:
>
> <details>
> <summary> Newly included lineages, with designation date in parentheses</summary>
>
> - RG.2 (2026-01-06)
> - RG.3 (2026-01-06)
> - XFG.3.3.2 (2026-01-06)
> - XFG.3.3.3 (2026-01-06)
> - XGN (2026-01-06)
> - RD.1.1.1 (2026-01-07)
> - RE.3 (2026-01-07)
> - RE.3.1 (2026-01-07)
> - RE.1.1.1 (2026-01-07)
> - RE.1.1.2 (2026-01-07)
> - RE.1.1.3 (2026-01-07)
This changelog repeats the entire "## Unreleased" section multiple times (starting again at this header), which bloats the file and makes it hard to understand what actually changed. Please deduplicate so there is a single "Unreleased" entry for this update (and ensure the generation script doesn't append the same block repeatedly).
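One way to keep the generation step from appending the same block on every run is to make the write idempotent: replace any existing "Unreleased" section instead of adding a new one. The sketch below is only an illustration under stated assumptions. The function name `upsert_unreleased`, the idea that each changelog section starts with a level-2 `## ` header, and the sample release header `## 2025-12-24` are all hypothetical; the actual script and changelog layout in this repository may differ.

```python
import re

UNRELEASED_HEADER = "## Unreleased"


def upsert_unreleased(changelog: str, entry_body: str) -> str:
    """Return the changelog with exactly one '## Unreleased' section.

    Hypothetical helper: assumes every section begins with a level-2
    '## ' header at the start of a line.
    """
    # Split before every level-2 header; parts[0] is whatever precedes
    # the first header (possibly an empty string).
    parts = re.split(r"(?m)^(?=## )", changelog)
    preamble, sections = parts[0], parts[1:]

    # Drop every existing Unreleased section, however many were appended.
    released = [
        s for s in sections
        if s.splitlines()[0].strip() != UNRELEASED_HEADER
    ]

    # Write a single fresh Unreleased section ahead of the released ones.
    new_section = f"{UNRELEASED_HEADER}\n\n{entry_body.strip()}\n\n"
    return preamble + new_section + "".join(released)
```

Running the helper twice with the same entry leaves the file unchanged after the first run, which is the property the review comment asks for.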
Description of proposed changes
Periodic update
https://master.clades.nextstrain.org/?dataset-server=gh:@sc2-260416@
Checklist