Update SC2 datasets #444

Merged
corneliusroemer merged 4 commits into master from sc2-260416 on Apr 21, 2026

@corneliusroemer (Member) commented Apr 21, 2026

Description of proposed changes

Periodic update

https://master.clades.nextstrain.org/?dataset-server=gh:@sc2-260416@

Checklist

  • Check whether the changes affect downstream workflows that depend on this dataset. For instance, Nextstrain ingest workflows may break if clade nomenclature changes. Consider fixing those workflows, or at least open an issue.

Copilot AI review requested due to automatic review settings April 21, 2026 09:05
@corneliusroemer corneliusroemer deployed to refs/pull/444/merge April 21, 2026 09:05 — with GitHub Actions Active

Copilot AI (Contributor) left a comment

Pull request overview

This PR performs a periodic update of the Nextstrain/Nextclade SARS‑CoV‑2 datasets, adding refreshed “unreleased” dataset artifacts and updating dataset index metadata and changelogs to reflect newly designated Pango lineages.

Changes:

  • Add/refresh unreleased/ dataset files (reference FASTA, GFF3 annotations, READMEs) for Wuhan-Hu-1 (ORFs + proteins) and BA.2.86.
  • Update data_output/index.json to reflect updated Pango lineage counts and expose an unreleased tag in dataset versions.
  • Update dataset changelogs under data/ to include newly designated Pango lineages. The new entries are currently duplicated multiple times (see comments).

Reviewed changes

Copilot reviewed 14 out of 34 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| data_output/nextstrain/sars-cov-2/wuhan-hu-1/proteins/unreleased/reference.fasta | Adds/refreshes Wuhan-Hu-1 reference sequence for the proteins “unreleased” dataset artifact. |
| data_output/nextstrain/sars-cov-2/wuhan-hu-1/proteins/unreleased/genome_annotation.gff3 | Adds/refreshes NCBI-derived genome/protein feature annotation for the proteins “unreleased” artifact. |
| data_output/nextstrain/sars-cov-2/wuhan-hu-1/proteins/unreleased/README.md | Adds/refreshes dataset README describing scope/capabilities for proteins “unreleased”. |
| data_output/nextstrain/sars-cov-2/wuhan-hu-1/orfs/unreleased/genome_annotation.gff3 | Adds/refreshes ORF-level genome annotations for Wuhan-Hu-1 ORFs “unreleased”. |
| data_output/nextstrain/sars-cov-2/wuhan-hu-1/orfs/unreleased/README.md | Adds/refreshes dataset README for Wuhan-Hu-1 ORFs “unreleased”. |
| data_output/nextstrain/sars-cov-2/BA.2.86/unreleased/reference.fasta | Adds/refreshes BA.2.86 reference sequence for “unreleased” dataset artifact. |
| data_output/nextstrain/sars-cov-2/BA.2.86/unreleased/genome_annotation.gff3 | Adds/refreshes ORF-level annotations for BA.2.86 “unreleased”. |
| data_output/nextstrain/sars-cov-2/BA.2.86/unreleased/README.md | Adds/refreshes dataset README for BA.2.86 “unreleased”. |
| data_output/index.json | Updates dataset index metadata (Pango counts + unreleased version entries/tags). |
| data/nextstrain/sars-cov-2/wuhan-hu-1/proteins/CHANGELOG.md | Records newly added Pango lineages for the update (currently duplicated multiple times). |
| data/nextstrain/sars-cov-2/wuhan-hu-1/orfs/CHANGELOG.md | Records newly added Pango lineages for the update (currently duplicated multiple times). |
| data/nextstrain/sars-cov-2/BA.2.86/CHANGELOG.md | Records newly added Pango lineages for the update (currently duplicated multiple times). |

Comment on lines +213 to +230
## Unreleased

- Add all 202 Pango lineages newly designated between 2025-12-24 and 2026-04-15. Unfold below to see a list of all newly included lineages with their designation dates:

<details>
<summary> Newly included lineages, with designation date in parentheses</summary>

- RG.2 (2026-01-06)
- RG.3 (2026-01-06)
- XFG.3.3.2 (2026-01-06)
- XFG.3.3.3 (2026-01-06)
- XGN (2026-01-06)
- RD.1.1.1 (2026-01-07)
- RE.3 (2026-01-07)
- RE.3.1 (2026-01-07)
- RE.1.1.1 (2026-01-07)
- RE.1.1.2 (2026-01-07)
- RE.1.1.3 (2026-01-07)

Copilot AI Apr 21, 2026

This changelog repeats the entire "## Unreleased" section multiple times (starting again at this header), which bloats the file and makes it hard to understand what actually changed. Please deduplicate so there is a single "Unreleased" entry for this update (and ensure the generation script doesn't append the same block repeatedly).

Suggested change
## Unreleased
- Add all 202 Pango lineages newly designated between 2025-12-24 and 2026-04-15. Unfold below to see a list of all newly included lineages with their designation dates:
<details>
<summary> Newly included lineages, with designation date in parentheses</summary>
- RG.2 (2026-01-06)
- RG.3 (2026-01-06)
- XFG.3.3.2 (2026-01-06)
- XFG.3.3.3 (2026-01-06)
- XGN (2026-01-06)
- RD.1.1.1 (2026-01-07)
- RE.3 (2026-01-07)
- RE.3.1 (2026-01-07)
- RE.1.1.1 (2026-01-07)
- RE.1.1.2 (2026-01-07)
- RE.1.1.3 (2026-01-07)
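
One way the generation script could avoid appending the same block repeatedly is to replace any existing "## Unreleased" section instead of adding a new one. The following is a minimal sketch, not the repository's actual script: the function name `upsert_unreleased` is hypothetical, and it assumes that changelog sections are delimited by level-2 (`## `) headings.

```python
UNRELEASED_HEADING = "## Unreleased"


def upsert_unreleased(changelog: str, new_block: str) -> str:
    """Return the changelog with exactly one '## Unreleased' section.

    Hypothetical helper (not from the repository). Assumes each section
    runs from a '## ' heading to the next '## ' heading or end of file.
    """
    kept = []          # lines that are not part of any Unreleased section
    first_pos = None   # index in `kept` where the first Unreleased section stood
    skipping = False

    for line in changelog.splitlines():
        if line.startswith("## "):
            # A new level-2 heading starts; skip it only if it is "Unreleased".
            skipping = line.strip() == UNRELEASED_HEADING
            if skipping and first_pos is None:
                first_pos = len(kept)
        if not skipping:
            kept.append(line)

    if first_pos is None:
        # No existing section: insert before the first level-2 heading,
        # or at the end if there is none.
        first_pos = next(
            (i for i, l in enumerate(kept) if l.startswith("## ")), len(kept)
        )

    block_lines = new_block.strip("\n").splitlines() + [""]
    result = kept[:first_pos] + block_lines + kept[first_pos:]
    return "\n".join(result).rstrip("\n") + "\n"
```

Because every existing "Unreleased" section is dropped before the new block is written, running the function twice with the same input gives the same file as running it once.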

Comment on lines +212 to +230

## Unreleased

- Add all 202 Pango lineages newly designated between 2025-12-24 and 2026-04-15. Unfold below to see a list of all newly included lineages with their designation dates:

<details>
<summary> Newly included lineages, with designation date in parentheses</summary>

- RG.2 (2026-01-06)
- RG.3 (2026-01-06)
- XFG.3.3.2 (2026-01-06)
- XFG.3.3.3 (2026-01-06)
- XGN (2026-01-06)
- RD.1.1.1 (2026-01-07)
- RE.3 (2026-01-07)
- RE.3.1 (2026-01-07)
- RE.1.1.1 (2026-01-07)
- RE.1.1.2 (2026-01-07)
- RE.1.1.3 (2026-01-07)

Copilot AI Apr 21, 2026

This changelog repeats the entire "## Unreleased" section multiple times (starting again at this header), which bloats the file and makes it hard to understand what actually changed. Please deduplicate so there is a single "Unreleased" entry for this update (and ensure the generation script doesn't append the same block repeatedly).

Suggested change
## Unreleased
- Add all 202 Pango lineages newly designated between 2025-12-24 and 2026-04-15. Unfold below to see a list of all newly included lineages with their designation dates:
<details>
<summary> Newly included lineages, with designation date in parentheses</summary>
- RG.2 (2026-01-06)
- RG.3 (2026-01-06)
- XFG.3.3.2 (2026-01-06)
- XFG.3.3.3 (2026-01-06)
- XGN (2026-01-06)
- RD.1.1.1 (2026-01-07)
- RE.3 (2026-01-07)
- RE.3.1 (2026-01-07)
- RE.1.1.1 (2026-01-07)
- RE.1.1.2 (2026-01-07)
- RE.1.1.3 (2026-01-07)
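
A repeated "## Unreleased" header like this could also be caught automatically before review. The sketch below is an assumption about how such a check might look, not an existing part of the repository's tooling; `duplicate_headings` and `check_files` are hypothetical names, and only level-2 headings are counted.

```python
from collections import Counter
from pathlib import Path


def duplicate_headings(markdown: str, prefix: str = "## ") -> dict[str, int]:
    """Map each heading that occurs more than once to its number of occurrences.

    Hypothetical helper: only lines starting with `prefix` count as headings.
    """
    counts = Counter(
        line.strip() for line in markdown.splitlines() if line.startswith(prefix)
    )
    return {heading: n for heading, n in counts.items() if n > 1}


def check_files(paths: list[str]) -> int:
    """Print repeated headings per file; return 1 if any were found, else 0."""
    failed = False
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for heading, n in duplicate_headings(text).items():
            print(f"{path}: {heading!r} appears {n} times")
            failed = True
    return 1 if failed else 0
```

Calling `check_files` with the three CHANGELOG.md paths listed in the review summary and using its return value as the process exit code would make a workflow step fail whenever a section header is repeated.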

Comment on lines +213 to +230
## Unreleased

- Add all 202 Pango lineages newly designated between 2025-12-24 and 2026-04-15. Unfold below to see a list of all newly included lineages with their designation dates:

<details>
<summary> Newly included lineages, with designation date in parentheses</summary>

- RG.2 (2026-01-06)
- RG.3 (2026-01-06)
- XFG.3.3.2 (2026-01-06)
- XFG.3.3.3 (2026-01-06)
- XGN (2026-01-06)
- RD.1.1.1 (2026-01-07)
- RE.3 (2026-01-07)
- RE.3.1 (2026-01-07)
- RE.1.1.1 (2026-01-07)
- RE.1.1.2 (2026-01-07)
- RE.1.1.3 (2026-01-07)

Copilot AI Apr 21, 2026

This changelog repeats the entire "## Unreleased" section multiple times (starting again at this header), which bloats the file and makes it hard to understand what actually changed. Please deduplicate so there is a single "Unreleased" entry for this update (and ensure the generation script doesn't append the same block repeatedly).

Suggested change
## Unreleased
- Add all 202 Pango lineages newly designated between 2025-12-24 and 2026-04-15. Unfold below to see a list of all newly included lineages with their designation dates:
<details>
<summary> Newly included lineages, with designation date in parentheses</summary>
- RG.2 (2026-01-06)
- RG.3 (2026-01-06)
- XFG.3.3.2 (2026-01-06)
- XFG.3.3.3 (2026-01-06)
- XGN (2026-01-06)
- RD.1.1.1 (2026-01-07)
- RE.3 (2026-01-07)
- RE.3.1 (2026-01-07)
- RE.1.1.1 (2026-01-07)
- RE.1.1.2 (2026-01-07)
- RE.1.1.3 (2026-01-07)
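
For files that already contain the repeated blocks, a one-off cleanup could drop every section that is a verbatim copy of an earlier one. This is a sketch under the assumption that sections start at level-2 (`## `) headings; `collapse_duplicate_sections` is a hypothetical name, and the function normalizes spacing to one blank line between sections.

```python
def collapse_duplicate_sections(markdown: str) -> str:
    """Remove sections whose heading and body repeat an earlier section verbatim.

    Hypothetical helper. Sections with the same heading but different
    bodies are all kept, so no distinct content is lost.
    """
    sections = []  # each section is a list of lines
    current = []
    for line in markdown.splitlines():
        if line.startswith("## ") and current:
            sections.append(current)
            current = []
        current.append(line)
    if current:
        sections.append(current)

    seen = set()
    kept = []
    for section in sections:
        text = "\n".join(section).strip()
        if text in seen:
            continue  # exact repeat of an earlier section
        seen.add(text)
        kept.append(text)
    return "\n\n".join(kept) + "\n"
```

Comparing whole sections rather than headings alone keeps the cleanup conservative: a second "Unreleased" block with different lineages would survive and could be merged by hand.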

3 participants