jang1563 · Jun 5, 2026
diff --git a/docs/SPACEBIOBENCH_EVALUATION_CARD.md b/docs/SPACEBIOBENCH_EVALUATION_CARD.md
@@ -34,16 +34,15 @@ excluded from the current public-review path.
 
 ## Evaluation Flow
 
-```mermaid
-flowchart LR
-  A["Source inventory"] --> B["Task manifest"]
-  B --> C["Held-out mission fold"]
-  C --> D["Baseline or submitted predictions"]
-  D --> E["Metrics with task/fold ids"]
-  E --> F["Per-task interpretation"]
-  F --> G["Pooled summary with caveats"]
-  G --> H["Claim register language"]
-```
+| Stage | Evidence to inspect | Interpretation control |
+|---|---|---|
+| 1. Source inventory | OSDR accessions, tissue labels, mission labels, access status, and checksum-manifest evidence | Confirms the public data source before interpreting any score |
+| 2. Task manifest | Task id, tissue, feature namespace, source ids, label map, and metric ids | Defines what the evaluation is actually testing |
+| 3. Held-out mission fold | Train/test mission split, row counts, and selected-gene counts | Keeps mission-held-out validation separate from random-split performance |
+| 4. Prediction and metric files | Baseline or submitted predictions, task/fold ids, AUROC, macro-F1, balanced accuracy, calibration | Ties every metric to a concrete task and fold surface |
+| 5. Per-task interpretation | Tissue-specific and fold-specific behavior | Prevents pooled means from hiding failures or confounding |
+| 6. Pooled summary | Aggregate result only after task/fold checks | Allows navigation-level summaries with mission, tissue, baseline, and payload caveats |
+| 7. Claim register language | Allowed, blocked, and future-only wording | Converts evaluation evidence into release-safe public claims |
 
 The evaluation flow is intentionally claim-aware. A score is first interpreted
 at the task and fold level, then summarized only with caveats about mission,

diff --git a/docs/SPACEBIOBENCH_SYSTEM_CARD.md b/docs/SPACEBIOBENCH_SYSTEM_CARD.md
@@ -43,16 +43,13 @@ The project currently has multiple surfaces with different maturity levels:
 
 ## System Boundary Map
 
-```mermaid
-flowchart LR
-  A["Public NASA OSDR sources"] --> B["Task manifests and fold definitions"]
-  B --> C["Baseline runs and result summaries"]
-  C --> D["System, evaluation, release, and claim cards"]
-  D --> E["Allowed benchmark claims"]
-  D --> F["Blocked clinical, crew-health, countermeasure, and Mars-regime claims"]
-  B --> G["v9 metadata-alpha scaffold"]
-  G --> H["Payload hashing pending"]
-```
+| Boundary layer | Evidence entering the layer | What the current card allows | What remains blocked |
+|---|---|---|---|
+| Source layer | Public NASA OSDR sources, source inventory rows, OSDR API evidence, checksum-manifest evidence | Public source/provenance claims with accession-level traceability | Claims about private, controlled, or non-public human sequence data |
+| Task layer | Task manifests, fold definitions, held-out mission labels, feature namespaces | Mission-held-out benchmark task claims when task and fold ids are named | Treating mission labels as pure biology or operational readiness evidence |
+| Result layer | Baseline runs, metric files, prediction rows, v7.1 canonical result summaries | Benchmark and workflow-evidence claims tied to the correct release surface | Mixed-surface leaderboard, model-superiority, or biological mechanism claims |
+| Transparency layer | System card, evaluation card, release readiness card, and claim register | Allowed benchmark claims with explicit scope and caveats | Blocked clinical, crew-health, countermeasure, and Mars-regime claims |
+| v9 metadata-alpha layer | Public bulk task/source/provenance scaffold and baseline anchors | Metadata-alpha and scaffold-baseline language | Frozen payload release claims until payload hashing and release gates pass |
 
 This map shows the boundary the cards enforce: benchmark evidence can support
 task, fold, metric, provenance, and release-readiness claims, but it cannot

diff --git a/docs/SPACEBIOBENCH_TRANSPARENCY_CARD_PACK.md b/docs/SPACEBIOBENCH_TRANSPARENCY_CARD_PACK.md
@@ -39,16 +39,14 @@ extension lanes remain outside this public-review path.
 
 ## Three-Minute Review Map
 
-```mermaid
-flowchart LR
-  A["Portfolio brief"] --> B["System card"]
-  B --> C["Evaluation card"]
-  C --> D["Release readiness card"]
-  D --> E["Claim register"]
-  B --> F["Data and provenance boundary"]
-  C --> G["Task, fold, metric, and baseline interpretation"]
-  E --> H["Allowed, blocked, and future-only language"]
-```
+| Review step | Open this | What to verify |
+|---|---|---|
+| 1 | [Portfolio brief](SPACEBIOBENCH_PORTFOLIO_BRIEF.md) | Project contribution, role-relevant signal, and concise application summary |
+| 2 | [System card](SPACEBIOBENCH_SYSTEM_CARD.md) | Benchmark scope, data surfaces, provenance boundary, and out-of-scope claims |
+| 3 | [Evaluation card](SPACEBIOBENCH_EVALUATION_CARD.md) | Task, fold, metric, baseline, and pooled-summary interpretation |
+| 4 | [Release readiness card](SPACEBIOBENCH_RELEASE_READINESS_CARD.md) | Release tier, evidence gates, and blockers for stronger public wording |
+| 5 | [Claim register](SPACEBIOBENCH_CLAIM_REGISTER.md) | Allowed wording, blocked wording, support level, and future-only claims |
+| Cross-check | [Canonical v7.1 results](CANONICAL_RESULTS_V7_1.md) and [v9 dataset card draft](v9_hf_dataset_card.md) | Whether a statement belongs to the canonical result surface, metadata-alpha scaffold, or a future release lane |
 
 The intended reading order is portfolio brief first, then the system card,
 evaluation card, release readiness card, and claim register. This keeps the

diff --git a/docs/hf_dataset_card.md b/docs/hf_dataset_card.md
@@ -16,7 +16,7 @@ tags:
   - single-cell
   - spatial-transcriptomics
 size_categories:
-  - 1GB<n<10GB
+  - 100M<n<1GB
 language:
   - en
 pretty_name: "GeneLab Spaceflight Transcriptomics Benchmark"

diff --git a/tests/test_review_fixes.py b/tests/test_review_fixes.py
@@ -318,6 +318,7 @@ def test_public_release_metadata_uses_v7_consistently(self):
         self.assertIn("note    = {v7.1.2 documentation, public-card, metadata, and evidence-visibility patch over canonical v7.1 results; data freeze 2026-03-01}", readme)
         self.assertIn("Version: v7.1.2 public-card/metadata/evidence-visibility patch | Canonical results: v7.1 | Dataset freeze: 2026-03-01", hf_card)
         self.assertIn("note    = {v7.1.2 documentation, public-card, metadata, and evidence-visibility patch over canonical v7.1 results; data freeze 2026-03-01}", hf_card)
+        self.assertIn("  - 100M<n<1GB", hf_card)
         self.assertIn('version: "7.1.2"', citation)
         self.assertIn('date-released: "2026-06-05"', citation)
         self.assertIn('notes: "Manuscript in preparation; v7.1.2 documentation, public-card, metadata, and evidence-visibility patch."', citation)
@@ -328,6 +329,7 @@ def test_public_release_metadata_uses_v7_consistently(self):
         self.assertNotIn("Kang", citation)
         self.assertNotIn("Jaeyoung", citation)
         self.assertNotIn("Jihoon", readme + hf_card + citation)
+        self.assertNotIn("1GB<n<10GB", hf_card)
         self.assertNotIn("blob/v3/docs/SPACEBIOBENCH", hf_card)
         self.assertNotIn('version: "5.0.0"', citation)
         self.assertNotIn("Target journal:", citation)
@@ -338,12 +340,15 @@ def test_public_card_pack_includes_visual_review_path(self):
         evaluation_card = self.read_repo_text("docs/SPACEBIOBENCH_EVALUATION_CARD.md")
 
         self.assertIn("## Three-Minute Review Map", card_pack)
-        self.assertIn("```mermaid", card_pack)
+        self.assertIn("| Review step | Open this | What to verify |", card_pack)
         self.assertIn("[docs/SPACEBIOBENCH_SYSTEM_CARD.md](SPACEBIOBENCH_SYSTEM_CARD.md)", card_pack)
         self.assertIn("## System Boundary Map", system_card)
+        self.assertIn("| Boundary layer | Evidence entering the layer | What the current card allows | What remains blocked |", system_card)
         self.assertIn("Blocked clinical, crew-health, countermeasure, and Mars-regime claims", system_card)
         self.assertIn("## Evaluation Flow", evaluation_card)
+        self.assertIn("| Stage | Evidence to inspect | Interpretation control |", evaluation_card)
         self.assertIn("Claim register language", evaluation_card)
+        self.assertNotIn("```mermaid", card_pack + system_card + evaluation_card)
 
     def test_public_v9_metadata_alpha_subset_is_inspectable(self):
         readme = self.read_repo_text("README.md")
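
The assertions added to `test_public_card_pack_includes_visual_review_path` amount to one structural rule per card: the table header row must be present and no Mermaid fence may remain. The following is a minimal sketch of that rule on in-memory strings, not the repository's test code. The helper name `check_card_uses_table` and the two sample card snippets are hypothetical and exist only for illustration; the header string is copied from the diff above.

```python
# Built piecewise so this example does not nest a literal fence inside itself.
MERMAID_FENCE = "`" * 3 + "mermaid"


def check_card_uses_table(card_text: str, header_row: str) -> list[str]:
    """Return a list of problems found in a card; an empty list means it passes."""
    problems = []
    if header_row not in card_text:
        problems.append(f"missing table header: {header_row}")
    if MERMAID_FENCE in card_text:
        problems.append("mermaid fence still present")
    return problems


# Header row taken from the evaluation card hunk in this diff.
header = "| Stage | Evidence to inspect | Interpretation control |"

# Hypothetical card snippets (assumptions, not real file contents).
new_card = "## Evaluation Flow\n\n" + header + "\n|---|---|---|\n"
old_card = "## Evaluation Flow\n\n" + MERMAID_FENCE + "\nflowchart LR\n"

print(check_card_uses_table(new_card, header))
print(check_card_uses_table(old_card, header))
```

Returning a list of problems rather than raising on the first failure would let a single test report both a missing table and a leftover diagram for the same card, which may be easier to act on than one assertion error at a time.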