A Claude skill for evidence-graded research synthesis. Turns research corpora into auditable, cross-referenced claim inventories ready for downstream content work — articles, exhibition catalogues, content series, illustration briefs.
Last updated: 2026-05-25
Source-synthesis extracts atomic claims from research documents and grades each one on a seven-level A–G evidence scale. Every claim traces back to a specific source and section. Cross-reference passes find where sources contradict each other, where claims inflate as they travel through the corpus, and where the source base has gaps.
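The skill itself is a set of instructions that Claude follows, not a program. The cross-reference pass can still be illustrated in code. The sketch below is a hypothetical Python restatement under simplifying assumptions. Every claim carries a normalized `key` (an invented field naming what the claim is about), the document it came from, its formulation, its grade, and optionally the document it `cites`. Exact string comparison stands in for the reading judgment the skill actually makes, so this sketch would misreport formulation variants as contradictions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

ORDER = "ABCDEFG"  # index 0 = strongest evidence, 6 = weakest


@dataclass(frozen=True)
class SourcedClaim:
    key: str                     # hypothetical normalized claim key
    source: str                  # document key, e.g. "S03"
    value: str                   # the formulation as the source states it
    grade: str                   # grade of the claim as presented in this source
    cites: Optional[str] = None  # document key this source relies on, if any


def cross_reference(claims):
    """Flag contradictions, single-source gaps, and inflation along citation links."""
    by_key = defaultdict(list)
    for claim in claims:
        by_key[claim.key].append(claim)

    issues = []
    for key in sorted(by_key):
        group = by_key[key]
        values = sorted({c.value for c in group})
        sources = sorted({c.source for c in group})
        if len(values) > 1:
            issues.append(("contradiction", key, values))
        if len(sources) == 1:
            issues.append(("single-source", key, sources))
        # Inflation: a downstream source presents the claim more strongly
        # than the upstream source it relies on.
        grade_in = {c.source: c.grade for c in group}
        for c in group:
            upstream = grade_in.get(c.cites)
            if upstream and ORDER.index(c.grade) < ORDER.index(upstream):
                issues.append(("inflation", key, [c.cites, c.source]))
    return issues
```

In this toy model an inflation chain is just a citation edge along which the grade improves without new evidence; the real pass also tracks wording that hardens ("suggests" becoming "proves").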
The output is not a summary. It is a queryable inventory: every fact tagged with evidence quality, source key, content potential, and — in artifact mode — physical-object metadata for illustration work.
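To show what "queryable" means in practice, here is a minimal sketch of one inventory row and a filter over it. The field set is an assumption drawn from the tags named above (grade, source key, content potential, optional artifact metadata) plus a theme. It is not the skill's 8-field per-claim template, and the source-key format is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

GRADES = "ABCDEFG"  # A = strongest evidence, G = weakest


@dataclass
class InventoryRow:
    text: str
    grade: str                        # A-G under the active preset
    source_key: str                   # hypothetical format, e.g. "S04 §3.2"
    theme: str
    potential: str                    # "anchor" | "supporting" | "sidebar"
    artifact: Optional[dict] = None   # physical-object metadata, artifact mode only


def query(rows, theme=None, at_least=None, potential=None):
    """Return rows matching a theme, a minimum grade, and/or a content-potential tier."""
    out = []
    for row in rows:
        if theme is not None and row.theme != theme:
            continue
        if at_least is not None and GRADES.index(row.grade) > GRADES.index(at_least):
            continue  # weaker than the requested minimum grade
        if potential is not None and row.potential != potential:
            continue
        out.append(row)
    return out
```

For example, `query(rows, at_least="C")` keeps only claims backed by records or better, which is the kind of question a summary cannot answer.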
Concrete extraction units the skill produces:
- Atomic claims with the full per-claim template (8 fields)
- Routine one-liners for uncontested facts that don't need full ceremony
- Issue table of contradictions, tensions, inflation chains, formulation variants, and single-source gaps
- Facts inventory sorted by theme and evidence grade
- Content seeds ranked anchor / supporting / sidebar for downstream writing
- Object map linking the same artifact across multiple sources (artifact mode)
- Visual briefs at single-object, assemblage, or type-level scope (artifact mode)
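A minimal sketch of the object-map idea follows, assuming an artifact is identified by a museum code plus an inventory number and that numbers differing only in separators ("2346-331" vs. "2346/331") denote the same object. That normalization rule, the function names, and the second spelling are hypothetical; real catalogues may need curator judgment to match objects.

```python
import re
from collections import defaultdict


def normalize_inv(raw: str) -> str:
    """Hypothetical rule: keep the digit groups, join them with '-'."""
    return "-".join(re.findall(r"\d+", raw))


def object_map(mentions):
    """mentions: iterable of (museum, inventory_string, source_key).

    Returns only objects mentioned by more than one source, keyed by
    (museum, normalized inventory number).
    """
    objects = defaultdict(set)
    for museum, inv, source in mentions:
        objects[(museum.upper(), normalize_inv(inv))].add(source)
    return {key: sorted(srcs) for key, srcs in objects.items() if len(srcs) > 1}
```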
The skill picks one of three modes based on corpus size:
| Corpus | Mode | Headline output |
|---|---|---|
| 1–2 documents | Full atomic extraction | Every claim, every field |
| 3–6 documents | Batched thematic extraction | Theme tables with atomic detail for loaded claims |
| 7+ documents | Synthesis-with-cross-references | Cross-cluster contradictions, inflation chains, structural gaps |
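The mode choice in the table above amounts to a simple threshold rule. Here is a hypothetical restatement in Python, assuming the skill counts documents rather than bytes or pages:

```python
def select_mode(n_documents: int) -> str:
    """Map corpus size to an extraction mode (hypothetical restatement of the table)."""
    if n_documents < 1:
        raise ValueError("corpus must contain at least one document")
    if n_documents <= 2:
        return "Full atomic extraction"
    if n_documents <= 6:
        return "Batched thematic extraction"
    return "Synthesis-with-cross-references"


def needs_cross_cluster_pass(n_documents: int) -> bool:
    """The cross-cluster pass only applies to 7+ document corpora."""
    return n_documents >= 7
```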
For corpora of 7+ documents the skill also runs a cross-cluster pass that surfaces findings no single cluster reveals. A documented delegation pattern lets you split the corpus across parallel Claude subagents without losing consistency — the orchestrator inlines the skill's invariants into each subagent prompt so all clusters produce structurally identical outputs.
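The consistency guarantee comes from giving every subagent the same rules verbatim. The sketch below shows that idea under stated assumptions. The invariant text is a hypothetical abbreviation, not the wording in SKILL.md, and the returned prompts would be handed to whatever subagent mechanism you use; no particular API is assumed.

```python
INVARIANT_BLOCK = """\
Output language: {language}
Corpus-size mode: Synthesis-with-cross-references
Grading preset: {preset}
Artifact mode: {artifact}
Rules: grade by the weakest essential element; split measurable facts from evaluative judgments;
process every section in order; use the same output sections in the same order.
"""


def build_subagent_prompts(clusters, language, preset, artifact_mode):
    """Return one prompt per cluster; every prompt starts with the identical invariant block."""
    header = INVARIANT_BLOCK.format(
        language=language, preset=preset, artifact="on" if artifact_mode else "off"
    )
    prompts = {}
    for name, documents in clusters.items():
        listing = "\n".join(f"- {doc}" for doc in documents)
        prompts[name] = f"{header}\nCluster: {name}\nDocuments:\n{listing}\n"
    return prompts
```

Inlining the rules, rather than telling each subagent to "follow the skill", means no cluster depends on a subagent reading the same files the same way.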
Who it's for:
- Journalists and nonfiction writers building articles from many sources
- Museum curators structuring exhibition catalogues
- Medical and scientific writers separating RCT evidence from observational signal
- Researchers turning literature reviews into a usable claim database
- Anyone who needs to separate what is proven from what is assumed from what is legend
The skill uses the same A–G letter scale across domains, but the meaning of each letter is set by a domain preset chosen at setup. Two presets ship with the skill; for any other domain the skill asks you up front, proposes the closest preset, and records the adopted mapping in the output.

Archaeology-history preset:

| Grade | What it means | Example |
|---|---|---|
| A | Excavated object with inventory number, measurements, published report | "Artik dagger, HMA inv. 2346-331, 44 × 4.6 cm" |
| B | Archaeological context without full catalogue | "Karmir Blur yielded iron swords with bronze handles" |
| C | Archival or historical record (subtypes: C-doc, C-stat, C-trans) | "297 of 454 weaponsmiths were Armenian" |
| D | Historical text (chronicle, ancient author) | "Xenophon describes Armenian cavalry at the Centrites" |
| E | Modern scholarly interpretation | "Dymydyuk concludes the saber arrived via Seljuk influence" |
| F | Legend, epic, oral tradition | "David of Sassoun wielded a bulat steel sword" |
| G | Popular secondary or unverified web content | "Metsamor is the world's oldest foundry" |

Medical-scientific preset:

| Grade | What it means | Example |
|---|---|---|
| A | Pre-registered high-quality RCT or Cochrane-grade systematic review | "Semaglutide reduced MACE by 20% (HR 0.80, 95% CI 0.72–0.90) in SELECT" |
| B | Smaller RCT or prospective cohort with hard endpoint | "MIND-AD-mini (n=137): cognitive composite β=0.18, p=0.04" |
| C | Large observational cohort or case-control (subtypes: C-cohort, C-case-control, C-registry) | "UK Biobank (n=502,629): short sleep HR 1.30 for dementia" |
| D | Mechanistic, animal, cross-sectional, or surrogate endpoint | "Intermittent fasting improved hippocampal neurogenesis in mice" |
| E | Narrative review, clinical guideline, consensus statement | "2024 AAN guideline recommends ≥150 min/wk physical activity" |
| F | Case report or case series | "Three patients showed remission after psilocybin-assisted therapy" |
| G | Marketing, popular media, unsourced web content | "Omega-3 supports brain health (manufacturer page, no citation)" |
Grade by the weakest essential element of the claim, not the strongest. If a claim mixes a measurable fact with an evaluative judgment, the skill splits it. The measurable part gets its strong grade; the evaluative part gets its weaker grade separately. This prevents interpretive leaps from hiding behind hard facts — the most common failure mode of research synthesis.
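The weakest-element rule and the split can be sketched in a few lines. This assumes each element of a compound claim has already been tagged "measurable" or "evaluative" (a hypothetical tagging; in the skill that split is a reading judgment), and the "proves" clause in the usage example is an invented illustration, not a corpus claim.

```python
GRADE_ORDER = "ABCDEFG"  # index 0 = strongest evidence, 6 = weakest
VALID = set(GRADE_ORDER)


def weakest(grades):
    """A claim is only as strong as its weakest essential element."""
    return max(grades, key=GRADE_ORDER.index)


def grade_claim(elements):
    """Split a compound claim by kind and grade each part by its weakest element.

    elements: list of (text, grade, kind), kind being "measurable" or "evaluative".
    Returns {kind: (joined_text, grade)}.
    """
    parts = {}
    for text, grade, kind in elements:
        if grade not in VALID:
            raise ValueError(f"unknown grade {grade!r}")
        parts.setdefault(kind, []).append((text, grade))
    return {
        kind: ("; ".join(text for text, _ in items), weakest([g for _, g in items]))
        for kind, items in parts.items()
    }
```

Usage: `grade_claim([("Karmir Blur yielded iron swords with bronze handles", "B", "measurable"), ("which proves early local mastery of iron", "E", "evaluative")])` keeps the excavation fact at B and the inference at E, instead of letting the B-grade fact lend credibility to the E-grade reading.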
Copy the source-synthesis/ folder into your skills directory. Use ~/.claude/skills/ for a global install, or .claude/skills/ inside a project for a project-local install:
```
source-synthesis/
├── SKILL.md
└── references/
    ├── evidence-grades.md          # universal principles, compound notation
    ├── visual-brief-guide.md       # illustration brief rules (artifact mode)
    └── grading/
        ├── archaeology-history.md  # A–G preset for material-culture corpora
        └── medical-scientific.md   # A–G preset for clinical / biomedical corpora
```
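After copying, a quick check that the layout matches the tree can save a confusing first run. This is a hypothetical helper, not part of the skill: it assumes SKILL.md and references/evidence-grades.md are required, treats the visual-brief guide as optional, and lists whichever grading presets are present.

```python
from pathlib import Path

REQUIRED = ("SKILL.md", "references/evidence-grades.md")


def check_install(skill_dir):
    """Return (missing required files, grading presets found) for a skill folder."""
    root = Path(skill_dir).expanduser()
    missing = [rel for rel in REQUIRED if not (root / rel).is_file()]
    grading = root / "references" / "grading"
    presets = sorted(p.stem for p in grading.glob("*.md")) if grading.is_dir() else []
    return missing, presets


if __name__ == "__main__":
    # Global install location; use .claude/skills/source-synthesis for a project-local install.
    missing, presets = check_install("~/.claude/skills/source-synthesis")
    print("missing:", missing or "none")
    print("presets:", presets or "none found")
```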
For Claude.ai without Claude Code, add the contents of SKILL.md, references/evidence-grades.md, and whichever grading preset matches your corpus to your project's knowledge base. Include references/visual-brief-guide.md only if you'll use artifact mode for physical-object illustration briefs.
Upload your research documents and say any of:
- "Extract facts from these research files"
- "Find contradictions across these sources"
- "What do the sources actually say about X?"
- "Build a content plan from this research"
- "Grade the evidence in these reports"
- "Where is the corpus inflating observational signals?"
- "Which interventions have RCT-grade evidence vs. only observational?" (medical corpora)
- "Generate a visual brief for the Artik dagger" (archaeology, artifact mode)
The skill processes documents section by section — no cherry-picking. The first thing it does is fix four parameters: output language, corpus-size mode, grading preset, artifact mode on/off. Those four decisions get stated at the top of every output so the rules of the run are visible to a reviewer.
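The four run parameters can be thought of as one small record rendered at the top of each output. A minimal sketch follows; the class name and header wording are assumptions, and the mode names come from the corpus-size table.

```python
from dataclasses import dataclass

MODES = (
    "Full atomic extraction",
    "Batched thematic extraction",
    "Synthesis-with-cross-references",
)


@dataclass(frozen=True)
class RunParameters:
    language: str
    mode: str
    preset: str          # "archaeology-history", "medical-scientific", or a custom name
    artifact_mode: bool

    def __post_init__(self):
        if self.mode not in MODES:
            raise ValueError(f"unknown corpus-size mode: {self.mode!r}")

    def header(self) -> str:
        """The block stated at the top of every output so a reviewer sees the run's rules."""
        return "\n".join([
            f"Output language: {self.language}",
            f"Corpus-size mode: {self.mode}",
            f"Grading preset: {self.preset}",
            f"Artifact mode: {'on' if self.artifact_mode else 'off'}",
        ])
```

Freezing the record mirrors the intent that these decisions are fixed once at the start and not renegotiated mid-run.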
The skill was built and refined on real corpora, not synthetic test cases:
- 19-document Armenian-weapons corpus (~1.8 MB markdown) split into 8 thematic clusters processed by parallel subagents. Produced 290 atomic claims plus 150 routine one-liners, 10 cross-cluster contradictions, 6 inflation chains, and 13 structural gaps. Wall-clock: ~9 minutes.
- 3-document brain-health corpus in medical-scientific mode (batched). Produced 30 atomic claims plus 23 one-liners and 33 cross-reference issues, including a structural finding that a 2026 systematic review omitted the three largest null RCTs in its own domain.
Both runs were used to harden the skill: the Armenian run produced the synthesis-with-cross-references workflow, the cross-cluster pass, the object map, and the delegation pattern; the medical run cross-validated the framework outside its origin domain.
What is source-synthesis?
A Claude skill that extracts atomic claims from research documents, grades each on a seven-level A–G evidence scale, then cross-references claims to detect contradictions, inflation chains, and gaps. It ships with archaeology-history and medical-scientific grading presets and defines modes for corpora from a single document to 7+ documents; the largest documented run used 19.
How is this different from asking Claude to summarize my research?
Summarization compresses everything into prose. Source-synthesis produces a queryable claim inventory where every fact is graded for evidence quality, traced back to a specific source section, and cross-referenced for contradictions. The output is auditable: every claim links to its source, so a reviewer can verify the chain.
What document formats does it accept?
Any format Claude can read as text: Markdown, plain text, HTML, PDF, and Word documents. The skill processes documents section by section in their original order, so well-structured Markdown gives the cleanest output. Image-only PDFs need OCR first.
Does it need the internet or external APIs?
No. Source-synthesis works entirely on documents you provide. It does not call external databases, search engines, or fact-checking APIs. All cross-referencing happens within your supplied corpus, which makes it suitable for confidential research, internal company reports, or pre-publication academic work.
How large a corpus can it handle?
The largest documented run processed 19 documents (~1.8 MB) in 8 thematic clusters using parallel subagents. The skill defines explicit modes for 1–2, 3–6, and 7+ documents, with a different output format per mode. Beyond ~20 documents, the delegation pattern across parallel subagents becomes essential rather than optional.
Can I add my own evidence-grading preset for a new domain?
Yes. The universal grading framework in references/evidence-grades.md is domain-agnostic. For social science, policy research, design research, business analysis, or any other domain, the skill prompts you to confirm or adapt the A–G mapping and records the adopted mapping at the top of every output.
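A custom preset is, in effect, one description per letter. The sketch below validates that all seven letters are defined and renders the adopted mapping for the top of the output. The policy-research mapping is purely illustrative, written for this example; it is not a preset that ships with the skill, and its ordering of evidence types is debatable.

```python
GRADES = "ABCDEFG"


def render_adopted_mapping(name, mapping):
    """Check that a custom preset defines exactly A-G and render it for the output header."""
    missing = [g for g in GRADES if g not in mapping]
    unexpected = sorted(set(mapping) - set(GRADES))
    if missing or unexpected:
        raise ValueError(f"preset {name!r}: missing {missing}, unexpected {unexpected}")
    lines = [f"Adopted grading mapping: {name}"]
    lines += [f"{g}: {mapping[g]}" for g in GRADES]
    return "\n".join(lines)


# Illustrative only: one possible policy-research mapping.
POLICY = {
    "A": "Randomized field experiment or pre-registered evaluation with published data",
    "B": "Quasi-experimental study (difference-in-differences, regression discontinuity)",
    "C": "Official statistics or administrative records",
    "D": "Observational or descriptive study without a causal design",
    "E": "Expert commentary, think-tank analysis, policy brief",
    "F": "Anecdote, testimony, single case",
    "G": "Advocacy material, popular media, unsourced web content",
}
```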
Does it work in Claude.ai without Claude Code?
Yes. Add SKILL.md, references/evidence-grades.md, and the relevant grading preset to your Claude.ai project's knowledge base. The workflow runs identically; only the install mechanism differs.
How accurate is the evidence grading?
The grading is rule-based: each letter has explicit requirements (e.g., A in medical-scientific requires pre-registration ID, trial identifier, primary endpoint, sample size, and CI). The skill applies these rules consistently; accuracy depends on the source documents being faithful about what they cite. The skill does not verify primary literature; it grades what the sources claim.
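The A-grade rule quoted above can be expressed as a checklist over what a source document reports. In this sketch the field names are hypothetical, and it only lists gaps: which lower grade a claim falls to is set by the preset file and is not modeled here.

```python
# Hypothetical field names for the A-grade requirements in the medical-scientific preset.
A_REQUIREMENTS = (
    "preregistration_id",
    "trial_identifier",
    "primary_endpoint",
    "sample_size",
    "confidence_interval",
)


def a_grade_gaps(reported):
    """Elements an A grade needs that the source document does not report."""
    return [item for item in A_REQUIREMENTS if not reported.get(item)]


def eligible_for_a(reported):
    return not a_grade_gaps(reported)
```

Usage: a source that names SELECT, its primary endpoint, and the CI but omits the registration and sample size yields `["preregistration_id", "sample_size"]`. Under this rule the claim cannot be graded A from that source, even if the trial itself would qualify, which is what "grades what the sources claim" means in practice.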
Built for a project on the history of Armenian bladed weapons (Bronze Age through 1914), where the source base spans archaeological excavation reports, medieval chronicles, museum catalogues, 19th-century guild records, and modern scholarly interpretations — all with different evidence standards. The skill needed to handle "this dagger has an inventory number" and "the epic says the hero had a magic sword" in the same framework without pretending they are the same kind of evidence.
Refined through a three-persona audit (journalist, data analyst, museum curator), then validated on the 19-document Armenian corpus and cross-domain-tested on a medical-research corpus.
License: MIT