The autopsy engine for clinical trials.
Give it a trial, a drug, or a target. It classifies why a trial stopped, links the biology across ClinicalTrials.gov, openFDA, Open Targets, and ChEMBL, grades every claim, and returns an evidence-backed mechanistic report. When a trial did not stop for a scientific reason, it says so and refuses to invent one.
About 1 in 10 drugs that enter clinical trials reaches approval, and most of the failures are never written up in one place. The registry records a free-text reason for stopping, with no controlled vocabulary, so "the science failed" and "the site lost its coordinator" sit in the same field. Two very different questions get conflated:
- A single trial usually stops for operational reasons. Insufficient accrual is the single most common cause across the literature.
- A whole drug program usually dies from biology.
Reasoning from one NCT id straight to "the mechanism failed" is wrong far more often than it is right. trialmortem keeps the two layers apart on purpose, and it treats knowing when not to tell a biological story as the core feature, not a caveat.
Thin wrappers around the ClinicalTrials.gov API already exist. The forensic reasoning layer does not. The precise claim, the one worth scrutiny, is this:
A single-command, open-source, reproducible tool that ingests a trial (or drug, or target), classifies why it stopped, links it to mechanistic evidence across ClinicalTrials.gov, openFDA, Open Targets, and ChEMBL plus the literature, and emits a confidence-graded post-mortem with explicit abstention.
Prior art is real and worth citing: population-level retrospectives (Nature Genetics 2024), labeled datasets (ClinicalRisk), and target-evidence platforms (Open Targets). None of them is a thing you can install and run on one trial.
```bash
# with uv (recommended)
uv tool install trialmortem
# or run without installing
uvx trialmortem NCT00134264
# or plain pip
pip install trialmortem
```
From source:
```bash
git clone https://github.com/ssatanis/trialmortem
cd trialmortem
uv pip install -e ".[dev]"
```
No API key is required. The `--no-llm` path does full retrieval, classification,
and evidence assembly using deterministic heuristics.
```text
trialmortem NCT01399593 --no-llm

╭─ trialmortem ──────────────────────────────────────────────────────────────╮
│ Safety & Efficacy of Eculizumab to Prevent AMR in ... TERMINATED │
│ NCT01399593 Phase 2 Alexion Pharmaceuticals, Inc. │
│ │
│ Verdict: Mechanistic analysis warranted │
╰─────────────────────────────────────────────────────────────────────────────╯
╭─ Why it stopped ────────────────────────────────────────────────────────────╮
│ primary EFFICACY_FUTILITY │
│ confidence B Reported, single source │
│ rationale whyStopped explicitly cites "did not achieve". │
╰─────────────────────────────────────────────────────────────────────────────╯
╭─ Evidence ──────────────────────────────────────────────────────────────────╮
│ 2 B Eculizumab acts as a complement c5 inhibitor. chembl │
│ 3 B Eculizumab appears in 2256 post-market reports ... openfda │
│ 4 A 100 registered trials use this intervention; 15 ... ctgov │
╰─────────────────────────────────────────────────────────────────────────────╯
```
Operational stops look different. The headline is the abstention:
```bash
trialmortem NCT02145078 --no-llm
# Verdict: Not a scientific failure
# primary RECRUITMENT
# rationale whyStopped explicitly cites "accrual".
```
The most famous failures teach the same lesson from the other side. The registry field for the torcetrapib outcome trial is blank, so the single-trial verdict is honest about that rather than inventing a cause:
```bash
trialmortem NCT00134264 --no-llm
# Verdict: Investigate, reason unreported (whyStopped was empty)
```
Other surfaces and modes:
```bash
trialmortem NCT00134264 --json > postmortem.json   # the stable JSON contract
trialmortem NCT00134264 --md > postmortem.md       # a shareable Markdown brief
trialmortem --drug semagacestat                    # program-level synthesis
trialmortem --target CETP                          # class-level view
trialmortem batch trials.csv > out.jsonl           # one JSON object per line
trialmortem runs                                   # list reproducible runs
trialmortem replay 2026-06-04T15-22-09             # re-render a past run offline
```
To turn on synthesis, give any one provider a key and drop `--no-llm`:
```bash
export ANTHROPIC_API_KEY=... # or OPENAI_API_KEY, or run a local Ollama
trialmortem NCT00134264 --model anthropic:claude-3-5-haiku-latest
```
Six stages. The first two always run. Stage three is a gate, and stages four through six only run on its verdict.
```text
NCT / drug / target
  -> [1] retrieve and normalize   ClinicalTrials.gov v2 to a canonical Trial
  -> [2] classify the stop        whyStopped to a controlled taxonomy + confidence
  -> [3] abstention gate          is biology even relevant here?
         |            \
         | operational \ safety / efficacy / unknown-with-signal
         v              v
  honest report        [4] build the dossier   ChEMBL, Open Targets, openFDA, literature, siblings
  "not a scientific    [5] synthesize          RAG-constrained, ranked hypotheses with support and counter-evidence
  failure"             [6] render              TUI + JSON + Markdown + a provenance manifest
```
The gate is the whole point. If the stop class is operational (recruitment, sponsor or business, logistics), the tool still produces a useful report covering the program context and the target's evidence profile, but it withholds mechanistic hypotheses and states in the headline that this was not a scientific failure. Synthesis can cite only the retrieved dossier, so it cannot wander off into invented mechanisms.
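
The classify-then-gate logic described above can be sketched in a few lines. This is a minimal illustration, not trialmortem's actual implementation: the keyword table, the `classify_stop` and `gate` helpers, and every verdict string except `MECHANISTIC_ANALYSIS_WARRANTED` are assumptions made for the example. It also simplifies the "unknown-with-signal" branch by treating every unknown reason as "investigate".

```python
from enum import Enum
from typing import Optional


class StopClass(Enum):
    RECRUITMENT = "RECRUITMENT"
    BUSINESS = "BUSINESS"
    LOGISTICS = "LOGISTICS"
    SAFETY = "SAFETY"
    EFFICACY_FUTILITY = "EFFICACY_FUTILITY"
    UNKNOWN = "UNKNOWN"


# Hypothetical keyword table; a real classifier would need a far richer one.
KEYWORDS = [
    (StopClass.SAFETY, ("adverse event", "toxicity", "safety")),
    (StopClass.EFFICACY_FUTILITY, ("did not achieve", "futility", "lack of efficacy")),
    (StopClass.RECRUITMENT, ("accrual", "enrollment", "recruit")),
    (StopClass.BUSINESS, ("sponsor decision", "business", "funding")),
    (StopClass.LOGISTICS, ("drug supply", "site closure", "staffing")),
]

OPERATIONAL = {StopClass.RECRUITMENT, StopClass.BUSINESS, StopClass.LOGISTICS}


def classify_stop(why_stopped: Optional[str]) -> StopClass:
    """Map the free-text whyStopped field to a controlled stop class."""
    text = (why_stopped or "").lower()
    for stop_class, needles in KEYWORDS:
        if any(needle in text for needle in needles):
            return stop_class
    return StopClass.UNKNOWN  # empty or unrecognized reason


def gate(stop_class: StopClass) -> str:
    """Decide whether mechanistic analysis should run at all."""
    if stop_class in OPERATIONAL:
        return "NOT_A_SCIENTIFIC_FAILURE"  # assumed verdict name
    if stop_class is StopClass.UNKNOWN:
        return "INVESTIGATE_REASON_UNREPORTED"  # assumed verdict name
    return "MECHANISTIC_ANALYSIS_WARRANTED"
```

Checking safety and efficacy keywords before operational ones is a deliberate choice in this sketch: a reason that mentions both a safety signal and slow accrual should open the gate rather than close it.
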
Three surfaces render from one object.
- Terminal (default): a Rich report with a status badge, a color-coded verdict, the classification, ranked hypotheses with confidence chips, a "what we don't know" panel, and sourced evidence.
- `--json`: the versioned contract below. Stable, for pipelines.
- `--md`: a Markdown brief for a PR, a Slack thread, or teaching.
```json
{
  "query": {"type": "nct", "value": "NCT01399593"},
  "trial": {"nct_id": "NCT01399593", "phase": "PHASE2", "status": "TERMINATED", "...": "..."},
  "stop_classification": {"primary": "EFFICACY_FUTILITY", "confidence": "B", "rationale": "..."},
  "verdict": "MECHANISTIC_ANALYSIS_WARRANTED",
  "mechanistic_analysis": {"hypotheses": [{"statement": "...", "confidence": "C"}]},
  "program_context": {"target": "C5", "drug_total_trials": 100, "drug_terminated": 15},
  "what_we_dont_know": ["..."],
  "evidence": [{"claim": "...", "provenance": "REPORTED", "confidence": "A", "sources": ["..."]}],
  "headline_confidence": "B"
}
```
From Python:
```python
from trialmortem import Postmortem

pm = Postmortem.from_nct("NCT01399593", no_llm=True)
print(pm.verdict)              # Verdict.MECHANISTIC_ANALYSIS_WARRANTED
print(pm.headline_confidence)  # Confidence.B
for h in pm.hypotheses:
    print(h.statement, h.confidence, [s.db for s in h.sources])
pm.write_json("out.json")
pm.write_markdown("out.md")
```
The data sources are all free, all queried directly, and all degrade gracefully. A novel terminated compound may have rich registry data, zero FAERS reports, partial ChEMBL, and strong Open Targets evidence; the report composes from whatever exists and states what is missing.
| Source | What it gives | What to respect |
|---|---|---|
| ClinicalTrials.gov v2 | status, whyStopped, phase, enrollment, interventions, conditions, results | free-text reason; about 60% of completed trials never post results, and absence is not failure |
| openFDA FAERS | post-market adverse-event reports | post-market only, often empty for never-approved drugs, no denominators, a report is not causation |
| Open Targets | target-disease association, genetic evidence, tractability, drug safety flags | single-query GraphQL; release is pinned |
| ChEMBL | mechanism of action, molecule to target | intervention strings must be mapped carefully |
| Europe PMC | post-hoc explanations and review context | co-mention is not causation |
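
One way to picture the "degrade gracefully" behavior described before the table is a loop that tries each source, keeps whatever comes back, and records what is missing instead of failing. The sketch below is an assumption-laden illustration: `assemble_dossier`, the `Fetcher` type, and the stub fetchers are hypothetical names and do not call any real API.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A fetcher takes a query string and returns a payload, or None when empty.
Fetcher = Callable[[str], Optional[dict]]


def assemble_dossier(
    query: str, fetchers: Dict[str, Fetcher]
) -> Tuple[Dict[str, dict], List[str]]:
    """Collect payloads from every source that answers; list the gaps."""
    found: Dict[str, dict] = {}
    missing: List[str] = []
    for name, fetch in fetchers.items():
        try:
            payload = fetch(query)
        except Exception as exc:  # e.g. timeout, rate limit, schema drift
            missing.append(f"{name}: lookup failed ({type(exc).__name__})")
            continue
        if not payload:
            missing.append(f"{name}: no records found")
        else:
            found[name] = payload
    return found, missing


def _simulated_timeout(query: str) -> Optional[dict]:
    raise TimeoutError("simulated network failure")


# Stand-in fetchers for illustration only.
stub_fetchers: Dict[str, Fetcher] = {
    "ctgov": lambda q: {"nct_id": q},
    "openfda": lambda q: None,  # never-approved drug: no FAERS reports
    "chembl": _simulated_timeout,
}

found, missing = assemble_dossier("NCT00000000", stub_fetchers)
```

The `missing` list is what would feed a "what we don't know" panel, so an empty source becomes a stated gap rather than silent absence.
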
Every claim carries two machine-readable tags.
- Provenance: `REPORTED` (stated in a registry, label, or paper) versus `INFERRED` (model reasoning over evidence). These are never blurred.
- Confidence: an ordinal scale with explicit anchors.
| Grade | Meaning |
|---|---|
| A | Reported and corroborated by an independent source |
| B | Reported, single source |
| C | Inferred, strong evidence |
| D | Inferred, weak or indirect |
| E | Speculative |
A report's headline confidence is the floor of the conclusions it actually asserts. A confident-sounding paragraph resting on grade D evidence renders as grade D.
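
The floor rule can be expressed as taking the weakest grade among the asserted claims. The following is a sketch under stated assumptions: the `Claim` dataclass and the `headline_confidence` function are illustrative names, and the integer ordering of `Confidence` is a modeling choice made here, not the tool's confirmed data model.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Confidence(IntEnum):
    """Ordinal grades; a larger value means weaker support."""
    A = 1
    B = 2
    C = 3
    D = 4
    E = 5


@dataclass(frozen=True)
class Claim:
    text: str
    provenance: str  # "REPORTED" or "INFERRED"
    confidence: Confidence


def headline_confidence(asserted: List[Claim]) -> Confidence:
    """Return the weakest grade among the claims the report asserts."""
    if not asserted:
        raise ValueError("a report must assert at least one claim")
    return max(claim.confidence for claim in asserted)


claims = [
    Claim("Drug inhibits target X.", "REPORTED", Confidence.A),
    Claim("Off-target effect drove the failure.", "INFERRED", Confidence.D),
]
headline = headline_confidence(claims)
```

With one grade A claim and one grade D claim asserted, `headline` is `Confidence.D`, which matches the rule that a confident paragraph resting on grade D evidence renders as grade D.
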
failbench ships in the repo. It scores the two things that matter: does the
classifier label the reason correctly, and does the gate abstain on operational
stops while opening on genuine safety and efficacy stops. The bundled cases run
fully offline, so the numbers are deterministic.
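
The two scores described above could be computed along these lines. This is a hedged sketch, not failbench's actual scoring code: the `Case` fields, the `score` function, and the metric names are assumptions, and the bundled case format is not shown in this document. The sketch covers only cases whose expected class is operational, safety, or efficacy.

```python
from dataclasses import dataclass
from typing import Dict, List

OPERATIONAL = {"RECRUITMENT", "BUSINESS", "LOGISTICS"}


@dataclass(frozen=True)
class Case:
    expected_class: str   # gold label for the stop reason
    predicted_class: str  # what the classifier returned
    gate_opened: bool     # True if mechanistic analysis ran


def score(cases: List[Case]) -> Dict[str, float]:
    """Classification accuracy plus how often the gate behaved correctly."""
    if not cases:
        raise ValueError("no cases to score")
    n = len(cases)
    label_hits = sum(c.expected_class == c.predicted_class for c in cases)
    # The gate should stay closed on operational stops and open otherwise.
    gate_hits = sum(
        c.gate_opened == (c.expected_class not in OPERATIONAL) for c in cases
    )
    return {
        "classification_accuracy": label_hits / n,
        "gate_accuracy": gate_hits / n,
    }


example = score([
    Case("RECRUITMENT", "RECRUITMENT", gate_opened=False),
    Case("SAFETY", "SAFETY", gate_opened=True),
    Case("EFFICACY_FUTILITY", "RECRUITMENT", gate_opened=False),
    Case("BUSINESS", "BUSINESS", gate_opened=True),
])
```

Scoring the gate against the gold label rather than the predicted label keeps the two numbers independent: a misclassification that also closes the gate wrongly counts against both.
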
```bash
trialmortem failbench
```
Every run writes a manifest: the tool and schema versions, the exact retrievals
with their data versions and cache keys, the model used, and the final report.
Because the raw payloads live in a content-addressed cache, a run is
reconstructable and auditable offline. `trialmortem replay <run-id>` re-renders a
past run without touching the network.
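
A content-addressed cache stores each payload under a key derived from its own bytes, which is what makes an offline replay verifiable. The sketch below assumes SHA-256 and a flat directory layout; the `cache_key`, `store`, and `load` helpers are hypothetical, and the tool's real hash function and on-disk format may differ.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(payload: bytes) -> str:
    """Derive the key from the content itself (SHA-256 is an assumption)."""
    return hashlib.sha256(payload).hexdigest()


def store(cache_dir: Path, payload: bytes) -> str:
    """Write the payload under its own hash and return that key."""
    key = cache_key(payload)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / key
    if not path.exists():  # identical payloads are stored once
        path.write_bytes(payload)
    return key


def load(cache_dir: Path, key: str) -> bytes:
    """Read a payload back and verify it still matches its key."""
    payload = (cache_dir / key).read_bytes()
    if cache_key(payload) != key:
        raise ValueError(f"cache entry {key} does not match its content")
    return payload


# Round trip in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    raw = b'{"nct_id": "NCT00000000"}'
    key = store(Path(tmp), raw)
    roundtrip = load(Path(tmp), key)
```

Because the key is the hash, a manifest that lists cache keys pins the exact bytes each retrieval returned, and any later corruption or edit is caught on load.
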
- Not a predictor of future trial success.
- Not medical advice or a basis for any treatment decision.
- Not a claim of causation from adverse-event counts.
- Not a replacement for reading the trial. It is a forensic first read with receipts.
Research and informational use only. See DISCLAIMER. The presence of an adverse-event report is not causation, and the absence of posted results is not failure. Verify anything that matters against the primary sources.
Apache License 2.0. See LICENSE.