trialmortem

The autopsy engine for clinical trials.

Give it a trial, a drug, or a target. It classifies why a trial stopped, links the biology across ClinicalTrials.gov, openFDA, Open Targets, and ChEMBL, grades every claim, and returns an evidence-backed mechanistic report. When a trial did not stop for a scientific reason, it says so and refuses to invent one.


Why this exists

About 1 in 10 drugs that enter clinical trials reaches approval, and most of the failures are never written up in one place. The registry records a free-text reason for stopping, with no controlled vocabulary, so "the science failed" and "the site lost its coordinator" sit in the same field. Two very different questions get conflated:

  • A single trial usually stops for operational reasons. Insufficient accrual is the most common cause reported across the literature.
  • A whole drug program usually dies from biology.

Reasoning from one NCT id straight to "the mechanism failed" is wrong far more often than it is right. trialmortem keeps the two layers apart on purpose, and it treats knowing when not to tell a biological story as the core feature, not a caveat.

What is actually new

Thin wrappers around the ClinicalTrials.gov API already exist. The forensic reasoning layer does not. The precise claim, the one worth scrutiny, is this:

A single-command, open-source, reproducible tool that ingests a trial (or drug, or target), classifies why it stopped, links it to mechanistic evidence across ClinicalTrials.gov, openFDA, Open Targets, and ChEMBL plus the literature, and emits a confidence-graded post-mortem with explicit abstention.

Prior art is real and worth citing: population-level retrospectives (Nature Genetics 2024), labeled datasets (ClinicalRisk), and target-evidence platforms (Open Targets). None of them is a thing you can install and run on one trial.

Install

# with uv (recommended)
uv tool install trialmortem

# or run without installing
uvx trialmortem NCT00134264

# or plain pip
pip install trialmortem

From source:

git clone https://github.com/ssatanis/trialmortem
cd trialmortem
uv pip install -e ".[dev]"

Quickstart

No API key is required. The --no-llm path does full retrieval, classification, and evidence assembly using deterministic heuristics.

trialmortem NCT01399593 --no-llm
╭─ trialmortem ──────────────────────────────────────────────────────────────╮
│ Safety & Efficacy of Eculizumab to Prevent AMR in ...         TERMINATED    │
│ NCT01399593   Phase 2   Alexion Pharmaceuticals, Inc.                       │
│                                                                             │
│ Verdict: Mechanistic analysis warranted                                     │
╰─────────────────────────────────────────────────────────────────────────────╯
╭─ Why it stopped ────────────────────────────────────────────────────────────╮
│    primary  EFFICACY_FUTILITY                                               │
│ confidence  B  Reported, single source                                      │
│  rationale  whyStopped explicitly cites "did not achieve".                  │
╰─────────────────────────────────────────────────────────────────────────────╯
╭─ Evidence ──────────────────────────────────────────────────────────────────╮
│ 2  B  Eculizumab acts as a complement c5 inhibitor.            chembl        │
│ 3  B  Eculizumab appears in 2256 post-market reports ...       openfda       │
│ 4  A  100 registered trials use this intervention; 15 ...      ctgov         │
╰─────────────────────────────────────────────────────────────────────────────╯

Operational stops look different. The headline is the abstention:

trialmortem NCT02145078 --no-llm
# Verdict: Not a scientific failure
# primary  RECRUITMENT   rationale  whyStopped explicitly cites "accrual".

The most famous failures teach the same lesson from the other side. The registry field for the torcetrapib outcome trial is blank, so the single-trial verdict is honest about that rather than inventing a cause:

trialmortem NCT00134264 --no-llm
# Verdict: Investigate, reason unreported   (whyStopped was empty)

Other surfaces and modes:

trialmortem NCT00134264 --json > postmortem.json   # the stable JSON contract
trialmortem NCT00134264 --md  > postmortem.md       # a shareable Markdown brief
trialmortem --drug semagacestat                     # program-level synthesis
trialmortem --target CETP                            # class-level view
trialmortem batch trials.csv > out.jsonl             # one JSON object per line
trialmortem runs                                     # list reproducible runs
trialmortem replay 2026-06-04T15-22-09               # re-render a past run offline
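
Because batch mode emits one JSON object per line, a downstream tally needs only the standard library. The following is a minimal sketch, assuming each line carries the top-level "verdict" key from the JSON contract. The helper name tally_verdicts and the second verdict string are illustrative placeholders, not part of trialmortem.

```python
import json
from collections import Counter
from typing import Iterable


def tally_verdicts(lines: Iterable[str]) -> Counter:
    """Count verdicts in JSONL text, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # "verdict" is a top-level key in the JSON contract; records
        # without it are counted under a placeholder label.
        counts[record.get("verdict", "MISSING")] += 1
    return counts


# Hand-written sample lines, not real tool output.
sample = [
    '{"query": {"type": "nct", "value": "NCT01399593"}, "verdict": "MECHANISTIC_ANALYSIS_WARRANTED"}',
    "",
    '{"query": {"type": "nct", "value": "NCT02145078"}, "verdict": "PLACEHOLDER_OPERATIONAL_VERDICT"}',
]
counts = tally_verdicts(sample)
print(counts)
```

An open file handle works as the input too, since iterating over a file yields its lines: tally_verdicts(open("out.jsonl")).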

To turn on synthesis, give any one provider a key and drop --no-llm:

export ANTHROPIC_API_KEY=...      # or OPENAI_API_KEY, or run a local Ollama
trialmortem NCT00134264 --model anthropic:claude-3-5-haiku-latest

How it works

Six stages. The first two always run. Stage three is a gate: stages four through six run only when it opens, and an operational stop goes straight to the abstention report.

NCT / drug / target
  -> [1] retrieve and normalize   ClinicalTrials.gov v2 to a canonical Trial
  -> [2] classify the stop        whyStopped to a controlled taxonomy + confidence
  -> [3] abstention gate          is biology even relevant here?
         |                  \
         | operational       \ safety / efficacy / unknown-with-signal
         v                    v
   honest report        [4] build the dossier   ChEMBL, Open Targets, openFDA, literature, siblings
   "not a scientific    [5] synthesize          RAG-constrained, ranked hypotheses with support and counter-evidence
    failure"            [6] render              TUI + JSON + Markdown + a provenance manifest

The gate is the whole point. If the stop class is operational (recruitment, sponsor or business, logistics), the tool still produces a useful report covering the program context and the target's evidence profile, but it withholds mechanistic hypotheses and states in the headline that this was not a scientific failure. Synthesis can cite only the retrieved dossier, so it cannot wander off into invented mechanisms.
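
To make stages two and three concrete, here is a rough sketch of a keyword classifier feeding an abstention gate. It is not trialmortem's implementation. The keyword lists are assumptions, only EFFICACY_FUTILITY and RECRUITMENT are labels shown in this README (SAFETY, BUSINESS, and UNKNOWN are hypothetical), and the "unknown-with-signal" branch from the diagram is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative rules: (label, keywords). The first matching keyword wins.
RULES = [
    ("EFFICACY_FUTILITY", ("did not achieve", "futility", "lack of efficacy")),
    ("SAFETY", ("adverse event", "toxicity")),
    ("RECRUITMENT", ("accrual", "enrollment", "recruit")),
    ("BUSINESS", ("sponsor decision", "funding", "business")),
]
OPERATIONAL = {"RECRUITMENT", "BUSINESS"}


@dataclass
class StopClass:
    primary: str
    rationale: str


def classify(why_stopped: Optional[str]) -> StopClass:
    """Map the free-text whyStopped field to a controlled label."""
    text = (why_stopped or "").strip().lower()
    if not text:
        return StopClass("UNKNOWN", "whyStopped was empty.")
    for label, keywords in RULES:
        for keyword in keywords:
            if keyword in text:
                return StopClass(label, f'whyStopped explicitly cites "{keyword}".')
    return StopClass("UNKNOWN", "No rule matched.")


def gate(stop: StopClass) -> str:
    """Decide whether a mechanistic story is even appropriate."""
    if stop.primary in OPERATIONAL:
        return "Not a scientific failure"
    if stop.primary == "UNKNOWN":
        return "Investigate, reason unreported"
    return "Mechanistic analysis warranted"


print(gate(classify("Slow accrual")))
print(gate(classify("Study did not achieve its primary endpoint")))
print(gate(classify(None)))
```

Substring matching ignores negation ("no toxicity observed" would still match "toxicity"), which is one reason a real classifier needs more than keywords and why the confidence grade travels with the label.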

Output

Three surfaces render from one object.

  1. Terminal (default): a Rich report with a status badge, a color-coded verdict, the classification, ranked hypotheses with confidence chips, a "what we don't know" panel, and sourced evidence.
  2. --json: the versioned contract below. Stable, for pipelines.
  3. --md: a Markdown brief for a PR, a Slack thread, or teaching.

An abridged --json payload:
{
  "query": {"type": "nct", "value": "NCT01399593"},
  "trial": {"nct_id": "NCT01399593", "phase": "PHASE2", "status": "TERMINATED", "...": "..."},
  "stop_classification": {"primary": "EFFICACY_FUTILITY", "confidence": "B", "rationale": "..."},
  "verdict": "MECHANISTIC_ANALYSIS_WARRANTED",
  "mechanistic_analysis": {"hypotheses": [{"statement": "...", "confidence": "C"}]},
  "program_context": {"target": "C5", "drug_total_trials": 100, "drug_terminated": 15},
  "what_we_dont_know": ["..."],
  "evidence": [{"claim": "...", "provenance": "REPORTED", "confidence": "A", "sources": ["..."]}],
  "headline_confidence": "B"
}
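
A pipeline consuming this contract might keep only well-supported, directly reported claims. The sketch below assumes the "evidence" entries have the shape shown above; the function name strong_reported and the sample claims are placeholders written for illustration.

```python
import json

# Hand-written payload in the shape of the contract, not real tool output.
report = json.loads("""
{
  "verdict": "MECHANISTIC_ANALYSIS_WARRANTED",
  "evidence": [
    {"claim": "claim one", "provenance": "REPORTED", "confidence": "A", "sources": ["ctgov"]},
    {"claim": "claim two", "provenance": "REPORTED", "confidence": "B", "sources": ["chembl"]},
    {"claim": "claim three", "provenance": "INFERRED", "confidence": "D", "sources": ["chembl"]}
  ],
  "headline_confidence": "B"
}
""")


def strong_reported(report: dict, allowed=("A", "B")) -> list:
    """Return evidence entries that are REPORTED and graded within `allowed`."""
    return [
        entry
        for entry in report.get("evidence", [])
        if entry.get("provenance") == "REPORTED"
        and entry.get("confidence") in allowed
    ]


for entry in strong_reported(report):
    print(entry["confidence"], entry["claim"], entry["sources"])
```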

Python API

from trialmortem import Postmortem

pm = Postmortem.from_nct("NCT01399593", no_llm=True)
print(pm.verdict)               # Verdict.MECHANISTIC_ANALYSIS_WARRANTED
print(pm.headline_confidence)   # Confidence.B

for h in pm.hypotheses:
    print(h.statement, h.confidence, [s.db for s in h.sources])

pm.write_json("out.json")
pm.write_markdown("out.md")

Data sources

All free, all queried directly, all degrade gracefully. A novel terminated compound may have rich registry data, zero FAERS reports, partial ChEMBL, and strong Open Targets evidence; the report composes from whatever exists and states what is missing.

| Source | What it gives | What to respect |
| --- | --- | --- |
| ClinicalTrials.gov v2 | status, whyStopped, phase, enrollment, interventions, conditions, results | free-text reason; about 60% of completed trials never post results, and absence is not failure |
| openFDA FAERS | post-market adverse-event reports | post-market only, often empty for never-approved drugs, no denominators, a report is not causation |
| Open Targets | target-disease association, genetic evidence, tractability, drug safety flags | single-query GraphQL; release is pinned |
| ChEMBL | mechanism of action, molecule to target | intervention strings must be mapped carefully |
| Europe PMC | post-hoc explanations and review context | co-mention is not causation |
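
Graceful degradation starts at normalization: absent fields should be recorded, not guessed. The sketch below flattens a ClinicalTrials.gov v2 study record into a small object and notes what is missing. The nested paths reflect the v2 study layout as best understood here and should be checked against the API documentation; the Trial class and the dig helper are hypothetical and are not trialmortem's canonical Trial.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Trial:
    nct_id: Optional[str]
    status: Optional[str]
    why_stopped: Optional[str]
    phases: list = field(default_factory=list)
    missing: list = field(default_factory=list)


def dig(payload: dict, *path: str) -> Any:
    """Walk nested dicts, returning None as soon as a key is absent."""
    node: Any = payload
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def normalize(study: dict) -> Trial:
    """Flatten a v2 study record and list the fields that were absent."""
    trial = Trial(
        nct_id=dig(study, "protocolSection", "identificationModule", "nctId"),
        status=dig(study, "protocolSection", "statusModule", "overallStatus"),
        why_stopped=dig(study, "protocolSection", "statusModule", "whyStopped"),
        phases=dig(study, "protocolSection", "designModule", "phases") or [],
    )
    for name in ("nct_id", "status", "why_stopped"):
        if getattr(trial, name) is None:
            trial.missing.append(name)
    return trial


# Made-up record with no whyStopped field; the NCT id is a placeholder.
study = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT00000000"},
        "statusModule": {"overallStatus": "TERMINATED"},
        "designModule": {"phases": ["PHASE2"]},
    }
}
trial = normalize(study)
print(trial.status, trial.phases, trial.missing)
```

The missing list is the kind of input a "what we don't know" panel could be built from.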

Evidence grading

Every claim carries two machine-readable tags.

  • Provenance: REPORTED (stated in a registry, label, or paper) versus INFERRED (model reasoning over evidence). These are never blurred.
  • Confidence: an ordinal scale with explicit anchors.

| Grade | Meaning |
| --- | --- |
| A | Reported and corroborated by an independent source |
| B | Reported, single source |
| C | Inferred, strong evidence |
| D | Inferred, weak or indirect |
| E | Speculative |

A report's headline confidence is the floor of the conclusions it actually asserts, that is, the weakest grade among them. A confident-sounding paragraph resting on grade D evidence renders as grade D.
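
The floor rule is short enough to write down. This sketch assumes grades are the single letters A through E, ordered strongest to weakest; the function name is illustrative and is not taken from the trialmortem codebase.

```python
from typing import Sequence

GRADE_ORDER = "ABCDE"  # strongest to weakest


def headline_confidence(asserted: Sequence[str]) -> str:
    """Return the weakest grade among the asserted conclusions."""
    if not asserted:
        raise ValueError("a report must assert at least one conclusion")
    # The grade with the largest index in GRADE_ORDER is the weakest one.
    return max(asserted, key=GRADE_ORDER.index)


print(headline_confidence(["A", "B", "D"]))
print(headline_confidence(["A", "B"]))
```

The first call yields D and the second yields B: one weak link sets the headline, however strong the rest of the evidence is.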

Benchmark

failbench ships in the repo. It scores the two things that matter: does the classifier label the reason correctly, and does the gate abstain on operational stops while opening on genuine safety and efficacy stops. The bundled cases run fully offline, so the numbers are deterministic.

trialmortem failbench
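
For intuition about what such a benchmark measures, the two questions reduce to three rates. The sketch below is not failbench's code or its metric definitions; the Case fields, the score function, and the four sample cases are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Case:
    expected_label: str
    predicted_label: str
    should_abstain: bool  # True when the stop is operational
    did_abstain: bool     # True when the gate withheld hypotheses


def rate(hits: int, total: int) -> Optional[float]:
    """Return hits/total, or None when there is nothing to score."""
    return hits / total if total else None


def score(cases: List[Case]) -> dict:
    operational = [c for c in cases if c.should_abstain]
    scientific = [c for c in cases if not c.should_abstain]
    return {
        # Did the classifier label the stop reason correctly?
        "label_accuracy": rate(
            sum(c.expected_label == c.predicted_label for c in cases), len(cases)
        ),
        # Did the gate abstain on operational stops?
        "abstain_rate": rate(sum(c.did_abstain for c in operational), len(operational)),
        # Did the gate open on safety and efficacy stops?
        "open_rate": rate(sum(not c.did_abstain for c in scientific), len(scientific)),
    }


# Invented cases for illustration only.
cases = [
    Case("RECRUITMENT", "RECRUITMENT", True, True),
    Case("RECRUITMENT", "BUSINESS", True, True),
    Case("EFFICACY_FUTILITY", "EFFICACY_FUTILITY", False, False),
    Case("EFFICACY_FUTILITY", "RECRUITMENT", False, True),
]
result = score(cases)
print(result)
```

Keeping the label score separate from the gate scores matters: the second case mislabels the reason yet still abstains correctly, and a single blended number would hide that.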

Reproducibility

Every run writes a manifest: the tool and schema versions, the exact retrievals with their data versions and cache keys, the model used, and the final report. Because the raw payloads live in a content-addressed cache, a run is reconstructable and auditable offline. trialmortem replay <run-id> re-renders a past run without touching the network.
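
A content-addressed cache keys each raw payload by a hash of its bytes, so a manifest can point at exactly what was retrieved. The following is a minimal sketch of the idea using SHA-256 and flat files. The hash choice, the directory layout, and the manifest field names are assumptions and not trialmortem's actual format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_key(payload: bytes) -> str:
    """The key is the SHA-256 hex digest of the payload itself."""
    return hashlib.sha256(payload).hexdigest()


def put(cache_dir: Path, payload: bytes) -> str:
    """Store the payload under its own hash; identical bytes are stored once."""
    key = cache_key(payload)
    path = cache_dir / key
    if not path.exists():
        path.write_bytes(payload)
    return key


def get(cache_dir: Path, key: str) -> bytes:
    """Read a payload back and verify that it still matches its key."""
    payload = (cache_dir / key).read_bytes()
    if cache_key(payload) != key:
        raise ValueError(f"cache entry {key} is corrupted")
    return payload


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    raw = b'{"example": "raw payload"}'
    key = put(cache, raw)
    # A manifest records the key, so a replay can read from disk alone.
    manifest = {"retrievals": [{"source": "ctgov", "cache_key": key}]}
    assert get(cache, key) == raw
    print(json.dumps(manifest, indent=2))
```

Because the key is derived from the content, a replay can detect a tampered or truncated cache entry before rendering anything from it.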

Non-goals

  • Not a predictor of future trial success.
  • Not medical advice or a basis for any treatment decision.
  • Not a claim of causation from adverse-event counts.
  • Not a replacement for reading the trial. It is a forensic first read with receipts.

Disclaimer

Research and informational use only. See DISCLAIMER. The presence of an adverse-event report is not causation, and the absence of posted results is not failure. Verify anything that matters against the primary sources.

License

Apache License 2.0. See LICENSE.
