A three-stage LLM pipeline for turning meeting transcripts into structured, decision-grade analysis — with explicit grounding, observed-vs-inferred labeling, and an audit pass that catches fabrication before it reaches the reader.
Most LLM-powered analysis tools optimize for output that looks confident. This one optimizes for output you can act on — which means letting the system say "not enough signal here" and giving the audit stage no power to invent.
LLMs are good at producing prose that reads like analysis. They are not, by default, good at producing analysis that holds up under scrutiny. The failure modes are well-known: fabricated quantities, manufactured commitments, plausible-sounding but ungrounded business framing, recommendations padded with "align with stakeholders" filler.
This repo is one bet about how to do better: explicit grounding plus a bounded audit beats trying to make an LLM be careful in prose.
```
transcript + (optional) business context
                │
                ▼
        ┌───────────────┐
        │ 1. EXTRACT    │  high-fidelity structured extraction
        │   (OpenAI)    │  meeting_type, decision_state, evidence_map, ...
        └───────┬───────┘
                ▼
        ┌───────────────┐
        │ 2. SYNTHESIZE │  structured claim objects in 8 sections
        │  (Anthropic)  │  every claim labeled observed | inferred | recommendation
        └───────┬───────┘
                ▼
        ┌───────────────┐
        │ 3. AUDIT      │  five mutation ops only:
        │  (Anthropic)  │  delete, downgrade, move, replace, collapse
        └───────┬───────┘
                ▼
        ┌───────────────┐
        │ 4. RENDER     │  deterministic Python; claim objects → prose
        │   (no LLM)    │
        └───────┬───────┘
                ▼
         Markdown report
```
Three LLM stages plus a deterministic Python renderer. Each stage has a narrow job and constrained authority.
These are the calls that make the pipeline different from a generic "summarize this meeting" tool:
- Sparse is good. Empty is allowed. Sections may legitimately be short or empty. The model has explicit permission to abstain. "No recommendation yet, here is what we need first" is a valid output.
- Observed vs inferred is always labeled. Every claim is a structured object that declares what it is and where it came from. Inferred claims show as `[INFERRED]` in the rendered output.
- No quantity unless grounded. Numbers, percentages, and timelines must trace to the transcript or the business context packet. Invented precision (e.g. "audit the past 60-90 days") is what the audit strips.
- Business framing is source-bounded. Only the transcript and the provided business context packet may be used for the Business Read. No generic SaaS framing, no default exec vocabulary.
- Audit cannot create. It can only `delete`, `downgrade`, `move`, `replace_with_insufficient_evidence`, or `collapse_section`. Its failure modes are weakening or removal — never fabrication.
- Commitments are sacred. A commitment requires someone in the meeting to have committed. Analyst suggestions live in a separate section.
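The observed/inferred labeling and the audit's bounded authority can be made concrete with a small sketch. Field names here are assumptions inferred from the behavior described above, not the repo's actual schema:

```python
from dataclasses import dataclass, field

# Illustrative claim object: every claim declares its epistemic status and
# its grounding, so the renderer and audit never have to parse prose.
@dataclass
class Claim:
    text: str
    label: str                                   # "observed" | "inferred" | "recommendation"
    sources: list = field(default_factory=list)  # transcript/context anchors

def downgrade(claim: Claim) -> Claim:
    # One of the five audit ops: weaken "observed" to "inferred" when the
    # grounding is too thin. Note that no op constructs a brand-new claim.
    return Claim(claim.text, "inferred", claim.sources)

weakened = downgrade(Claim("Launch will slip to Q3", "observed"))
```

Because every op maps existing claims to equal-or-weaker claims, the worst the audit can do is remove signal, which is the intended failure direction.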
The rendered output has 8 sections:
- Context — topic, meeting_type, decision_state, participants
- What Happened — observed facts only
- DS Read — five methodological checks (claim validity, metric validity, design quality, data quality, decision sufficiency)
- Business Read — observed implications, inferred implications, missing context
- Strategic Options — gated; only produced when alternatives were actually discussed; explicitly handles "options selected in the room without formal evaluation"
- Recommended Path — allowed to abstain ("no recommendation yet"); supports `partial_direction` for meetings that confirmed direction but deferred execution
- Commitments and Next Steps — committed-in-meeting kept strictly separate from analyst recommendations
- Open Questions — questions raised in the room, plus questions the analyst thinks should have been raised
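Because rendering is deterministic, the labeling rule is a plain string transform rather than a prompt. A minimal sketch of that rule (assumed, not the repo's actual code):

```python
# Inferred claims carry an explicit [INFERRED] tag in the rendered Markdown;
# observed claims and recommendations render without it.
def render_claim(text: str, label: str) -> str:
    prefix = "[INFERRED] " if label == "inferred" else ""
    return f"- {prefix}{text}"

line = render_claim("Churn risk may rise next quarter", "inferred")
# line == "- [INFERRED] Churn risk may rise next quarter"
```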
The fastest way to understand the system is to open the notebook:
📓 `walkthrough.ipynb` — walks one transcript through every stage, then runs the eval harness across all fixtures.
Or run the CLI:
```bash
# 1. Install
pip install -r requirements.txt

# 2. Configure keys
cp .env.example .env
# edit .env with your OpenAI and Anthropic API keys

# 3. Run on a fixture
python cli.py transcripts/01_decision_meeting.txt
# Output: 01_decision_meeting_analysis.md
```

CLI options:

```bash
python cli.py transcripts/02_working_session.txt --output report.md
python cli.py transcripts/03_thin_transcript.txt --json full_result.json
python cli.py transcripts/01_decision_meeting.txt --skip-audit      # debug
python cli.py transcripts/01_decision_meeting.txt --context my_context.txt
```

A demo isn't an evaluation. The harness runs all fixtures and reports metrics that matter for trust in an LLM analysis system:
| Metric | What it measures | What "good" looks like |
|---|---|---|
| `abstention_rate` | Fraction of section slots that came back empty | Should match the transcript — high on thin transcripts, low on rich ones |
| `audit_operations_count` | How many fabrications the audit caught | Lower is better, but non-zero is honest |
| `fabricated_commitments` | Commitments the audit had to move out of `committed_in_meeting` | Should be zero. This is the most dangerous failure mode |
| `ungrounded_quantities` | Audit operations that flagged invented numbers/dates | Should be zero |
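As one illustration, a metric like `abstention_rate` reduces to counting empty section slots. The section names below are hypothetical; the real implementation lives in `eval_harness.py`:

```python
# Fraction of section slots the model left empty -- on a thin transcript a
# high value is correct behavior, not a failure.
def abstention_rate(sections: dict[str, list]) -> float:
    empty = sum(1 for claims in sections.values() if not claims)
    return empty / len(sections)

rate = abstention_rate({
    "what_happened":     [{"text": "..."}],
    "recommended_path":  [],   # model abstained: a valid output
    "strategic_options": [],   # gated off: no alternatives discussed
    "open_questions":    [{"text": "..."}],
})
# rate == 0.5
```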
```bash
python eval_harness.py
python eval_harness.py --fixture decision       # filter by name
python eval_harness.py --skip-audit             # see raw synthesis quality
python eval_harness.py --out eval_results.json  # full results to JSON
```

Three transcripts ship with the repo, each designed to stress a different pipeline behavior:
| Fixture | Tests |
|---|---|
| `01_decision_meeting.txt` | A clear decision among real alternatives. Should populate `Strategic Options.selected` and Recommended Path. |
| `02_working_session.txt` | Productive problem-solving with no decision. Should abstain on Recommended Path while still producing a useful DS Read. |
| `03_thin_transcript.txt` | Sparse, low-signal content. Most sections should come back empty. The system showing restraint is the right answer. |
Add new fixtures by dropping `.txt` files into `transcripts/` and listing them in `eval_harness.py:FIXTURES`.
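The registry is just a list of filenames. An assumed sketch of its shape (check the actual `FIXTURES` definition in `eval_harness.py` before relying on this):

```python
# Fixture registry: each entry is a filename under transcripts/.
FIXTURES = [
    "01_decision_meeting.txt",
    "02_working_session.txt",
    "03_thin_transcript.txt",
    "04_my_new_fixture.txt",   # hypothetical new fixture you dropped in
]
```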
```
transcript-analysis-pipeline/
├── prompts.py          # the three core prompts (extract, synthesize, audit)
├── agents.py           # LLM call wrappers with fallback logic
├── engine.py           # 3-stage orchestrator
├── renderer.py         # deterministic claim objects → prose-ready dict
├── exporter.py         # rendered dict → Markdown
├── cli.py              # CLI entry point
├── eval_harness.py     # fixture-based evaluation
├── walkthrough.ipynb   # end-to-end demo notebook
├── transcripts/        # eval fixtures
├── examples/           # frozen example outputs
├── requirements.txt
├── .env.example
└── LICENSE
```
- Not a transcription tool. Bring your own transcript (Otter, Granola, Plaud, manual notes — anything that produces text).
- Not a meeting management tool. No scheduling, no follow-up tracking, no integration with calendars.
- Not Notion-integrated. The output is Markdown. You can paste it into any tool that takes Markdown.
- Not optimized for cost. It runs three LLM calls per analysis. A typical run is a few cents, but this is the wrong tool if you need to process thousands of transcripts a day.
MIT — see LICENSE.