Second Opinion

AI-powered pre-mortem review for engineering teams. Catch distributed-systems failure modes before you ship — grounded in your own org's incident history.

Second Opinion analyzes architecture and design documents against 24 curated distributed-systems failure patterns, then matches findings against your team's past incidents — so every design review is grounded in your org's real production history, not just generic best practices.

Before you ship, ask for a second opinion.

Screenshots

Analyze a design doc · Org incident matches · Failure mode detail · Incident library (images in docs/screenshots/)

Why Second Opinion?

The failures that page you at 3 a.m. are rarely the obvious ones. They're the thundering herd that happens when three caches expire simultaneously. The poison message that wedges a queue. The cascading timeout that turns a 500ms dependency into a 30-second outage.

Your post-mortems already document those failures — but most teams read them once and move on. Second Opinion turns your incident history into institutional memory that participates in every future design review.

What it does:

Evaluates a design document against 24 distributed-systems failure archetypes
Matches findings against your org's stored incidents, explaining exactly how the new design could reproduce a past failure
Surfaces implicit assumptions and critical information gaps the design doesn't address
Produces a structured, exportable report with evidence, trigger conditions, and discussion questions per finding

Features

Org Incident Memory — paste post-mortems once; every future review is grounded in your real failure history
24 Failure Patterns — covering load, data, timing, resource, dependency, and distributed failure classes
Multi-provider LLM — NVIDIA NIM (free tier, default), OpenAI GPT-4o, Anthropic Claude, or local Ollama
Vercel-ready — step-based API keeps every serverless function call under 10s
PDF + Markdown upload — accepts .pdf, .md, .txt, .rst, .adoc
Bulk incident import — upload multiple post-mortem files or paste several at once (--- separated)
Mobile-first UI — works on phones; useful during live design review meetings
Export — copy as Markdown or download as JSON
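
The bulk paste format mentioned above (several post-mortems separated by `---`) could be split along the following lines. This is a minimal sketch rather than the project's actual parser: the name `split_postmortems` is hypothetical, and it assumes a separator is a line containing nothing but `---`.

```python
import re

# Assumption: a separator is a line holding only '---' (optionally padded with
# spaces or tabs). The real importer may be stricter or more lenient.
_SEPARATOR = re.compile(r"^[ \t]*---[ \t]*$", flags=re.MULTILINE)


def split_postmortems(pasted: str) -> list[str]:
    """Split pasted text into individual post-mortems, dropping empty chunks."""
    chunks = _SEPARATOR.split(pasted)
    return [chunk.strip() for chunk in chunks if chunk.strip()]
```

Matching whole lines only means an inline `---` inside a sentence or a table row does not split a post-mortem in two.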

How It Works

┌─────────────────────────────────────────────────────────┐
│  1. Build your Incident Library                          │
│     Paste past post-mortems → AI extracts structured     │
│     failure data → stored in Postgres                    │
└──────────────────────────┬──────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────┐
│  2. Paste or upload a design doc                         │
│     Add context: scale, SLOs, dependencies               │
└──────────────────────────┬──────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────┐
│  3. Analysis runs in parallel (3 rounds, ~20s total)     │
│                                                          │
│  Round 1 (parallel):                                     │
│    • Pattern matching against 24 failure archetypes      │
│    • Implicit assumption extraction                      │
│    • Known unknowns / information gaps                   │
│                                                          │
│  Round 2 (parallel, uses Round 1 findings):              │
│    • Ruled-out risk detection                            │
│    • Org incident library matching                       │
│                                                          │
│  Round 3: Summary                                        │
└──────────────────────────┬──────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────┐
│  4. Structured report                                    │
│     • Org incident matches (with relevance explanation)  │
│     • Failure modes (confidence, evidence, triggers)     │
│     • Implicit assumptions                               │
│     • Known unknowns                                     │
│     • Ruled-out risks                                    │
└─────────────────────────────────────────────────────────┘
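
The round structure in step 3 can be sketched with `asyncio.gather`: steps within a round run concurrently, and a round starts only after the previous one finishes. Every function below is a hypothetical stub standing in for one LLM-backed step; the names and return shapes are assumptions, not the contents of `app/analyzer.py`.

```python
import asyncio


# Hypothetical stubs: in the real app each of these would be one LLM call.
async def match_patterns(doc: str) -> list[str]:
    return ["Thundering Herd"] if "cache" in doc.lower() else []


async def extract_assumptions(doc: str) -> list[str]:
    return ["placeholder assumption"]


async def find_unknowns(doc: str) -> list[str]:
    return ["placeholder unknown"]


async def detect_ruled_out(doc: str, findings: list[str]) -> list[str]:
    return []


async def match_incidents(doc: str, findings: list[str]) -> list[str]:
    return [f"incident related to {name}" for name in findings]


async def analyze(doc: str) -> dict:
    # Round 1: three independent steps run concurrently.
    findings, assumptions, unknowns = await asyncio.gather(
        match_patterns(doc), extract_assumptions(doc), find_unknowns(doc)
    )
    # Round 2: both steps need the Round 1 findings, but not each other.
    ruled_out, incidents = await asyncio.gather(
        detect_ruled_out(doc, findings), match_incidents(doc, findings)
    )
    # Round 3: assemble the report from everything gathered so far.
    return {
        "incident_matches": incidents,
        "failure_modes": findings,
        "implicit_assumptions": assumptions,
        "known_unknowns": unknowns,
        "ruled_out_risks": ruled_out,
    }
```

Calling `asyncio.run(analyze(design_doc_text))` returns a dict whose keys mirror the report sections listed in step 4.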

Failure Patterns

24 curated distributed-systems failure archetypes:

Category	Patterns
Load	Thundering Herd, Load Shedding Blind Spot, Retry Storm, Hotspot/Hot Shard, Fan-out Amplification
Dependency	Hidden Synchronous Dependency, Degraded but Not Dead, Single Point of Failure, Bulkhead Absence
Data	Silent Data Loss, Metadata Corruption, Poison Message, State Machine Explosion, Dual Write Inconsistency, Missing Idempotency
Timing	Cascading Timeout, Clock Skew Issues
Resource	Resource Exhaustion, Unbounded Growth, Noisy Neighbor
Distributed	Partial Outage Inconsistency, Version Skew, Coordination Overhead, Event Ordering Assumption
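
For readers who want the catalog in machine-readable form, the table above maps directly onto a dictionary. This is only an illustration of the data: the actual definitions in `app/patterns.py` may be structured differently, and `category_of` is a hypothetical helper.

```python
from typing import Optional

# Category -> pattern names, copied from the table above.
FAILURE_PATTERNS: dict[str, list[str]] = {
    "Load": [
        "Thundering Herd", "Load Shedding Blind Spot", "Retry Storm",
        "Hotspot/Hot Shard", "Fan-out Amplification",
    ],
    "Dependency": [
        "Hidden Synchronous Dependency", "Degraded but Not Dead",
        "Single Point of Failure", "Bulkhead Absence",
    ],
    "Data": [
        "Silent Data Loss", "Metadata Corruption", "Poison Message",
        "State Machine Explosion", "Dual Write Inconsistency",
        "Missing Idempotency",
    ],
    "Timing": ["Cascading Timeout", "Clock Skew Issues"],
    "Resource": ["Resource Exhaustion", "Unbounded Growth", "Noisy Neighbor"],
    "Distributed": [
        "Partial Outage Inconsistency", "Version Skew",
        "Coordination Overhead", "Event Ordering Assumption",
    ],
}


def category_of(pattern: str) -> Optional[str]:
    """Return the category a pattern belongs to, or None if it is unknown."""
    for category, names in FAILURE_PATTERNS.items():
        if pattern in names:
            return category
    return None
```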

Quick Start

Option 1 — NVIDIA NIM (recommended, free tier, no GPU)

git clone https://github.com/divarun/second_opinion.git
cd second_opinion

python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Get a free key at https://build.nvidia.com
cat > .env << EOF
LLM_PROVIDER=nvidia
NVIDIA_API_KEY=nvapi-...
DATABASE_URL=postgresql://second_opinion:second_opinion@localhost:5432/second_opinion
EOF

docker compose up postgres -d   # start Postgres
uvicorn app.app:app --reload

Open http://localhost:8000

Option 2 — OpenAI

# Assumes the clone, virtualenv, and Postgres setup from Option 1
echo "LLM_PROVIDER=openai" >> .env
echo "OPENAI_API_KEY=sk-..." >> .env
uvicorn app.app:app --reload

Option 3 — Anthropic Claude

# Assumes the clone, virtualenv, and Postgres setup from Option 1
echo "LLM_PROVIDER=anthropic" >> .env
echo "ANTHROPIC_API_KEY=sk-ant-..." >> .env
uvicorn app.app:app --reload

Option 4 — Local models (Ollama, no API key)

# Start Postgres + Ollama together (detached, so the same terminal can run the app)
docker compose --profile ollama up -d

# App runs separately
echo "LLM_PROVIDER=ollama" >> .env
echo "DATABASE_URL=postgresql://second_opinion:second_opinion@localhost:5432/second_opinion" >> .env
uvicorn app.app:app --reload

Option 5 — Deploy to Vercel

See Vercel Deployment.

Vercel Deployment

Second Opinion is built for Vercel Hobby (free tier). The API is split into per-step endpoints so that each serverless function invocation makes at most one LLM call (the summary step makes none) and stays well under the 10-second timeout.

1. Fork and import to Vercel

Fork this repo, then import it in the Vercel dashboard.

2. Add Neon Postgres

In your Vercel project: Storage → Add → Neon. This sets POSTGRES_URL automatically.

3. Set LLM environment variables

In Settings → Environment Variables:

LLM_PROVIDER=nvidia
NVIDIA_API_KEY=nvapi-...

4. Deploy

npm i -g vercel
vercel --prod

Vercel picks up vercel.json automatically. That's it.

LLM Providers

Provider	Free Tier	Setup
NVIDIA NIM ✅ default	Yes (generous)	build.nvidia.com
OpenAI	No	platform.openai.com
Anthropic	No	console.anthropic.com
Ollama	Local only	ollama.com

NVIDIA NIM default model: nvidia/llama-3.1-nemotron-70b-instruct — reasoning-optimized, reliable JSON output, free tier.

API Reference

The step endpoints are what the browser uses. Each makes at most one LLM call; the summary step makes none.

Method	Endpoint	Description
`POST`	`/api/analyze/step/patterns`	Pattern matching (one LLM call)
`POST`	`/api/analyze/step/assumptions`	Implicit assumptions (one LLM call)
`POST`	`/api/analyze/step/unknowns`	Known unknowns / gaps (one LLM call)
`POST`	`/api/analyze/step/ruledout`	Ruled-out risks (one LLM call, needs findings)
`POST`	`/api/analyze/step/incidents`	Org incident matching (one LLM call, needs findings)
`POST`	`/api/analyze/step/summary`	Summary (no LLM)
`POST`	`/api/analyze`	Full analysis, single call (local dev only)
`POST`	`/api/extract-pdf`	PDF → text (no LLM)
`GET`	`/api/incidents`	List incident library
`POST`	`/api/incidents`	Add incident from post-mortem text
`DELETE`	`/api/incidents/{id}`	Remove incident
`GET`	`/api/patterns`	List all 24 failure patterns
`GET`	`/api/health`	LLM connectivity check
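
A script could drive the step endpoints in the same order the browser does. The sketch below only builds the requests; the endpoint paths come from the table above, but the payload field names (`document`, `findings`) and the assumption that bodies are JSON are hypothetical, so check `app/models.py` before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local dev address from the Quick Start

# Step names grouped by round; steps within a round do not depend on each other.
STEP_ROUNDS = [
    ("patterns", "assumptions", "unknowns"),
    ("ruledout", "incidents"),  # these need the findings from the patterns step
    ("summary",),
]


def build_step_request(
    step: str, payload: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST request for one analysis step (payload shape is assumed)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/analyze/step/{step}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the app running locally, `send(build_step_request("patterns", {"document": text}))` would issue the first Round 1 call, provided the assumed field name matches the real request model.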

Configuration

Variable	Default	Description
`LLM_PROVIDER`	`nvidia`	`nvidia` | `openai` | `anthropic` | `ollama`
`NVIDIA_API_KEY`	—	Required when `LLM_PROVIDER=nvidia`
`NVIDIA_MODEL`	`nvidia/llama-3.1-nemotron-70b-instruct`	NIM model ID
`OPENAI_API_KEY`	—	Required when `LLM_PROVIDER=openai`
`OPENAI_MODEL`	`gpt-4o`	OpenAI model
`ANTHROPIC_API_KEY`	—	Required when `LLM_PROVIDER=anthropic`
`ANTHROPIC_MODEL`	`claude-sonnet-4-6`	Anthropic model
`OLLAMA_MODEL`	`llama3`	Ollama model
`OLLAMA_BASE_URL`	`http://localhost:11434`	Ollama server URL
`DATABASE_URL`	—	Postgres URL (local/Docker)
`POSTGRES_URL`	—	Postgres URL (set by Vercel/Neon)
`CONFIDENCE_THRESHOLD`	`0.6`	Minimum score to include a finding
`MAX_FAILURE_MODES`	`10`	Max findings returned
`MAX_DOCUMENT_SIZE`	`50000`	Max input characters
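
To make the effect of `CONFIDENCE_THRESHOLD` and `MAX_FAILURE_MODES` concrete, here is one plausible reading of how they combine. It is a sketch under stated assumptions, not the code in `app/config.py` or `app/analyzer.py`: the `Finding` type is hypothetical, and it assumes the threshold is inclusive and that the highest-confidence findings are kept when the cap applies.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class Finding:
    # Hypothetical minimal shape; the real model carries evidence, triggers, etc.
    pattern: str
    confidence: float


def load_limits(env: Mapping[str, str] = os.environ) -> tuple[float, int]:
    """Read the two limits, falling back to the documented defaults."""
    threshold = float(env.get("CONFIDENCE_THRESHOLD", "0.6"))
    max_modes = int(env.get("MAX_FAILURE_MODES", "10"))
    return threshold, max_modes


def select_findings(
    findings: list[Finding], threshold: float, max_modes: int
) -> list[Finding]:
    """Keep findings at or above the threshold, best first, capped at max_modes."""
    kept = [f for f in findings if f.confidence >= threshold]  # inclusive: assumption
    kept.sort(key=lambda f: f.confidence, reverse=True)
    return kept[:max_modes]
```

Under these assumptions, raising `CONFIDENCE_THRESHOLD` shortens the report by dropping weaker findings, while `MAX_FAILURE_MODES` only trims the tail once more findings pass than the cap allows.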

Project Structure

second_opinion/
├── api/
│   └── index.py              # Vercel entry point
├── app/
│   ├── app.py                # FastAPI routes
│   ├── analyzer.py           # Analysis pipeline + step methods
│   ├── patterns.py           # 24 failure pattern definitions
│   ├── llm.py                # Multi-provider LLM client (lazy singleton)
│   ├── models.py             # Pydantic models
│   ├── config.py             # Settings from environment variables
│   ├── database.py           # asyncpg connection pool
│   ├── incident_store.py     # Incident CRUD
│   ├── incident_extractor.py # LLM extraction from post-mortems
│   ├── templates/            # Jinja2 HTML
│   └── static/               # CSS + JS (no build step)
├── samples/
│   ├── design-doc/           # Example design documents to analyze
│   └── postmortem/           # Example post-mortems to load into incident library
├── docs/
│   └── screenshots/          # UI screenshots for README
├── Dockerfile
├── docker-compose.yml        # Postgres (default) + Ollama (--profile ollama)
├── vercel.json
└── requirements.txt

Roadmap

✅ Phase 1 — Incident Memory (complete)

Build an org-specific incident library from past post-mortems. Design reviews are grounded in your real failure history, not generic patterns.

🔜 Phase 2 — Code-Doc Drift Detection

Accept a code snippet or PR diff alongside the design doc. Detect divergences between what the doc claims and what the code actually implements.

Example: "The doc says circuit breakers are in place on all external calls. payment_client.py has no circuit breaker."

🔜 Phase 3 — Production Grounding

Connect to Prometheus or Datadog. When the design says "will handle 10K RPS", pull real metrics for the named services and flag the delta between assumed and actual.

Example: "Design assumes 4x headroom to peak. Current P99 at 500 req/min is 1.8s. Connection pool is at 90% utilization."

Contributing

Contributions are welcome. See CONTRIBUTING.md.

Key areas: new failure patterns, test suite, .docx support, OCR for scanned PDFs, more LLM providers.

License

MIT — see LICENSE.

Second Opinion assists in design reviews but does not guarantee correctness or completeness. Always apply human judgment to the results.
