Smart Opportunity Finder

A self-hosted, multi-agent system that scans public MedTech signals (patents, clinical trials, regulatory filings, funding rounds, academic literature, grants) and surfaces a short, ranked list of growth opportunities matched to a company's manufacturing competencies.

The output answers two questions per opportunity:

  • Why us? — deterministic match against the company's competency model
  • Why now? — derived from timing proxies in the public record (trial phase transitions, patent grant windows, regulatory clearance events, funding rounds)

The hard invariant: no claim without traceable evidence. A trust gate in Python downgrades any claim that the LLM marked verified but that has no retrieved evidence to unsupported, regardless of what the LLM said.


Quick start

cp .env.example .env          # fill in at least one provider key (or set DRY_RUN=true)
docker compose up -d
# UI:    http://localhost:3000
# API:   http://localhost:3000/api/v1
# Docs:  http://localhost:3000/api/docs   (nginx-proxied OpenAPI)

DRY_RUN=true runs the full pipeline without any LLM call (deterministic placeholder JSON). Useful for CI and offline demos.


Architecture

flowchart TD
    User([Browser SPA]) -->|HTTPS| API[FastAPI]
    API --> Pipe{{OpportunityOrchestrator}}

    Pipe --> S1[1. PLAN<br/>LLM]
    S1 --> S2[2. RESEARCH<br/>deterministic fan-out]
    S2 -->|parallel| A1[ClinicalTrials]
    S2 -->|parallel| A2[Patents — Google & EPO]
    S2 -->|parallel| A3[Regulatory — openFDA]
    S2 -->|parallel| A4[Funding — SEC EDGAR]
    S2 -->|parallel| A5[Academic — OpenAlex & PubMed]
    S2 -->|parallel| A6[Grants — CORDIS & NIH]
    A1 & A2 & A3 & A4 & A5 & A6 --> S3[3. CHECK<br/>LLM + Trust Gate]
    S3 --> S4[4. MATCH<br/>Competency Matcher]
    S4 --> S5[5. TIME<br/>rules + LLM rationale]
    S5 --> S6[6. RANK<br/>weighted blend]
    S6 --> S7[7. BRIEF<br/>LLM]
    S7 --> S8[8. CRITIQUE<br/>LLM quality tier]
    S8 --> Out[Ranked Briefing]

    API -.->|persistence| DB[(SQLite / MariaDB)]
    API -.->|embeddings| QD[(Qdrant)]
    API -.-> LLM[UnifiedLLM<br/>Anthropic / OpenAI / DeepSeek / Ollama]

The 8-step pipeline

Entry point: core/orchestrator/opportunity_flow.py::OpportunityOrchestrator.run().

| Step | Function | Notes |
|---|---|---|
| 1. PLAN | step_plan | LLM decomposes the focus area into atomic claims with competency_hint and search_terms. |
| 2. RESEARCH | _research | Parallel fan-out across all registered SourceAdapter instances (20 s per-adapter timeout). |
| 3. CHECK | step_check | LLM verifies each claim against retrieved evidence. The Python trust gate (enforce_trust_gate) overwrites any verified verdict that has no evidence. A coherence pass halves confidence when invented identifiers (NCT IDs, DOIs, dollar amounts) are detected. |
| 4. MATCH | step_match | Deterministic competency match using competencies/schott.yaml. Scores below MATCH_THRESHOLD=0.20 are dropped. |
| 5. TIME | step_time | Rules + LLM rationale. Maps timing signals to {NOW, 2_YEARS, 5_YEARS, EMERGING} with a [0,1] urgency score. Rules in scoring/timing_rules.yaml. |
| 6. RANK | core/scoring/ranking.rank | Weighted blend (see below). Top 5 returned. |
| 7. BRIEF | step_brief | LLM writes the final card; a coherence pass downgrades confidence if invented refs are spotted. |
| 8. CRITIQUE | step_critique | LLM quality grade. Critique score < 0.4 or blocking issues → card dropped. |
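
The coherence pass mentioned in CHECK and BRIEF can be pictured roughly as follows. This is a minimal sketch, not the project's code: the regular expressions and the names `find_unsupported_identifiers` and `apply_coherence_penalty` are hypothetical, and it assumes that "invented" means an identifier that appears in the LLM text but in none of the retrieved evidence.

```python
import re

# Illustrative patterns only; the project's real patterns are not shown in this README.
IDENTIFIER_PATTERNS = {
    "nct_id": re.compile(r"\bNCT\d{8}\b"),
    "doi": re.compile(r"\b10\.\d{4,9}/\S+"),
    "dollar_amount": re.compile(
        r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s?(?:million|billion))?"
    ),
}


def find_unsupported_identifiers(llm_text: str, evidence_texts: list[str]) -> list[str]:
    """Return identifiers found in llm_text that occur in none of the evidence texts."""
    corpus = "\n".join(evidence_texts)
    unsupported = []
    for pattern in IDENTIFIER_PATTERNS.values():
        for match in pattern.findall(llm_text):
            if match not in corpus:
                unsupported.append(match)
    return unsupported


def apply_coherence_penalty(confidence: float, unsupported: list[str]) -> float:
    """Halve the confidence once if any unsupported identifier was found."""
    return confidence * 0.5 if unsupported else confidence
```

Comparing against the evidence corpus rather than validating identifiers against an external registry keeps the check deterministic and offline, at the cost of missing paraphrased amounts.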

Trust gate (the invariant)

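def enforce_trust_gate(verdict: str, evidence: list[EvidenceItem]) -> str:
    """Downgrade an LLM "verified" verdict to "unsupported" when no evidence was retrieved."""
    if not evidence and verdict == "verified":
        return "unsupported"
    # Any other verdict, or a verdict backed by evidence, passes through unchanged.
    return verdict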

Source: core/orchestrator/opportunity_flow.py.
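
A short illustration of how the gate behaves. `EvidenceItem` here is a stand-in dataclass with hypothetical fields (its real definition is not shown in this README), and "refuted" is a hypothetical verdict label used only to show that non-verified verdicts pass through; the function body is the one quoted above.

```python
from dataclasses import dataclass


@dataclass
class EvidenceItem:
    # Hypothetical fields, for illustration only.
    source: str
    url: str


def enforce_trust_gate(verdict: str, evidence: list[EvidenceItem]) -> str:
    if not evidence and verdict == "verified":
        return "unsupported"
    return verdict


trial = EvidenceItem(source="clinicaltrials.gov", url="https://clinicaltrials.gov/")

print(enforce_trust_gate("verified", []))       # unsupported
print(enforce_trust_gate("verified", [trial]))  # verified
print(enforce_trust_gate("refuted", []))        # refuted
```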

Scoring formula

$$\text{score} = \frac{\sum_i w_i \cdot s_i}{\sum_i w_i}$$

| Axis | Default weight | Source |
|---|---|---|
| competency_fit | 0.40 | matcher score |
| timing | 0.30 | TIME step urgency |
| signal_strength | 0.15 | min(len(evidence)/10, 1.0) × coherence |
| market_potential | 0.10 | CHECK confidence × coherence |
| competitive_window | 0.05 | default 0.6, tunable per opportunity |

Weights renormalise if they don't sum to 1.0. All five RANK_W_* overrides come from environment variables — no code change needed.
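
A worked example of the blend, as a sketch under stated assumptions: the function names are hypothetical, and the exact environment variable names (`RANK_W_<AXIS>` in upper case) are a guess based on the `RANK_W_*` prefix mentioned above.

```python
import os
from collections.abc import Mapping

DEFAULT_WEIGHTS = {
    "competency_fit": 0.40,
    "timing": 0.30,
    "signal_strength": 0.15,
    "market_potential": 0.10,
    "competitive_window": 0.05,
}


def load_weights(env: Mapping[str, str] = os.environ) -> dict[str, float]:
    """Read RANK_W_<AXIS> overrides; the variable naming is an assumption."""
    return {
        axis: float(env.get(f"RANK_W_{axis.upper()}", default))
        for axis, default in DEFAULT_WEIGHTS.items()
    }


def signal_strength(evidence_count: int, coherence: float) -> float:
    """min(len(evidence)/10, 1.0) x coherence, as in the table above."""
    return min(evidence_count / 10, 1.0) * coherence


def blended_score(axis_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean sum(w_i * s_i) / sum(w_i); dividing renormalises the weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[a] * axis_scores[a] for a in weights) / total_weight


scores = {
    "competency_fit": 0.8,
    "timing": 0.6,
    "signal_strength": signal_strength(evidence_count=4, coherence=1.0),
    "market_potential": 0.7,
    "competitive_window": 0.6,
}
print(round(blended_score(scores, DEFAULT_WEIGHTS), 4))  # 0.66
```

With the defaults the numerator is 0.32 + 0.18 + 0.06 + 0.07 + 0.03 = 0.66. Doubling the timing weight to 0.6 makes the weights sum to 1.3, and the same scores then blend to 0.84 / 1.3 ≈ 0.646.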

Competency matching

core/scoring/matcher.py. Fully deterministic — no embeddings, no LLM. Four signals blended into one [0,1] score against each competency in competencies/schott.yaml:

| Signal | Weight | How |
|---|---|---|
| Keyword overlap | 50% | Fraction present; substring match for multi-word terms |
| Application overlap | 30% | Phrase hit = 1.0; partial token overlap ≤ 0.5 |
| Material match | 20% | Exact / prefix / order-agnostic token-set; generic tokens excluded |
| Negative-keyword penalty | subtractive | Caps at 30% |

Results memoised with a 1 024-entry LRU keyed on (text_hash, competencies_id).


Source adapters

All registered in core/orchestrator/adapter_factory.py::build_default_adapters. Adapters that need a key skip themselves if their env var is empty.

| Adapter | Source | Notes |
|---|---|---|
| ClinicalTrialsAdapter | clinicaltrials.gov v2 | Graceful token-limit degradation |
| GooglePatentsAdapter | Google Patents (scraper) | Requires pip install -e .[patents] |
| EpoOpsAdapter | EPO Open Patent Services | Skipped without EPO_OPS_API_KEY |
| OpenFDADevicesAdapter | openFDA devices | FDA 510(k) / PMA clearances |
| OpenFDADrugAdapter | openFDA drug | NDA / ANDA approvals |
| SecEdgarAdapter | SEC EDGAR full-text | Funding / M&A signals |
| CordisAdapter | EU CORDIS | Horizon grants |
| NihReporterAdapter | NIH Reporter | US NIH grants |
| OpenAlexAdapter | OpenAlex | 250M+ academic works |
| PubmedAdapter | PubMed / MEDLINE | Biomedical literature |
| EuClinicalTrialsAdapter | EudraCT | EU trial registrations |

A new source = subclass core.sources.base.SourceAdapter, return EvidenceItems, register in build_default_adapters.
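
A rough sketch of what a new adapter and the RESEARCH fan-out could look like. The real `SourceAdapter` interface is not shown in this README, so the `search` method, the `EvidenceItem` fields, and `fan_out` are all hypothetical stand-ins; only the subclass-and-return-evidence shape and the per-adapter timeout (20 s by default in the pipeline) come from the text.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class EvidenceItem:
    # Hypothetical fields, for illustration only.
    source: str
    title: str
    url: str


class SourceAdapter:
    """Stand-in for core.sources.base.SourceAdapter; the real interface may differ."""

    name = "base"

    async def search(self, terms: list[str]) -> list[EvidenceItem]:
        raise NotImplementedError


class ExampleRegistryAdapter(SourceAdapter):
    name = "example_registry"

    async def search(self, terms: list[str]) -> list[EvidenceItem]:
        # A real adapter would call its upstream API here.
        return [
            EvidenceItem(self.name, f"Result for {t}", f"https://example.org/?q={t}")
            for t in terms
        ]


class SlowAdapter(SourceAdapter):
    name = "slow"

    async def search(self, terms: list[str]) -> list[EvidenceItem]:
        await asyncio.sleep(60)  # simulates an upstream that hangs
        return []


async def fan_out(adapters, terms, timeout_s: float = 20.0) -> list[EvidenceItem]:
    """Query all adapters concurrently; a slow or failing adapter yields no evidence."""

    async def guarded(adapter: SourceAdapter) -> list[EvidenceItem]:
        try:
            return await asyncio.wait_for(adapter.search(terms), timeout=timeout_s)
        except Exception:
            return []

    batches = await asyncio.gather(*(guarded(a) for a in adapters))
    return [item for batch in batches for item in batch]


# Short timeout so the demo finishes quickly; the hanging adapter is simply dropped.
items = asyncio.run(
    fan_out([ExampleRegistryAdapter(), SlowAdapter()], ["stent"], timeout_s=0.05)
)
```

Wrapping each adapter individually means one hanging source costs at most its own timeout and never blocks evidence from the others.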


LLM routing

providers/unified.py::UnifiedLLM. Three cost tiers configured independently:

| Tier | Default model | Used by |
|---|---|---|
| quality | claude-sonnet-4-20250514 | PLAN, BRIEF, CRITIQUE |
| balanced | gpt-4o-mini | CHECK |
| cheap | deepseek-chat | TIME rationale, fallback |

Runtime guarantees:

  • Circuit breakers per provider (pybreaker): a failing provider's breaker trips and stays open for the rest of the request.
  • Health cache with 60 s TTL: unhealthy providers are skipped without a probe call.
  • Budget guard (providers/budget.py): estimates the cost before each call and skips the call if it would exceed MAX_BUDGET_USD.
  • Fallback chain: UnifiedLLM tries every configured provider in priority order before raising.
  • Ollama passthrough: QUALITY_PROVIDER=ollama + OLLAMA_HOST point at any on-prem instance.
  • DRY_RUN=true: returns deterministic placeholder JSON, no API call.
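
How the budget guard, health cache, and fallback chain might compose is sketched below. This is not the code in providers/unified.py: `FallbackRouter`, `BudgetExceeded`, and `AllProvidersFailed` are hypothetical names, a plain dictionary of expiry times stands in for both pybreaker and the health cache, and raising on a budget overrun is an assumption about what "skipped" means.

```python
import time
from typing import Callable

HEALTH_TTL_S = 60.0


class BudgetExceeded(RuntimeError):
    pass


class AllProvidersFailed(RuntimeError):
    pass


class FallbackRouter:
    """Hypothetical stand-in for the routing logic inside UnifiedLLM."""

    def __init__(
        self,
        providers: list[tuple[str, Callable[[str], str]]],
        max_budget_usd: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.providers = providers  # (name, call) pairs in priority order
        self.max_budget_usd = max_budget_usd
        self.spent_usd = 0.0
        self._unhealthy_until: dict[str, float] = {}
        self._clock = clock

    def complete(self, prompt: str, est_cost_usd: float) -> str:
        # Budget guard: refuse before any provider is contacted.
        if self.spent_usd + est_cost_usd > self.max_budget_usd:
            raise BudgetExceeded("estimated cost would exceed the budget")
        errors = []
        for name, call in self.providers:
            # Health cache: skip providers marked unhealthy within the TTL.
            if self._clock() < self._unhealthy_until.get(name, float("-inf")):
                continue
            try:
                result = call(prompt)
            except Exception as exc:
                self._unhealthy_until[name] = self._clock() + HEALTH_TTL_S
                errors.append(f"{name}: {exc}")
                continue
            self.spent_usd += est_cost_usd
            return result
        raise AllProvidersFailed("; ".join(errors) or "no healthy provider")


def flaky(prompt: str) -> str:
    raise ConnectionError("down")


def steady(prompt: str) -> str:
    return f"echo: {prompt}"


router = FallbackRouter([("primary", flaky), ("secondary", steady)], max_budget_usd=1.0)
answer = router.complete("hello", est_cost_usd=0.01)  # served by "secondary"
```

Checking the budget before the loop means an over-budget request costs nothing, and recording failures with an expiry lets a provider recover on its own once the TTL passes.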

API reference

apps/api/main.py. Base path served by nginx at /api/v1/.

All routes except /health and /api/v1/auth/login require an active session cookie. Unauthenticated requests get a 401.

| Method | Route | Description |
|---|---|---|
| GET | /health | Liveness check (open) |
| POST | /api/v1/auth/login | Body {username, password}. Sets HttpOnly cookie on 200; 401 on bad creds (open) |
| POST | /api/v1/auth/logout | Invalidates the session and clears the cookie |
| GET | /api/v1/auth/me | Current caller |
| GET | /api/v1/users/me/runs | Runs launched by the caller, newest first |
| GET | /api/v1/users/me/schedules | Schedules owned by the caller |
| POST | /api/v1/runs | Start a new opportunity scan (returns immediately, async) |
| GET | /api/v1/runs/{run_id} | Fetch run status + result |
| GET | /api/v1/runs/{run_id}/results | Just the RankedBrief payload (409 if not done) |
| GET | /api/v1/runs/ | List recent runs (global, newest first) |
| GET | /api/v1/events/{run_id} | SSE stream of pipeline stage transitions |
| POST | /api/v1/schedules | Create a cron schedule |
| GET | /api/v1/schedules | List schedules (global) |
| DELETE | /api/v1/schedules/{id} | Delete a schedule |
| POST | /api/v1/schedules/{id}/trigger | Fire a schedule once, outside cron cadence |

Authentication

Cookie-based sessions. The flow:

  1. The backend reads AUTH_USERS=user:pw,user:pw from its env at startup and seeds users with pbkdf2-hashed passwords. Plaintext only lives in that env var.
  2. The SPA posts {username, password} to POST /api/v1/auth/login. The backend verifies the hash and sets schott_session as an HttpOnly, SameSite=Lax cookie (set COOKIE_SECURE=true for HTTPS deploys).
  3. Every fetch from the SPA uses credentials: 'include', so the browser attaches the cookie automatically. JS never touches the token (that's what HttpOnly buys us against XSS).
  4. POST /api/v1/auth/logout deletes the row from sessions and clears the cookie.

Session TTL defaults to 7 days. Expired sessions are purged on every API startup.


Database schema

core/persistence.py. SQLAlchemy 2.x. SQLite by default; MariaDB in production (docker compose). Postgres is still accepted via the postgresql:// scheme.

users — owner of runs and schedules:

| Column | Type | Notes |
|---|---|---|
| username | VARCHAR(64) PK | Seeded from the backend AUTH_USERS env |
| display_name | VARCHAR(128) | Optional |
| password_hash | VARCHAR(256) | pbkdf2_hmac(sha256), 100k iters, salted |
| created_at | timestamp | |

sessions — active login cookies:

| Column | Type | Notes |
|---|---|---|
| token | VARCHAR(64) PK | Opaque random; lives in HttpOnly cookie |
| user_id | FK → users.username | ON DELETE CASCADE |
| created_at / expires_at | timestamp | 7-day default TTL |

runs — one row per pipeline execution:

| Column | Type | Notes |
|---|---|---|
| run_id | UUID PK | |
| status | enum | pending / running / done / error |
| focus_area | text | User-supplied query |
| created_at / completed_at | timestamp | |
| current_stage | text | Last pipeline stage name |
| result | JSON | Full RankedBrief payload |
| error | text | Traceback on failure |
| triggered_by | text | manual or schedule |
| schedule_id | FK → schedules | Nullable |
| user_id | FK → users.username | Nullable; ON DELETE SET NULL |

schedules — cron-triggered scans:

| Column | Type |
|---|---|
| schedule_id | UUID PK |
| name, focus_area, cron, enabled | text / text / text / bool |
| created_at, last_run_at, last_run_id | timestamps + FK |
| user_id | FK → users.username |
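
The two delete behaviours in the tables above (CASCADE for sessions, SET NULL for runs) can be demonstrated with an in-memory SQLite database. This is a trimmed-down sketch with only the columns needed for the demonstration, written as raw DDL rather than the SQLAlchemy models in core/persistence.py.

```python
import sqlite3

DDL = """
CREATE TABLE users (
    username      VARCHAR(64) PRIMARY KEY,
    password_hash VARCHAR(256)
);
CREATE TABLE sessions (
    token   VARCHAR(64) PRIMARY KEY,
    user_id VARCHAR(64) REFERENCES users(username) ON DELETE CASCADE
);
CREATE TABLE runs (
    run_id     TEXT PRIMARY KEY,
    focus_area TEXT,
    user_id    VARCHAR(64) REFERENCES users(username) ON DELETE SET NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(DDL)

conn.execute("INSERT INTO users VALUES ('alice', 'hash')")
conn.execute("INSERT INTO sessions VALUES ('tok', 'alice')")
conn.execute("INSERT INTO runs VALUES ('run-1', 'glass vials', 'alice')")

# Deleting the user removes her sessions but keeps her runs, now unowned.
conn.execute("DELETE FROM users WHERE username = 'alice'")

sessions_left = conn.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
run_owner = conn.execute("SELECT user_id FROM runs").fetchone()[0]
print(sessions_left, run_owner)  # 0 None
```

Keeping runs after a user is removed preserves historical results, while cascading sessions ensures a deleted account cannot keep a live login.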

Migrating from SQLite

If you have a legacy data/launch.db from before the MariaDB switch, copy it into the new database in one shot:

DATABASE_URL=mysql+pymysql://launch:launch@localhost:3306/launch?charset=utf8mb4 \
  python -m tools.migrate_sqlite_to_db --source data/launch.db

The migrator upserts by primary key, so it is safe to re-run.


Frontend

frontend/ — React 18 + Vite + TypeScript + Tailwind, served as a static SPA by the nginx layer in the API container.

| Path | Page | Backed by |
|---|---|---|
| / | Dashboard | GET /runs |
| /scan/:runId | ScanMonitor | GET /runs/:id + SSE /events/:id |
| /opportunity/:id | OpportunityDeepDive | GET /runs (client-side lookup) |
| /bets | StrategicBets | GET /runs |
| /radar | RadarFeed | GET /runs |
| /schedules | Schedules | GET/POST/DELETE /schedules |
| /research | ResearchAgent | Ad-hoc manual query (client-side) |
| /competencies | Competencies | YAML editor (client-side) |
| /admin | Admin | Settings UI (client-side) |
| /login | Login | POST /auth/login (sets the HttpOnly session cookie) |

On boot, the SPA calls GET /auth/me to check whether the existing cookie is still valid; if not, it falls back to the login screen.


Install (without Docker)

git clone https://github.com/W00DSRULES/launch.git
cd launch
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env

# backend
uvicorn apps.api.main:app --reload
# frontend (separate terminal)
cd frontend && npm install && npm run dev

Python 3.11+ required. SQLite + an embedded Qdrant are used by default — zero external services needed for a first run.


Develop

make test         # pytest
make lint         # ruff + black
make typecheck    # mypy on core/ + providers/
make format       # ruff --fix + black

Pre-commit hooks (.pre-commit-config.yaml) run ruff + black on every commit.


License

MIT — see LICENSE.
