bdschi1/README.md

Evaluation infrastructure testing LLMs on financial reasoning, portfolio construction, and risk decomposition.


What's here

Evaluation

fin-reasoning-eval: 306 finance problems covering valuation, accounting, credit, and portfolio math. Difficulty grading, multi-model leaderboard, AI vendor assessment framework.
investment-workflow-evals: Scoring rubrics for the full investment workflow: thesis → catalysts → sizing → risk → monitoring → post-mortem. Includes RLHF Studio, IC memo templates, and an institutional-to-retail research translator.
excel-model-eval: Structural auditing and construction of financial models. Builders for DCF, comps table, and operating model, plus dependency graphs, circular reference detection, and balance sheet consistency checks.
institutional-investor-casebook: PM-level case studies scored against quantized local models. Likert ratings, CLI pipeline.
judgment-under-uncertainty-eval: Judgment quality in healthcare investing, measured through adversarial testing and calibration.
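
As an illustration of the circular reference detection mentioned for excel-model-eval, here is a minimal sketch using a depth-first search over a cell dependency graph. It is not taken from the repository: the `find_cycle` function, the dict-of-lists graph format, and the cell names are all assumptions made for this example.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one circular reference as a path of cells, or None if acyclic.

    `deps` maps each cell to the cells its formula reads (hypothetical format).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(cell: str) -> Optional[List[str]]:
        color[cell] = GRAY
        stack.append(cell)
        for ref in deps.get(cell, []):
            state = color.get(ref, WHITE)
            if state == GRAY:
                # ref is already on the current path, so we closed a loop
                return stack[stack.index(ref):] + [ref]
            if state == WHITE:
                found = visit(ref)
                if found:
                    return found
        stack.pop()
        color[cell] = BLACK
        return None

    for cell in deps:
        if color.get(cell, WHITE) == WHITE:
            found = visit(cell)
            if found:
                return found
    return None


# Hypothetical model: interest (B4) reads debt (B6), debt reads cash (B5),
# and cash reads interest, which is the classic interest/cash circularity.
model = {
    "B3": ["B1", "B2"],
    "B4": ["B3", "B6"],
    "B6": ["B5"],
    "B5": ["B4"],
}
cycle = find_cycle(model)
print(cycle)
```

A graph library such as NetworkX would do the same job on larger models; plain Python is used here only to keep the sketch self-contained.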

Red teaming

redflag-ex1-analyst: Scans analyst research for MNPI, tipping, regulatory arbitrage, and construction traps. Returns PASS / PM_REVIEW / AUTO_REJECT in under 60 seconds.

Multi-agent

multi-agent-investment-committee: Four-agent IC with adversarial debate and RL-ready T-signal. 200+ tests, 6 LLM providers, Bloomberg Terminal/IBKR adapters.

RAG + Data

investment-research-rag: Full RAG pipeline for investment research over SEC filings, earnings transcripts, equity research, and Excel models. 4 doc-type-specific chunkers, hybrid retrieval, reranking, 3 LLM/embedding providers. 255 tests.
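
Hybrid retrieval generally means merging a keyword ranking with a vector ranking. One common way to do that is reciprocal rank fusion (RRF), sketched below. The README does not say which fusion method investment-research-rag uses, so RRF is an assumption here, as are the function name, the constant `k = 60`, and the document IDs.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken alphabetically for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword search and a vector search
keyword_hits = ["10K_risk_factors", "Q3_transcript", "broker_note"]
vector_hits = ["Q3_transcript", "model_assumptions", "10K_risk_factors"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents found by both retrievers rise to the top, and a reranker would then typically reorder only the first few fused results.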

Portfolio tools

ls-portfolio-lab: L/S risk workbench with 40+ metrics, trade simulator, and PM scorecard. Streamlit + Polars + Plotly.
backtest-lab: Event-driven backtesting with execution realism and bias prevention. 322 tests.
fund-tracker-13f: 13F filing analyzer covering 52 hedge funds: position changes, consensus trades, crowding.
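
Two of the most basic metrics a long/short risk workbench reports are gross and net exposure. The sketch below computes both from signed position values. It is an illustration only and not code from ls-portfolio-lab: the `exposures` function, the tickers, and the dollar amounts are hypothetical.

```python
from typing import Dict, Tuple


def exposures(positions: Dict[str, float], nav: float) -> Tuple[float, float]:
    """Return (gross, net) exposure as fractions of NAV.

    `positions` maps ticker to signed market value: positive is long,
    negative is short.
    """
    if nav <= 0:
        raise ValueError("nav must be positive")
    long_value = sum(v for v in positions.values() if v > 0)
    short_value = sum(-v for v in positions.values() if v < 0)
    gross = (long_value + short_value) / nav  # total capital at risk
    net = (long_value - short_value) / nav    # directional market bet
    return gross, net


# Hypothetical book: $750k long, $450k short, on $1m of capital
book = {"AAA": 400_000, "BBB": 350_000, "CCC": -300_000, "DDD": -150_000}
gross, net = exposures(book, nav=1_000_000)
print(f"gross={gross:.0%} net={net:.0%}")
```

For this book the result is 120% gross and 30% net: the fund has more than its capital deployed but only a modest long bias.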

About

  • 20+ years buy-side equity PM (SAC Capital/Point72, BAM, WRC), global healthcare across all six GICS industries. More generalist in recent years. CFA, MBA.
  • Built and systematized investment processes. Hired, trained, and developed analyst teams. Taught valuation and idea pitching internally across firms.
  • Building LLM evaluation frameworks and agentic tools for investment research and probability assignment — scoring rubrics, difficulty grading, multi-model leaderboards, adversarial red teams.
  • The failure modes in investment decisions — anchoring, false precision, narrative over data, footnote blindness — appear in LLM outputs too. These repos measure that.
  • Eval-first, adversarial by default, open source.

Python PyTorch Hugging Face LangGraph Polars Pydantic Streamlit Anthropic OpenAI SQLite Plotly ChromaDB NetworkX

Risk Decomposition L/S Portfolio Mgmt Quant + Fundamental Regulatory Risk Healthcare Sector LLM Evaluation

 

Curiosity compounds. Rigor endures.

LinkedIn

Pinned

  1. ls-portfolio-lab

    Long/short equity portfolio risk workbench: 40+ metrics, trade simulator, paper portfolio, PM scorecard. Streamlit + Polars + Plotly.

    Python

  2. multi-agent-investment-committee

    Multi-agent investment committee with structured reasoning, adversarial debate, eval harness, and RL-ready T signal.

    Python

  3. fin-reasoning-eval

    Benchmark for evaluating LLM performance on financial reasoning tasks

    Python

  4. investment-research-rag

    RAG pipeline for investment research over SEC filings, earnings transcripts, and equity research. FAISS/Qdrant vector stores, hybrid retrieval, reranking.

    Python

  5. investment-workflow-evals

    Domain expertise demonstration for AI training and evaluation in institutional investment research

    Python

  6. excel-model-eval

    A tool for structural analysis of Excel-based financial models

    Python