██████╗  █████╗ ██████╗ ████████╗██╗  ██╗    ████████╗██╗██╗    ██╗ █████╗ ██████╗ ██╗
██╔══██╗██╔══██╗██╔══██╗╚══██╔══╝██║  ██║    ╚══██╔══╝██║██║    ██║██╔══██╗██╔══██╗██║
██████╔╝███████║██████╔╝   ██║   ███████║       ██║   ██║██║ █╗ ██║███████║██████╔╝██║
██╔═══╝ ██╔══██║██╔══██╗   ██║   ██╔══██║       ██║   ██║██║███╗██║██╔══██║██╔══██╗██║
██║     ██║  ██║██║  ██║   ██║   ██║  ██║       ██║   ██║╚███╔███╔╝██║  ██║██║  ██║██║
╚═╝     ╚═╝  ╚═╝╚═╝  ╚═╝   ╚═╝   ╚═╝  ╚═╝       ╚═╝   ╚═╝ ╚══╝╚══╝ ╚═╝  ╚═╝╚═╝  ╚═╝╚═╝






◈   SYSTEM BOOT

$ initializing parth_tiwari.profile ...

[✓] identity          →  AI Systems Engineer
[✓] location          →  Bengaluru, India
[✓] status            →  open to the right problem
[✓] philosophy        →  first principles, not tutorials
[✓] vibe-coding       →  NOT DETECTED
[✓] evaluation        →  ACTIVE
[✓] systems deployed  →  3  (running right now, not on my laptop)
[✓] slides shipped    →  0

[READY] parth_tiwari.profile loaded successfully.



◈   WHO I AM   (told through what broke)

Most profiles show you the wins. Here's what actually happened.


Building a fraud engine. Backtesting revealed this:

train ROC-AUC      →  0.895   ← model looked great
production ROC     →  0.60    ← system was lying to itself the whole time

cause:  temporal features bled future signal into past training windows
fix:    20+ leakage validation tests, point-in-time enforcement, rebuilt from scratch
result: 0.895 ROC-AUC that's actually trustworthy
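The point-in-time enforcement above boils down to one invariant: a feature scored at time T may only see events strictly before T. A minimal sketch of one such leakage test, with `Txn` and `rolling_txn_count` as illustrative stand-ins for the real feature store, not its actual API:

```python
from datetime import datetime, timedelta

# Hypothetical point-in-time leakage test. The important part is the
# strict `< as_of` cutoff: rows at or after the scoring timestamp must
# never feed the feature.

class Txn:
    def __init__(self, user, ts, amount):
        self.user, self.ts, self.amount = user, ts, amount

def rolling_txn_count(history, user, as_of, window_hours=24):
    """Count the user's transactions in the window strictly before as_of."""
    lo = as_of - timedelta(hours=window_hours)
    return sum(1 for t in history if t.user == user and lo <= t.ts < as_of)

t0 = datetime(2024, 1, 1, 12, 0)
history = [
    Txn("u1", t0 - timedelta(hours=1), 500),   # past   -> counted
    Txn("u1", t0, 900),                        # at T   -> excluded
    Txn("u1", t0 + timedelta(hours=1), 900),   # future -> excluded
]
assert rolling_txn_count(history, "u1", as_of=t0) == 1
```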

Shipped a Text-to-SQL agent. Hallucination detector reported 100% hallucination:

hallucination_rate  →  100%   ← every query hallucinating?
actual rate         →  0%     ← the metric was wrong, not the system

cause:  schema_tables_used returned ["schema_dict", "tables"] — dict keys, not table names
fix:    one-line patch
lesson: I found this because I wrote a hallucination detector in the first place
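The bug class above is worth a sketch: the detector compared generated SQL against the wrapper dict's keys instead of the actual table names. A hypothetical version of the corrected check, with a deliberately naive FROM/JOIN regex and an illustrative schema shape:

```python
import re

# Hypothetical schema-grounding check. The fix is grounding against
# schema["tables"] (real table names) rather than schema.keys(), which
# would yield dict keys like ["tables"] and flag everything.

def tables_in_sql(sql):
    return {m.group(1).lower()
            for m in re.finditer(r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)", sql, re.I)}

def hallucinated_tables(sql, schema):
    known = {t.lower() for t in schema["tables"]}
    return tables_in_sql(sql) - known

schema = {"tables": ["orders", "customers"]}
sql = "SELECT o.id FROM orders o JOIN shipments s ON s.order_id = o.id"
assert hallucinated_tables(sql, schema) == {"shipments"}
```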

Deployed to Render. LLM mixed up two different databases:

question  →  "what is the total revenue?"      (ecommerce schema)
sql       →  SELECT SUM(amount) FROM fines      (library schema — wrong database entirely)

cause:  both schemas lived in the same Chroma collection, embeddings leaked cross-schema
fix:    prompt isolation + schema-scoped retrieval + re-evaluated full 82-query benchmark
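Schema-scoped retrieval means filtering on the schema tag *before* ranking, so an ecommerce question can never match library-schema chunks. A toy sketch under that assumption; the token-overlap score stands in for the real vector similarity, and all names are illustrative:

```python
# Hypothetical schema-scoped retrieval: hard-filter on the schema tag
# first, rank within the scoped pool second.

CHUNKS = [
    {"schema": "ecommerce", "text": "orders(order_id, amount, placed_at)"},
    {"schema": "ecommerce", "text": "payments(payment_id, amount, order_id)"},
    {"schema": "library",   "text": "fines(fine_id, amount, member_id)"},
]

def score(query, text):
    tokens = set(text.lower().replace("(", " ").replace(",", " ").split())
    return len(set(query.lower().split()) & tokens)

def retrieve(query, schema, k=2):
    pool = [c for c in CHUNKS if c["schema"] == schema]   # scope first
    return sorted(pool, key=lambda c: score(query, c["text"]),
                  reverse=True)[:k]                        # rank second

hits = retrieve("total amount of orders", schema="ecommerce")
assert all(h["schema"] == "ecommerce" for h in hits)
```

In Chroma terms this corresponds to keeping schemas in separate collections (or metadata-filtering at query time) instead of one shared collection.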

The pattern: I find these things because I build evaluation harnesses before I trust results.

- "it works on my machine" → ship it
+ measure → break it intentionally → fix it → measure again → then ship it



◈   MODEL CARD

model_id         : parth-tiwari-v1
type             : AI Systems Engineer  (fresher)
architecture     : first-principles → build → evaluate → break → fix → deploy
training_data    : production constraints, real failure modes, measurable outcomes

benchmarks:
  text_to_sql_execution_success  : 95.7%   # 82-query ecommerce benchmark
  cross_schema_generalization    : 100%    # zero-shot on unseen library schema
  syntactic_hallucination_rate   : 0.0%    # schema-grounded generation
  fraud_roc_auc                  : 0.895   # 590K transactions, temporal integrity enforced
  fraud_precision_in_budget      : 92%     # ≤0.5% daily alert constraint
  rag_faithfulness_score         : 0.80    # RAGAS evaluated, 20-query medical benchmark
  rag_overall_score              : 0.71    # holistic, not cherry-picked

serving:
  fraud_p95_latency   : < 500ms
  sql_agent_p50       : ~2.3s
  deployment          : Docker · Render · Neon PostgreSQL · Streamlit · HuggingFace

known_limitations    : still a fresher  ·  time will fix that  ·  upside: fresh perspective, rapid iteration, high ownership



◈   DEPLOYED SYSTEMS

Every link below is live. Not a demo. Not a notebook. A running service.



⚡   QUERYPILOT  ·  Self-Correcting Text-to-SQL Agent

Live API Source

  Natural Language
        │
        ▼
  Schema-Aware RAG  ──►  SQL Generator
                               │
                         Static Validator
                               │
               ┌───────────────┼───────────────┐
          Regex Repair       LLM Fix        Executor
               └───────────────┴───────────────┘
                        Self-Correction Loop
                           (max 3 attempts)
| Metric | Result | Context |
|---|---|---|
| First-attempt success | 90.0% | No correction, cold generation |
| After self-correction | 95.7% | 3-stage loop on 82-query benchmark |
| Hallucination rate | 0.0% | Zero invented tables or columns |
| Cross-schema generalization | 100% | Library schema, zero domain tuning |
| Cold-start reduction | ~400ms | Per-schema agent caching |
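The correction loop in the diagram above reduces to a few lines: validate, try a cheap regex repair, then fall back to an LLM rewrite, capped at 3 attempts. `validate`, `regex_repair`, and `llm_fix` are illustrative stubs, not QueryPilot's actual functions:

```python
import re

# Hypothetical sketch of the 3-stage self-correction loop.

def validate(sql):
    return sql.strip().rstrip(";").upper().startswith("SELECT")

def regex_repair(sql):
    # strip markdown fences the model sometimes wraps around SQL
    return re.sub(r"^```sql|```$", "", sql.strip(), flags=re.M).strip()

def llm_fix(sql, error):
    return "SELECT 1"   # stand-in for a real LLM repair call

def self_correct(sql, max_attempts=3):
    for attempt in range(max_attempts):
        if validate(sql):
            return sql, attempt
        sql = regex_repair(sql) if attempt == 0 else llm_fix(sql, "invalid SQL")
    raise ValueError("could not repair SQL within the attempt budget")

fixed, attempts = self_correct("```sql\nSELECT COUNT(*) FROM orders\n```")
assert fixed == "SELECT COUNT(*) FROM orders" and attempts == 1
```

The cheap deterministic repair runs before any LLM call, which is what keeps the corrected-path latency close to the cold path.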

Python LangGraph FastAPI ChromaDB PostgreSQL Docker GitHub Actions



🛡   UPI FRAUD ENGINE  ·  Real-Time Fraud Decision System

Live API Live UI Source

  HARD CONSTRAINTS (non-negotiable):
  ├── score transaction at T using only pre-T features   (no future leakage)
  ├── ≤ 0.5% daily alert budget                         (precision is everything)
  └── simulate delayed fraud labels                     (real-world label lag)

  590K transactions → 480+ point-in-time features → DuckDB feature store
  day-by-day backtest surfaced 0.895→0.60 train/serve drift → rebuilt
  A/B: XGBoost vs two-stage ensemble → winner selected under real budget
| Metric | Result | Context |
|---|---|---|
| ROC-AUC | 0.895 | Offline, leakage-validated |
| Precision in alert budget | 92% | Only flags what matters |
| P95 latency | < 500ms | Production SLA |
| Leakage tests | 20+ | Temporal integrity proven |
| Modeled fraud savings | ₹21.6 Cr/yr | Stakes were real |
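The day-by-day backtest above can be sketched as a loop: for each scoring day, train only on labels that had matured by then (modeling the label lag), then score that day's traffic. `train_fn` and `score_fn` are stubs standing in for the real training and scoring code:

```python
from datetime import date, timedelta

# Hypothetical day-by-day backtest skeleton with delayed-label handling.

def backtest(days, train_fn, score_fn, label_lag_days=7):
    results = []
    for day in days:
        cutoff = day - timedelta(days=label_lag_days)   # only matured labels
        model = train_fn(cutoff)                        # pre-cutoff data only
        results.append((day, score_fn(model, day)))     # score that day's traffic
    return results

days = [date(2024, 1, d) for d in (10, 11, 12)]
out = backtest(days,
               train_fn=lambda cutoff: {"trained_up_to": cutoff},
               score_fn=lambda model, day: 0.89)
assert out[0] == (date(2024, 1, 10), 0.89) and len(out) == 3
```

Comparing the rolling metric from this loop against the offline number is exactly what surfaced the 0.895 → 0.60 drift.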

Python XGBoost FastAPI DuckDB Great Expectations Docker



🧬   EVIDENCE-BOUND DRUG RAG  ·  Medical Knowledge Retrieval

Live App HuggingFace Source

  HARD CONSTRAINT: medical domain — hallucination is patient harm
  ├── every claim needs a citation source
  ├── adversarial queries must trigger refusal, not a guess
  └── faithfulness is measured, not assumed

  FDA + NICE PDFs → 853 semantic chunks → hybrid retrieval (vector + BM25)
  RAGAS benchmark: 20 structured queries → faithfulness 0.80, overall 0.71
  zero-score failure cases logged → retrieval + refusal logic refined
| Metric | Result | Context |
|---|---|---|
| RAGAS Faithfulness | 0.80 | Claims grounded in source |
| Overall RAGAS Score | 0.71 | Holistic, 20-query eval |
| Eval cost | $0.168 | Cost-aware, not burning tokens |
| Refusal behavior | Controlled | Hallucination suppressed by design |
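Hybrid retrieval has to merge a vector ranking with a BM25 ranking. Reciprocal rank fusion (RRF) is one common way to do that merge; this is a hedged sketch, not necessarily the fusion this pipeline uses:

```python
# RRF: a document's fused score is the sum of 1/(k + rank) across the
# rankings it appears in, so chunks both retrievers agree on rise to the top.

def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["c12", "c7", "c3"]    # semantic neighbours
bm25_hits   = ["c7", "c44", "c12"]   # lexical matches
fused = rrf([vector_hits, bm25_hits])
assert fused[:2] == ["c7", "c12"]    # agreed-on chunks win
```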

Python FastAPI ChromaDB SentenceTransformers LangChain RAGAS Streamlit




◈   HOW I ACTUALLY BUILD

step 1  →  define what "working" means before writing a single line
step 2  →  build the evaluation harness
step 3  →  write the system
step 4  →  break it intentionally  (adversarial inputs, edge cases, drift simulation)
step 5  →  fix what breaks
step 6  →  measure again
step 7  →  deploy with monitoring hooks
step 8  →  repeat when production proves you wrong
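Steps 1 and 2 can be made concrete: pin "working" down as executable assertions before building anything. Threshold values and field names below are illustrative, loosely mirroring the benchmarks above:

```python
# Hypothetical acceptance harness, written before the system exists.

ACCEPTANCE = {
    "execution_success": 0.95,   # min fraction of benchmark queries that run
    "hallucination_rate": 0.0,   # max fraction inventing tables/columns
}

def evaluate(run):
    n = len(run)
    return {
        "execution_success": sum(r["executed"] for r in run) / n,
        "hallucination_rate": sum(r["hallucinated"] for r in run) / n,
    }

def passes(metrics, acceptance=ACCEPTANCE):
    return (metrics["execution_success"] >= acceptance["execution_success"]
            and metrics["hallucination_rate"] <= acceptance["hallucination_rate"])

run = ([{"executed": True, "hallucinated": False}] * 96
       + [{"executed": False, "hallucinated": False}] * 4)
assert passes(evaluate(run))   # 96% execution, 0% hallucination -> ship
```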

This is how 0.895 ROC-AUC became trustworthy instead of suspicious. This is how "100% hallucination" turned out to be a metric bug, not a model bug. This is how a 3-stage correction loop beat a bigger model with a better prompt.




◈   STACK

Python SQL XGBoost LangGraph LangChain FastAPI Docker ChromaDB DuckDB PostgreSQL Streamlit GitHub Actions




◈   STATS








$ ./parth --shutdown

[saving state]   ✓  3 systems deployed
[saving state]   ✓  all evaluation harnesses active
[saving state]   ✓  no hallucinations in production
[saving state]   ✓  open to the right problem

[goodbye]  see you on the other side of the next PR.
