Nishad Dhakephalkar

AI/GenAI engineer building production-oriented AI systems - RAG, agents, evaluation, and safety. Computer Engineering @ I²IT Pune · Grad 2027 · Pune, India

I care more about the parts of an AI system that decide whether it actually works - retrieval quality, evaluation, safety, and orchestration - than about model hype. Most of what I build is an attempt to make LLM behaviour reliable and measurable, not just demo-able.

Selected work

AI Evaluation Platform · live demo ↗
An LLM-as-judge that scores outputs against a rubric with per-criterion written reasoning, plus pairwise A/B comparison that runs both orderings (A-B and B-A) to neutralise the judge's own position bias and flags when the two disagree.
Next.js · TypeScript · FastRouter · Supabase
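A minimal sketch of the both-orderings check, not the platform's actual code. The `judge` callable and its `"first"`/`"second"` return values are hypothetical stand-ins for whatever the real LLM call returns.

```python
from typing import Callable, Optional, TypedDict

# Hypothetical judge signature: (prompt, first_answer, second_answer) -> "first" | "second".
Judge = Callable[[str, str, str], str]


class Verdict(TypedDict):
    winner: Optional[str]       # "A", "B", or None when the two orderings disagree
    position_bias_flag: bool    # True when swapping the order changed the verdict


def pairwise_verdict(judge: Judge, prompt: str, a: str, b: str) -> Verdict:
    forward = judge(prompt, a, b)  # A is shown first
    reverse = judge(prompt, b, a)  # B is shown first

    # Map the positional answers back to the underlying candidates.
    forward_winner = "A" if forward == "first" else "B"
    reverse_winner = "B" if reverse == "first" else "A"

    if forward_winner == reverse_winner:
        return {"winner": forward_winner, "position_bias_flag": False}
    # The verdict followed the slot, not the content: report it instead of picking a side.
    return {"winner": None, "position_bias_flag": True}
```

A judge that always prefers whichever answer it sees first gets flagged here rather than silently crowning A.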

ServiceBench · live on 🤗 Spaces ↗
An OpenEnv-compatible environment for training LLM agents to orchestrate calls across three interconnected backend services. The agent has to traverse user → order → inventory foreign keys in the right order, not just call one tool in isolation, with dense reward shaping over milestones and completion. Built for the Meta × Hugging Face × PyTorch hackathon.
Python · FastAPI · Docker · HF Spaces

RAG Knowledge Assistant
Local-first document Q&A that runs fully offline - Ollama/Gemma for inference, Supabase pgvector (HNSW) for retrieval, and a hybrid reranker that blends vector similarity with keyword, position, and recency signals. No API keys, nothing leaves the machine.
Next.js · LangChain.js · Ollama · pgvector
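One way the signal blending could look, as a rough sketch. The weights, the 30-day half-life, the `Chunk` fields, and the whitespace keyword match are all illustrative assumptions, not the project's real values.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    vector_score: float  # similarity from the vector store, assumed to lie in [0, 1]
    position: int        # 0-based index of the chunk within its source document
    age_days: float      # time since the source document was last updated


def keyword_overlap(query: str, text: str) -> float:
    """Fraction of query terms that appear in the chunk (naive whitespace tokens)."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & set(text.lower().split())) / len(query_terms)


def hybrid_score(query: str, chunk: Chunk,
                 weights=(0.6, 0.2, 0.1, 0.1),  # assumed: vector, keyword, position, recency
                 half_life_days: float = 30.0) -> float:
    w_vec, w_kw, w_pos, w_rec = weights
    position_score = 1.0 / (1.0 + chunk.position)            # earlier chunks score higher
    recency_score = 0.5 ** (chunk.age_days / half_life_days)  # exponential decay with age
    return (w_vec * chunk.vector_score
            + w_kw * keyword_overlap(query, chunk.text)
            + w_pos * position_score
            + w_rec * recency_score)


def rerank(query: str, chunks: list[Chunk]) -> list[Chunk]:
    return sorted(chunks, key=lambda c: hybrid_score(query, c), reverse=True)
```

With these weights a fresh, keyword-matching chunk can outrank one with a slightly higher raw vector score, which is the point of blending.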

AI Safety Harness
A red-team platform that runs adversarial prompts through a five-layer guardrail pipeline - jailbreaks, prompt injection, harmful content, role manipulation, encoding tricks - and scores which layer caught or missed each attack, with incident logging.
Python · FastAPI · Docker · PostgreSQL

Multi-Agent Content Pipeline
Four coordinated agents - researcher → writer → fact-checker → polisher - with quality gates and a revision loop where the fact-checker can send a draft back to the writer under a bounded retry budget.
Next.js · LangChain · Gemini · Tavily
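The bounded revision loop can be sketched roughly like this. `write` and `fact_check` are hypothetical callables standing in for the writer and fact-checker agents, and the default budget of 2 is an assumption.

```python
from typing import Callable, Optional

# Hypothetical agent signatures, for illustration only.
Writer = Callable[[str, Optional[list[str]]], str]  # (topic, feedback) -> draft
FactChecker = Callable[[str], list[str]]            # draft -> list of issues (empty = pass)


def run_with_revisions(write: Writer, fact_check: FactChecker, topic: str,
                       max_revisions: int = 2) -> tuple[str, int, bool]:
    """Return (final_draft, revisions_used, passed_fact_check)."""
    draft = write(topic, None)
    revisions = 0
    while True:
        issues = fact_check(draft)
        if not issues:
            return draft, revisions, True
        if revisions == max_revisions:
            # Budget exhausted: hand back the last draft, marked as not passing.
            return draft, revisions, False
        draft = write(topic, issues)  # send the draft back with the checker's feedback
        revisions += 1
```

The explicit budget means the loop always terminates, even if the writer and fact-checker never converge.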

GST Shield · live demo ↗
Scans receipts with Claude Vision to extract GSTINs, then validates them deterministically - format, state code, and mod-36 checksum - instead of trusting the model's raw output. Built at a hackathon.
Next.js · Claude Vision · Supabase
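A sketch of the three deterministic checks, assuming the commonly described GSTIN layout (2-digit state code, 10-character PAN, entity character, `Z`, check character) and a Luhn-mod-36 style checksum. The regex and the state-code set are illustrative assumptions and should be checked against the official list; this is not the project's actual validator.

```python
import re

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Assumed layout: state code, PAN (5 letters, 4 digits, 1 letter), entity char, 'Z', check char.
GSTIN_PATTERN = re.compile(r"\d{2}[A-Z]{5}\d{4}[A-Z][1-9A-Z]Z[0-9A-Z]")
# Illustrative assumption only: codes 01-38 plus 97.
VALID_STATE_CODES = {f"{n:02d}" for n in range(1, 39)} | {"97"}


def check_char(first_14: str) -> str:
    """Luhn-mod-36 style check character over the first 14 characters."""
    total = 0
    factor = 2  # the rightmost of the 14 characters is weighted 2, then alternate 1, 2, ...
    for ch in reversed(first_14):
        product = ALPHABET.index(ch) * factor
        total += product // 36 + product % 36
        factor = 1 if factor == 2 else 2
    return ALPHABET[(36 - total % 36) % 36]


def validate_gstin(gstin: str) -> tuple[bool, str]:
    gstin = gstin.strip().upper()
    if not GSTIN_PATTERN.fullmatch(gstin):
        return False, "format"
    if gstin[:2] not in VALID_STATE_CODES:
        return False, "state code"
    if check_char(gstin[:14]) != gstin[14]:
        return False, "checksum"
    return True, "ok"
```

For the sample value `27AAPFU0939F1ZV`, the check character under this scheme works out to `V`, so a model that misreads even one character usually fails the checksum step.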

Currently

Deepening the evaluation and agent-eval work - single-output scoring, pairwise judging, position-bias mitigation - and digging into retrieval-quality metrics for RAG.

Reach me

ndhakeph@gmail.com · LinkedIn · 🤗 Hugging Face
