HemantBK/README.md

Hemant Kumar B K


LinkedIn Email arXiv


ML Engineer building production-grade AI systems with safety at the core. Currently researching Multi-Agent RL for cybersecurity at the University of Arizona and co-authoring StepShield, a safety benchmark for autonomous code agents (submitted to ICML 2026). Previously built recommendation engines at Escape LLC (30% engagement lift) and agentic RAG chatbots at Omdena (95% reduction in harmful responses).

I don't treat AI safety as a checkbox; I treat it as an engineering discipline.


🔬 Research

🛡️ StepShield - Co-Author  Paper ICML 2026

First benchmark for evaluating when autonomous code agents go rogue, not just whether they do. Detects specification violations (data exfiltration, unauthorized access) in real time across 9,213 agent trajectories. Early detection cuts monitoring costs by 75% (~$108M projected savings).

Python PyTorch LLM Safety Red-Teaming Autonomous Agents


🚀 Featured Projects

🛡️ LLM Eval Pipeline

Production-grade LLM evaluation + red-teaming

Hybrid n8n + FastAPI architecture with 4 LLM providers, LLM-as-Judge scoring, a circuit breaker, a dead-letter queue (DLQ), Redis caching, and Prometheus/Grafana monitoring.

Python FastAPI Redis Prometheus Red-Teaming

Code
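As an illustration of how a circuit breaker and a dead-letter queue can fit together around an LLM provider call, here is a minimal sketch. It is not taken from the repository: `CircuitBreaker`, `call_with_breaker`, the thresholds, and the in-memory `deque` standing in for the DLQ are all hypothetical names and choices made for this example.

```python
import time
from collections import deque


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, then allows trial
    calls again once `reset_after` seconds have passed (simplified half-open)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock            # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None         # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # After the cool-down, let calls through again to probe the provider.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def call_with_breaker(breaker, provider, request, dead_letters: deque):
    """Call `provider(request)`; park the request in `dead_letters` on failure
    or when the circuit is open, and return None in both cases."""
    if not breaker.allow():
        dead_letters.append((request, "circuit open"))
        return None
    try:
        result = provider(request)
    except Exception as exc:          # any provider error counts as a failure
        breaker.record_failure()
        dead_letters.append((request, repr(exc)))
        return None
    breaker.record_success()
    return result
```

Requests that land in the dead-letter queue can be replayed later once the provider recovers; a production setup would typically persist them (for example in Redis) rather than keep them in memory.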

πŸ” MLShield

ML-infra-aware defense for model weights

Protects against model-weight exfiltration using a 3-layer cascaded architecture (Rules → ML → LLM). Kubernetes-native, GPU-aware anomaly detection.

Python Kubernetes Model Security Anomaly Detection

Code
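The cascade idea behind MLShield (rules first, then ML, then an LLM) can be sketched roughly as follows: each cheap layer either decides or escalates, so the expensive final layer only sees the ambiguous cases. This is a hedged illustration, not the project's code. `TransferEvent`, the thresholds, the z-score stand-in for the ML layer, and `stub_llm_judge` are all assumptions invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class TransferEvent:
    """Hypothetical record of one outbound transfer from a model store."""
    destination: str
    bytes_out: int


@dataclass(frozen=True)
class Verdict:
    layer: str          # which layer made the final call
    suspicious: bool


# A layer returns True/False when confident, or None to escalate.
Layer = Callable[[TransferEvent], Optional[bool]]


def rule_layer(event: TransferEvent) -> Optional[bool]:
    """Cheap static rules; the thresholds are made up for illustration."""
    if event.destination.endswith(".internal"):
        return False
    if event.bytes_out > 50 * 1024**3:
        return True
    return None


def make_zscore_layer(mean: float, std: float) -> Layer:
    """Stand-in for the ML layer: a z-score on transfer size."""
    def layer(event: TransferEvent) -> Optional[bool]:
        z = (event.bytes_out - mean) / std
        if z < 1.0:
            return False
        if z > 4.0:
            return True
        return None     # ambiguous band: escalate to the next layer
    return layer


def stub_llm_judge(event: TransferEvent) -> bool:
    """Placeholder for an LLM call; flags everything that reaches it."""
    return True


def run_cascade(event: TransferEvent,
                layers: List[Tuple[str, Layer]],
                final_judge: Callable[[TransferEvent], bool]) -> Verdict:
    """Try each layer in order; fall through to the final judge if none decides."""
    for name, layer in layers:
        decision = layer(event)
        if decision is not None:
            return Verdict(name, decision)
    return Verdict("llm", final_judge(event))


LAYERS = [("rules", rule_layer),
          ("ml", make_zscore_layer(mean=1e9, std=5e8))]
```

Ordering the layers from cheapest to most expensive keeps per-event cost low, since most traffic is resolved before the final layer is ever invoked.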

βš–οΈ LLM Bias Sentinel

7-benchmark bias evaluation + guardrails

Open-source LLM bias evaluation framework with red-teaming, guardrails, and monitoring β€” all running locally via Ollama. Zero API costs.

Python Ollama Red-Teaming Guardrails Responsible AI

Code

💰 Dynamic Pricing Engine

Production-grade ML pricing system

XGBoost demand forecasting + price elasticity estimation + scipy revenue optimization. FastAPI serving, Streamlit dashboard, MLflow tracking, Evidently drift monitoring.

Python XGBoost FastAPI MLflow Streamlit

Code

πŸ—£οΈ AI Voice Assistant

Full-stack speech pipeline: STT β†’ LLM β†’ TTS

End-to-end voice assistant running entirely on your own machine β€” FastAPI backend, React frontend, Docker. Private by design: zero cloud calls.

Python FastAPI React Docker LLM

Code

🏍️ RideShala

AI motorcycle advisor for Indian riders

RAG over motorcycle specs with vLLM serving, Qdrant vector store, FastAPI. Personalized bike recommendations with source citations.

Python vLLM RAG Qdrant FastAPI

Code


📂 All Projects

🛡️ AI Safety & Responsible AI

  • chatbot-auditor: Quality auditor for AI chatbots; analyzes conversation logs to surface where bots underperform.
  • credit-scoring-fairness-mlops: End-to-end MLOps with automated fairness gates, drift monitoring, EU AI Act compliance (XGBoost, Fairlearn, MLflow).
  • healthcare-bias-audit: Bias audit of healthcare ML on the MEPS dataset; AIF360 mitigation, SHAP/LIME explainability.

🤖 LLM Systems & RAG

  • AI-Chief: Food science assistant with multi-agent RAG, real-time safety monitoring, dangerous-advice detection (TypeScript, Fastify, HNSW).
  • Interactive-Multilingual-AI-Audiobook-Assistant: OCR extraction → neural TTS → multilingual translation → real-time Q&A audiobook pipeline.
  • AI-Wildlife-Tracker: RAG identifying 500+ Indian wildlife species from text or photos; hybrid retrieval, ONNX inference, Langfuse observability.

⚙️ Applied ML & MLOps


πŸ› οΈ Tech Stack

πŸ’» Languages

Python C++ SQL Java Bash

πŸ€– ML / DL

PyTorch TensorFlow HuggingFace scikit-learn W&B

🧠 LLM & Agents

LangChain vLLM Ollama ONNX CrewAI RAG

πŸ› οΈ MLOps / Cloud

AWS GCP Docker Kubernetes FastAPI MLflow GitHub Actions

πŸ“Š Observability

Prometheus Grafana Evidently Langfuse

πŸ›‘οΈ AI Safety & Responsible AI

Red Teaming AIF360 Fairlearn SHAP Guardrails

πŸ—„οΈ Data

PostgreSQL Redis Pandas Power BI Tableau


Open to ML Engineer, AI Safety, and AI Researcher roles (remote & relocation)
Let's build AI systems that are powerful AND trustworthy.

Pinned

  1. dynamic-pricing-engine (Public)

    Production-grade ML pricing system: XGBoost demand forecasting + price elasticity estimation + scipy revenue optimization. FastAPI serving, Streamlit dashboard, MLflow tracking, Evidently drift mo…

    Python

  2. AI-Voice-Assistant (Public)

    A voice assistant that runs entirely on your own computer instead of the cloud. It lets you talk to an AI and hear a natural-sounding response back in real-time, keeping your voice data 100% privat…

    Python

  3. Multilingual-Sentiment-Emotion-Intelligence-Engine (Public)

    Production-grade multilingual sentiment & emotion analysis engine covering 5 languages + Hindi-English code-switching. Multi-task XLM-RoBERTa with LoRA adapters, ONNX INT8 inference, and cross-ling…

    Python

  4. Algorithmic-Trading-AI (Public)

    AI-powered algorithmic trading system that combines FinBERT sentiment analysis, spaCy NER, and TimeGPT forecasting to generate BUY/SELL/HOLD signals from real-time financial news.

    Jupyter Notebook 18

  5. LLaMA-Sum-Fine-Tuning (Public)

    Fine-tuned Meta's LLaMA 3.2 1B for text summarization using QLoRA (4-bit quantization + LoRA), achieving 40%+ improvement in ROUGE-2 over the base model on CNN/DailyMail dataset.

    Python 3

  6. MLShield (Public)

    ML-infrastructure-aware anomaly detection system for protecting model weights against exfiltration, using a 3-layer cascaded architecture (Rules → ML → LLM).

    Python