I am a Generative AI Engineer working at the intersection of high-performance inference and complex RAG orchestration. With two years of experience, I focus on moving research-grade models into production-ready systems using quantization (AutoRound), vLLM optimization, and stateful agentic workflows.
- Current Focus: Investigating compatibility between AutoRound W4A16 quantization and vLLM for Vision-Language Models.
- Core Philosophy: "Optimization is not just about speed; it's about making advanced AI architecturally sustainable."
- Quantization: Advanced W4A16/INT4 quantization using AutoRound.
- High-Throughput Serving: Production deployment of LLMs/VLMs via vLLM.
- GPU Efficiency: Memory-efficient serving and compute optimization for NVIDIA RTX 3090/4070/5090 and AWS G5 instances.
- Hybrid Retrieval: Orchestrating BM25 (Elasticsearch) and Dense Vector Search.
- Advanced RAG: Implementation of LightRAG and GraphRAG architectures for complex document intelligence.
- Infrastructure: Building scalable backends with PostgreSQL, Neo4j, and FastAPI.
- Stateful Agents: Designing multi-step reasoning pipelines using LangGraph.
- Tool Integration: Autonomous agent systems with persistent memory and complex tool-calling capabilities.
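
As a small illustration of the hybrid retrieval item above, here is a minimal sketch of one common way to merge a BM25 ranking with a dense-vector ranking: reciprocal rank fusion (RRF). RRF is an assumption chosen for illustration, not necessarily the fusion method used in my projects. The function name, the document IDs, and the constant `k=60` are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in,
    with rank starting at 1. `k` dampens the weight of top ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists, e.g. from Elasticsearch (BM25) and a vector index.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Documents that appear in both lists rise to the top, which is the main reason to combine lexical and dense retrieval.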
Active contributor to the HKUDS/LightRAG ecosystem, focusing on enterprise-grade integrations:
- Gemini Integration: Implemented the Google Gemini demo for the core framework.
- Storage Backends: Developed the PostgreSQL-backed LightRAG implementation.
- Enterprise Features: Added workspace isolation demos for multi-tenant knowledge management.
- PRs: #2538 | #2556 | #2615
Technical investigation into the export pipelines for quantized Vision-Language Models:
- Analyzing AutoRound W4A16 behavior with Qwen3-VL-8B.
- Debugging AWQ export compatibility for seamless vLLM integration.
- Issue: intel/auto-round #1377
| Category | Tools & Technologies |
|---|---|
| AI Frameworks | LangChain, LangGraph, LightRAG, PyTorch |
| Inference/Quant | vLLM, AutoRound |
| Data/Search | Elasticsearch, PostgreSQL, Neo4j, Redis |
| Infrastructure | Docker, AWS (ECS, ECR, G5), RunPod |
| Languages | Python (Advanced), SQL |
- LinkedIn: vishva-r
- Instagram: @justt_vishva
- Portfolio: GitHub Repositories
Building the infrastructure that makes AI smarter, faster, and more accessible.
