
# 🌌 Vishva Ram | Generative AI Engineer

LLM Optimization • Hybrid RAG • Agentic Systems



## ⚡ Professional Profile

I am a Generative AI Engineer specializing in the intersection of High-Performance Inference and Complex RAG Orchestration. With 2 years of experience, I focus on transitioning research-grade models into production-ready systems using quantization (AutoRound), vLLM optimization, and stateful agentic workflows.

  • Current Focus: Investigating compatibility between AutoRound W4A16 quantization and vLLM for Vision-Language Models.
  • Core Philosophy: "Optimization is not just about speed; it's about making advanced AI architecturally sustainable."

## 🧠 Core Expertise

### 🤖 LLM Optimization & Inference

  • Quantization: Advanced W4A16/INT4 quantization using AutoRound.
  • High-Throughput Serving: Production deployment of LLMs/VLMs via vLLM.
  • GPU Efficiency: Memory-efficient serving and compute optimization for NVIDIA RTX 3090/4070/5090 and AWS G5 instances.
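The W4A16 idea above (INT4 weights, higher-precision activations) can be illustrated with a toy group-wise quantization round trip. This is a from-scratch sketch of the general technique, not AutoRound's actual algorithm; the `quantize_int4` helper, the group size, and the sample weights are all hypothetical.

```python
def quantize_int4(weights, group_size=4):
    """Toy group-wise asymmetric INT4 quantization round trip.

    Each group of `group_size` weights gets its own scale and zero point,
    codes are clamped to the 16 levels of INT4, then dequantized back.
    """
    out = []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / 15.0 or 1.0  # 4 bits -> 16 levels, guard zero range
        for w in group:
            code = min(15, max(0, round((w - lo) / scale)))  # stored INT4 code
            out.append(code * scale + lo)  # dequantized (the "A16" side)
    return out

weights = [0.10, -0.32, 0.57, 0.01, 1.20, -0.88, 0.44, 0.05]
dequant = quantize_int4(weights)
max_err = max(abs(a - b) for a, b in zip(weights, dequant))
```

The reconstruction error is bounded by half a quantization step per group; real quantizers such as AutoRound go further and tune the rounding decisions with gradient-based optimization instead of plain nearest-rounding.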

### 📚 RAG & Search Engineering

  • Hybrid Retrieval: Orchestrating BM25 (Elasticsearch) and Dense Vector Search.
  • Advanced RAG: Implementation of LightRAG and GraphRAG architectures for complex document intelligence.
  • Infrastructure: Building scalable backends with PostgreSQL, Neo4j, and FastAPI.
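A common way to orchestrate BM25 and dense results is Reciprocal Rank Fusion (RRF), which uses only rank positions and so sidesteps the incompatible score scales of the two retrievers. The sketch below is a minimal, dependency-free version; the function name and sample document IDs are illustrative, not taken from any particular codebase.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists with Reciprocal Rank Fusion.

    `rankings` is a list of doc-id lists, each ordered best-first.
    Each doc scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]   # lexical (BM25) ranking
dense_hits = ["doc_b", "doc_d", "doc_a"]  # dense-vector ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Because Elasticsearch BM25 scores and cosine similarities live on different scales, rank-based fusion like this is a robust default before any reranking stage.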

### ⛓️ Agentic Workflows

  • Stateful Agents: Designing multi-step reasoning pipelines using LangGraph.
  • Tool Integration: Autonomous agent systems with persistent memory and complex tool-calling capabilities.
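In LangGraph such pipelines are expressed as typed state graphs with checkpointed state; as a dependency-free illustration of the same idea (shared state, persistent memory, tool calls across steps), here is a toy agent loop. The `run_agent` helper, its simplistic policy, and the sample tools are hypothetical.

```python
def run_agent(question, tools, max_steps=5):
    """Toy stateful agent loop: each step reads the shared state,
    calls at most one tool, and appends the result to persistent memory."""
    state = {"question": question, "memory": [], "answer": None}
    for _ in range(max_steps):
        called = {step["tool"] for step in state["memory"]}
        pending = [name for name in tools if name not in called]
        if pending:                          # still have tools left to run
            name = pending[0]
            result = tools[name](state)      # each tool sees the full state
            state["memory"].append({"tool": name, "result": result})
        else:                                # all tools done: compose answer
            state["answer"] = " | ".join(s["result"] for s in state["memory"])
            break
    return state

tools = {
    "search": lambda s: f"notes on '{s['question']}'",
    "summarize": lambda s: f"summary of {len(s['memory'])} finding(s)",
}
final = run_agent("hybrid RAG", tools)
```

The point of the sketch is the control flow: state persists across steps, each tool can condition on everything gathered so far, and a terminal condition decides when to stop and answer.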

## 🧪 Contributions

πŸ›οΈ LightRAG Contributor

Active contributor to the HKUDS/LightRAG ecosystem, focusing on enterprise-grade integrations:

  • Gemini Integration: Implemented the Google Gemini demo for the core framework.
  • Storage Backends: Developed the PostgreSQL-backed LightRAG implementation.
  • Enterprise Features: Added workspace isolation demos for multi-tenant knowledge management.
  • PRs: #2538 | #2556 | #2615

βš™οΈ AutoRound + vLLM Compatibility

Technical investigation into the export pipelines for quantized Vision-Language Models:

  • Analyzing AutoRound W4A16 behavior with Qwen3-VL-8B.
  • Debugging AWQ export compatibility for seamless vLLM integration.
  • Issue: intel/auto-round #1377

## 💻 Technical Stack

| Category | Tools & Technologies |
| --- | --- |
| AI Frameworks | LangChain, LangGraph, LightRAG, PyTorch |
| Inference/Quant | vLLM, AutoRound |
| Data/Search | Elasticsearch, PostgreSQL, Neo4j, Redis |
| Infrastructure | Docker, AWS (ECS, ECR, G5), RunPod |
| Languages | Python (Advanced), SQL |

## 📊 GitHub Performance




## 🌍 Connect


⭐️ Building the infrastructure that makes AI smarter, faster, and more accessible.

## 📌 Pinned Repositories

1. **Structured-Output-Examples-for-LLMs** (Python): Demonstrates structured data extraction using various language models and frameworks. Includes examples of generating JSON outputs for name and age extraction from text prompts. …
2. **Data-Prep-for-LLM-fine-tuning** (Jupyter Notebook): Helps prepare datasets for fine-tuning Large Language Models (LLMs). Includes tools for cleaning, formatting, and augmenting data to improve model performance. Designed for resea…
3. **Blog-Writing-Agentic-RAG-CrewAI** (Python): An automated blog writing system that leverages CrewAI to create high-quality, well-researched blog posts. Implements a multi-agent workflow for researching topics, generating content, …
4. **Unsloth-FineTuning** (Jupyter Notebook): Fine-tuning Qwen 2.5 3B on Reserve Bank of India (RBI) regulations using Unsloth for efficient training. Achieved 57.6% accuracy (8.2x improvement over the base model).