
# 🌌 Vishva Ram | Generative AI Engineer

LLM Optimization • Hybrid RAG • Agentic Systems



## ⚡ Professional Profile

I am a Generative AI Engineer specializing in the intersection of High-Performance Inference and Complex RAG Orchestration. With 2 years of experience, I focus on transitioning research-grade models into production-ready systems using quantization (AutoRound), vLLM optimization, and stateful agentic workflows.

  • Current Focus: Investigating compatibility between AutoRound W4A16 quantization and vLLM for Vision-Language Models.
  • Core Philosophy: "Optimization is not just about speed; it's about making advanced AI architecturally sustainable."

## 🧠 Core Expertise

### 🤖 LLM Optimization & Inference

  • Quantization: Advanced W4A16/INT4 quantization using AutoRound.
  • High-Throughput Serving: Production deployment of LLMs/VLMs via vLLM.
  • GPU Efficiency: Memory-efficient serving and compute optimization for NVIDIA RTX 3090/4070/5090 and AWS G5 instances.
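The W4A16 idea above (INT4 weights, higher-precision activations) can be illustrated with a toy group-wise quantization round trip. This is a from-scratch sketch of the general technique, not AutoRound's actual algorithm; the `quantize_int4` helper, the group size, and the sample weights are all hypothetical.

```python
def quantize_int4(weights, group_size=4):
    """Toy group-wise asymmetric INT4 quantization round trip.

    Each group of `group_size` weights gets its own scale and zero point,
    codes are clamped to the 16 levels of INT4, then dequantized back.
    """
    out = []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / 15.0 or 1.0  # 4 bits -> 16 levels, guard zero range
        for w in group:
            code = min(15, max(0, round((w - lo) / scale)))  # stored INT4 code
            out.append(code * scale + lo)  # dequantized (the "A16" side)
    return out

weights = [0.10, -0.32, 0.57, 0.01, 1.20, -0.88, 0.44, 0.05]
dequant = quantize_int4(weights)
max_err = max(abs(a - b) for a, b in zip(weights, dequant))
```

The reconstruction error is bounded by half a quantization step per group; real quantizers such as AutoRound go further and tune the rounding decisions with gradient-based optimization instead of plain nearest-rounding.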

### 📚 RAG & Search Engineering

  • Hybrid Retrieval: Orchestrating BM25 (Elasticsearch) and Dense Vector Search.
  • Advanced RAG: Implementation of LightRAG and GraphRAG architectures for complex document intelligence.
  • Infrastructure: Building scalable backends with PostgreSQL, Neo4j, and FastAPI.
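A common way to orchestrate BM25 and dense results is Reciprocal Rank Fusion (RRF), which uses only rank positions and so sidesteps the incompatible score scales of the two retrievers. The sketch below is a minimal, dependency-free version; the function name and sample document IDs are illustrative, not taken from any particular codebase.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists with Reciprocal Rank Fusion.

    `rankings` is a list of doc-id lists, each ordered best-first.
    Each doc scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]   # lexical (BM25) ranking
dense_hits = ["doc_b", "doc_d", "doc_a"]  # dense-vector ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Because Elasticsearch BM25 scores and cosine similarities live on different scales, rank-based fusion like this is a robust default before any reranking stage.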

### ⛓️ Agentic Workflows

  • Stateful Agents: Designing multi-step reasoning pipelines using LangGraph.
  • Tool Integration: Autonomous agent systems with persistent memory and complex tool-calling capabilities.
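In LangGraph such pipelines are expressed as typed state graphs with checkpointed state; as a dependency-free illustration of the same idea (shared state, persistent memory, tool calls across steps), here is a toy agent loop. The `run_agent` helper, its simplistic policy, and the sample tools are hypothetical.

```python
def run_agent(question, tools, max_steps=5):
    """Toy stateful agent loop: each step reads the shared state,
    calls at most one tool, and appends the result to persistent memory."""
    state = {"question": question, "memory": [], "answer": None}
    for _ in range(max_steps):
        called = {step["tool"] for step in state["memory"]}
        pending = [name for name in tools if name not in called]
        if pending:                          # still have tools left to run
            name = pending[0]
            result = tools[name](state)      # each tool sees the full state
            state["memory"].append({"tool": name, "result": result})
        else:                                # all tools done: compose answer
            state["answer"] = " | ".join(s["result"] for s in state["memory"])
            break
    return state

tools = {
    "search": lambda s: f"notes on '{s['question']}'",
    "summarize": lambda s: f"summary of {len(s['memory'])} finding(s)",
}
final = run_agent("hybrid RAG", tools)
```

The point of the sketch is the control flow: state persists across steps, each tool can condition on everything gathered so far, and a terminal condition decides when to stop and answer.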

## 🧪 Contributions

πŸ›οΈ LightRAG Contributor

Active contributor to the HKUDS/LightRAG ecosystem, focusing on enterprise-grade integrations:

  • Gemini Integration: Implemented the Google Gemini demo for the core framework.
  • Storage Backends: Developed the PostgreSQL-backed LightRAG implementation.
  • Enterprise Features: Added workspace isolation demos for multi-tenant knowledge management.
  • PRs: #2538 | #2556 | #2615

βš™οΈ AutoRound + vLLM Compatibility

Technical investigation into the export pipelines for quantized Vision-Language Models:

  • Analyzing AutoRound W4A16 behavior with Qwen3-VL-8B.
  • Debugging AWQ export compatibility for seamless vLLM integration.
  • Issue: intel/auto-round #1377

## 💻 Technical Stack

| Category | Tools & Technologies |
| --- | --- |
| AI Frameworks | LangChain, LangGraph, LightRAG, PyTorch |
| Inference/Quant | vLLM, AutoRound |
| Data/Search | Elasticsearch, PostgreSQL, Neo4j, Redis |
| Infrastructure | Docker, AWS (ECS, ECR, G5), RunPod |
| Languages | Python (Advanced), SQL |

## 📊 GitHub Performance




## 🌍 Connect


⭐️ Building the infrastructure that makes AI smarter, faster, and more accessible.

## 📌 Pinned Repositories

1. **Structured-Output-Examples-for-LLMs** (Python): Demonstrates structured data extraction using various language models and frameworks. Includes examples of generating JSON outputs for name and age extraction from text prompts. …
2. **Data-Prep-for-LLM-fine-tuning** (Jupyter Notebook): Helps prepare datasets for fine-tuning Large Language Models (LLMs). Includes tools for cleaning, formatting, and augmenting data to improve model performance. Designed for resea…
3. **Blog-Writing-Agentic-RAG-CrewAI** (Python): An automated blog writing system that leverages CrewAI to create high-quality, well-researched blog posts. Implements a multi-agent workflow for researching topics, generating content, …
4. **Unsloth-FineTuning** (Jupyter Notebook): Fine-tuning Qwen 2.5 3B on Reserve Bank of India (RBI) regulations using Unsloth for efficient training. Achieved 57.6% accuracy (8.2x improvement over the base model).