I build systems, not just services: distributed backends, infrastructure, and AI working together at scale.
Software Engineer with experience building large-scale distributed systems, AI platforms, and real-time infrastructure.
- Built 50+ production-grade microservices powering enterprise systems
- Designed event-driven architectures using Kafka, Spark & Kubernetes
- Deployed and scaled LLMs on Kubernetes using Ray & DeepSpeed
- Developed real-time voice AI systems (WebRTC + SIP + LiveKit)
- Focused on performance, observability, and production reliability
- 🧩 Distributed backend systems (Java, Spring Boot, gRPC)
- ☁️ Cloud-native infrastructure (AWS EKS, Terraform, Helm)
- ⚡ Real-time communication systems (WebRTC, LiveKit, SIP)
- 🤖 AI/LLM systems (LLMOps, RAG, agentic workflows)
- 📊 Data platforms (Spark, Kafka, Iceberg, Hive)
Backend:
Java, Spring Boot, Python (FastAPI, Flask), gRPC
Infrastructure & DevOps:
Kubernetes (EKS), Terraform, Helm, Jenkins, AWS, GCP, Ace Cloud
Real-time Systems:
WebRTC, LiveKit, SIP, STUN/TURN
Data & ML:
Apache Spark, Kafka, Feast, Kubeflow
Observability:
Grafana Stack (Mimir, Loki, Tempo, Alloy)
Real-time, low-latency voice system using WebRTC, SIP & LiveKit on AWS EKS
- Solved NAT traversal & media routing challenges
- Designed auto-scaling, fault-tolerant infrastructure
Event-driven microservices platform with AI workflows
- Multi-tenant architecture with strict data isolation
- Kafka-based async processing for scalability
- Integrated AI content generation pipelines
Deployed LLMs on Kubernetes using Ray & DeepSpeed
- Distributed inference at scale
- Integrated with enterprise ML pipelines
End-to-end ML platform with governance & feature store
- Spark-based pipelines for batch & real-time processing
- Kubeflow orchestration + Feast feature store
- 100% data lineage coverage with Apache Atlas
- ⚡ Reduced data processing time from 20 minutes to 1.5 minutes (roughly 13x faster)
- 🚀 Achieved <1s alerting latency across 100+ services
- 🔁 Automated CI/CD pipelines with zero-error deployments
- 📈 Built systems supporting large-scale AI-driven workloads
- MLOps & AI Infrastructure
- Distributed Systems at Scale
- Real-time Communication Systems
I write about backend systems, DevOps, and AI infrastructure:
👉 https://medium.com/@akashsahani2001
- Portfolio
- GitHub
