@@ -20,121 +22,95 @@
## What is BharatMLStack?
-BharatMLStack is a comprehensive, production-ready machine learning infrastructure platform designed to democratize ML capabilities across India and beyond. Our mission is to provide a robust, scalable, and accessible ML stack that empowers organizations to build, deploy, and manage machine learning solutions at massive scale.
+BharatMLStack is a production-ready, cloud-agnostic ML infrastructure platform that powers real-time feature serving, model inference, and embedding search at massive scale. Built and battle-tested at [Meesho](https://meesho.com), it is designed to help organizations ship ML to production faster, cheaper, and more reliably.
## Our Vision
-- 🎯 **Democratize Machine Learning**: Make advanced ML infrastructure accessible to organizations of all sizes
-- 🚀 **Scale Without Limits**: Built to handle millions of requests per second with enterprise-grade reliability
-- 🇮🇳 **Bharat-First Approach**: Optimized for Indian market needs while maintaining global standards
-- ⚡ **Real-Time Intelligence**: Enable instant decision-making with sub-millisecond feature serving
-- 🔧 **Developer-Friendly**: Intuitive APIs and interfaces that accelerate ML development cycles
+BharatMLStack is built around **four core tenets**:
-## Star History
+### Workflow Integration & Productivity
+> Ship ML to production faster than ever.
-[](https://www.star-history.com/#Meesho/BharatMLStack&Date)
+- **3x faster** experiment-to-deployment cycles
+- **95% reduction** in model onboarding time
-## Running at Million Scale
+### Cloud-Agnostic & Lock-In Free
+> Run anywhere. Own your stack.
-BharatMLStack is battle-tested in production environments, powering:
-- **1M+ feature vector retrievals per second** across distributed deployments
-- **Sub-10ms latency** for real-time feature retrieval
-- **99.99% uptime** with auto-scaling and fault tolerance
-- **Petabyte-scale** feature storage and processing
-- **Multi-region deployments** with global load balancing
+- Runs across **public cloud, on-prem, and edge**
+- Kubernetes-native with zero vendor lock-in
-## Document
-- [Doc](https://meesho.github.io/BharatMLStack/)
-- [Blogs](https://meesho.github.io/BharatMLStack/blog)
-## Core Components
+### Economic Efficiency
+> Do more with less.
-### 📋 Current Releases
-
-| Component | Version | Description |
-|-----------|---------|-------------|
-| 🚀 **Horizon** | `v1.0.0` | Control Plane & Backend |
-| 🎨 **Trufflebox UI** | `v1.0.0` | ML Management Console |
-| 🗄️ **Online Feature Store** | `v1.0.0` | Real-Time Features |
-| 🐹 **Go SDK** | `v1.0.0` | Go Client Library |
-| 🐍 **Python SDK** | `v1.0.1` | Python Client Library |
-| 🚀 **Numerix** | `v1.0.0` | Mathematical Compute Engine |
-
-### 🚀 Horizon - Control Plane & Backend
-The central control plane for BharatMLStack components, serving as the backend for Trufflebox UI.
-- **Component orchestration**: Manages and coordinates all BharatMLStack services
-- **API gateway**: Unified interface for all MLOps and workflows
-
-### 🎨 Trufflebox UI - ML Management Console
-Modern web interface for managing ML models, features, and experiments. Currently it supports:
-- **Feature Registry**: Centralized repository for feature definitions and metadata
-- **Feature Cataloging**: Discovery and search capabilities for available features
-- **Online Feature Store Control System**: Management interface for feature store operations
-- **Approval Flows**: Workflow management for feature deployment and changes
-
-### 🗄️ Online Feature Store - Real-Time Features
-High-performance feature store for real-time ML inference and training.
-- **Real-time serving**: Sub-10ms feature retrieval at scale
-- **Streaming ingestion**: Process millions of feature updates per second
-- **Feature Backward Compatible Versioning**: Track and manage feature evolution
-- **Multi-source integration**: Push from stream, batch and real-time sources
-
-### 🗄️ Numerix - Mathematical Compute Engine
-High-performance feature store for real-time ML inference and training.
-- **Matrix Operations**: High-performance matrix computations and transformations
-- **gRPC API**: Fast binary protocol for efficient data transfer
-- **Multi-format Support**: String and byte-based matrix formats
-- **Optimized Performance**: Built with Rust for maximum efficiency
-- **Scalable Architecture**: Designed for distributed processing
-
-## Key Differentiators
-
-- ✨ **Production-Ready**: Battle-tested components used in high-traffic production systems
-- 🌐 **Cloud Agnostic**: Kubernetes-native, so deploy on the cloud you love
-- 📊 **Observability**: Built-in monitoring, logging
+- **60–70% lower** infrastructure costs vs hyperscaler managed services
+- Optimized resource utilization across CPU and GPU workloads
-## Quick Start
+### Availability & Scalability
+> Enterprise-grade reliability at internet scale.
-🚀 **Get started with BharatMLStack in minutes!**
+- **99.99% uptime** across clusters
+- **1M+ QPS** with low latency
-For comprehensive setup instructions, examples, and deployment guides, see our detailed Quick Start documentation:
+## Designed Truly for Bharat Scale
-📖 **[Quick Start Guide →](./quick-start/README.md)**
+Built for the demands of one of the world's largest e-commerce platforms:
-### What You'll Find:
+| Metric | Performance |
+|--------|-------------|
+| **Feature Store** | 2.4M QPS (batches of 100 ID lookups) |
+| **Model Inference** | 1M+ QPS |
+| **Embedding Search** | 500K QPS |
+| **Feature Retrieval Latency** | Sub-10ms |
-- **🐳 Docker Setup**: Complete stack deployment with Docker Compose
-- **📊 Sample Data**: Pre-configured examples to get you started
-- **🔍 Health Checks**: Verify your deployment is working
-- **📝 Step-by-Step Tutorials**: From installation to first feature operations
+## Core Components
-### TL;DR - One Command Setup:
+| Component | Description | Version | Docs |
+|-----------|-------------|---------|------|
+| **[TruffleBox UI](./trufflebox-ui/)** | Web console for feature registry, cataloging, and approval workflows | `v1.3.0` | [Docs](https://meesho.github.io/BharatMLStack/trufflebox-ui/v1.0.0/userguide) |
+| **[Online Feature Store](./online-feature-store/)** | Sub-10ms feature retrieval at millions of QPS with streaming ingestion | `v1.2.0` | [Docs](https://meesho.github.io/BharatMLStack/category/online-feature-store) |
+| **[Inferflow](./inferflow/)** | DAG-based real-time inference orchestration for composable ML pipelines | `v1.0.0` | [Docs](https://meesho.github.io/BharatMLStack/category/inferflow) |
+| **[Numerix](./numerix/)** | Rust-powered math compute engine for high-performance matrix ops | `v1.0.0` | [Docs](https://meesho.github.io/BharatMLStack/category/numerix) |
+| **[Skye](./skye/)** | Vector similarity search with pluggable backends | `v1.0.0` | [Docs](https://meesho.github.io/BharatMLStack/category/skye) |
+| **[Go SDK](./go-sdk/)** | Go client for Feature Store, Interaction Store, and logging | `v1.3.0` | [Docs](https://meesho.github.io/BharatMLStack/category/go-sdk) |
+| **[Python SDK](./py-sdk/)** | Python client libraries for Feature Store and inference logging | `v1.0.1` | [Docs](https://meesho.github.io/BharatMLStack/category/python-sdk) |
+| **[Interaction Store](./interaction-store/)** | ScyllaDB-backed store for user interaction signals at sub-10ms | — | — |
+| **[Horizon](./horizon/)** | Control plane that orchestrates all services and powers TruffleBox UI | `v1.3.0` | — |
+
+> Full documentation at [meesho.github.io/BharatMLStack](https://meesho.github.io/BharatMLStack/) | [Blogs](https://meesho.github.io/BharatMLStack/blog)
+
+## Quick Start
```bash
-# Clone and start the complete stack
git clone https://github.com/Meesho/BharatMLStack.git
cd BharatMLStack/quick-start
-ONFS_VERSION= HORIZON_VERSION= TRUFFLEBOX_VERSION= NUMERIX_VERSION= ./start.sh
+# Set versions (exported so start.sh can pick them up)
+export ONFS_VERSION=v1.2.0 HORIZON_VERSION=v1.3.0 TRUFFLEBOX_VERSION=v1.3.0 NUMERIX_VERSION=v1.0.0
+
+./start.sh
```
-Then follow the [Quick Start Guide](./quick-start/README.md) for detailed setup and usage instructions.
+For step-by-step setup, Docker Compose details, sample data, and health checks, see the full **[Quick Start Guide →](./quick-start/README.md)**.
## Architecture
-BharatMLStack follows a microservices architecture designed for scalability and maintainability. Several components are to be open-sourced
-
-
+
-### 🚀 Quick Navigation
+## Use-Cases
+
+BharatMLStack powers a wide range of ML-driven applications:
-| Component | Documentation | Quick Start |
-|-----------|--------------|-------------|
-| **Online Feature Store** | [Docs](https://meesho.github.io/BharatMLStack/category/online-feature-store) | [Setup](./quick-start/README.md) |
-| **Go SDK** | [Docs](./go-sdk/README.md) | [Examples](./go-sdk/README.md) |
-| **Python SDK** | [Docs](./py-sdk/README.md) | [Quickstart](./py-sdk/README.md) |
-| **User Guide** | [Docs](https://meesho.github.io/BharatMLStack/trufflebox-ui/v1.0.0/userguide) | [Setup](./quick-start/README.md) |
-| **Numerix** | [Docs](https://meesho.github.io/BharatMLStack/category/numerix) | [Setup](./quick-start/README.md) |
+| Use-Case | What BharatMLStack Enables |
+|----------|---------------------------|
+| **Personalized Candidate Generation** | Retrieve and rank millions of candidates in real time using feature vectors and embedding similarity |
+| **Personalized Ranking** | Serve user, item, and context features at ultra-low latency to power real-time ranking models |
+| **Fraud & Risk Detection** | Stream interaction signals and features to detect anomalies and fraudulent patterns in milliseconds |
+| **Image Search** | Run embedding search at 500K QPS to match visual queries against massive product catalogs |
+| **LLM Recommender Systems** | Orchestrate LLM inference pipelines with feature enrichment for next-gen recommendation engines |
+| **DL & LLM Deployments at Scale** | Deploy and scale deep learning and large language models across GPU clusters with Inferflow orchestration |
## Contributing
@@ -142,9 +118,9 @@ We welcome contributions from the community! Please see our [Contributing Guide]
## Community & Support
-- 💬 **Discord**: Join our [community chat](https://discord.gg/XkT7XsV2AU)
-- 🐛 **Issues**: Report bugs and request features on [GitHub Issues](https://github.com/Meesho/BharatMLStack/issues)
-- 📧 **Email**: Contact us at [ml-oss@meesho.com](mailto:ml-oss@meesho.com )
+- **Discord**: Join our [community chat](https://discord.gg/XkT7XsV2AU)
+- **Issues**: Report bugs and request features on [GitHub Issues](https://github.com/Meesho/BharatMLStack/issues)
+- **Email**: Contact us at [ml-oss@meesho.com](mailto:ml-oss@meesho.com)
## License
diff --git a/assets/bharatmlstack-architecture.png b/assets/bharatmlstack-architecture.png
new file mode 100644
index 00000000..afa5b787
Binary files /dev/null and b/assets/bharatmlstack-architecture.png differ
diff --git a/assets/bharatmlstack-logo.png b/assets/bharatmlstack-logo.png
new file mode 100644
index 00000000..9756a1ec
Binary files /dev/null and b/assets/bharatmlstack-logo.png differ
diff --git a/docs-src/docs/inferflow/v1.0.0/_category_.json b/docs-src/docs/inferflow/v1.0.0/_category_.json
index 0641455e..3c72a212 100644
--- a/docs-src/docs/inferflow/v1.0.0/_category_.json
+++ b/docs-src/docs/inferflow/v1.0.0/_category_.json
@@ -1,9 +1,4 @@
{
- "label": "v1.0.0",
- "position": 1,
- "link": {
- "type": "generated-index",
- "description": "Inferflow v1.0.0",
- "slug": "/inferflow/v1.0.0"
- }
+ "label": "v1.0.0",
+ "position": 1
}
diff --git a/docs-src/docs/inferflow/v1.0.0/index.md b/docs-src/docs/inferflow/v1.0.0/index.md
new file mode 100644
index 00000000..abb59d61
--- /dev/null
+++ b/docs-src/docs/inferflow/v1.0.0/index.md
@@ -0,0 +1,14 @@
+---
+title: v1.0.0
+description: Inferflow v1.0.0
+sidebar_position: 0
+slug: /inferflow/v1.0.0
+---
+
+import DocCardList from '@theme/DocCardList';
+
+# Inferflow v1.0.0
+
+Inferflow is a graph-driven feature retrieval and model inference orchestration engine. It dynamically resolves entity relationships via configurable DAGs, retrieves features from the Online Feature Store, and orchestrates model scoring.
+
+
diff --git a/docs-src/docs/intro.md b/docs-src/docs/intro.md
new file mode 100644
index 00000000..4a56ea71
--- /dev/null
+++ b/docs-src/docs/intro.md
@@ -0,0 +1,57 @@
+---
+sidebar_position: 0
+title: BharatMLStack Documentation
+slug: intro
+---
+
+# BharatMLStack Documentation
+
+Welcome to the BharatMLStack documentation. BharatMLStack is an open-source, end-to-end ML infrastructure stack built for scale, speed, and simplicity. Explore the components below to get started.
+
+---
+
+## Quick Start
+
+Get up and running with BharatMLStack in minutes. Step-by-step instructions, sample data, and Docker Compose setup for local development and testing.
+
+**[Go to Quick Start →](/category/quick-start)**
+
+---
+
+## Online Feature Store
+
+Sub-10ms, high-throughput access to machine learning features for real-time inference. Supports batch and streaming ingestion, schema validation, and compact versioned feature groups.
+
+**[Go to Online Feature Store →](/category/online-feature-store)**
+
+---
+
+## Inferflow
+
+Graph-driven feature retrieval and model inference orchestration engine. Dynamically resolves entity relationships, retrieves features, and orchestrates model scoring — all without custom code.
+
+**[Go to Inferflow →](/category/inferflow)**
+
+---
+
+## Trufflebox UI
+
+Modern, feature-rich UI framework for MLOps management. Supports feature catalog, user management, and admin operations with approval flows.
+
+**[Go to Trufflebox UI →](/category/trufflebox-ui)**
+
+---
+
+## SDKs
+
+Client libraries for Go and Python to interact with the Online Feature Store and other platform components. Includes gRPC clients, REST APIs, and Apache Spark integration.
+
+**[Go to SDKs →](/category/sdks)**
+
+---
+
+## Numerix
+
+High-performance compute engine for ultra-fast element-wise matrix operations. Built in Rust with SIMD acceleration for sub-5ms p99 latency.
+
+**[Go to Numerix →](/category/numerix)**
diff --git a/docs-src/docs/numerix/_category_.json b/docs-src/docs/numerix/_category_.json
index 7c2d4af0..2340ae40 100644
--- a/docs-src/docs/numerix/_category_.json
+++ b/docs-src/docs/numerix/_category_.json
@@ -1,6 +1,6 @@
{
"label": "Numerix",
- "position": 6,
+ "position": 7,
"link": {
"type": "generated-index",
"description": "Numerix is a mathematical compute engine for BharatML Stack. It is used to perform mathematical operations on matrices and vectors."
diff --git a/docs-src/docs/numerix/v1.0.0/_category_.json b/docs-src/docs/numerix/v1.0.0/_category_.json
index 4748f653..66455a9e 100644
--- a/docs-src/docs/numerix/v1.0.0/_category_.json
+++ b/docs-src/docs/numerix/v1.0.0/_category_.json
@@ -1,10 +1,5 @@
{
"label": "v1.0.0",
- "position": 1,
- "link": {
- "type": "generated-index",
- "description": "Numerix v1.0.0",
- "slug": "/numerix/v1.0.0"
- }
+ "position": 1
}
diff --git a/docs-src/docs/numerix/v1.0.0/index.md b/docs-src/docs/numerix/v1.0.0/index.md
new file mode 100644
index 00000000..1307fef7
--- /dev/null
+++ b/docs-src/docs/numerix/v1.0.0/index.md
@@ -0,0 +1,14 @@
+---
+title: v1.0.0
+description: Numerix v1.0.0
+sidebar_position: 0
+slug: /numerix/v1.0.0
+---
+
+import DocCardList from '@theme/DocCardList';
+
+# Numerix v1.0.0
+
+Numerix is a mathematical compute engine for BharatML Stack. It is used to perform mathematical operations on matrices and vectors.
+
+
diff --git a/docs-src/docs/online-feature-store/v1.0.0/_category_.json b/docs-src/docs/online-feature-store/v1.0.0/_category_.json
index 4fec8dcc..4e1f685c 100644
--- a/docs-src/docs/online-feature-store/v1.0.0/_category_.json
+++ b/docs-src/docs/online-feature-store/v1.0.0/_category_.json
@@ -1,9 +1,4 @@
{
- "label": "v1.0.0",
- "position": 1,
- "link": {
- "type": "generated-index",
- "description": "Online Feature Store v1.0.0",
- "slug": "/online-feature-store/v1.0.0"
- }
+ "label": "v1.0.0",
+ "position": 1
}
\ No newline at end of file
diff --git a/docs-src/docs/online-feature-store/v1.0.0/index.md b/docs-src/docs/online-feature-store/v1.0.0/index.md
new file mode 100644
index 00000000..b790c081
--- /dev/null
+++ b/docs-src/docs/online-feature-store/v1.0.0/index.md
@@ -0,0 +1,14 @@
+---
+title: v1.0.0
+description: Online Feature Store v1.0.0
+sidebar_position: 0
+slug: /online-feature-store/v1.0.0
+---
+
+import DocCardList from '@theme/DocCardList';
+
+# Online Feature Store v1.0.0
+
+A high-performance, scalable, and production-grade feature store built for modern machine learning systems. It supports both real-time and batch workflows, with low-latency feature retrieval.
+
+
diff --git a/docs-src/docs/predator/_category_.json b/docs-src/docs/predator/_category_.json
new file mode 100644
index 00000000..576eb122
--- /dev/null
+++ b/docs-src/docs/predator/_category_.json
@@ -0,0 +1,8 @@
+{
+ "label": "Predator",
+ "position": 7,
+ "link": {
+ "type": "generated-index",
+    "description": "Predator is a scalable, high-performance model inference service built as a wrapper around NVIDIA Triton Inference Server, designed to serve ML models with low latency in Kubernetes, with OnFS and Inferflow integration."
+ }
+}
diff --git a/docs-src/docs/predator/v1.0.0/_category_.json b/docs-src/docs/predator/v1.0.0/_category_.json
new file mode 100644
index 00000000..3c72a212
--- /dev/null
+++ b/docs-src/docs/predator/v1.0.0/_category_.json
@@ -0,0 +1,4 @@
+{
+ "label": "v1.0.0",
+ "position": 1
+}
diff --git a/docs-src/docs/predator/v1.0.0/architecture.md b/docs-src/docs/predator/v1.0.0/architecture.md
new file mode 100644
index 00000000..e337ae4f
--- /dev/null
+++ b/docs-src/docs/predator/v1.0.0/architecture.md
@@ -0,0 +1,201 @@
+---
+title: Architecture
+sidebar_position: 1
+---
+
+# BharatMLStack - Predator
+
+Predator is a scalable, high-performance model inference service built as a wrapper around the **NVIDIA Triton Inference Server**. It is designed to serve a variety of machine learning models (Deep Learning, Tree-based, etc.) with low latency in a **Kubernetes (K8s)** environment.
+
+The system integrates seamlessly with the **Online Feature Store (OnFS)** for real-time feature retrieval and uses **Horizon** as the deployment orchestration layer. Deployments follow a **GitOps** pipeline — Horizon generates Helm configurations, commits them to GitHub, and **Argo Sync** reconciles the desired state onto Kubernetes.
+
+---
+
+## High-Level Design
+
+
+
+### End-to-End Flow
+
+1. **Model Deployment Trigger**: An actor initiates deployment through **Trufflebox UI**, specifying the GCS path (`gcs://`) of the trained model. Separately, post-training pipelines write model artifacts to **GCS Artifactory**.
+
+2. **Orchestration via Horizon**: Trufflebox UI communicates with **Horizon**, the deployment orchestration layer. Horizon generates the appropriate **Helm** chart configuration for the inference service.
+
+3. **GitOps Pipeline**: Horizon commits the Helm values to a **GitHub** repository. **Argo Sync** watches the repo and reconciles the desired state onto the Kubernetes cluster, creating or updating deployable units.
+
+4. **Deployable Units (Deployable 1 … N)**: Each deployable is an independent Kubernetes deployment that:
+ - Downloads model artifacts from **GCS** at startup via an `init.sh` script.
+ - Launches a **Triton Inference Server** instance loaded with the model.
+ - Runs one or more pods, each containing the inference runtime and configured backends.
+
+5. **Triton Backends**: Each Triton instance supports pluggable backends based on the model type:
+ - **FIL** — GPU-accelerated tree-based models (XGBoost, LightGBM, Random Forest).
+ - **PyTorch** — Native PyTorch models via LibTorch.
+ - **Python** — Custom preprocessing/postprocessing or unsupported model formats.
+ - **TRT (TensorRT)** — GPU-optimized serialized TensorRT engines.
+ - **ONNX** — Framework-agnostic execution via ONNX Runtime.
+ - **DALI** — GPU-accelerated data preprocessing (image, audio, video).
+
+6. **Autoscaling with KEDA**: The cluster uses **KEDA** (Kubernetes Event-Driven Autoscaling) to scale deployable pods based on custom metrics (CPU utilization, GPU utilization via DCGM, queue depth, etc.). The underlying **Kubernetes** scheduler places pods across GPU/CPU node pools.
+
+### Key Design Principles
+
+- **GitOps-driven**: All deployment state is version-controlled in Git; Argo Sync ensures cluster state matches the declared configuration.
+- **Isolation per deployable**: Each model or model group gets its own deployable unit, preventing noisy-neighbor interference.
+- **Init-based model loading**: Models are materialized to local disk before Triton starts, ensuring deterministic startup and no runtime dependency on remote storage.
+- **Pluggable backends**: The same infrastructure serves deep learning, tree-based, and custom models through Triton's backend abstraction.
+
+---
+
+## Inference Engine: Triton Inference Server
+
+NVIDIA Triton Inference Server is a high-performance model serving system designed to deploy ML and deep learning models at scale across CPUs and GPUs. It provides a unified inference runtime that supports multiple frameworks, optimized execution, and production-grade scheduling.
+
+Triton operates as a standalone server that loads models from a model repository and exposes standardized HTTP/gRPC APIs. Predator uses **gRPC** for efficient request and response handling via the **Helix client**.
+
+### Core Components
+
+- **Model Repository**: Central directory where models are stored. Predator typically materializes the model repository onto local disk via an init container, enabling fast model loading and eliminating runtime dependency on remote storage during inference.
+
+### Backends
+
+A backend is the runtime responsible for executing a model. Each model specifies which backend runs it via configuration.
+
+| Backend | Description |
+|---------|-------------|
+| **TensorRT** | GPU-optimized; executes serialized TensorRT engines (kernel fusion, FP16/INT8). |
+| **PyTorch** | Serves native PyTorch models via LibTorch. |
+| **ONNX Runtime** | Framework-agnostic ONNX execution with TensorRT and other accelerators. |
+| **TensorFlow** | Runs TensorFlow SavedModel format. |
+| **Python backend** | Custom Python code for preprocessing, postprocessing, or unsupported models. |
+| **Custom backends** | C++/Python backends for specialized or proprietary runtimes. |
+| **DALI** | GPU-accelerated data preprocessing (image, audio, video). |
+| **FIL (Forest Inference Library)** | GPU-accelerated tree-based models (XGBoost, LightGBM, Random Forest). |
+
+### Key Features
+
+- **Dynamic batching**: Combines multiple requests into a single batch at runtime — higher GPU utilization, improved throughput, reduced latency variance.
+- **Concurrent model execution**: Run multiple models or multiple instances of the same model; distribute load across GPUs.
+- **Model versioning**: Support multiple versions per model.
+- **Ensemble models**: Pipeline of models as an ensemble; eliminates intermediate network hops, reduces latency.
+- **Model instance scaling**: Multiple copies of a model for parallel inference and load isolation.
+- **Observability**: Prometheus metrics, granular latency, throughput, GPU utilization.
+- **Warmup requests**: Preload kernels and avoid cold-start latency.
+
+---
+
+## Model Repository Structure
+
+```
+model_repository/
+├── model_A/
+│ ├── config.pbtxt
+│ ├── 1/
+│ │ └── model.plan
+│ ├── 2/
+│ │ └── model.plan
+├── model_B/
+│ ├── config.pbtxt
+│ ├── 1/
+│ └── model.py
+```
+
+The `config.pbtxt` file defines how Triton loads and executes a model: input/output tensors, batch settings, hardware execution, backend runtime, and optimization parameters. At minimum it defines: `backend/platform`, `max_batch_size`, `inputs`, `outputs`.
+
+### Sample config.pbtxt
+
+```text
+name: "product_ranking_model"
+platform: "tensorrt_plan"
+max_batch_size: 64
+input [
+  { name: "input_embeddings" data_type: TYPE_FP16 dims: [ 128 ] },
+  { name: "context_features" data_type: TYPE_FP32 dims: [ 32 ] }
+]
+output [
+  { name: "scores" data_type: TYPE_FP32 dims: [ 1 ] }
+]
+instance_group [
+  { kind: KIND_GPU count: 2 gpus: [ 0 ] }
+]
+dynamic_batching {
+  preferred_batch_size: [ 8, 16, 32, 64 ]
+  max_queue_delay_microseconds: 2000
+}
+```
+
+---
+
+## Kubernetes Deployment Architecture
+
+Predator inference services are deployed on Kubernetes using **Helm-based** deployments for standardized, scalable, GPU-optimized model serving. Each deployment consists of Triton Inference Server wrapped within a Predator runtime, with autoscaling driven by CPU and GPU utilization.
+
+### Pod Architecture
+
+```
+Predator Pod
+├── Init Container (Model Sync)
+├── Triton Inference Server Container
+```
+
+Model artifacts and runtime are initialized before inference traffic is accepted.
+
+#### Init Container
+
+- Download model artifacts from cloud storage (GCS).
+- Populate the Triton model repository directory.
+- Example: `gcloud storage cp -r gs://.../model-path/* /models`
+
+Benefits: deterministic startup (Triton starts only after models are available), separation of concerns (image = runtime, repository = data).
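A minimal sketch of how this init-based loading can be wired in a pod spec, assuming an `emptyDir` volume shared between the init container and Triton (image tags, bucket paths, and volume names are illustrative, not the production manifest):

```yaml
# Sketch: init container materializes the model repository before Triton starts.
# Image tags, bucket path, and volume name are illustrative assumptions.
spec:
  volumes:
    - name: model-repository
      emptyDir: {}
  initContainers:
    - name: model-sync
      image: google/cloud-sdk:slim
      command: ["sh", "-c", "gcloud storage cp -r gs://ml-artifactory/model-path/* /models"]
      volumeMounts:
        - name: model-repository
          mountPath: /models
  containers:
    - name: triton
      image: nvcr.io/nvidia/tritonserver:24.01-py3
      args: ["tritonserver", "--model-repository=/models"]
      volumeMounts:
        - name: model-repository
          mountPath: /models
```

The main container only starts once the init container exits successfully, which is what guarantees the deterministic startup described above.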
+
+#### Triton Inference Server Container
+
+- Load model artifacts from local repository.
+- Manage inference scheduling, request/response handling, and expose inference endpoints.
+
+### Triton Server Image Strategy
+
+The Helm chart uses the Triton container image from the internal **artifact registry**. Production uses **custom-built** images (only required backends, e.g. TensorRT, Python) to reduce size and startup time. Unnecessary components are excluded; images are built internally and pushed to the registry.
+
+**Response Caching**: Custom cache plugins can be added at image build time for optional inference response caching — reducing redundant execution and GPU use for repeated inputs.
+
+### Image Distribution Optimization
+
+- **Secondary boot disk image caching**: Images are pre-cached on GPU node pool secondary boot disks to avoid repeated pulls during scale-up and reduce pod startup time and cold-start latency.
+- **Image streaming**: Can be used to progressively pull layers for faster time-to-readiness during scaling.
+
+### Health Probes
+
+Readiness and liveness use `/v2/health/ready`. Triton receives traffic only after model loading; failed instances are restarted automatically.
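In container-spec terms this maps to something like the following, with both probes hitting Triton's readiness endpoint as described above (the port and timing values are assumptions; Triton's HTTP endpoint defaults to 8000):

```yaml
# Sketch: probe configuration for a Predator pod.
# Port and timing values are assumptions, not the production settings.
readinessProbe:
  httpGet:
    path: /v2/health/ready
    port: 8000
  initialDelaySeconds: 30   # allow time for model loading
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /v2/health/ready
    port: 8000
  periodSeconds: 15
  failureThreshold: 3       # restart after repeated failures
```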
+
+### Resource Configuration
+
+Sample GPU resource config:
+
+```yaml
+limits:
+  cpu: 7000m
+  memory: 28Gi
+  nvidia.com/gpu: 1
+```
+
+### Autoscaling Architecture
+
+Predator uses **KEDA** (Kubernetes Event-Driven Autoscaling) for scaling deployable pods. KEDA supports custom metric sources including:
+
+- **CPU / Memory utilization** for CPU-based deployments.
+- **GPU utilization** via **DCGM** (Data Center GPU Manager) for GPU pods — covering utilization, memory, power, etc.
+- **Custom Prometheus queries** for application-level scaling signals (e.g., inference queue depth, request latency).
+
+KEDA ScaledObjects are configured per deployable, enabling fine-grained, independent scaling for each model or model group.
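As a hedged sketch, a per-deployable ScaledObject with a Prometheus trigger on GPU utilization might look like this (the deployment name, Prometheus address, and threshold are illustrative; `DCGM_FI_DEV_GPU_UTIL` is the standard DCGM exporter metric):

```yaml
# Sketch of a KEDA ScaledObject scaling one deployable on GPU utilization.
# Names, the Prometheus address, and the threshold are illustrative.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: product-ranking-scaler
spec:
  scaleTargetRef:
    name: product-ranking-deployable
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: avg(DCGM_FI_DEV_GPU_UTIL{deployment="product-ranking-deployable"})
        threshold: "70"
```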
+
+---
+
+## Contributing
+
+We welcome contributions! See the [Contributing Guide](https://github.com/Meesho/BharatMLStack/blob/main/CONTRIBUTING.md).
+
+## Community & Support
+
+- **Discord**: [community chat](https://discord.gg/XkT7XsV2AU)
+- **Issues**: [GitHub Issues](https://github.com/Meesho/BharatMLStack/issues)
+- **Email**: [ml-oss@meesho.com](mailto:ml-oss@meesho.com)
+
+## License
+
+BharatMLStack is open-source under the [BharatMLStack Business Source License 1.1](https://github.com/Meesho/BharatMLStack/blob/main/LICENSE.md).
+
+---
+
+
Built with ❤️ for the ML community from Meesho
+
If you find this useful, ⭐️ the repo — your support means the world to us!
diff --git a/docs-src/docs/predator/v1.0.0/functionalities.md b/docs-src/docs/predator/v1.0.0/functionalities.md
new file mode 100644
index 00000000..e5f90b7c
--- /dev/null
+++ b/docs-src/docs/predator/v1.0.0/functionalities.md
@@ -0,0 +1,119 @@
+---
+title: Key Functionalities
+sidebar_position: 2
+---
+
+# Predator - Key Functionalities
+
+## Overview
+
+Predator is a scalable, high-performance model inference service built as a wrapper around **NVIDIA Triton Inference Server**. It serves Deep Learning and tree-based models with low latency in **Kubernetes**, integrates with the **Online Feature Store (OnFS)**, and uses **Inferflow** for orchestration between clients, feature store, and inference engine. Clients send inference requests via the **Helix client** over gRPC.
+
+---
+
+## Core Capabilities
+
+### Multi-Backend Inference
+
+Predator leverages Triton's pluggable backends so you can serve a variety of model types from a single deployment:
+
+| Backend | Use Case |
+|---------|----------|
+| **TensorRT** | GPU-optimized DL; serialized engines (FP16/INT8) |
+| **PyTorch** | Native PyTorch via LibTorch |
+| **ONNX Runtime** | Framework-agnostic ONNX with TensorRT/GPU |
+| **TensorFlow** | SavedModel format |
+| **Python** | Custom preprocessing, postprocessing, or unsupported models |
+| **FIL** | Tree-based models (XGBoost, LightGBM, Random Forest) on GPU |
+| **DALI** | GPU-accelerated data preprocessing (image, audio, video) |
+| **Custom** | C++/Python backends for proprietary or specialized runtimes |
+
+### Dynamic Batching
+
+Triton combines multiple incoming requests into a single batch at runtime.
+
+- Higher GPU utilization and improved throughput
+- Reduced latency variance
+- Configurable `preferred_batch_size` and `max_queue_delay_microseconds` in `config.pbtxt`
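The flush policy above can be illustrated with a toy simulation. This is a deliberate simplification of Triton's scheduler, not its actual implementation:

```python
from collections import deque

def batch_requests(arrivals, preferred_batch_size=8, max_queue_delay_us=2000):
    """Toy model of a dynamic-batching flush policy: emit a batch when the
    preferred size is reached, or when the oldest queued request has waited
    past the delay budget (checked as each new request arrives)."""
    batches, queue = [], deque()
    for t_us, req in arrivals:  # arrivals sorted by arrival time, in microseconds
        # Delay-based flush: the oldest queued request exceeded its wait budget.
        if queue and t_us - queue[0][0] >= max_queue_delay_us:
            batches.append([r for _, r in queue])
            queue.clear()
        queue.append((t_us, req))
        # Size-based flush: preferred batch size reached.
        if len(queue) >= preferred_batch_size:
            batches.append([r for _, r in queue])
            queue.clear()
    if queue:  # flush whatever remains at the end of the trace
        batches.append([r for _, r in queue])
    return batches

# Ten requests arriving 100us apart batch up by size; a large gap forces
# a delay-based flush of whatever is queued.
print(batch_requests([(i * 100, i) for i in range(10)], preferred_batch_size=4))
print(batch_requests([(0, "a"), (5000, "b")]))
```

The trade-off Triton exposes is exactly these two knobs: a larger `max_queue_delay_microseconds` improves batch fill (throughput) at the cost of tail latency.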
+
+### Concurrent Model Execution
+
+- Run multiple models simultaneously
+- Run multiple instances of the same model
+- Distribute load across GPUs via `instance_group` in model config
+
+### Model Versioning & Ensembles
+
+- **Versioning**: Multiple versions per model (e.g. `1/`, `2/` in the model repository)
+- **Ensembles**: Define a pipeline of models as an ensemble; eliminates intermediate network hops and reduces latency
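As an illustration, an ensemble that chains a preprocessing model into a scoring model can be declared in `config.pbtxt` like this (the model and tensor names are hypothetical; the `input_map`/`output_map` keys refer to the step model's tensors and the values to ensemble-level tensors):

```text
name: "ranking_ensemble"
platform: "ensemble"
max_batch_size: 64
input [ { name: "raw_features" data_type: TYPE_FP32 dims: [ 32 ] } ]
output [ { name: "scores" data_type: TYPE_FP32 dims: [ 1 ] } ]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "raw_features" value: "raw_features" }
      output_map { key: "embeddings" value: "preprocessed" }
    },
    {
      model_name: "scorer"
      model_version: -1
      input_map { key: "input_embeddings" value: "preprocessed" }
      output_map { key: "scores" value: "scores" }
    }
  ]
}
```

The intermediate tensor (`preprocessed` here) stays inside Triton, which is what eliminates the network hop between the two models.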
+
+### Model Instance Scaling
+
+- Deploy multiple copies of a model for parallel inference and load isolation
+- Configured via `instance_group`
+
+---
+
+## Inference & API
+
+### gRPC via Helix Client
+
+Predator uses **gRPC** for efficient request/response handling. Client applications (e.g. Realestate, IOP) send inference requests through the **Helix client**, which talks to the Triton Inference Server inside the Predator pod.
+
+### Model Repository
+
+Models are stored in a local model repository. Predator materializes this via an **Init Container** that downloads artifacts from cloud storage (e.g. GCS) so Triton has no runtime dependency on remote storage during inference.
+
+---
+
+## Deployment & Operational Features
+
+### Custom Triton Images
+
+- Production uses **custom-built** Triton images (only required backends) for smaller size and faster startup
+- Images built on GCP VM, pushed to **Artifact Registry**, and referenced in Helm deployments
+- Optional **response caching** via custom cache plugins added at image build time
+
+### Image Distribution
+
+- **Secondary boot disk caching**: Triton image pre-cached on GPU node pool to reduce pod startup and scale-up latency
+- **Image streaming**: Optionally used for faster time-to-readiness during scaling
+
+### Health Probes
+
+- Readiness and liveness use `/v2/health/ready`
+- Triton receives traffic only after models are loaded; failed instances are restarted automatically
+
+### Autoscaling
+
+- CPU-based scaling for generic load
+- GPU-based scaling using **DCGM** metrics (utilization, memory, power); custom queries drive scale-up/scale-down
+
+---
+
+## Observability
+
+- **Prometheus metrics**: Latency, throughput, GPU utilization, and more
+- Metrics emitted from the Triton Inference Container and visualized in **Grafana**
+- **Warmup requests**: Configurable to preload kernels and avoid cold-start latency
+
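+Warmup requests are declared per model in `config.pbtxt`; a sketch follows (the input name, type, and shape are placeholders for the model's actual signature):
+
+```protobuf
+model_warmup [
+  {
+    name: "warmup_sample"
+    batch_size: 1
+    inputs {
+      key: "input__0"
+      value: {
+        data_type: TYPE_FP32
+        dims: [32]
+        # Send all-zero tensors; random_data or input_data_file are alternatives
+        zero_data: true
+      }
+    }
+  }
+]
+```
+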
+---
+
+## Contributing
+
+We welcome contributions! See the [Contributing Guide](https://github.com/Meesho/BharatMLStack/blob/main/CONTRIBUTING.md).
+
+## Community & Support
+
+- **Discord**: [community chat](https://discord.gg/XkT7XsV2AU)
+- **Issues**: [GitHub Issues](https://github.com/Meesho/BharatMLStack/issues)
+- **Email**: [ml-oss@meesho.com](mailto:ml-oss@meesho.com)
+
+## License
+
+BharatMLStack is open-source under the [BharatMLStack Business Source License 1.1](https://github.com/Meesho/BharatMLStack/blob/main/LICENSE.md).
+
+---
+
+
Built with ❤️ for the ML community from Meesho
+
If you find this useful, ⭐️ the repo — your support means the world to us!
diff --git a/docs-src/docs/predator/v1.0.0/index.md b/docs-src/docs/predator/v1.0.0/index.md
new file mode 100644
index 00000000..9de78cd9
--- /dev/null
+++ b/docs-src/docs/predator/v1.0.0/index.md
@@ -0,0 +1,14 @@
+---
+title: v1.0.0
+description: Predator v1.0.0
+sidebar_position: 0
+slug: /predator/v1.0.0
+---
+
+import DocCardList from '@theme/DocCardList';
+
+# Predator v1.0.0
+
+Predator is a scalable, high-performance model inference service built as a wrapper around NVIDIA Triton Inference Server, designed to serve ML models with low latency in Kubernetes.
+
+
diff --git a/docs-src/docs/predator/v1.0.0/release-notes.md b/docs-src/docs/predator/v1.0.0/release-notes.md
new file mode 100644
index 00000000..a53f4ff3
--- /dev/null
+++ b/docs-src/docs/predator/v1.0.0/release-notes.md
@@ -0,0 +1,21 @@
+---
+title: Release Notes
+sidebar_position: 3
+---
+
+# Predator - Release Notes
+
+## Version 1.0.0
+
+**Release Date**: June 2025
+**Status**: General Availability (GA)
+
+First stable release of **Predator**, a scalable model inference service built around **NVIDIA Triton Inference Server** and part of BharatMLStack. It serves deep learning and tree-based models with low latency on **Kubernetes**, integrates with **OnFS** and **Interflow**, and is accessed by clients through the **Helix client** over gRPC.
+
+### What's New
+
+- **Triton inference engine**: Unified runtime for DL and tree-based models on CPU/GPU; model repository via Init Container from GCS; gRPC API via Helix client.
+- **Multi-backend support**: TensorRT, PyTorch, ONNX Runtime, TensorFlow, Python, FIL, DALI, Custom.
+- **Dynamic batching & concurrency**: Configurable via `config.pbtxt`; model versioning and ensembles.
+- **Kubernetes deployment**: Helm-based; Init Container + Triton container; custom Triton images from Artifact Registry; health probes; CPU/GPU autoscaling.
+- **Observability**: Prometheus metrics, Grafana; warmup requests for cold-start avoidance.
diff --git a/docs-src/docs/quick-start/_category_.json b/docs-src/docs/quick-start/_category_.json
index 2e50c7ae..ad53c8fa 100644
--- a/docs-src/docs/quick-start/_category_.json
+++ b/docs-src/docs/quick-start/_category_.json
@@ -1,6 +1,6 @@
{
"label": "Quick Start",
- "position": 2,
+ "position": 3,
"link": {
"type": "generated-index",
"description": "Quick Start guide for BharatML Stack. Get up and running quickly with step-by-step instructions, sample data, and Docker Compose setup for local development and testing."
diff --git a/docs-src/docs/quick-start/v1.0.0/index.md b/docs-src/docs/quick-start/v1.0.0/index.md
new file mode 100644
index 00000000..adc1bb16
--- /dev/null
+++ b/docs-src/docs/quick-start/v1.0.0/index.md
@@ -0,0 +1,14 @@
+---
+title: v1.0.0
+description: Quick Start v1.0.0
+sidebar_position: 0
+slug: /quick-start/v1.0.0
+---
+
+import DocCardList from '@theme/DocCardList';
+
+# Quick Start v1.0.0
+
+Get up and running quickly with step-by-step instructions, sample data, and Docker Compose setup for local development and testing.
+
+
diff --git a/docs-src/docs/sdks/_category_.json b/docs-src/docs/sdks/_category_.json
index 674a3f7e..ee44b06e 100644
--- a/docs-src/docs/sdks/_category_.json
+++ b/docs-src/docs/sdks/_category_.json
@@ -1,6 +1,6 @@
{
"label": "SDKs",
- "position": 3,
+ "position": 5,
"link": {
"type": "generated-index",
"description": "Software Development Kits (SDKs) for BharatML Stack. Includes client libraries for Go and Python to interact with the online feature store and other platform components."
diff --git a/docs-src/docs/sdks/go/v1.0.0/index.md b/docs-src/docs/sdks/go/v1.0.0/index.md
new file mode 100644
index 00000000..72dbc2da
--- /dev/null
+++ b/docs-src/docs/sdks/go/v1.0.0/index.md
@@ -0,0 +1,14 @@
+---
+title: v1.0.0
+description: Go SDK v1.0.0
+sidebar_position: 0
+slug: /sdks/go/v1.0.0
+---
+
+import DocCardList from '@theme/DocCardList';
+
+# Go SDK v1.0.0
+
+Go client libraries and packages for interacting with the BharatML Stack online feature store, including gRPC clients and protocol buffer definitions.
+
+
diff --git a/docs-src/docs/sdks/python/v1.0.0/_category_.json b/docs-src/docs/sdks/python/v1.0.0/_category_.json
index 58516700..4e1f685c 100644
--- a/docs-src/docs/sdks/python/v1.0.0/_category_.json
+++ b/docs-src/docs/sdks/python/v1.0.0/_category_.json
@@ -1,8 +1,4 @@
{
- "label": "v1.0.0",
- "position": 1,
- "link": {
- "type": "generated-index",
- "description": "Python SDK v1.0.0 documentation for BharatML Stack. Contains API reference, usage guides, and examples for the Python client libraries including gRPC feature client, Spark feature push client, and common utilities."
- }
+ "label": "v1.0.0",
+ "position": 1
}
\ No newline at end of file
diff --git a/docs-src/docs/sdks/python/v1.0.0/index.md b/docs-src/docs/sdks/python/v1.0.0/index.md
new file mode 100644
index 00000000..3d6f0e23
--- /dev/null
+++ b/docs-src/docs/sdks/python/v1.0.0/index.md
@@ -0,0 +1,14 @@
+---
+title: v1.0.0
+description: Python SDK v1.0.0
+sidebar_position: 0
+slug: /sdks/python/v1.0.0
+---
+
+import DocCardList from '@theme/DocCardList';
+
+# Python SDK v1.0.0
+
+Python client libraries and utilities for interacting with the BharatML Stack online feature store, including gRPC clients, Spark integration, and common utilities.
+
+
diff --git a/docs-src/docs/skye/_category_.json b/docs-src/docs/skye/_category_.json
new file mode 100644
index 00000000..431fda78
--- /dev/null
+++ b/docs-src/docs/skye/_category_.json
@@ -0,0 +1,8 @@
+{
+ "label": "Skye",
+ "position": 6,
+ "link": {
+ "type": "generated-index",
+ "description": "Skye is a high-performance vector similarity search platform that enables fast semantic retrieval by representing data as vectors and querying nearest matches in high-dimensional space. It supports pluggable vector databases, tenant-level index isolation, intelligent caching, and centralized cluster management."
+ }
+}
diff --git a/docs-src/docs/skye/v1.0.0/_category_.json b/docs-src/docs/skye/v1.0.0/_category_.json
new file mode 100644
index 00000000..3c72a212
--- /dev/null
+++ b/docs-src/docs/skye/v1.0.0/_category_.json
@@ -0,0 +1,4 @@
+{
+ "label": "v1.0.0",
+ "position": 1
+}
diff --git a/docs-src/docs/skye/v1.0.0/architecture.md b/docs-src/docs/skye/v1.0.0/architecture.md
new file mode 100644
index 00000000..f08926a7
--- /dev/null
+++ b/docs-src/docs/skye/v1.0.0/architecture.md
@@ -0,0 +1,373 @@
+---
+title: Architecture
+sidebar_position: 1
+---
+
+# Skye - Vector Similarity Search Platform
+
+Skye is BharatMLStack's vector similarity search platform that enables fast semantic retrieval by representing data as vectors and querying nearest matches in high-dimensional space. It is composed of three runnable components: **skye-admin**, **skye-consumers**, and **skye-serving**.
+
+---
+
+## System Overview
+
+
+
+Skye provides a critical platform for managing data aggregation, model onboarding, and embedding support at production scale. The architecture is designed around three core pillars:
+
+- **Pluggable Vector Databases**: Support for multiple vector database backends (Qdrant and extensible to others) via a generic abstraction layer.
+- **Tenant-Level Index Isolation with Shared Embeddings**: Models are stored once but can serve multiple tenants (variants), reducing data redundancy.
+- **Event-Driven Administration**: Model lifecycle management is handled through Kafka-based event flows for resilience and fault tolerance.
+
+### Component Architecture
+
+| Component | Role |
+|---|---|
+| **skye-serving** | Handles real-time similarity search queries with in-memory caching and vector DB lookups |
+| **skye-consumers** | Processes embedding ingestion (reset/delta jobs) and real-time aggregation events from Kafka |
+| **skye-admin** | Manages model lifecycle, onboarding, variant registration, and coordinates Databricks jobs |
+
+---
+
+## Data Model
+
+### Model and Variant Hierarchy
+
+Skye uses a **model-first** hierarchy rather than a tenant-first approach. Models sit at the base level with variants (formerly tenants) nested within each model. This eliminates embedding duplication across tenants.
+
+```
+model (e.g., intent_model)
+ ├── model_config (distance_function, vector_dimension, etc.)
+ ├── embedding_store (shared embeddings for all variants)
+ ├── variant_1 (e.g., organic)
+ │ ├── vss_filter (criteria for index inclusion)
+ │ ├── vectordb_type (QDRANT, etc.)
+ │ ├── vectordb_config (host, port, replication, sharding)
+ │ ├── read_version / write_version
+ │ └── job_frequency (FREQ_1D, FREQ_3H, etc.)
+ └── variant_2 (e.g., ad)
+ ├── vss_filter
+ ├── vectordb_type
+ └── ...
+```
+
+**Key benefit**: If a model consumes 30M embeddings and is used by two variants, the embeddings are stored once (30M) instead of duplicated (60M).
+
+### Entity-Based Data Split
+
+Data is split at the entity level (catalog, product, user) into separate tables for both embeddings and aggregator data:
+
+**Embedding Tables** (per entity):
+
+```sql
+CREATE TABLE catalog_embeddings (
+ model_name text,
+ version int,
+ id text,
+ embedding frozen<list<float>>,
+ search_embedding frozen<list<float>>,
+ to_be_indexed_variant_1 boolean,
+ to_be_indexed_variant_2 boolean,
+ PRIMARY KEY ((model_name, version), id)
+);
+```
+
+**Aggregator Tables** (per entity):
+
+```sql
+CREATE TABLE catalog_aggregator (
+ id text,
+ is_live_ad text,
+ out_of_stock text,
+ PRIMARY KEY (id)
+);
+```
+
+Each entity is mapped via a store configuration:
+
+```json
+{
+ "db_conf_id": "1",
+ "embeddings_table": "catalog_embeddings",
+ "aggregator_table": "catalog_aggregator"
+}
+```
+
+---
+
+## Serving Flow
+
+The serving path is optimized for low latency with multiple caching layers:
+
+1. **Request arrives** at skye-serving via gRPC
+2. **ConfigRepo** resolves the model configuration, variant filters, and vector DB connection
+3. **In-memory cache** is checked first to reduce load on distributed cache
+4. **Distributed cache (Redis)** is checked next for cached similarity results
+5. **Vector DB query** executes on a cache miss, using the `search_indexed_only` flag to restrict the search to the indexed space
+6. **Aggregator data** is fetched from ScyllaDB to apply variant-level filters
+7. **Response** returns ranked similar candidates with scores
+
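+With a Qdrant backend, the indexed-only behavior maps to Qdrant's `indexed_only` search parameter. A sketch of such a search request (the collection name is a placeholder; actual naming depends on model, variant, and version):
+
+```
+POST /collections/<collection>/points/search
+```
+
+```json
+{
+  "vector": [0.036, -0.048, ...],
+  "limit": 50,
+  "params": { "indexed_only": true }
+}
+```
+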
+### Configuration Bootstrap
+
+On startup, ConfigRepo creates:
+- A map of each model with its configurations (embedding table, vector DB channel)
+- A map of each entity to its aggregator table
+
+```json
+{
+ "intent_model": {
+ "db_conf_id": "1",
+ "index_embedding_table": "catalog_embeddings",
+ "vector_db_grpc_channel": ""
+ }
+}
+```
+
+---
+
+## Admin Flows
+
+Skye uses an **event-driven approach** for model lifecycle management:
+
+- All admin operations are processed through Kafka consumers asynchronously
+- A SQL database behind the admin stores all model states
+- Pod termination does not affect in-progress operations (events are re-consumed on failure)
+- Databricks jobs are triggered and monitored via the admin API
+
+### API Contracts
+
+#### Register Model
+
+```
+POST /register-model
+```
+
+```json
+{
+ "entity": "catalog",
+ "ingestion_column_mapping": "{\"id_column\":\"id\",\"embedding_column\":\"features\",\"to_be_indexed_column\":\"to_be_indexed\"}",
+ "embedding_store_enabled": true,
+ "embedding_store_ttl": 604800,
+ "mq_id": 804,
+ "model_config": "{\"distance_function\":\"DOT\",\"vector_dimension\":32}",
+ "store_id": 1,
+ "training_data_path": "gcs_path"
+}
+```
+
+#### Register Variant
+
+```
+POST /register-variant
+```
+
+```json
+{
+ "entity": "catalog",
+ "model_name": "intent_model",
+ "vss_filter": "{...filter criteria...}",
+ "vectordb_type": "QDRANT",
+ "vectordb_config": "{...connection config...}",
+ "job_frequency": "FREQ_1D"
+}
+```
+
+#### Reset Model
+
+```
+POST /reset-model
+```
+
+```json
+{
+ "entity": "catalog",
+ "model_name": "intent_model",
+ "frequency": "FREQ_1D"
+}
+```
+
+Response includes variant version mappings, MQ ID, and training data path for the Databricks job.
+
+#### Trigger Model Machine
+
+```
+POST /trigger-model-machine
+```
+
+```json
+{
+ "entity": "catalog",
+ "model_name": "intent_model",
+ "variant": "organic"
+}
+```
+
+#### Promote Model / Variant to Scale-Up Cluster
+
+```
+POST /promote-model
+POST /promote-variant
+```
+
+Used to transition successful experiments from experiment clusters to production clusters.
+
+---
+
+## Consumer Flows
+
+
+
+### Reset/Delta Ingestion
+
+Embedding ingestion occurs once per model and executes in parallel for each variant. The Kafka event contract supports:
+
+- **Multiple variants per event**: A single embedding event specifies which variants should index the data
+- **Separate search and index embeddings**: Models can have different embeddings for search space vs index space
+- **EOF handling**: EOF is sent to all partitions to ensure all data is consumed before completion
+
+```json
+{
+ "entity": "catalog",
+ "model_name": "intent_model",
+ "candidate_id": "48869419",
+ "version": "1",
+ "index_space": {
+ "variants_version_map": "{'organic':1,'ad':2}",
+ "embedding": [0.036, -0.048, ...],
+ "variants_index_map": "{'organic':true,'ad':false}",
+ "operation": "A",
+ "payload": "{'sscat_id':700}"
+ },
+ "search_space": {
+ "embedding": [0.036, -0.048, ...]
+ }
+}
+```
+
+### Real-Time Consumers
+
+A generic Kafka schema is used for all real-time consumers, simplifying new integrations:
+
+```json
+{
+ "timestamp": 1719308350,
+ "entity_label": "catalog",
+ "data": [
+ {
+ "id": "125138466",
+ "label": "is_live_ad",
+ "value": "true"
+ }
+ ]
+}
+```
+
+### Retry Topic
+
+Failed ingestion events are published to a retry topic for reprocessing, ensuring no data loss:
+
+```json
+{
+ "timestamp": 1719308350,
+ "entity_label": "catalog",
+ "model_name": "intent_model",
+ "variant": "organic",
+ "data": [
+ {
+ "id": "125138466",
+ "label": "is_live_ad",
+ "value": "true"
+ }
+ ]
+}
+```
+
+---
+
+## Key Design Decisions
+
+### Pluggable Vector Database Support
+
+Skye introduces a generic `vector_db_type` configuration and converts vendor-specific configs to a generic `vector_config`, enabling support for multiple vector database backends beyond Qdrant.
+
+### Variant-Based Model Sharing
+
+By eliminating the tenant-based construct and introducing variants, Skye allows:
+- Models to be shared across tenants without duplication
+- Each variant to have its own filter criteria, vector DB config, and job frequency
+- Independent read/write version tracking per variant
+
+### ScyllaDB for Real-Time Aggregation
+
+Replaced Delta Lake with self-hosted ScyllaDB for cost efficiency. The aggregator is entity-generic (not model/version-specific) since all real-time data is consistent across models.
+
+### Event-Driven State Management
+
+Model state transitions are handled via Kafka events with a SQL database backing store. This eliminates:
+- Single points of failure in admin/ingestion flows
+- Models getting stuck during pod restarts
+- Manual intervention for consumer pause/resume
+
+---
+
+## Resiliency
+
+| Mechanism | Description |
+|---|---|
+| **Retry Topics** | Failed ingestion messages are captured in a failure topic for reprocessing |
+| **Circuit Breakers** | Applied to similarity search API calls to throttle RPS during failures |
+| **Snapshot Backups** | Periodic collection snapshots enable quick restore during downtime |
+| **Automated Cluster Setup** | Scripted provisioning eliminates configuration inconsistencies |
+| **Databricks Job Retries** | Lambda functions with retry mechanisms for failed ingestion jobs |
+
+---
+
+## Scalability
+
+- **Vector DB Scaling**: Generic scripts for adding nodes to existing clusters, enabling horizontal scaling based on load and RPS
+- **Service Scaling**: Hosted on EKS with CPU-based autoscaling
+- **Experiment Isolation**: Experiments run on separate EKS and vector DB clusters, reducing production cluster complexity
+- **Indexed-Only Search**: The `search_indexed_only` flag ensures queries only search indexed space, avoiding latency from brute-force searches on partially built indexes
+
+---
+
+## Observability
+
+### Metrics (per model + variant)
+
+| Metric | Description |
+|---|---|
+| `avg_similar_candidates` | Average number of similarity candidates returned |
+| `avg_recall` | Average similarity score of the first similar catalog returned |
+| Service Latency | P99.9 / P99 / P95 / P50 |
+| Service 5xx Count | Error rate monitoring |
+| Vector DB Latency | P99.9 / P99 / P95 / P50 |
+| Vector DB QPS | Throughput monitoring |
+| ScyllaDB Latency | P99.9 / P99 / P95 / P90 |
+| Redis Latency | P99.9 / P99 / P95 / P90 |
+| Redis Hit % | Cache effectiveness |
+
+### Alerts
+
+| Alert | Threshold |
+|---|---|
+| Indexed Vector Count | < 95% |
+| Events to Failure Topic | Rate > 0 |
+| Service 5xx | < 10 |
+| Service Latency | Model-dependent SLA |
+
+---
+
+## Technology Stack
+
+| Component | Technology |
+|---|---|
+| Language | Go |
+| Vector Database | Qdrant (pluggable) |
+| Embedding Storage | ScyllaDB |
+| Real-Time Aggregation | ScyllaDB |
+| Caching | Redis + In-Memory |
+| Message Queue | Kafka |
+| Configuration | ZooKeeper / etcd |
+| Container Orchestration | Kubernetes (EKS) |
+| Job Orchestration | Databricks |
diff --git a/docs-src/docs/skye/v1.0.0/functionalities.md b/docs-src/docs/skye/v1.0.0/functionalities.md
new file mode 100644
index 00000000..1ba92d30
--- /dev/null
+++ b/docs-src/docs/skye/v1.0.0/functionalities.md
@@ -0,0 +1,106 @@
+---
+title: Functionalities
+sidebar_position: 2
+---
+
+# Skye - Functionalities
+
+## Core Capabilities
+
+### 1. Vector Similarity Search
+
+Skye provides real-time nearest-neighbor search across high-dimensional vector spaces. It supports:
+
+- **Configurable distance functions**: DOT product, Cosine similarity, Euclidean distance
+- **Configurable vector dimensions**: Per-model vector dimension settings
+- **Indexed-only search**: Queries only search within fully indexed space, avoiding brute-force fallback on partially built indexes
+- **Pagination support**: Service-level pagination for clients, even when the underlying vector DB does not natively support it
+
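+For reference, for vectors a and b of the configured dimension:
+
+```
+dot(a, b)       = Σ aᵢ·bᵢ
+cosine(a, b)    = Σ aᵢ·bᵢ / (‖a‖ · ‖b‖)
+euclidean(a, b) = √( Σ (aᵢ − bᵢ)² )
+```
+
+Note that for unit-normalized embeddings, DOT and Cosine produce identical rankings.
+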
+### 2. Pluggable Vector Database Support
+
+The platform is designed to be vector DB agnostic:
+
+- **Generic vector config**: A `vector_db_type` field and generic `vectordb_config` replace vendor-specific configurations
+- **Current support**: Qdrant with official Go client
+- **Extensibility**: New vector databases can be integrated by implementing the vector DB interface
+
+### 3. Model and Variant Management
+
+#### Model Registration
+- Models are registered via API with entity type, embedding configuration, distance function, vector dimension, and training data path
+- Each model is associated with a store ID mapping to specific embedding and aggregator tables
+
+#### Variant Registration
+- Variants represent different views/filters of the same model (e.g., organic, ad, commerce)
+- Each variant has its own filter criteria, vector DB cluster, job frequency, and version tracking
+- Variants share the same embeddings, eliminating data redundancy
+
+#### Model Promotion
+- Successful experiments can be promoted from experiment clusters to production clusters via API
+
+### 4. Embedding Ingestion
+
+#### Batch Ingestion (Reset/Delta Jobs)
+- Triggered via Databricks jobs that read from GCS paths
+- Supports separate index-space and search-space embeddings
+- Per-variant `to_be_indexed` flags control which embeddings are indexed for each variant
+- EOF markers sent to all Kafka partitions ensure complete data consumption
+
+#### Real-Time Ingestion
+- Generic Kafka schema for all real-time consumers
+- Entity-based aggregation data (e.g., is_live_ad, out_of_stock) updates in real time
+- During model resets, real-time consumers continue pushing data to the latest collection (no pausing)
+
+### 5. Real-Time Data Aggregation
+
+- Entity-wise (catalog, product, user) real-time aggregation via ScyllaDB
+- Generic approach: aggregator tables are entity-level, not model/version-specific
+- All real-time data is consistent across models sharing the same entity
+
+### 6. Intelligent Caching
+
+- **In-memory cache**: First layer, reduces load on distributed cache
+- **Distributed cache (Redis)**: Second layer for cached similarity results
+- Hit rate monitoring and cache effectiveness metrics per model
+
+### 7. Embedded Storage
+
+- Optional embedding storage with configurable TTL
+- Enables embedding lookup APIs for downstream consumers
+- Stored in ScyllaDB with efficient binary serialization
+
+### 8. Retry and Fault Tolerance
+
+- **Retry topic**: Failed ingestion events are published to a dedicated retry topic
+- **Event-driven state management**: Model states persist in SQL DB, surviving pod restarts
+- **Kafka-based admin**: Asynchronous processing with automatic re-consumption on failure
+
+### 9. Experiment Isolation
+
+- Dedicated EKS cluster (`skye-service-experiments`) for experiments
+- Dedicated vector DB cluster for experiment workloads
+- Clean separation from production: experiments do not impact production performance
+- Promotion path from experiment to production after load analysis
+
+### 10. Centralized Cluster Management
+
+- Automated cluster provisioning via scripts (collaboration with DevOps)
+- Consistent configurations across all clusters (eliminates consensus issues)
+- Horizontal scaling support: generic scripts for adding nodes to existing clusters
+
+---
+
+## Onboarding Flow
+
+### Step-by-step Process
+
+1. **Data Scientist** provides a base GCS path where model embeddings will be pushed
+2. **Register Model** via `POST /register-model` with entity type, column mappings, model config
+3. **Register Variant(s)** via `POST /register-variant` with filter criteria, vector DB config, job frequency
+4. **Schedule Databricks Job** to read data from GCS path and ingest into Skye platform
+5. **Reset Model** via `POST /reset-model` to trigger the first full ingestion
+6. **Trigger Model Machine** via `POST /trigger-model-machine` to start the indexing pipeline
+
+### Extending to New Tenants
+
+With the variant system, extending a model to a new tenant only requires registering a new variant with the appropriate filters; no re-ingestion of embeddings is needed.
diff --git a/docs-src/docs/skye/v1.0.0/index.md b/docs-src/docs/skye/v1.0.0/index.md
new file mode 100644
index 00000000..0a5ee391
--- /dev/null
+++ b/docs-src/docs/skye/v1.0.0/index.md
@@ -0,0 +1,14 @@
+---
+title: v1.0.0
+description: Skye v1.0.0
+sidebar_position: 0
+slug: /skye/v1.0.0
+---
+
+import DocCardList from '@theme/DocCardList';
+
+# Skye v1.0.0
+
+Skye is a high-performance vector similarity search platform that enables fast semantic retrieval by representing data as vectors and querying nearest matches in high-dimensional space.
+
+
diff --git a/docs-src/docs/skye/v1.0.0/release-notes.md b/docs-src/docs/skye/v1.0.0/release-notes.md
new file mode 100644
index 00000000..ac0b1f77
--- /dev/null
+++ b/docs-src/docs/skye/v1.0.0/release-notes.md
@@ -0,0 +1,67 @@
+---
+title: Release Notes
+sidebar_position: 3
+---
+
+# Skye - Release Notes
+
+## v1.0.0
+
+### Overview
+
+Initial open-source release of Skye, BharatMLStack's vector similarity search platform. This release represents a complete re-architecture of the internal VSS (Vector Similarity Search) service, addressing scalability, resilience, and operational efficiency challenges from the previous generation.
+
+### What's New
+
+#### Architecture
+- **Model-first hierarchy**: Models at the base level with variants nested within, eliminating embedding duplication across tenants
+- **Entity-based data split**: Separate embedding and aggregator tables per entity type (catalog, product, user)
+- **Event-driven admin flows**: Kafka-based model lifecycle management with SQL-backed state persistence
+- **Pluggable vector DB support**: Generic vector database abstraction replacing vendor-specific tight coupling
+
+#### Serving
+- **Multi-layer caching**: In-memory cache + Redis distributed cache for low-latency similarity search
+- **Indexed-only search**: `search_indexed_only` flag prevents brute-force fallback on partially indexed collections
+- **Pagination support**: Service-level pagination for clients
+- **Separate search/index embeddings**: Models can use different embedding spaces for search and indexing
+
+#### Ingestion
+- **Shared embeddings across variants**: Single ingestion per model with parallel variant processing
+- **Generic RT consumer schema**: Simplified onboarding for new real-time data sources
+- **Retry topic**: Automatic capture and reprocessing of failed ingestion events
+- **EOF to all partitions**: Ensures complete data consumption before processing completion
+
+#### Operations
+- **API-based model onboarding**: Register models and variants via REST API (replaces manual Databricks-only flow)
+- **Automated cluster provisioning**: Scripted setup for consistent vector DB cluster configurations
+- **Experiment isolation**: Dedicated EKS and vector DB clusters for experiments
+- **Comprehensive observability**: Per-model + per-variant metrics for latency, throughput, error rates, and cache effectiveness
+
+### Improvements Over Previous Architecture
+
+| Area | Before | After |
+|---|---|---|
+| Embedding storage | Duplicated per tenant | Shared per model |
+| Vector DB coupling | Tightly coupled to Qdrant | Pluggable via generic interface |
+| State management | In-pod synchronous thread | Event-driven with SQL backing |
+| Consumer handling | Paused during ingestion | No pausing; concurrent writes |
+| Cluster setup | Manual, error-prone | Automated, consistent |
+| Experiment infra | Shared with production | Isolated clusters |
+| Failure recovery | Manual intervention | Retry topics + snapshots |
+| Observability | Generic alerts | Model + variant level metrics |
+
+### Known Limitations
+
+- Snapshot restore is currently supported for smaller indexes only
+- Pagination is handled at the service level (not natively by the vector DB)
+- Horizontal scaling of vector DB clusters requires running provisioning scripts
+
+### Technology Stack
+
+- **Language**: Go
+- **Vector Database**: Qdrant (pluggable)
+- **Storage**: ScyllaDB
+- **Cache**: Redis + In-Memory
+- **Message Queue**: Kafka
+- **Configuration**: ZooKeeper / etcd
+- **Orchestration**: Kubernetes (EKS)
diff --git a/docs-src/docs/trufflebox-ui/_category_.json b/docs-src/docs/trufflebox-ui/_category_.json
index b06298f4..d44ae254 100644
--- a/docs-src/docs/trufflebox-ui/_category_.json
+++ b/docs-src/docs/trufflebox-ui/_category_.json
@@ -1,6 +1,6 @@
{
"label": "Trufflebox UI",
- "position": 2,
+ "position": 4,
"link": {
"type": "generated-index",
"description": "Trufflebox UI is a modern, feature rich UI framework for supporting MLOps. It supports Feature catalog, management, user managemnet and other adminops"
diff --git a/docs-src/docs/trufflebox-ui/v1.0.0/index.md b/docs-src/docs/trufflebox-ui/v1.0.0/index.md
new file mode 100644
index 00000000..ee6a7212
--- /dev/null
+++ b/docs-src/docs/trufflebox-ui/v1.0.0/index.md
@@ -0,0 +1,14 @@
+---
+title: v1.0.0
+description: Trufflebox UI v1.0.0
+sidebar_position: 0
+slug: /trufflebox-ui/v1.0.0
+---
+
+import DocCardList from '@theme/DocCardList';
+
+# Trufflebox UI v1.0.0
+
+Trufflebox UI is a modern, feature-rich UI framework for supporting MLOps. It supports feature catalog, management, user management, and other admin operations.
+
+
diff --git a/docs-src/docusaurus.config.js b/docs-src/docusaurus.config.js
index 229e0cb2..59f6ea48 100644
--- a/docs-src/docusaurus.config.js
+++ b/docs-src/docusaurus.config.js
@@ -78,6 +78,10 @@ const config = {
({
// Replace with your project's social card
image: 'img/docusaurus-social-card.jpg',
+ colorMode: {
+ defaultMode: 'dark',
+ respectPrefersColorScheme: true,
+ },
navbar: {
title: 'BharatMLStack',
items: [
diff --git a/docs-src/package.json b/docs-src/package.json
index 3b2c4d32..b470544c 100644
--- a/docs-src/package.json
+++ b/docs-src/package.json
@@ -24,7 +24,8 @@
},
"devDependencies": {
"@docusaurus/module-type-aliases": "3.8.1",
- "@docusaurus/types": "3.8.1"
+ "@docusaurus/types": "3.8.1",
+ "yarn": "1.22.22"
},
"browserslist": {
"production": [
diff --git a/docs-src/src/css/custom.css b/docs-src/src/css/custom.css
index ff94defd..b66bc7db 100644
--- a/docs-src/src/css/custom.css
+++ b/docs-src/src/css/custom.css
@@ -1,143 +1,636 @@
/**
- * Any CSS included here will be global. The classic template
- * bundles Infima by default. Infima is a CSS framework designed to
- * work well for content-centric websites.
+ * Global theme for BharatMLStack docs site.
+ * Overrides Infima variables to match the homepage's indigo/purple dark theme.
+ * Supports both dark (primary) and light modes.
*/
-/* You can override the default Infima variables here. */
+/* ========================================
+ 1. Infima Variable Overrides
+ ======================================== */
+
:root {
- /* BharatMLStack brand colors - purple/burgundy theme */
- --ifm-color-primary: #450839;
- --ifm-color-primary-dark: #3d0732;
- --ifm-color-primary-darker: #39062f;
- --ifm-color-primary-darkest: #2f0527;
- --ifm-color-primary-light: #4d0940;
- --ifm-color-primary-lighter: #510a43;
- --ifm-color-primary-lightest: #5d0c4d;
+ /* Primary palette – gold/amber */
+ --ifm-color-primary: #f59e0b;
+ --ifm-color-primary-dark: #d97706;
+ --ifm-color-primary-darker: #b45309;
+ --ifm-color-primary-darkest: #92400e;
+ --ifm-color-primary-light: #fbbf24;
+ --ifm-color-primary-lighter: #fcd34d;
+ --ifm-color-primary-lightest: #fde68a;
+
+ /* Light mode backgrounds and text */
+ --ifm-background-color: #f8fafc;
+ --ifm-background-surface-color: #ffffff;
+ --ifm-font-color-base: #1e293b;
+ --ifm-font-color-secondary: #64748b;
+ --ifm-heading-color: #0f172a;
+ --ifm-link-color: #f59e0b;
+ --ifm-link-hover-color: #d97706;
+
+ /* Code */
--ifm-code-font-size: 95%;
- --docusaurus-highlighted-code-line-bg: rgba(0, 0, 0, 0.1);
-
- /* Custom BharatMLStack variables with better contrast */
- --bharatml-primary: #450839;
- --bharatml-primary-hover: #6a0c59;
- --bharatml-secondary: #f9f9f9;
- --bharatml-text: #1c1e21; /* Much darker for better contrast */
- --bharatml-text-light: #606770; /* Darker gray for better readability */
+ --ifm-code-background: #f1f5f9;
+ --ifm-code-border-radius: 6px;
+ --ifm-code-padding-horizontal: 0.4rem;
+ --ifm-code-padding-vertical: 0.15rem;
+ --docusaurus-highlighted-code-line-bg: rgba(245, 158, 11, 0.08);
+
+ /* Cards, borders, shadows */
+ --ifm-card-background-color: #ffffff;
+ --ifm-global-shadow-lw: 0 2px 8px rgba(0, 0, 0, 0.06);
+ --ifm-global-shadow-md: 0 4px 16px rgba(0, 0, 0, 0.08);
+ --ifm-global-shadow-tl: 0 8px 32px rgba(0, 0, 0, 0.1);
+ --ifm-global-radius: 8px;
+
+ /* Table of contents */
+ --ifm-toc-border-color: rgba(0, 0, 0, 0.08);
+
+ /* Navbar height for padding */
+ --ifm-navbar-height: 3.75rem;
}
-/* For readability concerns, you should choose a lighter palette in dark mode. */
+/* Dark mode */
[data-theme='dark'] {
- --ifm-color-primary: #8b4582;
- --ifm-color-primary-dark: #7d3f75;
- --ifm-color-primary-darker: #763c6e;
- --ifm-color-primary-darkest: #62315a;
- --ifm-color-primary-light: #994b8f;
- --ifm-color-primary-lighter: #a04e96;
- --ifm-color-primary-lightest: #b657a9;
- --docusaurus-highlighted-code-line-bg: rgba(0, 0, 0, 0.3);
-
- /* Dark mode BharatMLStack colors */
- --bharatml-primary: #8b4582;
- --bharatml-primary-hover: #a04e96;
- --bharatml-secondary: #1e1e1e;
- --bharatml-text: #e3e3e3; /* Light text for dark mode */
- --bharatml-text-light: #b4b4b4; /* Lighter gray for dark mode */
+ --ifm-color-primary: #fbbf24;
+ --ifm-color-primary-dark: #f59e0b;
+ --ifm-color-primary-darker: #d97706;
+ --ifm-color-primary-darkest: #b45309;
+ --ifm-color-primary-light: #fcd34d;
+ --ifm-color-primary-lighter: #fde68a;
+ --ifm-color-primary-lightest: #fef3c7;
+
+ --ifm-background-color: #27001d;
+ --ifm-background-surface-color: #3d0029;
+ --ifm-font-color-base: #e2e8f0;
+ --ifm-font-color-secondary: #94a3b8;
+ --ifm-heading-color: #f1f5f9;
+ --ifm-link-color: #fbbf24;
+ --ifm-link-hover-color: #fcd34d;
+
+ --ifm-code-background: rgba(255, 255, 255, 0.06);
+ --docusaurus-highlighted-code-line-bg: rgba(251, 191, 36, 0.15);
+
+ --ifm-card-background-color: rgba(255, 255, 255, 0.03);
+ --ifm-global-shadow-lw: 0 2px 8px rgba(0, 0, 0, 0.3);
+ --ifm-global-shadow-md: 0 4px 16px rgba(0, 0, 0, 0.4);
+ --ifm-global-shadow-tl: 0 8px 32px rgba(0, 0, 0, 0.5);
+
+ --ifm-toc-border-color: rgba(255, 255, 255, 0.06);
}
-/* Custom BharatMLStack styles */
-.bharatml-hero {
- background: linear-gradient(135deg, var(--bharatml-primary) 0%, var(--bharatml-primary-hover) 100%);
- color: white;
+
+/* ========================================
+ 2. Global Gradient Orb Background
+ ======================================== */
+
+.gradient-bg-global {
+ position: fixed;
+ top: 0;
+ left: 0;
+ width: 100%;
+ height: 100%;
+ z-index: 0;
+ pointer-events: none;
}
-/* Hero button styling - both buttons should have white borders and proper text colors */
-.bharatml-hero .bharatml-button {
- background-color: var(--bharatml-primary);
- border: 2px solid white !important;
- color: white !important;
- transition: all 0.3s ease;
+.gradient-orb-global {
+ position: absolute;
+ border-radius: 50%;
+ filter: blur(100px);
+ opacity: 0.25;
+ animation: globalOrbFloat 25s ease-in-out infinite;
}
-.bharatml-hero .bharatml-button:hover {
- background-color: white !important;
- border-color: white !important;
- color: var(--bharatml-primary) !important;
+[data-theme='light'] .gradient-orb-global {
+ opacity: 0.10;
}
-.bharatml-hero .button--outline {
- background-color: transparent !important;
- border: 2px solid white !important;
- color: white !important;
- transition: all 0.3s ease;
+.orb-global-1 {
+ width: 600px;
+ height: 600px;
+ background: radial-gradient(circle, #fbbf24, transparent);
+ top: -10%;
+ left: -10%;
}
-.bharatml-hero .button--outline:hover {
- background-color: white !important;
- border-color: white !important;
- color: var(--bharatml-primary) !important;
+.orb-global-2 {
+ width: 500px;
+ height: 500px;
+ background: radial-gradient(circle, #f59e0b, transparent);
+ top: 50%;
+ right: -10%;
+ animation-delay: 8s;
}
-/* Dark mode hero buttons */
-[data-theme='dark'] .bharatml-hero .bharatml-button {
- background-color: var(--bharatml-primary);
- border: 2px solid white !important;
- color: white !important;
+.orb-global-3 {
+ width: 700px;
+ height: 700px;
+ background: radial-gradient(circle, #06b6d4, transparent);
+ bottom: -20%;
+ left: 30%;
+ animation-delay: 15s;
}
-[data-theme='dark'] .bharatml-hero .bharatml-button:hover {
- background-color: white !important;
- border-color: white !important;
- color: var(--bharatml-primary) !important;
+@keyframes globalOrbFloat {
+ 0%, 100% {
+ transform: translate(0, 0) scale(1);
+ }
+ 33% {
+ transform: translate(60px, -60px) scale(1.1);
+ }
+ 66% {
+ transform: translate(-40px, 40px) scale(0.9);
+ }
}
-[data-theme='dark'] .bharatml-hero .button--outline {
- background-color: transparent !important;
- border: 2px solid white !important;
- color: white !important;
+
+/* ========================================
+ 3. Navbar – Glass Morphism
+ ======================================== */
+
+.navbar {
+ background: rgba(39, 0, 29, 0.8) !important;
+ backdrop-filter: blur(20px);
+ -webkit-backdrop-filter: blur(20px);
+ border-bottom: 1px solid rgba(255, 255, 255, 0.05);
+ box-shadow: none;
+ position: sticky;
+ top: 0;
+ z-index: 100;
+}
+
+[data-theme='light'] .navbar {
+ background: rgba(255, 255, 255, 0.85) !important;
+ border-bottom: 1px solid rgba(0, 0, 0, 0.08);
+}
+
+.navbar__title {
+ font-weight: 800;
+ background: linear-gradient(135deg, #fbbf24, #f59e0b, #06b6d4);
+ background-size: 200% 200%;
+ -webkit-background-clip: text;
+ -webkit-text-fill-color: transparent;
+ background-clip: text;
+ animation: navGradientShift 3s ease infinite;
+}
+
+@keyframes navGradientShift {
+ 0%, 100% { background-position: 0% 50%; }
+ 50% { background-position: 100% 50%; }
+}
+
+.navbar__link {
+ font-weight: 500;
}
-[data-theme='dark'] .bharatml-hero .button--outline:hover {
- background-color: white !important;
- border-color: white !important;
- color: var(--bharatml-primary) !important;
+[data-theme='dark'] .navbar__link {
+ color: #e2e8f0;
}
-/* General button styling for other parts of the site */
-.bharatml-button {
- background-color: var(--bharatml-primary);
- border-color: var(--bharatml-primary);
- transition: all 0.3s ease;
+[data-theme='dark'] .navbar__link:hover,
+[data-theme='dark'] .navbar__link--active {
+ color: #fbbf24;
+}
+
+.navbar__toggle {
+ color: var(--ifm-font-color-base);
+}
+
+/* Navbar sidebar (mobile) */
+.navbar-sidebar {
+ background: var(--ifm-background-color);
+}
+
+
+/* ========================================
+ 4. Footer – Dark Theme
+ ======================================== */
+
+.footer {
+ background: #3d0029 !important;
+ border-top: 1px solid rgba(255, 255, 255, 0.05);
+}
+
+[data-theme='light'] .footer {
+ background: #f1f5f9 !important;
+ border-top: 1px solid rgba(0, 0, 0, 0.08);
+}
+
+.footer__title {
+ color: #e2e8f0;
+ font-weight: 700;
+}
+
+[data-theme='light'] .footer__title {
+ color: #1e293b;
+}
+
+.footer__link-item {
+ color: #94a3b8;
+ transition: color 0.3s;
+}
+
+.footer__link-item:hover {
+ color: #fbbf24;
+ text-decoration: none;
+}
+
+[data-theme='light'] .footer__link-item {
+ color: #64748b;
+}
+
+[data-theme='light'] .footer__link-item:hover {
+ color: #f59e0b;
+}
+
+.footer__copyright {
+ color: #64748b;
+}
+
+
+/* ========================================
+ 5. Sidebar – Glass Effect
+ ======================================== */
+
+[data-theme='dark'] .theme-doc-sidebar-container {
+ border-right: 1px solid rgba(255, 255, 255, 0.05) !important;
}
-.bharatml-button:hover {
- background-color: var(--bharatml-primary-hover);
- border-color: var(--bharatml-primary-hover);
- color: white;
+[data-theme='dark'] .menu {
+ background: transparent;
}
-.bharatml-card {
- border: 1px solid rgba(69, 8, 57, 0.1);
+[data-theme='dark'] .menu__link {
+ color: #cbd5e1;
border-radius: 8px;
- padding: 2rem;
- transition: all 0.3s ease;
- background: white;
+ transition: all 0.2s;
+}
+
+[data-theme='dark'] .menu__link:hover {
+ background: rgba(251, 191, 36, 0.1);
+ color: #e2e8f0;
+}
+
+[data-theme='dark'] .menu__link--active:not(.menu__link--sublist) {
+ background: rgba(251, 191, 36, 0.15);
+ color: #fbbf24;
+ font-weight: 600;
+}
+
+[data-theme='dark'] .menu__list-item-collapsible:hover {
+ background: rgba(251, 191, 36, 0.08);
+}
+
+[data-theme='dark'] .theme-doc-sidebar-item-category > .menu__list-item-collapsible > .menu__link {
+ color: #e2e8f0;
+ font-weight: 600;
+}
+
+
+/* ========================================
+ 6. Doc / Blog Content
+ ======================================== */
+
+/* Ensure proper z-index for content above gradient orbs */
+[class*='docMainContainer'],
+[class*='mainWrapper'],
+.main-wrapper {
+ position: relative;
+ z-index: 1;
+}
+
+/* Markdown content */
+.markdown h1,
+.markdown h2,
+.markdown h3,
+.markdown h4,
+.markdown h5,
+.markdown h6 {
+ color: var(--ifm-heading-color);
+}
+
+/* Tables */
+[data-theme='dark'] table {
+ border-color: rgba(255, 255, 255, 0.08);
+}
+
+[data-theme='dark'] table thead tr {
+ background: rgba(255, 255, 255, 0.04);
+ border-bottom: 1px solid rgba(255, 255, 255, 0.08);
+}
+
+[data-theme='dark'] table tbody tr {
+ border-bottom: 1px solid rgba(255, 255, 255, 0.04);
+}
+
+[data-theme='dark'] table tbody tr:nth-child(2n) {
+ background: rgba(255, 255, 255, 0.02);
+}
+
+[data-theme='dark'] th,
+[data-theme='dark'] td {
+ border-color: rgba(255, 255, 255, 0.06);
+}
+
+/* Blockquotes */
+[data-theme='dark'] blockquote {
+ border-left-color: #fbbf24;
+ background: rgba(251, 191, 36, 0.05);
+ color: #cbd5e1;
+}
+
+/* Horizontal rules */
+[data-theme='dark'] hr {
+ border-color: rgba(255, 255, 255, 0.06);
+}
+
+
+/* ========================================
+ 7. Code Blocks
+ ======================================== */
+
+[data-theme='dark'] .prism-code {
+ background: rgba(255, 255, 255, 0.04) !important;
+ border: 1px solid rgba(255, 255, 255, 0.06);
+}
+
+[data-theme='dark'] code {
+ background: rgba(255, 255, 255, 0.06);
+ border: 1px solid rgba(255, 255, 255, 0.08);
+ color: #e2e8f0;
+}
+
+[data-theme='dark'] a code {
+ color: var(--ifm-link-color);
+}
+
+/* Code block title bar */
+[data-theme='dark'] .codeBlockTitle_node_modules-\@docusaurus-theme-classic-lib-theme-CodeBlock-Content-styles-module {
+ background: rgba(255, 255, 255, 0.06) !important;
+ border-bottom: 1px solid rgba(255, 255, 255, 0.06);
+}
+
+
+/* ========================================
+ 8. Admonitions
+ ======================================== */
+
+[data-theme='dark'] .alert {
+ background: rgba(255, 255, 255, 0.03);
+ border: 1px solid rgba(255, 255, 255, 0.06);
+ color: #e2e8f0;
+}
+
+[data-theme='dark'] .alert--info {
+ border-left: 4px solid #06b6d4;
+ background: rgba(6, 182, 212, 0.06);
+}
+
+[data-theme='dark'] .alert--warning {
+ border-left: 4px solid #f59e0b;
+ background: rgba(245, 158, 11, 0.06);
+}
+
+[data-theme='dark'] .alert--danger {
+ border-left: 4px solid #ef4444;
+ background: rgba(239, 68, 68, 0.06);
}
-.bharatml-card:hover {
- border-color: var(--bharatml-primary);
- box-shadow: 0 4px 20px rgba(69, 8, 57, 0.1);
- transform: translateY(-2px);
+[data-theme='dark'] .alert--success {
+ border-left: 4px solid #10b981;
+ background: rgba(16, 185, 129, 0.06);
}
-.bharatml-icon {
- width: 64px;
- height: 64px;
- background: linear-gradient(135deg, var(--bharatml-primary), var(--bharatml-primary-hover));
+[data-theme='dark'] .alert--secondary {
+ border-left: 4px solid #fbbf24;
+ background: rgba(251, 191, 36, 0.06);
+}
+
+[data-theme='dark'] .admonitionHeading_node_modules-\@docusaurus-theme-classic-lib-theme-Admonition-Layout-styles-module {
+ color: inherit;
+}
+
+
+/* ========================================
+ 9. Table of Contents (right sidebar)
+ ======================================== */
+
+[data-theme='dark'] .table-of-contents__link {
+ color: #94a3b8;
+}
+
+[data-theme='dark'] .table-of-contents__link:hover,
+[data-theme='dark'] .table-of-contents__link--active {
+ color: #fbbf24;
+}
+
+[data-theme='dark'] .table-of-contents {
+ border-left: 1px solid rgba(255, 255, 255, 0.06);
+}
+
+
+/* ========================================
+ 10. Pagination / Doc navigation
+ ======================================== */
+
+[data-theme='dark'] .pagination-nav__link {
+ background: rgba(255, 255, 255, 0.03);
+ border: 1px solid rgba(255, 255, 255, 0.08);
border-radius: 12px;
- display: flex;
- align-items: center;
- justify-content: center;
- margin: 0 auto 1rem;
- font-size: 1.5rem;
- color: white;
+ transition: all 0.3s;
+}
+
+[data-theme='dark'] .pagination-nav__link:hover {
+ border-color: rgba(251, 191, 36, 0.3);
+ background: rgba(251, 191, 36, 0.06);
+}
+
+[data-theme='dark'] .pagination-nav__sublabel {
+ color: #94a3b8;
+}
+
+[data-theme='dark'] .pagination-nav__label {
+ color: #e2e8f0;
+}
+
+
+/* ========================================
+ 11. Blog-specific
+ ======================================== */
+
+[data-theme='dark'] .blog-post-page article header h1 {
+ color: #f1f5f9;
+}
+
+[data-theme='dark'] article .avatar__name a {
+ color: #fbbf24;
+}
+
+[data-theme='dark'] .blog-tags a {
+ background: rgba(251, 191, 36, 0.1);
+ border: 1px solid rgba(251, 191, 36, 0.2);
+ color: #fbbf24;
+}
+
+[data-theme='dark'] .blog-tags a:hover {
+ background: rgba(251, 191, 36, 0.2);
+ border-color: rgba(251, 191, 36, 0.4);
+ text-decoration: none;
+}
+
+
+/* ========================================
+ 12. Search and misc inputs
+ ======================================== */
+
+[data-theme='dark'] .navbar__search-input {
+ background: rgba(255, 255, 255, 0.05);
+ border: 1px solid rgba(255, 255, 255, 0.1);
+ color: #e2e8f0;
+}
+
+[data-theme='dark'] .navbar__search-input::placeholder {
+ color: #64748b;
+}
+
+
+/* ========================================
+ 13. Breadcrumbs
+ ======================================== */
+
+[data-theme='dark'] .breadcrumbs__link {
+ background: rgba(255, 255, 255, 0.04);
+ color: #94a3b8;
+ border-radius: 6px;
+}
+
+[data-theme='dark'] .breadcrumbs__link:hover {
+ background: rgba(251, 191, 36, 0.1);
+ color: #e2e8f0;
+}
+
+[data-theme='dark'] .breadcrumbs__item--active .breadcrumbs__link {
+ background: rgba(251, 191, 36, 0.12);
+ color: #fbbf24;
+}
+
+
+/* ========================================
+ 14. Tabs
+ ======================================== */
+
+[data-theme='dark'] .tabs__item {
+ color: #94a3b8;
+ border-bottom-color: transparent;
+}
+
+[data-theme='dark'] .tabs__item:hover {
+ color: #e2e8f0;
+}
+
+[data-theme='dark'] .tabs__item--active {
+ color: #fbbf24;
+ border-bottom-color: #fbbf24;
+}
+
+
+/* ========================================
+ 15. Scrollbar (dark mode)
+ ======================================== */
+
+[data-theme='dark'] ::-webkit-scrollbar {
+ width: 8px;
+ height: 8px;
+}
+
+[data-theme='dark'] ::-webkit-scrollbar-track {
+ background: transparent;
+}
+
+[data-theme='dark'] ::-webkit-scrollbar-thumb {
+ background: rgba(255, 255, 255, 0.12);
+ border-radius: 4px;
+}
+
+[data-theme='dark'] ::-webkit-scrollbar-thumb:hover {
+ background: rgba(255, 255, 255, 0.2);
+}
+
+
+/* ========================================
+ 16. Version / Dropdown badges
+ ======================================== */
+
+[data-theme='dark'] .dropdown__menu {
+ background: #3d0029;
+ border: 1px solid rgba(255, 255, 255, 0.08);
+}
+
+[data-theme='dark'] .dropdown__link {
+ color: #cbd5e1;
+}
+
+[data-theme='dark'] .dropdown__link:hover {
+ background: rgba(251, 191, 36, 0.1);
+ color: #e2e8f0;
+}
+
+[data-theme='dark'] .dropdown__link--active {
+ color: #fbbf24;
+ background: rgba(251, 191, 36, 0.12);
+}
+
+
+/* ========================================
+ 17. Homepage Isolation
+ (hide Docusaurus navbar/footer on homepage)
+ ======================================== */
+
+html.homepage-active .navbar {
+ display: none !important;
+}
+
+html.homepage-active .footer {
+ display: none !important;
+}
+
+html.homepage-active main {
+ margin-top: 0;
+}
+
+html.homepage-active [class*='docMainContainer'],
+html.homepage-active [class*='mainWrapper'] {
+ padding-top: 0;
+}
+
+
+/* ========================================
+ 18. Light mode refinements
+ ======================================== */
+
+[data-theme='light'] .theme-doc-sidebar-container {
+ border-right: 1px solid rgba(0, 0, 0, 0.06);
+}
+
+[data-theme='light'] .menu__link--active:not(.menu__link--sublist) {
+ background: rgba(245, 158, 11, 0.08);
+ color: #f59e0b;
+ font-weight: 600;
+}
+
+[data-theme='light'] .menu__link:hover {
+ background: rgba(245, 158, 11, 0.05);
+}
+
+[data-theme='light'] .pagination-nav__link {
+ border-radius: 12px;
+ transition: all 0.3s;
+}
+
+[data-theme='light'] .pagination-nav__link:hover {
+ border-color: rgba(245, 158, 11, 0.3);
+ box-shadow: 0 4px 16px rgba(245, 158, 11, 0.08);
+}
+
+[data-theme='light'] blockquote {
+ border-left-color: #f59e0b;
}
diff --git a/docs-src/src/pages/index.js b/docs-src/src/pages/index.js
index c58e5f55..325cf414 100644
--- a/docs-src/src/pages/index.js
+++ b/docs-src/src/pages/index.js
@@ -1,202 +1,584 @@
-import clsx from 'clsx';
-import Link from '@docusaurus/Link';
+import React, { useEffect, useLayoutEffect, useRef, useState, useCallback } from 'react';
+import Layout from '@theme/Layout';
import useDocusaurusContext from '@docusaurus/useDocusaurusContext';
import useBaseUrl from '@docusaurus/useBaseUrl';
-import Layout from '@theme/Layout';
-import { OnlineFeatureStoreFeatures, TruffleboxUIFeatures, SDKsFeatures } from '@site/src/components/HomepageFeatures';
-
-import Heading from '@theme/Heading';
import styles from './index.module.css';
-function HomepageHeader() {
- const {siteConfig} = useDocusaurusContext();
+// ─── Data ──────────────────────────────────────────────
+
+const BARRIERS = [
+ {
+ icon: '\u{1F9E0}',
+ title: 'Focus on building intelligence, not infrastructure',
+ questions: [
+ 'Does every model deployment require a full-stack integration effort?',
+ 'Do engineers have to rebuild feature retrieval, endpoint integrations, and logging for each new model?',
+ 'Does changing a simple expression like 0.2\u00D7s\u2081 + 0.8\u00D7s\u2082 to 0.3\u00D7s\u2081 + 0.7\u00D7s\u2082 really need code reviews and redeployments?',
+ 'Why does deploying intelligence require the DevOps team to provision infra?',
+ ],
+ answer:
+ 'Machine learning teams should be iterating on models, not systems. Yet today, infrastructure complexity turns simple improvements into weeks of engineering effort, slowing experimentation and innovation.',
+ },
+ {
+ icon: '\u{1F4B0}',
+ title: 'Built for scale without exponential cost growth',
+ questions: [
+ 'Do your infrastructure costs scale faster than your ML impact?',
+ 'Are you recomputing the same features, reloading the same data, and moving the same bytes across systems repeatedly?',
+ 'Are expensive GPUs and compute sitting underutilized while workloads wait on data or inefficient pipelines?',
+ 'Why does scaling ML often mean scaling cost linearly\u2014or worse?',
+ ],
+ answer:
+ 'A modern ML platform should eliminate redundant computation, reuse features intelligently, and optimize data access across memory, NVMe, and object storage. Compute should be pooled, scheduled efficiently, and fully utilized\u2014ensuring that scale drives impact, not runaway infrastructure costs.',
+ },
+ {
+ icon: '\u{1F30D}',
+ title: 'Freedom to deploy anywhere, without lock-in',
+ questions: [
+ 'Are your models tied to a single cloud, making migration costly and complex?',
+ 'Does adopting managed services today limit your ability to optimize cost or move infrastructure tomorrow?',
+ 'Can you deploy the same ML stack across public cloud, private cloud, or sovereign environments without redesigning everything?',
+ 'Why should infrastructure choices dictate the future of your ML systems?',
+ ],
+ answer:
+ 'A modern ML platform should be built on open standards and cloud-neutral abstractions, allowing you to deploy anywhere\u2014public cloud, private infrastructure, or sovereign environments. This ensures complete control over your data, freedom from vendor lock-in, and the ability to optimize for cost, performance, and compliance without architectural constraints.',
+ },
+];
+
+const COMPONENTS = [
+ {
+ icon: '\u{26A1}',
+ title: 'Online Feature Store',
+ description:
+ 'BharatMLStack Online Feature Store delivers sub-10ms, high-throughput access to machine learning features for real-time inference. It seamlessly ingests batch and streaming data, validates schemas, and persists compact, versioned feature groups optimized for low latency and efficiency. With scalable storage backends, gRPC APIs, and binary-optimized formats, it ensures consistent, reliable feature serving across ML pipelines.',
+ cta: '/online-feature-store/v1.0.0',
+ },
+ {
+ icon: '\u{1F500}',
+ title: 'Inferflow',
+ description:
+ "Inferflow is BharatMLStack's intelligent inference gateway that dynamically retrieves and assembles features required by ML models using a graph-based configuration called Inferpipes. It automatically resolves entity relationships, fetches features from the Online Feature Store, and constructs feature vectors without custom code.",
+ cta: '/inferflow/v1.0.0',
+ },
+ {
+ icon: '\u{1F50D}',
+ title: 'Skye',
+ description:
+ 'Skye enables fast similarity retrieval by representing data as vectors and querying nearest matches in high-dimensional space. It supports pluggable vector databases, ensuring flexibility across infrastructure. The system provides tenant-level index isolation while allowing single embedding ingestion even when shared across tenants, reducing redundancy.',
+ cta: '/skye/v1.0.0',
+ },
+ {
+ icon: '\u{1F9EE}',
+ title: 'Numerix',
+ description:
+ 'Numerix is a high-performance compute engine designed for ultra-fast element-wise matrix operations. Built in Rust and accelerated using SIMD, it delivers exceptional efficiency and predictable performance. Optimized for real-time inference workloads, it achieves strict sub-5ms p99 latency on matrices up to 1000\u00D710.',
+ cta: '/numerix/v1.0.0',
+ },
+ {
+ icon: '\u{1F680}',
+ title: 'Predator',
+ description:
+ 'Predator streamlines infrastructure and model lifecycle management. It enables the creation of deployables with specific Triton Server versions and supports seamless model rollouts. Leveraging Helm charts and Argo CD, Predator automates Kubernetes-based deployments while integrating with KEDA for auto-scaling and performance tuning.',
+ cta: '/predator/v1.0.0',
+ },
+];
+
+const STATS = [
+ { target: 4.5, suffix: 'M+', decimals: 1, label: 'Daily Orders', description: 'Daily orders processed via ML pipelines' },
+ { target: 2.4, suffix: 'M', decimals: 1, label: 'QPS on FS', description: 'QPS on Feature Store at a batch size of 100 ID lookups per request' },
+ { target: 1, suffix: 'M+', decimals: 0, label: 'QPS Inference', description: 'QPS on Model Inference' },
+ { target: 500, suffix: 'K', decimals: 0, label: 'QPS Embedding', description: 'QPS Embedding Search' },
+];
+
+const DEMO_VIDEOS = [
+ {
+ title: 'Feature Store',
+ description: 'Learn how to onboard and manage features using the self-serve UI for the Online Feature Store.',
+ url: 'https://videos.meesho.com/reels/feature_store.mp4',
+ },
+ {
+ title: 'Embedding Platform',
+ description: 'Walkthrough of onboarding and managing embedding models via the Skye self-serve UI.',
+ url: 'https://videos.meesho.com/reels/embedding_platform.mp4',
+ },
+ {
+ title: 'Numerix',
+ description: 'Step-by-step guide to configuring and running matrix operations through the Numerix self-serve UI.',
+ url: 'https://videos.meesho.com/reels/numerix.mp4',
+ },
+ {
+ title: 'Predator',
+ description: 'How to deploy and manage ML models on Kubernetes using the Predator self-serve UI.',
+ url: 'https://videos.meesho.com/reels/predator.mp4',
+ },
+ {
+ title: 'Inferflow',
+ description: 'Setting up inferpipes and feature retrieval graphs through the Inferflow self-serve UI.',
+ url: 'https://videos.meesho.com/reels/inferflow.mp4',
+ },
+];
+
+const BLOG_POSTS = [
+ {
+ title: "Building Meesho's ML Platform: From Chaos to Cutting-Edge (Part 1)",
+ category: 'ML Platform',
+ icon: '\u{1F680}',
+ link: '/blog/post-one',
+ },
+ {
+ title: "Building Meesho's ML Platform: Lessons from the First-Gen System (Part 2)",
+ category: 'ML Platform',
+ icon: '\u{1F9E9}',
+ link: '/blog/post-two',
+ },
+ {
+ title: 'Cracking the Code: Scaling Model Inference & Real-Time Embedding Search',
+ category: 'Inference',
+ icon: '\u{26A1}',
+ link: '/blog/post-three',
+ },
+ {
+ title: 'Designing a Production-Grade LLM Inference Platform: From Model Weights to Scalable GPU Serving',
+ category: 'LLM',
+ icon: '\u{1F9E0}',
+ link: '/blog/post-four',
+ },
+ {
+ title: 'LLM Inference Optimization Techniques: Engineering Sub-Second Latency at Scale',
+ category: 'Optimization',
+ icon: '\u{1F52C}',
+ link: '/blog/post-five',
+ },
+];
+
+// ─── Components ────────────────────────────────────────
+
+function CustomNav() {
+ const docsUrl = useBaseUrl('/');
+ const blogUrl = useBaseUrl('/blog');
return (
-
-
Open source, end-to-end ML infrastructure stack built for scale, speed, and simplicity.
+ Integrate, deploy, and manage robust ML workflows with full reliability and control.
- BharatMLStack is a comprehensive, production-ready machine learning infrastructure
- platform designed to democratize ML capabilities across India and beyond. Our mission
- is to provide a robust, scalable, and accessible ML stack that empowers organizations
- to build, deploy, and manage machine learning solutions at massive scale.
-
-
- Explore Online Feature Store →
-
-
-
-
-
🏆 Key Achievements
-
-
✅ Sub-10ms P99 latency for real-time inference
-
✅ 1M+ RPS tested with 100 IDs per request
-
✅ PSDB format outperforms Proto3 & Arrow
-
✅ Multi-database: Scylla, Dragonfly, Redis
-
✅ Production-ready with comprehensive monitoring
+
+
+
+
Why BharatMLStack
+
The Real Barriers to Scaling Machine Learning
+
+ ML teams spend more time fighting infrastructure than building intelligence.
+ BharatMLStack removes those barriers.
+
- Trufflebox UI provides a comprehensive, modern web interface for managing your entire
- ML infrastructure. Built with cutting-edge web technologies, it delivers an intuitive
- experience for feature management, user administration, and operational oversight.
- Streamline your MLOps workflows with enterprise-grade UI components.
-
-
- Explore Trufflebox UI →
-
-
-
-
-
🎨 UI Features
-
-
✅ Comprehensive feature catalog & discovery
-
✅ Role-based access control & user management
-
✅ Job, Store, Admin Ops management
-
✅ Approval flow for everything
-
✅ Responsive design for desktop & mobile
-
+
+
+
+
Platform Components
+
BharatMLStack Components
+
+ Purpose-built components for every stage of the ML lifecycle, from feature
+ serving to model deployment.
+
- Our SDKs are designed with developers in mind, providing idiomatic APIs for Go and Python
- that feel natural in your existing codebase. Whether you're building microservices,
- data pipelines, or ML applications, our SDKs provide the tools you need for seamless
- integration with BharatMLStack's powerful infrastructure.
-
-
- Explore SDKs →
-
-
-
-
-
🛠️ Developer Tools
-
-
✅ Native Go & Python SDKs with type safety
-
✅ High-performance gRPC
-
✅ Apache Spark integration for publishing features