Built for Distributed AI Systems, Agents, and Complex Workflows
Trace every workflow. Replay every execution. Understand every decision.
PulseStack provides deep visibility into AI-powered distributed systems, enabling teams to monitor execution flows, investigate failures, replay workflows, and gain actionable runtime insights, all from a unified observability layer.
Modern AI systems are no longer simple prompt-response applications.
Today's production environments consist of:
- Multi-Agent Workflows
- Distributed Automation Pipelines
- Tool-Calling Runtimes
- Event-Driven Orchestrators
- Long-Running Autonomous Systems
- Cross-Service Execution Graphs
Yet most observability platforms were designed for:
- REST APIs
- Microservices
- Traditional Backend Applications
- Request/Response Lifecycles
They were never built to understand AI-native execution flows.
Traditional Monitoring
Request ─────────► Service ─────────► Database
✓ Logs ✓ Metrics ✓ Traces
Modern AI Workflows
User
 │
 ▼
Agent A
 │
 ├──► Tool Call
 │
 ├──► Agent B
 │     │
 │     ├──► Memory Retrieval
 │     ├──► LLM Invocation
 │     └──► External API
 │
 └──► Workflow Orchestrator
       │
       ├──► Event Stream
       ├──► Retry Logic
       └──► Agent C
✗ Difficult to Trace
✗ Difficult to Replay
✗ Difficult to Debug
When AI workflows fail in production, teams struggle to answer critical questions:
- Which agent step failed?
- What event triggered the failure chain?
- Which tool call introduced latency?
- Why did the workflow generate inconsistent outputs?
- How can the execution be replayed deterministically?
- Which workflow version caused the regression?
- Which tenant experienced the issue?
- Where are tokens, retries, and infrastructure costs being consumed?
PulseStack introduces a purpose-built Runtime Intelligence Layer for AI-native systems, providing complete visibility into workflows, agents, tools, and execution paths.
By capturing every event, trace, decision, tool invocation, workflow transition, and execution artifact, PulseStack enables teams to:
- Observe workflows in real time
- Replay executions deterministically
- Debug failures with full context
- Analyze performance and bottlenecks
- Understand agent behavior at scale
From prompts to agents, tools, workflows, and distributed execution graphs, PulseStack turns AI systems from black boxes into fully observable, replayable, and debuggable platforms.
PulseStack provides a unified Runtime Intelligence Layer for modern AI systems.
It helps teams:
- Trace distributed AI workflows in real time
- Monitor multi-agent execution pipelines
- Replay workflow runs deterministically
- Analyze bottlenecks and performance issues
- Correlate events across services and agents
- Stream runtime telemetry at scale
flowchart TB
A["Agents"]
B["Tools"]
C["Events"]
D["Services"]
E["Workflows"]
P["PulseStack"]
T["Trace"]
R["Replay"]
N["Analyze"]
O["Observe"]
A --> P
B --> P
C --> P
D --> P
E --> P
P --> T
P --> R
P --> N
P --> O
Turn complex AI workflows into observable, debuggable, and replayable systems.
flowchart TD
A[Agents & Workflows]
B[Pulse Gateway API]
C[Distributed Event Bus]
D[Runtime Engine & DAG Executor]
E[Observability Pipeline]
F[(ClickHouse & PostgreSQL)]
G[Replay Engine & UI Console]
A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
Every workflow event flows through the runtime layer, enabling real-time observability, deterministic replay, and deep execution insights.
flowchart LR
A["AI Runtime"] --> B["Event Bus"]
B --> C["Tracing"]
A --> D["DAG Engine"]
B --> E["Analytics"]
C --> F["Replay"]
D --> E
E --> F
| Feature | Description |
|---|---|
| AI Workflow Runtime | Execute distributed AI workflows |
| Distributed Event Pipeline | Real-time event streaming across services |
| Deterministic Replay Engine | Replay workflow executions for debugging |
| OpenTelemetry Tracing | End-to-end execution visibility |
| DAG Execution Engine | Dependency-aware workflow orchestration |
| Plugin System | Custom runtime extensions and integrations |
| Runtime Analytics | Latency, retries, tokens, throughput, failures |
| React Operations Console | Visual workflow and trace inspection |
| Kubernetes + Helm | Production-ready deployment support |
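To make "dependency-aware workflow orchestration" more concrete, the sketch below orders steps by their `dependsOn` lists (the field used in the example workflow later in this README). It is a minimal illustration under stated assumptions, not PulseStack's actual DAG engine: the `Step` interface and the `planWaves` function are hypothetical names introduced here, and the layered topological sort is just one simple way to derive an execution order.

```typescript
// Hypothetical minimal step shape: only `id` and `dependsOn` mirror the
// example workflow in this README; everything else is omitted.
interface Step {
  id: string;
  dependsOn?: string[];
}

// Groups step ids into "waves". Every step in a wave has all of its
// dependencies satisfied by earlier waves, so a wave could run in parallel.
function planWaves(steps: Step[]): string[][] {
  const ids = steps.map((s) => s.id);

  // Reject references to steps that do not exist.
  for (const step of steps) {
    for (const dep of step.dependsOn ?? []) {
      if (ids.indexOf(dep) === -1) {
        throw new Error(`Step "${step.id}" depends on unknown step "${dep}"`);
      }
    }
  }

  const done: string[] = [];
  let pending = steps.slice();
  const waves: string[][] = [];

  while (pending.length > 0) {
    // A step is ready once all of its dependencies have completed.
    const ready = pending.filter((s) =>
      (s.dependsOn ?? []).every((dep) => done.indexOf(dep) !== -1)
    );
    if (ready.length === 0) {
      // Nothing can make progress, so the remaining steps form a cycle.
      throw new Error("Cycle detected in workflow steps");
    }
    const readyIds = ready.map((s) => s.id);
    waves.push(readyIds);
    done.push(...readyIds);
    pending = pending.filter((s) => readyIds.indexOf(s.id) === -1);
  }
  return waves;
}
```

For a linear chain such as `s1`, `s2` (depends on `s1`), `s3` (depends on `s2`), this yields one step per wave; steps that share the same dependencies land in the same wave.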
PulseStack follows a modular monorepo architecture, separating runtime services, shared packages, infrastructure assets, and extensibility layers.
apps/ → Runtime services & applications
packages/ → Shared libraries & SDKs
infra/ → Deployment & infrastructure assets
plugins/ → Runtime extensions
proto/ → Protocol definitions
apps/
  pulse-gateway/
  pulse-runtime/
  pulse-events/
  pulse-web/
packages/
  contracts/
  core/
  sdk/
  plugin-sdk/
  ui/
infra/
  docker/
  helm/
  k8s/
plugins/
  audit-log/
proto/
  pulsestack.proto

Contributors interested in the internal execution model can explore the Distributed Runtime Architecture documentation.
| Layer | Technology |
|---|---|
| Runtime | Node.js + TypeScript |
| Transport | gRPC + WebSockets |
| Messaging | NATS |
| Observability | OpenTelemetry |
| Analytics | ClickHouse |
| Persistence | PostgreSQL |
| Frontend | React + Vite |
| Infrastructure | Docker · Kubernetes · Helm |
| Monorepo | Turbo · pnpm |
flowchart LR
U["Operator / User"] --> W["AI Workflow"]
W --> A["Agents"]
W --> T["Tools"]
W --> L["LLMs"]
A --> P["PulseStack"]
T --> P
L --> P
P --> O["Tracing"]
P --> R["Replay"]
P --> M["Metrics"]
P --> G["Analytics"]
Before getting started, ensure the following tools are installed:
- Node.js 20+
- pnpm
- Docker

docker compose -f infra/docker/docker-compose.yml up -d
pnpm install
pnpm dev

| Service | Endpoint |
|---|---|
| Gateway API | http://localhost:4000 |
| Runtime UI | http://localhost:3000 |
| Swagger Docs | http://localhost:4101/docs |

Once the development environment is running, open the Runtime UI to inspect workflows, traces, and replay sessions.
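Create a .env file from the example file, using the command for your shell.
macOS / Linux:
cp .env.example .env
Windows (Command Prompt):
copy .env.example .env
Windows (PowerShell):
Copy-Item .env.example .env

| Variable | Purpose | Default Value |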
|---|---|---|
| RUNTIME_URL | Runtime engine service | http://localhost:4101 |
| EVENTS_URL | Event streaming service | http://localhost:4102 |
| TRACE_URL | Tracing service | http://localhost:4103 |
| REPLAY_URL | Replay engine | http://localhost:4104 |
| METRICS_URL | Metrics service | http://localhost:4105 |
| GRAPH_URL | Workflow graph service | http://localhost:4106 |
| VITE_GATEWAY_URL | Frontend gateway URL | http://localhost:4000 |
A sample .env.example file is included with default local development values.
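As an illustration of how a service could consume these variables, the sketch below merges an environment map with the defaults from the table above. This is an assumption-laden example, not PulseStack's actual configuration code: `DEFAULTS`, `loadConfig`, and the http(s) check are hypothetical names and behavior introduced here.

```typescript
// Default values copied from the environment variable table above.
const DEFAULTS = {
  RUNTIME_URL: "http://localhost:4101",
  EVENTS_URL: "http://localhost:4102",
  TRACE_URL: "http://localhost:4103",
  REPLAY_URL: "http://localhost:4104",
  METRICS_URL: "http://localhost:4105",
  GRAPH_URL: "http://localhost:4106",
  VITE_GATEWAY_URL: "http://localhost:4000",
};

type ConfigKey = keyof typeof DEFAULTS;
type Config = Record<ConfigKey, string>;

// Hypothetical helper: returns the defaults, overridden by any non-empty
// value found in `env`. Values that are not http(s) URLs are rejected.
function loadConfig(env: Record<string, string | undefined>): Config {
  const config: Config = { ...DEFAULTS };
  for (const key of Object.keys(DEFAULTS) as ConfigKey[]) {
    const value = env[key];
    if (value === undefined || value.trim() === "") {
      continue; // keep the documented default
    }
    if (!/^https?:\/\//.test(value)) {
      throw new Error(`${key} must be an http(s) URL, got "${value}"`);
    }
    config[key] = value;
  }
  return config;
}
```

In a Node.js service this could be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test, and failing fast on a malformed URL surfaces configuration mistakes at startup rather than on the first request.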
If a port is already in use, stop the conflicting local service or change the port.
If Docker commands fail, ensure Docker Desktop is running before executing:
docker compose -f infra/docker/docker-compose.yml up -d
If pnpm is not found, install it globally:
npm install -g pnpm
Ensure Docker Desktop is installed and running before starting infrastructure services.
The following example demonstrates a simple AI workflow consisting of an Agent → Tool → LLM execution chain.
Plan Agent
 │
 ▼
Fetch Logs
 │
 ▼
Summarize with LLM
{
  "workflow": {
    "id": "wf_agent_ops",
    "name": "Agent Ops",
    "version": "1.0.0",
    "tenantId": "local",
    "correlationId": "corr_agent_ops",
    "metadata": {},
    "steps": [
      {
        "id": "s1",
        "name": "Plan",
        "kind": "agent",
        "dependsOn": [],
        "input": {
          "objective": "inspect queue health"
        }
      },
      {
        "id": "s2",
        "name": "FetchLogs",
        "kind": "tool",
        "dependsOn": ["s1"],
        "retry": {
          "maxAttempts": 3,
          "backoffMs": 250,
          "maxBackoffMs": 2000,
          "exponential": true
        },
        "input": {
          "tool": "logs.query",
          "range": "15m"
        }
      },
      {
        "id": "s3",
        "name": "Summarize",
        "kind": "llm",
        "dependsOn": ["s2"],
        "input": {
          "model": "gpt-4.1",
          "prompt": "Summarize anomalies"
        }
      }
    ]
  },
  "input": {
    "environment": "prod"
  },
  "initiatedBy": "operator"
}

Workflow steps can opt into bounded retry handling with a retry policy:
{
  "id": "fetch_logs",
  "name": "Fetch logs",
  "kind": "tool",
  "retry": {
    "maxAttempts": 3,
    "backoffMs": 250,
    "maxBackoffMs": 2000,
    "exponential": true
  },
  "input": {
    "tool": "logs.query"
  }
}

Retries are disabled by default (maxAttempts: 1). When enabled, PulseStack emits step.retrying events before each additional attempt, emits step.failed when attempts are exhausted, records retry metadata in the execution output and in snapshots under __retry, and marks the workflow as failed instead of leaving the execution in a running state.
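The sketch below shows one plausible reading of the retry policy fields. The README does not spell out the delay formula, so the doubling rule capped at `maxBackoffMs` is an assumption, and `RetryPolicy`, `backoffDelay`, and `simulateFailingStep` are hypothetical names used only for illustration. Only the event names `step.retrying` and `step.failed` come from the description above.

```typescript
// Mirrors the "retry" object from the workflow examples.
interface RetryPolicy {
  maxAttempts: number;
  backoffMs: number;
  maxBackoffMs: number;
  exponential: boolean;
}

// Assumed formula: the delay before retry n (1-based) is backoffMs, doubled
// for each earlier retry when `exponential` is true, capped at maxBackoffMs.
function backoffDelay(policy: RetryPolicy, retryNumber: number): number {
  const raw = policy.exponential
    ? policy.backoffMs * Math.pow(2, retryNumber - 1)
    : policy.backoffMs;
  return Math.min(raw, policy.maxBackoffMs);
}

interface RetryEvent {
  type: "step.retrying" | "step.failed";
  attempt: number;
  delayMs?: number;
}

// Lists the events for a step that fails on every attempt: one step.retrying
// per additional attempt, then step.failed once attempts are exhausted.
function simulateFailingStep(policy: RetryPolicy): RetryEvent[] {
  const events: RetryEvent[] = [];
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    if (attempt < policy.maxAttempts) {
      events.push({
        type: "step.retrying",
        attempt,
        delayMs: backoffDelay(policy, attempt),
      });
    } else {
      events.push({ type: "step.failed", attempt });
    }
  }
  return events;
}
```

Under these assumptions, the policy from the example (3 attempts, 250 ms base, 2000 ms cap, exponential) waits 250 ms and then 500 ms before failing, while the default of `maxAttempts: 1` produces a single `step.failed` event with no retries.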
flowchart LR
A["Today<br/>Runtime Intelligence"]
B["Future<br/>AI-Native Observability Platform"]
A --> B
- Multi-Tenant Isolation
- RBAC & Authentication Providers
- Workflow Time-Travel Debugger
- Token Cost Analytics
- Agent Memory Visualization
- Live Workflow Graph Rendering
- Distributed Replay Clustering
- ML-Based Anomaly Detection
- AI Safety Event Auditing
- Runtime Health Dashboards
- Predictive Workflow Insights
- CNCF & OpenTelemetry Integrations
Build the observability platform that AI systems deserve, where every workflow, agent, event, decision, and execution path can be traced, replayed, analyzed, and understood.
PulseStack is being built as:
- An open observability standard for AI-native systems
- A contributor-first infrastructure project
- A foundation for next-generation AI debugging and runtime intelligence
- A platform for building reliable, observable, and replayable AI workflows
We welcome:
- Open Source Contributors
- Infrastructure Engineers
- AI Runtime Builders
- Observability Engineers
- GSoC & GSSoC Contributors
- Distributed Systems Enthusiasts
Join us in shaping the future of AI observability, replay, and runtime intelligence.
Get started locally in a few minutes:
git clone https://github.com/sreerevanth/PulseStack.git
cd PulseStack
pnpm install
pnpm dev

We welcome issues, feature requests, discussions, documentation improvements, and pull requests from contributors of all experience levels.
Distributed under the MIT License.
Building observability infrastructure for the next generation of autonomous systems.
| Contributor | Role |
|---|---|
| @sreerevanth | Founder, Initial Design, System Architecture |
| Contributors | Development, Documentation, Testing, Community Support |

- Idea & Vision: @sreerevanth
- Initial System Design: @sreerevanth
- Project Founder: @sreerevanth

Made with ❤️ by the open-source community.