
NEXARA-oss/PulseStack


⚡ PulseStack

Observability • Replay • Runtime Intelligence

Built for Distributed AI Systems, Agents, and Complex Workflows



Trace every workflow. Replay every execution. Understand every decision.

PulseStack provides deep visibility into AI-powered distributed systems, enabling teams to monitor execution flows, investigate failures, replay workflows, and gain actionable runtime insights, all from a unified observability layer.


Why PulseStack Exists

Modern AI systems are no longer simple prompt-response applications.

Today's production environments consist of:

  • 🤖 Multi-Agent Workflows
  • 🔄 Distributed Automation Pipelines
  • 🛠️ Tool-Calling Runtimes
  • ⚡ Event-Driven Orchestrators
  • 🧠 Long-Running Autonomous Systems
  • 🌐 Cross-Service Execution Graphs

Yet most observability platforms were designed for:

  • REST APIs
  • Microservices
  • Traditional Backend Applications
  • Request/Response Lifecycles

They were never built to understand AI-native execution flows.


The AI Observability Gap

                    Traditional Monitoring

    Request ─────────► Service ─────────► Database
          ✓ Logs         ✓ Metrics          ✓ Traces


                     Modern AI Workflows

 User
   │
   ▼
 Agent A
   │
   ├──► Tool Call
   │
   ├──► Agent B
   │       │
   │       ├──► Memory Retrieval
   │       ├──► LLM Invocation
   │       └──► External API
   │
   └──► Workflow Orchestrator
             │
             ├──► Event Stream
             ├──► Retry Logic
             └──► Agent C

        ❌ Difficult to Trace
        ❌ Difficult to Replay
        ❌ Difficult to Debug

The Problem

When AI workflows fail in production, teams struggle to answer critical questions:

  • Which agent step failed?
  • What event triggered the failure chain?
  • Which tool call introduced latency?
  • Why did the workflow generate inconsistent outputs?
  • How can the execution be replayed deterministically?
  • Which workflow version caused the regression?
  • Which tenant experienced the issue?
  • Where are tokens, retries, and infrastructure costs being consumed?

⚡ PulseStack Changes That

PulseStack introduces a purpose-built Runtime Intelligence Layer for AI-native systems, providing complete visibility into workflows, agents, tools, and execution paths.

By capturing every event, trace, decision, tool invocation, workflow transition, and execution artifact, PulseStack enables teams to:

  • 🔍 Observe workflows in real time
  • 🔄 Replay executions deterministically
  • 🐞 Debug failures with full context
  • 📊 Analyze performance and bottlenecks
  • 🤖 Understand agent behavior at scale

From prompts to agents, tools, workflows, and distributed execution graphs, PulseStack turns AI systems from black boxes into fully observable, replayable, and debuggable platforms.


What PulseStack Solves

PulseStack provides a unified Runtime Intelligence Layer for modern AI systems.

It helps teams:

  • 🔍 Trace distributed AI workflows in real time
  • 🤖 Monitor multi-agent execution pipelines
  • 🔄 Replay workflow runs deterministically
  • 📊 Analyze bottlenecks and performance issues
  • 🌐 Correlate events across services and agents
  • 📈 Stream runtime telemetry at scale

flowchart TB
    A["🤖 Agents"]
    B["🛠️ Tools"]
    C["📡 Events"]
    D["🌐 Services"]
    E["🔄 Workflows"]

    P["⚡ PulseStack"]

    T["🔍 Trace"]
    R["⏪ Replay"]
    N["📊 Analyze"]
    O["👁️ Observe"]

    A --> P
    B --> P
    C --> P
    D --> P
    E --> P

    P --> T
    P --> R
    P --> N
    P --> O

Turn complex AI workflows into observable, debuggable, and replayable systems.


Core Architecture

flowchart TD
    A[Agents & Workflows]
    B[Pulse Gateway API]
    C[Distributed Event Bus]
    D[Runtime Engine & DAG Executor]
    E[Observability Pipeline]
    F[(ClickHouse & PostgreSQL)]
    G[Replay Engine & UI Console]

    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G

Every workflow event flows through the runtime layer, enabling real-time observability, deterministic replay, and deep execution insights.
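
One common way to make replay deterministic is to record each step's output as an event and, on replay, read those outputs back instead of calling tools or models again. The sketch below illustrates that idea only. The `StepEvent` shape and the `record` and `replay` helpers are hypothetical names invented for this example, and PulseStack's actual event schema and replay engine may work differently.

```typescript
// Hypothetical event shape for illustration; not PulseStack's real schema.
interface StepEvent {
  correlationId: string;
  stepId: string;
  output: unknown;
}

type StepHandler = (stepId: string) => unknown;

// Live mode: run the handler and append its result to the event log.
function record(
  log: StepEvent[],
  correlationId: string,
  stepId: string,
  handler: StepHandler,
): unknown {
  const output = handler(stepId);
  log.push({ correlationId, stepId, output });
  return output;
}

// Replay mode: return the recorded outputs for one execution, in their
// original order, without invoking any handler again.
function replay(log: StepEvent[], correlationId: string): unknown[] {
  return log
    .filter((event) => event.correlationId === correlationId)
    .map((event) => event.output);
}
```

Because replay only reads the log, a non-deterministic step (an LLM call, a flaky API) yields the same result on every replay of the same execution.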


Features

flowchart LR
    A["🤖 AI Runtime"] --> B["🔄 Event Bus"]
    B --> C["🌐 Tracing"]

    A --> D["⚡ DAG Engine"]
    B --> E["📊 Analytics"]
    C --> F["🔁 Replay"]

    D --> E
    E --> F

| Feature | Description |
| --- | --- |
| 🤖 AI Workflow Runtime | Execute distributed AI workflows |
| 🔄 Distributed Event Pipeline | Real-time event streaming across services |
| 🔁 Deterministic Replay Engine | Replay workflow executions for debugging |
| 🌐 OpenTelemetry Tracing | End-to-end execution visibility |
| ⚡ DAG Execution Engine | Dependency-aware workflow orchestration |
| 🔌 Plugin System | Custom runtime extensions and integrations |
| 📊 Runtime Analytics | Latency, retries, tokens, throughput, failures |
| 🖥️ React Operations Console | Visual workflow and trace inspection |
| ☸️ Kubernetes + Helm | Production-ready deployment support |

📦 Monorepo Structure

PulseStack follows a modular monorepo architecture, separating runtime services, shared packages, infrastructure assets, and extensibility layers.

📦 apps/          → Runtime services & applications
📦 packages/      → Shared libraries & SDKs
☸️ infra/         → Deployment & infrastructure assets
🔌 plugins/       → Runtime extensions
📜 proto/         → Protocol definitions

apps/
  pulse-gateway/
  pulse-runtime/
  pulse-events/
  pulse-web/

packages/
  contracts/
  core/
  sdk/
  plugin-sdk/
  ui/

infra/
  docker/
  helm/
  k8s/

plugins/
  audit-log/

proto/
  pulsestack.proto

💡 Contributors interested in the internal execution model can explore the Distributed Runtime Architecture documentation.


🛠️ Tech Stack

| Layer | Technology |
| --- | --- |
| ⚙️ Runtime | Node.js + TypeScript |
| 🔗 Transport | gRPC + WebSockets |
| 📨 Messaging | NATS |
| 📈 Observability | OpenTelemetry |
| 📊 Analytics | ClickHouse |
| 🗄️ Persistence | PostgreSQL |
| 🖥️ Frontend | React + Vite |
| ☸️ Infrastructure | Docker · Kubernetes · Helm |
| 📦 Monorepo | Turbo · pnpm |

PulseStack Overview

flowchart LR
    U["👤 Operator / User"] --> W["🤖 AI Workflow"]

    W --> A["🧠 Agents"]
    W --> T["🛠️ Tools"]
    W --> L["💬 LLMs"]

    A --> P["⚡ PulseStack"]
    T --> P
    L --> P

    P --> O["🔍 Tracing"]
    P --> R["🔄 Replay"]
    P --> M["📊 Metrics"]
    P --> G["📈 Analytics"]

🚀 Local Development

Prerequisites

Before getting started, ensure the following tools are installed:

  • 📦 Node.js 20+
  • ⚡ pnpm
  • 🐳 Docker

1️⃣ Start Infrastructure

docker compose -f infra/docker/docker-compose.yml up -d

2️⃣ Install Dependencies

pnpm install

3️⃣ Start Development Environment

pnpm dev

🌐 Local Services

| Service | Endpoint |
| --- | --- |
| 🚪 Gateway API | http://localhost:4000 |
| 🖥️ Runtime UI | http://localhost:3000 |
| 📚 Swagger Docs | http://localhost:4101/docs |

💡 Once the development environment is running, open the Runtime UI to inspect workflows, traces, and replay sessions.

Environment Setup

Create a .env file from the example file:

Linux/macOS

cp .env.example .env

Windows CMD

copy .env.example .env

Windows PowerShell

Copy-Item .env.example .env

Environment Variables

| Variable | Purpose | Default Value |
| --- | --- | --- |
| RUNTIME_URL | Runtime engine service | http://localhost:4101 |
| EVENTS_URL | Event streaming service | http://localhost:4102 |
| TRACE_URL | Tracing service | http://localhost:4103 |
| REPLAY_URL | Replay engine | http://localhost:4104 |
| METRICS_URL | Metrics service | http://localhost:4105 |
| GRAPH_URL | Workflow graph service | http://localhost:4106 |
| VITE_GATEWAY_URL | Frontend gateway URL | http://localhost:4000 |

A sample .env.example file is included with default local development values.
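
As a rough illustration of how a service might consume these variables, the sketch below merges an environment object over the documented defaults and rejects values that are not HTTP(S) URLs. `resolveConfig` is a hypothetical helper written for this example and is not part of the PulseStack codebase. The variable names and default values come from the list above.

```typescript
// Defaults copied from the environment variable list in this README.
const DEFAULTS = {
  RUNTIME_URL: "http://localhost:4101",
  EVENTS_URL: "http://localhost:4102",
  TRACE_URL: "http://localhost:4103",
  REPLAY_URL: "http://localhost:4104",
  METRICS_URL: "http://localhost:4105",
  GRAPH_URL: "http://localhost:4106",
  VITE_GATEWAY_URL: "http://localhost:4000",
} as const;

type ConfigKey = keyof typeof DEFAULTS;
type ServiceConfig = Record<ConfigKey, string>;

// Hypothetical helper: overlay env values on the defaults.
function resolveConfig(env: Record<string, string | undefined>): ServiceConfig {
  const config: ServiceConfig = { ...DEFAULTS };
  for (const key of Object.keys(DEFAULTS) as ConfigKey[]) {
    const value = env[key];
    // Unset or empty variables keep their default.
    if (value === undefined || value === "") continue;
    if (!/^https?:\/\//.test(value)) {
      throw new Error(`${key} must start with http:// or https://, got "${value}"`);
    }
    config[key] = value;
  }
  return config;
}

// Usage in a Node.js service (assumes a Node runtime):
//   const config = resolveConfig(process.env);
```

Failing fast on a malformed URL at startup tends to be easier to diagnose than a connection error deep inside a workflow run.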

Troubleshooting

Port already in use

Stop conflicting local services or change ports.

Docker services not starting

Ensure Docker Desktop is running before executing:

docker compose -f infra/docker/docker-compose.yml up -d

pnpm command not found

Install pnpm globally:

npm install -g pnpm

Docker command not found

Ensure Docker Desktop is installed and running before starting infrastructure services.

📝 Example Workflow Payload

The following example demonstrates a simple AI workflow consisting of an Agent → Tool → LLM execution chain.

Plan Agent
    │
    ▼
Fetch Logs
    │
    ▼
Summarize with LLM

{
  "workflow": {
    "id": "wf_agent_ops",
    "name": "Agent Ops",
    "version": "1.0.0",
    "tenantId": "local",
    "correlationId": "corr_agent_ops",
    "metadata": {},
    "steps": [
      {
        "id": "s1",
        "name": "Plan",
        "kind": "agent",
        "dependsOn": [],
        "input": {
          "objective": "inspect queue health"
        }
      },
      {
        "id": "s2",
        "name": "FetchLogs",
        "kind": "tool",
        "dependsOn": ["s1"],
        "retry": {
          "maxAttempts": 3,
          "backoffMs": 250,
          "maxBackoffMs": 2000,
          "exponential": true
        },
        "input": {
          "tool": "logs.query",
          "range": "15m"
        }
      },
      {
        "id": "s3",
        "name": "Summarize",
        "kind": "llm",
        "dependsOn": ["s2"],
        "input": {
          "model": "gpt-4.1",
          "prompt": "Summarize anomalies"
        }
      }
    ]
  },
  "input": {
    "environment": "prod"
  },
  "initiatedBy": "operator"
}
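
To show how the `dependsOn` fields define an execution order, the sketch below runs Kahn's topological sort over the steps. The `WorkflowStep` type mirrors only the field names visible in the payload (other fields such as `input` and `retry` are omitted), and `executionOrder` is a hypothetical function for illustration. It is not the PulseStack DAG executor, which may schedule steps differently, for example in parallel.

```typescript
// Field names mirror the payload above; the type itself is an assumption.
interface WorkflowStep {
  id: string;
  name: string;
  kind: "agent" | "tool" | "llm";
  dependsOn: string[];
}

// Kahn's algorithm: repeatedly emit steps whose dependencies are all done.
function executionOrder(steps: WorkflowStep[]): string[] {
  const pending = new Map<string, number>(); // unmet dependency count per step
  const dependents = new Map<string, string[]>(); // step id -> steps waiting on it
  for (const step of steps) {
    pending.set(step.id, step.dependsOn.length);
    for (const dep of step.dependsOn) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), step.id]);
    }
  }

  const ready = steps.filter((s) => s.dependsOn.length === 0).map((s) => s.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (pending.get(next) ?? 0) - 1;
      pending.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }

  // Steps left over are part of a cycle or depend on an id that does not exist.
  if (order.length !== steps.length) {
    throw new Error("Workflow has a dependency cycle or an unknown step id");
  }
  return order;
}
```

For the example payload, the steps resolve to the order s1, s2, s3 regardless of how they are listed in the `steps` array.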

Runtime Retry Policies

Workflow steps can opt into bounded retry handling with a retry policy:

{
  "id": "fetch_logs",
  "name": "Fetch logs",
  "kind": "tool",
  "retry": {
    "maxAttempts": 3,
    "backoffMs": 250,
    "maxBackoffMs": 2000,
    "exponential": true
  },
  "input": {
    "tool": "logs.query"
  }
}

Retries are disabled by default (maxAttempts: 1). When enabled, PulseStack emits step.retrying events before another attempt, emits step.failed when attempts are exhausted, records retry metadata in execution output and snapshots under __retry, and marks the workflow as failed instead of leaving the execution in a running state.
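
The README does not spell out how the delay between attempts is derived from the policy fields. A plausible reading, sketched below, is that the delay starts at `backoffMs`, doubles on each retry when `exponential` is true, and is capped at `maxBackoffMs`. The doubling formula, and the assumption that `maxAttempts` counts the first attempt, are guesses made for this example. `retryDelayMs` and `retrySchedule` are hypothetical helpers, not PulseStack functions.

```typescript
// Field names match the retry policy above.
interface RetryPolicy {
  maxAttempts: number;
  backoffMs: number;
  maxBackoffMs: number;
  exponential: boolean;
}

// Delay before retry number `retry` (1 = first retry).
// Assumed formula: backoffMs * 2^(retry - 1), capped at maxBackoffMs.
function retryDelayMs(policy: RetryPolicy, retry: number): number {
  const raw = policy.exponential
    ? policy.backoffMs * 2 ** (retry - 1)
    : policy.backoffMs;
  return Math.min(raw, policy.maxBackoffMs);
}

// Delays between attempts. Assumes maxAttempts includes the first attempt,
// so maxAttempts: 1 produces no retries and an empty schedule.
function retrySchedule(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < policy.maxAttempts; retry++) {
    delays.push(retryDelayMs(policy, retry));
  }
  return delays;
}
```

Under these assumptions, the policy shown above (3 attempts, 250 ms base, 2000 ms cap, exponential) would wait 250 ms and then 500 ms before the second and third attempts.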


🗺️ Roadmap

flowchart LR
    A["🚀 Today<br/>Runtime Intelligence"]
    B["🔮 Future<br/>AI-Native Observability Platform"]

    A --> B

  • 🔐 Multi-Tenant Isolation
  • 👥 RBAC & Authentication Providers
  • ⏪ Workflow Time-Travel Debugger
  • 💰 Token Cost Analytics
  • 🧠 Agent Memory Visualization
  • 🌐 Live Workflow Graph Rendering
  • 🔄 Distributed Replay Clustering
  • 🤖 ML-Based Anomaly Detection
  • 🛡️ AI Safety Event Auditing
  • 🔭 Runtime Health Dashboards
  • 📈 Predictive Workflow Insights
  • 🔗 CNCF & OpenTelemetry Integrations

Vision

Build the observability platform that AI systems deserve, where every workflow, agent, event, decision, and execution path can be traced, replayed, analyzed, and understood.


🌐 Open Source Vision

PulseStack is being built as:

  • 🌐 An open observability standard for AI-native systems
  • 🏗️ A contributor-first infrastructure project
  • 🔍 A foundation for next-generation AI debugging and runtime intelligence
  • 🚀 A platform for building reliable, observable, and replayable AI workflows

🤝 Who Can Contribute?

We welcome:

  • Open Source Contributors
  • Infrastructure Engineers
  • AI Runtime Builders
  • Observability Engineers
  • GSoC & GSSoC Contributors
  • Distributed Systems Enthusiasts

Join us in shaping the future of AI observability, replay, and runtime intelligence.


🤝 Contributing

Get started locally in a few minutes:

git clone https://github.com/sreerevanth/PulseStack.git
cd PulseStack
pnpm install
pnpm dev

We welcome issues, feature requests, discussions, documentation improvements, and pull requests from contributors of all experience levels.


📜 License

Distributed under the MIT License.


⚡ PulseStack

Building observability infrastructure for the next generation of autonomous systems.


❤️ Credits

Core Contributors

| Contributor | Role |
| --- | --- |
| @sreerevanth | Founder, Initial Design, System Architecture |
| Contributors | Development, Documentation, Testing, Community Support |

Special Recognition

  • 💡 Idea & Vision: @sreerevanth
  • 🏗️ Initial System Design: @sreerevanth
  • 🚀 Project Founder: @sreerevanth

Made with ❤️ by the open-source community.
