ProbOS

Alpha — ProbOS is under active development. APIs will change, features may break, and documentation may lag behind the code. Contributions and feedback welcome.

Probabilistic agent-native OS runtime — an operating system kernel where every component is an autonomous agent, coordination happens through consensus, and the system learns from its own behavior.

"What if an OS didn't execute instructions — it negotiated them?"

What Is This?

ProbOS reimagines the OS as a mesh of probabilistic agents rather than deterministic processes. Instead of syscalls, you speak natural language. Instead of a scheduler, agents self-organize through Hebbian learning and trust networks. Instead of permissions, destructive operations require multi-agent consensus.

See it in action: ProbOS: An AI Operating System That Builds Its Own Agents (Live Demo)

Design Philosophy

Traditional operating systems use rigid, deterministic mechanisms: syscalls, schedulers, ACLs. ProbOS replaces each with a probabilistic, self-organizing equivalent:

Traditional OS	ProbOS Equivalent
Syscalls	Natural language decomposed into intent DAGs
Process scheduler	Attention-based priority scoring with Hebbian learning
File permissions / ACLs	Multi-agent consensus voting with red team verification
Process table	Agent registry with health monitoring and auto-recycling
IPC	Pub/sub intent bus with concurrent fan-out
Cron / scheduled tasks	Dreaming engine — offline consolidation during idle periods
Command history	Episodic memory with semantic recall
Shell aliases	Workflow cache — learned shortcuts for repeated patterns

Every agent maintains a confidence score and trust reputation. The system doesn't just execute operations — it deliberates, verifies, and learns.

Architecture

Five layers plus two cross-cutting concerns, each built on the one below:

┌─────────────────────────────────────────────────────┐
│  Experience    CLI shell, HXI (WebGL canvas),       │
│                FastAPI + WebSocket, Rich panels     │
├─────────────────────────────────────────────────────┤
│  Cognitive     LLM decomposer, working memory,      │
│                episodic memory, attention, dreaming, │
│                self-modification, agent design,      │
│                workflow cache, dynamic prompts       │
├─────────────────────────────────────────────────────┤
│  Consensus     Quorum voting, trust network,         │
│                Shapley attribution, escalation       │
├─────────────────────────────────────────────────────┤
│  Mesh          Intent bus, Hebbian routing,           │
│                gossip protocol, capability registry  │
├─────────────────────────────────────────────────────┤
│  Substrate     Agent lifecycle, pools, spawner,       │
│                registry, heartbeat, event log        │
├─────────────────────────────────────────────────────┤
│  Federation    ZeroMQ transport, node bridge,         │
│                intent router, gossip exchange        │
├─────────────────────────────────────────────────────┤
│  Knowledge     Git-backed store, ChromaDB semantic,   │
│                warm boot, per-artifact rollback      │
└─────────────────────────────────────────────────────┘

Substrate

Agents follow a perceive → decide → act → report lifecycle. A spawner creates them from templates, resource pools maintain target sizes with auto-recycling of degraded agents, and a heartbeat system monitors liveness. Everything is logged to an append-only SQLite event log.

Mesh

Agents discover each other through a capability registry with fuzzy matching. An intent bus does concurrent fan-out to all subscribers. A Hebbian router learns which agents handle which intents best — successful pairings get stronger, failures get weaker. A SWIM-style gossip protocol exchanges state between agents.

Consensus

Destructive operations (file writes, shell commands, HTTP fetches) require multi-agent agreement. A quorum engine collects confidence-weighted votes. A Bayesian trust network (Beta distribution) tracks agent reliability over time. Red team agents independently re-execute operations to verify results.

Cognitive

Natural language goes through: working memory assembly → episodic recall → workflow cache lookup → LLM decomposition into a DAG of intents → attention-based priority scoring → parallel/sequential execution → optional reflection synthesis.

The decomposer's system prompt is self-assembling — each agent class declares IntentDescriptor metadata, and the prompt is built dynamically from whatever agents are registered. Adding a new agent type makes its intents available to the LLM without editing any prompt, configuration, or routing table.

Experience

A Rich-powered interactive shell with 16 slash commands, real-time DAG execution display with spinners, and formatted result panels.

Agents

ProbOS boots with 47 agents across 20+ pools (+ 2 red team verifiers):

Core Agents (always active)

Pool	Count	Capabilities	Consensus
`system`	2	Heartbeat monitoring (CPU, load, PID)	No
`filesystem`	3	`read_file`, `stat_file`	No
`filesystem_writers`	3	`write_file`	Yes
`directory`	3	`list_directory`	No
`search`	3	`search_files` (recursive glob)	No
`shell`	3	`run_command` (30s timeout)	Yes
`http`	3	`http_fetch` (1MB cap, per-domain rate limiting)	Yes
`introspect`	2	`explain_last`, `agent_info`, `system_health`, `why`	No
`red_team`	2	Independent result verification	N/A

Bundled Cognitive Agents (10 pools, "useful on Day 1")

Pool	Capabilities
`web_search`	Search the web via mesh-routed HTTP
`page_reader`	Extract and summarize web page content
`weather`	Weather lookups via public APIs
`news`	News search and summarization
`translator`	Language translation
`summarizer`	Text summarization
`calculator`	Mathematical calculations
`todo_manager`	Task list management
`note_taker`	Note creation and retrieval
`scheduler`	Scheduling and reminders

System Agents (conditional)

Pool	Purpose	When active
`skills`	Dynamic skill execution (SkillBasedAgent)	Self-mod enabled
`system_qa`	Smoke tests for designed agents	QA enabled
`designed_*`	Self-designed agents (CognitiveAgent subclasses)	Created at runtime

A test agent (CorruptedFileReaderAgent) deliberately returns fabricated data to verify that the consensus layer detects and rejects it.

Quick Start

Requirements: Python 3.12+, uv

# Clone and install
git clone https://github.com/seangalliher/ProbOS.git
cd ProbOS
uv sync

# Run tests (1590 Python + 15 Vitest = 1605 total)
uv run pytest tests/ -v

# Launch interactive shell
uv run python -m probos

# Run the visual demo
uv run python demo.py

LLM Backend

ProbOS connects to an OpenAI-compatible LLM endpoint (configurable in config/system.yaml). Three options:

Option	Setup
No LLM (default)	Works out of the box — falls back to a built-in `MockLLMClient` with regex pattern matching. Good for exploring the architecture and running tests.
Ollama (local)	Install Ollama, pull a model (`ollama pull qwen3.5:35b`), update `config/system.yaml` endpoints to `http://127.0.0.1:11434`.
OpenAI-compatible API	Point `llm_base_url` in `config/system.yaml` to any OpenAI-compatible endpoint and set your API key.

Interactive Shell

probos> read /tmp/test.txt                   # Natural language → intent DAG
probos> list the files in /home/user/docs    # Directory listing
probos> write hello to /tmp/out.txt          # Consensus-verified write
probos> search for *.py in /home/user        # Recursive file search
probos> what just happened?                  # Introspection
probos> why did you use file_reader?         # Self-explanation
probos> how healthy is the system?           # System health assessment

probos> /status                              # Pool health, mesh, cognitive state
probos> /agents                              # Agent table with states and trust
probos> /weights                             # Hebbian connection weights
probos> /memory                              # Working memory snapshot
probos> /attention                           # Task priority queue + focus history
probos> /dream now                           # Force a dream consolidation cycle
probos> /cache                               # Workflow cache contents
probos> /explain                             # Explain last execution
probos> /history                             # Recent episodic memory entries
probos> /recall <query>                      # Search episodic memory
probos> /model                               # LLM client info
probos> /tier fast|standard|deep             # Switch LLM tier
probos> /debug                               # Toggle debug mode
probos> /help                                # All commands

How It Works

When you type natural language:

Working memory assembles system state (agent health, trust scores, Hebbian weights, capabilities) within a token budget
Episodic recall finds similar past interactions for context (top-3 by keyword-overlap cosine similarity)
Workflow cache checks for previously successful DAG patterns (exact match, then fuzzy with pre-warm intents)
LLM decomposer converts text into a TaskDAG — a directed acyclic graph of typed intents with dependencies
Attention manager scores tasks: urgency × relevance × deadline_factor × dependency_bonus
DAG executor runs independent intents in parallel, respects dependency ordering
Consensus gates destructive operations through multi-agent voting + red team verification
Reflection (optional) sends execution results back to the LLM for synthesis
Hebbian router strengthens successful agent-intent pairings, weakens failures
Episodic memory stores the interaction for future recall
Workflow cache stores successful patterns to bypass the LLM on repeat queries
Dreaming engine consolidates learning during idle periods — replays episodes, prunes weak connections, adjusts trust scores, pre-warms likely upcoming intents

Configuration

All tuning lives in config/system.yaml:

Section	Controls
pools	Target sizes (2-7), spawn cooldown, health check intervals
mesh	Gossip rate, Hebbian decay/reward rates, signal TTL
consensus	Min votes, approval threshold, trust priors (Beta distribution), decay rate
cognitive	LLM endpoint, model tiers (fast/standard/deep), token budget, concurrency limit, attention parameters
memory	Max episodes, relevance threshold
dreaming	Idle threshold, replay count, strengthening/weakening factors, prune threshold

Project Structure

src/probos/
├── __init__.py              # Package root
├── __main__.py              # Entry point (probos CLI)
├── api.py                   # FastAPI server + WebSocket events
├── config.py                # Pydantic config models
├── runtime.py               # Top-level orchestrator (~2500 lines)
├── types.py                 # Core types (30+ dataclasses)
├── agents/                  # Tool agents (deterministic)
│   ├── file_reader.py       #   read_file, stat_file
│   ├── file_writer.py       #   write_file (consensus-gated)
│   ├── directory_list.py    #   list_directory
│   ├── file_search.py       #   search_files
│   ├── shell_command.py     #   run_command (consensus-gated)
│   ├── http_fetch.py        #   http_fetch (rate-limited, consensus-gated)
│   ├── introspect.py        #   explain_last, agent_info, system_health, why
│   ├── system_qa.py         #   Smoke tests for designed agents
│   ├── red_team.py          #   Independent verification
│   ├── corrupted.py         #   Test agent (deliberately wrong)
│   └── utility/             #   10 CognitiveAgent types ("useful on Day 1")
│       ├── web_agents.py    #     WebSearch, PageReader, Weather, News
│       ├── language_agents.py #   Translator, Summarizer
│       ├── productivity_agents.py # Calculator, Todo, NoteTaker
│       └── organizer_agents.py #  Scheduler
├── cognitive/               # LLM pipeline + self-modification
│   ├── decomposer.py        #   NL → TaskDAG + DAG executor
│   ├── prompt_builder.py    #   Dynamic system prompt assembly
│   ├── llm_client.py        #   OpenAI-compatible + mock client
│   ├── cognitive_agent.py   #   Instructions-first LLM agent base
│   ├── working_memory.py    #   Bounded context assembly
│   ├── episodic.py          #   ChromaDB semantic long-term memory
│   ├── attention.py         #   Priority scoring + focus tracking
│   ├── dreaming.py          #   Offline consolidation + pre-warm
│   ├── workflow_cache.py    #   LRU pattern cache
│   ├── agent_designer.py    #   LLM designs new agents from capability gaps
│   ├── self_mod.py          #   Self-modification pipeline orchestrator
│   ├── code_validator.py    #   Static analysis for generated code
│   ├── sandbox.py           #   Isolated execution for untrusted agents
│   ├── skill_designer.py    #   Skill template generation
│   ├── skill_validator.py   #   Skill safety validation
│   ├── behavioral_monitor.py #  Runtime behavior tracking post-deploy
│   ├── feedback.py          #   Human feedback → trust/Hebbian/episodic
│   ├── correction_detector.py # Distinguishes corrections from new requests
│   ├── agent_patcher.py     #   Hot-patches designed agent code
│   ├── strategy.py          #   StrategyRecommender (skill attachment)
│   ├── dependency_resolver.py # Auto-install agent dependencies (uv)
│   ├── emergent_detector.py #   5 algorithms for emergent behavior
│   ├── embeddings.py        #   Embedding utilities
│   └── research.py          #   Web research phase for agent design
├── consensus/               # Multi-agent agreement
│   ├── quorum.py            #   Confidence-weighted voting
│   ├── trust.py             #   Bayesian Beta(α,β) reputation
│   ├── shapley.py           #   Shapley value attribution
│   └── escalation.py        #   3-tier failure cascade
├── experience/              # User interface
│   ├── shell.py             #   Async REPL (20+ commands)
│   ├── renderer.py          #   Real-time DAG execution display
│   ├── panels.py            #   Rich panel/table rendering
│   ├── knowledge_panel.py   #   Knowledge store panels
│   └── qa_panel.py          #   QA result panels
├── federation/              # Multi-node mesh
│   ├── bridge.py            #   ZeroMQ node bridge
│   ├── router.py            #   Intent forwarding + loop prevention
│   └── transport.py         #   Transport abstraction
├── knowledge/               # Persistent storage
│   ├── store.py             #   Git-backed artifact persistence
│   └── semantic.py          #   SemanticKnowledgeLayer (5 ChromaDB collections)
├── mesh/                    # Agent coordination
│   ├── intent.py            #   Pub/sub bus with fan-out
│   ├── routing.py           #   Hebbian learning (SQLite)
│   ├── capability.py        #   Fuzzy matching registry
│   ├── gossip.py            #   SWIM-style state exchange
│   └── signal.py            #   TTL-enforced signals
└── substrate/               # Agent lifecycle
    ├── agent.py             #   BaseAgent ABC (perceive/decide/act/report)
    ├── registry.py          #   Async-safe agent index
    ├── spawner.py           #   Template-based factory
    ├── pool.py              #   Resource pools + health checks
    ├── scaler.py            #   Demand-based pool scaling
    ├── heartbeat.py         #   Periodic pulse loop
    ├── event_log.py         #   Append-only SQLite audit log
    ├── identity.py          #   Persistent agent identity
    └── skill_agent.py       #   SkillBasedAgent (dynamic skill dispatch)

ui/src/                      # HXI — Human Experience Interface (React + Three.js)
├── canvas/                  #   WebGL cognitive mesh visualization
├── components/              #   CognitiveCanvas, AgentTooltip, overlays
├── audio/                   #   TTS, speech input, sound engine
├── store/                   #   Zustand state management
└── hooks/                   #   WebSocket connection to runtime

Tests

1605 tests covering every layer (1590 Python + 15 Vitest):

uv run pytest tests/ -v          # Python tests
cd ui && npx vitest run           # UI tests

Key Concepts

Self-selection. Agents decide whether to handle an intent via perceive(). The system doesn't assign work — agents volunteer based on capability matching.

Confidence tracking. Each agent maintains a Bayesian confidence score. Success moves it toward 1.0, failure toward 0.0. Agents degraded below 0.2 are recycled and replaced.

Hebbian learning. "Neurons that fire together wire together." When an agent successfully handles an intent, the connection weight between that intent and agent strengthens. Over time, the system learns optimal routing.

Consensus pipeline. Destructive operations follow: broadcast → quorum evaluation → red team verification → Shapley attribution → trust update → Hebbian learning. A single corrupted agent cannot cause damage.

Self-modification. When ProbOS encounters a capability gap (no agent can handle a request), it designs a new agent: LLM generates code → CodeValidator static analysis → SandboxRunner isolation test → probationary trust → SystemQA smoke tests → BehavioralMonitor tracks post-deployment. Agents can also be designed collaboratively via /design.

Correction feedback loop. Human corrections are the richest learning signal. CorrectionDetector identifies when the user is correcting a previous result → AgentPatcher modifies the responsible agent → hot-reload → auto-retry → trust/Hebbian/episodic updates.

Dreaming. During idle periods, the system replays recent episodes to strengthen successful pathways, weaken failed ones, prune dead connections, adjust trust scores, and pre-warm predictions for likely upcoming requests.

Dynamic intent discovery. Each agent class declares structured IntentDescriptor metadata. The decomposer's system prompt is assembled at runtime from whatever agents are registered. New agent types self-integrate without any configuration changes.

Federation. Multiple ProbOS nodes form a Nooplex — a cognitive mesh of meshes. Each node is sovereign (its own agents, trust, memory). Nodes exchange capabilities via ZeroMQ gossip protocol and can forward intents across the federation.

HXI (Human Experience Interface). A WebGL visualization of the cognitive mesh rendered in Three.js. Agent nodes glow with trust-mapped colors, pulse with activity, and connect with Hebbian-weighted edges. Real-time WebSocket streaming from the runtime.

Development Status

Phase 27 complete — 1605 tests passing.

Phase	Description	Status
1	Substrate + Mesh (agent lifecycle, intent bus, Hebbian routing, gossip)	Done
2	Consensus (quorum voting, trust network, red team verification)	Done
3a	Cognitive core (LLM decomposer, working memory, DAG execution)	Done
3b	Episodic memory, attention, dreaming, workflow cache	Done
4	Experience layer (shell, renderer, panels)	Done
5	Expansion agents (search, directory, shell, HTTP, introspect)	Done
6	Introspection + dynamic intent discovery (self-assembling prompts)	Done
7	Escalation cascades + error recovery	Done
8	Adaptive pool scaling (demand-based sizing)	Done
9	Federation (ZeroMQ transport, router, gossip, multi-node)	Done
10	Self-modification (agent designer, code validator, sandbox, behavioral monitor)	Done
11	Skills + transparency + web research	Done
12	Per-tier LLM endpoints (fast/standard/deep)	Done
13	Workflow caching (LRU, exact + fuzzy matching, pre-warm)	Done
14	Persistent knowledge (Git-backed store, warm boot, rollback)	Done
15	CognitiveAgent base class + domain-aware skill attachment	Done
16	DAG proposal mode (/plan, /approve, /reject)	Done
17	Dependency resolution (auto-install agent imports)	Done
18	Feedback-to-learning loop + correction detection + agent patching	Done
19	Shapley value trust attribution + trust-weighted matching	Done
20	Emergent behavior detection (5 algorithms)	Done
21	Semantic Knowledge Layer (ChromaDB, 5 collections)	Done
22	Bundled agent suite + distribution (`pip install`, `probos serve`)	Done
23	HXI MVP (WebSocket events, React/Three.js cognitive canvas)	Done
27	Codebase knowledge graph + impact analysis	Done

Roadmap

Phase	Title	Goal
24	Channel Integration	Discord, Slack, Telegram adapters + external tool connectors
25	Persistent Tasks	Long-running autonomous tasks with checkpointing, browser automation
26	Inter-Agent Deliberation	Structured multi-turn agent debates, agent-to-agent messaging
28	Meta-Learning	Workspace Ontology, dream cycle abstractions, session context, goal planning
29	Federation + Emergence	Knowledge federation, trust transitivity, TC_N measurement

Dependencies

Package	Purpose
pydantic >=2.0	Configuration validation
pyyaml >=6.0	YAML config loading
aiosqlite >=0.19	Async SQLite (event log, Hebbian weights, trust, episodic memory)
rich >=13.0	Terminal UI (panels, tables, Live display, spinners)
httpx >=0.27	HTTP client (LLM API, HTTP fetch agent)
pyzmq >=27.1	ZeroMQ transport (federation)
chromadb >=1.0	Vector database (semantic memory, knowledge layer)
fastapi >=0.115	API server + WebSocket events (HXI backend)
uvicorn >=0.34	ASGI server

Dev: pytest >=8.0, pytest-asyncio >=0.23, vitest (UI)

License

Apache License 2.0. See LICENSE.

Disclaimer

This is a personal research project. It is not affiliated with, endorsed by, or supported by my employer.

Name		Name	Last commit message	Last commit date
Latest commit History 1,036 Commits
.github		.github
Vibes		Vibes
config		config
docs		docs
prompts		prompts
scripts		scripts
src/probos		src/probos
tests		tests
ui		ui
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
.tmp_issue160.txt		.tmp_issue160.txt
CONTRIBUTING.md		CONTRIBUTING.md
DECISIONS.md		DECISIONS.md
Dockerfile		Dockerfile
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
PROGRESS.md		PROGRESS.md
README.md		README.md
cross_layer_analysis.py		cross_layer_analysis.py
dashboard.html		dashboard.html
decisions-era-1-genesis.md		decisions-era-1-genesis.md
decisions-era-2-emergence.md		decisions-era-2-emergence.md
decisions-era-3-product.md		decisions-era-3-product.md
decisions-era-4-evolution.md		decisions-era-4-evolution.md
demo.py		demo.py
docker-compose.yml		docker-compose.yml
mkdocs.yml		mkdocs.yml
progress-era-1-genesis.md		progress-era-1-genesis.md
progress-era-2-emergence.md		progress-era-2-emergence.md
progress-era-3-product.md		progress-era-3-product.md
progress-era-4-evolution.md		progress-era-4-evolution.md
pyproject.toml		pyproject.toml
tmp_test.txt		tmp_test.txt
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ProbOS

What Is This?

Design Philosophy

Architecture

Substrate

Mesh

Consensus

Cognitive

Experience

Agents

Core Agents (always active)

Bundled Cognitive Agents (10 pools, "useful on Day 1")

System Agents (conditional)

Quick Start

LLM Backend

Interactive Shell

How It Works

Configuration

Project Structure

Tests

Key Concepts

Development Status

Roadmap

Dependencies

License

Disclaimer

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ProbOS

What Is This?

Design Philosophy

Architecture

Substrate

Mesh

Consensus

Cognitive

Experience

Agents

Core Agents (always active)

Bundled Cognitive Agents (10 pools, "useful on Day 1")

System Agents (conditional)

Quick Start

LLM Backend

Interactive Shell

How It Works

Configuration

Project Structure

Tests

Key Concepts

Development Status

Roadmap

Dependencies

License

Disclaimer

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages