Olly gives every person on a team their own AI agent. The agents know the team, communicate with each other, share artifacts and learnings, and can spin up internal packs of specialists to get complex work done fast.
A user sends a message. Their agent picks it up, checks what apps are connected, pulls in relevant team context, breaks the work into parallel subtasks if needed, coordinates with teammate agents, and delivers. The human stays in the loop at the boundaries — not in the middle of every tool call.
Most AI tools are a single model answering a single person. Olly is a layer across a whole team:
- Every team member has a persistent agent with its own memory, identity, and skill set
- Agents message each other directly — if your agent finishes a task that blocks a teammate's agent, it sends the result over, unprompted
- Team knowledge is shared — facts, constraints, and lessons one agent learns are available to every agent on the team
- Artifacts are first-class — slides, docs, sheets, and PDFs are created, versioned, and shared through a team-wide store; agents reference them by ID across conversations
- Packs of subagents run in parallel — complex tasks fan out to specialist workers, coordinate in real time, and synthesize back to a single answer
Each human user is backed by one persistent AI agent (the Alpha). Agents whose users are on the same team can see each other and communicate.
Human Team
Alice ──► Alpha (Alice's agent) ─────────────────────┐
Bob ──► Alpha (Bob's agent) ──── A2A messaging ───┤── shared team context
Carol ──► Alpha (Carol's agent) ─────────────────────┘ shared artifacts
Agents message teammates when work crosses boundaries. If Alice's agent finishes a draft that Bob's agent needs, it sends it without waiting for Alice to tell it to. If Bob's agent is mid-task, the message lands in its inbox and gets picked up at the next safe boundary — never interrupting in-flight work.
Alice's agent
│
├── get_team_roster()          → who's on the team, what they do
└── message_teammate_agent(    → deliver directly to Bob's agent
      name="Bob",                stored in agent_messages DB;
      message="...",             if Bob is idle → wakes his session;
      artifact_id="..."          if busy → drains at next turn boundary
    )
Delivery is push-not-poll: a message to an idle teammate wakes their session immediately. A message to a busy teammate folds into their context when they finish their current step — no polling, no missed messages.
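The delivery rule above can be sketched in a few lines. This is a minimal in-memory illustration, not Olly's implementation: `AgentSession`, `deliver`, `drain`, and `finish_step` are hypothetical names, and the real system stores messages in the agent_messages DB and signals sessions over Redis.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentSession:
    """Hypothetical stand-in for one agent's session state."""
    name: str
    busy: bool = False
    inbox: deque = field(default_factory=deque)
    context: list = field(default_factory=list)

    def deliver(self, message: str) -> str:
        """Queue a message; wake the session right away if it is idle."""
        self.inbox.append(message)
        if self.busy:
            return "queued"   # picked up at the next turn boundary
        self.drain()
        return "woken"        # idle session handles it immediately

    def drain(self) -> None:
        """Fold every pending message into context, oldest first."""
        while self.inbox:
            self.context.append(self.inbox.popleft())

    def finish_step(self) -> None:
        """Turn boundary: in-flight work is done, so the inbox can drain."""
        self.busy = False
        self.drain()


bob = AgentSession("Bob", busy=True)
status_busy = bob.deliver("draft v2 is ready")   # Bob is mid-task
pending_before = list(bob.inbox)                 # still waiting in the inbox
bob.finish_step()                                # boundary reached
status_idle = bob.deliver("review by Friday")    # Bob is idle now
```

The sender never polls: it calls `deliver` once, and the receiver decides when the message enters its context.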
When a task has independent sub-parts, an agent spawns a pack — a set of specialist members that run in parallel and coordinate in real time.
Alpha
│
└── delegate(tasks=[...])
│
├── Researcher ─────────────────────────────┐
├── Analyst ──── send_to_pack / ──────────┤── shared scratchpad
└── Writer ask_alpha ──────────┘
│
runs in WAVES (dependency-ordered);
a member can push discoveries to a still-running teammate mid-task
Wave execution: members with no dependencies run together in wave 1; members that depend on wave 1 results run in wave 2 with those results injected as context. Critical path is optimized, not total step count.
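One way to derive waves from a dependency map is a layered topological sort. The sketch below assumes each member declares the members it depends on; `plan_waves` and the member names are illustrative, not part of Olly's API.

```python
def plan_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group members into waves; each wave depends only on earlier waves."""
    remaining = {member: set(needs) for member, needs in deps.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # A member is ready once everything it needs has already run.
        ready = sorted(m for m, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown member in deps")
        waves.append(ready)
        done.update(ready)
        for member in ready:
            del remaining[member]
    return waves


waves = plan_waves({
    "researcher": set(),
    "analyst": set(),
    "writer": {"researcher", "analyst"},
})
# waves == [["analyst", "researcher"], ["writer"]]
```

The number of waves equals the length of the longest dependency chain, which is why adding more independent members does not lengthen the run.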
Live coordination inside a pack: while members are running, a supervisor drains the intra-pack inbox and routes messages in real time. A researcher finding a rate limit can warn the writer before the writer hits it — not after everyone finishes.
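A rough picture of that routing loop, using asyncio queues as a stand-in for the intra-pack inbox. The function names and the message shape `(to, message)` are assumptions made for this sketch; the real supervisor and transport may look different.

```python
import asyncio


async def researcher(pack_inbox: asyncio.Queue) -> None:
    # Discovers a constraint mid-task and forwards it to the writer.
    await pack_inbox.put(("writer", "API allows 10 requests/min"))


async def writer(my_inbox: asyncio.Queue, notes: list) -> None:
    # Receives the hint while still working, before drafting anything.
    notes.append(await my_inbox.get())


async def supervisor(pack_inbox: asyncio.Queue, member_inboxes: dict) -> None:
    # Drains the shared inbox and routes each message to its recipient.
    while True:
        to, message = await pack_inbox.get()
        await member_inboxes[to].put(message)


async def run_pack() -> list:
    pack_inbox: asyncio.Queue = asyncio.Queue()
    member_inboxes = {"writer": asyncio.Queue()}
    notes: list = []
    router = asyncio.create_task(supervisor(pack_inbox, member_inboxes))
    await asyncio.gather(
        researcher(pack_inbox),
        writer(member_inboxes["writer"], notes),
    )
    router.cancel()  # the pack is finished, so stop routing
    return notes


notes = asyncio.run(run_pack())
```

The writer receives the warning while both members are still inside the same `gather`, which is the point of routing during the run instead of after it.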
Pack comm tools (available to every member):
| Tool | What it does |
|---|---|
| send_to_pack(to, message) | Forward a discovery to a specific teammate while they're still working |
| ask_alpha(question) | Escalate a blocker to the team lead; blocks until answered |
| ask_user(question) | Surface a question directly to the human |
| scratchpad_write/read(key) | Shared in-memory KV store for the pack run |
Members can also spawn their own sub-packs via delegate, with depth capped at 2 (Alpha → Member → Sub-member).
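The depth cap can be enforced at the point of delegation. In this sketch the numbering (Alpha at depth 0, Member at 1, Sub-member at 2) is an assumption, and `delegate` here is a simplified stand-in that only tracks depth; it does not run anything.

```python
MAX_DEPTH = 2  # assumed numbering: Alpha is 0, Member is 1, Sub-member is 2


class DepthLimitError(RuntimeError):
    """Raised when a delegate call would exceed the depth cap."""


def delegate(parent_depth: int, tasks: list[str]) -> list[dict]:
    """Create child workers one level below the caller, or refuse."""
    child_depth = parent_depth + 1
    if child_depth > MAX_DEPTH:
        raise DepthLimitError(f"depth {child_depth} exceeds cap of {MAX_DEPTH}")
    return [{"task": task, "depth": child_depth} for task in tasks]


members = delegate(0, ["research", "write"])             # Alpha -> Members
sub_members = delegate(members[0]["depth"], ["scrape"])  # Member -> Sub-members
```

A sub-member calling `delegate(2, ...)` raises `DepthLimitError`, so fan-out stops at the third level.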
| Layer | Scope | What goes here |
|---|---|---|
| Personal memory (memory_read/write) | Per-agent | User preferences, ongoing projects, lessons from failures |
| Team context (team_context_read/write) | Whole team | Company constraints, shared deadlines, client norms, collective lessons |
| Skills (.agents/skills/) | Per-agent | Reusable procedures the agent learns; loaded on demand |
| Artifacts (artifact_write/read) | Whole team | Slides, docs, sheets, PDFs; versioned and reference-able by ID |
Personal memory has three blocks:
- persona: voice and communication style; updated when the user corrects how the agent speaks
- user: durable facts (preferences, projects, people, deadlines)
- lessons: rules extracted from failures; format: "Never do X because Y." Injected every session so mistakes don't repeat
Team context is visible to every agent on the team. The rule: only write here if the fact genuinely helps all team agents — company-wide constraints, shared deadlines, collective lessons. Personal or user-specific info stays in personal memory.
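The split between the two layers might look like this in code. `PersonalMemory`, `render`, and `record` are hypothetical names, and the `helps_whole_team` flag stands in for a judgment the agent makes itself; the example facts are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalMemory:
    """The three personal blocks, held as plain lists for this sketch."""
    persona: list[str] = field(default_factory=list)
    user: list[str] = field(default_factory=list)
    lessons: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render all three blocks as text to inject at session start."""
        sections = []
        for block in ("persona", "user", "lessons"):
            entries = getattr(self, block)
            body = "\n".join(f"- {entry}" for entry in entries) or "- (empty)"
            sections.append(f"[{block}]\n{body}")
        return "\n\n".join(sections)


def record(fact: str, helps_whole_team: bool,
           memory: PersonalMemory, team_context: list[str]) -> None:
    """Team-wide facts go to team context; everything else stays personal."""
    if helps_whole_team:
        team_context.append(fact)
    else:
        memory.user.append(fact)


memory = PersonalMemory(lessons=["Never send drafts unreviewed because clients forward them."])
team_context: list[str] = []
record("Code freeze starts on the 20th", True, memory, team_context)
record("Prefers bullet-point summaries", False, memory, team_context)
prompt_prefix = memory.render()
```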
Agents discover what apps are connected at runtime and use them directly — no web scraping for things an app can do natively.
composio_list_apps() → ['reddit', 'gmail', 'googlesheets', 'linkedin', ...]
composio_find_actions() → find the right action for a given task
composio_execute() → run it with the user's credentials
Progressive disclosure: only three meta-tools are in the model's context (cheap). Thousands of app-specific actions are fetched on demand — token cost stays flat regardless of how many apps are connected.
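To show why token cost stays flat, here is a toy version of the pattern. The catalog, action names, and the three functions are made up for this sketch and do not call Composio; they only mirror the shape of the three meta-tools.

```python
# Hypothetical catalog. In practice the action list lives in the integration
# provider and can hold thousands of entries.
CATALOG = {
    "gmail": {
        "send_email": "send an email",
        "list_threads": "list recent threads",
    },
    "googlesheets": {
        "append_row": "append a row to a sheet",
    },
}


def list_apps() -> list[str]:
    """Meta-tool 1: names of connected apps only."""
    return sorted(CATALOG)


def find_actions(app: str, query: str) -> list[str]:
    """Meta-tool 2: fetch matching actions on demand (naive word overlap)."""
    wanted = set(query.lower().split())
    return [name for name, description in CATALOG[app].items()
            if wanted & set(description.split())]


def execute(app: str, action: str, **params) -> dict:
    """Meta-tool 3: validate the action and return what would be run."""
    if action not in CATALOG[app]:
        raise KeyError(f"unknown action {action!r} for app {app!r}")
    return {"app": app, "action": action, "params": params}


# Only these three definitions sit in the model's context, however large
# the catalog grows.
META_TOOLS = [list_apps, find_actions, execute]

matches = find_actions("gmail", "send email")
result = execute("gmail", matches[0], to="bob@example.com")
```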
Agents produce structured deliverables that live in a shared team store:
| Type | Format | What it becomes |
|---|---|---|
| slide | TSX | Slide deck rendered in the UI |
| doc | Markdown | Document in the doc editor |
| sheet | CSV | Spreadsheet in Univer |
| pdf | pdfme JSON | PDF viewable and downloadable |
An agent writes a source file, bumps a version, and emits an event — the UI panel updates. A teammate agent can reference the same artifact by ID. Humans can edit in the UI and the source stays in sync.
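The write, version bump, and event steps can be sketched with an in-memory store. `ArtifactStore`, `ArtifactVersion`, and the event shape are assumptions for illustration; the real store keeps metadata in Supabase and streams events to the frontend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactVersion:
    artifact_id: str
    kind: str       # "slide", "doc", "sheet", or "pdf"
    version: int
    source: str


class ArtifactStore:
    """Hypothetical in-memory stand-in for the shared team store."""

    def __init__(self) -> None:
        self._history: dict[str, list[ArtifactVersion]] = {}
        self.events: list[dict] = []  # stand-in for the event stream to the UI

    def write(self, artifact_id: str, kind: str, source: str) -> ArtifactVersion:
        """Store a new version and emit an update event."""
        history = self._history.setdefault(artifact_id, [])
        item = ArtifactVersion(artifact_id, kind, len(history) + 1, source)
        history.append(item)
        self.events.append(
            {"type": "artifact_updated", "id": artifact_id, "version": item.version}
        )
        return item

    def read(self, artifact_id: str) -> ArtifactVersion:
        """Return the latest version; any agent holding the ID can call this."""
        return self._history[artifact_id][-1]


store = ArtifactStore()
store.write("q3-plan", "doc", "# Q3 plan\n")
store.write("q3-plan", "doc", "# Q3 plan\n\n- Ship v2\n")
latest = store.read("q3-plan")  # a teammate agent reads by ID
```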
Skills are reusable agent procedures installed as .agents/skills/<name>/SKILL.md. Only name + description appear in context on every turn (cheap). The full body is loaded on demand with load_skill.
The agent writes a skill when it invents a reliable multi-step technique worth reusing. Community skills are installed with npx skills add <pkg>, run from the terminal or by the agent. The agent's capability surface expands over time without growing the baseline context.
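A sketch of the index-then-load pattern follows. It assumes, purely for illustration, that the first line of each SKILL.md is the one-line description; the actual file layout is not specified here, and `index_skills` and this `load_skill` are stand-ins for the real tools.

```python
import tempfile
from pathlib import Path


def index_skills(root: Path) -> dict[str, str]:
    """Map skill name to its description (assumed to be line 1 of SKILL.md)."""
    index = {}
    for skill_file in sorted(root.glob("*/SKILL.md")):
        with skill_file.open(encoding="utf-8") as handle:
            index[skill_file.parent.name] = handle.readline().strip()
    return index


def load_skill(root: Path, name: str) -> str:
    """Read the full skill body only when the agent asks for it."""
    return (root / name / "SKILL.md").read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / ".agents" / "skills"
    (root / "weekly-report").mkdir(parents=True)
    (root / "weekly-report" / "SKILL.md").write_text(
        "Compile the weekly status report.\n1. Pull metrics.\n2. Draft summary.\n",
        encoding="utf-8",
    )
    index = index_skills(root)                # cheap: goes into every turn
    body = load_skill(root, "weekly-report")  # full text: loaded on demand
```

Only `index` would be placed in context each turn, so adding skills grows the prompt by one line per skill instead of one file per skill.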
Every Alpha's router includes:
| Category | Tools |
|---|---|
| Memory | memory_read, memory_append, memory_replace |
| Team | get_team_roster, message_teammate_agent, team_context_read, team_context_write, check_inbox |
| Artifacts | artifact_write, artifact_read, artifact_list |
| Apps | composio_list_apps, composio_find_actions, composio_execute |
| Browser | browse (BrowserOS + MiMo vision; last resort) |
| Skills | load_skill, write_skill |
| Delegation | delegate (spawns a pack of members) |
| Background | run_background, check_process, await_process |
| Scheduling | schedule_create, schedule_list, schedule_delete |
| CCR | context compression tools |
Pack members get all of the above plus send_to_pack, ask_alpha, ask_user, scratchpad_write/read — and their own delegate for spawning sub-packs.
| Component | Role |
|---|---|
| DeepSeek | LLM for Alpha and pack members |
| E2B | Cloud sandbox — persistent shell, filesystem, desktop |
| BrowserOS MCP | Browser automation; drives real Chromium |
| Redis | Pub/Sub for inter-agent messaging; session control channel (stop/steer); active-session lock |
| FastAPI + SSE | Streaming events to the frontend in real time |
| Supabase | Team members, agent messages, artifacts metadata, memory blocks, session state |
| Composio | App integrations — OAuth, action discovery, execution |
| Langfuse | LLM observability — every call, tool, token, subagent trace |
- Python 3.11+, uv
- Redis (local or Upstash)
- E2B account
- BrowserOS MCP server
- Supabase project
- Composio account + API key
git clone https://github.com/your-org/olly.git
cd olly/backend
uv sync

# LLM
DEEPSEEK_API_KEY=...
DEEPSEEK_BASE_URL=...
DEEPSEEK_MODEL=...
# E2B
E2B_API_KEY=...
# BrowserOS
BROWSEROS_MCP_URL=http://localhost:9000
MIMO_API_KEY=...
MIMO_BASE_URL=...
# Redis
REDIS_URL=redis://localhost:6379
# Supabase
SUPABASE_URL=https://<project>.supabase.co
SUPABASE_SECRET_KEY=eyJ...
# Composio
COMPOSIO_API_KEY=...
# Observability
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_BASE_URL=https://cloud.langfuse.com

uv run ./main.py

Frontend:
cd frontend_v2
npm install
npm run dev

Every turn the Alpha:
- Calls composio_list_apps first, using connected apps before web_search or browser
- Checks its inbox for teammate messages that arrived while it was idle
- Spawns a pack via delegate for tasks with independent sub-parts
- Runs tool calls and subagent spawns in parallel where possible
- After non-trivial work: updates personal memory with anything worth remembering; if it matters to the whole team, writes to team context too
- Browser is last resort — only if no connected app and no shell path covers the task
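The tool preference in the list above amounts to a short fallback chain. The function and its arguments are hypothetical; in practice the agent decides this from its prompt rather than from hard-coded logic.

```python
def choose_path(needed_app: str, connected_apps: set[str], shell_can_do: bool) -> str:
    """Pick the cheapest route for a task: app first, then shell, then browser."""
    if needed_app in connected_apps:
        return "app"
    if shell_can_do:
        return "shell"
    return "browser"  # last resort


connected = {"gmail", "googlesheets"}
via_app = choose_path("gmail", connected, shell_can_do=False)
via_shell = choose_path("notion", connected, shell_can_do=True)
via_browser = choose_path("notion", connected, shell_can_do=False)
```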
Langfuse traces every LLM call, tool invocation, subagent spawn, and token cost. Inspect cost-per-session, individual tool chains, and subagent timelines in the dashboard.
The run inspector (frontend) shows live subagent count, tool calls, failures, notes, and artifacts for each turn, all visible while the turn is still running.