company-os

A generic, forkable AI company operating system. Built on Paperclip (multi-agent orchestration) + gstack (per-agent skill depth) + a 3-layer memory system.

Fork this. Replace the mission. Customize agent context for your domain. Ship.

What This Is

company-os gives you a fully-staffed AI company for any project — 12 specialized agents, 73 skills, coordinated by Paperclip, sharpened by gstack, and learning from every session through a persistent memory layer.

It's not a chatbot setup. It's an operating system for building products with AI.

Paperclip          ← coordinates the fleet (dashboard, inbox, budgets, audit trail)
  └── gstack       ← sharpens each agent (23 core + 34 extended skills)
       └── memory  ← persists learning across sessions (MEMORY.md + learnings.jsonl)

The Stack

Layer	What It Does	Why It Matters
Paperclip	Parallel agent execution, mission control dashboard, human inbox, per-agent budget controls, full audit trail	Agents coordinate without stepping on each other. You see everything.
gstack	73 skills — slash commands that make each Claude Code session operate like a specialized professional	`/ship` doesn't just git push. It merges base, runs tests, reviews diff, bumps VERSION, writes CHANGELOG, creates PR.
Memory	MEMORY.md auto-loads into every session. learnings.jsonl stores operational patterns. Checkpoints save mid-session state.	Agents remember. Each session builds on the last.

The Team

Chief of Staff (claude-opus-4.6)
├── Product Team
│   ├── Product Manager     — 0-to-1, PRDs, jobs-to-be-done, specs
│   └── Research Analyst    — X/web research, competitive intel, synthesis
├── Engineering Team
│   ├── CTO                 — architecture, code quality, cross-team decisions
│   ├── Frontend Engineer   — UI, components, design system enforcement
│   ├── Backend Engineer    — API, data models, business logic
│   ├── Release Engineer    — git, ship, deploy, CHANGELOG, retros
│   └── QA Engineer         — browser testing, visual QA, performance, canary
├── Creative Team
│   ├── Design Consultant   — brand, design system (DESIGN.md), visual direction
│   └── Content Writer      — copy, policy docs, brand voice
├── Security Officer        — CSO audits, guardrails, threat modeling
└── Solo Founder            — customer discovery, PMF, fundraising, hiring, operating cadence

Models: Chief of Staff, CTO, and Solo Founder run on claude-opus-4.6. All other agents run on claude-sonnet-4.6.

Skill Coverage — 73 Skills Across 14 Categories

Every skill is a slash command in Claude Code. Each is wired to the right agent in .paperclip.yaml.

Strategy & Product (8 skills)

Skill	What It Does
`/plan-ceo-review`	CEO/founder plan review — rethinks problem, finds 10-star product, challenges premises
`/office-hours`	YC-style forcing questions: demand reality, desperate specificity, narrowest wedge
`/plan-strategy`	Strategic planning — north star, quarterly priorities, OKRs
`/frame-problem`	Structured problem framing before jumping to solutions
`/discover-product`	Jobs-to-be-done, problem validation, early PRD for 0-to-1
`/brainstorming`	Turn ideas into designs through collaborative dialogue + 2-3 approach proposals
`/write-brief`	1-pagers, decision memos, exec summaries, investor updates
`/evaluating`	Multi-dimension artifact evaluation with concrete upgrade paths to 10/10

Research & Intelligence (5 skills)

Skill	What It Does
`/x-research`	Agentic X/Twitter research — decompose questions, follow threads, synthesize
`/last30days-nux`	Multi-platform social research across Reddit, X, YouTube, TikTok, HN, Polymarket
`/researching-web`	Web research — docs, blogs, news, technical solutions
`/researching-codebase`	Codebase investigation — patterns, implementations, file discovery
`/synthesize-research`	Turn raw research into structured synthesis with key findings

Planning & Documentation (3 skills)

Skill	What It Does
`/writing-plans`	Detailed implementation plans with phases, tasks, checks — living document for multi-session work
`/writing-documentation`	Markdown docs in `.thoughts/` with frontmatter and naming conventions
`/write-policy`	Policy docs, SOPs, team handbooks, compliance documentation

Engineering & Architecture (9 skills)

Skill	What It Does
`/plan-eng-review`	Eng manager plan review — architecture, data flow, edge cases, test coverage
`/implementing`	Execute implementation plans with mandatory phase checkpoints
`/investigate`	Systematic debugging — 4 phases: investigate, analyze, hypothesize, implement
`/codex`	OpenAI Codex second opinion — code review, adversarial challenge, consult
`/health`	Code quality dashboard — type checker, linter, test runner, dead code, trend tracking
`/review`	Pre-landing code review — SQL safety, LLM trust boundary, structural issues
`/ralph-wiggum`	Researcher → Coder → Verifier → Reviewer agent sequence
`/devex-review`	Live DX audit — tests docs, times TTHW, screenshots error messages
`/plan-devex-review`	DX review for plans before implementation

Design & Frontend (7 skills)

Skill	What It Does
`/design-consultation`	Complete design system — typography, color, spacing, motion → `DESIGN.md`
`/frontend-design`	Distinctive, production-grade frontend — avoids AI aesthetics
`/design-html`	Production-quality Pretext-native HTML/CSS from approved designs
`/design-review`	Designer's eye QA — spacing, hierarchy, AI slop patterns → fixes in source
`/design-shotgun`	Generate multiple AI design variants, comparison board, iterate with feedback
`/plan-design-review`	Designer's eye plan review — rates each dimension 0-10 before building
`/brand-voice-enforcement`	Apply brand guidelines to any content — tone, voice, style enforcement

Quality & Testing (7 skills)

Skill	What It Does
`/qa`	Systematic browser QA + iterative fixes — test → fix → verify loop
`/qa-only`	QA report only — health score, screenshots, repro steps, no fixes
`/benchmark`	Performance regression detection — baselines, before/after comparison
`/canary`	Post-deploy canary monitoring — console errors, regressions, anomalies
`/browse`	Fast headless browser — navigate, interact, screenshot, diff, assert
`/setup-browser-cookies`	Import real browser cookies into headless session for auth testing
`/open-gstack-browser`	Launch headed Chromium with AI sidebar extension

Shipping & Deployment (6 skills)

Skill	What It Does
`/ship`	Full ship workflow — merge base, tests, diff review, VERSION bump, CHANGELOG, PR
`/land-and-deploy`	Merge PR, wait for CI, verify prod health via canary
`/github`	Git/GitHub — draft PRs, squash merging, Graphite-style dev
`/document-release`	Post-ship docs update — README, ARCHITECTURE, CHANGELOG voice
`/retro`	Weekly engineering retro — commit history, patterns, per-person breakdowns
`/autoplan`	Auto-runs CEO, design, eng, and DX reviews sequentially

Safety & Guardrails (5 skills)

Skill	What It Does
`/careful`	Warns before `rm -rf`, `DROP TABLE`, force-push, `reset --hard`
`/guard`	Full safety — combines `/careful` + `/freeze`
`/freeze`	Restrict file edits to a specific directory for the session
`/unfreeze`	Clear the freeze boundary
`/cso`	Infrastructure security audit — secrets, supply chain, CI/CD, LLM security, OWASP

State & Memory (3 skills)

Skill	What It Does
`/checkpoint`	Save working state — git state, decisions, remaining work → resume across resets
`/learn`	Manage project learnings — review, search, prune, export
`/last30days-nux`	Research briefings persisted to `~/Documents/Last30Days/`

Third-Party Plugins (2 skills)

Skill	Source	What It Does
`ui-ux-pro-max`	NextLevelBuilder	67 UI styles, 161 palettes, 57 font pairings, 99 UX guidelines, 25 chart types
`openclaw`	thedotmack	OpenClaw agent framework integration

Meta (2 skills)

Skill	What It Does
`/gstack-upgrade`	Upgrade gstack to latest version
`/setup-deploy`	Configure deployment settings for `/land-and-deploy`

Solo Founder — Tier 1: Critical Month 1-3 (5 skills)

Skill	What It Does
`/user-interview`	Screener → interview script → synthesis → pattern aggregation using JTBD format
`/pmf-pulse`	Sean Ellis survey + retention curve + churn pattern analysis → PMF score report
`/pitch-deck`	Narrative-first pitch deck: 10-slide structure, each slide answers one investor question
`/investor-update`	250-word monthly update template — metrics table, highlights, lowlights, specific ask
`/runway`	Burn rate + 3 runway scenarios + fundraising trigger date + default alive check

Solo Founder — Tier 2: Growth Month 3-9 (5 skills)

Skill	What It Does
`/launch-strategy`	Full launch plan: channel selection, pre-launch checklist, day-of playbook by channel
`/sales-playbook`	ICP definition, cold outreach templates, demo flow, objection handling, pipeline tracker
`/pricing-strategy`	Competitor research, value-based pricing calc, 3-tier page copy, validation experiment
`/growth-experiment`	Hypothesis → ICE prioritization → run → results → patterns across AARRR funnel
`/legal-basics`	Incorporation checklist, IP assignment, contractor agreements, SAFE vs priced round

Solo Founder — Tier 3: Scale Month 6-18 (6 skills)

Skill	What It Does
`/cap-table`	Founder equity, round dilution modeling, SAFE conversion math, option pool planning
`/first-hire`	Role definition, sourcing strategy, interview rubric, equity offer, 30-day onboarding
`/content-machine`	Pillar → derivatives → social → SEO cluster → content calendar → build-in-public templates
`/demo-script`	Hook → aha moment → proof → CTA for 5-min demos, sales calls, or launch videos
`/decision-journal`	Log decisions with alternatives + reasoning → monthly review → quarterly pattern analysis
`/founder-rhythm`	Weekly (Mon plan / Fri retro) + monthly (metrics + investor update) + quarterly direction check

Skill-to-Agent Matrix

Skill	Primary Agent	Secondary
`plan-ceo-review`, `office-hours`, `plan-strategy`, `frame-problem`, `write-brief`, `evaluating`, `autoplan`	chief-of-staff	—
`discover-product`, `brainstorming`, `writing-plans`, `synthesize-research`	product-manager	chief-of-staff
`x-research`, `last30days-nux`, `researching-web`, `researching-codebase`, `writing-documentation`	research-analyst	—
`plan-eng-review`, `codex`, `health`, `investigate`, `review`, `devex-review`	cto	—
`frontend-design`, `design-html`, `design-review`, `design-shotgun`, `qa`, `benchmark`	frontend-engineer	—
`implementing`, `careful`, `writing-plans`, `researching-codebase`	backend-engineer	—
`ship`, `land-and-deploy`, `github`, `document-release`, `retro`, `checkpoint`	release-engineer	—
`qa`, `qa-only`, `benchmark`, `canary`, `browse`, `setup-browser-cookies`	qa-engineer	—
`design-consultation`, `design-shotgun`, `brand-voice-enforcement`, `plan-design-review`	design-consultant	—
`cso`, `guard`, `freeze`, `unfreeze`	security-officer	cto
`brand-voice-enforcement`, `write-policy`, `write-brief`, `writing-documentation`	content-writer	—

Coordination Protocols

Feature Request → Ship

User
 └── Chief of Staff  /office-hours, /plan-ceo-review
      └── Product Manager  /discover-product, /writing-plans
           └── CTO  /plan-eng-review
                ├── Frontend Engineer  [parallel]
                └── Backend Engineer  [parallel]
                     └── QA Engineer  /qa
                          └── Release Engineer  /ship → /land-and-deploy
                               └── Content Writer  /document-release

Bug Fix

QA  /qa-only (report)
 └── Frontend or Backend  /investigate + fix
      └── QA  /qa (verify)
           └── Release  /ship

Research → Decision

Research Analyst  /x-research or /last30days-nux
 └── /synthesize-research
      └── Chief of Staff  /write-brief
           └── Product Manager  (update spec)

Security Incident

Any agent flags issue
 └── Security Officer  /cso or /investigate
      └── Report with severity + remediation plan
           └── Chief of Staff  approves priority
                └── Backend/Frontend  implements fix
                     └── Security Officer  verifies

Memory Layer

Three tiers. Each builds on the last.

Tier 1 — MEMORY.md (Session Context)

Auto-loaded into every Claude Code session via system prompt injection.

~/.claude/projects/[project-slug]/memory/MEMORY.md

Max 200 lines (lines beyond are truncated)
Store: stable patterns, user preferences, key file paths, architectural decisions
Updated by agents during sessions when stable patterns are confirmed

Tier 2 — learnings.jsonl (Operational Learning)

Per-project log of what gstack agents have discovered.

~/.gstack/projects/[project-slug]/learnings.jsonl

Written by agents via gstack-learnings-log
Searched at session start via gstack-learnings-search --limit 3
Types: pattern, pitfall, preference, architecture, tool, operational

Tier 3 — Checkpoints (Session State)

Mid-session state saves for long-running work.

/checkpoint  # save current state
/checkpoint resume  # pick up where you left off

Captures: git state, decisions made, remaining tasks, context summary.

Tier 4 — claude-mem (Compressed Tool Output History)

Captures tool outputs from every session, compresses them to ~50-token semantic observations, and injects the most relevant ~500 tokens into future sessions via FTS5 semantic search.

# Start the worker once (add to ~/.zshrc for auto-start)
node ~/.claude/plugins/cache/thedotmack/claude-mem/10.0.6/scripts/worker-cli.js start

# Wire into agent heartbeat (Step 2 of each agent's checklist)
node ~/.claude/plugins/cache/thedotmack/claude-mem/10.0.6/scripts/context-generator.cjs \
  --project-path "$(pwd)" --query "[agent-name] [current-task]"

~10x token efficiency vs repeating raw tool output in every heartbeat. See memory/claude-mem-setup.md.

Tier 5 — Auto Dream (Nightly MEMORY.md Consolidation)

Nightly script that promotes high-confidence learnings (≥8/10) from learnings.jsonl into MEMORY.md, keeping the file under the 200-line limit.

./memory/auto-dream.sh --dry-run   # preview
./memory/auto-dream.sh             # run

# Cron: nightly at 2am
(crontab -l 2>/dev/null; echo "0 2 * * * $(pwd)/memory/auto-dream.sh") | crontab -

Token Budget Summary

Layer	Cost	Frequency
MEMORY.md	~1,500 tokens	Every session
learnings.jsonl (top 3)	~300 tokens	Every gstack skill
claude-mem context	~500 tokens	Every heartbeat
Checkpoints	2K-5K tokens	Explicit `/checkpoint`

Total heartbeat overhead: ~2,300 tokens — ~77% less than unoptimized Paperclip.

See memory/README.md for full documentation.

Human-in-the-Loop Gates

Agents pause and wait for human approval before:

Action	Agent	Gate
Create + merge PR	release-engineer	Approval in Paperclip inbox
Deploy to production	release-engineer	Explicit confirm
DB migration on prod	backend-engineer	`/careful` prompt
Security remediation	security-officer	Severity ≥ High
Major scope change	chief-of-staff	User confirmation
Publish marketing copy	content-writer	Content review

Configure additional gates in .paperclip.yaml under agents.[name].approval_required.

Getting Started

Prerequisites

Claude Code installed
gstack installed (npm install -g gstack or see gstack docs)
Paperclip installed (npm install -g paperclipai)

Quickstart

# 1. Clone this repo
gh repo clone Ash-code183/company-os my-company
cd my-company

# 2. Set your mission
# Edit .paperclip.yaml — change company.name and company.mission
# Edit COMPANY.md — describe what you're building

# 3. Customize agents for your domain (optional but recommended)
# Edit agents/*.md — add domain context, key files, project conventions

# 4. Create your design system (if building a UI)
# Run /design-consultation in Claude Code — generates DESIGN.md

# 5. Set your P0 test flow (optional)
# Edit agents/qa-engineer.md — add the critical user flows to test

# 6. Configure deployment (optional)
# Run /setup-deploy in Claude Code

# 7. Start your first agent session
npx paperclipai run --agent chief-of-staff --task "Review our product strategy"

Repository Structure

company-os/
├── .paperclip.yaml          ← company config: agents, models, skills, memory, teams
├── CLAUDE.md                ← gstack skill routing (40+ routes for Claude Code)
├── COMPANY.md               ← product overview, stack explanation, getting started
├── AGENTS.md                ← full agent roster, skill matrix, coordination protocols
├── SKILLS.md                ← complete 73-skill catalog with descriptions
├── CHANGELOG.md
│
├── agents/
│   ├── chief-of-staff.md    ← strategy, 10-star thinking, mission context
│   ├── product-manager.md   ← discovery, PRDs, jobs-to-be-done
│   ├── research-analyst.md  ← research protocols, synthesis patterns
│   ├── cto.md               ← architecture standards, quality gates
│   ├── frontend-engineer.md ← design system enforcement, DESIGN.md tokens
│   ├── backend-engineer.md  ← data model invariants, safety gates
│   ├── release-engineer.md  ← branch strategy, ship checklist, CHANGELOG voice
│   ├── qa-engineer.md       ← P0/P1/P2 test flows, visual QA checklist
│   ├── design-consultant.md ← brand principles, design system ownership
│   ├── security-officer.md  ← threat model, guardrail triggers
│   └── content-writer.md    ← brand voice rules, tone by context
│
├── teams/
│   ├── product/README.md    ← product team workflow + weekly cadence
│   ├── engineering/README.md← engineering workflow, standards, branch conventions
│   └── creative/README.md   ← design + content workflow, DESIGN.md as source of truth
│
├── skills/
│   ├── user-interview/      ← JTBD customer discovery workflow
│   ├── pmf-pulse/           ← PMF measurement (Sean Ellis + retention)
│   ├── pitch-deck/          ← 10-slide narrative pitch deck builder
│   ├── investor-update/     ← 250-word monthly update template
│   ├── runway/              ← burn rate + 3 scenarios + fundraising trigger
│   ├── launch-strategy/     ← channel playbooks (PH, HN, X, email)
│   ├── sales-playbook/      ← ICP → outreach → demo → close
│   ├── pricing-strategy/    ← value-based pricing + 3-tier page copy
│   ├── growth-experiment/   ← ICE scoring + AARRR experiment log
│   ├── legal-basics/        ← incorporation checklist + SAFE guide
│   ├── cap-table/           ← dilution modeling + SAFE conversion
│   ├── first-hire/          ← role rubric + equity offer + onboarding
│   ├── content-machine/     ← pillar → social → SEO + build-in-public
│   ├── demo-script/         ← hook → aha moment → proof → CTA
│   ├── decision-journal/    ← decision log + monthly review + bias audit
│   └── founder-rhythm/      ← weekly/monthly/quarterly operating cadence
│
└── memory/
    ├── README.md            ← 5-layer memory system documentation
    ├── heartbeat.md         ← heartbeat checklist template for all 12 agents
    ├── claude-mem-setup.md  ← claude-mem Paperclip integration guide
    └── auto-dream.sh        ← nightly MEMORY.md consolidation script

Forking This Template

Option 1 — Fork on GitHub

gh repo fork Ash-code183/company-os my-company
cd my-company

Option 2 — Clone Directly

gh repo clone Ash-code183/company-os my-company
cd my-company
git remote set-url origin https://github.com/YOUR-ORG/my-company.git

Customization Checklist

Replace mission — company.name and company.mission in .paperclip.yaml, and COMPANY.md
Add domain context — each agents/*.md has a section for project-specific context. Fill it in.
Create DESIGN.md — run /design-consultation in Claude Code. Required before any UI work.
Set P0 test flows — edit agents/qa-engineer.md with the critical user paths for your product
Configure deploy — run /setup-deploy to configure /land-and-deploy for your platform
Initialize memory — create ~/.claude/projects/[your-slug]/memory/MEMORY.md with key project facts

Why Paperclip + gstack Together

They solve different problems at different layers.

gstack makes a single Claude Code session operate like a senior professional. /investigate doesn't just look at the error — it runs four phases: locate, analyze, hypothesize, fix with root cause verified. /ship doesn't just git push — it checks tests, reviews the diff, bumps version, writes CHANGELOG, creates PR.

Paperclip coordinates multiple Claude Code sessions in parallel. It's why you can run Frontend Engineer and Backend Engineer simultaneously on the same feature, with a shared inbox for approvals, a dashboard showing every agent's state, and budget controls so nothing goes off the rails.

The evidence they're complementary: the official paperclipai/companies registry has a gstack/ directory. They're designed to work together.

Credits

Paperclip — multi-agent orchestration for zero-human companies
gstack — Garry Tan's Claude Code skill framework
last30days — multi-platform social research
ui-ux-pro-max — design intelligence plugin by NextLevelBuilder

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
agents		agents
credentials		credentials
memory		memory
skills		skills
teams		teams
.env.example		.env.example
.gitignore		.gitignore
.paperclip.yaml		.paperclip.yaml
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
COMPANY.md		COMPANY.md
README.md		README.md
SKILLS.md		SKILLS.md

Folders and files

Latest commit

History

Repository files navigation

company-os

What This Is

The Stack

The Team

Skill Coverage — 73 Skills Across 14 Categories

Strategy & Product (8 skills)

Research & Intelligence (5 skills)

Planning & Documentation (3 skills)

Engineering & Architecture (9 skills)

Design & Frontend (7 skills)

Quality & Testing (7 skills)

Shipping & Deployment (6 skills)

Safety & Guardrails (5 skills)

State & Memory (3 skills)

Third-Party Plugins (2 skills)

Meta (2 skills)

Solo Founder — Tier 1: Critical Month 1-3 (5 skills)

Solo Founder — Tier 2: Growth Month 3-9 (5 skills)

Solo Founder — Tier 3: Scale Month 6-18 (6 skills)

Skill-to-Agent Matrix

Coordination Protocols

Feature Request → Ship

Bug Fix

Research → Decision

Security Incident

Memory Layer

Tier 1 — MEMORY.md (Session Context)

Tier 2 — learnings.jsonl (Operational Learning)

Tier 3 — Checkpoints (Session State)

Tier 4 — claude-mem (Compressed Tool Output History)

Tier 5 — Auto Dream (Nightly MEMORY.md Consolidation)

Token Budget Summary

Human-in-the-Loop Gates

Getting Started

Prerequisites

Quickstart

Repository Structure

Forking This Template

Option 1 — Fork on GitHub

Option 2 — Clone Directly

Customization Checklist

Why Paperclip + gstack Together

Credits

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages