diff --git a/.gitignore b/.gitignore index 0f4588eeb..90dbc3316 100644 --- a/.gitignore +++ b/.gitignore @@ -36,7 +36,6 @@ Thumbs.db *.car # AI agents -AGENTS.md MEMORY.md .claude-config/ .agent-trigger diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 000000000..1c76bb5fb --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,908 @@ +# PostHog Code Development Guide + +This is the single source of truth for how PostHog Code is built. Architecture rules, conventions, recipes and patterns all live here. If something contradicts this file, this file wins. + +## Architecture rules (read this first) + +Read this section before writing or modifying code. These rules are load-bearing. They are what keeps business logic out of the renderer and lets the app stay portable to other clients. + +**The principle: three layers, each with one job.** + +| Layer | One job | +| ------------------------------ | ------------------------------------------------------------------------------------------------------------ | +| **Main process services** | All business logic and I/O. Orchestration, fetching, polling, parsing, auth, side effects, system telemetry. | +| **Renderer Zustand stores** | Pure UI state. Subscription-fed caches. Thin action wrappers over tRPC. Nothing else. | +| **React components and hooks** | Render the store. Wire user input to store actions or tRPC mutations. Local component state only. | + +**Renderer services are a narrow escape hatch.** Only for renderer-only UI mechanics shared across components (visual queues, drag-and-drop, focus rings). Never for data fetching, never for cross-store coordination on system events, never for multi-step async orchestration. + +### Rules in one screen + +- **R1** Main services own business logic. `@injectable()`, singleton, exposed via a tRPC router with Zod schemas in the service's `schemas.ts`. No imports from `apps/code/src/renderer/*`. +- **R2** Zustand stores are thin: UI state, subscription caches or queues. 
Actions do at most one `trpcClient` call plus one state update. No module-level `let` promises, no cross-store reach-ins, no business clients, no query-cache surgery, no system-event analytics. +- **R3** Renderer services are a narrow escape hatch. They live in `apps/code/src/renderer/services/`, are `@injectable()`, and never fetch data or coordinate cross-store reactions to system events. +- **R4** Components use `useQuery` and `useMutation`, not imperative `trpcClient` calls. Custom hooks wrap a single query or a store selector. Hooks that orchestrate multiple queries to derive a result become one tRPC procedure. +- **R5** Cross-feature coordination happens in main. Main emits an event; each affected store reacts via its feature's subscription registrar. Stores never reach into other stores. +- **R6** Every tRPC procedure has Zod `input` and (where it returns data) Zod `output`. Types are inferred from schemas, never declared separately. +- **R7** Persistence and platform APIs are main. The renderer persists pure UI prefs via `electronStorage`. Domain data persists in the SQLite DB via a `Repository`. +- **R8** No `container.get(...)` inside service methods. Constructor injection only. A circular dep means the boundary is wrong; split or invert via events. +- **R9** Subscriptions are wired once per feature in `apps/code/src/renderer/features//subscriptions.ts`, started at app boot. Components do not start subscriptions ad hoc. +- **R10** tRPC routers are one-liners. No inline business logic. No reaching past the service to a repository. No router without a backing service. + +### Decision tree + +Apply on every new file or meaningful change. + +1. Network call, file system, git, shell, multi-step async? Main service. +2. Reusable across hosts (Electron, mobile, web, CI)? Domain package (`packages/*`). +3. Wraps a host capability (clipboard, dialog, secure storage)? Platform adapter behind a `@posthog/platform` interface. +4. 
Purely about how the UI looks right now? Store if shared, `useState` if local to one subtree. +5. Single user event triggers a single mutation? Component with `useMutation`. +6. Non-trivial renderer-only UI mechanic shared across features? Renderer service. +7. None of the above? Probably a main service. + +### Forbidden patterns + +These shapes exist in the codebase today. Do not copy them. Do not extend them. + +- **Module-level dedup state in stores.** `let inFlightAuthSync: Promise | null` and friends. Dedup belongs in the service. +- **Cross-store reach-ins in actions.** `useOtherStore.getState().something()` inside a store action. Main emits an event; each store reacts in its registrar. +- **Business clients held in stores.** `client: createClient(region, projectId)` in a store. Construct in main, store holds a serializable id. +- **Stores owning subscriptions.** `let globalSubscription = trpcClient.X.subscribe(...)` at store module scope. Use a feature subscription registrar. +- **Stores owning timers for domain cleanup.** `window.setTimeout(() => removeClone(id), 3000)`. The host owns the lifecycle and emits a `Removed` event. +- **Custom hooks that orchestrate multiple queries.** Two `useQuery` calls plus a `useMemo` merge. Expose one tRPC procedure that returns the merged shape. +- **Imperative `trpcClient` from components for routine reads.** `useEffect(() => trpcClient.X.query().then(setState))`. Use `useQuery`. +- **tRPC routers bypassing their service to call a repository.** `workspace.ts` does this today; do not extend the pattern. +- **tRPC routers with inline business logic.** Math, time arithmetic, conditional branching inside `.mutation`. Move to a service method. +- **tRPC routers with no backing service.** `os.ts` is 396 lines today with no `OsService`. New routers always have a service. +- **`container.get(X)` inside a service method to dodge a circular dep.** `WorkspaceService` does this with `FileWatcherService`. Split or event-ize instead. 
+- **Renderer services that fetch domain data or coordinate tRPC.** The 3,796-line `sessions/service/service.ts` is the canonical example. Move it to main. +- **Platform adapters with business logic.** Adapters wrap and translate. Decisions live in services that depend on the adapter via an interface. + +When in doubt, push logic toward main. The renderer is being thinned out, not thickened. + +--- + +## Project structure + +- Monorepo with pnpm workspaces and turbo +- `apps/code` PostHog Code Electron desktop app (React + Vite) +- `apps/cli` CLI app, thin shell over the external `@posthog/cli` npm package +- `apps/mobile` React Native mobile app (Expo) +- `packages/agent` TypeScript agent framework wrapping the Claude Agent SDK +- `packages/git` Git saga operations, gh CLI client, read-write locks +- `packages/enricher` AST-level PostHog flag detection across multiple languages +- `packages/platform` Interface-only declarations for host capabilities (fulfilled by per-target adapters in `apps/code/src/main/platform-adapters/`) +- `packages/electron-trpc` tRPC-over-Electron-IPC bridge +- `packages/shared` Zero-dependency shared utilities (Saga pattern, cloud-prompt encoding) + +## Commands + +- `pnpm install` Install all dependencies +- `pnpm dev` Run both agent (watch) and code app via phrocs +- `pnpm dev:mprocs` Run both agent (watch) and code app via mprocs +- `pnpm dev:agent` Run agent package in watch mode only +- `pnpm dev:code` Run code desktop app only +- `pnpm build` Build all packages (turbo) +- `pnpm typecheck` Type check all packages +- `pnpm lint` Lint and auto-fix with biome +- `pnpm format` Format with biome +- `pnpm test` Run tests across all packages + +### Code app + +- `pnpm --filter code test` Run vitest tests +- `pnpm --filter code typecheck` Type check code app +- `pnpm --filter code package` Package electron app +- `pnpm --filter code make` Make distributable + +### Agent package + +- `pnpm --filter agent build` Build agent with tsup +- `pnpm 
--filter agent dev` Watch mode build +- `pnpm --filter agent typecheck` Type check agent + +### Shared package + +- `pnpm --filter @posthog/shared build` Build shared with tsup +- `pnpm --filter @posthog/shared dev` Watch mode build +- `pnpm --filter @posthog/shared typecheck` Type check shared + +--- + +## Code style + +- Prefer writing our own solution over adding external packages when the fix is simple +- Keep functions focused with single responsibility +- Biome for linting and formatting (not ESLint or Prettier) +- 2-space indentation, double quotes +- No `console.*` in source. Use the logger instead (logger files exempt) +- Path aliases required in renderer code, no relative imports: `@features/*`, `@components/*`, `@stores/*`, `@hooks/*`, `@utils/*`, `@renderer/*`, `@shared/*`, `@api/*` +- Main process path aliases: `@main/*`, `@api/*`, `@shared/*` +- TypeScript strict mode enabled +- Tailwind CSS classes should be sorted (biome `useSortedClasses` rule) + +### Services over hooks for business logic + +Put data-fetching logic and derivation in main process services, not renderer hooks. Hooks should be thin wrappers around a single tRPC query. If a hook orchestrates multiple queries and derives a result, that logic belongs in a service exposed via tRPC so it can be reused from both the main process and the renderer. + +### Small focused components + +Extract distinct UI concerns into their own components instead of building long inline ternary chains or conditional blocks. If a section of JSX handles its own logic (e.g. icon selection based on state), pull it into a named component next to where it's used. Keep render functions short and scannable. + +### Async cleanup ordering + +When tearing down async operations that use an AbortController, always abort the controller **before** awaiting any cleanup that depends on it. Otherwise you get a deadlock: the cleanup waits for the operation to stop, but the operation won't stop until the abort signal fires. 
+ +```typescript +// WRONG - deadlocks if interrupt() waits for the operation to finish +await this.interrupt(); // hangs: waits for query to stop +this.abortController.abort(); // never reached + +// RIGHT - abort first so the operation can actually stop +this.abortController.abort(); // cancels in-flight HTTP requests +await this.interrupt(); // resolves because the query was aborted +``` + +### Avoid barrel files + +Do not make use of `index.ts`. Barrel files: + +- Break tree-shaking +- Create circular dependency risks +- Hide the true source of imports +- Make refactoring harder + +Import directly from source files instead. + +--- + +## Architecture + +### Electron app (apps/code) + +The desktop app has two processes. Main is the system of record for business logic and host state. Renderer owns UI state via Zustand and renders the world the main process describes. + +``` +Main Process (Node.js) Renderer Process (React) +┌───────────────────────┐ ┌───────────────────────────┐ +│ DI Container │ │ DI Container │ +│ ├── GitService │ │ ├── TRPCClient │ +│ └── ... │ │ └── narrow renderer svcs │ +├───────────────────────┤ ├───────────────────────────┤ +│ tRPC Routers │ ◄─tRPC(ipcLink)─► │ tRPC Clients │ +│ (resolve services) │ │ ├── useTRPC() (hooks) │ +├───────────────────────┤ │ └── trpcClient (vanilla) │ +│ Services + I/O │ ├───────────────────────────┤ +│ (fs, git, shell, │ │ Zustand Stores │ +│ business logic) │ │ ├── pure UI state │ +└───────────────────────┘ │ └── subscription caches │ + ├───────────────────────────┤ + │ React UI │ + └───────────────────────────┘ +``` + +- Both processes use InversifyJS for DI with singleton scope +- Main holds all services. Renderer DI holds the tRPC client and narrow renderer services +- Zustand stores own all UI state (not in DI) +- Main services emit typed events. 
Renderer reacts via tRPC subscriptions wired once at boot + +### Dependency injection + +Both processes use [InversifyJS](https://inversify.io/) with singleton scope. Services declare dependencies via constructor injection. No `container.get(...)` inside service methods. + +**Define a service:** + +```typescript +// src/main/services/my-service/service.ts +import { injectable } from "inversify" + +@injectable() +export class MyService { + doSomething() { + // ... + } +} +``` + +**Register the token and binding:** + +```typescript +// src/main/di/tokens.ts +export const MAIN_TOKENS = Object.freeze({ + MyService: Symbol.for("Main.MyService"), +}) + +// src/main/di/container.ts +container.bind(MAIN_TOKENS.MyService).to(MyService) +``` + +**Inject dependencies via constructor:** + +```typescript +import { inject, injectable } from "inversify" +import { MAIN_TOKENS } from "../di/tokens" + +@injectable() +export class MyService { + constructor( + @inject(MAIN_TOKENS.OtherService) + private readonly otherService: OtherService, + ) {} +} +``` + +**Test with mocks via constructor injection or container rebind:** + +```typescript +// Direct instantiation +const mockOther = { getData: vi.fn().mockReturnValue("test") } +const service = new MyService(mockOther as OtherService) + +// Or rebind in container for integration tests +container.snapshot() +container.rebind(MAIN_TOKENS.OtherService).toConstantValue(mockOther) +// ... run tests +container.restore() +``` + +### IPC via tRPC + +We use [tRPC](https://trpc.io/) over Electron IPC via the workspace `@posthog/electron-trpc` package. All inputs and outputs are Zod schemas. Types are inferred from schemas, never declared separately. 
+ +**Three tRPC exports, each for a different context:** + +| Export | Where to use | Purpose | +| ------------ | --------------------------------------------- | ------------------------------------------------------------------------ | +| `useTRPC()` | React components and hooks | Options proxy via React context | +| `trpc` | Outside React (module scope, services, stores) | Options proxy bound to the singleton `queryClient` | +| `trpcClient` | Anywhere (imperative calls) | Vanilla tRPC client for direct `.query()` / `.mutate()` / `.subscribe()` | + +**Create a router (main process). Routers are one-liners that delegate to a backing service:** + +```typescript +// src/main/trpc/routers/my-router.ts +import { container } from "../../di/container" +import { MAIN_TOKENS } from "../../di/tokens" +import { + getDataInput, + getDataOutput, + updateDataInput, +} from "../../services/my-service/schemas" +import { router, publicProcedure } from "../trpc" + +const getService = () => container.get(MAIN_TOKENS.MyService) + +export const myRouter = router({ + getData: publicProcedure + .input(getDataInput) + .output(getDataOutput) + .query(({ input }) => getService().getData(input.id)), + + updateData: publicProcedure + .input(updateDataInput) + .mutation(({ input }) => getService().updateData(input.id, input.value)), +}) +``` + +**Register the router on the root:** + +```typescript +// src/main/trpc/router.ts +import { myRouter } from "./routers/my-router" + +export const trpcRouter = router({ + my: myRouter, + // ... +}) +``` + +**Use in React with TanStack Query:** + +```typescript +import { useTRPC } from "@renderer/trpc/client" +import { useMutation, useQuery } from "@tanstack/react-query" + +function MyComponent() { + const trpc = useTRPC() + + const { data } = useQuery(trpc.my.getData.queryOptions({ id: "123" })) + + const mutation = useMutation( + trpc.my.updateData.mutationOptions({ + onSuccess: () => { /* ... 
*/ }, + }), + ) + const handleUpdate = () => mutation.mutate({ id: "123", value: "new" }) +} +``` + +**Cache invalidation uses `pathFilter()` or `queryFilter()`:** + +```typescript +const queryClient = useQueryClient() + +// Invalidate all queries under a router path +queryClient.invalidateQueries(trpc.workspace.getAll.pathFilter()) + +// Invalidate a specific query by input +queryClient.invalidateQueries( + trpc.git.getCurrentBranch.queryFilter({ directoryPath: repoPath }), +) + +// Set cache data directly +queryClient.setQueryData( + trpc.git.getLatestCommit.queryKey({ directoryPath: repoPath }), + commitData, +) +``` + +**Outside React (stores, sagas, module-scope utilities):** + +```typescript +// Imperative calls use trpcClient +import { trpcClient } from "@renderer/trpc/client" + +const data = await trpcClient.my.getData.query({ id: "123" }) +await trpcClient.my.updateData.mutate({ id: "123", value: "new" }) + +// Cache operations outside React use trpc (the module-level options proxy) +import { trpc } from "@renderer/trpc" +import { queryClient } from "@utils/queryClient" + +queryClient.invalidateQueries(trpc.workspace.getAll.pathFilter()) +``` + +### State management + +All UI state lives in the renderer. Domain state and host state live in main and are exposed via tRPC. Anything that survives a renderer reload, or that another client (mobile, web, CLI) would also need, lives in main. + +```typescript +// ❌ Bad - main service hoarding renderer-shaped state +@injectable() +class TaskService { + private currentTask: Task | null = null // belongs in renderer +} + +// ✅ Good - main service is the system of record for task data +@injectable() +class TaskService { + async readTask(id: string): Promise { /* ... */ } + async writeTask(task: Task): Promise { /* ... 
*/ } +} + +// ✅ Good - renderer state is pure UI selection +const useTaskUiStore = create((set) => ({ + currentTaskId: null, + setCurrentTaskId: (id) => set({ currentTaskId: id }), +})) +``` + +This keeps state predictable and easy to debug, and it naturally supports patterns like undo and rollback. + +### Services + +Main services live in `src/main/services//`: + +``` +src/main/services/ +└── my-service/ + ├── service.ts # The @injectable() service class + ├── schemas.ts # Zod schemas + event constants for tRPC + └── types.ts # Internal types (not exposed via tRPC) +``` + +**Zod schemas are the source of truth.** Types are inferred from schemas, never declared separately. + +```typescript +// src/main/services/my-service/schemas.ts +import { z } from "zod" + +export const getDataInput = z.object({ id: z.string() }) + +export const getDataOutput = z.object({ + id: z.string(), + name: z.string(), + createdAt: z.string(), +}) + +// Inferred types stay in sync with the schemas automatically +export type GetDataInput = z.infer<typeof getDataInput> +export type GetDataOutput = z.infer<typeof getDataOutput> +``` + +Services and routers import the schemas and inferred types from the same `schemas.ts`. The router validates at the boundary; the service consumes the inferred types. + +### Events (tRPC subscriptions) + +For pushing real-time updates from main to renderer, services extend `TypedEventEmitter` and routers expose them as subscriptions. 
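
This guide uses `TypedEventEmitter` and `toIterable(event, { signal })` without showing the base class. The following is a minimal, dependency-free sketch of how such a class might work, assuming a listener map plus a small in-memory queue. It is not the implementation in `lib/typed-event-emitter`; `on`, `listenerCount` and the queueing strategy are hypothetical, and `AbortSignal` is assumed to be available as a global (as in Node and browsers).

```typescript
// Hypothetical sketch, not the real lib/typed-event-emitter implementation.
type Listener<T> = (payload: T) => void;

class TypedEventEmitter<Events> {
  private readonly listeners = new Map<keyof Events, Set<Listener<any>>>();

  // Register a listener; the returned function removes it again.
  on<K extends keyof Events>(event: K, listener: Listener<Events[K]>): () => void {
    const set = this.listeners.get(event) ?? new Set<Listener<any>>();
    set.add(listener);
    this.listeners.set(event, set);
    return () => {
      set.delete(listener);
    };
  }

  // Payload type is checked against the event name at compile time.
  emit<K extends keyof Events>(event: K, payload: Events[K]): void {
    this.listeners.get(event)?.forEach((listener) => listener(payload));
  }

  listenerCount<K extends keyof Events>(event: K): number {
    return this.listeners.get(event)?.size ?? 0;
  }

  // Yield payloads as they arrive until the AbortSignal fires.
  async *toIterable<K extends keyof Events>(
    event: K,
    opts: { signal?: AbortSignal } = {},
  ): AsyncGenerator<Events[K]> {
    const queue: Events[K][] = [];
    let wake: (() => void) | null = null;
    const notify = () => {
      wake?.();
      wake = null;
    };
    const off = this.on(event, (payload) => {
      queue.push(payload);
      notify();
    });
    opts.signal?.addEventListener("abort", notify, { once: true });
    try {
      while (!opts.signal?.aborted) {
        if (queue.length > 0) {
          yield queue.shift() as Events[K];
        } else {
          // Sleep until the next emit or until the signal aborts.
          await new Promise<void>((resolve) => {
            wake = resolve;
          });
        }
      }
    } finally {
      off(); // always detach, including when the consumer stops early
    }
  }
}
```

In this sketch, payloads emitted while the consumer is busy are buffered rather than dropped, and aborting the signal ends the iteration and removes the listener, which is the behavior a subscription procedure needs when the renderer unsubscribes.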
+ +**Define event names and payload types in `schemas.ts`:** + +```typescript +// src/main/services/my-service/schemas.ts +export const MyServiceEvent = { + ItemCreated: "item-created", + ItemDeleted: "item-deleted", +} as const + +export interface MyServiceEvents { + [MyServiceEvent.ItemCreated]: { id: string; name: string } + [MyServiceEvent.ItemDeleted]: { id: string } +} +``` + +**Extend `TypedEventEmitter` in the service:** + +```typescript +// src/main/services/my-service/service.ts +import { injectable } from "inversify" +import { TypedEventEmitter } from "../../lib/typed-event-emitter" +import { MyServiceEvent, type MyServiceEvents } from "./schemas" + +@injectable() +export class MyService extends TypedEventEmitter<MyServiceEvents> { + async createItem(name: string) { + const item = { id: "123", name } + this.emit(MyServiceEvent.ItemCreated, item) // typed + return item + } +} +``` + +**Expose as subscriptions via `toIterable()`. Global events broadcast to all subscribers:** + +```typescript +// K ties the event name to its payload type in MyServiceEvents +function subscribe<K extends keyof MyServiceEvents>(event: K) { + return publicProcedure.subscription(async function* (opts) { + const service = getService() + for await (const data of service.toIterable(event, { signal: opts.signal })) { + yield data + } + }) +} + +export const myRouter = router({ + // ... 
queries and mutations + onItemCreated: subscribe(MyServiceEvent.ItemCreated), + onItemDeleted: subscribe(MyServiceEvent.ItemDeleted), +}) +``` + +**For per-instance events (shell sessions, workspaces, etc.), filter server-side rather than broadcasting:** + +```typescript +export interface ShellEvents { + [ShellEvent.Data]: { sessionId: string; data: string } + [ShellEvent.Exit]: { sessionId: string; exitCode: number } +} + +// Every ShellEvents payload carries a sessionId, so filtering is type-safe +function subscribeFiltered<K extends keyof ShellEvents>(event: K) { + return publicProcedure + .input(sessionIdInput) + .subscription(async function* (opts) { + const service = getService() + const targetSessionId = opts.input.sessionId + for await (const data of service.toIterable(event, { signal: opts.signal })) { + if (data.sessionId === targetSessionId) yield data + } + }) +} +``` + +**Subscribe in the renderer via the feature's subscription registrar, not in components:** + +```typescript +// src/renderer/features/my-feature/subscriptions.ts +import { trpcClient } from "@renderer/trpc/client" + +export function registerMyFeatureSubscriptions() { + trpcClient.my.onItemCreated.subscribe(undefined, { + onData: (item) => useMyStore.getState().handleItemCreated(item), + }) +} +``` + +Subscriptions are started once at app boot. Components do not start subscriptions ad hoc. + +### Adding a new feature + +1. Create the service in `src/main/services//`. Add `schemas.ts` for Zod inputs, outputs and event types. +2. Add a DI token in `src/main/di/tokens.ts`. +3. Register the service in `src/main/di/container.ts`. +4. Create a tRPC router in `src/main/trpc/routers/.ts`. Routers are one-liners that delegate to the service. +5. Mount the router on the root in `src/main/trpc/router.ts`. +6. In the renderer, consume the procedures via `useQuery` and `useMutation`. If the feature pushes events, add a subscription registrar in `src/renderer/features//subscriptions.ts` and register it at boot. + +### MCP apps + +MCP Apps let MCP servers ship interactive HTML UIs alongside their tools. 
When a tool has an associated `ui://` resource, we render the app's HTML inside a sandboxed iframe instead of the raw tool input and output. + +- Schemas live in `src/shared/types/mcp-apps.ts` because both processes need them. +- `McpAppsService` (`src/main/services/mcp-apps/service.ts`) manages MCP server connections, caches resources (capped at 5MB per resource) and proxies calls between the renderer and remote servers. +- `AgentService` intercepts ACP `sessionUpdate` callbacks for `mcp__` tools and forwards inputs and results to `McpAppsService`. +- The renderer feature is `src/renderer/features/mcp-apps/`. `McpToolBlock` always renders `McpToolView` and additionally renders `McpAppHost` when the tool has a UI resource and the server isn't disabled. +- Apps run in a double-iframe sandbox. The outer iframe loads a generated proxy with `sandbox="allow-scripts allow-same-origin ..."` and the inner iframe enforces a server-declared CSP meta tag. +- `useAppBridge` manages the host side of `@modelcontextprotocol/ext-apps`. App requests route to tRPC mutations. Host context (theme, display mode, dimensions) flows back via the bridge. +- Users can disable MCP Apps per server via `settingsStore.mcpAppsDisabledServers`. + +### Other packages + +- **`packages/agent`** TypeScript agent framework wrapping `@anthropic-ai/claude-agent-sdk`. Owns the ACP connection, worktree management, PostHog API integration, task execution and session management. The cloud agent server is exported via `@posthog/agent/server`. +- **`packages/git`** Platform-agnostic git saga operations (clone, branch, commit, push, stash, worktree, patch, publish), a read-write lock and a gh CLI client. Depends only on `@posthog/shared` and `@posthog/platform`. +- **`packages/enricher`** AST-based PostHog flag call detection and source enrichment across languages. No workspace dependencies. Reusable from any host (Electron, mobile, CI, server). +- **`packages/platform`** Interface-only. 
Declares the host capabilities a service can depend on (`ISecureStorage`, `IClipboard`, `IDialog`, `INotifier`, `IUpdater`, `IShell`, `IFileSystem`, etc.). No implementations. Per-target adapters fulfill the interfaces. Electron adapters live in `apps/code/src/main/platform-adapters/`. Future React Native and web adapters will live in their respective apps. Domain packages and main services depend on these interfaces, never on Electron APIs directly. +- **`packages/electron-trpc`** tRPC-over-Electron-IPC bridge. +- **`packages/shared`** Zero-dependency shared utilities (Saga pattern for atomic multi-step operations with automatic rollback, cloud-prompt encoding). Built with tsup, outputs ESM. +- **`apps/cli`** Thin shell over the external `@posthog/cli` npm package. Command files handle argument parsing and output formatting only. No business logic. No data transformation. No tree building. + +--- + +## Agent integration guidelines + +- **No rawInput**: Don't use Claude Code SDK's `rawInput`. Only use Zod-validated meta fields. This keeps us agent-agnostic and gives us a maintainable, extensible format for logs. +- **Use ACP SDK types**: Don't roll your own types for things available in the ACP SDK. Import types directly from `@agentclientprotocol/sdk`. +- **Permissions via tool calls**: If something requires user input or approval, implement it through a tool call with a permission instead of custom methods plus notifications. Avoid patterns like `_array/permission_request`. 
+ +## Key libraries + +- React 19, Radix UI Themes, Tailwind CSS +- TanStack Query for data fetching +- xterm.js for terminal emulation +- CodeMirror for code editing +- Tiptap for rich text +- Zod for schema validation +- InversifyJS for dependency injection +- Sonner for toast notifications + +--- + +## Patterns + +### Store / service boundary + +Stores and services have a strict separation of concerns: + +``` +Renderer Main Process ++------------------+ +------------------+ +| Zustand Store | -- tRPC --> | tRPC Router | +| | <-- subs -- +------------------+ +| - Pure state | | +| - Event cache | +------------------+ +| - UI concerns | | Service | +| - Thin actions | | | ++------------------+ | - Orchestration | + | | - Polling | ++------------------+ | - Data fetching | +| Renderer Svc | | - Business logic | +| (narrow only) | +------------------+ +| - UI mechanics | ++------------------+ +``` + +**Renderer stores own:** +- Pure UI state (open/closed, selected item, scroll position) +- Cached data from subscriptions +- Message queues and event buffers +- Permission display state +- Thin action wrappers that call tRPC mutations + +**Renderer services own (narrow escape hatch only):** +- Renderer-only UI mechanics shared across more than one component (visual action queues, global drag-and-drop coordinator, focus ring manager, debounced scroll broadcaster) +- Logic that is awkward to express in a component AND has no domain meaning + +**Renderer services DO NOT own:** +- Cross-store coordination on system events (that belongs in main, with each store reacting to an emitted event via a subscription registrar) +- Multi-step state machines that orchestrate tRPC calls (that is a main service exposed as a single procedure) +- Anything that fetches data, talks to PostHog or holds business state + +**Main process services own:** +- Business logic and orchestration +- Polling loops, retries, dedup, batching +- Data fetching, parsing, transformation +- Long-lived host 
state (registries, watchers, OAuth flow state) +- Cross-service coordination +- Emission of typed events for the renderer to react to + +Stores never contain business logic, orchestration or data fetching. If a store action does more than update local state or call a single tRPC method, that logic belongs in a main service. When multiple stores need to react to one event (logout clearing auth + seats + settings + navigation), main emits the event and each store reacts via its feature's subscription registrar in `apps/code/src/renderer/features//subscriptions.ts`. Stores never reach into other stores. + +### Zustand stores + +Stores hold pure state with thin actions. Separate state and action interfaces. Use persistence middleware where needed: + +```typescript +import { create } from "zustand"; +import { persist } from "zustand/middleware"; + +interface SidebarStoreState { + open: boolean; + width: number; +} + +interface SidebarStoreActions { + setOpen: (open: boolean) => void; + toggle: () => void; +} + +type SidebarStore = SidebarStoreState & SidebarStoreActions; + +// Curried create<T>()(...) lets TypeScript infer the middleware types +export const useSidebarStore = create<SidebarStore>()( + persist( + (set) => ({ + open: false, + width: 256, + setOpen: (open) => set({ open }), + toggle: () => set((state) => ({ open: !state.open })), + }), + { + name: "sidebar-storage", + partialize: (state) => ({ open: state.open, width: state.width }), + } + ) +); +``` + +### React components + +Components are functional with hooks. 
Props typed with interfaces: + +```typescript +interface AgentMessageProps { + content: string; +} + +export function AgentMessage({ content }: AgentMessageProps) { + return ( + + + + ); +} +``` + +Complex components organize hooks by concern (data, UI state, side effects): + +```typescript +export function TaskDetail({ task: initialTask }: TaskDetailProps) { + const taskId = initialTask.id; + useTaskData({ taskId, initialTask }); // Data fetching + + const workspace = useWorkspaceStore((state) => state.workspaces[taskId]); // Store + const [filePickerOpen, setFilePickerOpen] = useState(false); // Local state + + useHotkeys("mod+p", () => setFilePickerOpen(true), {...}); // Effects + useFileWatcher(effectiveRepoPath ?? null, taskId); + // ... +} +``` + +### Tailwind over inline styles + +Always reach for Tailwind utility classes first. The codebase uses Tailwind v4 with CSS variables from Radix Themes (e.g. `--gray-12`, `--space-3`, `--radius-2`). Use Tailwind v4's CSS-var shorthand to bridge them: `text-(--gray-12)`, `bg-(--gray-2)`, `rounded-(--radius-2)`, `border-(--gray-5)`. Use arbitrary values (`text-[13px]`, `pl-[18px]`) when the design token doesn't have a named match. + +Inline `style={{}}` is acceptable in three cases only: + +1. **Genuinely dynamic values** computed at runtime that can't be a class. E.g. `style={{ width: ${pxFromHook}px }}`, `style={{ transform: translateY(${y}px) }}`, pixel positions from measurement, data-driven colors that don't fit a fixed palette. +2. **Library configuration** passed to non-React libraries (CodeMirror's `EditorView.theme(...)`, xterm.js options, etc.). +3. **CSS variables set from JS** that downstream classes consume. `style={{ "--row-color": item.color }}` paired with `className="bg-(--row-color)"`. + +Do NOT use inline `style` for: + +- Color tokens (use `text-(--gray-12)`, `bg-(--gray-2)`, `border-(--gray-5)`) +- Spacing (use `p-3`, `mt-2`, `pl-4`, `gap-2`). 
Radix `--space-N` matches Tailwind's spacing scale 1:1 for `--space-1`..`--space-4`. `--space-5` = `6`, `--space-6` = `8`, etc. +- Layout primitives (`shrink-0`, `min-w-0`, `flex-1`, `overflow-y-auto`, `w-full`, `h-full`) +- Borders (`border border-(--gray-5)`), radii (`rounded-(--radius-2)` or `rounded-full`) +- Cursors (`cursor-pointer`, `cursor-col-resize`) +- Opacity (`opacity-50`), text-align, text-transform (`uppercase`), white-space, word-break +- Position (`absolute`, `relative`, `fixed`), z-index (`z-10`, `z-[201]`), inset (`inset-0`) +- Animations that map to a Tailwind utility (`animate-spin`) +- Conditional values that can be `className={cond ? "x" : "y"}` or ``className={`base-classes ${cond ? "active-classes" : "inactive-classes"}`}`` + +Default line-heights have been tightened in [apps/code/src/renderer/styles/globals.css](./apps/code/src/renderer/styles/globals.css). Don't add a `leading-*` class for body text unless you specifically want a non-default line-height. For arbitrary sizes (`text-[13px]`), pair with `leading-snug` for body text or `leading-tight` for titles. + +When writing a custom React component that wraps a styled element, accept BOTH `className?: string` and `style?: React.CSSProperties` props and merge the `className` into the inner element's classes (e.g. ``className={`base-classes ${className ?? ""}`}``). This lets call sites override styling via Tailwind without forcing inline `style`. + +### Custom hooks + +Hooks extract store subscriptions or single tRPC queries into cleaner interfaces. 
Hooks that orchestrate multiple queries belong in a service instead: + +```typescript +export function useConnectivity() { + const isOnline = useConnectivityStore((s) => s.isOnline); + const check = useConnectivityStore((s) => s.check); + return { isOnline, check }; +} +``` + +### Learned hints + +The settings store (`src/renderer/features/settings/stores/settingsStore.ts`) provides a reusable "learned hints" system for progressive feature discovery. Hints are shown a limited number of times until the user demonstrates they've learned the behavior. + +```typescript +const store = useFeatureSettingsStore.getState() + +// Check if a hint should still be shown (max N times, not yet learned) +if (store.shouldShowHint("my-hint-key", 3)) { + store.recordHintShown("my-hint-key") + toast.info("Did you know?", "You can do X with Y.") +} + +// When the user demonstrates the behavior, mark it learned (stops showing) +store.markHintLearned("my-hint-key") +``` + +Hint state is persisted via `electronStorage`. Use this pattern instead of ad-hoc boolean flags when introducing new discoverable features. + +### Logger usage + +Use the scoped logger instead of `console`: + +```typescript +const log = logger.scope("navigation-store"); + +export const useNavigationStore = create()( + persist((set, get) => { + log.info("Folder path is stale, redirecting...", { folderId: folder.id }); + // ... + }) +); +``` + +### Analytics events + +Two PostHog clients emit events: + +- **Renderer** (`posthog-js`) via `track(eventName, properties)` in `src/renderer/utils/analytics.ts` +- **Main** (`posthog-node`) via `trackAppEvent(eventName, properties)` in `src/main/services/posthog-analytics.ts` + +Both register a super-property `team: "posthog-code"`. All event names and property types are defined in `ANALYTICS_EVENTS` and `EventPropertyMap` in `src/shared/types/analytics.ts`. Adding a new event without entries there will fail typechecking. + +**Event names** + +- Format: `Object verbed`. 
Sentence case (not Title Case), spaces between words. +- The object comes first (`Task`, `Prompt`, `Branch`, `File`) and may span several words (`Pull request`, `Onboarding step`). +- A past-tense verb comes last (`created`, `viewed`, `sent`, `started`, `completed`, `failed`, `cancelled`). +- Only the first word is capitalized. Spell out abbreviations (`Pull request created`, not `PR created`). +- Group by object, not by feature. Prefer `Branch linked` over `Workspace branch linked`. +- Prefer a generic event with a discriminator property over many bespoke events. `Setting changed` with `setting_name`, not `Theme changed` plus `Font changed`. +- Do not prefix events with `First`. "First X" is always derivable in PostHog from the first occurrence of `X` per distinct ID. + +Good: `Task created`, `Prompt sent`, `Setup discovery completed`, `Onboarding step completed` +Bad: `task_created`, `TaskCreated`, `created_task`, `userClickedSendButton`, `PR created` + +**Property names** + +- snake_case, lowercase, no leading underscore. +- Booleans: prefix with `is_`, `has_` or `can_` (`is_initial`, `has_branch`, `has_uncommitted_changes`). +- Counts: suffix with `_count` (`event_count`, `staged_file_count`). +- Durations and sizes: suffix with the unit (`duration_seconds`, `prompt_length_chars`). +- IDs: suffix with `_id` (`task_id`, `discovery_task_run_id`). +- Enums: suffix with `_type`, `_mode`, `_source`, `_kind`, `_reason`, `_action`, or the bare noun if obvious (`category`, `region`). +- Pairs: when capturing a transition, use `from_*` / `to_*` (`from_mode`, `to_mode`). + +**Enum values** + +- snake_case strings, lowercase (`"user_cancelled"`, `"stale_feature_flag"`). +- Never `true`/`false` as a state value. Use a meaningful enum (`"completed"` / `"cancelled"` / `"failed"`, not `success: true/false` unless it really is just success). +- Closed enums get a TypeScript union in `analytics.ts`. Open-ended values are fine when the set evolves freely (e.g. `setting_name`).
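
The naming rules above can be sketched as a typed event map. This is a minimal illustration, not the contents of `src/shared/types/analytics.ts`: the specific events, the `CancelReason` union, the `captured` array and the in-memory `track` stand-in are all assumptions made for the example. Only the names `ANALYTICS_EVENTS`, `EventPropertyMap`, `track` and the `team` super-property come from the guide.

```typescript
// Hypothetical sketch: the real ANALYTICS_EVENTS and EventPropertyMap live in
// src/shared/types/analytics.ts and may be shaped differently.

// Closed enum: a snake_case string union instead of a boolean flag.
type CancelReason = "user_cancelled" | "stale_feature_flag";

// Event names follow "Object verbed" in sentence case.
const ANALYTICS_EVENTS = {
  TASK_CREATED: "Task created",
  TASK_CANCELLED: "Task cancelled",
  SETTING_CHANGED: "Setting changed",
} as const;

// Each event name maps to its allowed properties (snake_case, typed suffixes).
type EventPropertyMap = {
  "Task created": { task_id: string; has_branch: boolean; prompt_length_chars: number };
  "Task cancelled": { task_id: string; cancel_reason: CancelReason; duration_seconds: number };
  "Setting changed": { setting_name: string }; // open-ended discriminator value
};

type CapturedEvent = { event: string; properties: object };
const captured: CapturedEvent[] = [];

// Stand-in for the renderer's track(): records events in memory instead of
// calling posthog-js, and merges the shared super-property.
function track<E extends keyof EventPropertyMap>(
  eventName: E,
  properties: EventPropertyMap[E],
): void {
  captured.push({ event: eventName, properties: { ...properties, team: "posthog-code" } });
}

track(ANALYTICS_EVENTS.TASK_CANCELLED, {
  task_id: "task_123",
  cancel_reason: "user_cancelled",
  duration_seconds: 12,
});

// A call such as track("Task cancelled", { task_id: "task_123", success: false })
// would be rejected by the compiler because `success` is not in the map.
```

Keying the property map by the literal event name is one way to get the "adding an event without entries fails typechecking" behavior described above. The actual mechanism in the codebase may differ.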
+ +**What does not go into properties** + +- No PII in event names or property values. No email addresses, full names, file paths, prompt contents, repo URLs. Hash if you need to dedupe (`path_hash`). +- No free-form strings when an enum will do. +- No giant payloads. If the value can be reconstructed from another event plus an ID, store the ID. + +--- + +## Testing + +### Commands + +- `pnpm test`: Run unit tests across all packages +- `pnpm --filter code test`: Run code unit tests only +- `pnpm test:e2e`: Run Playwright E2E tests + +### When to write unit tests vs E2E tests + +**Unit tests (Vitest)** are fast, isolated, and run frequently: +- Zustand store logic and state transitions +- Pure utility functions and helpers +- Service methods with mocked dependencies +- Complex business logic in isolation +- Data transformations and validators + +**E2E tests (Playwright)** are slower and test real user flows: +- Critical user journeys (auth, task creation, workspace setup) +- IPC communication between main and renderer +- Features requiring real Electron APIs (file system, shell) +- Multi-step workflows spanning multiple components +- Regression tests for reported bugs + +**Rule of thumb**: If it can be tested without Electron running, use a unit test. If it requires the full app context or tests user-facing behavior, use E2E. + +### Test file location + +Tests are colocated with source code using `.test.ts` or `.test.tsx` extension. E2E tests live in `tests/e2e/`.
+ +### Store testing + +```typescript +describe("store", () => { + beforeEach(() => { + localStorage.clear(); + useStore.setState({ /* reset state */ }); + }); + + it("action changes state", () => { + useStore.getState().action(); + expect(useStore.getState().property).toBe(expectedValue); + }); + + it("persists to localStorage", () => { + useStore.getState().action(); + const persisted = localStorage.getItem("store-key"); // string | null + expect(JSON.parse(persisted ?? "{}").state).toEqual(expectedState); + }); +}); +``` + +### Mocking patterns + +**Hoisted mocks for complex modules:** +```typescript +const mockPty = vi.hoisted(() => ({ spawn: vi.fn() })); +vi.mock("node-pty", () => mockPty); +``` + +**Simple module mocks:** +```typescript +vi.mock("@utils/analytics", () => ({ track: vi.fn() })); +``` + +**Global fetch stubbing:** +```typescript +const mockFetch = vi.fn(); +vi.stubGlobal("fetch", mockFetch); +mockFetch.mockResolvedValueOnce(ok()); +``` + +### Test helpers + +Test utilities are in `src/test/`: +- `setup.ts`: Global test setup with localStorage mock +- `utils.tsx`: `renderWithProviders()` for component tests +- `fixtures.ts`: Mock data factories +- `panelTestHelpers.ts`: Domain-specific assertions + +--- + +## Directory structure + +``` +apps/code/src/ +├── main/ +│ ├── di/ # InversifyJS container + tokens +│ ├── services/ # Services own all business logic and I/O +│ ├── platform-adapters/ # Electron implementations of @posthog/platform interfaces +│ ├── trpc/ +│ │ ├── router.ts # Root router combining all routers +│ │ └── routers/ # One router per service +│ └── lib/logger.ts +├── renderer/ +│ ├── di/ # Renderer DI container (tRPC client + narrow renderer services) +│ ├── features/ # Feature modules (sessions, tasks, terminal, etc.)
+│ │ └── /subscriptions.ts # Subscription registrars wired once at boot +│ ├── stores/ # Zustand stores (pure UI state + subscription caches) +│ ├── services/ # Narrow renderer services (UI mechanics only) +│ ├── hooks/ # Custom React hooks +│ ├── components/ # Shared components +│ ├── trpc/client.ts # tRPC client setup +│ └── utils/ # Utilities, logger, analytics, etc. +├── shared/ # Shared between main & renderer +│ ├── types.ts # Shared type definitions +│ └── constants.ts +├── api/ # PostHog API client +└── test/ # Test utilities +``` + +--- + +## Environment variables + +- Copy `.env.example` to `.env` diff --git a/CLAUDE.md b/CLAUDE.md deleted file mode 100644 index f70911967..000000000 --- a/CLAUDE.md +++ /dev/null @@ -1,558 +0,0 @@ -# PostHog Code Development Guide - -## Project Structure - -- Monorepo with pnpm workspaces and turbo -- `apps/code` - PostHog Code Electron desktop app (React + Vite) -- `apps/cli` - CLI tool (thin wrapper around @posthog/core) -- `apps/mobile` - React Native mobile app (Expo) -- `packages/agent` - TypeScript agent framework wrapping Claude Agent SDK -- `packages/core` - Shared business logic for jj/GitHub operations -- `packages/electron-trpc` - Custom tRPC package for Electron IPC -- `packages/shared` - Shared utilities (Saga pattern, etc.) 
used across packages - -## Commands - -- `pnpm install` - Install all dependencies -- `pnpm dev` - Run both agent (watch) and code app via phrocs -- `pnpm dev:mprocs` - Run both agent (watch) and code app via mprocs -- `pnpm dev:agent` - Run agent package in watch mode only -- `pnpm dev:code` - Run code desktop app only -- `pnpm build` - Build all packages (turbo) -- `pnpm typecheck` - Type check all packages -- `pnpm lint` - Lint and auto-fix with biome -- `pnpm format` - Format with biome -- `pnpm test` - Run tests across all packages - -### Code App Specific - -- `pnpm --filter code test` - Run vitest tests -- `pnpm --filter code typecheck` - Type check code app -- `pnpm --filter code package` - Package electron app -- `pnpm --filter code make` - Make distributable - -### Agent Package Specific - -- `pnpm --filter agent build` - Build agent with tsup -- `pnpm --filter agent dev` - Watch mode build -- `pnpm --filter agent typecheck` - Type check agent - -### Shared Package Specific - -- `pnpm --filter @posthog/shared build` - Build shared with tsup -- `pnpm --filter @posthog/shared dev` - Watch mode build -- `pnpm --filter @posthog/shared typecheck` - Type check shared - -## Code Style - -- Prefer writing our own solution over adding external packages when the fix is simple -- Keep functions focused with single responsibility -- Biome for linting and formatting (not ESLint/Prettier) -- 2-space indentation, double quotes -- No `console.*` in source - use logger instead (logger files exempt) -- Path aliases required in renderer code - no relative imports - - `@features/*`, `@components/*`, `@stores/*`, `@hooks/*`, `@utils/*`, `@renderer/*`, `@shared/*`, `@api/*` -- Main process path aliases: `@main/*`, `@api/*`, `@shared/*` -- TypeScript strict mode enabled -- Tailwind CSS classes should be sorted (biome `useSortedClasses` rule) - -### Services Over Hooks for Business Logic - -Put data-fetching logic and derivation in main process services, not renderer hooks. 
Hooks should be thin wrappers around a single tRPC query. If a hook orchestrates multiple queries and derives a result, that logic belongs in a service exposed via tRPC so it can be reused from both the main process and the renderer. - -### Small Focused Components - -Extract distinct UI concerns into their own components instead of building long inline ternary chains or conditional blocks. If a section of JSX handles its own logic (e.g. icon selection based on state), pull it into a named component next to where it's used. Keep render functions short and scannable. - -### Async Cleanup Ordering - -When tearing down async operations that use an AbortController, always abort the controller **before** awaiting any cleanup that depends on it. Otherwise you get a deadlock: the cleanup waits for the operation to stop, but the operation won't stop until the abort signal fires. - -```typescript -// WRONG - deadlocks if interrupt() waits for the operation to finish -await this.interrupt(); // hangs: waits for query to stop -this.abortController.abort(); // never reached - -// RIGHT - abort first so the operation can actually stop -this.abortController.abort(); // cancels in-flight HTTP requests -await this.interrupt(); // resolves because the query was aborted -``` - -### Avoid Barrel Files - -- Do not make use of index.ts - -Barrel files: - -- Break tree-shaking -- Create circular dependency risks -- Hide the true source of imports -- Make refactoring harder - -Import directly from source files instead. - -## Architecture - -See [ARCHITECTURE.md](./apps/code/ARCHITECTURE.md) for detailed patterns (DI, services, tRPC, state management). 
- -### Electron App (apps/code) - -- **Main process** (`src/main/`) - Services own all business logic, orchestration, polling, data fetching, and system I/O -- **Renderer process** (`src/renderer/`) - React app with Zustand stores holding pure UI state and thin action wrappers over tRPC -- **IPC**: tRPC over Electron IPC (type-safe via @posthog/electron-trpc) -- **DI**: InversifyJS in both processes (`src/main/di/`, `src/renderer/di/`) -- **Testing**: Vitest with React Testing Library - -### Agent Package (packages/agent) - -- Wraps `@anthropic-ai/claude-agent-sdk` -- Git worktree management in `worktree-manager.ts` -- PostHog API integration in `posthog-api.ts` -- Task execution and session management - -### CLI Package (packages/cli) - -- **Dumb shell, imperative core**: CLI commands should be thin wrappers that call `@posthog/core` -- All business logic belongs in `@posthog/core`, not in CLI command files -- CLI only handles: argument parsing, calling core, formatting output -- No data transformation, tree building, or complex logic in CLI - -### Core Package (packages/core) - -- Shared business logic for jj/GitHub operations - -### Shared Package (packages/shared) - -- Zero-dependency shared utilities used across packages -- Saga pattern for atomic multi-step operations with automatic rollback -- Built with tsup, outputs ESM - -### Mobile App (apps/mobile) - -- React Native + Expo (SDK 54), expo-router for file-based routing -- NativeWind v4 for styling (Tailwind classes compiled to RN styles) -- React Query for server state, Zustand for client state -- See [Mobile App](#mobile-app-appsmobile-1) section below for UI rules and patterns — Electron patterns in `Code Patterns` do NOT apply on mobile - -## Agent Integration Guidelines - -- **No rawInput**: Don't use Claude Code SDK's `rawInput` - only use Zod validated meta fields. This keeps us agent agnostic and gives us a maintainable, extensible format for logs. 
-- **Use ACP SDK types**: Don't roll your own types for things available in the ACP SDK. Import types directly from `@anthropic-ai/claude-agent-sdk` TypeScript SDK. -- **Permissions via tool calls**: If something requires user input/approval, implement it through a tool call with a permission instead of custom methods + notifications. Avoid patterns like `_array/permission_request`. - -## Key Libraries - -- React 19, Radix UI Themes, Tailwind CSS -- TanStack Query for data fetching -- xterm.js for terminal emulation -- CodeMirror for code editing -- Tiptap for rich text -- Zod for schema validation -- InversifyJS for dependency injection -- Sonner for toast notifications - -## Code Patterns - -### React Components - -Components are functional with hooks. Props typed with interfaces: - -```typescript -interface AgentMessageProps { - content: string; -} - -export function AgentMessage({ content }: AgentMessageProps) { - return ( - - - - ); -} -``` - -Complex components organize hooks by concern (data, UI state, side effects): - -```typescript -export function TaskDetail({ task: initialTask }: TaskDetailProps) { - const taskId = initialTask.id; - useTaskData({ taskId, initialTask }); // Data fetching - - const workspace = useWorkspaceStore((state) => state.workspaces[taskId]); // Store - const [filePickerOpen, setFilePickerOpen] = useState(false); // Local state - - useHotkeys("mod+p", () => setFilePickerOpen(true), {...}); // Effects - useFileWatcher(effectiveRepoPath ?? null, taskId); - // ... -} -``` - -### Tailwind over inline styles - -Always reach for Tailwind utility classes first. The codebase uses Tailwind v4 -with CSS variables from Radix Themes (e.g. `--gray-12`, `--space-3`, -`--radius-2`); use Tailwind v4's CSS-var shorthand to bridge them — `text-(--gray-12)`, -`bg-(--gray-2)`, `rounded-(--radius-2)`, `border-(--gray-5)`. Use arbitrary values -(`text-[13px]`, `pl-[18px]`) when the design token doesn't have a named match. 
- -Inline `style={{}}` is acceptable in three cases only: - -1. **Genuinely dynamic values** computed at runtime that can't be a class — - e.g. `style={{ width: `${pxFromHook}px` }}`, `style={{ transform: `translateY(${y}px)` }}`, - pixel positions from measurement, data-driven colors that don't fit a fixed palette. -2. **Library configuration** passed to non-React libraries (CodeMirror's - `EditorView.theme(...)`, xterm.js options, etc.). -3. **CSS variables set from JS** that downstream classes consume — - `style={{ "--row-color": item.color }}` paired with `className="bg-(--row-color)"`. - -Do NOT use inline `style` for: - -- Color tokens (use `text-(--gray-12)`, `bg-(--gray-2)`, `border-(--gray-5)`) -- Spacing (use `p-3`, `mt-2`, `pl-4`, `gap-2`) — Radix `--space-N` matches Tailwind's - spacing scale 1:1 for `--space-1`..`--space-4`; `--space-5` = `6`, `--space-6` = `8`, etc. -- Layout primitives (`shrink-0`, `min-w-0`, `flex-1`, `overflow-y-auto`, `w-full`, `h-full`) -- Borders (`border border-(--gray-5)`), radii (`rounded-(--radius-2)` or `rounded-full`) -- Cursors (`cursor-pointer`, `cursor-col-resize`) -- Opacity (`opacity-50`), text-align, text-transform (`uppercase`), white-space, word-break -- Position (`absolute`, `relative`, `fixed`), z-index (`z-10`, `z-[201]`), inset (`inset-0`) -- Animations that map to a Tailwind utility (`animate-spin`) -- Conditional values that can be `className={cond ? "x" : "y"}` or - `className={\`base-classes ${cond ? "active-classes" : "inactive-classes"}\`}` - -Default line-heights have been tightened (`text-sm` ships with etc.) -in [apps/code/src/renderer/styles/globals.css](./apps/code/src/renderer/styles/globals.css). -Don't add a `leading-*` class for body text unless you specifically want a non-default -line-height. For arbitrary sizes (`text-[13px]`), pair with `leading-snug` for body -text or `leading-tight` for titles. 
- -When writing a custom React component that wraps a styled element, accept BOTH -`className?: string` and `style?: React.CSSProperties` props and merge the -`className` into the inner element's classes (e.g. ``className={`base-classes ${className ?? ""}`}``). -This lets call sites override styling via Tailwind without forcing inline `style`. - -### Store / Service Boundary - -Stores and services have a strict separation of concerns: - -``` -Renderer Main Process -+------------------+ +------------------+ -| Zustand Store | -- tRPC --> | tRPC Router | -| | <-- subs -- +------------------+ -| - Pure state | | -| - Event cache | +------------------+ -| - UI concerns | | Service | -| - Thin actions | | | -+------------------+ | - Orchestration | - | | - Polling | -+------------------+ | - Data fetching | -| Service | | - Business logic | -| | +------------------+ -| - Cross-store | -| coordination | -| - Client-side | -| state machines | -+------------------+ -``` - -**Renderer stores own:** -- Pure UI state (open/closed, selected item, scroll position) -- Cached data from subscriptions -- Message queues and event buffers -- Permission display state -- Thin action wrappers that call tRPC mutations - -**Renderer services own:** -- Coordination between multiple stores -- Client-side-only state machines and logic - -**Main process services own:** -- Business logic and orchestration -- Polling loops and background work -- Data fetching, parsing, and transformation -- Connection management and coordination between services - -Stores should never contain business logic, orchestration, or data fetching. If a store action does more than update local state or call a single tRPC method, that logic belongs in a service. Services typically live in the main process, but renderer-side services are fine when the logic is purely client-side (e.g., coordinating between stores, managing local-only state machines). - -### Zustand Stores - -Stores hold pure state with thin actions. 
Separate state and action interfaces, use persistence middleware where needed: - -```typescript -interface SidebarStoreState { - open: boolean; - width: number; -} - -interface SidebarStoreActions { - setOpen: (open: boolean) => void; - toggle: () => void; -} - -type SidebarStore = SidebarStoreState & SidebarStoreActions; - -export const useSidebarStore = create()( - persist( - (set) => ({ - open: false, - width: 256, - setOpen: (open) => set({ open }), - toggle: () => set((state) => ({ open: !state.open })), - }), - { - name: "sidebar-storage", - partialize: (state) => ({ open: state.open, width: state.width }), - } - ) -); -``` - -### tRPC Routers (Main Process) - -Routers get services from DI container per-request: - -```typescript -const getService = () => container.get(MAIN_TOKENS.GitService); - -export const gitRouter = router({ - detectRepo: publicProcedure - .input(detectRepoInput) - .output(detectRepoOutput) - .query(({ input }) => getService().detectRepo(input.directoryPath)), - - onCloneProgress: publicProcedure.subscription(async function* (opts) { - const service = getService(); - for await (const data of service.toIterable(GitServiceEvent.CloneProgress, { signal: opts.signal })) { - yield data; - } - }), -}); -``` - -### Services (Main Process) - -Services are injectable, own all business logic, and emit events to the renderer via tRPC subscriptions. Orchestration, polling, data fetching, and coordination between services all belong here - not in stores: - -```typescript -@injectable() -export class GitService extends TypedEventEmitter { - public async detectRepo(directoryPath: string): Promise { - if (!directoryPath) return null; - const remoteUrl = await this.getRemoteUrl(directoryPath); - // ... 
- } -} -``` - -### Custom Hooks - -Hooks extract store subscriptions into cleaner interfaces: - -```typescript -export function useConnectivity() { - const isOnline = useConnectivityStore((s) => s.isOnline); - const check = useConnectivityStore((s) => s.check); - return { isOnline, check }; -} -``` - -### Logger Usage - -Use scoped logger instead of console: - -```typescript -const log = logger.scope("navigation-store"); - -export const useNavigationStore = create()( - persist((set, get) => { - log.info("Folder path is stale, redirecting...", { folderId: folder.id }); - // ... - }) -); -``` - -## Mobile App (apps/mobile) - -When working in `apps/mobile/`, the patterns in `Code Patterns` above are for the **Electron renderer** (web DOM, Radix, web Tailwind v4). They do NOT apply here. Mobile is React Native: no `
`, no `window`/`document`/`localStorage`, no `:hover`/`cursor-*`/`focus-visible:`, no CSS `position: fixed`, no `overflow-y-auto`. If a feature only exists in CSS, it doesn't exist on mobile — design for touch and native primitives. - -See [apps/mobile/README.md](./apps/mobile/README.md) for setup, build profiles, and full command list. - -### Mobile UI Principles - -Every screen must be designed for a phone: portrait-first, touch-driven, dark + light mode, safe areas honoured, keyboard-aware. Treat tablet/landscape as a stretch goal, not a baseline — but never let layouts hard-break on them. - -- **Touch targets are 44pt minimum.** Use `hitSlop` to widen the hit area when the visual element is smaller. Never assume a pointer. -- **Provide press feedback.** `active:opacity-*` or `active:bg-*` on every `Pressable`. There is no hover state — feedback only happens on press. -- **Honour safe areas.** Use `useSafeAreaInsets()` from `react-native-safe-area-context` for top/bottom padding. Never hardcode status-bar height. Edge-to-edge screens (no native header) MUST account for the notch and home indicator. -- **Keyboard handling is mandatory for any input.** Use `react-native-keyboard-controller`'s `KeyboardAvoidingView` / `KeyboardAwareScrollView`. Set `keyboardShouldPersistTaps="handled"` on scroll containers that contain inputs. Verify the composer/input remains visible with the keyboard up. -- **Dark mode is not optional.** Every new screen must work in both light and dark. Pick from theme tokens, never raw hex. -- **One-handed reachability.** Primary actions belong in the bottom half of the screen where the thumb actually lives. Avoid forcing reach to the top corners for frequent actions — that's what `FloatingBackButton` / floating CTAs are for. -- **Respect platform conventions.** iOS swipe-back gestures, Android hardware back, sheet/modal idioms. Don't reinvent navigation. - -### Primitives - -- **Layout & containers:** `View`, `ScrollView`, `FlatList`. 
Never reach for HTML elements; they don't exist. -- **Long lists:** Always `FlatList` (or `SectionList`) with a stable `keyExtractor`. Plain `ScrollView` is for short, bounded content only. -- **Text:** Import from `@components/text` — it applies the project's default font stack. Direct `react-native` `Text` is monkey-patched in [textDefaults.ts](apps/mobile/src/lib/textDefaults.ts) but the wrapper is preferred for consistency. -- **Buttons / tappables:** `Pressable`. Always set `hitSlop` and an `active:*` class. -- **Icons:** `phosphor-react-native`. Pass color via `useThemeColors()` (e.g. `color={themeColors.gray[12]}`), never a hex literal. -- **Animations:** `react-native-reanimated` v4. Do not use the legacy `Animated` API. -- **Haptics:** `expo-haptics` for confirmation / destructive actions. Pair with visual feedback — haptics alone are not a signal. - -### Styling: NativeWind + Theme Tokens - -Mobile uses NativeWind v3 with the token system defined in [theme.ts](apps/mobile/src/lib/theme.ts) and exposed via [tailwind.config.js](apps/mobile/tailwind.config.js). - -- **Use named token classes**, not hex: `bg-gray-1`, `bg-gray-2`, `text-gray-12`, `border-gray-6`, `bg-accent-9`, `text-accent-11`, `bg-background`, `bg-card`, `text-status-error`. These automatically switch between light and dark. -- **Arbitrary values** (`text-[15px]`, `pl-[18px]`) are fine when the design token doesn't match. Pair body text with `leading-snug`, titles with `leading-tight`. -- **For native props that take a color directly** — `ActivityIndicator`, `RefreshControl`, `StatusBar`, gradient stops, icon `color={...}` — call `useThemeColors()` and pass the hex. Don't hardcode. -- **For transparent variants** (gradients, overlays), use `toRgba(themeColors.background, 0.92)` rather than guessing rgba values. - -Inline `style={{}}` on mobile is acceptable ONLY for: - -1. 
**Runtime-computed values:** `style={{ paddingTop: insets.top + 8 }}`, `style={{ height: fadeHeight }}`, `transform: [{ translateY }]` driven by Reanimated/measurement. -2. **Library configuration objects** that aren't React props (e.g. `LinearGradient`'s absolute fill, gesture handler configs). -3. **Theme tokens consumed by native components** that don't accept className (passed to `contentStyle`, `headerStyle`, etc.). - -Do NOT use inline `style` for static color, spacing, layout, border, radius, opacity, position, or z-index — those are all NativeWind classes. If a conditional looks like `style={{ color: isActive ? a : b }}`, rewrite as ``className={`base ${isActive ? "text-accent-9" : "text-gray-10"}`}``. - -When writing custom components, accept `className?: string` and merge it into the inner element so call sites can override styling without inline `style`. - -### Navigation & Screen Patterns - -- **expo-router**, file-based. Routes live in [src/app/](apps/mobile/src/app/). `(group)/` is a layout group, `[id].tsx` is a dynamic param. -- **Modals:** Configure on the Stack screen with `presentation: "modal"` — see [_layout.tsx](apps/mobile/src/app/_layout.tsx). Don't roll a custom modal component when a stack modal will do. -- **Headers:** Prefer the existing floating header pattern ([FloatingBackButton](apps/mobile/src/components/FloatingBackButton.tsx), [FloatingTaskHeader](apps/mobile/src/features/tasks/components/FloatingTaskHeader.tsx)) over the native stack header. It lets content fill the full screen (incl. behind the status bar) and looks correct in both light/dark. -- **Don't go back blindly.** Always guard with `if (router.canGoBack()) router.back()`. - -### Storage & Side Effects - -- **Persistent key/value:** `@react-native-async-storage/async-storage` — NOT `localStorage` (doesn't exist on RN). -- **Secrets / tokens:** `expo-secure-store`. -- **Logger:** Use `@/lib/logger`. Never `console.*` in source. 
-- **Path alias:** `@/*` → `apps/mobile/src/*`. Don't use deep relative imports. - -### Platform Differences - -- Split iOS/Android behavior with `Platform.OS === "ios"`. Don't ship iOS-only APIs (`expo-glass-effect`, certain haptics, modal `presentation: "formSheet"`) without an Android fallback. -- iOS swipe-back is on by default — don't disable it without a strong reason. On Android, ensure hardware back behaves the same. - -### Verifying Mobile UI Work - -You cannot fully validate mobile UI from a typecheck. Before claiming a mobile UI task is done: - -1. Mentally (or actually) walk the layout through: small iPhone (e.g. iPhone SE), large iPhone (Pro Max), with and without dynamic type bumped. -2. Check both light and dark mode — switch the simulator's appearance and verify token-based colors still read. -3. With the keyboard up — does the focused input stay visible? Does the back/submit button still tap? -4. Safe areas — does anything sit under the notch or home indicator? -5. If you can't actually run it, say so explicitly rather than reporting success. - -## Testing - -### Commands - -- `pnpm test` - Run unit tests across all packages -- `pnpm --filter code test` - Run code unit tests only -- `pnpm test:e2e` - Run Playwright E2E tests - -### When to Write Unit Tests vs E2E Tests - -**Unit tests (Vitest)** - Fast, isolated, run frequently: -- Zustand store logic and state transitions -- Pure utility functions and helpers -- Service methods with mocked dependencies -- Complex business logic in isolation -- Data transformations and validators - -**E2E tests (Playwright)** - Slower, test real user flows: -- Critical user journeys (auth, task creation, workspace setup) -- IPC communication between main and renderer -- Features requiring real Electron APIs (file system, shell) -- Multi-step workflows spanning multiple components -- Regression tests for reported bugs - -**Rule of thumb**: If it can be tested without Electron running, use a unit test. 
If it requires the full app context or tests user-facing behavior, use E2E. - -### Test File Location - -Tests are colocated with source code using `.test.ts` or `.test.tsx` extension. E2E tests live in `tests/e2e/`. - -### Store Testing - -```typescript -describe("store", () => { - beforeEach(() => { - localStorage.clear(); - useStore.setState({ /* reset state */ }); - }); - - it("action changes state", () => { - useStore.getState().action(); - expect(useStore.getState().property).toBe(expectedValue); - }); - - it("persists to localStorage", () => { - useStore.getState().action(); - const persisted = localStorage.getItem("store-key"); - expect(JSON.parse(persisted).state).toEqual(expectedState); - }); -}); -``` - -### Mocking Patterns - -**Hoisted mocks for complex modules:** -```typescript -const mockPty = vi.hoisted(() => ({ spawn: vi.fn() })); -vi.mock("node-pty", () => mockPty); -``` - -**Simple module mocks:** -```typescript -vi.mock("@utils/analytics", () => ({ track: vi.fn() })); -``` - -**Global fetch stubbing:** -```typescript -const mockFetch = vi.fn(); -vi.stubGlobal("fetch", mockFetch); -mockFetch.mockResolvedValueOnce(ok()); -``` - -### Test Helpers - -Test utilities are in `src/test/`: -- `setup.ts` - Global test setup with localStorage mock -- `utils.tsx` - `renderWithProviders()` for component tests -- `fixtures.ts` - Mock data factories -- `panelTestHelpers.ts` - Domain-specific assertions - -## Directory Structure - -``` -apps/code/src/ -├── main/ -│ ├── di/ # InversifyJS container + tokens -│ ├── services/ # Stateless services (git, shell, workspace, etc.) -│ ├── trpc/ -│ │ ├── router.ts # Root router combining all routers -│ │ └── routers/ # Individual routers per service -│ └── lib/logger.ts -├── renderer/ -│ ├── di/ # Renderer DI container -│ ├── features/ # Feature modules (sessions, tasks, terminal, etc.) 
-│ ├── stores/ # Zustand stores (21+ stores) -│ ├── hooks/ # Custom React hooks -│ ├── components/ # Shared components -│ ├── trpc/client.ts # tRPC client setup -│ └── utils/ # Utilities, logger, analytics, etc. -├── shared/ # Shared between main & renderer -│ ├── types.ts # Shared type definitions -│ └── constants.ts -├── api/ # PostHog API client -└── test/ # Test utilities -``` - -## Environment Variables - -- Copy `.env.example` to `.env` diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 4c6356c0f..eee212675 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -30,7 +30,7 @@ See [docs/LOCAL-DEVELOPMENT.md](./docs/LOCAL-DEVELOPMENT.md) for connecting to a - Resolve merge conflicts before requesting review - Keep changes focused -- one logical change per PR - Add tests where they meaningfully improve confidence -- Follow existing patterns and conventions in the areas you touch (see [CLAUDE.md](./CLAUDE.md) for code style details) +- Follow existing patterns and conventions in the areas you touch (see [AGENTS.md](./AGENTS.md) for architecture rules and code style) ## What to expect diff --git a/README.md b/README.md index 164d7cc80..09a03efda 100644 --- a/README.md +++ b/README.md @@ -82,10 +82,9 @@ posthog-code/ | File | Description | |------|-------------| | [apps/code/README.md](./apps/code/README.md) | Desktop app: building, signing, distribution, and workspace configuration | -| [apps/code/ARCHITECTURE.md](./apps/code/ARCHITECTURE.md) | Desktop app: dependency injection, tRPC, state management, and events | | [apps/mobile/README.md](./apps/mobile/README.md) | Mobile app: Expo setup, EAS builds, and TestFlight deployment | | [apps/cli/README.md](./apps/cli/README.md) | CLI: stacked PR management with Jujutsu | -| [CLAUDE.md](./CLAUDE.md) | Code style, patterns, and testing guidelines | +| [AGENTS.md](./AGENTS.md) | Architecture rules, code style, patterns, and testing guidelines (read by Claude Code, Codex, Cursor, Aider, etc.) 
| | [CONTRIBUTING.md](./CONTRIBUTING.md) | How to contribute to PostHog Code | | [docs/LOCAL-DEVELOPMENT.md](./docs/LOCAL-DEVELOPMENT.md) | Connecting PostHog Code to a local PostHog instance | | [docs/UPDATES.md](./docs/UPDATES.md) | Release versioning and git tagging | diff --git a/REFACTOR.md b/REFACTOR.md new file mode 100644 index 000000000..aa4608277 --- /dev/null +++ b/REFACTOR.md @@ -0,0 +1,5 @@ +# Foundation Refactor: Architecture and Rules + +This is the w.i.p. architectural plan for PostHog Code. It defines what the next version of the codebase looks like, where each kind of code lives, how the pieces talk and the rules every PR (human or agent) follows. + +We are doing this because the codebase has drifted into "business logic everywhere", time to move mountains! \ No newline at end of file diff --git a/apps/code/ARCHITECTURE.md b/apps/code/ARCHITECTURE.md deleted file mode 100644 index f5055a363..000000000 --- a/apps/code/ARCHITECTURE.md +++ /dev/null @@ -1,613 +0,0 @@ -# PostHog Code Architecture - -Implementation patterns for the PostHog Code desktop app. For code style and commands, see [CLAUDE.md](./CLAUDE.md). - -## Overview - -PostHog Code is an Electron app with a React renderer. The main process handles system operations (stateless), while the renderer owns all application state. - -``` -Main Process (Node.js) Renderer Process (React) -┌───────────────────────┐ ┌───────────────────────────┐ -│ DI Container │ │ DI Container │ -│ ├── GitService │ │ ├── TRPCClient │ -│ └── ... │ │ └── TaskService, ... │ -├───────────────────────┤ ├───────────────────────────┤ -│ tRPC Routers │ ◄─tRPC(ipcLink)─► │ tRPC Clients │ -│ (use DI services) │ │ ├── useTRPC() (hooks) │ -├───────────────────────┤ │ └── trpcClient (vanilla) │ -│ System I/O │ ├───────────────────────────┤ -│ (fs, git, shell) │ │ Zustand Stores (state) │ -│ STATELESS │ │ ├── taskStore │ -└───────────────────────┘ │ ├── workspaceStore │ - │ └── ... 
│ - ├───────────────────────────┤ - │ React UI │ - └───────────────────────────┘ -``` - -**Key points:** - -- Both processes use InversifyJS for DI -- Renderer DI holds services + tRPC client; services can coordinate stores -- Zustand stores own all application state (not in DI) -- Main process is stateless - pure I/O operations only - -## Dependency Injection - -Both processes use [InversifyJS](https://inversify.io/) for dependency injection with singleton scope. - -| Process | Container | Holds | -| -------- | ------------------ | ------------------------------------- | -| Main | `src/main/di/` | Stateless services (GitService, etc.) | -| Renderer | `src/renderer/di/` | Services + TRPCClient | - -### Defining a Service - -```typescript -// src/main/services/my-service/service.ts (or src/renderer/services/) -import { injectable } from "inversify" - -@injectable() -export class MyService { - doSomething() { - // ... - } -} -``` - -### Registering a Service - -```typescript -// src/main/di/container.ts (or src/renderer/di/container.ts) -container.bind(TOKENS.MyService).to(MyService) -``` - -```typescript -// src/main/di/tokens.ts (or src/renderer/di/tokens.ts) -export const MAIN_TOKENS = Object.freeze({ - MyService: Symbol.for("Main.MyService"), -}) -``` - -### Injecting Dependencies - -Services should declare dependencies via constructor injection: - -```typescript -import { inject, injectable } from "inversify" -import { MAIN_TOKENS } from "../di/tokens" - -@injectable() -export class MyService { - constructor( - @inject(MAIN_TOKENS.OtherService) - private readonly otherService: OtherService, - ) {} - - doSomething() { - return this.otherService.getData() - } -} -``` - -### Using Services in tRPC Routers - -tRPC routers resolve services from the container: - -```typescript -import { container } from "../../di/container" -import { MAIN_TOKENS } from "../../di/tokens" - -const getService = () => container.get(MAIN_TOKENS.MyService) - -export const myRouter = 
router({ - getData: publicProcedure.query(() => getService().getData()), -}) -``` - -### Testing with Mocks - -Constructor injection makes testing straightforward: - -```typescript -// Direct instantiation with mock -const mockOtherService = { getData: vi.fn().mockReturnValue("test") } -const service = new MyService(mockOtherService as OtherService) - -// Or rebind in container for integration tests -container.snapshot() -container.rebind(MAIN_TOKENS.OtherService).toConstantValue(mockOtherService) -// ... run tests ... -container.restore() -``` - -## IPC via tRPC - -We use [tRPC](https://trpc.io/) with [trpc-electron](https://github.com/jsonnull/electron-trpc) for type-safe communication between main and renderer. The `ipcLink()` handles serialization over Electron IPC. - -### Creating a Router (Main Process) - -```typescript -// src/main/trpc/routers/my-router.ts -import { container } from "../../di/container" -import { MAIN_TOKENS } from "../../di/tokens" -import { - getDataInput, - getDataOutput, - updateDataInput, -} from "../../services/my-service/schemas" -import { router, publicProcedure } from "../trpc" - -const getService = () => container.get(MAIN_TOKENS.MyService) - -export const myRouter = router({ - getData: publicProcedure - .input(getDataInput) - .output(getDataOutput) - .query(({ input }) => getService().getData(input.id)), - - updateData: publicProcedure - .input(updateDataInput) - .mutation(({ input }) => getService().updateData(input.id, input.value)), -}) -``` - -### Registering the Router - -```typescript -// src/main/trpc/router.ts -import { myRouter } from "./routers/my-router" - -export const trpcRouter = router({ - my: myRouter, - // ... 
-}) -``` - -### Using tRPC in Renderer - -There are three tRPC exports, each for a different context: - -| Export | Where to use | Purpose | -| ------------ | ---------------------------------------------- | ------------------------------------------------------------------------ | -| `useTRPC()` | React components/hooks | Options proxy via React context | -| `trpc` | Outside React (module scope, services, stores) | Options proxy bound to the singleton `queryClient` | -| `trpcClient` | Anywhere (imperative calls) | Vanilla tRPC client for direct `.query()` / `.mutate()` / `.subscribe()` | - -**React components** use `useTRPC()` + TanStack Query hooks: - -```typescript -import { useTRPC } from "@renderer/trpc/client" -import { useMutation, useQuery } from "@tanstack/react-query" - -function MyComponent() { - const trpc = useTRPC() - - // Queries — pass queryOptions() to useQuery - const { data } = useQuery(trpc.my.getData.queryOptions({ id: "123" })) - - // Mutations — pass mutationOptions() to useMutation - const mutation = useMutation( - trpc.my.updateData.mutationOptions({ - onSuccess: () => { - /* ... */ - }, - }), - ) - const handleUpdate = () => mutation.mutate({ id: "123", value: "new" }) -} -``` - -**Subscriptions** use `useSubscription` from `@trpc/tanstack-react-query`: - -```typescript -import { useSubscription } from "@trpc/tanstack-react-query" - -useSubscription( - trpc.my.onItemCreated.subscriptionOptions(undefined, { - onData: (item) => { - /* ... 
*/ - }, - }), -) -``` - -**Cache invalidation** uses `pathFilter()` or `queryFilter()` with the query client: - -```typescript -const queryClient = useQueryClient() - -// Invalidate all queries under a router path -queryClient.invalidateQueries(trpc.workspace.getAll.pathFilter()) - -// Invalidate a specific query by input -queryClient.invalidateQueries( - trpc.git.getCurrentBranch.queryFilter({ directoryPath: repoPath }), -) - -// Set cache data directly -queryClient.setQueryData( - trpc.git.getLatestCommit.queryKey({ directoryPath: repoPath }), - commitData, -) -``` - -**Outside React** (stores, sagas, services, module-scope utilities): - -```typescript -// Imperative calls — use trpcClient -import { trpcClient } from "@renderer/trpc/client" - -const data = await trpcClient.my.getData.query({ id: "123" }) -await trpcClient.my.updateData.mutate({ id: "123", value: "new" }) - -// Cache operations outside React — use trpc (the module-level options proxy) -import { trpc } from "@renderer/trpc" -import { queryClient } from "@utils/queryClient" - -queryClient.invalidateQueries(trpc.workspace.getAll.pathFilter()) -``` - -## State Management - -**All application state lives in the renderer.** Main process services should be stateless/pure. - -| Layer | State | Role | -| ------------ | -------------- | -------------------------------------------- | -| **Renderer** | Zustand stores | Owns all application state | -| **Main** | Stateless | Pure operations (file I/O, git, shell, etc.) | - -This keeps state predictable, easy to debug, and naturally supports patterns like undo/rollback. - -### Example - -```typescript -// ❌ Bad - main service with state -@injectable() -class TaskService { - private currentTask: Task | null = null // Don't do this -} - -// ✅ Good - main service is pure -@injectable() -class TaskService { - async readTask(id: string): Promise { - /* ... */ - } - async writeTask(task: Task): Promise { - /* ... 
*/ - } -} - -// ✅ Good - state lives in renderer -// src/renderer/stores/task-store.ts -const useTaskStore = create((set) => ({ - currentTask: null, - setCurrentTask: (task) => set({ currentTask: task }), -})) -``` - -### Learned Hints - -The settings store (`src/renderer/features/settings/stores/settingsStore.ts`) provides a reusable "learned hints" system for progressive feature discovery. Hints are shown a limited number of times until the user demonstrates they've learned the behavior. - -```typescript -// In the store: hints is Record -const store = useFeatureSettingsStore.getState() - -// Check if a hint should still be shown (max N times, not yet learned) -if (store.shouldShowHint("my-hint-key", 3)) { - store.recordHintShown("my-hint-key") - toast.info("Did you know?", "You can do X with Y.") -} - -// When the user demonstrates the behavior, mark it learned (stops showing) -store.markHintLearned("my-hint-key") -``` - -Hint state is persisted via `electronStorage`. Use this pattern instead of ad-hoc boolean flags when introducing new discoverable features. - -## Services - -Services encapsulate business logic and exist in both processes: - -- **Main services** (`src/main/services/`) - System operations (file I/O, git, shell) -- **Renderer services** (`src/renderer/services/`) - UI logic, API calls - -Main services should be: - -- **Injectable**: Decorated with `@injectable()` for DI -- **Stateless**: No mutable instance state, pure operations only -- **Single responsibility**: One concern per service - -### Service Structure - -``` -src/main/services/ -├── my-service/ -│ ├── service.ts # The injectable service class -│ ├── schemas.ts # Zod schemas for tRPC input/output -│ └── types.ts # Internal types (not exposed via tRPC) - -src/renderer/services/ -├── my-service.ts # Renderer-side service -``` - -### Zod Schemas - -All tRPC inputs and outputs use Zod schemas as the single source of truth. Types are inferred from schemas. 
- -```typescript -// src/main/services/my-service/schemas.ts -import { z } from "zod" - -export const getDataInput = z.object({ - id: z.string(), -}) - -export const getDataOutput = z.object({ - id: z.string(), - name: z.string(), - createdAt: z.string(), -}) - -export type GetDataInput = z.infer<typeof getDataInput> -export type GetDataOutput = z.infer<typeof getDataOutput> -``` - -```typescript -// src/main/trpc/routers/my-router.ts -import { getDataInput, getDataOutput } from "../../services/my-service/schemas" - -export const myRouter = router({ - getData: publicProcedure - .input(getDataInput) - .output(getDataOutput) - .query(({ input }) => getService().getData(input.id)), -}) -``` - -```typescript -// src/main/services/my-service/service.ts -import type { GetDataInput, GetDataOutput } from "./schemas" - -@injectable() -export class MyService { - async getData(id: string): Promise<GetDataOutput> { - // ... - } -} -``` - -This pattern provides: - -- Runtime validation of inputs and outputs -- Single source of truth for types -- Explicit API contracts between main and renderer - -## Adding a New Feature - -1. **Create the service** in `src/main/services/` -2. **Add DI token** in `src/main/di/tokens.ts` -3. **Register service** in `src/main/di/container.ts` -4. **Create tRPC router** in `src/main/trpc/routers/` -5. **Add router** to `src/main/trpc/router.ts` -6. **Use in renderer** via `useTRPC()` + TanStack Query hooks - -## Events (tRPC Subscriptions) - -For pushing real-time updates from main to renderer, use tRPC subscriptions with typed event emitters. - -### 1. Define Events in schemas.ts - -Use a const object for event names and an interface for payloads: - -```typescript -// src/main/services/my-service/schemas.ts -export const MyServiceEvent = { - ItemCreated: "item-created", - ItemDeleted: "item-deleted", -} as const - -export interface MyServiceEvents { - [MyServiceEvent.ItemCreated]: { id: string; name: string } - [MyServiceEvent.ItemDeleted]: { id: string } -} -``` - -### 2. 
Extend TypedEventEmitter in Service - -```typescript -// src/main/services/my-service/service.ts -import { TypedEventEmitter } from "../../lib/typed-event-emitter" -import { MyServiceEvent, type MyServiceEvents } from "./schemas" - -@injectable() -export class MyService extends TypedEventEmitter<MyServiceEvents> { - async createItem(name: string) { - const item = { id: "123", name } - // TypeScript enforces correct event name and payload shape - this.emit(MyServiceEvent.ItemCreated, item) - return item - } -} -``` - -### 3. Create Subscriptions in Router - -Use `toIterable()` on the service to convert events to an async iterable. For global events (broadcast to all subscribers): - -```typescript -// src/main/trpc/routers/my-router.ts -import { - MyServiceEvent, - type MyServiceEvents, -} from "../../services/my-service/schemas" - -function subscribe<K extends keyof MyServiceEvents>(event: K) { - return publicProcedure.subscription(async function* (opts) { - const service = getService() - const iterable = service.toIterable(event, { signal: opts.signal }) - for await (const data of iterable) { - yield data - } - }) -} - -export const myRouter = router({ - // ... 
queries and mutations - onItemCreated: subscribe(MyServiceEvent.ItemCreated), - onItemDeleted: subscribe(MyServiceEvent.ItemDeleted), -}) -``` - -For per-instance events (e.g., shell sessions), filter by an identifier: - -```typescript -// Events include an identifier to filter on -export interface ShellEvents { - [ShellEvent.Data]: { sessionId: string; data: string } - [ShellEvent.Exit]: { sessionId: string; exitCode: number } -} - -// Router filters events to the specific session -function subscribeFiltered<K extends keyof ShellEvents>(event: K) { - return publicProcedure - .input(sessionIdInput) - .subscription(async function* (opts) { - const service = getService() - const targetSessionId = opts.input.sessionId - const iterable = service.toIterable(event, { signal: opts.signal }) - - for await (const data of iterable) { - if (data.sessionId === targetSessionId) { - yield data - } - } - }) -} - -export const shellRouter = router({ - onData: subscribeFiltered(ShellEvent.Data), - onExit: subscribeFiltered(ShellEvent.Exit), -}) -``` - -### 4. Subscribe in Renderer - -```typescript -import { useSubscription } from "@trpc/tanstack-react-query" - -const trpc = useTRPC() - -// React component - global events -useSubscription( - trpc.my.onItemCreated.subscriptionOptions(undefined, { - enabled: true, - onData: (item) => { - // item is typed as { id: string; name: string } - }, - }), -) - -// React component - per-session events -useSubscription( - trpc.shell.onData.subscriptionOptions( - { sessionId }, - { - enabled: !!sessionId, - onData: (event) => { - // event is typed as { sessionId: string; data: string } - terminal.write(event.data) - }, - }, - ), -) -``` - -## MCP Apps - -MCP Apps let MCP servers ship interactive HTML UIs alongside their tools. When a tool has an associated `ui://` resource, we render the app's HTML inside a sandboxed iframe instead of showing the raw tool input/output.
- -### How It Works - -``` -Agent Session Main Process Renderer -┌──────────────┐ ┌─────────────────────┐ ┌───────────────────────────┐ -│ Tool call │─-session/update─►│ AgentService │ │ McpToolBlock │ -│ (mcp__X__Y) │ │ ├─notifyToolInput │──event──►│ ├─ hasUiForTool? │ -│ │ │ └─notifyToolResult │──event──►│ ├─ McpAppHost │ -└──────────────┘ ├─────────────────────┤ │ │ ├─ iframe (sandbox). │ - │ McpAppsService │ │ │ └─ useAppBridge │ - │ ├─ connections │◄─proxy───│ └─ McpToolView (fallback)│ - │ ├─ resourceCache │ └───────────────────────────┘ - │ └─ toolAssociations│ - └─────────────────────┘ -``` - -On session start, `AgentService` passes the active MCP server configs to `McpAppsService`, which connects to each server over Streamable HTTP and discovers UI resources. It lists the server's resources looking for `ui://` URIs with mime type `text/html;profile=mcp-app`, then maps each resource to its associated tool via the tool's `_meta.ui.resourceUri` field. The HTML content is fetched and cached in memory (capped at 5MB per resource). - -### Shared Types - -Schemas and event types for MCP Apps live in `src/shared/types/mcp-apps.ts` rather than in the service directory, since both processes need them. This file defines the Zod schemas for tRPC input/output, the `McpUiResource` interface, tool-to-UI association types, and the `McpAppsServiceEvent` constants. - -### Main Process - -`McpAppsService` (`src/main/services/mcp-apps/service.ts`) manages MCP server connections and acts as a proxy between the renderer and remote MCP servers. It extends `TypedEventEmitter` to push tool input/result/cancellation events to the renderer via tRPC subscriptions. - -`AgentService` hooks into the ACP `sessionUpdate` callback to intercept tool call updates for MCP tools (those prefixed with `mcp__`). It forwards tool inputs and results to `McpAppsService`, which re-emits them as typed events. 
- -The tRPC router (`src/main/trpc/routers/mcp-apps.ts`) exposes: - -- `getUiResource` / `hasUiForTool` — queries for UI resource lookup -- `proxyToolCall` / `proxyResourceRead` — mutations that forward calls to the remote MCP server, with visibility checks (tools marked as model-only are rejected) -- `openLink` — opens URLs via `shell.openExternal`, restricted to http/https -- `onToolInput` / `onToolResult` / `onToolCancelled` — per-tool filtered subscriptions - -### Renderer - -The renderer feature lives in `src/renderer/features/mcp-apps/`: - -``` -mcp-apps/ -├── components/ -│ ├── McpToolBlock.tsx # McpToolView + optional McpAppHost below -│ ├── McpAppHost.tsx # Iframe host with inline/fullscreen display modes -│ └── McpToolView.tsx # Standard MCP tool call rendering (moved from sessions/) -├── hooks/ -│ └── useAppBridge.ts # AppBridge lifecycle, message routing, context sync -└── utils/ - ├── mcp-app-csp.ts # CSP generation from server-declared domains - ├── mcp-app-sandbox-proxy.ts # Generates the outer sandbox iframe HTML - ├── mcp-app-host-utils.ts # Tool key parsing, container dimension helpers - └── mcp-app-theme.ts # Maps Radix theme tokens to MCP App CSS variables -``` - -`McpToolBlock` is the entry point, rendered from `ToolCallBlock` for any `mcp__` tool. It always renders `McpToolView` (the pre-existing MCP tool call display, moved here from `sessions/`). When the tool has a UI resource and the server isn't disabled in settings, it additionally renders `McpAppHost` below the tool view. This keeps the standard tool call display (input preview, status, expandable output) visible regardless of whether an app is present. - -### Sandbox Model - -Apps run inside a double-iframe sandbox. The outer iframe loads a generated proxy page (`mcp-app-sandbox-proxy.ts`) with `sandbox="allow-scripts allow-same-origin ..."`. The proxy receives the app's HTML from the host via postMessage and injects it into an inner iframe with a server-declared CSP meta tag. 
This isolates the app's DOM from the host while still allowing structured communication over the bridge. - -### App Bridge - -`useAppBridge` manages the host side of the `@modelcontextprotocol/ext-apps` `AppBridge`. It handles the full lifecycle: waiting for the sandbox proxy to signal readiness, creating the bridge with a `PostMessageTransport`, sending the HTML resource into the inner iframe, and tearing down on unmount. - -The bridge routes app requests to tRPC mutations in the main process — tool calls, resource reads, and link opens all proxy through `McpAppsService`. It also forwards host context changes (theme, display mode, container dimensions) to the app when those values change, and handles app-initiated actions like display mode requests and messages that get routed to the draft store. - -`sendWhenReady` buffers bridge calls until the app has finished its initialization handshake, then flushes them. This lets the component forward tool results from tRPC subscriptions without worrying about race conditions with app startup. - -### Disabling MCP Apps - -Users can disable MCP Apps per server via `settingsStore.mcpAppsDisabledServers`. When a server is disabled, `McpAppsService` skips connecting to it and the renderer falls back to `McpToolView`. - -## Code Style - -See [CLAUDE.md](./CLAUDE.md) for linting, formatting, and import conventions. - -Key points: - -- Use path aliases (`@main/*`, `@renderer/*`, etc.) -- No barrel files - import directly from source -- Use `logger` instead of `console.*` diff --git a/apps/code/SCHEMA.md b/apps/code/SCHEMA.md deleted file mode 100644 index 5ecc56d98..000000000 --- a/apps/code/SCHEMA.md +++ /dev/null @@ -1,250 +0,0 @@ -# Analytics Event Schema - -Naming conventions and the canonical catalog of PostHog events emitted by the desktop app. The authoritative type definitions live in [`src/shared/types/analytics.ts`](./src/shared/types/analytics.ts) — this doc explains the *why* and what each event means. 
- -Two PostHog clients emit events: - -- **Renderer** (`posthog-js`) via `track(eventName, properties)` in [`src/renderer/utils/analytics.ts`](./src/renderer/utils/analytics.ts). -- **Main process** (`posthog-node`) via `trackAppEvent(eventName, properties)` in [`src/main/services/posthog-analytics.ts`](./src/main/services/posthog-analytics.ts). - -Both register a super-property `team: "posthog-code"` on every event. All event names and property types are defined in `ANALYTICS_EVENTS` and `EventPropertyMap` — adding a new event without entries there will fail typechecking. - ---- - -## Naming conventions - -### Event names - -- **Format**: `Object verbed` — Title Case, sentence-cased, spaces between words. -- **First word is the object** (`Task`, `Prompt`, `Branch`, `File`, `Setup discovery`, `Onboarding`). -- **Second word is a past-tense verb** (`created`, `viewed`, `sent`, `started`, `completed`, `failed`, `cancelled`). -- **Only the first word is capitalized.** Spell out abbreviations (`Pull request created`, not `PR created`). -- **Group by object, not by feature.** Prefer `Branch linked` over `Workspace branch linked`. -- **Use generic events with a discriminator property over many bespoke events** when the shape is the same — e.g. `Setting changed` with `setting_name` instead of `Theme changed` + `Font changed` + ... -- **Do not prefix events with `First`** — "first X" is always derivable in PostHog from the first occurrence of `X` per distinct ID. Emit `X`, not `First X`. - -✅ `Task created`, `Prompt sent`, `Setup discovery completed`, `Onboarding step completed` -❌ `task_created`, `TaskCreated`, `created_task`, `userClickedSendButton`, `PR created` - -### Property names - -- **snake_case**, lowercase, no leading underscore. -- **Booleans**: prefix with `is_`, `has_`, or `can_` (`is_initial`, `has_branch`, `has_uncommitted_changes`). -- **Counts**: suffix with `_count` (`event_count`, `staged_file_count`, `total_discovered`). 
-- **Durations / sizes**: suffix with the unit (`duration_seconds`, `entry_age_seconds`, `prompt_length_chars`). -- **IDs**: suffix with `_id` (`task_id`, `discovery_task_run_id`, `discovered_task_id`). -- **Enums**: suffix with `_type`, `_mode`, `_source`, `_kind`, `_reason`, `_action`, or use the bare noun if obvious (`category`, `region`). -- **Pairs**: when an event captures a transition, use `from_*` / `to_*` (`from_mode`, `to_mode`, `from_value`, `to_value`). - -✅ `task_id`, `is_initial`, `duration_seconds`, `prompt_length_chars`, `repository_provider` -❌ `taskId`, `initial`, `duration`, `promptLength`, `repo_provider_type` (redundant suffix) - -### Enum values - -- **snake_case strings**, lowercase. e.g. `"user_cancelled"`, `"stale_feature_flag"`. -- **Never `true`/`false` as a state value** — use a meaningful enum (`"completed"` / `"cancelled"` / `"failed"`, not `success: true/false` unless it really is just success). -- **Open-ended values are fine** when the set evolves freely (e.g. `setting_name`, `tour_id`). Closed enums get a TypeScript union in `analytics.ts`. - -### What does *not* go into properties - -- **No PII** in event names or property values. No email addresses, full names, file paths, prompt contents, or repo URLs. Hash if you need to dedupe (`path_hash`). -- **No free-form strings** when an enum will do. If you find yourself writing `category: "bug" | "security" | ...`, define the union once in `analytics.ts`. -- **No giant payloads.** If the value can be reconstructed from another event + an ID, store the ID. - -### Adding a new event - -1. Add the constant to `ANALYTICS_EVENTS` in [`src/shared/types/analytics.ts`](./src/shared/types/analytics.ts). -2. Add the property interface (even if empty — use `never` for no-prop events). -3. Register it in `EventPropertyMap`. -4. Call `track(ANALYTICS_EVENTS.MY_EVENT, { … })` in the renderer or `trackAppEvent(...)` in main. -5. Add a row to the catalog below. 
- ---- - -## Common properties - -These appear across many events and should always use the same name and type when present. - -| Property | Type | Meaning | -|---|---|---| -| `task_id` | `string` | The task UUID. | -| `task_run_id` | `string` | The agent run UUID inside a task. | -| `execution_type` | `"local" \| "cloud"` | Where the agent runs. | -| `adapter` | `"claude" \| "codex"` | Which agent SDK adapter is in use. | -| `repository_provider` | `"github" \| "gitlab" \| "local" \| "none"` | Source of the repo associated with the task. | -| `workspace_mode` | `"local" \| "worktree" \| "cloud"` | How files are checked out for the task. | -| `source` | enum per event | Where the action originated from (button, menu, keyboard, etc.). | -| `region` | `string` | PostHog region (`us`, `eu`, etc.). | -| `project_id` | `string` | PostHog project ID. | -| `step_id` | `string` | Onboarding step identifier — matches `ONBOARDING_STEPS`. | -| `duration_seconds` | `number` | Wall-clock duration of the action. | - ---- - -## Event catalog - -### App lifecycle (main process) - -| Event | Properties | -|---|---| -| `App started` | — | -| `App quit` | — | - -### Authentication - -| Event | Properties | -|---|---| -| `User logged in` | `project_id?`, `region?` | -| `User logged out` | — | - -### Onboarding - -The first-session funnel. `step_id` ∈ `welcome`, `project-select`, `invite-code`, `github`, `install-cli` — matches the values in [`src/renderer/features/onboarding/types.ts`](./src/renderer/features/onboarding/types.ts). 
- -| Event | Properties | -|---|---| -| `Onboarding started` | — | -| `Onboarding step viewed` | `step_id`, `step_index`, `total_steps` | -| `Onboarding step completed` | `step_id`, `step_index`, `total_steps`, `duration_seconds` | -| `Onboarding step skipped` | `step_id`, `step_index`, `reason` | -| `Onboarding sign in initiated` | `region` | -| `Onboarding project selected` | `had_multiple_orgs`, `had_multiple_projects` | -| `Onboarding invite code submitted` | `success`, `error_type?` | -| `Onboarding folder selected` | `has_git_remote`, `repository_provider` | -| `Onboarding github connected` | — | -| `Onboarding cli check completed` | `git_installed`, `gh_installed`, `gh_authenticated` | -| `Onboarding completed` | `duration_seconds`, `github_connected`, `cli_skipped` | -| `Onboarding abandoned` | `last_step_id`, `duration_seconds` | -| `Ai consent gate shown` | `is_org_admin` | -| `Ai consent approved` | — | - -#### First-session funnel - -``` -App opened - → Onboarding started (welcome screen mounts) - → Onboarding step viewed [welcome] - → Onboarding step completed [welcome] - → Onboarding step viewed [project-select] - → Onboarding sign in initiated (clicked OAuth button) - → User logged in - → Onboarding project selected - → Onboarding step completed [project-select] - → Onboarding step viewed [invite-code] (conditional) - → Onboarding invite code submitted - → Onboarding step completed [invite-code] - → Onboarding step viewed [github] - → Onboarding folder selected - → Onboarding github connected (optional) - → Onboarding step completed [github] - → Onboarding step viewed [install-cli] - → Onboarding cli check completed - → Onboarding step completed [install-cli] (or skipped) - → Onboarding completed - → Ai consent gate shown (conditional) - → Ai consent approved (conditional) - → Setup discovery started - → Setup discovery completed - → Prompt sent (first occurrence per user = first prompt) - → Task created (ACTIVATION; first occurrence = activation) 
-``` - -`Onboarding abandoned` fires when the user closes the app or logs out while inside `OnboardingFlow` (i.e. the last `Onboarding step viewed` has no matching `Onboarding step completed`). - -Activation cohort: distinct ID has both `Onboarding started` and `Task created` (with `created_from: "command-menu"`) within 24h. - -### Task management - -| Event | Properties | -|---|---| -| `Task created` | `auto_run`, `created_from`, `repository_provider?`, `workspace_mode?`, `has_branch?`, `has_environment_setup?`, `has_sandbox_environment?`, `cloud_run_source?`, `cloud_pr_authorship_mode?`, `uses_worktree_link?`, `uses_worktree_include?`, `adapter?` | -| `Task viewed` | `task_id` | -| `Inbox viewed` | — | -| `Task run started` | `task_id`, `execution_type`, `initial_mode?`, `adapter?`, `model?` | -| `Task run cancelled` | `task_id`, `execution_type`, `duration_seconds`, `prompts_sent` | -| `Prompt sent` | `task_id`, `is_initial`, `execution_type`, `prompt_length_chars` | -| `Session config changed` | `task_id`, `category`, `from_value`, `to_value` | -| `Task feedback` | `task_id`, `task_run_id?`, `log_url?`, `event_count`, `feedback_type`, `feedback_comment?` | - -### Permissions - -| Event | Properties | -|---|---| -| `Permission responded` | `task_id`, `tool_name?`, `option_id?`, `option_kind?`, `custom_input?` | -| `Permission cancelled` | `task_id`, `tool_name?`, `option_id?`, `option_kind?` | - -### Git / branch - -| Event | Properties | -|---|---| -| `Git action executed` | `action_type`, `success`, `task_id?`, `staged_file_count?`, `unstaged_file_count?`, `commit_all?`, `staged_only?` | -| `Pull request created` | `task_id?`, `success` | -| `Agent file activity` | `task_id`, `branch_name` | -| `Branch linked` | `task_id`, `branch_name`, `source` | -| `Branch unlinked` | `task_id`, `source` | -| `Branch link default branch unknown` | `task_id`, `branch_name` | -| `Branch mismatch warning shown` | `task_id`, `linked_branch`, `current_branch`, 
`has_uncommitted_changes` | -| `Branch mismatch action` | `task_id`, `action`, `linked_branch`, `current_branch` | - -`action_type` for `Git action executed`: `push`, `pull`, `sync`, `publish`, `commit`, `commit_push`, `create_pr`, `view_pr`, `update_pr`, `branch_here`. - -### Files / diffs - -| Event | Properties | -|---|---| -| `File opened` | `file_extension`, `source`, `task_id?` | -| `File diff viewed` | `file_extension`, `change_type`, `task_id?` | -| `Diff view mode changed` | `from_mode`, `to_mode` | - -### Navigation - -| Event | Properties | -|---|---| -| `Command menu opened` | — | -| `Command menu action` | `action_type` | -| `Command center viewed` | — | -| `Skill button triggered` | `task_id`, `button_id`, `source` | - -### Settings - -| Event | Properties | -|---|---| -| `Setting changed` | `setting_name`, `new_value`, `old_value?` | - -Generic event — `setting_name` is the discriminator (`theme`, `terminal_font`, `desktop_notifications`, etc.). - -### Tour - -| Event | Properties | -|---|---| -| `Tour event` | `tour_id`, `action`, `step_id?`, `step_index?`, `total_steps?` | - -`action` ∈ `started`, `step_advanced`, `dismissed`, `completed`. - -### Setup discovery - -| Event | Properties | -|---|---| -| `Setup discovery started` | `discovery_task_id`, `discovery_task_run_id` | -| `Setup discovery completed` | `discovery_task_id`, `discovery_task_run_id`, `task_count`, `duration_seconds`, `signal_source` | -| `Setup discovery failed` | `discovery_task_id?`, `discovery_task_run_id?`, `reason`, `error_message?` | -| `Setup task selected` | `discovered_task_id`, `category`, `position`, `total_discovered` | -| `Setup task dismissed` | `discovered_task_id`, `category`, `position`, `total_discovered` | - -`category` ∈ `bug`, `security`, `dead_code`, `duplication`, `performance`, `stale_feature_flag`, `error_tracking`, `event_tracking`, `funnel`, `posthog_setup`, `experiment`. 
- -### Billing - -| Event | Properties | -|---|---| -| `Subscription started` | `plan_key`, `previous_plan_key?` | -| `Subscription cancelled` | `plan_key` | - -### Inbox & prompt history - -| Event | Properties | -|---|---| -| `Inbox viewed` | — | -| `Inbox interest registered` | — | -| `Prompt history opened` | `entry_count` | -| `Prompt history selected` | `entry_count`, `entry_age_seconds`, `had_pending_draft`, `had_search_query`, `prompt_length` | diff --git a/notes/CLOUD_ARCHITECTURE.md b/notes/CLOUD_ARCHITECTURE.md deleted file mode 100644 index 9fd440828..000000000 --- a/notes/CLOUD_ARCHITECTURE.md +++ /dev/null @@ -1,759 +0,0 @@ -# Cloud Mode Architecture - -## The Challenge - -Cloud coding agents face a fundamental tension: you want them to feel like your laptop, but they're not. You want the experience of running locally—real-time feedback, files on your disk, your IDE, full control—but the convenience of interacting from your phone or Slack while you're away. - -This creates two distinct experiences: - -**Interactive Mode** — "I'm watching" - -- Real-time feedback as the agent works -- You can interrupt, redirect, answer questions -- Feels like pair programming - -**Background Mode** — "Wake me when it's done" - -- Agent works autonomously -- You check in when you're ready -- Review changes, pull them locally, continue -- Feels like delegating to a colleague - -Most cloud agent implementations force you to choose one or the other. The goal here is to support both seamlessly—and let you switch between them without friction. - -### Key Goals - -1. **Seamless handoff** — Move sessions between local and cloud without losing state -2. **Local-first feel** — Edit in PostHog Code or your IDE, changes sync automatically -3. **Survive disconnection** — Close your laptop, agent keeps working -4. **Seamless resume** — Reconnect and catch up instantly -5. **Multiple clients** — Laptop, phone, Slack, API—all work -6. 
**Simple recovery** — If sandbox dies, state is recoverable -7. **Resume anywhere** — Stop on cloud, resume on local (or vice versa) - ---- - -## Architecture Overview - -``` -┌─────────────────────────────────────────────────────────────────────────┐ -│ CLIENTS │ -│ PostHog Code Desktop │ Slack Bot │ API │ Mobile App │ -└─────────────────────────────────────────────────────────────────────────┘ - │ - │ Streamable HTTP (/sync) - │ POST (commands) + GET (SSE events) - ▼ -┌─────────────────────────────────────────────────────────────────────────┐ -│ POSTHOG BACKEND │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────┐ │ -│ │ /sync Endpoint │ │ -│ │ │ │ -│ │ POST /sync ──► Kafka ──┬──► SSE consumers (GET /sync) │ │ -│ │ └──► DynamoDB consumer (persistence) │ │ -│ └─────────────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ -│ │ Temporal │ │ Kafka │ │ Storage │ │ -│ │ Workflow │ │ (event bus) │ │ - DynamoDB │ │ -│ │ (lifecycle) │ │ │ │ - S3 (trees) │ │ -│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ -│ │ │ -└──────────┼──────────────────────────────────────────────────────────────┘ - │ provision_sandbox - ▼ -┌─────────────────────────────────────────────────────────────────────────┐ -│ SANDBOX (Docker/Modal) │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────┐ │ -│ │ @posthog/agent/server │ │ -│ │ │ │ -│ │ GET /sync ◄── receives commands (user_message, cancel, stop) │ │ -│ │ POST /sync ──► emits events (agent_message, tool_call, etc.) 
│ │ -│ │ │ │ -│ │ + ACP connection to Claude CLI subprocess │ │ -│ │ + TreeTracker for capturing file state → POST /sync → S3 │ │ -│ └───────────────────────────┬──────────────────────────────────────┘ │ -│ │ │ -│ │ ACP (Agent Client Protocol) │ -│ ▼ │ -│ ┌─────────────────────┐ │ -│ │ Claude CLI │ │ -│ │ (subprocess) │ │ -│ └─────────────────────┘ │ -│ │ │ -│ ▼ │ -│ ┌─────────────────────┐ │ -│ │ Git Repository │ │ -│ └─────────────────────┘ │ -└─────────────────────────────────────────────────────────────────────────┘ -``` - -**Data flow:** - -1. Client or Agent POSTs event to `/sync` -2. Backend publishes to Kafka -3. Kafka consumers: - - SSE consumers stream to connected clients (GET /sync) - - DynamoDB consumer persists events -4. For tree snapshots: Backend also uploads archive to S3 - -**Bidirectional /sync:** Both clients and agent use the same endpoint pattern: - -- **Clients:** POST commands (user_message, cancel) → GET events (agent responses) -- **Agent:** GET commands (user_message, cancel) → POST events (agent_message, tool_call, tree_snapshot) - -**Key insight:** Kafka is the event bus. All events flow through it. DynamoDB is just a consumer that persists for replay/resume. S3 stores tree archives (large binary snapshots). - ---- - -## Storage Architecture - -### Events → Kafka → Consumers - -All events flow through Kafka as the central event bus: - -``` -POST /sync ──► Backend ──► Kafka ──┬──► DynamoDB consumer (persistence) - └──► SSE consumers (real-time to clients) -``` - -**Why Kafka as the event bus:** - -- Decouples producers from consumers -- Multiple consumers can read independently (SSE, DynamoDB, analytics, etc.) 
-- Replay capability if a consumer falls behind -- Handles backpressure gracefully - -**Why agents/clients don't write directly to Kafka:** - -- Simpler implementation (just HTTP calls to /sync) -- Backend can add metadata, validate, rate-limit -- Single source of truth for event routing logic -- No Kafka credentials needed in sandbox - -### Tree Archives → S3 - -Tree archives (compressed working directory snapshots) still go to S3: - -``` -S3 Structure: - trees/ - {tree_hash}.tar.gz → compressed tree contents - {tree_hash}.manifest → file listing with hashes -``` - -**Why S3 for trees:** - -- Large binary blobs (tens/hundreds of MB) -- Infrequent access (only on resume) -- Cost-effective for storage - -### DynamoDB Schema - -**Table: `agent_events`** - -| Key | Type | Description | -| --------------------- | ------ | -------------------------------------------------- | -| `pk` (Partition Key) | String | `{task_id}#{run_id}` - groups all events for a run | -| `event_id` (Sort Key) | Number | Backend-assigned via internal counter per run. Stringified for SSE `id:` field. | - -**Event ID assignment:** The backend maintains an internal counter per `{task_id}#{run_id}` partition. This counter is incremented on each event received via POST /sync. The numeric ID is converted to a string for the SSE wire format (spec requires strings). When clients reconnect, they send `Last-Event-ID: "123"` which the backend parses back to a number for DynamoDB range queries. - -**Attributes:** - -| Attribute | Type | Description | -| ----------- | ----------------- | ------------------------------------------- | -| `version` | Number | Schema version (always 1 for now) | -| `timestamp` | String (ISO 8601) | When the event occurred | -| `method` | String | Event type (e.g., `_posthog/tree_snapshot`) | -| `params` | Map | Event-specific parameters | - -Team/user context comes from the task lookup in Postgres—no need to duplicate here. 
Note: Postgres is part of the existing PostHog infrastructure where task metadata lives; DynamoDB is specifically for the event stream storage. - -**Example event:** - -```json -{ - "pk": "task_123#run_456", - "event_id": 42, - "version": 1, - "timestamp": "2025-01-29T12:00:00Z", - "method": "_posthog/tree_snapshot", - "params": { - "treeHash": "abc123", - "baseCommit": "def456", - "device": { "id": "dev_1", "type": "local", "name": "MacBook Pro" } - } -} -``` - -**Access Patterns:** - -| Pattern | Operation | Key Condition | -| ---------------------------- | --------- | ------------------------------------------------------------ | -| Append event | `PutItem` | `pk={task_id}#{run_id}` | -| Get all events for run | `Query` | `pk={task_id}#{run_id}` | -| Get events after ID (resume) | `Query` | `pk={task_id}#{run_id}`, `event_id > {last_event_id}` | -| Get recent events (reverse) | `Query` | `pk={task_id}#{run_id}`, `ScanIndexForward=false`, `Limit=N` | -| Get latest by method | `Query` | `pk={task_id}#{run_id}`, `ScanIndexForward=false`, FilterExpression on `method` | - -**No GSI needed:** Task/run lookups happen in Postgres first, then DynamoDB is queried by the specific `pk`. No need to query DynamoDB across tasks. - -**Why DynamoDB:** - -- Serverless, scales automatically with demand -- Single-digit millisecond latency for key-value access -- Cost-effective for append-heavy workloads -- Simple key design matches access pattern perfectly - -**Retention:** Events are kept permanently (conversation history). Only S3 tree archives have TTL (30 days) since they're large and recoverable from git commits. - -**Capacity:** On-demand mode recommended. Typical task run generates ~100-1000 events. - ---- - -## Tree-Based Storage - -### Git Trees Instead of Individual Files - -Instead of uploading every file change, we use `git diff-tree` to capture state changes as trees. This is more efficient and aligns with how git already tracks changes. 
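As a rough sketch of the comparison step, the output of `git diff-tree -r --name-status <previousTree> <currentTree>` can be parsed into a list of changed paths. The `parseDiffTree` helper and the `TreeChange` type are hypothetical names used for illustration only; they are not the TreeTracker's actual API, and the sketch assumes paths without quoting (no `-z` handling).

```typescript
// Hypothetical helper for illustration; not part of @posthog/agent.
type ChangeStatus = "A" | "M" | "D" | "R" | "C" | "T";

interface TreeChange {
  status: ChangeStatus;
  path: string;
}

// Parses the output of `git diff-tree -r --name-status <oldTree> <newTree>`.
// Each line is "<status>\t<path>". When rename or copy detection is enabled
// (-M / -C), the status carries a similarity score ("R100") and the line has
// two paths, of which the last one is the destination.
function parseDiffTree(output: string): TreeChange[] {
  const changes: TreeChange[] = [];
  for (const line of output.split("\n")) {
    if (line.trim() === "") continue;
    const fields = line.split("\t");
    if (fields.length < 2) continue; // not a status line
    const status = fields[0].charAt(0) as ChangeStatus;
    changes.push({ status, path: fields[fields.length - 1] });
  }
  return changes;
}

const sample = "M\tsrc/auth.py\nA\tsrc/session.py\nR100\told.ts\tnew.ts\n";
console.log(parseDiffTree(sample));
```

Only paths whose status is not `D` would need to be packed into the archive; deletions can be recorded in the manifest instead.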
- -**Benefits:** - -- Atomic snapshots (entire working state, not individual files) -- Efficient transfer (only changed trees uploaded) -- Natural git integration (trees are git's native unit) -- Simpler recovery (restore a tree, not replay file events) - -### Tree Capture Flow - -``` -Agent works on files - │ - ▼ -TreeTracker detects significant change -(commit, tool completion, or periodic) - │ - ├──► git write-tree (capture current state) - │ - ├──► git diff-tree (compare to last snapshot) - │ - ├──► Pack changed files into tree archive - │ - └──► POST /sync with _posthog/tree_snapshot event - │ - ▼ - Backend handles: - ├──► PUT to S3: trees/{tree_hash}.tar.gz (archive only) - └──► Kafka ──┬──► DynamoDB consumer (persistence) - └──► SSE consumers (real-time to clients) -``` - -### When Trees Are Captured - -- After each git commit -- After significant tool completions (file writes, bash commands) -- On stop (final tree before shutdown) -- Periodically (every N minutes of activity) - ---- - -## Resume & State - -Since tree snapshots are captured continuously via POST /sync, we can resume from any point. There's no special "pause" operation—state just exists. - -### State = Task + Tree - -Everything needed to resume is in DynamoDB (events) and S3 (tree archives): - -```typescript -// From the latest tree_snapshot event in DynamoDB -interface ResumeState { - taskId: string; - baseCommit: string; // Git commit the tree is based on - treeHash: string; // The diff-tree reference - treeUrl: string; // S3 location of tree archive -} -``` - -To resume a task anywhere: - -1. Find task by `taskId` -2. Query DynamoDB for task run events via backend API -3. Find latest `tree_snapshot` event with `baseCommit` + `treeHash` + `treeUrl` -4. Download tree archive from S3 -5. 
Restore from there - -### Resume Flow - -``` -resumeFromLog(taskId, runId) called - │ - ├──► Fetch events from backend API (queries DynamoDB) - │ - ├──► Parse events to find latest tree_snapshot - │ - ├──► Return resume state: { latestSnapshot, interrupted } - │ - └──► Agent server sets TreeTracker to last known state - │ - ▼ - Agent continues where it left off -``` - -### Handoff Scenarios - -All handoffs are just: stop current environment, resume elsewhere. - -**Local → Cloud:** - -``` -Local PostHog Code Backend Cloud Sandbox - │ │ │ - │── stop local agent │ │ - │ (tree snapshot via │ │ - │ POST /sync) │ │ - │ │ │ - │── startCloud(task_id) ────►│ │ - │ │── provision sandbox ────────────►│ - │ │── start AgentServer ────────────►│ - │ │ │── resumeFromLog() - │ │◄── ready ────────────────────────│ - │◄── connected ──────────────│ │ -``` - -**Cloud → Local:** - -``` -Cloud Sandbox Backend Local PostHog Code - │ │ │ - │── stop() ─────────────────►│ │ - │ (final tree via │ │ - │ POST /sync) │ │ - │── shutdown ────────────────│ │ - │ │ │ - │ │◄── pullToLocal(task_id) ─────────│ - │ │ │── resumeFromLog() - │ │ │── restore from DynamoDB logs + S3 trees - │ │ │── continue locally -``` - -**Resume later (any environment):** - -``` -... time passes ... - │ - │── resumeFromLog(task_id) ──────► query DynamoDB, restore tree from S3 - │── continue working -``` - -### Robustness Requirements - -Resume must handle: - -1. **Partial uploads** — Tree upload must complete before stop confirms -2. **Large repos** — Stream tree archives, don't load in memory -3. **Network failures** — Retry with exponential backoff -4. **Conversation replay** — Rebuild conversation from log events -5. **Concurrent access** — Prevent two environments from running same task simultaneously - ---- - -## State & Recovery - -### Events in DynamoDB = Recovery - -Recovery is just `resumeFromLog(taskId, runId)`. 
DynamoDB has all events: - -``` -DynamoDB Events (pk = {task_id}#{run_id}): - - { method: "_posthog/git_commit", params: { sha: "abc123", device: { id: "dev_1", type: "local" } } } - { method: "_posthog/tree_snapshot", params: { treeHash: "def456", baseCommit: "abc123", device: { id: "dev_1", type: "local" }, ... } } - { method: "_posthog/user_message", params: { content: "..." } } - { method: "agent_message_chunk", params: { text: "..." } } - -- handoff to cloud -- - { method: "_posthog/git_commit", params: { sha: "ghi789", device: { id: "sandbox_x", type: "cloud" } } } - { method: "_posthog/tree_snapshot", params: { treeHash: "jkl012", device: { id: "sandbox_x", type: "cloud" }, ... } } -``` - -Device info is embedded in `params` for events that track it (snapshots, commits). Standard ACP events like `agent_message_chunk` don't need device tracking. Device changes are visible naturally in the event stream—no explicit handoff events needed. - -**To resume:** Query DynamoDB for latest `tree_snapshot` (query in reverse order, filter by method), download archive from S3, restore from it. - -**If tree expired in S3:** Fall back to latest `git_commit` (loses uncommitted work). - -### Trees vs Commits - -| Mechanism | When | What's captured | Durability | -| ------------- | ------------------------------- | -------------------------- | ---------------------------- | -| Tree snapshot | After tool completions, on stop | Working tree (uncommitted) | 30 days in S3 | -| Git commit | On significant changes | Committed files | Permanent (pushed to remote) | - -**Best practice:** Agent commits frequently so that even if trees expire, minimal work is lost. 
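The fallback rule above (latest tree snapshot, else latest commit) can be sketched as a pure function over the event log. This is a minimal illustration under stated assumptions: `LoggedEvent`, `RecoveryPoint`, and `pickRecoveryPoint` are hypothetical names, and expiry is estimated from the event timestamp and the 30-day TTL rather than by checking S3.

```typescript
// Hypothetical shapes modelled on the example events in this document.
interface LoggedEvent {
  event_id: number;
  timestamp: string; // ISO 8601
  method: string;
  params: Record<string, unknown>;
}

type RecoveryPoint =
  | { kind: "tree"; treeHash: string; eventId: number }
  | { kind: "commit"; sha: string; eventId: number }
  | { kind: "none" };

// Assumption: a tree archive is treated as expired 30 days after its event.
const TREE_TTL_MS = 30 * 24 * 60 * 60 * 1000;

function pickRecoveryPoint(events: LoggedEvent[], now: Date): RecoveryPoint {
  // Walk newest-first, mirroring a reverse-order query on event_id.
  const newestFirst = [...events].sort((a, b) => b.event_id - a.event_id);

  const snapshot = newestFirst.find(
    (e) =>
      e.method === "_posthog/tree_snapshot" &&
      typeof e.params.treeHash === "string" &&
      now.getTime() - Date.parse(e.timestamp) < TREE_TTL_MS,
  );
  if (snapshot) {
    return {
      kind: "tree",
      treeHash: snapshot.params.treeHash as string,
      eventId: snapshot.event_id,
    };
  }

  // No usable tree: fall back to the latest commit (uncommitted work is lost).
  const commit = newestFirst.find(
    (e) => e.method === "_posthog/git_commit" && typeof e.params.sha === "string",
  );
  if (commit) {
    return { kind: "commit", sha: commit.params.sha as string, eventId: commit.event_id };
  }
  return { kind: "none" };
}
```

Returning a tagged union keeps the caller explicit about which restore path it is on, so the "uncommitted work may be lost" case cannot be taken silently.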
- -### Data Retention - -| Data | Storage | Retention | Recovery | -| ------------- | ----------- | --------- | ---------------------------------------- | -| Git commits | Remote repo | Permanent | Always recoverable (committed work only) | -| Tree archives | S3 | 30 days | Full state including uncommitted | -| Event history | DynamoDB | Permanent | Conversation + history | - ---- - -## Agent Architecture - -### The AgentServer (in @posthog/agent) - -The agent server runs in cloud sandboxes (Docker/Modal). It lives in the `@posthog/agent` package: - -``` -packages/agent/src/server/ -├── agent-server.ts # Main AgentServer class -├── index.ts # CLI entry point + exports -└── types.ts # AgentServerConfig, DeviceInfo, TreeSnapshot -``` - -Exported via `@posthog/agent/server` subpath. - -### How It Works - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ AgentServer (cloud sandbox) │ -│ │ -│ ┌─────────────────────────────────────────────────────────┐ │ -│ │ SSE Connection │ │ -│ │ (GET /sync from backend) │ │ -│ │ │ │ -│ │ Receives: user_message, cancel, stop commands │ │ -│ └──────────────────────────┬───────────────────────────────┘ │ -│ │ │ -│ ▼ │ -│ ┌─────────────────────────────────────────────────────────┐ │ -│ │ ACP Connection │ │ -│ │ (to Claude CLI subprocess) │ │ -│ │ │ │ -│ │ clientConnection.prompt() → sessionUpdate callbacks │ │ -│ └──────────────────────────┬───────────────────────────────┘ │ -│ │ │ -│ ▼ │ -│ ┌─────────────────────────────────────────────────────────┐ │ -│ │ TreeTracker │ │ -│ │ (captures file state) │ │ -│ │ │ │ -│ │ After file changes → _posthog/tree_snapshot events │ │ -│ └──────────────────────────┬───────────────────────────────┘ │ -│ │ │ -│ ▼ │ -│ ┌─────────────────────────────────────────────────────────┐ │ -│ │ POST /sync │ │ -│ │ (persist events to backend) │ │ -│ └─────────────────────────────────────────────────────────┘ │ -└─────────────────────────────────────────────────────────────────┘ -``` - 
-### Key Methods - -- `start()` — Connect SSE, initialize ACP, resume from previous state, process initial prompt -- `stop()` — Capture final tree state, cleanup connections -- `handleUserMessage()` — Process user prompt via `clientConnection.prompt()` -- `captureTreeState()` — Capture and emit `_posthog/tree_snapshot` events - -### Dependencies - -- Internal modules: TreeTracker, resumeFromLog, CloudConnection -- `@agentclientprotocol/sdk` — ACP protocol (ClientSideConnection) - -### Message Types - -- `user_message` — New prompt or response to question -- `cancel` — Stop current operation -- `stop` — Shut down agent (writes final tree, then exits) - -**How commands reach the agent:** On startup, the agent server opens an outbound SSE connection to the backend (`GET /sync`). When a client sends a command via `POST /sync`, the backend publishes to Kafka. The agent's SSE consumer receives commands from Kafka, giving low-latency interactive feel. - -### Agent Resume API - -The AgentServer uses `resumeFromLog` to restore state: - -```typescript -// In packages/agent/src/server/agent-server.ts -private async resumeFromPreviousState(): Promise<void> { - const resumeState = await resumeFromLog({ - taskId, - runId, - repositoryPath, - apiClient, // PostHogAPIClient fetches logs from backend - logger, - }) - - if (resumeState.latestSnapshot) { - // Set tree tracker to continue from last known state - this.treeTracker.setLastTreeHash(resumeState.latestSnapshot.treeHash) - } -} -``` - -The `resumeFromLog` function: - -1. Fetches task run logs from the backend API (which reads from DynamoDB) -2. Parses NDJSON entries to find latest `_posthog/tree_snapshot` -3. Returns the resume state including latest snapshot and interrupted flag - -**Stop implementation:** - -```typescript -async stop(): Promise<void> { - // 1. Capture final tree state via POST /sync - await this.captureTreeState({ interrupted: true, force: true }) - - // 2. 
Clean up ACP connection - if (this.acpConnection) { - await this.acpConnection.cleanup() - } - - // 3. Close SSE connection - this.sseAbortController?.abort() -} -``` - -### Temporal Workflow - -Temporal handles **lifecycle only**, not message routing: - -```python -@workflow.defn -class CloudSessionWorkflow: - - @workflow.signal - def stop(self): - self.should_stop = True - - @workflow.run - async def run(self, input: SessionInput): - # Always provision fresh - resume logic is in the agent - sandbox_id = await provision_sandbox(input) - - # Agent handles resume(task_id) internally if resuming - await start_agent_server(sandbox_id, task_id=input.task_id) - - while not self.should_stop: - try: - await workflow.wait_condition( - lambda: self.should_stop, - timeout=timedelta(minutes=10) - ) - except asyncio.TimeoutError: - # Inactivity timeout - agent writes final tree on stop - break - - # Tell agent to stop (it will write final tree) - await stop_agent(sandbox_id) - await cleanup_sandbox(sandbox_id) -``` - -**Key behaviors:** - -- Temporal provisions sandbox and handles cleanup -- Messages/commands flow through Kafka to the agent's SSE connection (not through Temporal) -- Agent handles resume internally (reads state from backend API → DynamoDB logs + S3 trees) -- 10-min inactivity triggers stop -- Agent always writes tree on stop → always resumable - ---- - -## PostHog Code Integration - -In PostHog Code, the `AgentService` (main process) talks to agents through a provider interface. For cloud mode, we swap the provider without changing the rest of the app. 
- -``` -Renderer ──tRPC──► AgentService ──► SessionProvider - │ - ┌─────────────┴─────────────┐ - │ │ - ▼ ▼ - LocalProvider CloudProvider - (in-process SDK) (SSE to backend) -``` - -**The provider interface** (simplified): - -```typescript -interface SessionProvider { - readonly capabilities: SessionCapabilities; - readonly executionEnvironment: "local" | "cloud"; - - connect(config: SessionConfig): Promise<void>; - disconnect(): Promise<void>; - prompt(blocks: ContentBlock[]): Promise<{ stopReason: string }>; - cancelPrompt(): Promise<void>; - - onEvent(handler: (event: AcpMessage) => void): void; -} -``` - -**Key files:** - -Array packages: - -- `packages/agent/` — Core agent SDK (createAcpConnection, TreeTracker, CloudConnection, resumeFromLog) -- `packages/agent/src/server/` — Cloud sandbox runner (AgentServer class, exported via `@posthog/agent/server`) -- `packages/core/` — Shared business logic for jj/GitHub operations - -PostHog Code app: - -- `apps/code/src/main/services/agent/service.ts` — AgentService, picks provider type -- `apps/code/src/main/services/agent/providers/local-provider.ts` — Local ACP/SDK logic -- `apps/code/src/main/services/agent/providers/cloud-provider.ts` — Cloud SSE logic (uses CloudConnection) - -PostHog backend (not in this repo): - -- `products/tasks/backend/api.py` — /sync endpoint (POST + SSE) -- `products/tasks/backend/sync/router.py` — Kafka event routing -- `products/tasks/temporal/process_task/` — Temporal workflow for sandbox lifecycle - ---- - -## Communication Protocol - -### Streamable HTTP - -Following [MCP's pattern](https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#streamable-http): - -- **POST** — Client sends messages (user input, cancel, stop) -- **GET** — Client opens SSE stream for server events -- **Session-Id header** — Identifies the session (run ID) -- **Last-Event-ID header** — Resume from where you left off - -### Endpoint - -``` -/api/projects/{project_id}/tasks/{task_id}/runs/{run_id}/sync -``` - 
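A small sketch of how a client might assemble the endpoint path and the headers for the SSE stream. `SyncTarget`, `syncPath`, and `streamHeaders` are hypothetical helpers, not names from the codebase; the sketch assumes the project ID can be a string or a number and that the numeric event ID is sent as a string, as described for the SSE `id:` field.

```typescript
// Hypothetical helpers for illustration; not part of the PostHog Code codebase.
interface SyncTarget {
  projectId: string | number;
  taskId: string;
  runId: string;
}

// Builds the /sync path for one task run.
function syncPath(target: SyncTarget): string {
  const project = encodeURIComponent(String(target.projectId));
  const task = encodeURIComponent(target.taskId);
  const run = encodeURIComponent(target.runId);
  return `/api/projects/${project}/tasks/${task}/runs/${run}/sync`;
}

// Headers for GET /sync. Last-Event-ID is only sent when resuming, and the
// numeric event ID is stringified for the wire.
function streamHeaders(runId: string, lastEventId?: number): Record<string, string> {
  const headers: Record<string, string> = {
    Accept: "text/event-stream",
    "Session-Id": runId,
  };
  if (lastEventId !== undefined) {
    headers["Last-Event-ID"] = String(lastEventId);
  }
  return headers;
}

const target: SyncTarget = { projectId: 1, taskId: "task_123", runId: "run_456" };
console.log(syncPath(target), streamHeaders(target.runId, 123));
```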
-### Sending Messages (POST) - -```http -POST /sync -Content-Type: application/json -Session-Id: {run_id} - -{ - "jsonrpc": "2.0", - "method": "_posthog/user_message", - "params": { "content": "Please fix the auth bug" } -} -``` - -Response: `202 Accepted` - -### Receiving Events (GET) - -```http -GET /sync -Accept: text/event-stream -Session-Id: {run_id} -Last-Event-ID: 123 -``` - -```http -HTTP/1.1 200 OK -Content-Type: text/event-stream - -id: 124 -data: {"jsonrpc":"2.0","method":"_posthog/tree_snapshot","params":{"treeHash":"abc123","baseCommit":"def456","filesChanged":["src/auth.py"]}} - -id: 125 -data: {"jsonrpc":"2.0","method":"agent_message_chunk","params":{"text":"I found the issue..."}} -``` - -**Event replay:** When `Last-Event-ID` is provided, backend replays missed events from storage, then continues with live events. - -### Why SSE + Kafka + DynamoDB? - -- **Kafka** — Real-time event streaming, handles multiple consumers -- **DynamoDB** — Low-latency event storage with efficient key-based queries -- **SSE** — Works with load balancing, built-in resumability via `Last-Event-ID` -- No WebSocket state to manage across pods - -The backend handles all storage concerns. The agent and clients only interact via the `/sync` endpoint (POST to send, GET for SSE). - ---- - -## Client Modes - -### Interactive (Connected) - -``` -Client Backend Sandbox - │ │ │ - │── GET /sync (SSE) ────────────►│◄── GET /sync (SSE) ───────────│ - │ │ │ - │◄── tree_snapshot ──────────────│◄── POST /sync ───────────────│ - │◄── agent_message ──────────────│◄── POST /sync ───────────────│ - │ │ │ - │── POST /sync {message} ───────►│── (via Kafka) ───────────────►│ - │◄── 202 Accepted ───────────────│ │ -``` - -### Background (Disconnected) - -``` - Backend Sandbox - │ │ - │◄── agent keeps working ───────│ - │◄── POST /sync ───────────────│ - │ │ │ - │ ▼ │ - │ DynamoDB (events) │ - │ S3 (tree archives) │ - │ │ - │ (no client connected) │ -``` - -Agent continues autonomously. 
Events persist to DynamoDB via POST /sync → Kafka → DynamoDB consumer. - -### Resume (Reconnect) - -``` -Client Backend - │ │ - │── GET /sync ──────────────────►│ - │ Last-Event-ID: 50 │ - │ │── Query DynamoDB (event_id > 50) - │◄── id:51 (from DynamoDB) ──────│ Replay missed events - │◄── id:52 ──────────────────────│ - │◄── ... ────────────────────────│ - │◄── id:100 (live from Kafka) ───│ Switch to live stream -``` - -Client catches up from DynamoDB, then receives live events via Kafka. - ---- - -## Event Format - -We will use the standard ACP format being used already. Some key events we will need here are: - -**State tracking:** - -- `_posthog/tree_snapshot` — Working tree captured (includes treeHash, baseCommit, files list) -- `_posthog/git_commit` — Agent committed changes - -**Mode:** - -- `_posthog/mode_change` — Switched between interactive/background (background disables questions) - ---- - -## References - -- [MCP Streamable HTTP Transport](https://modelcontextprotocol.io/specification/2025-03-26/basic/transports) -- [Agent Client Protocol (ACP)](https://github.com/anthropics/acp) -- [Temporal Signals](https://docs.temporal.io/workflows#signal) -- [DynamoDB Documentation](https://docs.aws.amazon.com/dynamodb/) -- [Kafka Documentation](https://kafka.apache.org/documentation/)