Merged
2 changes: 1 addition & 1 deletion docs/test-benchmark.md
@@ -86,7 +86,7 @@ pnpm test:bench --threshold 30 # exit 1 if wall-clock exceeds 30s (CI regression gate)

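## Current bottlenecks & follow-up optimizations

The 16-core wall-clock floor (~10s) is now gated by 8 integration tests that **actually spawn `node dist/cli.js`** (`workflow-cli` / `workflow-cli-ls-tail` / `preset-export-cli` / `worker-budget-cli` / `workflow-c0-isolation` / `seed-adapter` / `hook-installer` / `tmux-env-isolation`). They are **CPU-throughput-bound**: 14 vitest forks, plus one more node child process spawned per test case, oversubscribe the 16 cores, so per-file time jitters between 5 and 9s.
The 16-core wall-clock floor (~10s) is now gated by 7 integration tests that **actually spawn `node dist/cli.js`** (`workflow-cli` / `workflow-cli-ls-tail` / `preset-export-cli` / `workflow-c0-isolation` / `seed-adapter` / `hook-installer` / `tmux-env-isolation`). They are **CPU-throughput-bound**: 14 vitest forks, plus one more node child process spawned per test case, oversubscribe the 16 cores, so per-file time jitters between 5 and 9s.

- **(High leverage, with trade-offs) Run the CLI in-process.** Extract a testable `main(argv): Promise<number>` entry point from the 4138-line `src/cli.ts` so these 8 files call it in-process instead of spawning `node`. This cuts both the total serial work and the parallel contention, but it sacrifices the fidelity of "real binary startup / argv parsing / exit codes" and is a sizable refactor.
- **(CI side) `vitest --shard=i/N`** shards the suite across runners, shortening CI wall-clock (no effect locally).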
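
The in-process idea in the first bullet could look roughly like the sketch below. Only the `main(argv): Promise<number>` signature comes from the text above; the `commands` table, the `CommandHandler` type, and the `version` stand-in are hypothetical names used for illustration, not the project's actual structure.

```typescript
// Hypothetical sketch: everything except the main(argv) signature is an assumption.
type CommandHandler = (args: string[]) => Promise<number> | number;

// Handlers return an exit code instead of calling process.exit(),
// which is what makes them callable from a test process.
const commands = new Map<string, CommandHandler>([
  ['version', () => 0], // stand-in for a real subcommand
]);

async function main(argv: string[]): Promise<number> {
  const [command, ...rest] = argv;
  const handler = command === undefined ? undefined : commands.get(command);
  if (!handler) return 1; // unknown or missing command: non-zero exit code
  return handler(rest);
}
```

A test would then call `await main(['version'])` and assert on the returned code, while a thin wrapper around `main(process.argv.slice(2))` keeps the real binary's behavior for the few tests that still need to spawn it.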
17 changes: 17 additions & 0 deletions src/bot-registry.ts
@@ -100,6 +100,18 @@ export interface BotConfig {
*/
sandboxHidePaths?: string[];
backendType?: BackendType;
/**
* Max simultaneously-LIVE sessions for this bot. When the bot's live session
* count exceeds this, the idle-worker sweeper suspends its longest-idle,
* not-currently-busy sessions (resumable backends only) down to the cap — the
* worker AND the CLI are killed to reclaim memory, and the session
* cold-resumes from its on-disk transcript on the next message. Unset → the
* built-in default {@link DEFAULT_MAX_LIVE_WORKERS} (30); an explicit positive
* integer overrides it. Pure count-based: there is NO idle-time threshold.
* Configured per bot from the dashboard (Groups & Bots → bot card). Adopted
* sessions are never suspended. See core/idle-worker-sweeper.ts.
*/
maxLiveWorkers?: number;
workingDir?: string;
workingDirs?: string[];
allowedUsers?: string[];
@@ -752,6 +764,11 @@ export function parseBotConfigsFromText(jsonText: string): BotConfig[] {
? entry.sandboxHidePaths.filter((p: unknown): p is string => typeof p === 'string' && !!p.trim())
: [],
backendType: entry.backendType,
// Positive integer only; ≤0 / non-int / absent → undefined (= unset; the
// sweeper then falls back to DEFAULT_MAX_LIVE_WORKERS).
maxLiveWorkers: typeof entry.maxLiveWorkers === 'number'
&& Number.isInteger(entry.maxLiveWorkers) && entry.maxLiveWorkers > 0
? entry.maxLiveWorkers
: undefined,
workingDir: workingDirs?.[0] ?? entry.workingDir,
workingDirs,
allowedUsers: entry.allowedUsers,
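
The `maxLiveWorkers` normalization in the hunk above can be isolated as a small pure function. This is a minimal sketch; `normalizeMaxLiveWorkers` is a hypothetical helper name, since the PR inlines the expression inside `parseBotConfigsFromText`.

```typescript
// Hypothetical helper (the PR inlines this logic): keep only positive integers,
// map everything else to undefined so the field counts as "unset".
function normalizeMaxLiveWorkers(value: unknown): number | undefined {
  return typeof value === 'number' && Number.isInteger(value) && value > 0
    ? value
    : undefined;
}
```

Under this rule a numeric string such as `'12'` is rejected rather than coerced, so a hand-edited config with a quoted number behaves as if the field were absent.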
90 changes: 1 addition & 89 deletions src/cli.ts
@@ -16,7 +16,6 @@
* botmux delete <id> — close a session by ID prefix
* botmux delete all — close all active sessions
* botmux autostart enable|disable|status — manage boot-time autostart (launchd / user systemd)
* botmux worker-budget status|set|unset — inspect/override idle worker suspension budget
*/
import { execSync, execFileSync, spawnSync, spawn } from 'node:child_process';
import { existsSync, mkdirSync, copyFileSync, readFileSync, writeFileSync, renameSync, readdirSync, readlinkSync, appendFileSync, statSync, unlinkSync } from 'node:fs';
@@ -71,8 +70,7 @@ import {
} from './utils/bot-routing.js';
import { isLocale, localeForBot, setDefaultLocale, SUPPORTED_LOCALES, t, type Locale } from './i18n/index.js';
import { type Brand, chatAppLink, larkHosts, normalizeBrand, sdkDomain } from './im/lark/lark-hosts.js';
import { mergeGlobalConfig, readGlobalConfig, setGlobalLocale, globalConfigPath, type WorkerConfig } from './global-config.js';
import { detectWorkerResources, resolveWorkerBudget } from './core/worker-budget.js';
import { mergeGlobalConfig, readGlobalConfig, setGlobalLocale, globalConfigPath } from './global-config.js';
import { buildBridgeSendMarkerContent } from './services/bridge-fallback-gate.js';
import { writeManualIntentIfAbsentTo } from './services/restart-intent-store.js';

@@ -2591,9 +2589,6 @@ botmux v${getVersion()} — IM ↔ AI 编程 CLI 桥接
autostart enable 注册开机自启(macOS launchd / Linux user systemd,无需 sudo)
autostart disable 注销开机自启
autostart status 查看自启状态
worker-budget [status] 查看 idle worker 自动暂停预算
set --max-live-workers N [--idle-minutes N]
覆盖全局 worker 预算,写入 ~/.botmux/config.json(agent 推荐用命令改,不手写 JSON)
unset 清除 worker 预算覆盖,恢复按机器 CPU/内存自动推导
lang [zh|en] 切换 UI 语言(无参 = 查看当前设置)
--bot N 仅改 bots.json 中第 N 个 bot 的 lang
@@ -4953,88 +4948,6 @@ async function cmdLang(args: string[]): Promise<void> {
await reportLocaleApplied();
}

// ─── botmux worker-budget ───────────────────────────────────────────────────

function parsePositiveInt(value: string | undefined, label: string): number {
const n = Number(value);
if (!Number.isInteger(n) || n <= 0) {
console.error(`${label} must be a positive integer.`);
process.exit(1);
}
return n;
}

function formatGib(bytes: number): string {
return `${(bytes / 1024 ** 3).toFixed(1)}GiB`;
}

function cmdWorkerBudget(args: string[]): void {
const sub = (args[0] ?? 'status').toLowerCase();
if (sub === '--help' || sub === '-h' || sub === 'help') {
console.log(`Usage:
botmux worker-budget [status]
botmux worker-budget set --max-live-workers <n> [--idle-minutes <n>|--idle-ms <n>]
botmux worker-budget unset`);
return;
}

if (sub === 'status') {
const cfg = readGlobalConfig();
const resources = detectWorkerResources();
const budget = resolveWorkerBudget(cfg.worker, resources);
console.log('Worker budget');
console.log(` maxLiveWorkers: ${budget.maxLiveWorkers} (${budget.maxLiveWorkersSource})`);
console.log(` idleSuspendMs: ${budget.idleSuspendMs} (${budget.idleSuspendMsSource})`);
console.log(` auto baseline: ${budget.autoMaxLiveWorkers} from cpu=${resources.cpuCount}, memory=${formatGib(resources.memoryBytes)}`);
console.log(` Config file: ${globalConfigPath()}`);
console.log('');
console.log('Agent-safe edit commands:');
console.log(' botmux worker-budget set --max-live-workers 12 --idle-minutes 45');
console.log(' botmux worker-budget unset');
return;
}

if (sub === 'set') {
const rest = args.slice(1);
const maxLive = argValue(rest, '--max-live-workers', '--max-live');
const idleMs = argValue(rest, '--idle-ms', '--idle-suspend-ms');
const idleMinutes = argValue(rest, '--idle-minutes', '--idle-min');
if (maxLive === undefined && idleMs === undefined && idleMinutes === undefined) {
console.error('Usage: botmux worker-budget set --max-live-workers <n> [--idle-minutes <n>|--idle-ms <n>]');
process.exit(1);
}
if (idleMs !== undefined && idleMinutes !== undefined) {
console.error('Use only one of --idle-ms or --idle-minutes.');
process.exit(1);
}

const current = readGlobalConfig().worker ?? {};
const next: WorkerConfig = { ...current };
if (maxLive !== undefined) next.maxLiveWorkers = parsePositiveInt(maxLive, '--max-live-workers');
if (idleMs !== undefined) next.idleSuspendMs = parsePositiveInt(idleMs, '--idle-ms');
if (idleMinutes !== undefined) next.idleSuspendMs = parsePositiveInt(idleMinutes, '--idle-minutes') * 60_000;

mergeGlobalConfig({ worker: next });
const budget = resolveWorkerBudget(next);
console.log('✅ Updated worker budget.');
console.log(` maxLiveWorkers: ${budget.maxLiveWorkers} (${budget.maxLiveWorkersSource})`);
console.log(` idleSuspendMs: ${budget.idleSuspendMs} (${budget.idleSuspendMsSource})`);
console.log(` Config file: ${globalConfigPath()}`);
console.log('Daemon reads this on the next idle-worker sweep; restart also picks it up.');
return;
}

if (sub === 'unset' || sub === 'clear') {
mergeGlobalConfig({ worker: null });
console.log('✅ Cleared worker budget override; daemon will use the auto-derived budget.');
console.log(` Config file: ${globalConfigPath()}`);
return;
}

console.error('Usage: botmux worker-budget [status|set|unset]');
process.exit(1);
}

// ─── botmux preset ────────────────────────────────────────────────────────────

/**
@@ -5403,7 +5316,6 @@ switch (command) {
case 'quoted': await cmdQuoted(process.argv.slice(3)); break;
case 'lang': await cmdLang(process.argv.slice(3)); break;
case 'voice': await cmdVoiceSetup(process.argv.slice(3)); break;
case 'worker-budget': cmdWorkerBudget(process.argv.slice(3)); break;
case 'thread': {
// Removed in favor of `botmux history` (also works in regular groups). Friendly stderr so
// pre-rename scripts/skills surface the rename instead of "unknown command".
39 changes: 38 additions & 1 deletion src/core/dashboard-ipc-server.ts
@@ -17,7 +17,7 @@ import * as cardPrefsStore from '../services/card-prefs-store.js';
import * as observedBotsStore from '../services/observed-bots-store.js';
import { getDeploymentIdentity } from '../services/deployment-identity.js';
import * as grantPrefsStore from '../services/grant-prefs-store.js';
import { findConfigField, applyConfigField } from '../services/bot-config-store.js';
import { findConfigField, applyConfigField, coerceConfigValue } from '../services/bot-config-store.js';
import { config } from '../config.js';
import { computeSandboxDiff, applySandboxDiff } from '../services/sandbox-land.js';
import { readRawConfig, findEntryIndex, requireConfigPath } from '../services/config-store.js';
@@ -845,6 +845,11 @@ ipcRoute('GET', '/api/bot-default-oncall', async (_req, res) => {
const grantPrefs = grantPrefsStore.getBotGrantPrefs(cachedLarkAppId);
let p2pMode: 'thread' | 'chat' = 'thread';
try { if (getBot(cachedLarkAppId).config.p2pMode === 'chat') p2pMode = 'chat'; } catch { /* default thread */ }
let maxLiveWorkers: number | null = null;
try {
const m = getBot(cachedLarkAppId).config.maxLiveWorkers;
if (typeof m === 'number' && Number.isInteger(m) && m > 0) maxLiveWorkers = m;
} catch { /* unset: falls back to the built-in default cap */ }
jsonRes(res, 200, {
larkAppId: cachedLarkAppId,
botName: getBotName(),
@@ -864,6 +869,7 @@ ipcRoute('GET', '/api/bot-default-oncall', async (_req, res) => {
restrictGrantCommands: grantPrefs.restrictGrantCommands,
messageQuotaDefaultLimit: grantPrefs.messageQuotaDefaultLimit,
p2pMode,
maxLiveWorkers,
});
});

@@ -968,6 +974,37 @@ ipcRoute('PUT', '/api/bot-p2p-mode', async (req, res) => {
jsonRes(res, 200, { ok: true, p2pMode: value ?? 'thread' });
});

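// Per-bot cap on simultaneously live sessions (maxLiveWorkers). Body `{ maxLiveWorkers: number | null }`:
// • positive integer → sets the cap; once exceeded, the idle-worker sweeper suspends the least-recently-used sessions down to the cap
// • null → clears it (falls back to the built-in default of 30)
// Goes through applyConfigField (the same disk-write + in-memory hot-update path as /config): the sweeper reads
// the live bot.config.maxLiveWorkers every minute, so a change takes effect without a restart.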
ipcRoute('PUT', '/api/bot-max-live-workers', async (req, res) => {
if (!cachedLarkAppId) return jsonRes(res, 503, { error: 'larkAppId_not_set' });
let raw: unknown;
try { raw = await readJsonBody(req); }
catch { return jsonRes(res, 400, { ok: false, error: 'bad_json' }); }
if (typeof raw !== 'object' || raw === null || Array.isArray(raw)) {
return jsonRes(res, 400, { ok: false, error: 'no_valid_fields' });
}
const body = raw as { maxLiveWorkers?: unknown };
const spec = findConfigField('maxLiveWorkers');
if (!spec) return jsonRes(res, 500, { ok: false, error: 'spec_missing' });

// null (including JSON null) = clear the cap; a number goes through coerce, which validates that it is a positive integer.
let value: number | null;
if (body.maxLiveWorkers === null || body.maxLiveWorkers === undefined) {
value = null;
} else {
const c = coerceConfigValue(spec, body.maxLiveWorkers);
if (!c.ok || typeof c.value !== 'number') return jsonRes(res, 400, { ok: false, error: 'invalid_number' });
value = c.value;
}
const r = await applyConfigField(cachedLarkAppId, spec, value);
if (!r.ok) return jsonRes(res, 400, { ok: false, error: r.reason });
jsonRes(res, 200, { ok: true, maxLiveWorkers: value });
});

// Per-bot file-sandbox toggle. Body `{ enabled: boolean }`. When on, this bot's
// CLI sessions run inside a per-session bwrap file sandbox (Linux). For oncall
// bots shared with semi-trusted users.
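
The body handling of `PUT /api/bot-max-live-workers` above can be modeled as a pure function, which makes the three outcomes (clear, set, reject) easy to see. This is a simplified stand-in, not the route's code: `parseMaxLiveWorkersBody` and `ParsedBody` are hypothetical names, and the positive-integer check is an assumption about what `coerceConfigValue` enforces for this field, based only on the route's own comment.

```typescript
// Hypothetical model of the route's validation; the real handler delegates
// the number check to coerceConfigValue, whose exact rules are assumed here.
type ParsedBody =
  | { ok: true; value: number | null }
  | { ok: false; error: 'no_valid_fields' | 'invalid_number' };

function parseMaxLiveWorkersBody(raw: unknown): ParsedBody {
  // Only a plain JSON object is accepted as the body.
  if (typeof raw !== 'object' || raw === null || Array.isArray(raw)) {
    return { ok: false, error: 'no_valid_fields' };
  }
  const { maxLiveWorkers } = raw as { maxLiveWorkers?: unknown };
  // null or a missing field both clear the cap.
  if (maxLiveWorkers === null || maxLiveWorkers === undefined) {
    return { ok: true, value: null };
  }
  if (typeof maxLiveWorkers === 'number' && Number.isInteger(maxLiveWorkers) && maxLiveWorkers > 0) {
    return { ok: true, value: maxLiveWorkers };
  }
  return { ok: false, error: 'invalid_number' };
}
```

In this model an empty object `{}` clears the cap just like `{ "maxLiveWorkers": null }`, mirroring the `=== undefined` branch in the handler.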
54 changes: 39 additions & 15 deletions src/core/idle-worker-sweeper.ts
@@ -1,35 +1,58 @@
import type { DaemonSession } from './types.js';
import { readGlobalConfig } from '../global-config.js';
import { DEFAULT_IDLE_SUSPEND_MS, resolveWorkerBudget, type ResolvedWorkerBudget } from './worker-budget.js';
import { suspendWorker } from './worker-pool.js';
import { isSuspendableBackendType } from './persistent-backend.js';

/**
* Default per-bot live-session cap applied when a bot has no explicit
* `maxLiveWorkers` configured. Keeps memory bounded out of the box: beyond this
* many live sessions, the least-recently-used ones are suspended (CLI freed,
* cold-resumes from transcript on the next message). A bot can override it from
* the dashboard. NOTE: the dashboard help copy hardcodes this number
* ('botDefaults.maxLiveWorkers*' i18n) — keep them in sync.
*/
export const DEFAULT_MAX_LIVE_WORKERS = 30;

export interface IdleWorkerSweepOptions {
now?: number;
workerBudget?: Pick<ResolvedWorkerBudget, 'maxLiveWorkers' | 'idleSuspendMs'>;
/**
* Explicit per-bot cap for THIS bot (one daemon = one bot, so the whole
* `activeSessions` map belongs to a single bot). `undefined` (bot unset) →
* fall back to {@link DEFAULT_MAX_LIVE_WORKERS}. `≤0` → no cap (escape hatch:
* never suspend).
*/
maxLiveWorkers?: number;
}

export interface IdleWorkerSweepResult {
sessionId: string;
reason: string;
}

export const DEFAULT_IDLE_WORKER_MS = DEFAULT_IDLE_SUSPEND_MS;

function liveWorkers(activeSessions: Map<string, DaemonSession>): DaemonSession[] {
return [...activeSessions.values()].filter(ds => !!ds.worker && !ds.worker.killed);
}

/**
* Count-based live-worker cap. When this bot has more live workers than its
* configured `maxLiveWorkers`, suspend its longest-idle (by lastMessageAt),
* not-currently-busy, resumable-backend sessions down to the cap. The worker and
* the CLI are both killed to reclaim memory; the next message / terminal open
* re-forks the worker, which cold-resumes from the on-disk transcript
* (daemon.ts worker-null resume path).
*
* Deliberately has NO idle-time threshold: 申晗's policy is "while resources
* allow, never time out an old session" — suspension only kicks in to enforce
* an explicit per-bot count cap. The only guard kept is correctness, not a
* timeout: a session that is mid-turn (`lastScreenStatus !== 'idle'`) is never
* suspended so an in-flight reply is never interrupted. If every over-cap
* session is busy, none are suspended this round and the next sweep retries.
*/
export function sweepIdleWorkers(
activeSessions: Map<string, DaemonSession>,
opts: IdleWorkerSweepOptions = {},
): IdleWorkerSweepResult[] {
const now = opts.now ?? Date.now();
const budget = opts.workerBudget ?? resolveWorkerBudget(readGlobalConfig().worker);
const maxLiveWorkers = budget.maxLiveWorkers;
const idleMs = budget.idleSuspendMs;
const cap = opts.maxLiveWorkers ?? DEFAULT_MAX_LIVE_WORKERS;
if (cap <= 0) return []; // explicit ≤0 = unlimited escape hatch
const running = liveWorkers(activeSessions);
if (running.length <= maxLiveWorkers) return [];
if (running.length <= cap) return [];

const candidates = running
// Never suspend an adopted session. forkAdoptWorker stamps its
@@ -42,16 +65,17 @@ export function sweepIdleWorkers(
// marker so a restored adopt session is excluded too.
.filter(ds => !ds.adoptedFrom && !ds.session.adoptedFrom)
.filter(ds => isSuspendableBackendType(ds.initConfig?.backendType))
// Correctness guard (not a timeout): never suspend a session that is
// currently producing output — that would cut off an in-flight reply.
.filter(ds => ds.lastScreenStatus === 'idle')
.filter(ds => now - (ds.lastMessageAt || 0) >= idleMs)
.sort((a, b) => (a.lastMessageAt || 0) - (b.lastMessageAt || 0));

const suspended: IdleWorkerSweepResult[] = [];
let liveCount = running.length;
for (const ds of candidates) {
if (liveCount <= maxLiveWorkers) break;
if (!suspendWorker(ds, 'idle_worker_budget')) continue;
suspended.push({ sessionId: ds.session.sessionId, reason: 'idle_worker_budget' });
if (liveCount <= cap) break;
if (!suspendWorker(ds, 'live_worker_cap')) continue;
suspended.push({ sessionId: ds.session.sessionId, reason: 'live_worker_cap' });
liveCount--;
}
return suspended;
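
The count-based selection in `sweepIdleWorkers` can be illustrated with a stripped-down model. This is a sketch under simplifying assumptions: `SessionLike`, `pickSessionsToSuspend`, and `DEFAULT_CAP` are hypothetical names, the adopted-session and backend-type filters are omitted, and every suspension is assumed to succeed (the real loop skips a session when `suspendWorker` returns false).

```typescript
// Hypothetical simplified model of the sweeper's selection logic.
interface SessionLike {
  id: string;
  lastMessageAt: number; // epoch ms of the last message
  busy: boolean;         // true while a reply is in flight
}

const DEFAULT_CAP = 30; // mirrors DEFAULT_MAX_LIVE_WORKERS in the PR

function pickSessionsToSuspend(sessions: SessionLike[], maxLiveWorkers?: number): string[] {
  const cap = maxLiveWorkers ?? DEFAULT_CAP;
  if (cap <= 0) return [];           // explicit ≤0 = never suspend
  const excess = sessions.length - cap;
  if (excess <= 0) return [];        // at or under the cap: nothing to do
  return sessions
    .filter(s => !s.busy)            // never interrupt an in-flight reply
    .sort((a, b) => a.lastMessageAt - b.lastMessageAt) // longest-idle first
    .slice(0, excess)                // only as many as needed to reach the cap
    .map(s => s.id);
}
```

With four live sessions and a cap of 2, the two longest-idle non-busy sessions are picked. If too few sessions are idle, fewer are picked and the count stays over the cap until a later sweep.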
9 changes: 9 additions & 0 deletions src/core/session-manager.ts
@@ -796,6 +796,15 @@ export async function restoreActiveSessions(activeSessions: Map<string, DaemonSe
const probe = probePersistentSession(backendType, backendName);
if (probe === 'missing') {
const tag = ds.session.sessionId.substring(0, 8);
// Intentionally cold-resume-suspended (idle-worker sweeper killed the
// backing session + CLI to reclaim memory over the per-bot live cap). The
// 'missing' backing is EXPECTED here, not a zombie — keep the worker-less
// active record so the next message cold-resumes from the transcript
// (forkWorker(resume=true) clears the marker once the worker is back).
if (ds.session.suspendedColdResume) {
logger.info(`[${tag}] ${backendType} session was cap-suspended — keeping active for lazy cold-resume`);
continue;
}
// 'missing' is ambiguous: it means EITHER this one pane is gone while the
// server runs (a true solo zombie) OR the whole multiplexer server is down
// (e.g. machine reboot) and every pane vanished at once. Only the former is