Merged
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,19 @@
# Changelog

## 0.6.0 — 2026-06-18

### Added

- **TUI Planner Cockpit (`/plan <goal>`)** — the TUI is no longer just chat + a read-only monitor. `/plan` runs the Planner to preview a decomposition, gates on human approval (`y` / `n` / `edit` to drop sub-tasks), then dispatches into the daemon loop; progress shows in the Tasks tab. Available in both the blessed TUI (`chat`) and the readline chat. (INT-1572)
- `POST /api/plan/dispatch` — dual-path dispatch: with Linear configured it creates a parent issue + dependency-wired sub-issues and triggers a heartbeat (reusing the autonomous decomposition engine); otherwise it falls back to running each sub-task through the exec pipeline.
- **Web tools in the agentic loop** — `web_fetch` (keyless: URL → readable text) and `web_search` (pluggable backend: Tavily/Brave when `TAVILY_KEY`/`BRAVE_SEARCH_KEY` is set, else a keyless DuckDuckGo fallback) are now exposed to every adapter (openrouter/gpt/local), restoring the web capability the `claude -p` harness used to provide. Enabled by default (`webTools` option); disabled for the SWE-bench harness to keep the benchmark honest. (INT-1573)
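
The key-based backend choice described for `web_search` might look roughly like the sketch below. This is an illustration only, not the code in `webTools.ts`: the helper name `pickSearchBackend` and the explicit `env` parameter are hypothetical, and checking Tavily before Brave when both keys are set is an assumption.

```typescript
// Hypothetical sketch of key-based backend selection for web_search.
// Not the actual webTools.ts implementation.
type SearchBackend = 'tavily' | 'brave' | 'duckduckgo';

// `env` is passed in explicitly so the function is easy to test;
// real code would presumably read process.env instead.
function pickSearchBackend(env: Record<string, string | undefined>): SearchBackend {
  if (env.TAVILY_KEY) return 'tavily';       // assumption: Tavily wins when both keys are set
  if (env.BRAVE_SEARCH_KEY) return 'brave';
  return 'duckduckgo';                       // keyless fallback
}

const withBrave = pickSearchBackend({ BRAVE_SEARCH_KEY: 'bk' }); // 'brave'
const keyless = pickSearchBackend({});                           // 'duckduckgo'
```

An empty-string key is treated as unset in this sketch, so a blank environment variable falls through to the next backend.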

### Changed

- **Planner migrated off `claude -p`** — `runPlanner` now runs through the OpenSwarm agentic loop via the configured adapter (read-only, multi-turn) instead of shelling out to `claude -p --max-turns 1`. Completes the INT-1420 `claude -p` removal, drops the claude-binary dependency, and lets the planner read the codebase before decomposing. `PlannerResult` contract unchanged.
- Extracted `createSubIssuesWithDependencies()` from the autonomous runner so the `/plan` endpoint and `decomposeTask` share one sub-issue/dependency engine (no logic fork).
- Extracted `startExecTask()` in the web server so `POST /api/exec` and the `/plan` fallback share one exec-task lifecycle.

## 0.5.0 — 2026-06-11

### Added
5 changes: 5 additions & 0 deletions benchmarks/sweBench.ts
@@ -174,6 +174,9 @@ async function solveOne(inst: SweInstance, model: string): Promise<{ pred: Recor
model: diagModel,
timeoutMs: 900_000,
maxTurns: 50,
// Benchmark integrity — no web tools, or the model could just search up
// the instance's GitHub issue / gold patch instead of diagnosing.
webTools: false,
onLog: process.env.SWE_VERBOSE ? (l) => console.log(` [diag] ${l}`) : () => {},
});
// Even if the diagnoser edited files by mistake, the implementation phase starts from a clean base
@@ -214,6 +217,8 @@ async function solveOne(inst: SweInstance, model: string): Promise<{ pred: Recor
// run_tests.sh = docker cp + in-container pytest — the 30s default times
// out into a silent no-output failure the model reads as a broken env.
bashTimeoutMs: 240_000,
// Benchmark integrity — no web search of the gold patch / GitHub issue.
webTools: false,
onLog: process.env.SWE_VERBOSE ? (l) => console.log(` ${l}`) : () => {},
});

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "@intrect/openswarm",
-"version": "0.5.0",
+"version": "0.6.0",
"description": "Autonomous AI agent orchestrator — Claude, GPT, Codex, and local models (Ollama/LMStudio/llama.cpp)",
"license": "GPL-3.0",
"type": "module",
8 changes: 7 additions & 1 deletion src/adapters/agenticLoop.ts
@@ -7,6 +7,7 @@
// ============================================

import { TOOL_DEFINITIONS, executeToolCalls, createReadCache, type ToolCall, type ToolResult, type ToolDefinition } from './tools.js';
import { WEB_TOOL_DEFINITIONS } from './webTools.js';
import type { CliRunResult } from './types.js';

// ============ Token counting (ported from VEGA token_count.py) ============
@@ -115,6 +116,8 @@ export interface AgenticLoopOptions {
protectedFiles?: string[];
/** bash tool timeout — docker-based tests need minutes (default 30s) */
bashTimeoutMs?: number;
/** Expose web_fetch + web_search tools (default true). Disabled e.g. for SWE-bench integrity. */
webTools?: boolean;
}
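
As a rough illustration of how the `webTools` flag gates the tool list in this file, the sketch below mirrors the ternary added to `runAgenticLoop`. The `selectTools` helper, the trimmed `ToolDefinition` shape, and the placeholder core tool names are hypothetical stand-ins, not the project's real definitions.

```typescript
// Hypothetical, trimmed-down tool definition shape for illustration only.
interface ToolDefinition {
  type: 'function';
  function: { name: string; description: string };
}

const def = (name: string, description: string): ToolDefinition => ({
  type: 'function',
  function: { name, description },
});

// Stand-ins for TOOL_DEFINITIONS and WEB_TOOL_DEFINITIONS (core names are placeholders).
const CORE_TOOLS: ToolDefinition[] = [def('read_file', 'read a file'), def('edit_file', 'edit a file')];
const WEB_TOOLS: ToolDefinition[] = [def('web_fetch', 'URL to text'), def('web_search', 'search the web')];

// The default parameter applies when the caller passes `undefined`, which is
// what an adapter forwards when its own caller never set the option.
function selectTools(enableTools: boolean, webTools: boolean | undefined = true): ToolDefinition[] {
  if (!enableTools) return [];
  return webTools ? [...CORE_TOOLS, ...WEB_TOOLS] : CORE_TOOLS;
}

const names = (tools: ToolDefinition[]) => tools.map((t) => t.function.name);
const defaultNames = names(selectTools(true, undefined)); // core tools plus web tools
const benchNames = names(selectTools(true, false));       // core tools only
```

Defaulting to `true` is what lets the adapters forward `options.webTools` unchanged: an unset option still enables the web tools, while an explicit `false` (as in the SWE-bench harness) removes them.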

/** Loop execution result */
@@ -160,6 +163,7 @@ export async function runAgenticLoop(options: AgenticLoopOptions): Promise<Agent
nudgeMaxOnNoEdit = 0,
protectedFiles,
bashTimeoutMs,
webTools = true,
} = options;

const startTime = Date.now();
@@ -180,7 +184,9 @@ export async function runAgenticLoop(options: AgenticLoopOptions): Promise<Agent
`or absolute paths under this root. Do NOT use "/" or a bare repo name — those are outside the project and will be rejected.\n\n`;
messages.push({ role: 'user', content: cwdNote + prompt });

-const tools = enableTools ? TOOL_DEFINITIONS : [];
+const tools = enableTools
+? (webTools ? [...TOOL_DEFINITIONS, ...WEB_TOOL_DEFINITIONS] : TOOL_DEFINITIONS)
+: [];
const readCache = createReadCache(); // per-loop read cache (blocks duplicate reads)
let toolCallCount = 0;
let editToolCount = 0; // number of edit_file/write_file calls (for the no-edit guard)
1 change: 1 addition & 0 deletions src/adapters/gpt.ts
@@ -82,6 +82,7 @@ export class GptCliAdapter implements CliAdapter {
nudgeMaxOnNoEdit: options.nudgeMaxOnNoEdit,
protectedFiles: options.protectedFiles,
bashTimeoutMs: options.bashTimeoutMs,
webTools: options.webTools,
};

try {
1 change: 1 addition & 0 deletions src/adapters/local.ts
@@ -160,6 +160,7 @@ export class LocalModelAdapter implements CliAdapter {
nudgeMaxOnNoEdit: options.nudgeMaxOnNoEdit,
protectedFiles: options.protectedFiles,
bashTimeoutMs: options.bashTimeoutMs,
webTools: options.webTools,
};

try {
1 change: 1 addition & 0 deletions src/adapters/openrouter.ts
@@ -101,6 +101,7 @@ export class OpenRouterCliAdapter implements CliAdapter {
nudgeMaxOnNoEdit: options.nudgeMaxOnNoEdit,
protectedFiles: options.protectedFiles,
bashTimeoutMs: options.bashTimeoutMs,
webTools: options.webTools,
};

try {
11 changes: 11 additions & 0 deletions src/adapters/tools.ts
@@ -9,6 +9,7 @@ import fs from 'node:fs/promises';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import path from 'node:path';
import { webFetch, webSearch } from './webTools.js';

const execFileAsync = promisify(execFile);

@@ -394,6 +395,16 @@ export async function executeTool(
}
}

case 'web_fetch': {
const text = await webFetch(args.url);
return { tool_call_id: callId, content: text, is_error: text.startsWith('Invalid URL') || text.startsWith('Fetch ') };
}

case 'web_search': {
const text = await webSearch(args.query, args.max_results);
return { tool_call_id: callId, content: text, is_error: text.startsWith('Search failed') || text.startsWith('Invalid query') };
}

default:
return { tool_call_id: callId, content: `Unknown tool: ${name}`, is_error: true };
}
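
The two new cases above derive `is_error` from the prefix of the returned string. A minimal sketch of that convention, factored into one table, could look like this. The `ERROR_PREFIXES` table and `toToolResult` helper are hypothetical names, and the `ToolResult` shape is reduced to the three fields visible in the diff.

```typescript
// Reduced ToolResult shape: only the fields visible in the diff.
interface ToolResult {
  tool_call_id: string;
  content: string;
  is_error: boolean;
}

type WebToolName = 'web_fetch' | 'web_search';

// Prefixes copied from the startsWith checks in the diff.
const ERROR_PREFIXES: Record<WebToolName, string[]> = {
  web_fetch: ['Invalid URL', 'Fetch '],
  web_search: ['Search failed', 'Invalid query'],
};

// Wrap a web tool's string output, flagging it as an error when it
// begins with one of that tool's known error prefixes.
function toToolResult(name: WebToolName, callId: string, text: string): ToolResult {
  const is_error = ERROR_PREFIXES[name].some((prefix) => text.startsWith(prefix));
  return { tool_call_id: callId, content: text, is_error };
}

const ok = toToolResult('web_fetch', 'call_1', 'Example Domain');
const bad = toToolResult('web_search', 'call_2', 'Search failed: net down');
```

One trade-off of prefix sniffing is that legitimate page text that happens to begin with `Fetch ` would also be flagged as an error. Returning a tagged result from the web helpers would avoid that, at the cost of changing their string-only contract.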
2 changes: 2 additions & 0 deletions src/adapters/types.ts
@@ -60,6 +60,8 @@ export interface CliRunOptions {
protectedFiles?: string[];
/** bash tool timeout in ms (default 30s). Raise for docker-based tests that take minutes. */
bashTimeoutMs?: number;
/** Expose web_fetch + web_search tools (default true). Set false for SWE-bench integrity. */
webTools?: boolean;
}

/**
93 changes: 93 additions & 0 deletions src/adapters/webTools.test.ts
@@ -0,0 +1,93 @@
import { afterEach, describe, expect, it, vi } from 'vitest';
import { webFetch, webSearch, searchBackend, WEB_TOOL_DEFINITIONS } from './webTools.js';

afterEach(() => {
vi.unstubAllGlobals();
vi.unstubAllEnvs();
vi.clearAllMocks();
});

describe('WEB_TOOL_DEFINITIONS', () => {
it('exposes exactly web_fetch and web_search', () => {
expect(WEB_TOOL_DEFINITIONS.map((t) => t.function.name)).toEqual(['web_fetch', 'web_search']);
});
});

describe('webFetch', () => {
it('strips HTML to readable text', async () => {
vi.stubGlobal('fetch', vi.fn(async () =>
new Response(
'<html><body><h1>Hi</h1><script>bad()</script><p>world &amp; co</p></body></html>',
{ status: 200, headers: { 'content-type': 'text/html' } },
),
));
const out = await webFetch('https://example.com');
expect(out).toContain('Hi');
expect(out).toContain('world & co');
expect(out).not.toContain('<');
expect(out).not.toContain('bad()');
});

it('rejects a non-http URL without fetching', async () => {
const f = vi.fn();
vi.stubGlobal('fetch', f);
const out = await webFetch('ftp://x');
expect(out).toContain('Invalid URL');
expect(f).not.toHaveBeenCalled();
});

it('reports an HTTP error rather than throwing', async () => {
vi.stubGlobal('fetch', vi.fn(async () => new Response('nope', { status: 404, statusText: 'Not Found' })));
const out = await webFetch('https://example.com/x');
expect(out).toContain('404');
});
});

describe('webSearch — backend selection', () => {
it('defaults to duckduckgo with no keys', () => {
expect(searchBackend()).toBe('duckduckgo');
});

it('prefers Tavily when TAVILY_KEY is set', async () => {
vi.stubEnv('TAVILY_KEY', 'tk');
expect(searchBackend()).toBe('tavily');
const f = vi.fn(async () =>
new Response(JSON.stringify({ results: [{ title: 'T', url: 'https://t', content: 'snip' }] }), { status: 200 }),
);
vi.stubGlobal('fetch', f);
const out = await webSearch('q', 3);
expect(String(f.mock.calls[0][0])).toContain('api.tavily.com');
expect(out).toContain('T');
expect(out).toContain('https://t');
});

it('uses Brave when BRAVE_SEARCH_KEY is set', async () => {
vi.stubEnv('BRAVE_SEARCH_KEY', 'bk');
expect(searchBackend()).toBe('brave');
const f = vi.fn(async () =>
new Response(JSON.stringify({ web: { results: [{ title: 'B', url: 'https://b', description: 'd' }] } }), { status: 200 }),
);
vi.stubGlobal('fetch', f);
const out = await webSearch('q');
expect(String(f.mock.calls[0][0])).toContain('brave.com');
expect(out).toContain('B');
});

it('parses keyless DuckDuckGo HTML results', async () => {
const html =
'<a class="result__a" href="//duckduckgo.com/l/?uddg=https%3A%2F%2Fex.com%2Fa">Result A</a>' +
'<a class="result__snippet">snippet a</a>';
vi.stubGlobal('fetch', vi.fn(async () => new Response(html, { status: 200 })));
const out = await webSearch('q', 5);
expect(out).toContain('Result A');
expect(out).toContain('https://ex.com/a');
expect(out).toContain('snippet a');
});

it('returns an error string (does not throw) on backend failure', async () => {
vi.stubGlobal('fetch', vi.fn(async () => { throw new Error('net down'); }));
const out = await webSearch('q');
expect(out).toContain('Search failed');
expect(out).toContain('TAVILY_KEY'); // keyless hint
});
});
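
The `webFetch` implementation is not shown in this part of the diff. For orientation, here is a minimal sketch of an HTML-to-text step that would satisfy the "strips HTML to readable text" test above. The `htmlToText` name and the regex-based approach are assumptions. The real code may use a different technique, and regex stripping is not a full HTML parser.

```typescript
// Hypothetical sketch: reduce an HTML document to readable plain text.
function htmlToText(html: string): string {
  return html
    // Drop <script> and <style> elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    // Replace every remaining tag with a space so adjacent words stay separated.
    .replace(/<[^>]+>/g, ' ')
    // Decode a handful of common entities; &amp; goes last so that
    // "&amp;lt;" becomes the literal text "&lt;" rather than "<".
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&')
    // Collapse runs of whitespace and trim the ends.
    .replace(/\s+/g, ' ')
    .trim();
}

const text = htmlToText(
  '<html><body><h1>Hi</h1><script>bad()</script><p>world &amp; co</p></body></html>',
);
// text === 'Hi world & co'
```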