List view
CodeWhale v0.8.60 release cycle. Target: ~15 issues.
No due date•0/14 issues closedCodeWhale v0.8.59 release cycle. Target: ~15 issues.
No due date•0/12 issues closedCodeWhale v0.8.58 release cycle. Target: ~15 issues.
No due date•1/12 issues closedCodeWhale v0.8.57 release cycle. Target: ~15 issues.
No due date•1/15 issues closedCodeWhale v0.8.56 release cycle. Target: ~15 issues.
No due date•0/13 issues closedCodeWhale v0.8.55 release cycle. Target: ~15 issues.
No due date•1/15 issues closedCodeWhale v0.8.54 release cycle. Target: ~15 issues.
No due date•1/15 issues closedCodeWhale v0.8.53 release cycle. Target: ~15 issues.
No due date•0/15 issues closedCodeWhale v0.8.52 release cycle. Target: ~15 issues.
No due date•0/15 issues closedCodeWhale v0.8.51 release cycle. Target: ~15 issues.
No due date•3/15 issues closedCodeWhale v0.8.50 release cycle. Target: ~15 issues.
No due date•0/15 issues closedCodeWhale v0.8.49 release cycle. Target: ~15 issues.
No due date•5/15 issues closedFirst run, defaults, visual hierarchy, task finales, website / community front door, contributor guidance, and release smoke feel composed, legible, and exact. The project's visual identity locks in: an orange / blue / white work-state palette against a dark navy / near-black background, used to encode state rather than decorate. Drop the legacy `deepseek` and `deepseek-tui` binary shims now that the rename has baked through the v0.8.x line. ## In scope - **First-run wizard.** A short structured wizard for new users: provider setup (DeepSeek, Hugging Face Inference Providers, OpenRouter, or a local runtime via the Serving Workset path), model choice, execution-mode default (Plan / Agent / YOLO), `--model auto` opt-in, theme, optional memory-store opt-in, optional verifier opt-in. Replaces an unguided "just start typing" first turn. - **Control panel polish.** `/config` becomes a control panel: modes, permissions, provider, model, thinking, memory, MCP, theme — every setting inspectable, adjustable, and accompanied by a "what does this do?" line. - **Visual direction (project identity).** The orange / blue / white work-state palette (against dark navy / near-black background) becomes the project's visual identity. The palette encodes state, not decoration: - **Blue** owns frame, control, selection, borders, taskbar, stable system state. - **White** owns readable content, primary text, final answers, clean surfaces. - **Orange** owns active reasoning, live progress, Fin wakeup/verifier checks, warnings, and goal momentum — appears only when Brother Whale is actively doing something. - **Dark navy / near-black** is the background that keeps the palette crisp instead of loud. The feeling is crisp, cold, useful, kinetic — not a sports-brand parody, not gradient overload. Restraint is the brief. Affordance, not nostalgia. Other themes still ship; this one becomes the default. - **Shim-drop.** Remove `[[bin]] deepseek` from `crates/cli/Cargo.toml` and `[[bin]] deepseek-tui` from `crates/tui/Cargo.toml`. Drop `npm/deepseek-tui/` package directory. Drop `deepseek-artifacts-sha256.txt` alias from `release.yml`; stop building `deepseek-*` release assets. Drop `CODEWHALE_*=DEEPSEEK_TUI_*` legacy env-var aliases (TUI-brand vars only — DeepSeek API vars stay forever). - **Governance + community docs.** `GOVERNANCE.md` (decision rights, trust boundary), `SPONSORSHIP.md` (how sponsorship works, what it does not buy — no preferential merging, no "powered by" branding without explicit maintainer approval), `SUPPORT.md` (where to ask for help). `security@codewhale.net` security inbox listed in `SECURITY.md`. Issue templates refreshed. - **Public positioning.** README + landing page frame the project precisely: **CodeWhale is an agentic terminal workbench for open-source and open-weight coding models. DeepSeek is first-class; Hugging Face Inference Providers and OpenRouter serve as the open-model discovery and routing layer; vLLM, SGLang, Ollama, and TGI cover local serving.** Historical CHANGELOG entries untouched. ## Out of scope (kept forever) - `~/.deepseek/` user config directory name. - DeepSeek API env vars (`DEEPSEEK_API_KEY`, `DEEPSEEK_BASE_URL`, `DEEPSEEK_MODEL`, `DEEPSEEK_PROVIDER`). - `api.deepseek.com`, `api.deepseeki.com`. - Model IDs (`deepseek-v4-pro`, `deepseek-v4-flash`). - `ProviderKind::Deepseek` enum + `"deepseek"` provider string. - Historical CHANGELOG entries. - CNB mirror namespace. ## Definition of done - Fresh install via `npm install -g codewhale` or `cargo install codewhale-cli` ships `codewhale` and `codewhale-tui` binaries; no `deepseek` shim present. - First-run wizard runs cleanly for a new user and produces a working setup (provider key for DeepSeek / HF / OpenRouter / local, default execution mode, theme). - `/config` opens the control panel; every entry has a "what does this do?" line. - The orange / blue / white work-state palette is the default theme; alternate themes still ship. - `GOVERNANCE.md`, `SPONSORSHIP.md`, `SUPPORT.md` live and linked from README. - README provider matrix lists DeepSeek + HF + OpenRouter + the local-runtime path with clear "what each is for" framing. - A new user sees codewhale-named welcome surfaces with zero legacy strings. ## Release gate - Parity gates green. - `CHANGELOG.md` `[0.8.48]` thanks contributors to the v0.8.x DeepSeek-TUI line; frames v0.8.48 as the brand-consolidation + workbench-affordance milestone. - Release notes document the binary removal explicitly (running `deepseek` produces a shell "command not found" — no in-binary message because there is no binary).
No due date•13/36 issues closedA bounded source-linked orientation memory plus a shared state / evidence / permission contract across terminal, server, editor, bridge, and agents. Handoffs become first-class artifacts: the workbench knows where it left off, what it owes, what it changed, and how to resume. Provider abstraction lands so model selection is invisible to the rest of the system — and Hugging Face Inference Providers becomes a first-class provider alongside DeepSeek and OpenRouter, anchoring the harness's open-model story. ## In scope - **Evidence ledger.** Every receipt from v0.8.43 + every decision card + every tool inspection + every memory entry compounds into a per-session evidence ledger. Inspectable, exportable, queryable from `/evidence`. - **Handoff artifacts.** Closing a workbench session writes a handoff artifact (goal, last state, blockers, decisions, evidence). Opening one resumes the workbench — "Resume previous workbench" surfaces matching artifacts. - **Orientation cache.** Bounded, source-linked, evidence-tagged. Decays as freshness drops. First-class fact source from `codewhale.net/api/state.json` (latest release, install commands per platform, published crates, known-bad version ranges). - **Provider abstractions.** Unified `Provider` trait in `codewhale-agent` consolidating env-var precedence, secret resolution, base-URL normalization, and auth-header construction (currently scattered across `crates/config`, `crates/secrets`, `crates/tui/src/client.rs`). `ProviderKind` registry becomes configurable; model selection is provider-agnostic. - **Hugging Face as a first-class provider.** New `[providers.huggingface]` config block with `api_key` (default `HF_TOKEN`, alias `HUGGINGFACE_API_KEY`), `base_url` (default `https://router.huggingface.co/v1`), and `provider = "auto"` (or a specific Inference Provider). OpenAI-compatible route. Model picker pulls model passport metadata from the HF Hub API (license, base model, context length, chat template, tool-call support, reasoning support, gated / private status). Distinct from the **Hugging Face Workset** (#1977) which adds Hub registry / datasets / adapters / Jobs — the two share auth but ship through different surfaces. - **Cross-surface alignment.** Consistent command names, output formats, error messages across CLI (`codewhale`), TUI, runtime API (`codewhale serve --http`), bridges, and web. - **VS Code extension beta.** Scaffold, local runtime detection, chat webview. Ship as VSIX attached to GitHub Release; not Marketplace-published until beta feedback. - **Protocol contract** in `crates/protocol` carries provider-auth shape explicitly so external clients don't have to special-case. - **Per-tool migration PRs.** Start ExternalTool migrations one tool at a time (git, gh, python, node, rust, cargo) with Windows CI green per step. ## Out of scope - New providers beyond the HF Inference Providers integration (the rest stay as they are). - Cloud-hosted runtime API. - Marketplace publish of VS Code extension. - Plugin tool runtime implementation (still gated on v0.8.46 RFC). - Model Lab workset implementation (post-v0.9.0; see #1977). The HF Workset specifically depends on this milestone's provider work landing first. ## Definition of done - Switching providers mid-session is one config change with no surrounding code change. - Hugging Face Inference Providers works end-to-end against `Qwen/*`, `deepseek-ai/*`, and `meta-llama/*` model IDs without per-model special-casing in the engine. - Model picker surfaces HF model passport metadata (license, context length, gated status) before selection. - `/evidence` returns the per-session ledger. - Closing and reopening a session restores the workbench state (active task, last decision, pending blockers). - Orientation cache surfaces the latest published release within freshness window after restart. - VS Code beta VSIX attached to v0.8.47 GitHub Release; smoke-tested against local runtime API. ## Release gate - Parity gates green. - `CHANGELOG.md` `[0.8.47]` entry calls out HF as a first-class provider and provider abstractions as the model-neutral lever. - README provider matrix + "Bring your own open-weight model" section updated; HF Inference Providers and OpenRouter framed as the open-model discovery+routing layer.
No due date•16/32 issues closedMake documents, images, search, execution, previews, dependency probes, and safe approvals feel native to the workbench instead of like loose subprocesses. Tools become first-class workbench objects with inspectors, properties, previews, and copy/select. Preview verifier-native completion hooks behind an opt-in flag so the v0.9.0 surface lands with field experience accumulated. ## In scope - **Tool inspectors.** Every tool call has a properties pane: inputs, outputs, duration, exit, environment, sandbox decision. Tool results render in a preview panel (image, table, diff, doc) rather than as raw text. - **Copy / select mode.** First-class text selection inside the workbench (transcript, properties, previews) without breaking terminal-line semantics. The terminal-copy-includes-wrapped-lines bug (#1853) gets fixed structurally. - **Provider / model diagnostics.** Inspectable view of provider state: active provider + model, last-call latency, last error, prompt-cache hit rate. Surfaces from doctor + `/provider` slash command + properties pane. When the active provider is Hugging Face Inference Providers (via the v0.8.47 first-class integration when that lands first, or via the OpenAI-compatible path in the interim), the pane previews **model passport** metadata pulled from the HF Hub: license, base model, context length, chat template, tool-call support, reasoning support, gated / private status. - **Tool surface polish.** PDF / image / search / exec ergonomics; dependency probe + skip-on-miss (existing pattern); approval-flow clarity. Ship the small, independent RuntimeTool Windows `.cmd` fallback as a standalone PR. - **Plugin tool design RFC.** Open `docs/proposals/plugin-tools.md` defining how `// approval: auto` frontmatter interacts with the execution modes (Plan / Agent / YOLO), the approval policy, trust, and `--model auto` model routing. Implementation deferred until design lands. - **Verifier preview (general).** Opt-in via `[verifier] enabled = true`. On claim-of-done, auto-spawn a verifier sub-agent with fresh context and read-only tools, time-boxed. Verdict-as-gate: pass / partial / fail. `/force-complete` override is audit-logged. `/tasks` shows a "verified" badge. Extends the Goal-mode-specific Fin wakeup from v0.8.43 to the general claim-of-done case. ## Out of scope - Full ExternalTool / RuntimeTool 65-call-site migration (defer; ship per-tool migration PRs in v0.8.47). - Plugin tool runtime implementation (gated on the RFC). - Verifier as default (preview only — explicit opt-in). - New execution modes — Plan / Agent / YOLO remain the three modes. - Hugging Face Workset implementation (Hub registry / datasets / adapters / Jobs — see #1977). v0.8.46 only previews HF model-passport metadata in the provider pane; the workset surface ships post-v0.9.0. ## Definition of done - Every tool surfaces an inspectable properties pane and a result preview. - Copy / select mode ships with terminal-line semantics preserved. - `/provider` slash command and provider properties pane in place; HF model passport preview works against at least three model IDs (`Qwen/*`, `deepseek-ai/*`, `meta-llama/*`). - Verifier preview enables via config; verdict appears on completed tasks; `/force-complete` writes to audit log. - Plugin tool design RFC lands as draft; one round of maintainer review captured in PR. ## Release gate - Parity gates green. - `CHANGELOG.md` `[0.8.46]` entry marks verifier preview as experimental and tool inspectors as the headline. - `docs/verifier-preview.md` explains opt-in + override semantics.
No due date•40/43 issues closedThe architectural release. Three sub-projects ship together — a micro-LM service layer for cheap typed micro-operations, an operations + hints + eval pipeline, and verifier-native completion hooks graduated from preview. The release earns the .0 by changing how the agent thinks, not just what it can do. The provider-neutral substrate from v0.8.47 means every piece works against any open-weight model — DeepSeek, Hugging Face Inference Providers, OpenRouter, or a local runtime (vLLM / SGLang / Ollama / TGI) are interchangeable as the cheap-small-model backend. ## In scope - **A — Micro-LM Service Layer.** Typed micro-operations (routing, synthesis, planning, dispatch, turn-state evaluation) backed by cheap small-model calls. Dispatches through the v0.8.47 provider abstraction — Hugging Face Inference Providers, DeepSeek, OpenRouter, and the local-runtime path are all interchangeable. Any open-weight or hosted small model can power it. - **B — Operations + Hints + Eval Pipeline.** Decomposition-hints injection, sub-agent dispatch optimization, and an eval pipeline that measures lift across providers and model sizes. - **D — Verifier-Native Completion Hooks** (graduated from v0.8.46 preview). Default-on for paid / large operations, audit-logged `/force-complete` override, prompts optimized via the eval pipeline. ## Public claims (measurable, reproducible) 1. ≥ 15 pp lift on long-context recursive tasks vs comparable harnesses running on the same model. 2. ≤ 5 % false-completion rate on the deliberate-trap suite. Both claims publish with reproduction instructions, public eval suites, and the model + provider matrix the runs were taken against — at minimum DeepSeek, one HF Inference Providers backend, and one local-runtime path. ## Out of scope - Distilled model release. - Major UI overhaul. - Model-facing tool surface change. - New providers beyond what v0.8.47 lands. - Public pre-announcement before claims are reproducible against the published eval suites. - **Model Lab** (fine-tuning / eval bridge to open-weight services and worksets — Hugging Face, Unsloth, NeMo, Arcee, Serving, Eval, Observability, Training Infra) — see #1977. v0.9.0 ships the eval-pipeline substrate that makes Model Lab possible; the user-facing surface (`/model-lab capture | export | redact | dataset | eval | finetune --workset … --provider …`) is scoped for the next .0 (likely v0.10.0). ## Definition of done - Eval suites `evals/recursive-hard/` and `evals/false-completion/` dogfooded; results reproducible on a clean install with publicly listed models. - All three sub-projects pass full workspace parity gates on the integration branch. - The provider abstraction from v0.8.47 carries the runtime end-to-end with no provider-specific hot paths. ## Release gate - Parity gates green on `main` after integration branch merges. - `CHANGELOG.md` `[0.9.0]` entry frames the release as a capability shift, not a rewrite. - Published results posted with the exact reproduction commands. ## Post-shipping direction - **Model Lab** (#1977): turn the eval-pipeline substrate into a user-facing surface for exporting redacted session traces, running eval replays against alternate models, and (optionally) handing curated data to open-weight fine-tuning backends via curated worksets (Hugging Face, Unsloth, NeMo, Arcee, Serving, Eval, Observability, Training Infra). Scoped as v0.10.0 once v0.9.0 has shipped and stabilized.
No due date•6/95 issues closed