feat: first-class Windows support across CLI, runtime, and judges #31

@lbfsc

Problem / motivation

skill-up is currently a second-class citizen on Windows — the build passes, but core evaluation paths either fail or are explicitly skipped:

  • The script judge is effectively unusable on Windows. Every script-related test in internal/judge/script_test.go and internal/judge/e2e_test.go is gated by if runtime.GOOS == "windows" { t.Skip(...) } (see script_test.go:29,113,151,186 and e2e_test.go:619). That means the script judge has neither test coverage nor a working execution path on Windows: scripts are hard-coded with #!/bin/sh / #!/bin/bash shebangs, and scenarios like missing POSIX interpreters or CRLF line endings have never been exercised.
  • Examples and setup tooling assume POSIX shell + GNU coreutils. examples/judge-debug-eval.sh, install.sh, and the make hooks / make lint-tools / make verify targets in the Makefile all assume /bin/sh. A Windows user who installs Go 1.25 cannot follow the Setup commands documented in AGENTS.md without WSL.
  • Agent Engine adapters are written against a POSIX process model. internal/agent/{qodercli,claude_code,codex}.go shell out to external CLIs, and internal/shellquote only implements POSIX quoting. Windows-specific concerns — cmd.exe / PowerShell quoting, the .exe suffix, and PATH / PATHEXT resolution — are not handled.
  • CI does not cover Windows. .github/workflows/ci.yml only runs on Linux, so regressions are invisible and the gap keeps widening.
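
To make the quoting gap concrete, here is a minimal sketch contrasting POSIX single-quote escaping with the backslash/double-quote rules that CommandLineToArgvW-style parsers expect. The function names quotePOSIX and quoteWindowsArg are hypothetical and are not taken from internal/shellquote; the sketch also does not cover cmd.exe or PowerShell metacharacters, which would need separate handling.

```go
package main

import (
	"fmt"
	"strings"
)

// quotePOSIX wraps s in single quotes, the usual approach for POSIX sh.
// (Hypothetical helper, not the project's internal/shellquote API.)
func quotePOSIX(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// quoteWindowsArg escapes s so that a CommandLineToArgvW-style parser
// recovers it as one argument. It does NOT protect against cmd.exe or
// PowerShell metacharacters such as & | ^ %.
func quoteWindowsArg(s string) string {
	if s == "" {
		return `""`
	}
	if !strings.ContainsAny(s, " \t\"") {
		return s // nothing the parser would split or reinterpret
	}
	var b strings.Builder
	b.WriteByte('"')
	slashes := 0 // length of the current run of backslashes
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch c {
		case '\\':
			slashes++
		case '"':
			// Double the pending backslashes, then add one to escape the quote.
			b.WriteString(strings.Repeat(`\`, slashes+1))
			slashes = 0
		default:
			slashes = 0
		}
		b.WriteByte(c)
	}
	// Backslashes directly before the closing quote must be doubled.
	b.WriteString(strings.Repeat(`\`, slashes))
	b.WriteByte('"')
	return b.String()
}

func main() {
	for _, arg := range []string{`C:\Program Files\tool.exe`, `say "hi"`, `it's`} {
		fmt.Printf("posix: %s\twindows: %s\n", quotePOSIX(arg), quoteWindowsArg(arg))
	}
}
```

The same input needs different escaping on each platform, which is why an adapter that only knows POSIX quoting cannot safely compose a Windows command line.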

Who is affected: contributors and users developing Agent Skills on Windows (native, not just WSL). As the Skill ecosystem expands, lack of Windows support is a hard blocker for a meaningful fraction of potential adopters.

Proposed solution

Promote Windows to a first-class supported platform in phased steps:

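  1. Add a Windows CI job first. Extend .github/workflows/ci.yml with a windows-latest runner executing go build and go test -race ./... (initially with continue-on-error: true, so the job surfaces the current gap without blocking merges), giving subsequent work a regression baseline. Note that -race requires cgo and a C compiler on Windows, so the job needs a C toolchain on the runner.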
  2. Make the script judge cross-platform (internal/judge/script.go):
    • Dispatch to an interpreter based on shebang and file extension (.ps1 / .cmd / .sh).
    • On Windows, fall back to a user-configured bash (Git Bash / WSL) for .sh scripts, returning a clear error when none is available instead of failing silently.
    • Remove every t.Skip("skipping on windows") in script_test.go / e2e_test.go and replace them with platform-aware table-driven cases.
  3. Audit Agent Engine adapters (internal/agent/):
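    • Centralize executable discovery through a single exec.LookPath wrapper so that .exe suffix and PATHEXT handling live in one place. (exec.LookPath already consults PATHEXT on Windows, so the wrapper mainly needs to prevent adapters from bypassing it with hard-coded paths or names.)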
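    • Route all shell composition through internal/shellquote, and add a Windows quoting implementation that follows CommandLineToArgvW semantics (golang.org/x/sys/windows provides EscapeArg and ComposeCommandLine for this). Commands that pass through cmd.exe or PowerShell need their metacharacters handled separately.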
  4. Provide Windows-equivalent tooling scripts. Add PowerShell counterparts under scripts/windows/ for make hooks / lint-tools / verify, and document them in the Setup section of AGENTS.md and CONTRIBUTING.md.
  5. Path and newline hygiene. Sweep internal/runner, internal/report, and internal/skill to ensure all path construction uses filepath.Join (mostly already the case) and that generated scripts / transcripts are written with explicit LF endings to avoid Git core.autocrlf surprises.
  6. Documentation. Add a "Windows support" page under docs/ covering supported features, known limitations, and recommended workflows (native vs. WSL2).
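
The interpreter dispatch in step 2 could look roughly like the sketch below. It is an illustration under stated assumptions, not the existing internal/judge/script.go code: the function name interpreterFor, its signature, and the specific PowerShell flags chosen are all hypothetical. The target OS is passed in as a string so the logic can be tested on any platform.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
	"strings"
)

// interpreterFor returns the command prefix (interpreter plus flags) for a
// script, based on its extension and first line. Hypothetical helper; the
// caller is expected to append the script path and resolve argv[0] on PATH.
func interpreterFor(scriptPath, firstLine, goos string) ([]string, error) {
	switch strings.ToLower(filepath.Ext(scriptPath)) {
	case ".ps1":
		if goos == "windows" {
			return []string{"powershell.exe", "-NoProfile", "-ExecutionPolicy", "Bypass", "-File"}, nil
		}
		return []string{"pwsh", "-NoProfile", "-File"}, nil
	case ".cmd":
		if goos != "windows" {
			return nil, fmt.Errorf("%s: .cmd scripts require Windows", scriptPath)
		}
		return []string{"cmd.exe", "/C"}, nil
	}

	// Tolerate CRLF line endings when reading the shebang.
	line := strings.TrimRight(firstLine, "\r\n")
	if strings.HasPrefix(line, "#!") {
		fields := strings.Fields(line[2:])
		// "#!/usr/bin/env bash" means "find bash on PATH".
		if len(fields) > 0 && path.Base(fields[0]) == "env" {
			fields = fields[1:]
		}
		if len(fields) > 0 {
			if goos == "windows" {
				// /bin/sh does not exist natively; look the name up on PATH
				// instead (for example a Git Bash installation).
				fields[0] = path.Base(fields[0])
			}
			return fields, nil
		}
	}
	if strings.EqualFold(filepath.Ext(scriptPath), ".sh") {
		return []string{"sh"}, nil
	}
	return nil, fmt.Errorf("%s: cannot determine interpreter", scriptPath)
}

func main() {
	argv, err := interpreterFor("check.sh", "#!/bin/bash\r\n", "windows")
	fmt.Println(argv, err)
}
```

A caller would then pass argv[0] to exec.LookPath and turn a lookup failure into the clear "no bash available" error described above, rather than letting the script fail silently.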

Alternatives considered

  • Recommend WSL2 only and skip native Windows. Cheapest to implement, but it contradicts the project's positioning as a CLI evaluation framework for Agent Skill developers. The supported engines (Qoder CLI / Claude Code / Codex) already ship native Windows builds, so forcing WSL splits the user's engine and the evaluator across two environments and creates path / credential synchronization friction.
  • Restrict the script judge to explicitly typed scripts (.ps1 on Windows, .sh on POSIX). Sidesteps shebang parsing but breaks compatibility with existing case configs and forces Skill authors to maintain parallel scripts per platform — a poor user experience.
  • Embed a Go-native shell interpreter (e.g. mvdan/sh) to run .sh scripts. Removes the dependency on external bash, but subtle behavioral differences vs. real bash + coreutils would surprise Skill authors. Better positioned as an optional fallback than the default.

Additional context

  • Concrete Windows-skip locations that can serve as a remediation checklist:
    • internal/judge/script_test.go:29,113,151,186
    • internal/judge/e2e_test.go:619
  • Hard-coded POSIX shebangs in fixtures and tests:
    • internal/judge/script_test.go, internal/evaluator/evaluator_test.go:1484, e2e/contract_test.go:703
  • Related files that need to stay in sync with any change: AGENTS.md (Setup commands / Testing), .github/workflows/ci.yml, Makefile.
  • Toolchain note: Cobra, golangci-lint, and goreleaser all ship official Windows binaries, so there is no upstream blocker.
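
For the skip locations listed above, one possible shape for a platform-aware replacement is a small fixture helper that the table-driven cases call instead of embedding #!/bin/sh scripts directly. The names scriptFixture and exitFixture and the file names judge.cmd / judge.sh are hypothetical, and the assumption that the judge only inspects the script's exit code is mine; the real tests may check more than that.

```go
package main

import (
	"fmt"
	"runtime"
)

// scriptFixture describes a script file a test would write to a temp dir.
// (Hypothetical type for illustration.)
type scriptFixture struct {
	Name string // file name, including the extension used for dispatch
	Body string // script contents
}

// exitFixture returns a script that exits with the given code, choosing a
// flavor that runs without extra tooling on the target OS.
func exitFixture(goos string, code int) scriptFixture {
	if goos == "windows" {
		// Batch files conventionally use CRLF line endings.
		return scriptFixture{
			Name: "judge.cmd",
			Body: fmt.Sprintf("@echo off\r\nexit /b %d\r\n", code),
		}
	}
	return scriptFixture{
		Name: "judge.sh",
		Body: fmt.Sprintf("#!/bin/sh\nexit %d\n", code),
	}
}

func main() {
	// In a real test this table would drive t.Run subtests.
	cases := []struct {
		name     string
		exitCode int
		wantPass bool
	}{
		{name: "script succeeds", exitCode: 0, wantPass: true},
		{name: "script fails", exitCode: 1, wantPass: false},
	}
	for _, tc := range cases {
		fx := exitFixture(runtime.GOOS, tc.exitCode)
		fmt.Printf("%s: write %s (%d bytes), expect pass=%v\n",
			tc.name, fx.Name, len(fx.Body), tc.wantPass)
	}
}
```

With a helper like this, each former t.Skip site keeps one table of cases and only the fixture varies by platform.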

Metadata

Assignees

Labels

enhancement: New feature or request

Projects

Status
Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
