Problem / motivation
skill-up is currently a second-class citizen on Windows — the build passes, but core evaluation paths either fail or are explicitly skipped:
- The script judge is effectively unusable on Windows. Every script-related test in internal/judge/script_test.go and internal/judge/e2e_test.go is gated by if runtime.GOOS == "windows" { t.Skip(...) } (see script_test.go:29,113,151,186 and e2e_test.go:619). That means the script judge has neither test coverage nor a working execution path on Windows: scripts are hard-coded with #!/bin/sh / #!/bin/bash shebangs, and scenarios like missing POSIX interpreters or CRLF line endings have never been exercised.
- Examples and setup tooling assume POSIX shell + GNU coreutils. examples/judge-debug-eval.sh, install.sh, and the make hooks / make lint-tools / make verify targets in the Makefile all assume /bin/sh. A Windows user who installs Go 1.25 cannot follow the Setup commands documented in AGENTS.md without WSL.
- Agent Engine adapters are written against a POSIX process model. internal/agent/{qodercli,claude_code,codex}.go shell out to external CLIs, and internal/shellquote only implements POSIX quoting. Windows-specific concerns (cmd.exe / PowerShell quoting, the .exe suffix, and PATH / PATHEXT resolution) are not handled.
- CI does not cover Windows. .github/workflows/ci.yml only runs on Linux, so Windows regressions go unnoticed and the gap keeps widening.
Who is affected: contributors and users developing Agent Skills on Windows (native, not just WSL). As the Skill ecosystem expands, lack of Windows support is a hard blocker for a meaningful fraction of potential adopters.
Proposed solution
Promote Windows to a first-class supported platform in phased steps:
- Add a Windows CI job first. Extend .github/workflows/ci.yml with a windows-latest runner executing go build and go test -race ./... (initially as continue-on-error to surface the current gap) so subsequent work has a regression baseline.
- Make the script judge cross-platform (internal/judge/script.go):
  - Dispatch to an interpreter based on shebang and file extension (.ps1 / .cmd / .sh).
  - On Windows, fall back to a user-configured bash (Git Bash / WSL) for .sh scripts, returning a clear error when none is available instead of failing silently.
  - Remove every t.Skip("skipping on windows") in script_test.go / e2e_test.go and replace them with platform-aware table-driven cases.
- Audit Agent Engine adapters (internal/agent/):
  - Centralize executable discovery through an exec.LookPath wrapper that handles the .exe suffix and PATHEXT.
  - Route all shell composition through internal/shellquote, and add a Windows quoting implementation (see golang.org/x/sys/windows / CommandLineToArgvW semantics).
- Provide Windows-equivalent tooling scripts. Add PowerShell counterparts under scripts/windows/ for make hooks / lint-tools / verify, and document them in the Setup section of AGENTS.md and CONTRIBUTING.md.
- Path and newline hygiene. Sweep internal/runner, internal/report, and internal/skill to ensure all path construction uses filepath.Join (mostly already the case) and that generated scripts / transcripts are written with explicit LF endings to avoid Git autocrlf surprises.
- Documentation. Add a "Windows support" page under docs/ covering supported features, known limitations, and recommended workflows (native vs. WSL2).
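The interpreter dispatch proposed for the script judge could look roughly like the sketch below. It is illustrative only: interpreterFor, errNoBash, and the bashPath parameter are hypothetical names, not existing skill-up APIs, and the choice of PowerShell flags is an assumption. The target OS is passed in as a string rather than read from runtime.GOOS, so the same logic can be exercised from table-driven tests on any host.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"path/filepath"
	"strings"
)

// errNoBash is a hypothetical sentinel returned when a .sh script must run on
// Windows but no bash (Git Bash / WSL) has been configured.
var errNoBash = errors.New("script judge: script needs bash, but no bash is configured (install Git Bash or use WSL)")

// interpreterFor returns the argv used to run script. firstLine is the first
// line of the file (used for shebang parsing) and bashPath is the
// user-configured bash on Windows; an empty bashPath means "none configured".
func interpreterFor(goos, script, firstLine, bashPath string) ([]string, error) {
	ext := strings.ToLower(filepath.Ext(script))

	// Explicitly typed scripts are dispatched by extension alone.
	switch ext {
	case ".ps1":
		shell := "pwsh" // PowerShell 7+ binary name on non-Windows hosts
		if goos == "windows" {
			shell = "powershell.exe"
		}
		return []string{shell, "-NoProfile", "-File", script}, nil
	case ".cmd":
		if goos != "windows" {
			return nil, fmt.Errorf("%s: .cmd scripts can only run on Windows", script)
		}
		return []string{"cmd.exe", "/C", script}, nil
	}

	// Everything else goes through the shebang. Trimming "\r" keeps files
	// checked out with CRLF endings from producing an interpreter like "sh\r".
	line := strings.TrimRight(firstLine, "\r\n")
	var fields []string
	switch {
	case strings.HasPrefix(line, "#!"):
		fields = strings.Fields(strings.TrimPrefix(line, "#!"))
	case ext == ".sh":
		fields = []string{"sh"} // assumed default for .sh files without a shebang
	}
	if len(fields) == 0 {
		return nil, fmt.Errorf("%s: cannot determine an interpreter", script)
	}
	// "#!/usr/bin/env bash" names bash, not env.
	if path.Base(fields[0]) == "env" && len(fields) > 1 {
		fields = fields[1:]
	}

	if goos != "windows" {
		return append(fields, script), nil
	}

	// On Windows, POSIX shells are replaced by the configured bash.
	switch path.Base(fields[0]) {
	case "sh", "bash":
		if bashPath == "" {
			return nil, errNoBash
		}
		argv := append([]string{bashPath}, fields[1:]...)
		return append(argv, script), nil
	}
	return nil, fmt.Errorf("%s: interpreter %q is not supported on Windows", script, fields[0])
}

func main() {
	argv, err := interpreterFor("windows", "check.sh", "#!/bin/sh\r\n", "")
	fmt.Println(argv, err)
}
```

Returning a dedicated error for the missing-bash case lets the judge report an actionable message instead of failing silently, which is the behavior the proposal asks for.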
Alternatives considered
- Recommend WSL2 only and skip native Windows. Cheapest to implement, but it contradicts the project's positioning as a CLI evaluation framework for Agent Skill developers. The supported engines (Qoder CLI / Claude Code / Codex) already ship native Windows builds, so forcing WSL splits the user's engine and the evaluator across two environments and creates path / credential synchronization friction.
- Restrict the script judge to explicitly typed scripts (.ps1 on Windows, .sh on POSIX). Sidesteps shebang parsing but breaks compatibility with existing case configs and forces Skill authors to maintain parallel scripts per platform, which is a poor user experience.
- Embed a Go-native shell interpreter (e.g. mvdan/sh) to run .sh scripts. Removes the dependency on external bash, but subtle behavioral differences vs. real bash + coreutils would surprise Skill authors. Better positioned as an optional fallback than the default.
Additional context
- Concrete Windows-skip locations that can serve as a remediation checklist:
  - internal/judge/script_test.go:29,113,151,186
  - internal/judge/e2e_test.go:619
- Hard-coded POSIX shebangs in fixtures and tests: internal/judge/script_test.go, internal/evaluator/evaluator_test.go:1484, e2e/contract_test.go:703
- Related files that need to stay in sync with any change: AGENTS.md (Setup commands / Testing), .github/workflows/ci.yml, Makefile.
- Toolchain note: Cobra, golangci-lint, and goreleaser all ship official Windows binaries, so there is no upstream blocker.
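As a rough illustration of how the skipped tests and hard-coded shebang fixtures listed above might be reworked, the sketch below picks a fixture per platform and forces LF endings before the file is written. scriptFixture, passingFixture, and normalizeLF are hypothetical helpers invented for this example; the real fixtures in script_test.go may need different script bodies.

```go
package main

import (
	"fmt"
	"strings"
)

// scriptFixture is a hypothetical test fixture: a file name plus its contents.
type scriptFixture struct {
	Name string
	Body string
}

// passingFixture returns a script that exits 0, written in the dialect that is
// native to goos. Taking goos as a parameter (instead of reading runtime.GOOS)
// lets one table-driven test cover both branches on any host.
func passingFixture(goos string) scriptFixture {
	if goos == "windows" {
		return scriptFixture{Name: "pass.ps1", Body: "exit 0\n"}
	}
	return scriptFixture{Name: "pass.sh", Body: "#!/bin/sh\nexit 0\n"}
}

// normalizeLF rewrites CRLF and lone CR line endings to LF, so generated
// scripts do not depend on the Git autocrlf setting of the checkout.
func normalizeLF(s string) string {
	s = strings.ReplaceAll(s, "\r\n", "\n")
	return strings.ReplaceAll(s, "\r", "\n")
}

func main() {
	for _, goos := range []string{"linux", "windows"} {
		f := passingFixture(goos)
		fmt.Printf("%s: %s %q\n", goos, f.Name, normalizeLF(f.Body))
	}
}
```

In an actual test, a case would call passingFixture(runtime.GOOS) where it currently calls t.Skip, so Windows runs exercise a real execution path instead of being skipped.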