fix(ci): add mock LLM service and switch drills to config-driven agents #207

Merged
Colin4k1024 merged 11 commits into main from fix/ci-mock-llm
May 28, 2026

@Colin4k1024
Owner

@Colin4k1024 Colin4k1024 commented May 27, 2026

Summary

Makes the CI "release gates" runnable on Linux GitHub Actions runners by introducing an in-network mock OpenAI-compatible LLM service (avoiding host.docker.internal) and switching release drills to use pre-configured, config-driven agents.

GitHub Actions Linux runners cannot reach host.docker.internal:11434 (Ollama). The conversation agent's chatModel.Generate() call therefore hangs indefinitely, so every perf-gate job times out with completion_ratio=0.00 and the P0 gate fails.

Changes

  • cmd/llm-mock/main.go: minimal Go HTTP server on :11434 returning valid OpenAI-format chat completion responses instantly (no model required, pure stdlib, zero external deps). Uses http.Server with explicit ReadHeaderTimeout/ReadTimeout/WriteTimeout/IdleTimeout to prevent hung builds.
  • deployments/compose/Dockerfile: build llm-mock binary in builder stage and include it in the runtime image.
  • deployments/compose/docker-compose.ci.yml: CI Compose overlay that adds llm-mock as a service and redirects MODEL_LLM_PROVIDERS_OPENAI_BASE_URL from host.docker.internal:11434 to http://llm-mock:11434/v1 for api, worker1, and worker2. Each service explicitly depends on both postgres and llm-mock for unambiguous startup ordering.
  • scripts/local-2.0-stack.sh: auto-detect CI=true (set by GitHub Actions) and activate the CI overlay; use the same CI=true check for both overlay activation and the health-wait extension (30s → 90s).
  • scripts/release-p0-drill.sh: switch to pre-configured agent (DRILL_AGENT_ID) instead of dynamically creating agents via the removed POST /api/agents endpoint; remove unused ts parameter from create_job_resilient and all call sites; remove dead create_agent helper.
  • .github/workflows/release-gates.yml: pass DRILL_AGENT_ID=conversation to the drill job.

No application code changes. Local (non-CI) behaviour is unchanged.
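
For readers who have not opened cmd/llm-mock/main.go, the following is a minimal sketch of what such a mock could look like, not the code merged in this PR. It assumes the only route exercised is POST /v1/chat/completions; the timeout values, the canned reply text, and the names cannedCompletion, handleChat, and newServer are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// chatRequest decodes only the field the mock echoes back.
type chatRequest struct {
	Model string `json:"model"`
}

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type choice struct {
	Index        int     `json:"index"`
	Message      message `json:"message"`
	FinishReason string  `json:"finish_reason"`
}

type usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

// chatCompletion mirrors the common fields of an OpenAI-format
// chat completion response.
type chatCompletion struct {
	ID      string   `json:"id"`
	Object  string   `json:"object"`
	Created int64    `json:"created"`
	Model   string   `json:"model"`
	Choices []choice `json:"choices"`
	Usage   usage    `json:"usage"`
}

// cannedCompletion builds the fixed reply returned for every request.
// The ID and content strings are placeholders (assumptions).
func cannedCompletion(model string, created int64) chatCompletion {
	return chatCompletion{
		ID:      "chatcmpl-mock",
		Object:  "chat.completion",
		Created: created,
		Model:   model,
		Choices: []choice{{
			Index:        0,
			Message:      message{Role: "assistant", Content: "mock response"},
			FinishReason: "stop",
		}},
		Usage: usage{PromptTokens: 1, CompletionTokens: 2, TotalTokens: 3},
	}
}

// handleChat answers any well-formed POST immediately, with no model involved.
func handleChat(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var req chatRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid JSON body", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	resp := cannedCompletion(req.Model, time.Now().Unix())
	if err := json.NewEncoder(w).Encode(resp); err != nil {
		log.Printf("encode response: %v", err)
	}
}

// newServer sets explicit timeouts so a stuck connection cannot hang a build.
// The durations are assumed values, not taken from the PR.
func newServer(addr string) *http.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/chat/completions", handleChat)
	return &http.Server{
		Addr:              addr,
		Handler:           mux,
		ReadHeaderTimeout: 5 * time.Second,
		ReadTimeout:       10 * time.Second,
		WriteTimeout:      10 * time.Second,
		IdleTimeout:       60 * time.Second,
	}
}

func main() {
	log.Fatal(newServer(":11434").ListenAndServe())
}
```

With the CI overlay's base URL of http://llm-mock:11434/v1, an OpenAI-compatible client would resolve its chat completion calls to the route registered above.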

Validation

  • go test ./...
  • go build ./...
  • docs updated when behavior changed

Compatibility / Risk

No breaking changes. The CI overlay is only activated when CI=true; local developer workflows are unaffected. The llm-mock binary is included in the compose image but only started when the CI overlay is used.
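
The activation rule described above lives in scripts/local-2.0-stack.sh, a shell script. Purely as an illustration, here is the same decision sketched in Go: one CI=true check drives both the overlay and the longer health wait. The base compose file name and the names stackPlan and planStack are hypothetical; only docker-compose.ci.yml and the 30s/90s values come from this PR.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// stackPlan captures the two settings that depend on the CI flag.
type stackPlan struct {
	ComposeFiles []string
	HealthWait   time.Duration
}

// planStack applies the CI overlay and the extended health wait only
// when the CI environment variable is exactly "true".
func planStack(ciEnv string) stackPlan {
	plan := stackPlan{
		// Hypothetical base file name; the PR does not name it.
		ComposeFiles: []string{"deployments/compose/docker-compose.yml"},
		HealthWait:   30 * time.Second,
	}
	if ciEnv == "true" {
		plan.ComposeFiles = append(plan.ComposeFiles,
			"deployments/compose/docker-compose.ci.yml")
		plan.HealthWait = 90 * time.Second
	}
	return plan
}

func main() {
	plan := planStack(os.Getenv("CI"))
	fmt.Println("compose files:", plan.ComposeFiles)
	fmt.Println("health wait:", plan.HealthWait)
}
```

Deriving both settings from a single check keeps the overlay and the wait from drifting apart, which appears to be the point of the "consistent CI=true check" change.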

Hermes Agent added 3 commits May 27, 2026 21:09
GitHub Actions Linux runners cannot reach host.docker.internal:11434
(Ollama). The conversation agent's chatModel.Generate() call hangs
indefinitely, causing all 6 perf-gate jobs to time out with
completion_ratio=0.00, failing the P0 gate.

Fix:
- cmd/llm-mock/main.go: minimal Go HTTP server on :11434 that returns
  valid OpenAI-format chat completion responses instantly (no model
  required, pure stdlib, zero external deps)
- deployments/compose/Dockerfile: build llm-mock binary in builder stage
  and include it in the runtime image
- deployments/compose/docker-compose.ci.yml: CI Compose overlay that
  adds llm-mock as a service and redirects MODEL_LLM_PROVIDERS_OPENAI_BASE_URL
  from host.docker.internal:11434 to http://llm-mock:11434/v1 for
  api, worker1, and worker2
- scripts/local-2.0-stack.sh: auto-detect CI=true (set by GitHub Actions)
  and activate the CI overlay; extend health-wait from 30s to 90s in CI
  to allow time for the mock service build

No application code changes. Local (non-CI) behaviour is unchanged.
…ved)

Agents are now config-driven; POST /api/agents endpoint no longer exists.
Replace all create_agent() calls in the drill script with DRILL_AGENT_ID
(defaults to 'conversation') so the drill uses the pre-loaded config agent.

Changes:
- scripts/release-p0-drill.sh: add DRILL_AGENT_ID env var, replace every
  create_agent() call in main() / Drill D / Drill E with the env var,
  simplify create_job_resilient to remove the 404-recreate-agent path
- .github/workflows/release-gates.yml: add DRILL_AGENT_ID: 'conversation'
  to the release-gates job env block
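
The drill logic itself is shell (scripts/release-p0-drill.sh). The sketch below restates the two behaviours from this commit in Go for illustration only: the agent ID comes from DRILL_AGENT_ID with a default of 'conversation', and a 404 on job creation is reported as a failure instead of triggering agent re-creation. The function names and error wording are hypothetical, and reading a 404 as a missing agent is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// defaultAgentID is the pre-loaded config agent named in the commit message.
const defaultAgentID = "conversation"

// drillAgentID returns DRILL_AGENT_ID when set and non-empty, else the default.
// The lookup function is injected so the rule can be checked without
// touching the real environment.
func drillAgentID(lookup func(string) (string, bool)) string {
	if v, ok := lookup("DRILL_AGENT_ID"); ok && v != "" {
		return v
	}
	return defaultAgentID
}

// checkCreateJobStatus treats every non-2xx status as a failure.
// A 404 gets its own message because agents can no longer be
// created on demand to recover from it.
func checkCreateJobStatus(agentID string, status int) error {
	switch {
	case status >= 200 && status < 300:
		return nil
	case status == http.StatusNotFound:
		return fmt.Errorf("agent %q not found (HTTP 404): check the agent config", agentID)
	default:
		return fmt.Errorf("create job for agent %q failed with HTTP %d", agentID, status)
	}
}

func main() {
	id := drillAgentID(os.LookupEnv)
	fmt.Println("drill agent:", id)
	if err := checkCreateJobStatus(id, http.StatusNotFound); err != nil {
		fmt.Println("drill would fail:", err)
	}
}
```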
Copilot AI review requested due to automatic review settings May 27, 2026 13:36
@Colin4k1024
Owner Author

@copilot resolve the merge conflicts in this pull request

Contributor

Copilot AI left a comment

Pull request overview

This PR aims to make the CI “release gates” runnable on Linux GitHub Actions runners by introducing an in-network mock OpenAI-compatible LLM service (avoiding host.docker.internal) and by switching release drills to use pre-configured, config-driven agents.

Changes:

  • Add a minimal llm-mock HTTP server binary and include it in the compose image build.
  • Add a CI compose override file and update the local stack script to automatically layer it when CI=true.
  • Update the P0 drill script/workflow to use a pre-configured agent ID instead of dynamically creating agents.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| scripts/release-p0-drill.sh | Use DRILL_AGENT_ID (pre-configured agent) and treat 404 as a real failure. |
| scripts/local-2.0-stack.sh | Auto-detect CI to apply compose overlay; extend health wait time in CI. |
| deployments/compose/Dockerfile | Build and ship the new llm-mock binary in the compose image. |
| deployments/compose/docker-compose.ci.yml | Add llm-mock service and override OpenAI base URL for api/workers in CI. |
| cmd/llm-mock/main.go | New minimal OpenAI-compatible mock server for CI. |
| .github/workflows/release-gates.yml | Provide DRILL_AGENT_ID env for release drills. |

Comment thread scripts/release-p0-drill.sh
Comment thread scripts/local-2.0-stack.sh Outdated
Comment thread cmd/llm-mock/main.go Outdated
Comment thread scripts/release-p0-drill.sh
Comment thread deployments/compose/docker-compose.ci.yml Outdated
Comment thread deployments/compose/docker-compose.ci.yml Outdated
Comment thread deployments/compose/docker-compose.ci.yml
@Colin4k1024
Owner Author

@copilot resolve the merge conflicts in this pull request

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@Colin4k1024 Colin4k1024 added the ready-to-merge Queued and ready for serial merge label May 27, 2026
Colin4k1024 and others added 3 commits May 27, 2026 21:43
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@Colin4k1024
Owner Author

@copilot resolve the merge conflicts in this pull request

Contributor

Copilot AI commented May 27, 2026

@copilot resolve the merge conflicts in this pull request

Merge conflicts resolved in 6d21aa8. Kept our improved versions for all three conflicted files: http.Server with timeouts in llm-mock/main.go, postgres deps in docker-compose.ci.yml, and consistent CI=true health-wait check in local-2.0-stack.sh.

@Colin4k1024
Owner Author

@copilot resolve the merge conflicts in this pull request

Contributor

Copilot AI commented May 28, 2026

@copilot resolve the merge conflicts in this pull request

Merge conflicts resolved in 0fd9c52. Main deleted .github/workflows/release-gates.yml (#208), so the deletion was accepted on this branch as well.

@Colin4k1024 Colin4k1024 merged commit 0ed8171 into main May 28, 2026
Copilot stopped work on behalf of Colin4k1024 due to an error May 28, 2026 05:04

Labels

ready-to-merge Queued and ready for serial merge

3 participants