Skip to content

ci: trigger CI on ci-fix branches and ci-fixer-e2e-test PRs#8

Merged
rnagulapalle merged 11 commits intomainfrom
fix/ci-workflow-triggers
Apr 18, 2026
Merged

ci: trigger CI on ci-fix branches and ci-fixer-e2e-test PRs#8
rnagulapalle merged 11 commits intomainfrom
fix/ci-workflow-triggers

Conversation

@rnagulapalle
Copy link
Copy Markdown
Collaborator

Summary

  • Add phalanx/ci-fix/** to push triggers — CI runs when the fixer commits a fix branch
  • Add ci-fixer-e2e-test to pull_request base branch list — CI runs on fix PRs targeting that branch

Why

CI fix PRs (e.g. PR #7) were opening with no CI checks because the workflow only triggered on PRs targeting main or develop. Fix branches target the original failing branch (ci-fixer-e2e-test), so CI never fired.

Result

Every phalanx/ci-fix/* push now runs the full quality gate suite, giving the fix PR a green/red CI signal before anyone reviews it.

@rnagulapalle
Copy link
Copy Markdown
Collaborator Author

🔍 Phalanx CI Fixer investigated the ruff failure but could not produce a safe fix.

Diagnosed root cause: Multiple files contain unused imports, unsorted import blocks, and f-strings without placeholders, requiring edits in several locations.

Reason: max_turns_exceeded

No code was committed. Fix run: 60f1de27-7352-4194-ad67-4c6217b07d8e

FORGE added 2 commits April 15, 2026 16:12
- ruff --fix auto-resolved 131 of 160 violations (F401, I001, UP037, F541, TC003, TC005)
- pyproject.toml: add per-file-ignores for tests/** (N806, SIM117, F811, E402, SIM105)
  — mock class names, nested-with patterns, and late imports are
    intentional conventions in test scaffolding, not real bugs

ruff check phalanx/ tests/ → All checks passed!
@rnagulapalle
Copy link
Copy Markdown
Collaborator Author

🔍 Phalanx CI Fixer investigated the mypy failure but could not produce a safe fix.

Diagnosed root cause: Multiple files assign objects of the wrong type to variables typed as 'LintError', causing mypy type errors.

Reason: max_turns_exceeded

No code was committed. Fix run: 71511e35-4fa6-4f0e-8689-cd5e80d8e86f

FORGE added 8 commits April 15, 2026 17:27
…PI endpoints

Introduces the foundational shared-state layer for the multi-agent CI fix
pipeline, PR deduplication, and inspectable API endpoints for pipeline state.

What's in Phase 1:
- phalanx/ci_fixer/context.py: CIFixContext dataclass (+ StructuredFailure,
  ClassifiedFailure, ReproductionResult, VerifiedPatch, VerificationResult)
  persisted as JSON in pipeline_context_json; full to_dict/from_dict
  round-trip; current_stage property walks populated fields
- alembic/versions/20260415_0001_ci_fix_context.py: migration adding
  pipeline_context_json Text column to ci_fix_runs
- phalanx/db/models.py: pipeline_context_json mapped_column
- phalanx/agents/ci_fixer.py: _find_existing_fix_pr() — checks GitHub API
  for open phalanx/ci-fix/* PRs before opening a new one (no duplicate PRs);
  _persist_context() helper; context initialised + updated at each stage
- phalanx/api/routes/ci_fix_runs.py: GET /v1/ci-fix-runs/{run_id}/context,
  GET /v1/ci-fix-runs/{run_id}, GET /v1/ci-fix-runs (list w/ filters)
- phalanx/api/main.py: register ci_fix_runs_router
- docs/MULTI_AGENT_CI_FIXER.md: full architecture brainstorm doc — 7-agent
  DAG, sandbox design, fallback ladder, phased plan

Tests: 46 new unit tests across test_ci_fix_context.py and
test_ci_fix_runs_api.py; Phase 1 files at 100% coverage; full suite
1666 passed, 80.57% overall.
Adds the sandbox provisioning and failure reproduction layer to the
multi-agent CI fix pipeline.

What's in Phase 2:

phalanx/ci_fixer/sandbox.py (NEW):
  - SandboxProvisioner.detect_stack() — pure file-existence detection
    for python/node/go/rust/unknown; python wins tie-breaks (checked first)
  - SandboxProvisioner.provision() — returns SandboxResult with sandbox_id,
    stack, image, workspace_path; async for Phase 3 Docker forward-compat
  - sandbox_enabled=False fast-path → returns None → reproducer skips
  - _STACK_FILES / _STACK_IMAGES module constants; stack_hint bypass

phalanx/ci_fixer/reproducer.py (NEW):
  - ReproducerAgent.reproduce() — runs reproducer_cmd in subprocess,
    classifies into: confirmed / flaky / env_mismatch / timeout / skipped
  - _output_matches_failure() — conservative match: tool name OR any
    structured error code (e.g. F401) in stdout/stderr → confirmed
  - asyncio.create_subprocess_shell + asyncio.wait_for for timeout
  - Process killed + reaped on timeout breach
  - Empty/whitespace cmd guard → skipped

phalanx/agents/ci_fixer.py (MODIFIED):
  - Import SandboxProvisioner + ReproducerAgent at module level
  - Phase 2 block inserted after clone, before analyst loop:
    provision → update ctx → reproduce → update ctx → flaky gate →
    env_mismatch gate; both gate exits call _mark_failed + ctx.complete

phalanx/config/settings.py (MODIFIED):
  - sandbox_docker_cmd, sandbox_timeout_seconds, sandbox_enabled settings

Tests: 39 new unit tests; sandbox.py 100%, reproducer.py 97%; full suite
1705 passed, 80.71% overall.
Adds a broad post-fix verification sweep to catch regressions before
committing, and wires ctx.verification_result into the pipeline.

What's in Phase 3:

phalanx/ci_fixer/verifier.py (NEW):
  - VerifierAgent.verify() — runs the full verification profile for the
    detected stack; returns VerificationResult(verdict: passed/failed/
    skipped/timeout)
  - Per-stack profiles: python (pytest if infra detected + ruff full repo),
    node (npm test), go (go test ./...), rust (cargo test)
  - Unknown stack → skipped (non-blocking)
  - Timeout per step is skipped (conservative); all-timed-out → "timeout"
  - FileNotFoundError → step reports tool-not-found, does not raise
  - _has_pytest(): detects pyproject.toml / pytest.ini / setup.cfg
  - _run_cmd(): asyncio.create_subprocess_exec + wait_for timeout

phalanx/agents/ci_fixer.py (MODIFIED):
  - Import VerifierAgent at module level
  - Post-fix verification block: verify() → ctx.verification_result set →
    verdict="failed" → ctx.complete("escalated"), _mark_failed, early return
  - Bug fix: removed Phase 1 stub that unconditionally overwrote
    ctx.reproduction_result with verdict="skipped" on the success path;
    now only sets it as a fallback when sandbox was disabled (still None)

Tests: 23 new unit tests in test_ci_fixer_verifier.py;
verifier.py 97% coverage; full suite 1728 passed, 80.80% overall.
…, container exec

CircleCI:
- log_fetcher: full v2 API (workflow jobs → step logs → raw output)
- ci_webhooks: signature verify, workflow-completed handler, double-prefix bug fix
- settings: circleci_token, circleci_webhook_secret

Sandbox pool:
- sandbox_pool.py: SandboxPool with asyncio.Queue per stack, checkout/checkin/borrow,
  background refill + reaper, Celery-fork-safe lazy singleton, docker exec helpers
- sandbox.py: SandboxResult gains container_id + mount_path; SandboxProvisioner
  uses pool (provision/release); fallback chain preserved (available=False on error)
- reproducer.py: _run_subprocess wraps command with docker exec when container_id set
- verifier.py: _run_cmd wraps command with docker exec when container_id set
- docker/sandbox/: Dockerfiles for python/node/go/rust + reset.sh

Tests: 1794 passing, 81% coverage, all new modules ≥85%
Wrap the reproduce→verify→commit block in try/finally so
provisioner.release() is guaranteed on every exit path — normal
return, early return (flaky/env_mismatch/validation_failed), and
uncaught exception.  Container is returned to the pool rather than
leaking until the reaper kills it.
Copy link
Copy Markdown
Collaborator Author

@rnagulapalle rnagulapalle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rnagulapalle rnagulapalle merged commit 739af6f into main Apr 18, 2026
6 of 8 checks passed
@rnagulapalle rnagulapalle deleted the fix/ci-workflow-triggers branch April 18, 2026 00:03
rnagulapalle added a commit that referenced this pull request Apr 25, 2026
Three test files, 13 tests, ~0.8s. Each one is a static or
schema-level guard against a bug class that bit us during canary,
without requiring real Anthropic, real OpenAI, or real Docker:

  test_techlead_openai_message_shape.py (5 tests, bug #5)
    Mimics the OpenAI Responses API's input contract via a small
    schema validator. Re-runs cifix_techlead._tool_result_message and
    asserts it would be ACCEPTED. If a future refactor regresses to
    role='tool' or top-level tool_use_id (the actual canary failure),
    the validator raises ResponsesApiSchemaError before deploy.

  test_engineer_wires_llm_call.py (5 tests, bug #6)
    Source-level inspection of cifix_engineer.execute(). Asserts:
      - run_coder_subagent is called
      - llm_call= is passed (not the test-only NotImplementedError stub)
      - build_sonnet_coder_callable + coder_subagent_tool_schemas +
        CODER_SUBAGENT_SYSTEM_PROMPT are imported
    Plus a sister check that v2's _call_sonnet_llm IS still a stub —
    the day someone wires it for real, this test reminds us we no
    longer need the explicit injection.

  test_state_transition_audit.py (3 tests, bug #2)
    Asserts ALL four v3 agents inherit BaseAgent._audit unchanged
    (no shadowing). The signature-mismatch bug from canary #2 fails
    this check at import time. Plus a real-DB integration test that
    runs cifix_commander._transition_run('INTAKE','RESEARCHING')
    against a live Postgres row and verifies it doesn't TypeError —
    skips cleanly if DATABASE_URL isn't reachable so dev workflow
    isn't blocked.

  conftest.py
    Real-Postgres fixtures (db_engine module-scoped, db_session per-
    test with rollback) following tests/integration/test_db_constraints
    pattern. Plus cifix_project + cifix_work_order fixtures with
    work_order_type='ci_fix' shape.

Coverage of the canary bug list now:
  Bug | Class       | Tier-1 | Tier-2 | Tier-3
  #1  | infra       | ✓      |        |
  #2  | shadowing   |        | ✓      |
  #3  | infra       | ✓      |        |
  #4  | parser      | ✓      |        |
  #5  | provider    |        | ✓      |
  #6  | wiring      |        | ✓      |
  #7  | prompt      |        |        | (canary)
  #8  | prompt      |        |        | (canary)
  apt | regex       | ✓      |        |

6 of 8 humanize-canary bugs are now caught locally pre-deploy. The
remaining 2 (prompt issues) require real LLM + real repo and stay
in the canary process.

Combined harness runtime: 51 + 13 = 64 tests, ~2 seconds total.

Run with:
  pytest tests/integration/v3_harness/         (Tier-1, no deps)
  pytest tests/integration/v3_harness_t2/      (Tier-2, skips DB tests
                                                 if Postgres absent)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant