
test: make multi-worker bob setup deterministic #985

Open
RerankerGuo wants to merge 1 commit into agentscope-ai:main from RerankerGuo:fix/test-06-deterministic-bob-setup


@RerankerGuo


Summary

  • make test-06-multi-worker create Bob via hiclaw apply worker instead of a Manager DM request
  • keep the rest of the test focused on multi-worker collaboration after Alice and Bob exist
  • avoid CI flakes where the Manager is still processing heartbeat/task follow-ups or hits tool-guard approval before creating Bob

Why

Recent CI runs on #972, #983, #984, and older PRs all failed in test-06-multi-worker with the same pattern: the Manager did not produce a matching Bob creation reply, wait_worker_provisioned bob timed out, hiclaw get workers bob returned 404, and the hiclaw-worker-bob container did not exist. The logs also show the Manager processing unrelated heartbeat/task work, or having shell commands denied, during the Bob creation window.

This makes Bob creation an unreliable LLM interaction inside a test whose stated purpose is multi-worker collaboration. test-02-create-worker already covers Manager-mediated worker creation for Alice; test-06 can use deterministic setup for Bob.
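The deterministic setup described above comes down to "create Bob directly, then poll until he exists". A minimal sketch of the polling half follows. The `wait_until` helper is hypothetical and is not the suite's `wait_worker_provisioned`. The arguments to `hiclaw apply worker` are left out because this description does not list them.

```shell
# Hypothetical helper (an assumption, not code from the HiClaw test suite):
# retry a command until it succeeds or the timeout in seconds elapses.
wait_until() {
  local timeout_s="$1" interval_s="$2"
  shift 2
  # SECONDS is bash's built-in counter of seconds since the shell started.
  local deadline=$(( SECONDS + timeout_s ))
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      return 1   # timed out; the caller decides how to fail the test
    fi
    sleep "$interval_s"
  done
  return 0
}

# Intended use inside a test (shown as comments because it needs a live stack):
#   hiclaw apply worker ...   # create Bob directly; arguments omitted here
#   wait_until 120 2 hiclaw get workers bob || { echo "bob not provisioned" >&2; exit 1; }
```

With this shape, a missing Bob fails the test at setup with a clear message rather than surfacing later as a timeout on an LLM reply.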

Verification

  • bash -n tests/test-06-multi-worker.sh
  • git diff --check

I could not run the embedded integration test locally because this environment cannot access /var/run/docker.sock (permission denied).
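For anyone reproducing this, a small pre-flight check can tell up front whether the integration test can run at all. This is a sketch under the assumption that Docker is reached through the default socket path. The function name `docker_sock_usable` is made up for illustration.

```shell
# Hypothetical pre-flight check: succeed only if the path is a Unix socket
# that the current user can both read and write.
docker_sock_usable() {
  local sock="${1:-/var/run/docker.sock}"
  [ -S "$sock" ] && [ -r "$sock" ] && [ -w "$sock" ]
}

if docker_sock_usable; then
  echo "docker socket accessible: the integration test can run"
else
  echo "docker socket not accessible: only static checks (bash -n, git diff --check) apply" >&2
fi
```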

@github-actions

github-actions Bot commented Jul 3, 2026

Contributor

📊 CI Metrics Report

Summary

| Metric | Current | Baseline | Change |
|---|---|---|---|
| LLM Calls | 147 | 81 | +66 ↑ +81.5% |
| Input Tokens | 5209931 | 2803871 | +2406060 ↑ +85.8% |
| Output Tokens | 38619 | 16791 | +21828 ↑ +130.0% |
| Total Tokens | 5248550 | 2820662 | +2427888 ↑ +86.1% |
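The Change column appears to be the absolute delta followed by the delta relative to the baseline. A quick way to re-derive the percentages from the Current and Baseline columns is sketched below. The `pct_change` helper is illustrative only and is not part of the CI report generator.

```shell
# Illustrative helper (an assumption, not the report generator's code):
# signed percent change of a current value relative to a baseline.
pct_change() {
  awk -v cur="$1" -v base="$2" 'BEGIN {
    if (base == 0) { print "n/a"; exit }   # avoid dividing by a zero baseline
    printf "%+.1f%%\n", (cur - base) / base * 100
  }'
}

pct_change 147 81            # LLM Calls    -> +81.5%
pct_change 5248550 2820662   # Total Tokens -> +86.1%
```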

By Role

| Role | Metric | Current | Baseline | Change |
|---|---|---|---|---|
| 🧠 Manager | LLM Calls | 92 | 68 | +24 ↑ +35.3% |
| | Input Tokens | 3319336 | 2502214 | +817122 ↑ +32.7% |
| | Output Tokens | 16853 | 13725 | +3128 ↑ +22.8% |
| | Total Tokens | 3336189 | 2515939 | +820250 ↑ +32.6% |
| 🔧 Workers | LLM Calls | 55 | 13 | +42 ↑ +323.1% |
| | Input Tokens | 1890595 | 301657 | +1588938 ↑ +526.7% |
| | Output Tokens | 21766 | 3066 | +18700 ↑ +609.9% |
| | Total Tokens | 1912361 | 304723 | +1607638 ↑ +527.6% |

Per-Test Breakdown

| Test | Mgr Calls | Wkr Calls | Δ Calls | Mgr In | Wkr In | Mgr Out | Wkr Out | Δ Tokens | Trend |
|---|---|---|---|---|---|---|---|---|---|
| 02-create-worker | 4 | 0 | -8 ↓ -66.7% | 107727 | 0 | 694 | 0 | -250201 ↓ -69.8% | ✅ improved |
| 03-assign-task | 12 | 6 | +3 ↑ +20.0% | 329985 | 135237 | 1744 | 907 | -5783 ↓ -1.2% | ⚠️ regressed |
| 04-human-intervene | 18 | 10 | +15 ↑ +115.4% | 464566 | 247203 | 2495 | 1113 | +282379 ↑ +65.2% | ⚠️ regressed |
| 05-heartbeat | 9 | 7 | +9 ↑ +128.6% | 289127 | 152095 | 2126 | 751 | +168847 ↑ +61.3% | ⚠️ regressed |
| 06-multi-worker | 49 | 32 | +47 ↑ +138.2% | 2127931 | 1356060 | 9794 | 18995 | +2232646 ↑ +174.4% | ⚠️ regressed |

Trends

1 test(s) improved (fewer LLM calls)
⚠️ 4 test(s) regressed (more LLM calls)


Generated by HiClaw CI on 2026-07-03 09:09:38 UTC


📦 Download debug logs & test artifacts
