fix(rca): reduce prompt rigidity and let LLM decide when to load integration skills by isiddharthsingh · Pull Request #436 · Arvo-AI/aurora

isiddharthsingh · 2026-05-22T07:02:03Z

Summary

The RCA agent prompt had three classes of rigidity that pushed the model to satisfy floors instead of judging when it had the answer, and to call integrations the alert didn't need just because they were connected:

Numeric tool-call floors ("MINIMUM 15-20 tool calls", "AT LEAST 3-5 minutes") across four skill files told the model to keep going past the point of a clear root cause.
Forced-first-tool mandates ("your FIRST tool call MUST be jira_search_issues", "MUST search Notion within first 3 calls", "ALWAYS search the knowledge base at the START") contradicted each other and ignored alert content — an OOM alert about a single pod was being told to start with Jira.
Contradictory MUSTs about load_skill: behavioral_rules.md said the model MUST call load_skill before any integration tool, while the background path explicitly said "do NOT call load_skill — skills are pre-loaded". Both fired in the same prompt.

The change set replaces these with completion criteria and situational guidance, and swaps RCA's eager skill pre-loading for on-demand load_skill via a compact connected-integrations index — mirroring how foreground chat already works.

What changed

Commit 1 — prompt rigidity (c84328bc):

Replaced numeric floors in persistence_and_immediate_action.md, background_source_general.md, investigation.md, behavioral_rules.md with evidence-backed stopping criteria.
Softened forced-first mandates in jira/SKILL.md, confluence/SKILL.md, notion/SKILL.md, knowledge_base.md to situational guidance.
Removed the contradictory load_skill MUST from behavioral_rules.md; added a new interactive_load_skill.md segment scoped to interactive chat and RCA.
Added a dedicated rca ephemeral in provider_rules.py so foreground RCA gets read-only language instead of the agent "you CAN and SHOULD create/modify/delete" block.
Deleted the duplicate [RCA INVESTIGATION REQUESTED] HumanMessage prefix in main_chatbot.py — the ForceToolChoice middleware is the single enforcer.
Added a regression test for the middleware shape.

Commit 2 — on-demand RCA skills (3c1a64af):

background.py RCA path now emits registry.build_index(user_id) (~300 tokens) instead of load_skills_for_rca(...) (5–15k tokens of full skill bodies).
Background invariant now includes interactive_load_skill so the model gets the on-demand instruction in RCA too.
Softened the load_skill tool description (dropped "MANDATORY" / "you MUST call this first").
Action mode unchanged — still pre-loads skills, since action runs are write-capable and don't pay for the extra round-trip.
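
The difference between eager pre-loading and an on-demand index can be sketched roughly as follows. This is a minimal illustration only: `SkillRegistry`, `Skill`, `eager_prompt`, and the method signatures are hypothetical stand-ins, not the project's actual classes. Only the names `build_index` and `load_skill` come from this PR, and their real signatures may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    summary: str  # one-line description shown in the index
    body: str     # full instructions, returned only on demand


@dataclass
class SkillRegistry:
    """Hypothetical stand-in for the skill registry (assumption, not the real class)."""
    skills: dict[str, Skill] = field(default_factory=dict)
    connected: dict[str, set[str]] = field(default_factory=dict)  # user_id -> skill names

    def register(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def build_index(self, user_id: str) -> str:
        """Compact index: one line per connected integration, no skill bodies."""
        names = sorted(self.connected.get(user_id, set()))
        lines = [f"- {n}: {self.skills[n].summary}" for n in names if n in self.skills]
        header = "Connected integrations (call load_skill(name) when needed):"
        return header + "\n" + "\n".join(lines)

    def load_skill(self, user_id: str, name: str) -> str:
        """Return the full body of one skill, only if the user has it connected."""
        if name not in self.connected.get(user_id, set()) or name not in self.skills:
            return f"Skill '{name}' is not connected."
        return self.skills[name].body


def eager_prompt(registry: SkillRegistry, user_id: str) -> str:
    """Pre-change shape: every connected skill body is placed in the prompt."""
    names = sorted(registry.connected.get(user_id, set()))
    return "\n\n".join(registry.skills[n].body for n in names if n in registry.skills)


# Illustrative data; the skill text here is filler, not real skill content.
registry = SkillRegistry()
registry.register(Skill("github", "repos, commits, workflow runs", "GITHUB SKILL\n" + "detail " * 500))
registry.register(Skill("jira", "tickets and incident history", "JIRA SKILL\n" + "detail " * 500))
registry.connected["u1"] = {"github", "jira"}

index = registry.build_index("u1")
eager = eager_prompt(registry, "u1")
print(len(index), len(eager))  # the index is far shorter than the combined bodies
```

In this sketch the model sees only the short index up front and pays for a full skill body when it decides a given integration is relevant, which is the trade-off the RCA path now makes.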

Behavioral verification

Two real RCAs run against the same alert (High CPU on aurora-dast cluster-staging):

| | Pre-change (`a96d6d16`) | Post-change (`b86c3b43`) |
| --- | --- | --- |
| Opening calls | confluence → KB → jira → cloud_exec → query_datadog | KB → KB → cloud_exec → query_datadog |
| First metric call | Step 5 | Step 4 |
| Jira called at | Step 3 | Step 49 |
| Confluence called at | Step 1 | Step 51 |
| Wall-clock | 4m 58s | 3m 45s |
| Summary quality | Solid, well-cited | Solid, well-cited, more honest about telemetry gaps |

The post-change run went metrics-first for a CPU alert, deferred human-context tools to the end, and called load_skill('github') once when it actively needed GitHub workflow — exactly the behavior the change was designed to produce.
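
For a rough sense of scale, the wall-clock and step figures reported above can be compared directly. The snippet below only does the arithmetic on the numbers in the table. It reflects a single pair of runs on one alert, so it should be read as an indication rather than a benchmark.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds duration to total seconds."""
    return minutes * 60 + seconds


pre_seconds = to_seconds(4, 58)   # pre-change wall-clock from the table
post_seconds = to_seconds(3, 45)  # post-change wall-clock from the table

saved_seconds = pre_seconds - post_seconds
saved_percent = round(100 * saved_seconds / pre_seconds, 1)

# How many steps later each human-context tool was first called.
jira_deferred_by = 49 - 3
confluence_deferred_by = 51 - 1

print(f"saved {saved_seconds}s ({saved_percent}%)")
print(f"Jira deferred by {jira_deferred_by} steps, Confluence by {confluence_deferred_by}")
```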

Test plan

Static: no remaining MINIMUM/AT LEAST [0-9]+ floor mandates in skills/
Static: no remaining FIRST tool call MUST / MUST.*BEFORE any / ALWAYS search.*START patterns
Static: no MUST call load_skill line in core; "do NOT call load_skill" still present in action mode
Python files parse cleanly (composer.py, provider_rules.py, background.py, cloud_tools.py, main_chatbot.py)
New middleware regression test added (test_trigger_rca_middleware.py)
End-to-end: two live RCAs replayed against patched agent, behavior delta documented above
Shadow-mode replay across 5–10 additional incidents before merge to confirm no quality regression on non-CPU alert types
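
The static checks in this plan could be automated along the following lines. This is a sketch under assumptions: the regexes are one reading of the patterns listed above (the exact expressions used for the PR are not shown), the helper names are hypothetical, and scanning `*.md` files under a root directory is a guess at the layout.

```python
import re
from pathlib import Path

# One interpretation of the patterns named in the test plan (assumption).
BANNED_PATTERNS = [
    re.compile(r"\b(MINIMUM|AT LEAST) [0-9]+"),
    re.compile(r"FIRST tool call MUST"),
    re.compile(r"MUST.*BEFORE any"),
    re.compile(r"ALWAYS search.*START"),
]


def find_violations(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line matching a banned pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in BANNED_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every Markdown file under root and collect violations per file."""
    results = {}
    for path in Path(root).rglob("*.md"):
        hits = find_violations(path.read_text(encoding="utf-8"))
        if hits:
            results[str(path)] = hits
    return results


# Small in-memory example using phrases quoted in the Summary.
sample = (
    "Be thorough.\n"
    "MINIMUM 15-20 tool calls before concluding.\n"
    "Your FIRST tool call MUST be jira_search_issues.\n"
    "Stop once the evidence supports a root cause."
)
violations = find_violations(sample)
for lineno, line in violations:
    print(f"line {lineno}: {line}")
```

Running `scan_tree("skills/")` in CI and failing on a non-empty result would keep the removed mandates from creeping back in, assuming the skill files live under that directory.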

coderabbitai · 2026-05-22T07:02:12Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9cfafb37-aee9-4269-9a07-87bb16cd2d33

sonarqubecloud · 2026-05-22T07:22:29Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

isiddharthsingh added 2 commits May 22, 2026 02:05

fix(rca): drop numeric tool-call floors, forced-first mandates, and contradictory MUSTs in agent prompts

c84328b

fix(rca): swap pre-loaded skill bodies for on-demand load_skill via connected-integrations index

3c1a64a

isiddharthsingh added 2 commits May 22, 2026 03:08

chore(tests): drop test_trigger_rca_middleware.py — regression already covered by parametrized case in test_force_tool_choice.py

c23dc1d

chore(rca): drop unused integrations local var orphaned by on-demand skill swap

7ee8fcb

isiddharthsingh wants to merge 4 commits into main from sms10221/dev-rca-rigidity-prompt-fixes
