fix(rca): reduce prompt rigidity and let LLM decide when to load integration skills#436

Draft
isiddharthsingh wants to merge 4 commits into main from sms10221/dev-rca-rigidity-prompt-fixes

@isiddharthsingh
Contributor

Summary

The RCA agent prompt had three classes of rigidity that pushed the model to satisfy floors instead of judging when it had the answer, and to call integrations the alert didn't need just because they were connected:

  • Numeric tool-call floors ("MINIMUM 15-20 tool calls", "AT LEAST 3-5 minutes") across four skill files told the model to keep going past the point of a clear root cause.
  • Forced-first-tool mandates ("your FIRST tool call MUST be jira_search_issues", "MUST search Notion within first 3 calls", "ALWAYS search the knowledge base at the START") contradicted each other and ignored alert content — an OOM alert about a single pod was being told to start with Jira.
  • Contradictory MUSTs about load_skill: behavioral_rules.md said the model MUST call load_skill before any integration tool, while the background path explicitly said "do NOT call load_skill — skills are pre-loaded". Both fired in the same prompt.

The change set replaces these with completion criteria and situational guidance, and swaps RCA's eager skill pre-loading for on-demand load_skill via a compact connected-integrations index — mirroring how foreground chat already works.

What changed

Commit 1 — prompt rigidity (c84328bc):

  • Replaced numeric floors in persistence_and_immediate_action.md, background_source_general.md, investigation.md, behavioral_rules.md with evidence-backed stopping criteria.
  • Softened forced-first mandates in jira/SKILL.md, confluence/SKILL.md, notion/SKILL.md, knowledge_base.md to situational guidance.
  • Removed the contradictory load_skill MUST from behavioral_rules.md; added a new interactive_load_skill.md segment scoped to interactive chat and RCA.
  • Added a dedicated rca ephemeral in provider_rules.py so foreground RCA gets read-only language instead of the agent block ("you CAN and SHOULD create/modify/delete").
  • Deleted the duplicate [RCA INVESTIGATION REQUESTED] HumanMessage prefix in main_chatbot.py — the ForceToolChoice middleware is the single enforcer.
  • Added a regression test for the middleware shape.

Commit 2 — on-demand RCA skills (3c1a64af):

  • background.py RCA path now emits registry.build_index(user_id) (~300 tokens) instead of load_skills_for_rca(...) (5–15k tokens of full skill bodies).
  • Background invariant now includes interactive_load_skill so the model gets the on-demand instruction in RCA too.
  • Softened the load_skill tool description (dropped "MANDATORY" / "you MUST call this first").
  • Action mode unchanged — still pre-loads skills, since action runs are write-capable and don't pay for the extra round-trip.
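
The difference between eager pre-loading and the on-demand index in Commit 2 can be sketched roughly as follows. This is a minimal illustration, not the code in background.py: the `SkillRegistry` class, its storage layout, the `connect` helper, and the sample skill text are all hypothetical. Only the names `build_index` and `load_skill` come from this description.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    summary: str  # one-line description shown in the index
    body: str     # full instructions, returned only on demand


@dataclass
class SkillRegistry:
    """Hypothetical registry; the real one in the repo may look different."""
    # user_id -> {skill name -> Skill}
    connected: dict = field(default_factory=dict)

    def connect(self, user_id: str, skill: Skill) -> None:
        self.connected.setdefault(user_id, {})[skill.name] = skill

    def build_index(self, user_id: str) -> str:
        """Compact listing of connected integrations (names and summaries only)."""
        skills = self.connected.get(user_id, {})
        lines = [f"- {s.name}: {s.summary}" for s in skills.values()]
        header = "Connected integrations (call load_skill(name) when needed):"
        return "\n".join([header, *lines])

    def load_skill(self, user_id: str, name: str) -> str:
        """Return the full body for one integration, fetched on demand."""
        skills = self.connected.get(user_id, {})
        if name not in skills:
            return f"Skill '{name}' is not connected."
        return skills[name].body


registry = SkillRegistry()
# Placeholder bodies standing in for long skill files.
registry.connect("u1", Skill("jira", "search and read issues", "JIRA SKILL BODY " * 200))
registry.connect("u1", Skill("github", "inspect repos and workflows", "GITHUB SKILL BODY " * 200))

index = registry.build_index("u1")
eager = "\n".join(s.body for s in registry.connected["u1"].values())
print(len(index), "chars in index vs", len(eager), "chars pre-loaded")
```

The design trade-off is the one stated above: the index costs one extra round-trip when a skill is actually needed, in exchange for not paying for every skill body on every run.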

Behavioral verification

Two real RCAs run against the same alert (High CPU on aurora-dast cluster-staging):

| | Pre-change (a96d6d16) | Post-change (b86c3b43) |
|---|---|---|
| Opening calls | confluence → KB → jira → cloud_exec → query_datadog | KB → KB → cloud_exec → query_datadog |
| First metric call | Step 5 | Step 4 |
| Jira called at | Step 3 | Step 49 |
| Confluence called at | Step 1 | Step 51 |
| Wall-clock | 4m 58s | 3m 45s |
| Summary quality | Solid, well-cited | Solid, well-cited, more honest about telemetry gaps |

The post-change run went metrics-first for a CPU alert, deferred human-context tools to the end, and called load_skill('github') once when it actively needed GitHub workflow — exactly the behavior the change was designed to produce.
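
For future replays, the step and wall-clock figures in the comparison could be derived from a call trace rather than read off by hand. The sketch below is an assumption-laden illustration: `first_step` and `to_seconds` are hypothetical helpers, the lists contain only the opening calls reported above (using the same short labels, not exact tool names), and `query_datadog` is assumed to be the metric call.

```python
from typing import List, Optional


def first_step(calls: List[str], tool: str) -> Optional[int]:
    """Return the 1-based step at which `tool` first appears, or None."""
    for step, name in enumerate(calls, start=1):
        if name == tool:
            return step
    return None


def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'Xm Ys' wall-clock reading to seconds."""
    return minutes * 60 + seconds


# Opening calls only, as listed in the comparison; later steps are not included.
pre = ["confluence", "KB", "jira", "cloud_exec", "query_datadog"]
post = ["KB", "KB", "cloud_exec", "query_datadog"]

pre_metric = first_step(pre, "query_datadog")
post_metric = first_step(post, "query_datadog")

pre_wall = to_seconds(4, 58)
post_wall = to_seconds(3, 45)
saved = pre_wall - post_wall

print(f"first metric call: step {pre_metric} -> step {post_metric}")
print(f"wall-clock: {pre_wall}s -> {post_wall}s ({saved}s faster)")
```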

Test plan

  • Static: no remaining MINIMUM/AT LEAST [0-9]+ floor mandates in skills/
  • Static: no remaining FIRST tool call MUST / MUST.*BEFORE any / ALWAYS search.*START patterns
  • Static: no MUST call load_skill line in core; "do NOT call load_skill" still present in action mode
  • Python files parse cleanly (composer.py, provider_rules.py, background.py, cloud_tools.py, main_chatbot.py)
  • New middleware regression test added (test_trigger_rca_middleware.py)
  • End-to-end: two live RCAs replayed against patched agent, behavior delta documented above
  • Shadow-mode replay across 5–10 additional incidents before merge to confirm no quality regression on non-CPU alert types
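
The static items in the test plan could be automated with a small scanner along these lines. This is a sketch under assumptions: the regexes are one reading of the patterns named above, matching is case-sensitive because the mandates were written in capitals, and `scan_text`, `scan_tree`, and the `*.md` glob are hypothetical choices rather than anything in the repo.

```python
import re
from pathlib import Path
from typing import List, Tuple

# One interpretation of the forbidden patterns listed in the test plan.
FORBIDDEN = [
    re.compile(r"(MINIMUM|AT LEAST) [0-9]+"),
    re.compile(r"FIRST tool call MUST"),
    re.compile(r"MUST.*BEFORE any"),
    re.compile(r"ALWAYS search.*START"),
]


def scan_text(text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, pattern, line) for every forbidden match."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for pattern in FORBIDDEN:
            if pattern.search(line):
                hits.append((line_no, pattern.pattern, line.strip()))
    return hits


def scan_tree(root: str) -> List[Tuple[str, int, str, str]]:
    """Scan every Markdown file under `root` (for example a skills/ directory)."""
    results = []
    for path in sorted(Path(root).rglob("*.md")):
        for line_no, pattern, line in scan_text(path.read_text(encoding="utf-8")):
            results.append((str(path), line_no, pattern, line))
    return results


# Phrases quoted in the summary, used here as known-bad input.
old_prompt = "\n".join([
    "MINIMUM 15-20 tool calls",
    "your FIRST tool call MUST be jira_search_issues",
    "ALWAYS search the knowledge base at the START",
])
for hit in scan_text(old_prompt):
    print(hit)
```

Running `scan_tree` in CI and failing on a non-empty result would keep the mandates from creeping back in.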

@coderabbitai
Contributor

coderabbitai Bot commented May 22, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9cfafb37-aee9-4269-9a07-87bb16cd2d33

