
Add braided evolution prompt (BUILD-TEST alternation for deeper mutation plans) #13

Open
Cranot wants to merge 1 commit into A-EVO-Lab:main from Cranot:braided-evolution-prompt


@Cranot Cranot commented Apr 7, 2026

Summary

  • Adds BRAIDED_EVOLVER_SYSTEM_PROMPT to adaptive_skill/prompts.py -- a 5-step BUILD/TEST alternating prompt for evolution analysis
  • Adds _system_prompt property to AdaptiveSkillEngine that selects prompt based on config.extra["evolver_style"]
  • Default behavior unchanged -- braided prompt only activates with evolver_style: "braided"
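The selection described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the `EvolveConfig` and `AdaptiveSkillEngine` classes here are hypothetical stand-ins that model only the `extra` dict and the `_system_prompt` property, and the prompt strings are placeholders for the real ones in adaptive_skill/prompts.py.

```python
from dataclasses import dataclass, field
from typing import Any

# Placeholder texts; the real prompts live in adaptive_skill/prompts.py.
DEFAULT_EVOLVER_SYSTEM_PROMPT = "<default evolver prompt>"
BRAIDED_EVOLVER_SYSTEM_PROMPT = "<braided evolver prompt>"


@dataclass
class EvolveConfig:
    """Hypothetical stand-in: only the `extra` dict matters for this sketch."""
    extra: dict[str, Any] = field(default_factory=dict)


class AdaptiveSkillEngine:
    """Hypothetical stand-in that shows only the prompt selection."""

    def __init__(self, config: EvolveConfig) -> None:
        self.config = config

    @property
    def _system_prompt(self) -> str:
        # Only the exact value "braided" opts in. A missing key or any other
        # value falls back to the default prompt, so existing runs are unaffected.
        if self.config.extra.get("evolver_style") == "braided":
            return BRAIDED_EVOLVER_SYSTEM_PROMPT
        return DEFAULT_EVOLVER_SYSTEM_PROMPT
```

Keeping the check as an exact string match means a typo in `evolver_style` silently uses the default prompt; whether that or a raised error is preferable is a design choice this sketch does not settle.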

What it does

The default evolution prompt analyzes failures then generates fixes (BUILD-BUILD-BUILD-BUILD). The braided prompt alternates structure-building and structure-testing:

Step 1 [BUILD]: Analyze failures, name conservation law (structural trade-off)
Step 2 [TEST]:  Challenge analysis -- what does it conceal? Would fixes break passing tasks?
Step 3 [BUILD]: Cross-reference failure categories, find interaction patterns
Step 4 [TEST]:  Audit existing skills -- helping or hiding the problem?
Step 5 [BUILD]: Mutation plan with root-cause reasoning + regression predictions
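
To make the alternation concrete, the five steps can be held as data and checked for the braid property (no two adjacent phases are the same). This is an illustrative sketch only: `BRAIDED_STEPS`, `render_steps`, and `is_braided` are hypothetical names, and the actual BRAIDED_EVOLVER_SYSTEM_PROMPT may be written as a plain string rather than assembled like this.

```python
# (phase, instruction) pairs copied from the five steps listed above.
BRAIDED_STEPS = [
    ("BUILD", "Analyze failures, name conservation law (structural trade-off)"),
    ("TEST", "Challenge analysis -- what does it conceal? Would fixes break passing tasks?"),
    ("BUILD", "Cross-reference failure categories, find interaction patterns"),
    ("TEST", "Audit existing skills -- helping or hiding the problem?"),
    ("BUILD", "Mutation plan with root-cause reasoning + regression predictions"),
]


def render_steps(steps: list[tuple[str, str]]) -> str:
    """Format the steps as numbered 'Step N [PHASE]: text' lines."""
    lines = []
    for number, (phase, text) in enumerate(steps, start=1):
        lines.append(f"Step {number} [{phase}]: {text}")
    return "\n".join(lines)


def is_braided(steps: list[tuple[str, str]]) -> bool:
    """Return True when no two consecutive steps share a phase."""
    phases = [phase for phase, _ in steps]
    return all(a != b for a, b in zip(phases, phases[1:]))
```

A monotonic sequence such as four consecutive BUILD steps fails `is_braided`, which is the structural difference between the two prompts described above.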

Evidence

Tested on synthetic observation data (8 failed tasks across 4 categories, 3 existing skills, 60% pass rate):

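| Condition | Score | Root Cause | Self-Challenge | Cross-Ref | Skill Audit | Prediction | Actionability |
|---|---|---|---|---|---|---|---|
| Default prompt | 6.4 | 3 | 2 | 1 | 2 | 2 | 3 |
| Braided prompt | 10.0 | 5 | 3 | 3 | 3 | 3 | 3 |
| Delta | +3.6 | +2 | +1 | +2 | +1 | +1 | 0 |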

Scored by a custom mutation plan evaluator (6 dimensions, raw total on a 0-20 scale, normalized to 10). The default prompt produced category-level triage with per-category patches. The braided prompt found a unified root cause (working-memory decay under long trajectories) that the category-level analysis concealed, and identified an existing skill that was masking the problem.

Usage

config = EvolveConfig(extra={"evolver_style": "braided"})
engine = AdaptiveSkillEngine(config)
# Everything else unchanged -- same workspace, observations, history, trial

Or in YAML config:

evolver_style: braided

Design principles

Based on research from the AGI-in-md project (330 principles on cognitive compression in LLM prompts, 1000+ experiments). Three principles applied:

  • Braid balance (P303): Alternating build/test phases outperform monotonic sequences. Empirically: braid-balanced avg 8.2 vs monotonic 6.5 (+1.7) across 50+ experiments.
  • Conservation law anchoring (P305): Naming structural trade-offs ("when X increases, Y must decrease") produces categorically deeper analysis than listing problems.
  • Cross-analytical reference (P315): Analyzing one layer's findings through another's lens discovers interaction patterns invisible to parallel independent analysis.

Test plan

  • Import verification (BRAIDED_EVOLVER_SYSTEM_PROMPT loads correctly)
  • Default behavior unchanged (no evolver_style = uses DEFAULT_EVOLVER_SYSTEM_PROMPT)
  • Braided selection works (evolver_style: "braided" = uses BRAIDED_EVOLVER_SYSTEM_PROMPT)
  • End-to-end benchmark comparison (would welcome help testing on actual SWE-bench/MCP-Atlas)
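
The second and third items of the test plan could be expressed as small checks along these lines. This is a sketch under assumptions: `select_prompt` is a hypothetical helper standing in for the engine's selection logic, the prompt constants are placeholders, and the function names follow the pytest naming convention without requiring pytest to run.

```python
# Placeholder texts; the real prompts live in adaptive_skill/prompts.py.
DEFAULT_EVOLVER_SYSTEM_PROMPT = "<default evolver prompt>"
BRAIDED_EVOLVER_SYSTEM_PROMPT = "<braided evolver prompt>"


def select_prompt(extra: dict) -> str:
    """Hypothetical stand-in for the prompt selection on config.extra."""
    if extra.get("evolver_style") == "braided":
        return BRAIDED_EVOLVER_SYSTEM_PROMPT
    return DEFAULT_EVOLVER_SYSTEM_PROMPT


def test_default_behavior_unchanged() -> None:
    # No evolver_style key: the default prompt must be used.
    assert select_prompt({}) == DEFAULT_EVOLVER_SYSTEM_PROMPT


def test_braided_selection() -> None:
    # Explicit opt-in: the braided prompt must be used.
    assert select_prompt({"evolver_style": "braided"}) == BRAIDED_EVOLVER_SYSTEM_PROMPT


def test_other_values_fall_back() -> None:
    # The braided prompt is opt-in only, so any other value keeps the default.
    assert select_prompt({"evolver_style": "default"}) == DEFAULT_EVOLVER_SYSTEM_PROMPT
```

The end-to-end benchmark item is not covered here, since it depends on running the real benchmarks.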

Scope

Minimal change: 2 files, +62/-3 lines. No new dependencies. No breaking changes. The braided prompt is opt-in only.

Adds an alternative system prompt for AdaptiveSkillEngine that structures
evolution analysis as 5 alternating BUILD/TEST phases. Each BUILD phase
generates structure (failure analysis, cross-referencing, mutation plan);
each TEST phase challenges it (what does the analysis conceal? would fixes
break passing tasks?).

Tested on synthetic observation data (8 failed tasks, 4 categories):
braided prompt scored 10.0 vs default 6.4 on a custom mutation plan
evaluator. Main gains: root-cause depth (+2), cross-category interaction
discovery (+2), self-challenge (+1). Actionability tied at 3/3.

Usage:
  config = EvolveConfig(extra={"evolver_style": "braided"})

Default behavior unchanged -- braided prompt only activates when
evolver_style is explicitly set to "braided".

Based on research from the AGI-in-md project (330 principles on cognitive
compression in LLM prompts). Key principles applied:
- P303: Braid balance -- alternating build/test phases outperform monotonic
- P305: Conservation law anchoring -- naming structural trade-offs improves depth
- P315: Cross-analytical reference -- analyzing one layer through another's lens
@Cranot Cranot force-pushed the braided-evolution-prompt branch from af37337 to e8b89d3 Compare April 7, 2026 21:59
@HanqingLu
Contributor

Since adaptive_skill is already used by other people to run other benchmarks, I suggest cloning it into a new algorithm and reimplementing the change there, so that this stays backward compatible. Also, could you please share the benchmark numbers (not on your own test, but on the assigned benchmarks such as Terminal-Bench-2 and SWE-Bench)?


2 participants