Add braided evolution prompt (BUILD-TEST alternation for deeper mutation plans) by Cranot · Pull Request #13 · A-EVO-Lab/a-evolve

Cranot · 2026-04-07T21:58:25Z

Summary

Adds BRAIDED_EVOLVER_SYSTEM_PROMPT to adaptive_skill/prompts.py -- a 5-step BUILD/TEST alternating prompt for evolution analysis
Adds _system_prompt property to AdaptiveSkillEngine that selects prompt based on config.extra["evolver_style"]
Default behavior unchanged -- braided prompt only activates with evolver_style: "braided"

What it does

The default evolution prompt analyzes failures then generates fixes (BUILD-BUILD-BUILD-BUILD). The braided prompt alternates structure-building and structure-testing:

Step 1 [BUILD]: Analyze failures, name conservation law (structural trade-off)
Step 2 [TEST]:  Challenge analysis -- what does it conceal? Would fixes break passing tasks?
Step 3 [BUILD]: Cross-reference failure categories, find interaction patterns
Step 4 [TEST]:  Audit existing skills -- helping or hiding the problem?
Step 5 [BUILD]: Mutation plan with root-cause reasoning + regression predictions

Evidence

Tested on synthetic observation data (8 failed tasks across 4 categories, 3 existing skills, 60% pass rate):

Condition	Score	Root Cause	Self-Challenge	Cross-Ref	Skill Audit	Prediction	Actionability
Default prompt	6.4	3	2	1	2	2	3
Braided prompt	10.0	5	3	3	3	3	3
Delta	+3.6	+2	+1	+2	+1	+1	0

Scored by a custom mutation plan evaluator (6 dimensions, 0-20 scale normalized to 10). The default produced category-level triage with per-category patches. The braided prompt found a unified root cause (working-memory decay under long trajectories) that the category-level analysis concealed, and identified an existing skill that was masking the problem.

Usage

config = EvolveConfig(extra={"evolver_style": "braided"})
engine = AdaptiveSkillEngine(config)
# Everything else unchanged -- same workspace, observations, history, trial

Or in YAML config:

evolver_style: braided

Design principles

Based on research from the AGI-in-md project (330 principles on cognitive compression in LLM prompts, 1000+ experiments). Three principles applied:

Braid balance (P303): Alternating build/test phases outperform monotonic sequences. Empirically: braid-balanced avg 8.2 vs monotonic 6.5 (+1.7) across 50+ experiments.
Conservation law anchoring (P305): Naming structural trade-offs ("when X increases, Y must decrease") produces categorically deeper analysis than listing problems.
Cross-analytical reference (P315): Analyzing one layer's findings through another's lens discovers interaction patterns invisible to parallel independent analysis.

Test plan

Import verification (BRAIDED_EVOLVER_SYSTEM_PROMPT loads correctly)
Default behavior unchanged (no evolver_style = uses DEFAULT_EVOLVER_SYSTEM_PROMPT)
Braided selection works (evolver_style: "braided" = uses BRAIDED_EVOLVER_SYSTEM_PROMPT)
End-to-end benchmark comparison (would welcome help testing on actual SWE-bench/MCP-Atlas)

Scope

Minimal change: 2 files, +62/-3 lines. No new dependencies. No breaking changes. The braided prompt is opt-in only.

Adds an alternative system prompt for AdaptiveSkillEngine that structures evolution analysis as 5 alternating BUILD/TEST phases. Each BUILD phase generates structure (failure analysis, cross-referencing, mutation plan); each TEST phase challenges it (what does the analysis conceal? would fixes break passing tasks?). Tested on synthetic observation data (8 failed tasks, 4 categories): braided prompt scored 10.0 vs default 6.4 on a custom mutation plan evaluator. Main gains: root-cause depth (+2), cross-category interaction discovery (+2), self-challenge (+1). Actionability tied at 3/3. Usage: config = EvolveConfig(extra={"evolver_style": "braided"}) Default behavior unchanged -- braided prompt only activates when evolver_style is explicitly set to "braided". Based on research from the AGI-in-md project (330 principles on cognitive compression in LLM prompts). Key principles applied: - P303: Braid balance -- alternating build/test phases outperform monotonic - P305: Conservation law anchoring -- naming structural trade-offs improves depth - P315: Cross-analytical reference -- analyzing one layer through another's lens

HanqingLu · 2026-04-20T22:53:59Z

Since the adaptive_skill has been used by other people to run other benchmarks, in order for this to be a backward compatible changes, suggest to clone a new algorithm and reimplement. Also, could you please share the benchmark numbers (not on your test but on the assigned benchmarks like Terminal-Bench-2, SWE-Bench?

Cranot force-pushed the braided-evolution-prompt branch from af37337 to e8b89d3 Compare April 7, 2026 21:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add braided evolution prompt (BUILD-TEST alternation for deeper mutation plans)#13

Add braided evolution prompt (BUILD-TEST alternation for deeper mutation plans)#13
Cranot wants to merge 1 commit intoA-EVO-Lab:mainfrom
Cranot:braided-evolution-prompt

Cranot commented Apr 7, 2026 •

edited

Loading

Uh oh!

HanqingLu commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Cranot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What it does

Evidence

Usage

Design principles

Test plan

Scope

Uh oh!

HanqingLu commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Cranot commented Apr 7, 2026 •

edited

Loading