Add braided evolution prompt (BUILD-TEST alternation for deeper mutation plans) #13
Open
Cranot wants to merge 1 commit into A-EVO-Lab:main
Adds an alternative system prompt for AdaptiveSkillEngine that structures
evolution analysis as 5 alternating BUILD/TEST phases. Each BUILD phase
generates structure (failure analysis, cross-referencing, mutation plan);
each TEST phase challenges it (what does the analysis conceal? would fixes
break passing tasks?).
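The alternation described above can be sketched as a small interleaving helper. This is an illustration only: the `Phase` class, the `braid` and `render_prompt` helpers, the phase wording, and the exact order of the five phases are assumptions, not the text of the actual braided prompt.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Phase:
    kind: str  # "BUILD" or "TEST"
    task: str


# Paraphrased from the description above; the real prompt wording and the
# order of phases are assumptions made for this sketch.
BUILD_TASKS = [
    "Analyze the failures.",
    "Cross-reference the findings.",
    "Write the mutation plan.",
]
TEST_TASKS = [
    "What does the analysis conceal?",
    "Would the fixes break passing tasks?",
]


def braid(build_tasks: List[str], test_tasks: List[str]) -> List[Phase]:
    """Interleave BUILD and TEST tasks, starting and ending with BUILD."""
    if len(build_tasks) != len(test_tasks) + 1:
        raise ValueError("need exactly one more BUILD task than TEST tasks")
    phases: List[Phase] = []
    for i, task in enumerate(build_tasks):
        phases.append(Phase("BUILD", task))
        if i < len(test_tasks):
            phases.append(Phase("TEST", test_tasks[i]))
    return phases


def render_prompt(phases: List[Phase]) -> str:
    """Render the phases as numbered lines of a prompt skeleton."""
    return "\n".join(
        f"Phase {n} ({p.kind}): {p.task}" for n, p in enumerate(phases, start=1)
    )


PHASES = braid(BUILD_TASKS, TEST_TASKS)
```

Calling `render_prompt(PHASES)` yields five numbered lines whose kinds alternate BUILD, TEST, BUILD, TEST, BUILD.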
Tested on synthetic observation data (8 failed tasks, 4 categories):
braided prompt scored 10.0 vs default 6.4 on a custom mutation plan
evaluator. Main gains: root-cause depth (+2), cross-category interaction
discovery (+2), self-challenge (+1). Actionability tied at 3/3.
Usage:
config = EvolveConfig(extra={"evolver_style": "braided"})
Default behavior unchanged -- braided prompt only activates when
evolver_style is explicitly set to "braided".
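A minimal sketch of the opt-in selection, assuming the engine reads `evolver_style` from `config.extra` with a plain dictionary lookup. The class and constant names come from this PR, but the bodies below are stand-ins written for illustration, and the fallback for unrecognized values is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Placeholder strings; the real prompt texts live in adaptive_skill/prompts.py.
DEFAULT_EVOLVER_SYSTEM_PROMPT = "<default evolver prompt>"
BRAIDED_EVOLVER_SYSTEM_PROMPT = "<braided evolver prompt>"


@dataclass
class EvolveConfig:
    """Stand-in for the project's config; only the `extra` dict is modeled."""

    extra: Dict[str, Any] = field(default_factory=dict)


class AdaptiveSkillEngine:
    """Stand-in engine that models only the prompt selection."""

    def __init__(self, config: EvolveConfig) -> None:
        self.config = config

    @property
    def _system_prompt(self) -> str:
        # Only the exact value "braided" switches prompts. A missing key or
        # any other value falls back to the default (assumed behavior).
        if self.config.extra.get("evolver_style") == "braided":
            return BRAIDED_EVOLVER_SYSTEM_PROMPT
        return DEFAULT_EVOLVER_SYSTEM_PROMPT
```

With this shape, `AdaptiveSkillEngine(EvolveConfig())` keeps the default prompt, so existing callers see no change.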
Based on research from the AGI-in-md project (330 principles on cognitive
compression in LLM prompts). Key principles applied:
- P303: Braid balance -- alternating build/test phases outperform monotonic
- P305: Conservation law anchoring -- naming structural trade-offs improves depth
- P315: Cross-analytical reference -- analyzing one layer through another's lens
af37337 to e8b89d3
Contributor
Since adaptive_skill is already used by other people to run other benchmarks, I suggest cloning it into a new algorithm and reimplementing the change there so that this stays backward compatible. Also, could you please share benchmark numbers on the assigned benchmarks, such as Terminal-Bench-2 and SWE-Bench, rather than on your own test?
Summary
- Add BRAIDED_EVOLVER_SYSTEM_PROMPT to adaptive_skill/prompts.py -- a 5-step BUILD/TEST alternating prompt for evolution analysis
- Add _system_prompt property to AdaptiveSkillEngine that selects the prompt based on config.extra["evolver_style"]
- Opt-in setting: evolver_style: "braided"
What it does
The default evolution prompt analyzes failures then generates fixes (BUILD-BUILD-BUILD-BUILD). The braided prompt alternates structure-building and structure-testing:
Evidence
Tested on synthetic observation data (8 failed tasks across 4 categories, 3 existing skills, 60% pass rate): the braided prompt scored 10.0 and the default 6.4.
Scored by a custom mutation plan evaluator (6 dimensions, 0-20 scale normalized to 10). The default produced category-level triage with per-category patches. The braided prompt found a unified root cause (working-memory decay under long trajectories) that the category-level analysis concealed, and identified an existing skill that was masking the problem.
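The normalization mentioned above can be read as a linear rescaling from the 0-20 raw range to a 0-10 score. The sketch below assumes that reading; the `normalize_score` function is hypothetical and is not the evaluator used for these results.

```python
def normalize_score(raw: float, raw_max: float = 20.0, scale: float = 10.0) -> float:
    """Linearly rescale a raw total in [0, raw_max] to [0, scale]."""
    if not 0 <= raw <= raw_max:
        raise ValueError(f"raw score must be between 0 and {raw_max}")
    # Round to one decimal place to match the precision of the reported scores.
    return round(raw / raw_max * scale, 1)
```

Under this assumption, a full raw total of 20 maps to 10.0, and a reported 6.4 would correspond to a raw total of 12.8.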
Usage
config = EvolveConfig(extra={"evolver_style": "braided"})
Or in YAML config:
Design principles
Based on research from the AGI-in-md project (330 principles on cognitive compression in LLM prompts, 1000+ experiments). Three principles applied: P303 (braid balance), P305 (conservation law anchoring), and P315 (cross-analytical reference).
Test plan
- BRAIDED_EVOLVER_SYSTEM_PROMPT loads correctly
- evolver_style unset = uses DEFAULT_EVOLVER_SYSTEM_PROMPT
- evolver_style: "braided" = uses BRAIDED_EVOLVER_SYSTEM_PROMPT
Scope
Minimal change: 2 files, +62/-3 lines. No new dependencies. No breaking changes. The braided prompt is opt-in only.