Latest V2 improvement to dev: verifier hardening, QD evolution improvements, and workflow runner isolation by Fosowl · Pull Request #156 · HolobiomicsLab/Mimosa-AI

Fosowl · 2026-06-22T16:23:04Z

Summary

This PR merges the mimosa_v2 development branch into dev, bringing ~85 commits of verifier hardening, quality-diversity evolution improvements, workflow runner isolation, and CLI robustness fixes.

What's Changed

Verifier & Grounding

Per-claim verification scripts now use AST parsing, file-first analysis, and vacuous-antecedent rejection
Source B extracts literal identifiers from goal body; Source C scans image artefacts with PIL; Source E flags dataset-sentinel leakage in selection
Bounded retry on verifier-side failures with per-claim failure handling
Grounding retriever now plans multi-step targeted literature sub-searches (dropped regex cues)
Reduced verifier temperature and shorter prompt gradients for more consistent outputs
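
As an illustration of the bounded retry with per-claim failure handling listed above, here is a minimal sketch. It is not the code in this PR: `VerifierSideError`, `ClaimResult`, and `verify_claims` are hypothetical names, and the sketch assumes a verifier-side failure (for example a crashed verification script) can be told apart from a claim that is simply false.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


class VerifierSideError(Exception):
    """Hypothetical: the verifier itself failed, so the claim is undecided."""


@dataclass
class ClaimResult:
    claim: str
    passed: Optional[bool]  # None means the verifier never produced a verdict
    attempts: int


def verify_claims(
    claims: Iterable[str],
    check: Callable[[str], bool],
    max_attempts: int = 3,
) -> List[ClaimResult]:
    """Check each claim independently, retrying only verifier-side failures."""
    results = []
    for claim in claims:
        passed: Optional[bool] = None
        attempts = 0
        while attempts < max_attempts:
            attempts += 1
            try:
                passed = check(claim)
                break  # a real verdict (True or False) ends the retry loop
            except VerifierSideError:
                continue  # bounded retry: try again until max_attempts
        # A claim that exhausts its retries is recorded as undecided
        # instead of aborting verification of the remaining claims.
        results.append(ClaimResult(claim, passed, attempts))
    return results
```

Keeping the undecided state (`None`) separate from `False` means a flaky verifier does not get scored as a failed claim.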

Evolution & QD

Failure-fingerprint descriptor centered for QD novelty computation
Genotype-embedding novelty + length penalty in selection
Plateau-based stagnation replaces embedding stagnation in variation engine
Reduced crossover rate for more stable evolution
Per-goal evolution trees rendered into workflow folders
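
The plateau-based stagnation check mentioned above could look roughly like the sketch below. This is an assumption about the general technique, not the variation engine's actual code: `PlateauCounter`, `patience`, and `min_delta` are hypothetical names and defaults.

```python
class PlateauCounter:
    """Flag stagnation when the best score stops improving for `patience` steps."""

    def __init__(self, patience: int = 5, min_delta: float = 1e-3) -> None:
        self.patience = patience    # generations without improvement tolerated
        self.min_delta = min_delta  # smallest gain that counts as improvement
        self.best = float("-inf")
        self.stale = 0

    def update(self, best_score: float) -> bool:
        """Record this generation's best score; return True if on a plateau."""
        if best_score > self.best + self.min_delta:
            self.best = best_score
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience
```

A counter like this depends only on the score history, so it cannot be triggered by embeddings that happen to drift or collapse while fitness is still improving.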

Workflow Runner & Orchestrator

Managed venv isolation: workflow dependencies installed into isolated virtual environments
max_tokens capped at 16384 to prevent token-limit crashes
Random workflow temperature sampling for diversity
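
For readers unfamiliar with the managed venv approach, a minimal sketch using only the standard library follows. The function names and the directory layout are hypothetical; the actual runner in this PR may differ.

```python
import subprocess
import sys
import venv
from pathlib import Path
from typing import List


def create_managed_venv(env_dir: Path, with_pip: bool = True) -> Path:
    """Create an isolated virtual environment and return its interpreter path."""
    # clear=True wipes any previous environment at env_dir before creating it.
    venv.EnvBuilder(with_pip=with_pip, clear=True).create(env_dir)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_deps(python: Path, deps: List[str]) -> None:
    """Install workflow dependencies with the venv's own pip, not the system one."""
    if not deps:
        return
    subprocess.run([str(python), "-m", "pip", "install", *deps], check=True)
```

Running the workflow with the returned interpreter keeps its dependencies out of the system Python.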

CLI & UX

Hardened JSON parsing across all CLI tools (gibberish-tolerant prompts)
Improved error handling in evaluation_cli, memory_chat_cli, and onboard_cli
CSV recovery prompts now respect user-typed values
Execution sandbox stdout truncation fixed
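
One common way to make LLM JSON parsing tolerant of surrounding chatter is to scan for the first decodable object. The sketch below shows that general technique under the assumption that the model is expected to return a single JSON object; `extract_json_object` is a hypothetical helper, not a function from this repository.

```python
import json
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first JSON object embedded in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever trailing text follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next candidate.
            start = text.find("{", start + 1)
    return None
```

Returning `None` instead of raising lets the CLI re-prompt the user or the model rather than crash on malformed output.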

Agent & Misc

Smolagent additional_authorized_imports updated
Agent max steps bumped to 129
Python 3.12 runner specification enforced
Docs synced with multi-agent evolution claims

Breaking Changes

None expected — all changes are additive or internal refactoring.

Testing

Verifier per-claim scripts pass on test goals
Workflow runner venv creation tested locally
CLI error handling verified with malformed inputs
Evolution pipeline runs end-to-end without stagnation false-positives

Contribution checklist

Please confirm the following before requesting review:

[x] I have read CONTRIBUTING.md, docs/licensing-notes.md, and the repository license information (LICENSE, NOTICE) for this repository (Apache License 2.0).
[x] I have read docs/cla-process.md and understand the CLA workflow for contributions.
[x] I confirm that I have the legal right to submit this contribution.
[x] I have signed the required short Individual Contributor Agreement, or I will complete it if requested.
[x] If this contribution is made in the course of employment or under institutional IP rules, I understand that maintainers may also request the optional employer authorization.
[x] I am not knowingly submitting code that is incompatible with this repository’s Apache 2.0 licensing terms and CLA requirements.

…uant floor

When the model creator only self-hosts at int4/fp4 (e.g. moonshotai for kimi-k2.7-code), the previous floor dropped their endpoint and routed to a community fp8 reseller. For benchmark reproducibility the authoritative first-party endpoint is preferred over a third-party requantization; the JSON-validity + determinism probe still gates them on merit.

Also widen the runtime quantizations filter to admit any quant the precheck approved, so OpenRouter doesn't silently drop the pinned first-party endpoint.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Fosowl and others added 30 commits June 9, 2026 19:04

feat: extend deps for verifier runner

99391d8

udpate

da79379

fix(workflow factory): max_tokens set to 16384 to avoid max token crash

30f0c14

feat(orchestrator): test main

b2da8c7

feat(verifier): shorter prompt gradient

2b1ab46

feat(qd): centered failure-fingerprint descriptor for QD novelty

2878c02

fix (smolagent_module): parse_memory_output empty string

9d7c904

fix(onboard_cli): robust LLM JSON parsing + gibberish-tolerant prompts

559f534

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(cli): harden evaluation_cli and memory_chat_cli error handling

39ad2cd

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(cli's): improved input error handling for all cli

60da1f0

rm : notes file

59d6493

logs: edit log style ; docs: edit docstring ; feat: don't evaluate on…

7eca670

… no wf executed true

merge

13b3a63

feat(workflow_factory,llm_provider): pick random workflow temperature…

a0f9f9d

…, fall back to 1.0 if provider rejects it

fix: bug

159bb70

feat : reduce temp for verifier

94a81fe

docs: ensure 3.12 runner specification

ac38c57

feat(workflow_runner): install workflow deps into a managed venv inst…

f1bdcdd

…ead of system Python

feat(workflow runner): venv

43cf45d

feat(evolution_tree): render per-goal trees into workflow folders

5f47917

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

gerge branch 'mimosa_v2' of github.com:HolobiomicsLab/Mimosa-AI into …

45d2681

…mimosa_v2

feat(smolagent): additional_authorized_imports change

c9082a9

fix: reduce crossover rate

5f936b8

refactor(variation_engine): replace embedding stagnation with plateau…

6807c2e

… counter

feat(verifier): bounded retry on verifier-side failures + per-claim f…

bdf45a2

…ile split

gerge branch 'claude/serene-cartwright-ca14ba' into mimosa_v2

c1b00b2

rm : __name__ unused

0c31897

feat(selection,qd): genotype-embedding novelty + length penalty

f03c99b

docs : simplify docstring

e495a8f

fix(verifier): don't pass error to report so it don't propagate to gr…

ee4f248

…adient report

Fosowl and others added 29 commits June 20, 2026 10:44

gerge branch 'claude/busy-gagarin-9e1a28' into mimosa_v2

7481520

refactor(verifier): simplify code

8f8374c

refactor(verifier): drop information bonus from scoring

59f66c5

refactor: docstring of abstracted_textual_gradient

f7e7728

gerge branch 'claude/hungry-ride-c71428' into mimosa_v2

efe86f0

refactor(verifier): split claim-source prompts into registry module

a5177dc

gerge branch 'claude/hungry-ride-c71428' into mimosa_v2

bd3156a

refactor(evaluators*): refactor evaluator code for maintanability

4fe8365

refactor: let error be handled by verifier build in catch

2871f46

refactor: move sources/core/evaluators/* sources/evaluators/

851066e

refactor: rename evaluation folder to benchmark_evaluation

3d95c43

feat : slighter persona for variation engine + refactor verifier code

781153a

feat (variation_engine): mutation driven by intermediary diagnosis

2fcd0e1

rfactor(verifier): RECOVERY_PROMPT_RULES shorten+rename file

9e09a73

feat(verifier): invalidate stale anchor claims and regen vs current w…

837081a

…orkspace

fix: import

26da91c

gerge branch 'claude/upbeat-diffie-d3987b' into mimosa_v2

34d24d0

feat (verifier): specify error handling in prompt btter

fe4de59

audit(run_2/mat_diffusion): seed-anchored verifier prevents evolution…

b9e09ea

… from escaping local optimum

refactor(verifier): replace seed-anchored cache with per-(task,source…

71d47a6

…) claim-list cache

feat (variation_engine): change llm_think_mutation_directive prompt ;…

644d8e7

… set: crossover rate to 0.4 + hard cap at 0.7

rm: audit files

86073cf

feat: pass goal to verifier generator

aae6335

audit(run_3/clintox): verifier-cache fix landed; SR still blocked dow…

6ce2a26

…nstream

fix : memory timelapse

80987ae

rm audit

37ed1c5

fix: anthropic crash on temp > 1

02efaa2

Merge branch 'mimosa_v2' of github.com:HolobiomicsLab/Mimosa-AI into …

e1f4145

…gimosa_v2 g Please enter a commit message to explain why this merge is necessary,

Fosowl merged commit 2057cbc into dev Jun 22, 2026

Fosowl merged 85 commits into dev from mimosa_v2
