fix(core): rework robustness changes per PR #158 review #159

Open: lfnothias wants to merge 4 commits into mimosa_v2 from fix/robustness-rework

@lfnothias

Collaborator

Reworks the robustness changes from the closed PR #158 according to Martin's per-change review. Branched off the latest mimosa_v2; only the accepted/reworked subset is applied, everything rejected is dropped. Suite green (109 passed, 0 failed).

Applied

  • factory.py — reworked per review: MCP discovery retry is now bounded on total wait time (5 min) with exponential backoff and a diagnostic on exhaustion, instead of a fixed 5-attempt count.
  • llm_provider.py — reworked per review: temperature is omitted for all Anthropic models rather than version-gating Opus 4.x.
  • orchestrator.py — sandbox error classification (TIMEOUT / SYNTAX_ERROR / RUNTIME_ERROR).
  • workflow_factory.py — keyword-identifier validation, atomic state_result.json write, post-assemble compile check.
  • evolution_engine.py — ghost WorkflowInfo guard; clear VariationEngine histories at session start.
  • planner.py — verify dependency output files exist before marking a step executable.
  • astra_exporter.py — fail-fast early return on empty trace.
  • tests — repair stale VariationEngine() constructor calls so the suite passes.
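
The factory.py item above describes a retry bounded by total wait time rather than by attempt count. A minimal sketch of that pattern follows. It is not the code from this PR: the function name `retry_with_deadline`, its parameters, and the default delays are all hypothetical, and only the 5-minute budget (300 s) comes from the description.

```python
import time


def retry_with_deadline(operation, max_total_wait=300.0, base_delay=1.0,
                        max_delay=60.0, sleep=time.sleep, clock=time.monotonic):
    """Call operation() until it succeeds or max_total_wait seconds elapse.

    Hypothetical helper: the delay doubles after each failure (exponential
    backoff), is capped at max_delay, and never exceeds the remaining budget.
    """
    start = clock()
    attempt = 0
    while True:
        attempt += 1
        try:
            return operation()
        except Exception as exc:  # a real caller would catch narrower errors
            last_error = exc
        elapsed = clock() - start
        remaining = max_total_wait - elapsed
        if remaining <= 0:
            # Diagnostic on exhaustion: report attempts, elapsed time, last error.
            raise TimeoutError(
                f"gave up after {attempt} attempts in {elapsed:.1f}s; "
                f"last error: {last_error!r}"
            ) from last_error
        delay = min(base_delay * 2 ** (attempt - 1), max_delay, remaining)
        sleep(delay)
```

Bounding on elapsed time means a slow-starting server gets the same budget whether each probe fails quickly or slowly, which a fixed attempt count does not guarantee. The `sleep` and `clock` parameters are injectable only so the behaviour can be tested without real waiting.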

Dropped per review

  • workflow_runner temp-script deletion.
  • the .export_status sidecar in _export_astra (status is derived from run config + artifacts, not a written file).
  • selection.py metrics-refresh.
  • all smolagent_factory changes (max_steps, context-window guard, file lock, TimeoutError re-raise).
  • csv_mode.py changes.
  • workflow_v11.md agent rules.
  • the max_steps / max_context_tokens config fields.

config.py already carries export_astra with round-trip on mimosa_v2, so nothing was added there. The cost/fairness work (caching, model tiers, grounding KB, verifier knobs) is intentionally not in this PR and will follow separately, one feature at a time.

@Fosowl left a comment

Member

See the comment below: MAX_CONTEXT_TOKENS and WORKSPACE_DIR are dead code.

The attempt to add WORKSPACE_DIR indicates some confusion, so to clarify:

Please understand that agents interact with the workspace through two channels. (A) Their action-as-Python code runs inside the workflow runner, which spawns the subprocess with cwd set to Config.workspace_dir ([sources/core/workflow_runner.py:404](sources/core/workflow_runner.py:404)), so agents can use relative paths freely and never need to be prompted with the workspace path. (B) They also call MCP tools, which (when using Toolomics) run inside Docker containers that mount the Toolomics workspace folder as their filesystem root.

For the two channels to refer to the same files on disk, workspace_dir in config.py / config_default.json must match the workspace path you passed to Toolomics' ./start.sh, e.g. workspace_dir = "/Users/mlg/Documents/CNRS/toolomics/workspace" (just an example). If you skip Toolomics, the rule generalizes: workspace_dir can be any path, as long as every MCP server with file I/O sees that same path.
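
To illustrate channel (A), here is a minimal sketch of why a subprocess started with cwd set to the workspace needs no explicit workspace path. It is not the code in workflow_runner.py: `run_in_workspace` and `demo` are hypothetical names, and a temporary directory stands in for Config.workspace_dir.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_workspace(code: str, workspace_dir: str,
                     timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run Python source in a child process whose cwd is the workspace.

    Hypothetical stand-in for the runner: relative paths inside `code`
    resolve against workspace_dir because of the cwd argument.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        cwd=workspace_dir,
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def demo() -> str:
    """Write a file via a relative path, then read it back from the workspace."""
    with tempfile.TemporaryDirectory() as workspace:
        # The agent-style code never mentions the workspace path.
        run_in_workspace("open('result.txt', 'w').write('done')", workspace)
        return (Path(workspace) / "result.txt").read_text()
```

Calling `demo()` returns the text written by the child process, which shows that the file landed in the workspace even though the generated code only used the relative name `result.txt`.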

OPENROUTER_PROVIDER = {self.config.openrouter_provider_for(self.config.smolagent_model_id)!r}
AGENT_EXECUTION_TIMEOUT = {self.config.agent_execution_timeout!r}
MAX_CONTEXT_TOKENS = {self.config.max_context_tokens!r}
WORKSPACE_DIR = {self.config.workspace_dir!r}

Member


MAX_CONTEXT_TOKENS = {self.config.max_context_tokens!r}
WORKSPACE_DIR = {self.config.workspace_dir!r}

Dead code; it should be removed. The fact that Claude tried to pass the workspace dir to SmolAgent through the config indicates confusion about how Mimosa works.
