__test by LucasErcolano · Pull Request #30 · LucasErcolano/MiroFish

LucasErcolano · 2026-06-20T20:06:59Z

test

Resolves issue #20.

- Add memory_mode feature flag: baseline | experimental, env/YAML driven, rollback-safe.
- Add experiment runner: deterministic run_id, seed control, snapshot config, seed/prompt hashes, results.json export, runs/<case>/<variant>/<seed>/ layout.
- Add docs and configs: docs/memory_experimental.md, docs/experiment_harness.md, configs/memory_baseline.yaml, configs/memory_experimental.yaml, configs/experiments/example_case.yaml, configs/experiments/v1_smoke_*.yaml incl. no-report smoke variant.
- Add tests: backend/tests/test_memory_mode.py, backend/tests/test_experiment_runner.py, backend/tests/test_experiment_runner_memory.py.
- Update backend services/tests for experimental memory integration, spike baseline/rollback behavior, memory metrics logging, and safe backend logger handling.
- Update .gitignore for logs/runs/artifacts.
- Final pre-merge cleanup: move temporary smoke/log artifacts out of tree; preserve no-report smoke config for simulation path validation.

Issue: #20
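To make the "deterministic run_id" and "runs/<case>/<variant>/<seed>/ layout" items concrete, here is a minimal sketch. It assumes the run_id is a truncated SHA-256 over case, variant, seed and a canonical JSON dump of the config, and that a MEMORY_MODE env var overrides a memory_mode YAML key. The function names, the env var name and that precedence are hypothetical, not taken from the PR's runner.

```python
import hashlib
import json
from pathlib import Path
from typing import Mapping

VALID_MODES = {"baseline", "experimental"}


def resolve_memory_mode(env: Mapping[str, str], yaml_cfg: Mapping[str, object]) -> str:
    """Hypothetical precedence: env overrides YAML; unknown or missing values fall back to baseline."""
    mode = env.get("MEMORY_MODE") or yaml_cfg.get("memory_mode") or "baseline"
    return mode if mode in VALID_MODES else "baseline"


def config_hash(config: dict) -> str:
    """SHA-256 of a canonical JSON dump, so dict key order does not change the hash."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_run_id(case: str, variant: str, seed: int, config: dict) -> str:
    """The same (case, variant, seed, config) always yields the same id."""
    key = f"{case}|{variant}|{seed}|{config_hash(config)}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def run_dir(root: Path, case: str, variant: str, seed: int) -> Path:
    """Mirror the runs/<case>/<variant>/<seed>/ layout."""
    return root / case / variant / str(seed)
```

Hashing the canonical config (rather than the YAML text) means re-ordering keys in a config file does not produce a new run_id, while any change of value does.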

Add an optional wiki audit context layer to ReportAgent that compiles simulation knowledge-base pages into structured context injected into planning and section-generation prompts. The feature is fully opt-in via build_wiki_context_for_report()/wiki_context=None; existing behavior does not change when it is not activated.

Implementation:
- backend/app/services/wiki_memory/: new package (WikiStore, WikiCompiler, schemas, templates) for compiling wiki pages into context for report generation
- backend/app/services/report_agent.py: add wiki_context param; inject a <wiki_audit_context> block into the plan_outline and generate_section_react prompts with prior-knowledge labeling
- backend/app/api/report.py: integrate wiki context building with graceful degradation (non-fatal on error)
- backend/app/services/__init__.py: refactor to lazy-import heavy services, eager-export wiki_memory public API

Tests: 116/116 passing (compiler, store, integration, smoke).
Docs: docs/wiki_backed_report_memory.md with MVP activation details.
Smoke: scripts/real_lite_smoke.py for real-LLM verification.
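A minimal sketch of the opt-in prompt injection, assuming the compiled wiki context arrives as a plain string. The helper name with_wiki_context and the wording of the prior-knowledge label are hypothetical; the only parts taken from the description are the <wiki_audit_context> tag and the rule that wiki_context=None leaves the prompt untouched.

```python
from typing import Optional


def with_wiki_context(prompt: str, wiki_context: Optional[str]) -> str:
    """Append a <wiki_audit_context> block only when context is provided."""
    if not wiki_context:
        # Opt-in: with no context the prompt is returned unchanged (baseline behavior).
        return prompt
    return (
        f"{prompt}\n\n"
        "<wiki_audit_context>\n"
        # Hypothetical prior-knowledge label so the model does not treat it as new output.
        "The following is prior knowledge compiled from the simulation wiki. "
        "Treat it as background evidence, not as new simulation output.\n"
        f"{wiki_context}\n"
        "</wiki_audit_context>"
    )
```

Keeping the injection in one pure function makes the "no change when not activated" guarantee easy to unit-test: the None path is an identity.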

Route OASIS simulation agents to different LLMs via a YAML model map and record per-call telemetry (tokens, latency, estimated cost) so every agent action is traceable to the model that produced it. Fully opt-in via --model-map; single-model behavior is unchanged without it.

- model_router.py: load/validate the model map, resolve a ModelPolicy per agent (precedence by_agent_id > by_role > default), lazy CAMEL backend build. Secrets via env only (a literal api_key is rejected); fallback off by default.
- llm_telemetry.py: instrument the CAMEL backend INSTANCE (run/arun) rather than LLMClient, which is not in the agent LLM path, writing one JSONL record per call with cost estimation and leak flags.
- run_reddit_simulation.py: --model-map flag, per-agent routed backends, redacted model_routing_audit.jsonl, round-stamped telemetry.
- scripts/export_telemetry.py: standalone CSV + summary export (stdlib only).
- configs/model_map_example.yaml + model_prices.yaml, runs/smoke_multimodel/ recipe, docs/multimodel_agents.md.
- tests/test_model_routing.py: 21 tests (validation, precedence, secrets, cost, telemetry wrapper).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
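The routing precedence can be sketched as a plain lookup. This assumes the YAML map has already been parsed into a dict with by_agent_id, by_role and default sections, and that secrets are referenced by an env-var name under a hypothetical api_key_env key. ModelPolicy here is a stand-in dataclass, not the class in model_router.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelPolicy:
    model: str
    api_key_env: Optional[str] = None  # name of the env var holding the key (hypothetical field)


def validate_entry(entry: dict) -> None:
    """Reject literal secrets: keys must be resolved from the environment."""
    if "api_key" in entry:
        raise ValueError("literal api_key is not allowed; reference an env var instead")


def resolve_policy(model_map: dict, agent_id: int, role: Optional[str]) -> ModelPolicy:
    """Precedence: by_agent_id > by_role > default."""
    by_id = model_map.get("by_agent_id", {})
    by_role = model_map.get("by_role", {})
    if agent_id in by_id:
        entry = by_id[agent_id]
    elif role is not None and role in by_role:
        entry = by_role[role]
    else:
        entry = model_map["default"]
    validate_entry(entry)
    return ModelPolicy(model=entry["model"], api_key_env=entry.get("api_key_env"))
```

Validating at resolution time means a literal key fails on the first agent that would use it, instead of silently reaching logs or the routing audit.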

…LM telemetry (#21)

Supersedes the spike's inline agent_configs llm_* routing with the configurable agent_model_map.yaml router + per-call telemetry, as the spike itself called for. Spike evidence docs are preserved.

# Conflicts:
# backend/scripts/run_reddit_simulation.py

- run_parallel_simulation.py is single-model per platform (not wired): concurrent platforms make a shared sink.current_round racy; full wiring needs per-platform sinks/round contexts.
- SDK-internal retries are below the instrumented run()/arun(): one telemetry row per top-level call (final usage or final error).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…#21)

Closes the issue's 'Smoke run con 2 modelos reales' (smoke run with 2 real models) checkbox:
- 18 LLM calls, 9 per model; agents 0-9 -> gemini-2.5-flash-lite (by_agent_id), default -> gemini-3.1-flash-lite
- every call traceable to (model, provider, tokens, cost, round) in llm_telemetry.jsonl; routing audit + CSV/JSONL export committed
- adds the no-GPU variant (any multi-model OpenAI-compatible endpoint) alongside the original local-vLLM recipe

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
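To check a smoke run like this one (calls split per model), one could aggregate the telemetry JSONL by model. The record fields used here (model, prompt_tokens, completion_tokens, cost_usd) are assumptions about the llm_telemetry.jsonl schema, and summarize_by_model is a hypothetical helper, not scripts/export_telemetry.py.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def summarize_by_model(jsonl_lines: Iterable[str]) -> Dict[str, dict]:
    """Aggregate per-model call counts, total tokens and estimated cost."""
    summary = defaultdict(lambda: {"calls": 0, "tokens": 0, "cost_usd": 0.0})
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines at the end of the file
        rec = json.loads(line)
        s = summary[rec["model"]]
        s["calls"] += 1
        s["tokens"] += rec.get("prompt_tokens", 0) + rec.get("completion_tokens", 0)
        s["cost_usd"] += rec.get("cost_usd", 0.0)
    return dict(summary)
```

Run over a file opened with open("llm_telemetry.jsonl"), the per-model "calls" values should add up to the total number of LLM calls reported for the run.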

field coverage (#21)

Addresses PR #14 review by @LucasErcolano:
- Canonical config file section: agent_model_map.yaml (runtime) vs configs/model_map_example.yaml (template) vs smoke evidence maps.
- Smoke run section now states the real 2-model run was executed (Gemini, no GPU) and is the final S2 evidence; this fixes the stale "deferred" wording that contradicted README.md.
- Telemetry: explicit Issue #21 required-field coverage table; retries documented as stable (SDK-internal, not a separate field).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


… + cascading fallback) with Fusion patch

- Replace backend/app/utils/llm_client.py with the pr-600 version (15bd114) on top of the pr-318 refactor (52c177f): cleaner facade, _chat_raw internal helper, _clean_json_response for markdown fence stripping, repair_truncated_json module-level helper, cascading fallback to LLM_BOOST_* when the primary LLM fails.
- Re-adapt Fusion patch: inject extra_body={'plugins':[{id:'fusion',...}]} when the model ends in '/fusion', and cap max_tokens to 4096 (the Fusion router rejects > 4096). Panel via OPENROUTER_FUSION_PANEL (CSV) + OPENROUTER_FUSION_JUDGE, or preset via OPENROUTER_FUSION_PRESET (takes priority over panel).
- Add Config.LLM_BOOST_* (api_key/base_url/model_name, all None by default). No breaking change: when not set, _has_boost=False and chat_json raises ValueError if the primary LLM fails (caller can wrap in try/except).
- Add 33 smoke tests under backend/tests/utils/test_llm_client.py covering fence stripping, think-tag stripping, truncation repair, boost fallback, Fusion routing. 28 pass, 5 xfail (documenting pr-600 gaps in repair_truncated_json phase 1/2 that need an upstream 'close final brace if depth_brace>0' fix).

Validated end-to-end with current .env:
- Fusion ping (gemma+1.2b free + llama-3.1-8b judge): 3.4s, returns valid JSON
- Structured JSON (Alice/30/Beijing): 2.2s, correct schema
- Graphiti-compatible entity/relationship JSON: 3.0s, correct schema
- DeepInfra (SIMULATION_LLM) chat: 0.3s, 'PONG' response
- Fusion max_tokens cap: 16000 -> 4096 (verified via mock)
- Non-Fusion models: extra_body NOT injected, max_tokens passed through
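The Fusion re-adaptation amounts to a conditional on the model name plus an env lookup. This sketch shows only the behavior stated in the commit: the '/fusion' suffix check, the 4096 max_tokens cap, and a preset taking priority over panel + judge. The function names and the shape of the dict returned by fusion_selection are hypothetical, and the plugin body keeps only id='fusion' because the remaining fields are elided in the commit text.

```python
import os
from typing import Mapping, Optional

FUSION_MAX_TOKENS = 4096  # the Fusion router rejects max_tokens > 4096


def build_request_kwargs(model: str, max_tokens: int) -> dict:
    """Per-request kwargs; the Fusion plugin is added only for '/fusion' models."""
    kwargs = {"model": model, "max_tokens": max_tokens}
    if model.endswith("/fusion"):
        # Only the plugin id is shown; the other plugin fields are elided in the PR text.
        kwargs["extra_body"] = {"plugins": [{"id": "fusion"}]}
        kwargs["max_tokens"] = min(max_tokens, FUSION_MAX_TOKENS)
    return kwargs


def fusion_selection(env: Optional[Mapping[str, str]] = None) -> dict:
    """A preset wins over panel + judge (hypothetical internal representation)."""
    env = os.environ if env is None else env
    preset = env.get("OPENROUTER_FUSION_PRESET")
    if preset:
        return {"preset": preset}
    panel = [m.strip() for m in env.get("OPENROUTER_FUSION_PANEL", "").split(",") if m.strip()]
    return {"panel": panel, "judge": env.get("OPENROUTER_FUSION_JUDGE")}
```

Non-Fusion models pass through untouched, which matches the "extra_body NOT injected, max_tokens passed through" check listed above.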

- Strip think tags in _clean_json_response (Fusion models emit them)
- Strip markdown code fences (moved from _chat_raw for defense in depth)
- Extract first balanced JSON object/array from prose-prefixed responses (Fusion deliberation models prepend reasoning text before JSON payload)
- Prioritize whichever of { or [ appears first in the string
- Update tests: rename test_does_not_strip_think_tags -> test_strips_think_tags and add 3 new prose-extraction tests
- 31 passed, 5 xfailed
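The cleaning steps listed above can be sketched with the standard library alone. This is an illustration, not the PR's _clean_json_response: it assumes think tags look like <think>...</think>, strips fences only at the start and end of the string, and scans for the first balanced span while ignoring brackets inside JSON strings. Unbalanced input is returned from the opener onward so a truncation repairer could take over.

```python
import re

FENCE = "`" * 3  # built this way so the pattern does not clash with Markdown fences
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
_FENCE_RE = re.compile(r"^" + FENCE + r"[A-Za-z0-9_-]*\s*|\s*" + FENCE + r"$")


def extract_first_json(text: str) -> str:
    """Return the first balanced {...} or [...] span, whichever opener comes first."""
    openers = [i for i in (text.find("{"), text.find("[")) if i != -1]
    if not openers:
        return text
    start = min(openers)
    depth, in_string, escaped = 0, False, False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Brackets inside JSON strings must not change the depth.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return text[start:]  # unbalanced: leave it for a truncation repairer


def clean_json_response(raw: str) -> str:
    text = _THINK_RE.sub("", raw).strip()   # 1. drop reasoning blocks
    text = _FENCE_RE.sub("", text).strip()  # 2. drop leading/trailing code fences
    return extract_first_json(text)         # 3. skip prose before the payload
```

The depth counter treats { and [ alike and does not check that brackets match; that keeps the scan simple, and json.loads on the result still rejects malformed payloads.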

Cherry-pick of upstream PRs 666ghj#318 + 666ghj#600 (LLMClient structured output + cascading fallback) with Fusion plugin support. Includes:
- f5608b5 wip: Fusion plugin support
- 23f49fa feat(llm): port PR 666ghj#318+666ghj#600 with Fusion patch
- 20077d0 feat: _clean_json_response handles Fusion prose-prefixed JSON

Working config validated: Fusion 2+1 (gemini-2.5-flash-lite + llama-3.1-8b panel, gemini-2.5-flash-lite judge). Ontology gen OK. ~$0.002/call.
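The cascading fallback can be sketched as follows. Here primary and boost are plain callables standing in for the primary and LLM_BOOST_* clients, and LLMUnavailable is a hypothetical ValueError subclass, so a caller's existing except ValueError block still applies. None of these names come from llm_client.py.

```python
from typing import Callable, Optional


class LLMUnavailable(ValueError):
    """Hypothetical: raised when the primary fails and no boost backend is configured."""


def chat_with_fallback(
    messages: list,
    primary: Callable[[list], str],
    boost: Optional[Callable[[list], str]] = None,
) -> str:
    try:
        return primary(messages)
    except Exception as primary_err:
        if boost is None:
            # Equivalent of _has_boost=False: surface a ValueError the caller can catch.
            raise LLMUnavailable(f"primary LLM failed: {primary_err}") from primary_err
        return boost(messages)  # a boost failure propagates unchanged
```

Chaining with "from primary_err" keeps the original provider error in the traceback, which helps tell a bad key apart from a rate limit when the fallback is not configured.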

…ti-core 0.28.2)

Upstream graphiti-core 0.28.2 bulk_utils.add_nodes_and_edges_bulk_tx does `entity_data.update(node.attributes or {})` for the Neo4j branch, and the resulting Cypher does `SET n = $entity_data`. If any value inside `node.attributes` is a dict (nested), list-of-dicts, datetime, set, UUID, or any other non-primitive, Neo4j rejects it with: Property values can only be of primitive types or arrays thereof. Encountered: Map{}

This commit adds a non-invasive flatten pass that runs *before* `self._graphiti._process_episode_data`:
- _flatten_for_neo4j(value): coerce dict -> json, list/tuple -> recurse, set/frozenset -> sorted list, datetime -> isoformat, fallback str()
- _flatten_attributes(attrs): apply per key, log coerced keys at INFO
- _apply_flatten_pass(nodes, edges): in-place assign on the Pydantic models, with object.__setattr__ fallback for frozen configs

The original (nested) shape is preserved in memory; only the field assignment at the last moment is flattened. Reads via `EntityNode.attributes` after the call see the flattened primitives.

Tests:
- backend/tests/test_attribute_flatten.py: 21 unit tests (primitives, nested dicts, datetimes, sets, UUID, Decimal, fallback path)
- backend/tests/test_e2e_flatten_fix.py: e2e against real Neo4j; reproduces the bug without the fix and validates save+round-trip with it.
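A standalone sketch of the coercion rules listed for _flatten_for_neo4j and _flatten_attributes, using only the standard library. The names are hypothetical. Two details are assumptions: sets are sorted with key=str to avoid a TypeError on mixed element types (the commit only says "sorted list"), and dicts are dumped with default=str so nested datetimes do not fail. Coerced keys are logged at INFO.

```python
import json
import logging
from datetime import date, datetime
from typing import Any, Optional

logger = logging.getLogger(__name__)
_PRIMITIVES = (str, int, float, bool, type(None))


def flatten_for_neo4j(value: Any) -> Any:
    """Coerce a value into a Neo4j-storable primitive (or a list of them)."""
    if isinstance(value, _PRIMITIVES):
        return value
    if isinstance(value, dict):
        return json.dumps(value, default=str)  # nested map -> JSON string
    if isinstance(value, (list, tuple)):
        return [flatten_for_neo4j(v) for v in value]
    if isinstance(value, (set, frozenset)):
        return sorted((flatten_for_neo4j(v) for v in value), key=str)
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    return str(value)  # UUID, Decimal and anything else


def flatten_attributes(attrs: Optional[dict]) -> dict:
    """Apply the coercion per key and log the keys that actually changed."""
    out = {}
    for key, value in (attrs or {}).items():
        flat = flatten_for_neo4j(value)
        if flat is not value and flat != value:
            logger.info("coerced attribute %r to a Neo4j primitive", key)
        out[key] = flat
    return out
```

Returning a new dict keeps the caller's nested attributes intact until the moment of assignment, in the spirit of the "original shape is preserved in memory" note above.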

…rvability

Closes #8, #21. Adds:
- backend/app/services/model_router.py: per-agent/role routing with precedence by_agent_id > by_role > default, secrets via env only
- backend/app/services/llm_telemetry.py: per-call tokens/latency/cost/hashes/JSON-validity/round, wrapping the CAMEL backend
- backend/scripts/run_reddit_simulation.py: --model-map flag + audit + telemetry JSONL output
- scripts/export_telemetry.py: stdlib CSV+JSONL export
- configs/model_map_example.yaml, configs/model_prices.yaml
- docs/multimodel_agents.md, runs/smoke_multi_model artefacts
- tests covering routing precedence, secret hygiene, telemetry.
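Wrapping the backend instance (rather than LLMClient) can be sketched as replacing run on the object itself. This assumes a synchronous run whose response exposes a dict-like usage with prompt_tokens/completion_tokens, and a price table in USD per million tokens. instrument_backend, the record fields and the price keys are hypothetical; an async arun would be wrapped the same way with an async def.

```python
import functools
import json
import time
from typing import IO, Any, Dict


def instrument_backend(backend: Any, sink: IO[str], prices: Dict[str, dict], model: str) -> Any:
    """Replace backend.run on this instance so each call writes one JSONL record."""
    original = backend.run  # bound method of the un-instrumented instance

    @functools.wraps(original)
    def wrapped(*args, **kwargs):
        record: Dict[str, Any] = {"model": model}
        start = time.perf_counter()
        try:
            response = original(*args, **kwargs)
            usage = getattr(response, "usage", None) or {}
            pt = usage.get("prompt_tokens", 0)
            ct = usage.get("completion_tokens", 0)
            price = prices.get(model, {"input_per_m": 0.0, "output_per_m": 0.0})
            record["prompt_tokens"], record["completion_tokens"] = pt, ct
            record["cost_usd"] = (pt * price["input_per_m"] + ct * price["output_per_m"]) / 1e6
            return response
        except Exception as err:
            record["error"] = repr(err)  # one row for the final error, then re-raise
            raise
        finally:
            record["latency_s"] = time.perf_counter() - start
            sink.write(json.dumps(record) + "\n")

    backend.run = wrapped  # instance attribute shadows the class method
    return backend
```

Because only the instance is patched, other backends built from the same class stay uninstrumented, and any retries the SDK performs inside run still collapse into a single row.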

Closes #20.

Adds a persistent local Markdown Wiki as auxiliary audit/evidence context for ReportAgent. Does not replace Zep, GraphRAG, or the existing operational memory stack. Baseline behavior unchanged unless wiki_context is explicitly available.
- backend/app/services/wiki_memory/ (WikiStore, WikiCompiler, schemas)
- build_wiki_context_for_report() with non-fatal fallback to None
- Per-run/case artifacts: agents.md, index.md, timeline.md, sources.md, contradictions.md, entities/*.md, claims/*.md, wiki_meta.json
- ReportAgent prompt integration via <wiki_audit_context> tag
- docs/wiki_backed_report_memory.md
- scripts/real_lite_smoke.py
- Unit + integration + smoke tests

# Conflicts:
# .gitignore
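The "non-fatal fallback to None" can be sketched as a thin guard around whatever builds the context. safe_wiki_context and the build(run_id) signature are hypothetical; the point is that any exception, or an empty result, yields None, which keeps ReportAgent on its baseline path.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def safe_wiki_context(build: Callable[[str], str], run_id: str) -> Optional[str]:
    """Return compiled wiki context, or None so ReportAgent keeps baseline behavior."""
    try:
        context = build(run_id)
    except Exception:
        # Missing wiki files or compiler errors must never fail report generation.
        logger.warning("wiki context unavailable for run %s; continuing without it",
                       run_id, exc_info=True)
        return None
    return context or None  # an empty compile is treated like "not available"
```

Normalizing "" to None means downstream code only has to check one sentinel for "no wiki context".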

Adds:
- backend/app/graph/graphiti_backend.py: semantic dedup (cosine >= 0.85) pre-insert + isolated-node pruning (lurker prevention)
- backend/app/services/deep_search.py: Tavily-backed autonomous search with max_date injection to prevent data leakage during backtesting
- backend/requirements.txt: tavily-python
- Evaluation artefacts for IPC Argentina 2025 backtesting case
- PR #27 starts from a pre-SIMULATION_LLM_ base; manually merged the S3 additions (PLANNING_CAPTURE_*, SIMILARITY_THRESHOLD, ENABLE_DEEP_SEARCH, DEEP_SEARCH_*, TAVILY_API_KEY, GEMINI_API_KEY) into the current backend/app/config.py, preserving the SIMULATION_LLM_* block from f20bfd9 and the FLASK_LLM_* / config additions from PRs #14 and #23.

# Conflicts:
# .gitignore
# backend/app/graph/graphiti_backend.py
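A pure-Python sketch of the two graph-hygiene steps: merging a candidate into an existing node when cosine similarity is at least 0.85, and dropping nodes with no edges. It assumes embeddings are plain float lists and nodes are keyed by name. The function names are hypothetical, and a real backend would query Neo4j or a vector index instead of looping.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def dedup_before_insert(existing: Dict[str, List[float]], candidates: Dict[str, List[float]],
                        threshold: float = 0.85) -> Dict[str, str]:
    """Map each candidate to the most similar node at >= threshold, or to itself if new."""
    pool = dict(existing)  # copy: do not mutate the caller's index
    resolved: Dict[str, str] = {}
    for name, vec in candidates.items():
        best, best_sim = None, threshold
        for other, other_vec in pool.items():
            sim = cosine(vec, other_vec)
            if sim >= best_sim:
                best, best_sim = other, sim
        if best is None:
            pool[name] = vec  # later candidates may merge into this new node
            resolved[name] = name
        else:
            resolved[name] = best
    return resolved


def prune_isolated(nodes: Set[str], edges: List[Tuple[str, str]]) -> Set[str]:
    """Keep only nodes that appear in at least one edge (drop 'lurkers')."""
    connected = {n for edge in edges for n in edge}
    return nodes & connected
```

Adding newly accepted candidates to the pool lets two near-identical new agents collapse into one in the same batch, not only into nodes already in the graph.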

Joacocade and others added 28 commits May 22, 2026 16:40

feat: spike per-agent local LLM routing

c62d1e1

chore: add Argentina 2025 pilot case artifacts

d68c4d6

chore: keep pilot artifacts isolated to cases

9010e8b

refactor report agent localization guards

b824bce

fix: narrow issue 20 wiki memory scope

58ca7aa

chore(S3): integrate local update files into the new environment and prepare branch

b111ea7

feat(S3): implement semantic agent deduplication and autonomous Deep Search pipeline

83829b1

feat(S3): integrate Gemini Google Search Grounding for autonomous Deep Search

53a0b19

chore(S3): optimize Gemini model selection and add config_matrix.yaml

04f7529

docs(S3): generate final validation and A/B testing report

96eb70e

docs(S3): update MAE report and Llama-3.3 knowledge fallback

3243e5b

feat(S3): integrate Tavily API for Deep Search and update report

ca448d5

wip: Fusion plugin support (pre-cherry-pick, pending integration with PR 666ghj#318+666ghj#600)

f5608b5

feat(S3): Implement topological deduplication and deep search with backtesting evaluations

8284572

LucasErcolano closed this Jun 20, 2026

LucasErcolano deleted the merge/main-feature-aggregation branch June 20, 2026 20:09

LucasErcolano wants to merge 28 commits into main from merge/main-feature-aggregation

LucasErcolano commented Jun 20, 2026


4 participants
