Summary
The agent today wires components together ad-hoc. Each component reaches into globals (asyncio, env vars, module-level singletons, nest_asyncio.apply(), hard-coded paths, hard-coded timeouts) and makes its own assumptions about the OS, Python version, and installed packages. That coupling makes the whole system fragile: a single environment mismatch (Python 3.14 vs nest_asyncio) can break unrelated subsystems with no clear signal, and any per-deployment change requires code edits across many files.
I want to move the agent to a composition-based architecture with a unified config layer: every component is a pure, version-agnostic, OS-agnostic unit that receives a config object at construction time. All environmental decisions - Python version checks, platform detection, package availability, feature flags, timeouts, paths, credentials, model selection, logging - are resolved by the config layer at a single entry point, not scattered inside the components.
The asyncio.wait_for / nest_asyncio / Python 3.14 bug (see compat shim at the top of agent_core/core/impl/action/manager.py) is the concrete example that forced this issue, but it's a symptom, not the problem. The same class of fragility applies to MCP setup, LLM provider switching, sandboxed action execution, scheduler wiring, interface mode (browser/cli/tui), and more.
Context: how we got here
Frankie hit a blocker on Python 3.14.x where every asyncio.wait_for(...) call raised RuntimeError: Timeout should be used inside a task, breaking MCP stdio startup and action execution. Root cause: nest_asyncio.apply() doesn't propagate Python 3.14's task context variable, so asyncio.timeout() can't find the current task.
Debugging was painful because:
- No Python version is recorded in logs.
- The trigger consumer swallowed the failure silently (except Exception: pass with no log), so the agent looked dead with zero signal.
- The traceback surfaces inside stdlib with no hint that nest_asyncio is involved.
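The silent-swallow part has a cheap fix independent of the larger refactor. A minimal sketch (consume_trigger and the logger name are hypothetical, not the real consumer code):

```python
import logging

log = logging.getLogger("agent.trigger")

async def consume_trigger(trigger) -> None:
    """Hypothetical wrapper around the trigger-consumer loop body."""
    try:
        await trigger.handle()
    except Exception:
        # Instead of `except Exception: pass`, leave a traceback in the
        # logs and re-raise so the failure is visible upstream.
        log.exception("trigger consumer failed: trigger=%r", trigger)
        raise
```

Even if re-raising is too aggressive for some consumers, the log.exception call alone would have turned "agent looks dead" into a one-line diagnosis.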
nest_asyncio itself is only needed because ~10 places in the codebase call asyncio.run() / loop.run_until_complete() from inside an already-running event loop.
I've shipped a band-aid shim that monkey-patches asyncio.wait_for. It works today but silently rewrites a stdlib function, swallows BaseException during cleanup, and hides the real architectural problem. It needs to go away as part of this refactor.
The proposal
1. Components become pure + version-agnostic
Every component - trigger consumer, action manager, action executor, MCP client, LLM interface, memory manager, scheduler, external comms, UI adapters, state manager - is rewritten to:
- Take all of its dependencies via constructor / DI.
- Hold no module-level globals, no nest_asyncio.apply(), no direct env-var reads.
- Be testable in isolation without spinning up the whole agent.
- Not care about OS, Python version, or package availability.
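In code, a "pure" component could look like the following sketch (ExecutorConfig, the field names, and the injected collaborators are illustrative, not the final API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExecutorConfig:
    # Hypothetical slice of the unified config this component needs.
    action_timeout_s: float
    workspace_root: str

class ActionExecutor:
    """Illustrative pure component: no module-level state, no env reads,
    no version checks - everything arrives via the constructor."""

    def __init__(self, config: ExecutorConfig, llm, memory):
        self._config = config
        self._llm = llm        # injected, not a module-level singleton
        self._memory = memory

    @property
    def timeout_s(self) -> float:
        return self._config.action_timeout_s
```

A unit test then only needs a config value and two fakes - no running agent, no real environment.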
2. Unified config layer
A single AgentConfig (or similar) object owns everything environmental:
- Python version + runtime capability checks (does asyncio.timeout work? do we need a shim? is nest_asyncio needed?).
- Platform detection (win32/darwin/linux branching).
- Package availability probes (Node/npm for MCP, tesseract, playwright, etc.).
- Paths (data dir, chroma dir, agent FS, workspace root).
- Timeouts, retry budgets, rate limits.
- LLM providers, models, API keys, base URLs.
- Feature flags (gui_mode, slow_mode, experimental toggles).
- Interface mode and adapter selection.
- Logging setup (level, sinks, format).
Config is built once at startup from settings.json + CLI args + env detection, then handed to the composition root.
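A sketch of what that skeleton could look like - every field name is illustrative, and the capability probe is an assumption about how the 3.14 failure could be detected at startup rather than the shipped detection code:

```python
import asyncio
import sys
from dataclasses import dataclass

def wait_for_needs_shim() -> bool:
    """Illustrative capability probe: run a tiny timed coroutine and see
    whether asyncio.wait_for works in this runtime. On the broken
    3.14 + nest_asyncio combination this would raise RuntimeError."""
    async def _probe():
        await asyncio.wait_for(asyncio.sleep(0), timeout=1)
    try:
        asyncio.run(_probe())
        return False
    except RuntimeError:
        return True

@dataclass(frozen=True)
class AgentConfig:
    # Runtime capabilities
    python_version: tuple
    needs_wait_for_shim: bool
    # Environment
    platform: str
    # Paths / timeouts / flags (illustrative subset)
    data_dir: str
    action_timeout_s: float
    gui_mode: bool

def build_config(settings: dict, cli_args: dict) -> AgentConfig:
    """Single resolution point: settings.json values + CLI args + env detection."""
    return AgentConfig(
        python_version=sys.version_info[:3],
        needs_wait_for_shim=wait_for_needs_shim(),
        platform=sys.platform,
        data_dir=cli_args.get("data_dir", settings.get("data_dir", "./data")),
        action_timeout_s=float(settings.get("action_timeout_s", 30.0)),
        gui_mode=bool(settings.get("gui_mode", False)),
    )
```

The frozen dataclass matters: components receive an immutable snapshot, so no subsystem can mutate shared environmental state behind another's back.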
3. Single composition entry point
One place - likely a replacement/extension of app/main.py::main_async - builds the config, instantiates every component with it, wires them together, and hands control to the interface. No component constructs its dependencies itself; they all come from the composition root.
This is where version-specific workarounds live - exactly once - gated by the config's capability flags. The asyncio.wait_for shim, for example, becomes if config.needs_wait_for_shim: install_shim() at the composition root and nowhere else.
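The shape of the composition root, sketched with minimal stand-in classes (Config, LLMInterface, ActionExecutor, and install_wait_for_shim here are placeholders, not the real components):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    needs_wait_for_shim: bool
    model: str

class LLMInterface:
    def __init__(self, config: Config):
        self.model = config.model

class ActionExecutor:
    def __init__(self, config: Config, llm: LLMInterface):
        self.llm = llm

def install_wait_for_shim() -> None:
    # Placeholder: the real shim would patch asyncio.wait_for here, once.
    pass

def compose(config: Config) -> ActionExecutor:
    """Composition root: the only place that builds and wires components."""
    if config.needs_wait_for_shim:   # capability-gated workaround, exactly once
        install_wait_for_shim()
    llm = LLMInterface(config)       # components never construct each other
    return ActionExecutor(config, llm=llm)
```

Everything below compose() stays ignorant of why the shim exists; delete the flag from the config and the workaround disappears from exactly one place.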
4. Eliminate nest_asyncio
As part of the component rewrite, every asyncio.run() / loop.run_until_complete() call inside a running loop (app/data/action/task_end.py, app/data/action/send_message_with_attachment.py, app/data/action/integration_management.py, agent_core/core/impl/llm/interface.py:319, agent_core/core/impl/config/watcher.py:240, agent_core/core/impl/skill/manager.py:90, and others) is converted to proper await / asyncio.create_task / asyncio.to_thread patterns. Once those are gone, nest_asyncio can be dropped from requirements.txt / environment.yml - and with it, the shim.
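The conversion pattern, sketched (fetch_result, send, and read_file_size are hypothetical stand-ins for the offending call sites):

```python
import asyncio
import os

async def fetch_result() -> int:
    await asyncio.sleep(0)
    return 42

# Before (needs nest_asyncio when a loop is already running):
#     def send(): return asyncio.run(fetch_result())
#
# After: make the caller async and await directly.
async def send() -> int:
    return await fetch_result()

# Blocking work moves off the loop with to_thread instead of
# nesting a second event loop around it.
async def read_file_size(path: str) -> int:
    return await asyncio.to_thread(os.path.getsize, path)
```

Fire-and-forget sites become asyncio.create_task(...) on the running loop; the key invariant is that asyncio.run() appears only at the process entry point.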
Wins
- One place to do environment/version checks instead of scattered runtime surprises.
- Swap-ability: changing LLM provider, interface mode, or storage backend is a config change, not a code change.
- Testability: every component can be unit-tested with a fake config.
- Debuggability: config object dump at startup = full picture of the runtime environment, no more guessing Python versions from tracebacks.
- Portability: same component code runs on 3.10, 3.11, 3.12, 3.13, 3.14, and whatever comes next - only the config resolver changes.
- The nest_asyncio / asyncio.wait_for bug disappears as a free side-effect, not as a targeted fix.
Scope / phasing suggestion
This is a multi-week refactor, not a weekend PR. Rough phasing:
- Phase 0 - diagnostics (already partially done): log Python version at startup, log trigger consumer exits, log component init.
- Phase 1 - config skeleton: define AgentConfig schema, build it from settings.json + env + CLI at a single point, pass it down. No component rewrites yet - just make sure everything flows through one config object.
- Phase 2 - remove asyncio.run() inside a running loop: convert the ~10 offending call sites to proper async. Drop nest_asyncio + shim.
- Phase 3 - component extraction: one subsystem at a time (start with action executor or LLM interface), move globals → constructor args, wire via composition root.
- Phase 4 - docs: docs/architecture.md describing components + config + composition root.
Out of scope
- Behavioral changes to the agent itself. This is purely structural.
- Breaking the public config surface (settings.json schema can evolve but shouldn't break existing users in phases 1–2).
- Maybe later: move the configs to be backed by models rather than raw JSON files. Example: the onboarding configs and settings.json structure shouldn't depend on the file existing; missing configs could then be created with defaults instead of the agent just crashing.