youngbryan97 · youngbryan97 · Apr 20, 2026 · Apr 30, 2026
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
@@ -926,3 +926,81 @@ The null hypothesis suite proves the architecture is real. Five additional test
 - **Phenomenal Convergence** (`test_phenomenal_convergence.py`, 17 tests): QDT 6-gate protocol -- pre-report quality space geometry, counterfactual state swap, no-report behavioral footprint, perturbational integration, baseline failure verification, phenomenal tethering via architectural anesthesia.
 
 Full results and analysis: [TESTING.md](TESTING.md)
+
+---
+
+## 15. Round-3 Additions (April 2026): Sentrux + Kame + RSI Cooldown + Unattended Training
+
+This section documents the four production additions that landed in the Round-3 cycle. Together they close the loop from "Aura proposes a code change" → "the change is graded against architectural quality and only promoted if it doesn't degrade the codebase" → "Aura speaks while the deeper model thinks" → "training survives a closed laptop and auto-fuses on completion."
+
+### 15.1 Architecture Quality Gate (Aura-Sentrux)
+
+**Files**: `core/architecture_quality/scorer.py`, `core/architecture_quality/gate.py`, `core/architecture_quality/rules.toml`
+
+A native equivalent of [Sentrux](https://github.com/sentrux/sentrux) that gates self-modifications on architectural quality. No Sentrux code was copied; the gate was designed and written from scratch in the Aura idiom.
+
+**Five root-cause metrics**, weighted into a single 0–10000 score:
+- modularity (30%): networkx greedy modularity; stdlib package-cohesion fallback if networkx is unavailable
+- acyclicity (30%): 1 − cycle_density via Tarjan SCC over module-level imports
+- depth (10%): normalized DAG depth
+- equality (15%): normalized Gini over file size and import fan-in/out
+- redundancy (15%): AST function-body signature hash duplication
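+
+A minimal sketch of how these metrics could be computed and combined. It rests on three assumptions:
+
+- each metric is already normalized to [0, 1], with higher meaning healthier;
+- `cycle_density` is the fraction of modules inside a non-trivial strongly connected component (size > 1 or a self-import);
+- `equality` is 1 minus the Gini coefficient, rescaled by n/(n − 1) so its range is [0, 1].
+
+These definitions and every function name below are illustrative assumptions, not the code in `scorer.py`.
+
+```python
+import itertools
+from typing import Dict, List, Mapping, Sequence, Set
+
+# Weights from the list above; they sum to 1.0.
+WEIGHTS: Dict[str, float] = {
+    "modularity": 0.30,
+    "acyclicity": 0.30,
+    "depth": 0.10,
+    "equality": 0.15,
+    "redundancy": 0.15,
+}
+
+
+def tarjan_scc(graph: Mapping[str, Sequence[str]]) -> List[List[str]]:
+    """Strongly connected components of a module -> imported-modules graph."""
+    index_of: Dict[str, int] = {}
+    lowlink: Dict[str, int] = {}
+    on_stack: Set[str] = set()
+    stack: List[str] = []
+    sccs: List[List[str]] = []
+    counter = itertools.count()
+
+    def visit(v: str) -> None:
+        index_of[v] = lowlink[v] = next(counter)
+        stack.append(v)
+        on_stack.add(v)
+        for w in graph.get(v, ()):
+            if w not in index_of:
+                visit(w)
+                lowlink[v] = min(lowlink[v], lowlink[w])
+            elif w in on_stack:
+                lowlink[v] = min(lowlink[v], index_of[w])
+        if lowlink[v] == index_of[v]:  # v is the root of an SCC
+            component = []
+            while True:
+                w = stack.pop()
+                on_stack.discard(w)
+                component.append(w)
+                if w == v:
+                    break
+            sccs.append(component)
+
+    for node in graph:
+        if node not in index_of:
+            visit(node)
+    return sccs
+
+
+def acyclicity(graph: Mapping[str, Sequence[str]]) -> float:
+    """1 - cycle_density, where cycle_density = share of modules inside a cycle."""
+    nodes = set(graph) | {w for targets in graph.values() for w in targets}
+    if not nodes:
+        return 1.0
+    in_cycle = sum(
+        len(comp)
+        for comp in tarjan_scc(graph)
+        if len(comp) > 1 or comp[0] in graph.get(comp[0], ())
+    )
+    return 1.0 - in_cycle / len(nodes)
+
+
+def equality(values: Sequence[float]) -> float:
+    """1 - normalized Gini coefficient (1.0 = perfectly even distribution)."""
+    xs = sorted(values)
+    n, total = len(xs), sum(xs)
+    if n < 2 or total == 0:
+        return 1.0
+    weighted = sum(i * x for i, x in enumerate(xs, start=1))
+    gini = (2 * weighted) / (n * total) - (n + 1) / n
+    return 1.0 - gini * n / (n - 1)
+
+
+def composite_score(metrics: Mapping[str, float]) -> int:
+    """Weighted sum of clamped [0, 1] metrics, scaled to 0-10000."""
+    total = sum(w * min(1.0, max(0.0, metrics[name])) for name, w in WEIGHTS.items())
+    return round(10000 * total)
+```
+
+The recursive Tarjan keeps the sketch short. A tree with deep import chains would need an iterative version to stay under Python's recursion limit.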
+
+**Wiring**: `core/self_modification/safe_modification.SafeSelfModification.apply_fix` runs `_run_quality_gate` immediately after the Stage-5 quarantine→primary promotion. On gate failure the existing `_rollback(...)` restores the SHA-256-verified backup taken at Stage 1, and a structured rejection record is appended to `data/architecture_quality_rejections.jsonl`. Non-Python changes pass through; gate-internal errors fail-open (`gate_error_allowed`) so a buggy gate cannot brick self-modification.
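+
+The promote → grade → rollback ordering can be sketched as below, including the pass-through and fail-open branches. `apply_with_quality_gate` and its callback parameters are hypothetical stand-ins for `apply_fix`, `_rollback(...)`, and the gate. Only the control flow and the append-only JSONL rejection record mirror the description above; the record's fields are assumed.
+
+```python
+import json
+import time
+from pathlib import Path
+from typing import Callable, Optional
+
+
+def apply_with_quality_gate(
+    changed_file: str,
+    promote: Callable[[], None],
+    rollback: Callable[[], None],
+    quality_check: Callable[[], Optional[str]],
+    rejection_log: Path,
+) -> str:
+    """Promote a change, then keep it only if the quality check passes.
+
+    quality_check returns None on pass, or a human-readable reason on failure.
+    """
+    promote()  # quarantine -> primary, as in Stage 5
+    if not changed_file.endswith(".py"):
+        return "passthrough"  # non-Python changes are not graded
+    try:
+        reason = quality_check()
+    except Exception:
+        return "gate_error_allowed"  # fail-open: a buggy gate must not block edits
+    if reason is None:
+        return "accepted"
+    rollback()  # restore the backup taken before promotion
+    record = {"ts": time.time(), "file": changed_file, "reason": reason}
+    rejection_log.parent.mkdir(parents=True, exist_ok=True)
+    with rejection_log.open("a", encoding="utf-8") as fh:
+        fh.write(json.dumps(record) + "\n")
+    return "rejected"
+```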
+
+**Default rules** (`core/architecture_quality/rules.toml`, TOML, parsed via stdlib `tomllib`):
+- `max_score_drop = 200` (out of 10000)
+- `max_new_cycles = 0`
+- `max_new_god_files = 0` (god file = >800 LOC + high fan-in/out)
+- `min_overall_score = 0` (off by default; live tree currently scores 5602/10000)
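+
+These defaults amount to a before/after comparison of two tree snapshots. The sketch below assumes flat top-level keys in `rules.toml` and a hypothetical `TreeSnapshot` summary produced by the scorer. The god-file detection itself is not shown.
+
+```python
+import tomllib  # stdlib TOML parser, Python 3.11+
+from dataclasses import dataclass
+from typing import List
+
+# Flat keys as listed above; the real rules.toml layout may differ.
+DEFAULT_RULES = """
+max_score_drop = 200
+max_new_cycles = 0
+max_new_god_files = 0
+min_overall_score = 0
+"""
+
+
+@dataclass(frozen=True)
+class TreeSnapshot:
+    score: int      # composite score, 0-10000
+    cycles: int     # import cycles (non-trivial SCCs)
+    god_files: int  # files over the god-file threshold
+
+
+def rule_violations(before: TreeSnapshot, after: TreeSnapshot,
+                    rules_text: str = DEFAULT_RULES) -> List[str]:
+    """Return human-readable violations; an empty list means the change passes."""
+    rules = tomllib.loads(rules_text)
+    problems: List[str] = []
+    drop = before.score - after.score
+    if drop > rules["max_score_drop"]:
+        problems.append(f"score dropped by {drop} (limit {rules['max_score_drop']})")
+    new_cycles = after.cycles - before.cycles
+    if new_cycles > rules["max_new_cycles"]:
+        problems.append(f"{new_cycles} new import cycle(s)")
+    new_gods = after.god_files - before.god_files
+    if new_gods > rules["max_new_god_files"]:
+        problems.append(f"{new_gods} new god file(s)")
+    # With min_overall_score = 0 this check never fires, i.e. it is "off".
+    if after.score < rules["min_overall_score"]:
+        problems.append(f"score {after.score} below floor {rules['min_overall_score']}")
+    return problems
+```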
+
+**Live registration**: `core/service_registration.register_all_services` registers `architecture_quality_gate` as a singleton; resolution calls `install_gate(...)` so the module-level installed-gate hook is populated for the cross-call rejection path.
+
+**Tests**: `tests/test_architecture_quality.py` (6 tests, all green) — score range, synthetic-cycle drop, unchanged-tree pass, regressed-tree reject, end-to-end safe-modification block on architectural regression, dependency-graph parser correctness.
+
+### 15.2 Tandem Speak-While-Thinking (Aura-Kame)
+
+**Files**: `core/brain/llm/tandem_kame.py`, `core/brain/llm/tandem_signal_bus.py`, `core/brain/llm/tandem_router.py`
+
+A native equivalent of Sakana's [Kame](https://pub.sakana.ai/kame/) (paper: arXiv 2510.02327). It maps Aura's existing 7B/14B fast lane to the "fast frontend" role and the 32B/72B Cortex/Solver to the "slow backend" role. A priority-ordered asyncio pub/sub bus carries `OracleSignal`s from the slow lane to the fast lane mid-stream.
+
+**Signal priority** (highest first): `retract` > `handoff` > `correction` > `refine` > `continue`. A `retract` halts the fast stream and switches output to the slow lane; a `correction` splices into the fast stream; a `handoff` yields the slow output without a retract marker.
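+
+One way to get this ordering from the standard library is an `asyncio.PriorityQueue` keyed on signal priority plus a sequence number, so equal-priority signals stay FIFO. This is a single-subscriber sketch. The `OracleSignal` fields (`kind`, `text`) and the `SignalBus` class are assumptions, not the interfaces in `tandem_signal_bus.py`.
+
+```python
+import asyncio
+import itertools
+from dataclasses import dataclass, field
+from typing import Optional
+
+# Lower number = higher priority: retract > handoff > correction > refine > continue.
+PRIORITY = {"retract": 0, "handoff": 1, "correction": 2, "refine": 3, "continue": 4}
+
+
+@dataclass
+class OracleSignal:
+    kind: str       # one of the PRIORITY keys
+    text: str = ""  # payload, e.g. a corrected span or slow-lane output
+
+
+@dataclass(order=True)
+class _Queued:
+    priority: int
+    seq: int                                     # FIFO tie-break within a level
+    signal: OracleSignal = field(compare=False)  # never compared
+
+
+class SignalBus:
+    """Slow lane publishes; fast lane drains the most urgent pending signal first."""
+
+    def __init__(self) -> None:
+        self._queue: "asyncio.PriorityQueue[_Queued]" = asyncio.PriorityQueue()
+        self._seq = itertools.count()
+
+    def publish(self, signal: OracleSignal) -> None:
+        self._queue.put_nowait(_Queued(PRIORITY[signal.kind], next(self._seq), signal))
+
+    def poll(self) -> Optional[OracleSignal]:
+        """Non-blocking check between fast-lane tokens; None means the bus is silent."""
+        try:
+            return self._queue.get_nowait().signal
+        except asyncio.QueueEmpty:
+            return None
+
+    async def wait(self) -> OracleSignal:
+        """Block until the slow lane publishes something."""
+        return (await self._queue.get()).signal
+```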
+
+**Wiring**: `attach_tandem(router, fast, slow)` is opt-in, so `core/brain/llm_health_router.py` is untouched. Round-3 service registration calls `attach_tandem` against the resolved LLM router so that `router.tandem` is reachable from the runtime. Tandem mode is then triggered per call by `should_use_tandem(...)` heuristics (length, intent class, explicit task type).
+
+**Failure modes covered**: solo-mode passthrough when the bus is silent, slow-lane timeout (fast finishes solo), bus subscription priority ordering, fake-fast / fake-slow streaming for tests.
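+
+The solo-passthrough and slow-timeout paths can be sketched with plain asyncio. To stay self-contained, this version simplifies the slow lane to a single awaitable final answer instead of a stream of `OracleSignal`s:
+
+- if that answer is ready mid-stream, it acts like a `retract`;
+- if it arrives shortly after the fast lane finishes, it acts like a `handoff`;
+- if it misses the `slow_timeout` window, the fast answer stands alone.
+
+`tandem_stream`, its parameters, and the `[retract]` marker text are hypothetical.
+
+```python
+import asyncio
+from typing import AsyncGenerator, Awaitable
+
+
+async def tandem_stream(fast: AsyncGenerator[str, None], slow: Awaitable[str],
+                        slow_timeout: float) -> AsyncGenerator[str, None]:
+    """Yield fast-lane tokens, switching to the slow lane if it is ready in time."""
+    slow_task = asyncio.ensure_future(slow)  # slow lane runs concurrently
+    async for token in fast:
+        if slow_task.done() and not slow_task.cancelled() and slow_task.exception() is None:
+            # Retract: the deeper answer is ready mid-stream, so stop the fast lane.
+            await fast.aclose()
+            yield "[retract] " + slow_task.result()
+            return
+        yield token
+    try:
+        # Fast lane finished; give the slow lane a bounded window to hand off.
+        refined = await asyncio.wait_for(slow_task, timeout=slow_timeout)
+    except Exception:
+        return  # timeout (task is cancelled) or slow-lane error: fast answer stands solo
+    yield refined  # handoff: slow output without a retract marker
+```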
+
+**Tests**: `tests/test_tandem_kame.py` (9 tests, all green).
+
+### 15.3 RSI Loop Hardening: Tiered Sepsis with Cooldown
+
+**File**: `core/self_modification/safe_modification.py`
+
+The previous sepsis registry permanently banned any file whose Ghost-Boot validation failed once. That made the modifiable surface monotonically shrink and turned a single transient false negative into a permanent loss. Round-3 replaces it with a tiered, time-bounded ban:
+
+- **1st strike** within a 3-day observation window: log + record event, no ban
+- **2nd strike**: 24-hour cooldown
+- **3rd strike**: 7-day cooldown
+- Ban check uses absolute expiry timestamps (`bans[file_path] = expires_at`) and migrates legacy permanent entries to a 7-day expiry on first read.
+
+**Effect**: Aura's RSI loop can keep proposing improvements to the same module after a transient failure, while repeated failures on that file still escalate to longer cooldowns. The ban check runs early in `validate_proposal`, so a file in cooldown short-circuits before backup, branch creation, or quarantine.
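+
+A sketch of the tiered policy, resting on three assumptions:
+
+- strikes are counted over a rolling 3-day window;
+- strikes beyond the third keep the 7-day cooldown;
+- a legacy permanent ban is stored as `None`.
+
+`SepsisRegistry`, its fields, and the constants are hypothetical; the real registry lives in `safe_modification.py`.
+
+```python
+import time
+from dataclasses import dataclass, field
+from typing import Dict, List, Optional
+
+HOUR = 3600.0
+DAY = 24 * HOUR
+OBSERVATION_WINDOW = 3 * DAY
+COOLDOWNS = {2: 1 * DAY, 3: 7 * DAY}  # strike number -> cooldown length
+LEGACY_BAN_SECONDS = 7 * DAY
+
+
+@dataclass
+class SepsisRegistry:
+    strikes: Dict[str, List[float]] = field(default_factory=dict)  # file -> failure times
+    bans: Dict[str, Optional[float]] = field(default_factory=dict)  # file -> expires_at
+
+    def record_failure(self, file_path: str, now: Optional[float] = None) -> Optional[float]:
+        """Record a failed Ghost-Boot validation; return the new ban expiry, if any."""
+        now = time.time() if now is None else now
+        recent = [t for t in self.strikes.get(file_path, []) if now - t <= OBSERVATION_WINDOW]
+        recent.append(now)
+        self.strikes[file_path] = recent
+        strike = len(recent)
+        if strike < 2:
+            return None  # 1st strike: log and record only, no ban
+        expires_at = now + COOLDOWNS.get(strike, COOLDOWNS[3])  # 3rd+ strike: 7 days
+        self.bans[file_path] = expires_at
+        return expires_at
+
+    def is_banned(self, file_path: str, now: Optional[float] = None) -> bool:
+        now = time.time() if now is None else now
+        if file_path not in self.bans:
+            return False
+        expires_at = self.bans[file_path]
+        if expires_at is None:
+            # Migrate a legacy permanent ban to a 7-day expiry on first read.
+            expires_at = now + LEGACY_BAN_SECONDS
+            self.bans[file_path] = expires_at
+        if now >= expires_at:
+            del self.bans[file_path]  # cooldown over
+            return False
+        return True
+```
+
+Calling `is_banned` first thing in a proposal check gives the early short-circuit described above: a file in cooldown never reaches backup, branch creation, or quarantine.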
+
+### 15.4 Unattended Training (Lid-Close Survivable)
+
+**Files**: `training/run_unattended.sh`, `training/run_unattended.py`, `training/README_UNATTENDED.md`
+
+A wrapper around the existing `training/train_and_fuse.py` pipeline that survives a closed laptop:
+
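+- `caffeinate -i -m -s -d` keeps the system awake while the script runs (`-i` blocks idle sleep, `-m` disk idle sleep, `-d` display sleep; `-s` blocks system sleep but is honored only on AC power)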
+- `tee`'d log at `training/logs/unattended_<timestamp>.log`
+- Retry loop (default `MAX_RETRIES=5`, 30-second pause between attempts)
+- SIGTERM/SIGINT writes a final state snapshot before exit so a hard kill is graceful
+- `training/adapters/aura-personality/training_state.json` records `{started_at, last_iter, last_checkpoint_path, last_heartbeat, phase}` after every checkpoint observation; resume-from-latest is automatic on respawn
+- The existing `train_and_fuse.py` auto-fuse + `active.json` manifest publishing is preserved unchanged so the next Aura boot picks up the new fused model with no `.env` edit required.
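+
+The state-file and retry behavior can be sketched in Python as follows. The `MAX_RETRIES=5` default and the 30-second pause follow the list above. Everything else is an assumption for illustration and is not taken from `run_unattended.py`:
+
+- the helper names;
+- the temp-file-then-`os.replace` atomic write;
+- the `attempt` field;
+- the `phase` values.
+
+```python
+import json
+import os
+import signal
+import sys
+import tempfile
+import time
+from pathlib import Path
+from typing import Callable, Dict, List
+
+
+def write_state(path: Path, **fields) -> Dict:
+    """Merge fields into the state file, stamp a heartbeat, and replace it atomically."""
+    state = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
+    state.update(fields, last_heartbeat=time.time())
+    path.parent.mkdir(parents=True, exist_ok=True)
+    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
+    with os.fdopen(fd, "w", encoding="utf-8") as fh:
+        json.dump(state, fh, indent=2)
+    os.replace(tmp, path)  # readers never observe a half-written file
+    return state
+
+
+def caffeinated(cmd: List[str]) -> List[str]:
+    """Prefix a command with the caffeinate flags listed above (macOS only)."""
+    return ["caffeinate", "-i", "-m", "-s", "-d", *cmd]
+
+
+def install_signal_snapshot(state_path: Path) -> None:
+    """On SIGTERM/SIGINT, write a final snapshot before exiting (main thread only)."""
+    def on_signal(signum: int, _frame) -> None:
+        write_state(state_path, phase=f"killed:{signal.Signals(signum).name}")
+        sys.exit(128 + signum)
+
+    for sig in (signal.SIGTERM, signal.SIGINT):
+        signal.signal(sig, on_signal)
+
+
+def run_with_retries(run_once: Callable[[], int], state_path: Path,
+                     max_retries: int = 5, pause_s: float = 30.0) -> int:
+    """Call run_once until it exits 0 or the retry budget is spent."""
+    code = 1
+    for attempt in range(1, max_retries + 1):
+        write_state(state_path, phase="training", attempt=attempt)
+        code = run_once()
+        if code == 0:
+            write_state(state_path, phase="done")
+            return 0
+        if attempt < max_retries:
+            time.sleep(pause_s)
+    write_state(state_path, phase="failed")
+    return code
+```
+
+A hypothetical entrypoint would wire these together in two steps:
+
+1. Call `install_signal_snapshot(state_path)` once.
+2. Call `run_with_retries(lambda: subprocess.run(caffeinated([sys.executable, "training/train_and_fuse.py"])).returncode, state_path)`.
+
+Resume-from-checkpoint logic is not shown.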
+
+**Tests**: `tests/test_run_unattended.py` (5 tests, all green).
+
+### 15.5 What This Buys
+
+Together: Aura can propose code changes, have them automatically gated on architectural quality before promotion, speak immediately while a deeper lane refines the answer, recover from a single failed boot validation without permanently losing the file from her self-modification surface, and run multi-hour LoRA training overnight with the lid closed.
+
diff --git a/aura_cleanup.py b/aura_cleanup.py
@@ -1,3 +1,4 @@
+from __future__ import annotations
 #!/usr/bin/env python3
 """Compatibility entrypoint for Aura cleanup.
 
@@ -6,7 +7,6 @@
 cleanup implementation lives under `scripts/one_off/`.
 """
 
-from __future__ import annotations
 
 import runpy
 from pathlib import Path

diff --git a/aura_main.py b/aura_main.py
@@ -349,7 +349,7 @@ async def bootstrap_aura(orchestrator: Any):
     except Exception as exc:
         logger.debug("Memory monitor registration skipped: %s", exc)
     from core.utils.task_tracker import get_task_tracker
-    get_task_tracker().track_task(asyncio.create_task(mem_monitor.start()))
+    get_task_tracker().track_task(mem_monitor.start())
 
     logger.info("🛡️  Task Supervisor active (Memory monitoring enabled).")
 
@@ -506,12 +506,12 @@ async def _main_loop():
         await orchestrator.start()
         if hasattr(orchestrator, "_ensure_inference_gate_ready"):
             await orchestrator._ensure_inference_gate_ready(context="server_boot")
-        asyncio.create_task(orchestrator.run(), name="OrchestratorMainLoop")
+        get_task_tracker().create_task(orchestrator.run(), name="OrchestratorMainLoop")
 
         # 2. Start API Server (v21: Server now runs in Kernel)
         # [STABILITY] Start API after brain is ready to ensure correct ServiceContainer lookups.
         logger.info("🎬 [DEBUG] Pre-starting API server mission...")
-        api_task = asyncio.create_task(_run_api_server(), name="api_server")
+        api_task = get_task_tracker().create_task(_run_api_server(), name="api_server")
         logger.info("🎬 [DEBUG] API server task created successfully.")
 
         # Wait for API server to be TRULY ready (HTTP 200)
@@ -559,8 +559,8 @@ async def _stream_logger(stream, level):
                                 content.append(decoded)
                         return "\n".join(content)
 
-                    out_task = asyncio.create_task(_stream_logger(proc.stdout, "DEBUG"))
-                    err_task = asyncio.create_task(_stream_logger(proc.stderr, "ERROR"))
+                    out_task = get_task_tracker().create_task(_stream_logger(proc.stdout, "DEBUG"))
+                    err_task = get_task_tracker().create_task(_stream_logger(proc.stderr, "ERROR"))
 
                     # Watch for exit
                     while proc.returncode is None:
@@ -591,7 +591,7 @@ async def _stream_logger(stream, level):
                     logger.warning(f"🎨 Restarting GUI in 5s... (Attempt {restart_count}/{max_restarts})")
                     await asyncio.sleep(5.0)
 
-            asyncio.create_task(_gui_reaper_loop(), name="gui_reaper")
+            get_task_tracker().create_task(_gui_reaper_loop(), name="gui_reaper")
             pipe = None # Subprocess doesn't use the actor pipe
         else:
             # Linux/Others can still use the supervised actor
@@ -873,7 +873,7 @@ async def _run_server_with_bootstrap():
                 await orchestrator.start()
                 if hasattr(orchestrator, "_ensure_inference_gate_ready"):
                     await orchestrator._ensure_inference_gate_ready(context="server_boot")
-                asyncio.create_task(orchestrator.run(), name="OrchestratorMainLoop")
+                get_task_tracker().create_task(orchestrator.run(), name="OrchestratorMainLoop")
                 await run_server_async(host, args.port)
             asyncio.run(_run_server_with_bootstrap())
         elif args.desktop:

diff --git a/core/actors/sensory_gate.py b/core/actors/sensory_gate.py
@@ -1,4 +1,5 @@
 
+from core.utils.task_tracker import get_task_tracker
 import asyncio
 import logging
 import multiprocessing
@@ -43,7 +44,7 @@ async def run(self):
         self.bus.start()
 
         # Start heartbeat loop after bus is active
-        asyncio.create_task(self._heartbeat_loop())
+        get_task_tracker().create_task(self._heartbeat_loop())
 
         logger.info("👁️ SensoryGate Actor ready.")
 

diff --git a/core/adaptation/abstraction_engine.py b/core/adaptation/abstraction_engine.py
@@ -4,6 +4,7 @@
 Analyzes specific, successful problem resolutions and distills them into 
 universal, generalized rules for zero-shot application in novel domains.
 """
+from core.runtime.atomic_writer import atomic_write_text
 import asyncio
 import logging
 import json
@@ -27,7 +28,7 @@ def __init__(self, storage_path: str = "data/first_principles.json"):
 
         # Initialize the file if it doesn't exist
         if not self.storage_path.exists():
-            self.storage_path.write_text("[]")
+            atomic_write_text(self.storage_path, "[]")
 
     async def abstract_from_success(self, context: str, successful_resolution: str) -> str:
         """

diff --git a/core/adaptation/adaptive_immunity.py b/core/adaptation/adaptive_immunity.py
@@ -15,9 +15,10 @@
 It can execute only a narrow subset of repair actions through the existing
 autopoiesis engine. Everything sensitive remains governance-gated.
 """
-
 from __future__ import annotations
 
+from core.runtime.atomic_writer import atomic_write_text
+
 import asyncio
 import copy
 import hashlib
@@ -2107,7 +2108,7 @@ def _save_state(self) -> None:
             },
         }
         try:
-            self._state_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
+            atomic_write_text(self._state_path, json.dumps(payload, indent=2), encoding="utf-8")
         except Exception as exc:
             logger.debug("Adaptive immune state save skipped: %s", exc)
 

diff --git a/core/adaptation/autonomous_resilience.py b/core/adaptation/autonomous_resilience.py
@@ -11,9 +11,9 @@
 The goal is to make her dramatically better at surfacing risk honestly,
 preempting common failures, and turning repair proposals into validated action.
 """
-
 from __future__ import annotations
 
+
 import ast
 import asyncio
 import inspect

diff --git a/core/adaptation/epistemic_humility.py b/core/adaptation/epistemic_humility.py
@@ -11,6 +11,7 @@
 are wrong and autonomously adjust your own operating parameters to compensate.
 """
 
+from core.utils.task_tracker import get_task_tracker
 import asyncio
 from collections import Counter
 import json
@@ -57,7 +58,7 @@ def __init__(self, orchestrator):
     async def start(self):
         if self.running: return
         self.running = True
-        self._task = asyncio.create_task(self._critic_loop(), name="EpistemicHumility.critic_loop")
+        self._task = get_task_tracker().create_task(self._critic_loop(), name="EpistemicHumility.critic_loop")
         logger.info("🙇 Epistemic Humility ONLINE — ready to learn from mistakes.")
 
     async def stop(self):

diff --git a/core/adaptation/self_optimizer.py b/core/adaptation/self_optimizer.py
@@ -4,6 +4,7 @@
 This allows Aura to update her own weights based on captured experiences.
 """
 
+from core.utils.task_tracker import get_task_tracker
 import os
 import json
 import logging
@@ -63,7 +64,7 @@ async def optimize(self, iters: int = 50, batch_size: int = 2) -> Dict[str, Any]
         logger.info("🧠 Nucleus: Starting self-optimization (LoRA) cycle...")
 
         if self.event_bus:
-            asyncio.create_task(self.event_bus.publish("core/optimizer/started", {
+            get_task_tracker().create_task(self.event_bus.publish("core/optimizer/started", {
                 "model": self.base_model_path.name,
                 "iters": iters,
                 "batch_size": batch_size
@@ -131,7 +132,7 @@ async def optimize(self, iters: int = 50, batch_size: int = 2) -> Dict[str, Any]
 
             if process.returncode == 0:
                 if self.event_bus:
-                    asyncio.create_task(self.event_bus.publish("core/optimizer/completed", {
+                    get_task_tracker().create_task(self.event_bus.publish("core/optimizer/completed", {
                         "status": "success",
                         "duration": duration,
                         "samples": len(data)
@@ -152,7 +153,7 @@ async def optimize(self, iters: int = 50, batch_size: int = 2) -> Dict[str, Any]
                     logger.debug('Ignored Exception in self_optimizer.py: %s', _e)
 
                 if self.event_bus:
-                    asyncio.create_task(self.event_bus.publish("core/optimizer/completed", {
+                    get_task_tracker().create_task(self.event_bus.publish("core/optimizer/completed", {
                         "status": "failed",
                         "error": error_msg
                     }))