TenureAI · chengjl19 · Mar 8, 2026
diff --git a/.agents/skills/deep-research/SKILL.md b/.agents/skills/deep-research/SKILL.md
@@ -46,6 +46,16 @@ Iterate until evidence quality is sufficient:
 8. Run contradiction/counter-evidence checks.
 9. Synthesize and produce final report.
 
+## Re-entry Policy (Mid-Run)
+
+When called during an ongoing run (not only at run start):
+
+1. Treat invocation as valid and do not require starting a new run by default.
+2. Recompute objective delta versus current stage plan.
+3. If objective changed materially, reset research focus and run fresh query batches.
+4. If objective is similar, perform incremental deep research using existing evidence as baseline.
+5. If deep research is skipped because existing evidence is fresh enough, emit `dr_skip_reason` with explicit date windows and source counts.
+
 ## Scoping-to-Planning Handoff Policy
 
 When deep research is used for open-ended scoping (`idea-exploration`), hand off findings to `research-plan` as the required default next step. Skip only if the user explicitly opts out.
@@ -234,10 +244,11 @@ Degrade rules:
 
 ## Memory and Search Policy
 
-1. Memory lookup is optional and situational.
-2. Use memory when likely to reduce repeated search effort.
-3. Use search/deep research directly when topic is new, urgent, or time-sensitive.
-4. If memory is skipped, note reason in report trail.
+1. Global memory bootstrap (from `run-governor` / `research-workflow`) is mandatory for non-trivial runs.
+2. Within deep-research, additional memory retrieval is optional and situational.
+3. Use incremental memory retrieval when it can reduce repeated search effort or contradiction resolution cost.
+4. Use search/deep research directly when topic is new, urgent, or time-sensitive.
+5. If incremental memory retrieval is skipped, note reason in report trail.
 
 ## Type-Aware Reporting Requirements
 

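The re-entry policy added to `deep-research` above can be read as a three-way decision. The following is a minimal sketch under stated assumptions, not the skill's actual implementation: the `Evidence` type, the `reentry_action` function, and the `max_age_days` / `min_sources` thresholds are all hypothetical, since the policy does not define what counts as "sufficient evidence freshness".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Evidence:
    """Hypothetical summary of the evidence already collected in this run."""
    oldest: date
    newest: date
    source_count: int


def reentry_action(objective_changed: bool,
                   evidence: Optional[Evidence],
                   today: date,
                   max_age_days: int = 30,   # assumed threshold, not from the policy
                   min_sources: int = 5) -> dict:  # assumed threshold, not from the policy
    """Pick a mid-run re-entry action for deep research."""
    if objective_changed:
        # Rule 3: objective changed materially, so reset focus and run fresh queries.
        return {"action": "reset-and-research"}
    fresh = (
        evidence is not None
        and (today - evidence.newest).days <= max_age_days
        and evidence.source_count >= min_sources
    )
    if fresh:
        # Rule 5: skipping requires an explicit date window and source count.
        reason = (f"evidence window {evidence.oldest.isoformat()}.."
                  f"{evidence.newest.isoformat()}, {evidence.source_count} sources")
        return {"action": "skip", "dr_skip_reason": reason}
    # Rule 4: objective is similar, so extend the existing evidence incrementally.
    return {"action": "incremental"}
```

The point of the sketch is that a skip never returns without a concrete `dr_skip_reason`; the freshness test itself would need to be defined by the skill.
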
diff --git a/.agents/skills/experiment-execution/SKILL.md b/.agents/skills/experiment-execution/SKILL.md
@@ -81,7 +81,7 @@ Retry behavior should be mode-aware and evidence-driven.
 
 1. Choose control mode: direct SSH, SSH+session manager, scheduler, or existing remote agent.
 2. Declare remote model: remote-native or local-driver.
-3. If project-context has remote profile, confirm reuse policy before launch.
+3. Use the remote-profile reuse decision from `run-governor`; if it is missing, request exactly one confirmation via `human-checkpoint`.
 4. Validate connectivity and runtime basics before expensive launch when uncertainty exists.
 
 ## Logging and Failure Handling

diff --git a/.agents/skills/memory-manager/SKILL.md b/.agents/skills/memory-manager/SKILL.md
@@ -85,6 +85,69 @@ Treat stale working state as risk:
 3. Force review before high-resource actions.
 4. Force review after interruptions or unexpected failures.
 
+## Invocation Schedule (Balanced, Non-Aggressive)
+
+1. Mandatory once-per-run operations:
+   - bootstrap `retrieve/init-working` after intake and before planning/execution
+   - close-out writeback before final task completion
+2. Trigger-based operations between bootstrap and close-out:
+   - stage transition
+   - replan
+   - significant failure or new error signature
+   - before high-resource action
+   - before final answer/report handoff
+3. Periodic `working` refresh is required when either is true:
+   - at least 15 minutes since last memory operation
+   - at least 3 execution cycles since last memory operation
+4. Cooldown:
+   - no more than one non-forced memory operation per cycle
+   - skip when state delta is negligible
+5. Anti-overuse policy:
+   - do not write memory after every command/tool call
+   - prefer compact delta updates over full rewrites
+   - skip repeated retrieval if last retrieval is fresh and task/error signature is unchanged
+6. Command-gap fallback:
+   - if 5 consecutive commands/actions complete without a memory update, force one `working` refresh.
+   - treat this as a low-cost sync update (delta-first, concise).
+7. When skipped, log `memory_skip_reason` for auditability.
+
+## Post-Compression Recovery (Required)
+
+When memory is auto-compressed/summarized:
+
+1. Immediately run a `working` re-read before the next execution step.
+2. Rebuild `working` fields from recent evidence:
+   - latest stage report
+   - latest action/observation logs
+   - latest todo diff (`todo_active/todo_done/todo_blocked`)
+3. Publish a compact "post-compression state snapshot" and continue only after snapshot is consistent.
+
+## Layered Retrieval Timing
+
+Use layer-specific retrieval timing to avoid over-calling:
+
+1. `working` retrieve:
+   - mandatory bootstrap
+   - periodic refresh by Invocation Schedule
+   - mandatory after memory compression
+2. `episode` retrieve:
+   - at run start for same project/task_type
+   - at replan or major failure to avoid repeating failed paths
+3. `procedure` retrieve:
+   - before executing a new stage plan
+   - before high-resource or irreversible actions
+   - when repeated failure indicates a known SOP may exist
+4. `insight` retrieve:
+   - during planning/replanning for hypothesis shaping
+   - when evidence conflicts or root cause is unclear
+   - before final report/answer to run contradiction/boundary checks
+5. `persona` retrieve:
+   - once at run start
+   - on interaction mode switch or explicit user preference change
+   - before final user-facing delivery for style/alignment consistency
+6. Retrieval cooldown:
+   - retrieve `procedure`, `insight`, and `persona` at most once per stage unless a new trigger appears.
+
 ## Recovery on Context Drift
 
 If execution becomes repetitive or confused:
@@ -130,3 +193,4 @@ For each memory operation, emit:
 4. `Rationale`
 5. `Evidence`
 6. `Result`
+7. `Trigger` (`bootstrap|stage-change|replan|error|high-resource|periodic|close-out`)
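
The Invocation Schedule added to `memory-manager` above combines event triggers, periodic thresholds, a command-gap fallback, and a cooldown. The sketch below shows one possible way to order those checks. The thresholds (15 minutes, 3 cycles, 5 commands) come from the policy text; the function name, its parameters, the skip-reason strings, and the precedence between rules are assumptions. In particular, it assumes that event triggers and the command-gap fallback bypass the cooldown, that periodic refresh does not, and that a command-gap refresh is logged under the `periodic` trigger value.

```python
from typing import Optional, Tuple

# Event triggers treated as forced in this sketch (assumption about precedence).
FORCED_TRIGGERS = {"stage-change", "replan", "error", "high-resource"}


def memory_decision(trigger: Optional[str],
                    minutes_since_last: float,
                    cycles_since_last: int,
                    commands_since_last: int,
                    ops_this_cycle: int,
                    delta_negligible: bool) -> Tuple[bool, str]:
    """Return (run_memory_op, trigger_name_or_skip_reason)."""
    if trigger in FORCED_TRIGGERS:
        return True, trigger
    if commands_since_last >= 5:
        # Command-gap fallback: force one concise `working` refresh.
        return True, "periodic"
    periodic_due = minutes_since_last >= 15 or cycles_since_last >= 3
    if not periodic_due:
        return False, "memory_skip_reason=no-trigger"
    if ops_this_cycle >= 1:
        # Cooldown: at most one non-forced memory operation per cycle.
        return False, "memory_skip_reason=cooldown"
    if delta_negligible:
        return False, "memory_skip_reason=negligible-delta"
    return True, "periodic"
```

Every skip path returns a reason string, which matches the requirement to log `memory_skip_reason` for auditability.
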
diff --git a/.agents/skills/project-context/SKILL.md b/.agents/skills/project-context/SKILL.md
@@ -46,7 +46,7 @@ Do not ask for all fields at once.
 1. infer task type (`report|sft|rl|eval|generic`)
 2. load existing `context.json` + `secrets.json`
 3. auto-detect non-sensitive environment values where possible
-4. if execution target is `remote`, show stored remote profile and ask whether to reuse it
+4. if execution target is `remote`, consume the remote-profile reuse decision from `run-governor` first; ask the user only if that decision is missing
 5. ask only for missing required fields for the current task
 6. during execution, allow blocker-only delta prompts (e.g. missing API URL/key)
 7. persist immediately for reuse
@@ -64,9 +64,9 @@ If new missing fields appear later, run preflight again and collect only deltas.
 
 Recommended order in research execution:
 
-1. `run-governor` initializes mode and `run_id`
-2. `run-governor` collects `local|remote` target
-3. `project-context` preflight resolves runtime context and remote reuse decision
+1. `run-governor` collects and confirms mode + `local|remote` target
+2. `run-governor` initializes `run_id`
+3. `project-context` preflight resolves runtime context and consumes remote reuse decision
 4. `experiment-execution` runs with resolved context
 5. `project-context` snapshot writes run-scoped frozen context
 

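The `project-context` change above moves the remote-profile reuse question to `run-governor` and leaves a single fallback prompt. A minimal sketch of that lookup order follows; `resolve_remote_reuse` and the `ask_user` callback are hypothetical names, and in the real workflow the prompt would be routed through `human-checkpoint`.

```python
from typing import Callable, Optional


def resolve_remote_reuse(target: str,
                         governor_decision: Optional[bool],
                         ask_user: Callable[[], bool]) -> Optional[bool]:
    """Resolve whether to reuse the stored remote profile.

    Returns None for local targets, where the question does not apply.
    """
    if target != "remote":
        return None
    if governor_decision is not None:
        # Consume the decision already collected by run-governor.
        return governor_decision
    # Decision is missing: ask once (ask_user stands in for human-checkpoint).
    return ask_user()
```

With this ordering the user is prompted only when `run-governor` recorded no decision, which is the behavior the revised step 4 describes.
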
diff --git a/.agents/skills/research-workflow/SKILL.md b/.agents/skills/research-workflow/SKILL.md
@@ -14,16 +14,18 @@ Drive AI R&D tasks with small, testable, evidence-first steps while respecting t
 For non-trivial tasks, run this order:
 
 1. Initialize run policy with `run-governor`.
-2. Understand user objective and current code/evidence state.
-3. Clarify ambiguous requirements through `human-checkpoint`.
-4. Complete intake checkpoint before planning or decomposition.
-5. Run deep research when needed.
-6. Build an execution plan (use `research-plan` for planning-heavy requests).
-7. Confirm plan as required by mode.
-8. Execute with working-memory todo tracking.
-9. Replan on major issues when needed.
-10. Emit stage reports and maintain report index.
-11. Close task, then optionally publish shared memory.
+2. Resolve runtime context with `project-context` before experiment/report/eval execution.
+3. Understand user objective and current code/evidence state.
+4. Clarify ambiguous requirements through `human-checkpoint`.
+5. Complete intake checkpoint before planning or decomposition.
+6. Run one `memory-manager` bootstrap (`retrieve/init-working`).
+7. Run deep research when needed.
+8. Build an execution plan (use `research-plan` for planning-heavy requests).
+9. Confirm plan as required by mode.
+10. Execute with trigger-based working-memory updates.
+11. Replan on major issues when needed.
+12. Emit stage reports and maintain report index.
+13. Close task, write memory close-out, then optionally publish shared memory.
 
 ## Mode-Aware Interaction Policy
 
@@ -50,14 +52,26 @@ Route required user interactions through `human-checkpoint`:
 3. Apply this routing to intake clarification, plan confirmation, replan confirmation, and parameter approvals.
 4. Log channel choice as `interaction_channel=request_user_input|plain-text-fallback` and include `fallback_reason` when used.
 
+## Mid-Run Intent Switch Gate (Mandatory)
+
+On each new user message:
+
+1. Re-evaluate objective and skill routing before executing the next pending action.
+2. If user intent shifts to research/scoping/comparison/root-cause inquiry, activate `deep-research` immediately.
+3. Do not continue stale execution plans when the objective changed materially.
+4. If `deep-research` is skipped, emit `dr_skip_reason` with freshness evidence (date/timestamp and source coverage), then continue.
+5. Cooldown:
+   - no more than one non-forced deep-research call per stage.
+   - bypass cooldown when objective changed, contradiction appears, or high-impact uncertainty remains unresolved.
+
 ## Default Execution Loop
 
 Repeat this loop until completion:
 
 1. Update success criteria.
 2. Collect or refresh evidence.
 3. Plan the smallest useful next action.
-4. Refresh working todo state.
+4. Refresh working todo state only when memory trigger conditions are met.
 5. Act.
 6. Observe outputs.
 7. Evaluate result quality and risk.
@@ -67,17 +81,32 @@ Repeat this loop until completion:
 
 Use these in combination:
 
-1. Treat memory as an optional accelerator, not a hard prerequisite.
-2. Use search/deep research directly when topic is time-sensitive, new, or currently blocked.
-3. For open-ended research/scoping requests, run deep research before giving decomposition or roadmap recommendations.
-4. For unknown errors, use this branch:
+1. `memory-manager` bootstrap is mandatory before planning/execution for non-trivial runs.
+2. Between bootstrap and close-out, memory operations are trigger-based and non-aggressive.
+3. Trigger memory operation when one of the following occurs:
+   - stage transition
+   - replan
+   - significant error or new error signature
+   - memory auto-compression/summarization completed
+   - before high-resource action
+   - before final answer/report handoff
+4. Periodic `working` memory refresh is required when either holds:
+   - at least 15 minutes since last memory operation
+   - at least 3 execution cycles since last memory operation
+5. Command-gap fallback: if 5 consecutive commands/actions finish without a memory update, force one concise `working` refresh.
+6. Cooldown: no more than one non-forced memory operation per cycle.
+7. Avoid per-command memory writes; batch observations into one delta update.
+8. Use search/deep research directly when topic is time-sensitive, new, or currently blocked.
+9. For open-ended research/scoping requests, run deep research before giving decomposition or roadmap recommendations.
+   - For new research requests that arrive mid-run, run deep-research re-entry before continuing execution.
+10. For unknown errors, use this branch:
    - local evidence triage (logs, stack trace, recent changes)
    - targeted search
    - deep research (debug-investigation) if still unresolved
    - minimal fix validation
-5. If skipping memory before search, record reason in the stage report.
-6. If intake information is missing, trigger `human-checkpoint` before deep research or planning.
-7. If deep research was used for open-ended scoping, hand off to `research-plan` to convert findings into an execution-ready plan. Skip only if the user explicitly opts out.
+11. If skipping memory due to cooldown or low-value delta, record `memory_skip_reason` in the stage report.
+12. If intake information is missing, trigger `human-checkpoint` before deep research or planning.
+13. If deep research was used for open-ended scoping, hand off to `research-plan` to convert findings into an execution-ready plan. Skip only if the user explicitly opts out.
 
 ## Replanning Policy
 

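The cooldown in the Mid-Run Intent Switch Gate above limits non-forced `deep-research` calls to one per stage, with three bypass conditions. A small sketch of that check, assuming a hypothetical per-stage call counter and boolean flags that the workflow would have to compute elsewhere:

```python
def deep_research_allowed(calls_this_stage: int,
                          objective_changed: bool = False,
                          contradiction: bool = False,
                          unresolved_high_impact: bool = False) -> bool:
    """Apply the per-stage cooldown for deep-research calls."""
    if objective_changed or contradiction or unresolved_high_impact:
        # Bypass conditions named in the policy: the call counts as forced.
        return True
    # Non-forced calls: no more than one per stage.
    return calls_this_stage < 1
```

When this returns `False`, the gate still requires a `dr_skip_reason` with freshness evidence before execution continues.
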
diff --git a/.agents/skills/run-governor/SKILL.md b/.agents/skills/run-governor/SKILL.md
@@ -78,6 +78,16 @@ Hard constraints:
 6. Confirmation collection must be mediated by `human-checkpoint`.
 7. Any assumption for mode/target is non-compliant, even when likely.
 
+## Memory Bootstrap Gate
+
+Before transitioning from initialization to execution workflow:
+
+1. Set `memory_policy=balanced-triggered` unless user explicitly overrides.
+2. Ensure one `memory-manager` bootstrap operation is complete:
+   - `retrieve` or `init-working` for current project/task context.
+3. If bootstrap is missing, mark status `blocked-awaiting-memory-bootstrap`.
+4. This gate enforces only the bootstrap, not per-step memory writes.
+
 ## Run Identity and Directories
 
 Use one run identifier:
@@ -176,6 +186,7 @@ For each run-governor action, emit:
 7. `Confirmation`: `user_confirmed_mode`, `user_confirmed_execution_target`, and whether initialization is permitted (`YES|NO`)
 8. `Compliance`: `gate_status=pass|blocked`, with blocked reason when applicable
 9. `Interaction`: `interaction_transport` and optional `fallback_reason`
+10. `Memory`: `memory_policy` and `memory_bootstrap_done=YES|NO`
 
 ## Violation Recovery Policy
 

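The Memory Bootstrap Gate added to `run-governor` above reduces to one check plus two reporting fields. The sketch below is illustrative only: the function name and the `ready` status value are assumptions, while `balanced-triggered`, `blocked-awaiting-memory-bootstrap`, `memory_policy`, and `memory_bootstrap_done` are taken from the diff.

```python
from typing import Optional


def memory_bootstrap_gate(bootstrap_op: Optional[str],
                          policy_override: Optional[str] = None) -> dict:
    """Evaluate the bootstrap gate before moving from initialization to execution."""
    # Default policy unless the user explicitly overrides it.
    policy = policy_override or "balanced-triggered"
    # Either bootstrap operation satisfies the gate.
    done = bootstrap_op in ("retrieve", "init-working")
    return {
        "memory_policy": policy,
        "memory_bootstrap_done": "YES" if done else "NO",
        # "ready" is a placeholder status; only the blocked value is in the policy.
        "status": "ready" if done else "blocked-awaiting-memory-bootstrap",
    }
```

The gate inspects only the bootstrap operation, consistent with rule 4: it does not enforce per-step memory writes.
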
diff --git a/AGENTS.md b/AGENTS.md
@@ -17,6 +17,33 @@ This workspace is for AI research and development tasks (reproduction, debugging
 12. Follow `REPO_CONVENTIONS.md` for artifact placement and commit hygiene.
 13. If a run was initialized before confirmation, stop and run violation recovery: acknowledge, ask whether to keep/clean artifacts, and wait for explicit reconfirmation before continuing.
 
+## Memory Invocation Guardrails (Balanced)
+1. `memory-manager` is mandatory for non-trivial runs, but only as a control-plane step, not per command.
+2. Mandatory calls per non-trivial run:
+   - one bootstrap `retrieve/init-working` before planning or execution
+   - one close-out writeback before task completion
+3. Conditional calls between bootstrap and close-out are trigger-based only:
+   - stage change
+   - replan
+   - significant failure or new error signature
+   - before high-resource action
+   - before final report/answer handoff
+4. Periodic refresh is required when either is true:
+   - at least 15 minutes since last memory operation
+   - at least 3 execution cycles since last memory operation
+5. Cooldown rule: do not invoke `memory-manager` more than once in a cycle unless forced by safety/high-resource/failure triggers.
+6. If memory is skipped due to cooldown or low delta, record `memory_skip_reason` in the stage report.
+
+## Deep-Research Re-entry Guardrails
+1. On every new user message, re-run skill routing before continuing prior stage actions.
+2. If the new message contains research-intent signals, `deep-research` MUST be activated even mid-run.
+3. Research-intent signals include (semantic match, Chinese or English):
+   - 调研/研究/对比/综述/文献/证据/机制/根因/为什么/可行性/路线图
+   - research/investigate/compare/survey/literature/evidence/mechanism/root-cause/why/feasibility/roadmap
+4. If skipping `deep-research`, emit `dr_skip_reason` with concrete evidence freshness info (source date / timestamp), not a generic statement.
+5. Cooldown for non-forced deep-research calls:
+   - at most once per stage unless objective changed or new contradiction/high-impact uncertainty appears.
+
 ## Skill Paths
 - `.agents/skills/run-governor`
 - `.agents/skills/research-workflow`
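
The research-intent signals listed in the Deep-Research Re-entry Guardrails above can be approximated with a keyword check. This is a deliberate simplification: the guardrail asks for a semantic match, whereas the sketch below uses plain substring matching, so it will miss paraphrases (for example "root cause" written without a hyphen) and can over-match. The function name is hypothetical; the signal lists are copied from the diff.

```python
RESEARCH_SIGNALS = (
    # Chinese signals from the guardrail list
    "调研", "研究", "对比", "综述", "文献", "证据", "机制", "根因", "为什么", "可行性", "路线图",
    # English signals from the guardrail list
    "research", "investigate", "compare", "survey", "literature", "evidence",
    "mechanism", "root-cause", "why", "feasibility", "roadmap",
)


def has_research_intent(message: str) -> bool:
    """Substring approximation of the research-intent check (not a semantic match)."""
    text = message.lower()
    return any(signal in text for signal in RESEARCH_SIGNALS)
```

A production router would more likely rely on the agent's own intent classification, with a list like this serving only as a cheap first-pass filter.
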