Merged
19 changes: 15 additions & 4 deletions .agents/skills/deep-research/SKILL.md
@@ -46,6 +46,16 @@ Iterate until evidence quality is sufficient:
8. Run contradiction/counter-evidence checks.
9. Synthesize and produce final report.

+## Re-entry Policy (Mid-Run)
+
+When called during an ongoing run (not only at run start):
+
+1. Treat invocation as valid and do not require starting a new run by default.
+2. Recompute objective delta versus current stage plan.
+3. If objective changed materially, reset research focus and run fresh query batches.
+4. If objective is similar, perform incremental deep research using existing evidence as baseline.
+5. If skipped due to sufficient evidence freshness, emit `dr_skip_reason` with explicit date windows and source counts.
+
## Scoping-to-Planning Handoff Policy

When deep research is used for open-ended scoping (`idea-exploration`), hand off findings to `research-plan` as the required default next step. Skip only if the user explicitly opts out.
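
Review note: the Re-entry Policy added in this hunk reads as a three-way decision. The sketch below is one possible reading, not code from this PR. `ReentryDecision`, `decide_reentry`, and all parameter names are hypothetical; only `dr_skip_reason` comes from the skill text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReentryDecision:
    action: str                           # "reset" | "incremental" | "skip"
    dr_skip_reason: Optional[str] = None  # only set when action == "skip"


def decide_reentry(objective_changed: bool,
                   evidence_is_fresh: bool,
                   window_start: str,
                   window_end: str,
                   source_count: int) -> ReentryDecision:
    """Map re-entry policy steps 3-5 to a single action (hypothetical helper)."""
    if objective_changed:
        # Step 3: material change, so reset focus and run fresh query batches.
        return ReentryDecision("reset")
    if evidence_is_fresh:
        # Step 5: a skip must carry explicit date windows and source counts.
        reason = (f"evidence fresh: window={window_start}..{window_end}, "
                  f"sources={source_count}")
        return ReentryDecision("skip", dr_skip_reason=reason)
    # Step 4: similar objective, so extend the existing evidence incrementally.
    return ReentryDecision("incremental")
```

Checking the objective change first is an assumption about precedence: the policy text does not say which rule wins when the objective changed and the evidence is also fresh.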
@@ -234,10 +244,11 @@ Degrade rules:

## Memory and Search Policy

-1. Memory lookup is optional and situational.
-2. Use memory when likely to reduce repeated search effort.
-3. Use search/deep research directly when topic is new, urgent, or time-sensitive.
-4. If memory is skipped, note reason in report trail.
+1. Global memory bootstrap (from `run-governor` / `research-workflow`) is mandatory for non-trivial runs.
+2. Within deep-research, additional memory retrieval is optional and situational.
+3. Use incremental memory retrieval when it can reduce repeated search effort or contradiction resolution cost.
+4. Use search/deep research directly when topic is new, urgent, or time-sensitive.
+5. If incremental memory retrieval is skipped, note reason in report trail.

## Type-Aware Reporting Requirements

2 changes: 1 addition & 1 deletion .agents/skills/experiment-execution/SKILL.md
@@ -81,7 +81,7 @@ Retry behavior should be mode-aware and evidence-driven.

1. Choose control mode: direct SSH, SSH+session manager, scheduler, or existing remote agent.
2. Declare remote model: remote-native or local-driver.
-3. If project-context has remote profile, confirm reuse policy before launch.
+3. Use remote profile reuse decision from `run-governor`; if missing, request exactly one confirmation via `human-checkpoint`.
4. Validate connectivity and runtime basics before expensive launch when uncertainty exists.

## Logging and Failure Handling
64 changes: 64 additions & 0 deletions .agents/skills/memory-manager/SKILL.md
@@ -85,6 +85,69 @@ Treat stale working state as risk:
3. Force review before high-resource actions.
4. Force review after interruptions or unexpected failures.

+## Invocation Schedule (Balanced, Non-Aggressive)
+
+1. Mandatory once-per-run operations:
+- bootstrap `retrieve/init-working` after intake and before planning/execution
+- close-out writeback before final task completion
+2. Trigger-based operations between bootstrap and close-out:
+- stage transition
+- replan
+- significant failure or new error signature
+- before high-resource action
+- before final answer/report handoff
+3. Periodic `working` refresh is required when either is true:
+- at least 15 minutes since last memory operation
+- at least 3 execution cycles since last memory operation
+4. Cooldown:
+- no more than one non-forced memory operation per cycle
+- skip when state delta is negligible
+5. Anti-overuse policy:
+- do not write memory after every command/tool call
+- prefer compact delta updates over full rewrites
+- skip repeated retrieval if last retrieval is fresh and task/error signature is unchanged
+6. Command-gap fallback:
+- if 5 consecutive commands/actions complete without a memory update, force one `working` refresh.
+- treat this as a low-cost sync update (delta-first, concise).
+7. When skipped, log `memory_skip_reason` for auditability.
+
+## Post-Compression Recovery (Required)
+
+When memory is auto-compressed/summarized:
+
+1. Immediately run a `working` re-read before the next execution step.
+2. Rebuild `working` fields from recent evidence:
+- latest stage report
+- latest action/observation logs
+- latest todo diff (`todo_active/todo_done/todo_blocked`)
+3. Publish a compact "post-compression state snapshot" and continue only after snapshot is consistent.
+
+## Layered Retrieval Timing
+
+Use layer-specific retrieval timing to avoid over-calling:
+
+1. `working` retrieve:
+- mandatory bootstrap
+- periodic refresh by Invocation Schedule
+- mandatory after memory compression
+2. `episode` retrieve:
+- at run start for same project/task_type
+- at replan or major failure to avoid repeating failed paths
+3. `procedure` retrieve:
+- before executing a new stage plan
+- before high-resource or irreversible actions
+- when repeated failure indicates a known SOP may exist
+4. `insight` retrieve:
+- during planning/replanning for hypothesis shaping
+- when evidence conflicts or root cause is unclear
+- before final report/answer to run contradiction/boundary checks
+5. `persona` retrieve:
+- once at run start
+- on interaction mode switch or explicit user preference change
+- before final user-facing delivery for style/alignment consistency
+6. Retrieval cooldown:
+- `procedure/insight/persona` at most once per stage unless a new trigger appears.
+
## Recovery on Context Drift

If execution becomes repetitive or confused:
@@ -130,3 +193,4 @@ For each memory operation, emit:
4. `Rationale`
5. `Evidence`
6. `Result`
+7. `Trigger` (`bootstrap|stage-change|replan|error|high-resource|periodic|close-out`)
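
Review note: the periodic-refresh, cooldown, and command-gap rules in the Invocation Schedule interact, so a small sketch may help check the intended precedence. Everything here is illustrative and not part of the PR: `MemoryClock`, `next_working_refresh`, and the skip-reason strings are hypothetical names. Two assumptions are made explicit: the command-gap fallback is treated as forced (it bypasses the cooldown), and periodic refresh is treated as non-forced (it is subject to the cooldown). The fallback is logged under the `periodic` trigger because the `Trigger` list has no dedicated value for it.

```python
from dataclasses import dataclass
from typing import Tuple

REFRESH_MINUTES = 15  # "at least 15 minutes since last memory operation"
REFRESH_CYCLES = 3    # "at least 3 execution cycles since last memory operation"
COMMAND_GAP = 5       # "5 consecutive commands/actions ... without a memory update"


@dataclass
class MemoryClock:
    minutes_since_last_op: float
    cycles_since_last_op: int
    commands_since_last_update: int
    ops_this_cycle: int = 0
    state_delta_negligible: bool = False


def next_working_refresh(clock: MemoryClock) -> Tuple[bool, str]:
    """Return (refresh?, trigger if refreshing, else a memory_skip_reason)."""
    if clock.commands_since_last_update >= COMMAND_GAP:
        # Assumption: the command-gap fallback is forced and ignores cooldown.
        return True, "periodic"
    due = (clock.minutes_since_last_op >= REFRESH_MINUTES
           or clock.cycles_since_last_op >= REFRESH_CYCLES)
    if not due:
        return False, "not-due"
    if clock.ops_this_cycle >= 1:
        # Cooldown: at most one non-forced memory operation per cycle.
        return False, "cooldown"
    if clock.state_delta_negligible:
        # Cooldown: skip when the state delta is negligible.
        return False, "low-delta"
    return True, "periodic"
```

If periodic refresh is instead meant to be forced once its threshold is reached, the two cooldown checks would move above the `due` test and apply only to trigger-based operations.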
8 changes: 4 additions & 4 deletions .agents/skills/project-context/SKILL.md
@@ -46,7 +46,7 @@ Do not ask for all fields at once.
1. infer task type (`report|sft|rl|eval|generic`)
2. load existing `context.json` + `secrets.json`
3. auto-detect non-sensitive environment values where possible
-4. if execution target is `remote`, show stored remote profile and ask whether to reuse it
+4. if execution target is `remote`, consume reuse decision from `run-governor` first; ask only if decision is missing
5. ask only for missing required fields for the current task
6. during execution, allow blocker-only delta prompts (e.g. missing API URL/key)
7. persist immediately for reuse
@@ -64,9 +64,9 @@ If new missing fields appear later, run preflight again and collect only deltas.

Recommended order in research execution:

-1. `run-governor` initializes mode and `run_id`
-2. `run-governor` collects `local|remote` target
-3. `project-context` preflight resolves runtime context and remote reuse decision
+1. `run-governor` collects and confirms mode + `local|remote` target
+2. `run-governor` initializes `run_id`
+3. `project-context` preflight resolves runtime context and consumes remote reuse decision
4. `experiment-execution` runs with resolved context
5. `project-context` snapshot writes run-scoped frozen context

65 changes: 47 additions & 18 deletions .agents/skills/research-workflow/SKILL.md
@@ -14,16 +14,18 @@ Drive AI R&D tasks with small, testable, evidence-first steps while respecting t
For non-trivial tasks, run this order:

1. Initialize run policy with `run-governor`.
-2. Understand user objective and current code/evidence state.
-3. Clarify ambiguous requirements through `human-checkpoint`.
-4. Complete intake checkpoint before planning or decomposition.
-5. Run deep research when needed.
-6. Build an execution plan (use `research-plan` for planning-heavy requests).
-7. Confirm plan as required by mode.
-8. Execute with working-memory todo tracking.
-9. Replan on major issues when needed.
-10. Emit stage reports and maintain report index.
-11. Close task, then optionally publish shared memory.
+2. Resolve runtime context with `project-context` before experiment/report/eval execution.
+3. Understand user objective and current code/evidence state.
+4. Clarify ambiguous requirements through `human-checkpoint`.
+5. Complete intake checkpoint before planning or decomposition.
+6. Run one `memory-manager` bootstrap (`retrieve/init-working`).
+7. Run deep research when needed.
+8. Build an execution plan (use `research-plan` for planning-heavy requests).
+9. Confirm plan as required by mode.
+10. Execute with trigger-based working-memory updates.
+11. Replan on major issues when needed.
+12. Emit stage reports and maintain report index.
+13. Close task, write memory close-out, then optionally publish shared memory.

## Mode-Aware Interaction Policy

@@ -50,14 +52,26 @@ Route required user interactions through `human-checkpoint`:
3. Apply this routing to intake clarification, plan confirmation, replan confirmation, and parameter approvals.
4. Log channel choice as `interaction_channel=request_user_input|plain-text-fallback` and include `fallback_reason` when used.

+## Mid-Run Intent Switch Gate (Mandatory)
+
+On each new user message:
+
+1. Re-evaluate objective and skill routing before executing the next pending action.
+2. If user intent shifts to research/scoping/comparison/root-cause inquiry, activate `deep-research` immediately.
+3. Do not continue stale execution plans when the objective changed materially.
+4. If `deep-research` is skipped, emit `dr_skip_reason` with freshness evidence (date/timestamp and source coverage), then continue.
+5. Cooldown:
+- no more than one non-forced deep-research call per stage.
+- bypass cooldown when objective changed, contradiction appears, or high-impact uncertainty remains unresolved.
+
## Default Execution Loop

Repeat this loop until completion:

1. Update success criteria.
2. Collect or refresh evidence.
3. Plan the smallest useful next action.
-4. Refresh working todo state.
+4. Refresh working todo state only when memory trigger conditions are met.
5. Act.
6. Observe outputs.
7. Evaluate result quality and risk.
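
Review note: the cooldown rule in the Mid-Run Intent Switch Gate added earlier in this hunk reduces to a small predicate. This is a sketch under assumed names (`may_call_deep_research` and its parameters are hypothetical), not code from the PR.

```python
def may_call_deep_research(calls_this_stage: int,
                           objective_changed: bool = False,
                           contradiction_found: bool = False,
                           high_impact_uncertainty: bool = False) -> bool:
    """Allow one non-forced deep-research call per stage, with bypass conditions."""
    # Any bypass condition makes the call "forced", so the cooldown is ignored.
    forced = objective_changed or contradiction_found or high_impact_uncertainty
    if forced:
        return True
    # Non-forced calls: at most one per stage.
    return calls_this_stage < 1
```

For example, `may_call_deep_research(1)` is `False`, while `may_call_deep_research(1, contradiction_found=True)` is `True`. When the predicate returns `False`, the gate still expects a `dr_skip_reason` with freshness evidence to be emitted.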
@@ -67,17 +81,32 @@ Repeat this loop until completion:

Use these in combination:

-1. Treat memory as an optional accelerator, not a hard prerequisite.
-2. Use search/deep research directly when topic is time-sensitive, new, or currently blocked.
-3. For open-ended research/scoping requests, run deep research before giving decomposition or roadmap recommendations.
-4. For unknown errors, use this branch:
+1. `memory-manager` bootstrap is mandatory before planning/execution for non-trivial runs.
+2. Between bootstrap and close-out, memory operations are trigger-based and non-aggressive.
+3. Trigger memory operation when one of the following occurs:
+- stage transition
+- replan
+- significant error or new error signature
+- memory auto-compression/summarization completed
+- before high-resource action
+- before final answer/report handoff
+4. Periodic `working` memory refresh is required when either holds:
+- at least 15 minutes since last memory operation
+- at least 3 execution cycles since last memory operation
+5. Command-gap fallback: if 5 consecutive commands/actions finish without a memory update, force one concise `working` refresh.
+6. Cooldown: no more than one non-forced memory operation per cycle.
+7. Avoid per-command memory writes; batch observations into one delta update.
+8. Use search/deep research directly when topic is time-sensitive, new, or currently blocked.
+9. For open-ended research/scoping requests, run deep research before giving decomposition or roadmap recommendations.
+9.1 For mid-run new research requests, run deep research re-entry before further execution.
+10. For unknown errors, use this branch:
- local evidence triage (logs, stack trace, recent changes)
- targeted search
- deep research (debug-investigation) if still unresolved
- minimal fix validation
-5. If skipping memory before search, record reason in the stage report.
-6. If intake information is missing, trigger `human-checkpoint` before deep research or planning.
-7. If deep research was used for open-ended scoping, hand off to `research-plan` to convert findings into an execution-ready plan. Skip only if the user explicitly opts out.
+11. If skipping memory due to cooldown or low-value delta, record reason in the stage report.
+12. If intake information is missing, trigger `human-checkpoint` before deep research or planning.
+13. If deep research was used for open-ended scoping, hand off to `research-plan` to convert findings into an execution-ready plan. Skip only if the user explicitly opts out.

## Replanning Policy

11 changes: 11 additions & 0 deletions .agents/skills/run-governor/SKILL.md
@@ -78,6 +78,16 @@ Hard constraints:
6. Confirmation collection must be mediated by `human-checkpoint`.
7. Any assumption for mode/target is non-compliant, even when likely.

+## Memory Bootstrap Gate
+
+Before transitioning from initialization to execution workflow:
+
+1. Set `memory_policy=balanced-triggered` unless user explicitly overrides.
+2. Ensure one `memory-manager` bootstrap operation is complete:
+- `retrieve` or `init-working` for current project/task context.
+3. If bootstrap is missing, mark status `blocked-awaiting-memory-bootstrap`.
+4. This gate enforces only the bootstrap, not per-step memory writes.
+
## Run Identity and Directories

Use one run identifier:
@@ -176,6 +186,7 @@ For each run-governor action, emit:
7. `Confirmation`: `user_confirmed_mode`, `user_confirmed_execution_target`, and whether initialization is permitted (`YES|NO`)
8. `Compliance`: `gate_status=pass|blocked`, with blocked reason when applicable
9. `Interaction`: `interaction_transport` and optional `fallback_reason`
+10. `Memory`: `memory_policy` and `memory_bootstrap_done=YES|NO`

## Violation Recovery Policy

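
Review note: the Memory Bootstrap Gate and the new `Memory` output field in this file could fit together roughly as follows. This is a minimal sketch, assuming a hypothetical `RunState` record and `memory_bootstrap_gate` helper. The field names `memory_policy`, `memory_bootstrap_done`, and `gate_status`, and the status string `blocked-awaiting-memory-bootstrap`, are taken from the skill text; the dictionary layout is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RunState:
    # Gate step 1: default policy unless the user explicitly overrides it.
    memory_policy: str = "balanced-triggered"
    memory_bootstrap_done: bool = False


def memory_bootstrap_gate(state: RunState) -> dict:
    """Build the Memory/Compliance fields emitted before execution may start."""
    report = {
        "memory_policy": state.memory_policy,
        "memory_bootstrap_done": "YES" if state.memory_bootstrap_done else "NO",
    }
    if state.memory_bootstrap_done:
        report["gate_status"] = "pass"
    else:
        # Gate step 3: a missing bootstrap blocks the transition to execution.
        report["gate_status"] = "blocked"
        report["status"] = "blocked-awaiting-memory-bootstrap"
    return report
```

The gate checks only the bootstrap flag, in line with step 4: it does not inspect per-step memory writes.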
27 changes: 27 additions & 0 deletions AGENTS.md
@@ -17,6 +17,33 @@ This workspace is for AI research and development tasks (reproduction, debugging
12. Follow `REPO_CONVENTIONS.md` for artifact placement and commit hygiene.
13. If a run was initialized before confirmation, stop and run violation recovery: acknowledge, ask whether to keep/clean artifacts, and wait for explicit reconfirmation before continuing.

+## Memory Invocation Guardrails (Balanced)
+1. `memory-manager` is mandatory for non-trivial runs, but only as a control-plane step, not per command.
+2. Mandatory calls per non-trivial run:
+- one bootstrap `retrieve/init-working` before planning or execution
+- one close-out writeback before task completion
+3. Conditional calls between bootstrap and close-out are trigger-based only:
+- stage change
+- replan
+- significant failure or new error signature
+- before high-resource action
+- before final report/answer handoff
+4. Periodic refresh is allowed when either is true:
+- at least 15 minutes since last memory operation
+- at least 3 execution cycles since last memory operation
+5. Cooldown rule: do not invoke `memory-manager` more than once in a cycle unless forced by safety/high-resource/failure triggers.
+6. If memory is skipped due to cooldown or low delta, record `memory_skip_reason` in the stage report.
+
+## Deep-Research Re-entry Guardrails
+1. On every new user message, re-run skill routing before continuing prior stage actions.
+2. If the new message contains research-intent signals, `deep-research` MUST be activated even mid-run.
+3. Research-intent signals include (semantic match, Chinese or English):
+- 调研/研究/对比/综述/文献/证据/机制/根因/为什么/可行性/路线图
+- research/investigate/compare/survey/literature/evidence/mechanism/root-cause/why/feasibility/roadmap
+4. If skipping `deep-research`, emit `dr_skip_reason` with concrete evidence freshness info (source date / timestamp), not a generic statement.
+5. Cooldown for non-forced deep-research calls:
+- at most once per stage unless objective changed or new contradiction/high-impact uncertainty appears.
+
## Skill Paths
- `.agents/skills/run-governor`
- `.agents/skills/research-workflow`
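
Review note: the Deep-Research Re-entry Guardrails above ask for a semantic match on research-intent signals. As a rough baseline only, the listed signals can be checked with plain substring matching. The sketch below is a crude stand-in for semantic matching, not the behavior the guardrail specifies; `RESEARCH_SIGNALS` and `has_research_intent` are hypothetical names, and the signal strings are copied from the guardrail list.

```python
RESEARCH_SIGNALS = (
    # Chinese signals, copied from the guardrail list.
    "调研", "研究", "对比", "综述", "文献", "证据", "机制", "根因",
    "为什么", "可行性", "路线图",
    # English signals, copied from the guardrail list.
    "research", "investigate", "compare", "survey", "literature",
    "evidence", "mechanism", "root-cause", "why", "feasibility", "roadmap",
)


def has_research_intent(message: str) -> bool:
    """Crude keyword baseline: True if any listed signal occurs in the message."""
    text = message.lower()
    return any(signal in text for signal in RESEARCH_SIGNALS)
```

Substring matching both over-triggers (any word containing "why") and under-triggers (paraphrases, or "root cause" written without the hyphen), so it is best read as a fallback beneath whatever semantic routing the agent actually applies.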