
leswww/agent-long-task-skill


English README | 中文文档

Agent Long Task Skill

Stop AI coding agents from saying “done” while leaving half the task unfinished.

Agent Long Task Skill is a transparent, Markdown-first long-task execution protocol for AI coding agents. It helps agents refine messy requirements, create structured task plans, track progress, recover from failures, verify completion, and generate handoff-ready reports.

The Problem

AI coding agents are useful, but long tasks expose predictable failure modes:

  • They miss requirements during multi-step work.
  • They say "done" while leaving bugs, skipped items, or unfinished work.
  • Long prompts are messy, token-heavy, repetitive, and easy to misunderstand.
  • Context loss makes continuation difficult after interruptions.
  • A simple checklist is not enough when tasks require investigation, recovery, verification, and handoff.

The Solution

This Skill defines a full lifecycle:

messy user request
→ requirement refinement and compression
→ structured execution document
→ task list
→ step-by-step execution
→ failure/blocker recovery attempts
→ final verification pass
→ final user-facing report

The agent should first compress a vague or multi-requirement request into a concise execution document. It should preserve the user's intent while removing repetition, ambiguity, and token-heavy wording. Then it maps every requirement to one or more task IDs, executes tasks one by one, records failures and blockers honestly, performs a final verification audit, and reports the real result.

Requirement Refinement

The first output should be a concise execution document, not a long restatement of the user's entire prompt.

  • Do not copy the user's full messy prompt repeatedly.
  • Convert vague language into concise executable requirements.
  • Keep each task short and testable.
  • Keep acceptance criteria specific but not verbose.
  • Preserve traceability from original requirement to task ID.
  • Prefer concise summaries over long explanations.
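The traceability rule above can be checked mechanically. The following is a minimal Python sketch, assuming a hypothetical in-memory shape in which each requirement has an `id` and each task lists the `requirement_ids` it covers. These field names are illustrative and are not taken from the repository's schema.

```python
from typing import Any, Dict, List


def uncovered_requirements(
    requirements: List[Dict[str, Any]],
    tasks: List[Dict[str, Any]],
) -> List[str]:
    """Return IDs of requirements that no task claims to cover.

    The field names ("id", "requirement_ids") are hypothetical.
    """
    covered = set()
    for task in tasks:
        covered.update(task.get("requirement_ids", []))
    return [req["id"] for req in requirements if req["id"] not in covered]


requirements = [
    {"id": "R1", "text": "Fix the settings page"},
    {"id": "R2", "text": "Clean up the upload flow"},
    {"id": "R3", "text": "Verify the build"},
]
tasks = [
    {"id": "T1", "title": "Fix settings page layout", "requirement_ids": ["R1"]},
    {"id": "T2", "title": "Simplify upload flow", "requirement_ids": ["R2"]},
]

missing = uncovered_requirements(requirements, tasks)
print(missing)  # ['R3']
```

A non-empty result means the plan dropped a requirement and should be revised before execution starts.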

Core Capabilities

  • Requirement refinement before implementation.
  • A structured task plan with stable task IDs.
  • Status tracking for every item.
  • Acceptance criteria for each task.
  • Failure and blocker recovery attempts within task scope.
  • Final verification before claiming completion.
  • Handoff notes when work is interrupted, blocked, or too large for one pass.
  • Chinese-friendly and multilingual user-facing task output.

Final Verification

Final verification is an audit pass, not a second full execution pass.

During final verification, the agent reviews the task list, changed files, test/build results, acceptance criteria, and visible outcomes. It should not redo every task from the beginning. If verification finds a specific missing, incomplete, or broken item, the agent should perform a targeted fix and update the relevant task status.

Only after every required task is verified may the agent claim the long task is complete. If any task remains pending, in_progress, failed, blocked, or unverified, the agent must not claim full completion.
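The audit pass can be sketched as follows. This is an illustration under stated assumptions, not the Skill's implementation: the `Task` class, its `checks` field (zero-argument callables standing in for acceptance criteria), and the `audit` function are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    # Hypothetical task record; field names are illustrative only.
    id: str
    status: str  # one of the machine-readable status values
    checks: List[Callable[[], bool]] = field(default_factory=list)


def audit(tasks: List[Task]) -> List[str]:
    """Promote `done` tasks whose checks all pass to `verified`.

    Returns the IDs of `done` tasks that still need a targeted fix.
    No task is re-executed; only its acceptance checks are evaluated.
    A `done` task without any checks cannot be verified and is flagged.
    """
    needs_fix = []
    for task in tasks:
        if task.status != "done":
            continue
        if task.checks and all(check() for check in task.checks):
            task.status = "verified"
        else:
            needs_fix.append(task.id)
    return needs_fix


tasks = [
    Task("T1", "done", [lambda: True]),
    Task("T2", "done", [lambda: False]),
    Task("T3", "pending"),
]
print(audit(tasks))               # ['T2']
print([t.status for t in tasks])  # ['verified', 'done', 'pending']
```

Here T2 would receive a targeted fix and be audited again, while T3 still blocks any claim of full completion because it was never started.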

Who It Is For

Agent Long Task Skill is for:

  • Claude Code users.
  • Codex users.
  • OpenCode users.
  • AI coding agent users.
  • Developers doing multi-file fixes, refactors, productization, bug batches, UI redesigns, continuous delivery cleanup, or long implementation tasks.

The default language is English, but the workflow is useful for Chinese and multilingual teams because user-facing task titles, notes, handoff reports, and final reports can follow the user's language.

Chinese-friendly Task Output

This repository is Markdown-first and does not include a graphical UI. In this document, "task UI" refers to the generated Markdown output: the task plan, status summary, handoff report, and final report can all use localized user-facing labels.

If the user prompt is in Chinese, the agent should produce user-facing task titles, notes, summaries, handoff reports, and final reports in Chinese by default. If the prompt is in English, it should use English by default.

Machine-readable task status values stay stable in English:

pending
in_progress
done
failed
blocked
verified
skipped

English display labels:

{
  "pending": "⚪ Pending",
  "in_progress": "🟡 In Progress",
  "done": "✅ Done",
  "failed": "🔴 Failed",
  "blocked": "🟠 Blocked",
  "verified": "🔵 Verified",
  "skipped": "🟣 Skipped"
}

Chinese display labels:

{
  "pending": "⚪ 待处理",
  "in_progress": "🟡 进行中",
  "done": "✅ 已完成",
  "failed": "🔴 失败",
  "blocked": "🟠 阻塞",
  "verified": "🔵 已验收",
  "skipped": "🟣 已跳过"
}
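One way to keep status values stable while localizing only the display text is a lookup keyed by language. The sketch below is a hedged illustration: the language keys `"en"` and `"zh"`, the fallback to English, and the `display_label` function are assumptions, not part of the Skill.

```python
# Display labels per language; the machine-readable keys never change.
STATUS_LABELS = {
    "en": {
        "pending": "⚪ Pending",
        "in_progress": "🟡 In Progress",
        "done": "✅ Done",
        "failed": "🔴 Failed",
        "blocked": "🟠 Blocked",
        "verified": "🔵 Verified",
        "skipped": "🟣 Skipped",
    },
    "zh": {
        "pending": "⚪ 待处理",
        "in_progress": "🟡 进行中",
        "done": "✅ 已完成",
        "failed": "🔴 失败",
        "blocked": "🟠 阻塞",
        "verified": "🔵 已验收",
        "skipped": "🟣 已跳过",
    },
}


def display_label(status: str, lang: str = "en") -> str:
    """Map a stable status value to a user-facing label.

    Unknown languages fall back to English (an assumption of this sketch).
    An unknown status raises KeyError, so typos surface early.
    """
    labels = STATUS_LABELS.get(lang, STATUS_LABELS["en"])
    return labels[status]


print(display_label("verified", "zh"))  # 🔵 已验收
print(display_label("pending", "fr"))   # ⚪ Pending
```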

Installation

Install as a user-level Claude skill:

Copy this folder to:
~/.claude/skills/agent-long-task-skill/

Install at the project level by copying this folder to the following path, relative to the project root:

.claude/skills/agent-long-task-skill/

No package manager is required. There is no backend, web app, telemetry, network behavior, install script, or hidden automation.

Usage

Ask your coding agent to use the Skill before a long or messy implementation task:

Use this skill for this long task: fix the settings page, clean up the upload flow, verify the build, and write a final report.

You can also be more explicit:

This is a multi-requirement task. Refine it into a task plan first, then execute and verify item by item.

For Chinese tasks:

这是一个多需求长任务。先把需求压缩整理成任务计划，再按任务 ID 逐项执行、验收并输出最终报告。

Generated Output Files

When used during an agent run, the Skill may generate:

.agent/task-plan.json
.agent/status.json
.agent/handoff.md
.agent/final-report.md

These files are execution artifacts. They are intentionally plain text or JSON so humans can inspect them.

Status Definitions

Use these machine-readable statuses consistently:

pending
in_progress
done
failed
blocked
verified
skipped

  • pending: not started.
  • in_progress: currently being worked on.
  • done: implementation work appears completed, but final verification may still be pending.
  • verified: checked against acceptance criteria and confirmed complete.
  • failed: attempted and failed after reasonable recovery attempts.
  • blocked: cannot proceed due to missing information, dependency, permission, or technical blocker.
  • skipped: intentionally skipped with a stated reason.

Important distinction: done is not the same as verified. A task should only become verified after the final verification pass or after explicit acceptance checking. The agent must not claim full completion while important tasks are only done but not verified.

Failure and Blocker Handling

When a task fails, the agent should:

  • Diagnose the likely cause.
  • Try reasonable safe recovery methods within scope.
  • Avoid infinite retries.
  • Avoid unrelated large rewrites unless necessary.
  • Record what was tried.
  • If unresolved, mark the task failed and provide a recovery path.

When a task is blocked, the agent should:

  • Mark blocked immediately.
  • Explain the blocker.
  • State what information, dependency, permission, API key, design decision, or user input is needed.
  • Continue with other independent tasks when possible.
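The failure path above (bounded attempts, a record of what was tried, and a recovery path when nothing works) can be sketched in Python. The function `run_with_recovery`, the dictionary fields `tried` and `recovery_path`, and the example build steps are hypothetical names chosen for illustration. The sketch does not cover the blocked path.

```python
from typing import Any, Callable, Dict, List


def run_with_recovery(
    task: Dict[str, Any],
    attempts: List[Callable[[], None]],
) -> Dict[str, Any]:
    """Try each strategy once, in order, and record every outcome.

    `attempts` is a finite list, so retries are bounded by construction.
    Success yields `done` (not `verified`); verification happens later.
    """
    tried = []
    for attempt in attempts:
        try:
            attempt()
        except Exception as exc:  # record the failure, then try the next one
            tried.append(f"{attempt.__name__}: {exc}")
        else:
            tried.append(f"{attempt.__name__}: ok")
            return {**task, "status": "done", "tried": tried}
    return {
        **task,
        "status": "failed",
        "tried": tried,
        "recovery_path": "Needs manual investigation; see recorded attempts.",
    }


def build() -> None:
    raise RuntimeError("missing dependency")


def build_after_clean_install() -> None:
    return None  # stands in for a recovery step that succeeds


result = run_with_recovery(
    {"id": "T4", "title": "Run build verification"},
    [build, build_after_clean_install],
)
print(result["status"])  # done
print(result["tried"])   # ['build: missing dependency', 'build_after_clean_install: ok']
```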

Live Task Progress

During execution, the agent keeps a full ordered task board visible in progress updates. This makes long tasks easier to follow because the user can see all tasks, current status, remaining work, failures, blockers, and verification progress at a glance.

This is a Markdown progress board, not a graphical dashboard.

The live task board is different from the final report. It is shown during execution, should stay near the bottom of progress updates when practical, and should be generated from the current task plan state rather than a separate progress list.

Example:

## Live Task Progress

| # | Task | Status |
|---|---|---|
| 1 | Refine requirements and create task plan | 🔵 Verified |
| 2 | Fix homepage course entry points | 🟡 In Progress |
| 3 | Fix mobile back button behavior | ⚪ Pending |
| 4 | Run build verification | ⚪ Pending |

Remaining: 2 | Done: 1 | Verified: 1 | Failed: 0 | Blocked: 0
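Because the board should be generated from the current task plan state, the table portion of an example like the one above could be rendered from a task list. This is a minimal sketch: the task dictionaries with `title` and `status` fields and the `render_board` function are assumptions, and the summary line is left out.

```python
from typing import Dict, List

# English display labels keyed by the stable status values.
LABELS = {
    "pending": "⚪ Pending",
    "in_progress": "🟡 In Progress",
    "done": "✅ Done",
    "failed": "🔴 Failed",
    "blocked": "🟠 Blocked",
    "verified": "🔵 Verified",
    "skipped": "🟣 Skipped",
}


def render_board(tasks: List[Dict[str, str]]) -> str:
    """Render the ordered task list as a Markdown table."""
    lines = [
        "## Live Task Progress",
        "",
        "| # | Task | Status |",
        "|---|---|---|",
    ]
    for number, task in enumerate(tasks, start=1):
        lines.append(f"| {number} | {task['title']} | {LABELS[task['status']]} |")
    return "\n".join(lines)


board = render_board([
    {"title": "Refine requirements and create task plan", "status": "verified"},
    {"title": "Fix homepage course entry points", "status": "in_progress"},
    {"title": "Fix mobile back button behavior", "status": "pending"},
    {"title": "Run build verification", "status": "pending"},
])
print(board)
```

Rendering from a single source of truth avoids the drift that occurs when a separate progress list is maintained by hand.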

Screenshots

Live Task Progress

The Skill keeps a full ordered task board visible during long-task execution, so users can quickly see what is done, what is in progress, what is pending, and what is blocked.

[Screenshot: Live Task Progress]

Final Report

After final verification, the agent returns a compact completion report with task status, key changes, verification performed, and recommended local checks.

[Screenshot: Final Report]

Final Report Behavior

The final response must include:

  • Completed and verified tasks.
  • Failed tasks.
  • Blocked tasks.
  • Skipped tasks.
  • Files changed.
  • Verification performed.
  • Remaining risks.
  • Next recommended actions.

If all required tasks are verified, the agent may say:

All required tasks have been completed and verified.

If not all required tasks are verified, the agent must say:

The task is not fully complete yet. The following items remain failed, blocked, pending, or unverified.
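The choice between these two closing sentences can be expressed as a small gate. In this sketch, every task in the list is assumed to be required, so a skipped task also counts as not verified. The `closing_statement` function and the task fields `id`, `title`, and `status` are hypothetical.

```python
from typing import Dict, List

COMPLETE = "All required tasks have been completed and verified."
INCOMPLETE = (
    "The task is not fully complete yet. The following items remain "
    "failed, blocked, pending, or unverified."
)


def closing_statement(tasks: List[Dict[str, str]]) -> str:
    """Pick the closing sentence from the verification state.

    Any task whose status is not `verified` is listed explicitly,
    so unfinished work cannot be hidden behind a summary.
    """
    open_items = [t for t in tasks if t["status"] != "verified"]
    if not open_items:
        return COMPLETE
    listing = "\n".join(
        f"- {t['id']} {t['title']}: {t['status']}" for t in open_items
    )
    return f"{INCOMPLETE}\n{listing}"


print(closing_statement([
    {"id": "T1", "title": "Fix settings page", "status": "verified"},
    {"id": "T2", "title": "Run build verification", "status": "done"},
]))
```

In this example the second task is only `done`, so the incomplete statement is returned with T2 listed beneath it.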

Final Report Style

After all tasks are attempted and final verification is complete, the agent should return a compact report that is easy to scan. A bare "done" is not an acceptable final report.

The final report should include:

  • Task status table.
  • Key changes.
  • Verification performed.
  • Failed or blocked items.
  • Recommended local checks.

The status table is required. It should show at least the task number, task title, and user-facing status label:

## Final Report

| # | Task | Status |
|---|---|---|
| 1 | Fix homepage and course entry points | 🔵 Verified |
| 2 | Run build verification | 🔵 Verified |

Final verification must be reflected in the final report. The report should show verified completion status, not just implementation status. Failed, blocked, skipped, pending, or unverified tasks must not be hidden.

Use recommended local checks when the user may need to verify UI, device behavior, build output, runtime behavior, deployment, or account-specific behavior.

Why Markdown-first?

Markdown-first keeps the protocol:

  • Transparent.
  • Auditable.
  • Free of hidden runtime behavior.
  • Free of network access.
  • Easy to adapt.
  • Compatible with multiple agents.

The Skill is designed to be understood by humans first and automated later only if that becomes useful.

Repository Layout

agent-long-task-skill/
├─ README.md
├─ README.zh-CN.md
├─ SKILL.md
├─ LICENSE
├─ docs/
│  └─ images/
│     ├─ live-task-board.png
│     └─ final-report.png
├─ templates/
│  ├─ task-plan.md
│  ├─ acceptance-checklist.md
│  ├─ handoff.md
│  ├─ final-report.md
│  ├─ live-task-board.md
│  └─ zh-CN/
│     ├─ task-plan.md
│     ├─ acceptance-checklist.md
│     ├─ handoff.md
│     ├─ final-report.md
│     └─ live-task-board.md
├─ schemas/
│  └─ agent-task-plan.schema.json
└─ examples/
   ├─ messy-user-request.md
   ├─ messy-user-request.zh-CN.md
   ├─ refined-task-plan.json
   ├─ refined-task-plan.zh-CN.json
   ├─ handoff-example.md
   ├─ handoff-example.zh-CN.md
   ├─ final-report-example.md
   ├─ final-report-example.zh-CN.md
   ├─ live-task-board-example.md
   └─ live-task-board-example.zh-CN.md

Roadmap

v0.1

  • SKILL.md
  • Templates
  • JSON schema
  • Examples

v0.2

  • Stronger schemas
  • More examples
  • AGENTS.md, CLAUDE.md, and Codex instruction templates

v0.3

  • Optional CLI or dashboard integration
  • Status viewer
  • code-hub compatibility

License

MIT License.
