Harness engineering is the discipline of designing environments, repo knowledge, and feedback loops so an AI agent can execute work reliably. The work shifts from "write the code by hand" to "make the intended behavior legible, verifiable, and recoverable."
This repository is organized around three pillars: Legibility (navigation), Autonomy (verification and safe operational CLI access), and Human-AI Interface (the bidirectional channel between humans and the system, mediated by the agent — inbound for request refinement, outbound for operational signal interpretation). The six outcomes in docs/outcomes.md are the concrete proof model.
If the agent cannot discover the structure, purpose, and boundaries of the repository from in-repo artifacts, it will guess. Durable repo knowledge matters more than chat explanations.
- a short root `AGENTS.md` that acts as a map, not an encyclopedia
- subdirectory `AGENTS.md` files for complex or high-risk modules
- references and examples that live in-repo rather than in people's heads
- explicit UX intent for modules that affect user-visible behavior
Structure context in layers:
- Root `AGENTS.md`: project overview, commands, directory map, conventions (sketched below)
- Subdirectory `AGENTS.md`: local purpose, UX intent, key files, gotchas
- Reference docs: deeper examples, history, schemas, and guides
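As a sketch, a root `AGENTS.md` acting as a map might look like the excerpt below; the section names, commands, and paths are illustrative, not a required schema:

```markdown
# AGENTS.md

## What this repo is
One-paragraph purpose statement and the system's boundaries.

## Commands
- `make test`: run the full suite
- `make smoke`: run the trusted, non-destructive smoke path

## Directory map
- `src/service/`: business rules (see `src/service/AGENTS.md`)
- `src/repository/`: database access

## Conventions
Dependency direction, naming, and error handling in brief, with links to
reference docs for depth.
```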
Any change to structure, key commands, conventions, or module boundaries should update the relevant AGENTS.md files. Stale guidance is worse than missing guidance because it teaches the agent the wrong thing confidently.
An agent that can only write code is not enough. It needs trusted ways to tell whether the repository is working now, whether a change passed the right tests, and whether a runtime path is meaningfully usable. Autonomy here includes tests and smoke paths, and—where applicable—using operational CLIs for evidence; it does not require full parity with production.
- baseline audits that describe the current state before remediation
- fast targeted test execution for the code that changed
- isolated or safe environments for verification
- one trusted, non-destructive smoke path (see the sketch after this list)
- explicit approval boundaries for higher-risk actions
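To make the smoke-path idea concrete, here is a minimal sketch. The health endpoint, port, and expected payload are hypothetical; the load-bearing ideas are that it is read-only and returns one unambiguous verdict.

```python
#!/usr/bin/env python3
"""Minimal non-destructive smoke path: one read-only request, one clear verdict."""
import sys
import urllib.request

HEALTH_URL = "http://localhost:8080/health"  # hypothetical local endpoint

def main() -> int:
    try:
        # Read-only GET; a smoke path must never mutate state.
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
            body = resp.read().decode()
    except OSError as exc:
        print(f"SMOKE FAIL: {HEALTH_URL} unreachable: {exc}")
        return 1
    if "ok" not in body:
        print(f"SMOKE FAIL: unexpected payload: {body!r}")
        return 1
    print("SMOKE PASS: service is up and responding sensibly")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```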
The user should be able to point to concrete evidence for each north-star outcome. Names, proof criteria, and primary skills are defined only in docs/outcomes.md; in recommended order they are: Validate Current State, Navigate, Self-Test, Smoke Path, Bug Reproduction, and SRE Investigation.
Document three permission tiers in the root AGENTS.md:
- Autonomous: safe read, test, and local verification steps
- Supervised: actions that require review before they take effect
- Restricted: actions that require explicit approval every time
Never leave these boundaries implicit.
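A sketch of how those tiers might read in the root `AGENTS.md`; the actions listed under each tier are illustrative and will differ per repository:

```markdown
## Permission tiers

### Autonomous
Read any file, run tests, run the smoke path, build locally.

### Supervised
Dependency upgrades, schema migrations, CI configuration changes: propose
the change, then wait for review before it takes effect.

### Restricted
Production CLI writes, secrets access, destructive data operations: ask
for explicit approval every time.
```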
Legibility and Autonomy let the agent read and verify. The Human-AI Interface pillar is about the agent being a channel between humans and the system — in both directions. A codebase can be perfectly legible and fully self-verifying and still fail its people if vague requests arrive unchallenged or if production pain never surfaces back to the decision-makers.
The Interface pillar compounds after Legibility and Autonomy are in place. Without AGENTS.md context or trustworthy tests/smoke paths, the agent cannot critique a request or interpret a signal with grounded confidence.
Inbound channel — human → system (interface-ticket-writer)
- Refines vague requests into tickets an agent can one-shot (example sketched after this list)
- Surfaces missing edge cases, assumptions, success criteria before work starts
- A workshop pattern, not an autocomplete
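As an illustration of the output, a refined ticket might look like the sketch below; the feature and every detail in it are invented for the example:

```markdown
# Ticket: Rate-limit the signup endpoint

## Success criteria
- Requests beyond N per minute per IP return HTTP 429
- Existing signup tests still pass

## Edge cases surfaced during refinement
- Shared corporate NAT addresses
- Client retry behavior after a 429

## Out of scope
- Org-wide rate-limiting infrastructure
```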
Outbound channel — system → human (interface-sre-agent)
- Reproduce-before-fix discipline for bugs (fixes tie back to evidence the repo can rerun)
- Reads logs, metrics, traces, CI, and cloud CLIs — surfaces ranked hypotheses with evidence
- Relies on `autonomy-sre-auditor` having first proven that the required CLIs work
A system where humans can critique intake but production signals never reach the user through the agent is half-wired. A system where the agent diagnoses alerts but no one is refining the resulting work into executable tickets is also half-wired. Code-mint keeps both channels visible because agent readiness is not only about code execution; it is also about interpretation at the boundaries where humans and systems meet.
Outcome names, proof criteria, and primary skill mappings are defined in docs/outcomes.md. Track progress and evidence in docs/onboarding-checklist.md. .agents/code-mint-status.json provides a machine-readable index of outcome statuses for cross-repo scanning.
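A sketch of how a cross-repo scanner might consume that index. The schema in the comment is an assumption made for illustration; the authoritative shape is whatever the file actually contains:

```python
import json
from pathlib import Path

# Assumed shape: {"outcomes": {"Navigate": {"status": "proven"}, ...}}
def outcome_statuses(repo_root: Path) -> dict[str, str]:
    index_path = repo_root / ".agents" / "code-mint-status.json"
    index = json.loads(index_path.read_text())
    return {name: entry["status"] for name, entry in index["outcomes"].items()}

# Scan every checkout under a workspace directory (the path is illustrative).
for repo in sorted(Path("~/repos").expanduser().iterdir()):
    if (repo / ".agents" / "code-mint-status.json").exists():
        print(repo.name, outcome_statuses(repo))
```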
AWS AI-DLC is a useful external reference, but code-mint does not implement or vendor its full lifecycle. Two ideas are adopted directly because they improve audit quality without changing code-mint's outcome model:
| Practice | How code-mint uses it |
|---|---|
| Adaptive depth | Auditors can run at quick, standard, or deep depth depending on repo age, risk, and recency of prior evidence. |
| Calibration | Audit reports name confidence, what was not checked, and what would raise confidence. |
Workspace heritage (Greenfield, Brownfield, Legacy) is also used during onboarding as a lightweight calibration aid. AI-DLC construction workflows, opt-in extension systems, and operations-phase artifacts are intentionally not part of this repository.
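As one possible reading of adaptive depth, the heuristic might look like the sketch below; the thresholds are invented for illustration and are not defined by code-mint or AI-DLC:

```python
def pick_audit_depth(repo_age_days: int, high_risk: bool,
                     days_since_evidence: int | None) -> str:
    """Illustrative heuristic: go deeper when risk is high or evidence is stale."""
    if high_risk or days_since_evidence is None:
        return "deep"      # risky, or no prior evidence at all
    if repo_age_days < 30 and days_since_evidence < 14:
        return "quick"     # young repo with fresh evidence
    return "standard"

assert pick_audit_depth(10, False, 7) == "quick"
assert pick_audit_depth(400, True, 2) == "deep"
```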
These standards keep the repo legible as agent throughput increases.
Dependencies should flow one way:
Types → Config → Repository → Service → Runtime → UI
Agents are fast enough to create structural drift quickly. Clear dependency boundaries reduce that risk.
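A minimal sketch of the rule itself, assuming the layer names map to modules: an import is legal only when it points toward an earlier layer (or stays within one).

```python
# Earlier entries are more fundamental; later layers may depend on earlier ones.
LAYERS = ["types", "config", "repository", "service", "runtime", "ui"]

def import_allowed(importer_layer: str, imported_layer: str) -> bool:
    """True when the dependency points down the flow or stays in-layer."""
    return LAYERS.index(imported_layer) <= LAYERS.index(importer_layer)

assert import_allowed("service", "repository")      # Service → Repository: fine
assert not import_allowed("repository", "service")  # Repository → Service: drift
```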
Before creating a new file, the agent should:
- look for an existing home for the logic
- check local `AGENTS.md` guidance
- place any new file in the correct module structure
Keep types, config, business rules, and database access in one clear layer each. Duplicated logic creates contradictions that both humans and agents will keep reinforcing.
- user-facing errors should be clear and actionable (see the sketch after this list)
- internal failures should include enough context to debug
- error handling patterns should be consistent and documented
- swallowed errors and generic catch-all behavior should be treated as drift
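A sketch of the first two bullets in code; `fetch_profile`, the exception type, and the messages are hypothetical stand-ins:

```python
import logging

logger = logging.getLogger(__name__)

class UserFacingError(Exception):
    """Carries a message that is safe and actionable to show the user."""

def fetch_profile(user_id: str) -> dict:
    """Hypothetical repository-layer call, stubbed so the sketch is runnable."""
    raise ConnectionError("database unreachable")

def load_profile(user_id: str) -> dict:
    try:
        return fetch_profile(user_id)
    except ConnectionError as exc:
        # The internal log keeps debugging context; the user gets a next step.
        logger.error("profile fetch failed user_id=%s: %s", user_id, exc)
        raise UserFacingError(
            "Your profile is temporarily unavailable. Please retry in a minute."
        ) from exc
```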
Documentation alone is not enough. The best harnesses gradually promote important guidance into tooling:
- `AGENTS.md` for discoverability
- rules for persistent context
- linters for naming and architecture constraints
- structural tests for dependency boundaries and coverage expectations (see the sketch below)
- recurring cleanup work that catches drift before it spreads
When prose keeps getting ignored, encode the constraint directly into the system.
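As one concrete encoding, a structural test can turn the dependency flow above into a CI failure. The `src/<layer>` layout is an assumption; adapt the paths to the real tree, and note the sketch only checks `from ... import` statements:

```python
"""Structural test sketch: fail when an import points against the layer flow."""
import ast
from pathlib import Path

LAYERS = ["types", "config", "repository", "service", "runtime", "ui"]

def test_dependency_direction() -> None:
    violations = []
    for layer in LAYERS:
        for source in Path("src", layer).rglob("*.py"):  # empty if the layer dir is absent
            for node in ast.walk(ast.parse(source.read_text())):
                if isinstance(node, ast.ImportFrom) and node.module:
                    target = node.module.split(".")[0]
                    if target in LAYERS and LAYERS.index(target) > LAYERS.index(layer):
                        violations.append(f"{source}: {layer} imports from {target}")
    assert not violations, "imports against the flow:\n" + "\n".join(violations)
```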