20 changes: 20 additions & 0 deletions .codex/evals/README.md
@@ -0,0 +1,20 @@
# Evals

Use this directory for repo-local eval definitions that measure whether the AI
workflow and the product behavior are improving or regressing.

Recommended layout:

```text
.codex/evals/
  templates/
  <feature-name>.md
  <feature-name>.log
```

For non-trivial changes, define:

- capability evals for the new behavior
- regression evals for the old behavior that must keep working
- clear pass or fail evidence
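A minimal TypeScript sketch of how an eval definition could be modeled so that "pass" always carries proof. The `EvalCheck` shape and the `evaluate` helper are hypothetical, not an existing harness API. The sketch also assumes that a definition missing either capability or regression evals is incomplete.

```typescript
// Hypothetical eval record; not an existing harness API.
type EvalKind = "capability" | "regression";

interface EvalCheck {
  kind: EvalKind;
  description: string;
  passed: boolean;
  // Proof of the result, e.g. a Playwright artifact path or a log query.
  evidence?: string;
}

interface EvalSummary {
  ok: boolean;
  failures: string[];
}

// A check counts as passing only when it passed AND names its evidence.
function evaluate(checks: EvalCheck[]): EvalSummary {
  const failures: string[] = [];
  for (const check of checks) {
    if (!check.passed) {
      failures.push(`[${check.kind}] failed: ${check.description}`);
    } else if (!check.evidence) {
      failures.push(`[${check.kind}] no evidence: ${check.description}`);
    }
  }
  // A non-trivial change needs both kinds of eval.
  const kinds = new Set(checks.map((check) => check.kind));
  if (!kinds.has("capability")) failures.push("missing capability evals");
  if (!kinds.has("regression")) failures.push("missing regression evals");
  return { ok: failures.length === 0, failures };
}

// Example: the regression check passed but records no evidence, so ok is false.
const summary = evaluate([
  {
    kind: "capability",
    description: "intended behavior works end to end",
    passed: true,
    evidence: ".runtime/demo-task/playwright",
  },
  { kind: "regression", description: "adjacent behavior still works", passed: true },
]);
console.log(summary.ok, summary.failures);
```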

22 changes: 22 additions & 0 deletions .codex/evals/templates/feature-delivery.md
@@ -0,0 +1,22 @@
# EVAL: <feature-name>

## Capability evals

- [ ] The intended user-visible behavior works end to end.
- [ ] The relevant Playwright journey passes.
- [ ] The expected log evidence is present.

## Regression evals

- [ ] Existing adjacent behavior still works.
- [ ] No new console or runtime errors appear.
- [ ] Build, lint, typecheck, and tests still pass.

## Evidence

- Plan:
- Playwright artifact path:
- CDP artifact path:
- Log query:
- Notes:
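Because the template is plain Markdown, a script can report which checks are still open and which evidence fields are blank. A sketch in TypeScript; `checklistStatus` and `emptyEvidenceFields` are hypothetical helpers, and the sketch assumes GitHub-style task list items (`- [ ]` / `- [x]`) and `- Field:` evidence lines exactly as the template writes them.

```typescript
// Hypothetical helpers that read an eval file built from the template above.
interface ChecklistStatus {
  checked: string[];
  unchecked: string[];
}

// Collects GitHub-style task list items: "- [ ] text" or "- [x] text".
function checklistStatus(markdown: string): ChecklistStatus {
  const status: ChecklistStatus = { checked: [], unchecked: [] };
  const item = /^\s*[-*]\s+\[([ xX])\]\s+(.*)$/;
  for (const line of markdown.split("\n")) {
    const match = item.exec(line);
    if (!match) continue;
    const text = match[2].trim();
    if (match[1] === " ") {
      status.unchecked.push(text);
    } else {
      status.checked.push(text);
    }
  }
  return status;
}

// Lists evidence fields such as "- Log query:" that were left empty.
function emptyEvidenceFields(markdown: string): string[] {
  const field = /^\s*-\s+([^:\[\]]+):\s*$/;
  const empty: string[] = [];
  for (const line of markdown.split("\n")) {
    const match = field.exec(line);
    if (match) empty.push(match[1].trim());
  }
  return empty;
}
```

Whether an open checkbox or an empty field should fail CI is a policy choice this sketch leaves open.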

8 changes: 8 additions & 0 deletions .gitignore
@@ -0,0 +1,8 @@
.runtime/
.worktrees/
.artifacts/
.idea/
playwright-report/
test-results/
dist/
coverage/
82 changes: 82 additions & 0 deletions AGENTS.md
@@ -0,0 +1,82 @@
# git-ranker-workflow AGENTS

This repository is the control plane for the `git-ranker` backend and the
`git-ranker-client` frontend. Keep this file short. The system of record lives
in [ARCHITECTURE.md](ARCHITECTURE.md) and [docs/](docs/index.md).

## What this repo owns

- Repository-local knowledge store and operating rules for coding agents
- Cross-repo feature delivery workflow, QA loop, and observability workflow
- ExecPlan conventions for long-running tasks
- Guardrails for frontend/backend coordination across the two submodule repos

## Repo map

- `git-ranker/`: backend repo (submodule)
- `git-ranker-client/`: frontend repo (submodule)
- `ARCHITECTURE.md`: top-level control-plane architecture
- `PLANS.md`: rules for long-running ExecPlans
- `docs/`: knowledge store; treat this as the source of truth
- `scripts/`: lightweight verification and scaffolding helpers
- `harness/`: local observability and QA harness configuration
- `.codex/evals/`: eval definitions and templates

## How to start a task

1. Read [ARCHITECTURE.md](ARCHITECTURE.md).
2. Read [docs/index.md](docs/index.md) and the specific docs for the change
   surface.
3. If the request spans multiple files, multiple repos, new behavior, or a
   likely multi-hour effort, create an ExecPlan in
   `docs/exec-plans/active/<yyyy-mm-dd>-<slug>.md` and follow [PLANS.md](PLANS.md).
4. Restate the request in terms of:
   - user-visible outcome
   - impacted repos
   - acceptance checks
   - required Playwright/CDP/Loki evidence
5. Work inside a task-specific isolated runtime footprint under `.runtime/` and
`.worktrees/`.
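Step 4 can be captured as a small structured record, which makes a vague request easy to spot. A TypeScript sketch; `RequestIntake` and `intakeGaps` are hypothetical names, and the field names are illustrative rather than a format this repo defines.

```typescript
// Hypothetical intake record mirroring step 4; field names are illustrative.
type Repo = "git-ranker" | "git-ranker-client";
type Evidence = "playwright" | "cdp" | "loki";

interface RequestIntake {
  userVisibleOutcome: string;
  impactedRepos: Repo[];
  acceptanceChecks: string[];
  requiredEvidence: Evidence[];
}

// Returns the gaps to close before implementation starts; empty means ready.
function intakeGaps(intake: RequestIntake): string[] {
  const gaps: string[] = [];
  if (intake.userVisibleOutcome.trim() === "") gaps.push("user-visible outcome is empty");
  if (intake.impactedRepos.length === 0) gaps.push("no impacted repos listed");
  if (intake.acceptanceChecks.length === 0) gaps.push("no acceptance checks defined");
  if (intake.requiredEvidence.length === 0) gaps.push("no Playwright/CDP/Loki evidence named");
  return gaps;
}
```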

## System of record

- Product intent: [docs/product-specs/index.md](docs/product-specs/index.md)
- Architectural rules: [docs/design-docs/index.md](docs/design-docs/index.md)
- UX and UI behavior: [docs/DESIGN.md](docs/DESIGN.md),
[docs/FRONTEND.md](docs/FRONTEND.md)
- Backend and data behavior: [docs/BACKEND.md](docs/BACKEND.md),
[docs/SECURITY.md](docs/SECURITY.md), [docs/RELIABILITY.md](docs/RELIABILITY.md)
- Quality and cleanup rules: [docs/QUALITY_SCORE.md](docs/QUALITY_SCORE.md)
- Generated facts: [docs/generated/README.md](docs/generated/README.md)
- Workflow loop: [docs/workflows/feature-delivery-loop.md](docs/workflows/feature-delivery-loop.md),
[docs/workflows/qa-feedback-loop.md](docs/workflows/qa-feedback-loop.md)

## Non-negotiables

- Do not turn `AGENTS.md` into a large manual. Promote durable rules into
`docs/` or scripts.
- Do not implement from vague intent. Convert feature requests into explicit
acceptance criteria first.
- Do not ship a user-visible change without QA evidence from:
  - automated tests
  - Playwright
  - browser inspection via CDP or equivalent
  - worktree-local logs in Loki or the configured log backend
- Do not treat Slack, chat history, or memory as source of truth. If it matters
later, check it into the repo.
- Do not handwave cross-repo changes. Contract changes must be reflected in
backend, frontend, docs, and validation steps.

## Delivery loop

1. Intake and clarify the request.
2. Write or update an ExecPlan if the task is non-trivial.
3. Implement in backend/frontend worktrees.
4. Run build, typecheck, lint, and tests.
5. Boot the isolated stack for the task.
6. Run Playwright journeys.
7. Inspect UI, network, console, and DOM with CDP tooling.
8. Query logs, metrics, and traces for the same task runtime.
9. Feed findings back into code, docs, and the ExecPlan.
10. Record outcomes and remaining debt before handoff or merge.

153 changes: 153 additions & 0 deletions ARCHITECTURE.md
@@ -0,0 +1,153 @@
# git-ranker Workflow Architecture

## Purpose

This repository is the orchestration layer for an agent-first development
workflow across two application repositories:

- `git-ranker`: backend system of record for APIs, jobs, persistence, and domain
rules
- `git-ranker-client`: frontend system of record for routes, components, user
flows, and client-side state

The control plane in this repo exists to make the product legible to coding
agents, not to store application logic.

## Current repo facts

The submodules are initialized in this workspace and currently expose these
high-level facts:

- backend: Spring Boot 3.4, Java 21, JPA, Batch, Security, Actuator, Prometheus,
structured JSON logging, Testcontainers, ArchUnit
- frontend: Next.js App Router, React 19, TypeScript, ESLint, React Query,
Zustand, Tailwind, Radix UI

Those facts should shape the workflow and harness choices instead of generic
defaults.

## Core principle

Repository-local knowledge is the system of record. A coding agent should be
able to understand the product, architecture, quality bar, and execution flow
from versioned artifacts in this repository plus the checked-out submodules.

## Control-plane flow

```text
feature request
  -> request intake and acceptance contract
  -> ExecPlan for non-trivial work
  -> backend contract / behavior changes
  -> frontend integration / UI changes
  -> isolated task runtime
  -> Playwright + CDP validation
  -> logs / metrics / traces review
  -> fix loop
  -> PR / merge / debt update
```

## Worktree model

Every non-trivial task should use an isolated runtime footprint keyed by a task
slug, for example `rank-comparison-filtering`.

Expected layout:

```text
.worktrees/
  backend/<task-slug>/
  frontend/<task-slug>/
.runtime/
  <task-slug>/
    logs/
    traces/
    screenshots/
    videos/
    playwright/
    observability/
```

The goal matches OpenAI's harness model:

- one isolated app instance per task
- one isolated observability context per task
- artifacts are disposable once the task is complete
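Deriving every path from the slug in one place keeps tasks from colliding. A TypeScript sketch; `taskPaths` is a hypothetical helper, and the lowercase kebab-case rule for slugs is an assumption based on the `rank-comparison-filtering` example. Rejecting anything else also stops a slug such as `../x` from escaping `.runtime/`.

```typescript
// Hypothetical helper: derive every per-task path from one slug.
// Assumes lowercase kebab-case slugs such as "rank-comparison-filtering".
const SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

const ARTIFACT_DIRS = [
  "logs",
  "traces",
  "screenshots",
  "videos",
  "playwright",
  "observability",
] as const;
type ArtifactDir = (typeof ARTIFACT_DIRS)[number];

interface TaskPaths {
  backendWorktree: string;
  frontendWorktree: string;
  runtimeRoot: string;
  artifacts: Record<ArtifactDir, string>;
}

function taskPaths(slug: string): TaskPaths {
  if (!SLUG_PATTERN.test(slug)) {
    throw new Error(`invalid task slug: ${JSON.stringify(slug)}`);
  }
  const runtimeRoot = `.runtime/${slug}`;
  const artifacts = {} as Record<ArtifactDir, string>;
  for (const dir of ARTIFACT_DIRS) {
    artifacts[dir] = `${runtimeRoot}/${dir}`;
  }
  return {
    backendWorktree: `.worktrees/backend/${slug}`,
    frontendWorktree: `.worktrees/frontend/${slug}`,
    runtimeRoot,
    artifacts,
  };
}
```

Since everything for a task lives under `runtimeRoot`, disposing of its artifacts means deleting a single directory.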

## Knowledge-store layout

```text
AGENTS.md
ARCHITECTURE.md
PLANS.md
docs/
  design-docs/
  exec-plans/
  generated/
  product-specs/
  references/
  workflows/
```

`AGENTS.md` is only the table of contents. The durable knowledge lives in
`docs/`.

## Cross-repo contract

The repositories are versioned independently, but the workflow treats them as a
single product system. A change request must identify which of the following are
affected:

- backend domain rules
- backend API or event contracts
- frontend route or component behavior
- shared product language and acceptance criteria
- reliability, security, or QA evidence

Any contract change must update both sides of the boundary plus the knowledge
store if the change affects future tasks.
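One way to make the affected-area list actionable is to map each area to the places that must change. A TypeScript sketch; `requiredUpdates` and the area identifiers are hypothetical, and only the contract row is grounded in the rule that contract changes touch backend, frontend, docs, and validation. The other rows are illustrative assumptions.

```typescript
// Hypothetical area identifiers for the list above.
type Area =
  | "backend-domain"
  | "backend-contract"
  | "frontend-behavior"
  | "product-language"
  | "reliability-security-qa";

type Update = "git-ranker" | "git-ranker-client" | "docs" | "validation";

// Returns the sorted set of places that must be updated for the affected areas.
function requiredUpdates(affected: Area[]): Update[] {
  const updates = new Set<Update>();
  for (const area of affected) {
    switch (area) {
      case "backend-domain":
        updates.add("git-ranker");
        break;
      case "backend-contract":
        // A contract change touches both sides of the boundary plus docs and validation.
        updates.add("git-ranker");
        updates.add("git-ranker-client");
        updates.add("docs");
        updates.add("validation");
        break;
      case "frontend-behavior":
        updates.add("git-ranker-client");
        break;
      case "product-language":
        updates.add("docs");
        break;
      case "reliability-security-qa":
        updates.add("validation");
        break;
    }
  }
  return Array.from(updates).sort();
}
```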

## Layering model

The two repos should converge on a single, one-directional dependency model:

```text
Types -> Schemas/Contracts -> Repository/Gateway -> Service/Use Case
  -> Runtime/Delivery -> UI or HTTP surface

Cross-cutting concerns enter only through Providers:
  auth, feature flags, telemetry, configuration, external connectors
```

This is intentionally rigid. Agents move faster when the allowed edges are
obvious and mechanically enforceable.
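The core of such a mechanical check is small. A TypeScript sketch, assuming the arrows mean that each layer may depend on itself and the layers to its left, never on layers to its right. Providers are left out because this document does not say how they attach. The layer names and helpers are hypothetical. In the backend, ArchUnit (already a dependency) could express a comparable rule, and the frontend could use a lint rule.

```typescript
// Layers from the diagram above, ordered from most foundational to the outer surface.
const LAYERS = [
  "types",
  "schemas-contracts",
  "repository-gateway",
  "service-use-case",
  "runtime-delivery",
  "surface", // UI or HTTP surface
] as const;
type Layer = (typeof LAYERS)[number];

// Code may depend on its own layer or on layers earlier in the chain, never later.
function isAllowedDependency(from: Layer, to: Layer): boolean {
  return LAYERS.indexOf(to) <= LAYERS.indexOf(from);
}

// Reports every edge that points outward, e.g. a repository importing a service.
function violations(edges: Array<[Layer, Layer]>): string[] {
  return edges
    .filter(([from, to]) => !isAllowedDependency(from, to))
    .map(([from, to]) => `${from} -> ${to}`);
}
```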

## QA and observability loop

Every user-visible change is expected to produce:

- automated regression evidence
- a Playwright run over the affected journey
- CDP evidence for DOM, console, network, and screenshot state
- log evidence from the isolated task runtime
- metrics and trace evidence when performance or async flow matters

The recommended local stack is documented in
[docs/workflows/local-observability-stack.md](docs/workflows/local-observability-stack.md).
The implementation provided in `harness/` uses Loki, Prometheus, Tempo, and
Grafana to preserve the same agent-facing query model described by OpenAI:
LogQL, PromQL, and TraceQL.
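Scoping every query to one task is what makes the isolated runtime useful. A TypeScript sketch of query builders, assuming the harness attaches a `task` label (Loki, Prometheus) or a `task` resource attribute (Tempo) set to the task slug; that labeling, and the helper names, are hypothetical. `http_server_requests_seconds_count` is the request counter Spring Boot Actuator typically exposes through Micrometer's Prometheus registry.

```typescript
// Hypothetical query builders; assume every signal carries task=<slug>.
function quote(value: string): string {
  // JSON string escaping is enough for plain kebab-case slugs.
  return JSON.stringify(value);
}

// LogQL: log lines from this task's runtime that contain "ERROR".
function taskErrorLogs(slug: string): string {
  return `{task=${quote(slug)}} |= "ERROR"`;
}

// PromQL: per-second HTTP request rate over the last 5 minutes.
function taskRequestRate(slug: string): string {
  return `sum(rate(http_server_requests_seconds_count{task=${quote(slug)}}[5m]))`;
}

// TraceQL: spans from this task slower than the threshold.
function taskSlowSpans(slug: string, thresholdMs: number): string {
  return `{ resource.task = ${quote(slug)} && duration > ${thresholdMs}ms }`;
}

const slug = "rank-comparison-filtering";
console.log(taskErrorLogs(slug));
console.log(taskRequestRate(slug));
console.log(taskSlowSpans(slug, 500));
```

The resulting strings can be pasted into Grafana Explore against the matching data source; the LogQL one can also be run with Loki's `logcli query`.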

## What stays out of this repo

- application code that belongs in `git-ranker` or `git-ranker-client`
- private tribal knowledge that should instead be turned into docs
- ad hoc task notes that never graduate into reusable rules

## Current limitations

- the frontend repo does not yet contain committed Playwright or test config
- the harness knows the backend metrics endpoint, but frontend metrics and trace
export wiring are still generic
- repo-specific start scripts and local env bootstrapping still need to be
codified into the harness
83 changes: 83 additions & 0 deletions PLANS.md
@@ -0,0 +1,83 @@
# ExecPlans for git-ranker-workflow

This document adapts OpenAI's `PLANS.md` pattern to a two-repository product
workflow. Use it for any task that is likely to take more than one session,
spans multiple files or repos, changes contracts, or requires non-trivial QA.

## When to create an ExecPlan

Create an ExecPlan when any of the following are true:

- the request spans backend and frontend
- the request changes API, schema, routing, or product behavior
- the work is expected to last more than 30 minutes
- you need a reproducible QA and feedback loop
- you expect to stop and resume later

Store plans in `docs/exec-plans/active/<yyyy-mm-dd>-<slug>.md`.
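The criteria combine with "any of", so a single signal is enough. A TypeScript sketch of that decision; `TaskSignals` and `needsExecPlan` are hypothetical names, and how each signal gets estimated is left to the agent.

```typescript
// Hypothetical encoding of the "When to create an ExecPlan" criteria above.
interface TaskSignals {
  touchesBackend: boolean;
  touchesFrontend: boolean;
  changesContractOrBehavior: boolean; // API, schema, routing, or product behavior
  estimatedMinutes: number;
  needsReproducibleQa: boolean;
  expectsToResume: boolean;
}

// Any single criterion is enough to require a plan.
function needsExecPlan(t: TaskSignals): boolean {
  return (
    (t.touchesBackend && t.touchesFrontend) ||
    t.changesContractOrBehavior ||
    t.estimatedMinutes > 30 ||
    t.needsReproducibleQa ||
    t.expectsToResume
  );
}
```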

## Non-negotiable rules

- Every ExecPlan must be self-contained.
- Every ExecPlan must remain a living document.
- Every ExecPlan must let a novice continue from only the working tree and the
plan file.
- Every ExecPlan must describe observable outcomes, not just code edits.
- Every ExecPlan must define the validation loop clearly.

## Repo-specific additions

Every plan in this repository must also include:

- impacted repo list: backend, frontend, or both
- request intake summary in plain language
- contract boundary notes
- exact task runtime slug
- expected Playwright journeys
- expected CDP evidence
- expected Loki or log-backend queries
- rollback or retry notes for each risky step

## Required sections

Every ExecPlan must keep these sections current:

- `Purpose / Big Picture`
- `Progress`
- `Surprises & Discoveries`
- `Decision Log`
- `Outcomes & Retrospective`
- `Context and Orientation`
- `Plan of Work`
- `Concrete Steps`
- `Validation and Acceptance`
- `Idempotence and Recovery`
- `Artifacts and Notes`
- `Interfaces and Dependencies`
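Because the section titles are fixed, a plan can be checked for missing sections before handoff. A TypeScript sketch; `missingSections` is a hypothetical helper, and it assumes each section appears as a Markdown heading whose text matches the title exactly.

```typescript
// Section titles copied from the list above.
const REQUIRED_SECTIONS = [
  "Purpose / Big Picture",
  "Progress",
  "Surprises & Discoveries",
  "Decision Log",
  "Outcomes & Retrospective",
  "Context and Orientation",
  "Plan of Work",
  "Concrete Steps",
  "Validation and Acceptance",
  "Idempotence and Recovery",
  "Artifacts and Notes",
  "Interfaces and Dependencies",
] as const;

// Returns required section titles that do not appear as a heading in the plan.
function missingSections(plan: string): string[] {
  const headings = new Set(
    plan
      .split("\n")
      .map((line) => /^\s*#{1,6}\s+(.*?)\s*$/.exec(line))
      .filter((m): m is RegExpExecArray => m !== null)
      .map((m) => m[1]),
  );
  return REQUIRED_SECTIONS.filter((section) => !headings.has(section));
}
```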

## Formatting

The plan file itself should contain a single fenced code block labeled `md`.
Do not nest other fenced blocks inside the plan. Use indentation for commands,
snippets, and transcripts.
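The single-block rule is easy to lint. A TypeScript sketch; `fencingProblems` is a hypothetical helper that treats a line starting with up to three spaces and three backticks as a fence, per CommonMark. It does not check tilde fences, so that is a stated gap in this sketch.

```typescript
// Built with repeat() so this example does not itself contain a literal fence.
const FENCE = "`".repeat(3);

// Hypothetical lint: exactly one fenced block, opened with an `md` label.
function fencingProblems(file: string): string[] {
  const fences = file
    .split("\n")
    .map((text, index) => ({ text, lineNumber: index + 1 }))
    .filter((line) => /^ {0,3}`{3}/.test(line.text));
  const problems: string[] = [];
  if (fences.length !== 2) {
    problems.push(`expected 2 fence lines, found ${fences.length}`);
  }
  if (fences.length > 0 && fences[0].text.trim() !== `${FENCE}md`) {
    problems.push(`line ${fences[0].lineNumber}: opening fence must be labeled md`);
  }
  return problems;
}
```

Indented commands inside the plan (four or more spaces) are not matched, which fits the rule of using indentation instead of nested fences.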

## Required execution rhythm

1. Clarify the user's request in product language.
2. Identify impacted repos and documents.
3. Research before implementation.
4. Update the plan before and after every material milestone.
5. Validate behavior in the isolated task runtime.
6. Record the evidence path for screenshots, videos, traces, and logs.
7. Update docs when a new durable rule or system fact is discovered.

## Plan naming

Use a sortable filename:

`docs/exec-plans/active/2026-03-07-rank-comparison-filtering.md`
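The date prefix only sorts chronologically when month and day are zero-padded. A TypeScript sketch of building the path; `planPath` is a hypothetical helper, and using the UTC date is an assumption so the prefix does not shift with the local timezone.

```typescript
// Hypothetical helper building a sortable plan path from a date and task slug.
function planPath(date: Date, slug: string): string {
  const yyyy = String(date.getUTCFullYear());
  // getUTCMonth() is zero-based, so add 1 before padding.
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `docs/exec-plans/active/${yyyy}-${mm}-${dd}-${slug}.md`;
}

console.log(planPath(new Date(Date.UTC(2026, 2, 7)), "rank-comparison-filtering"));
```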

## Template

Start from `docs/exec-plans/_template.md`.
