0.5.7 - Discovery & classification fixes for monorepo instruction files#20

Merged
cleverhoods merged 13 commits into main from 0.5.7 on May 6, 2026

@cleverhoods
Contributor

Summary

Running ails check against monorepos misclassified per-package CLAUDE.md / AGENTS.md / GEMINI.md files as main, which tripped match: {type: main} rules (size limits, structural checks) on nested files that the agent itself loads on demand. The bug was reproduced against activepieces/activepieces, an open-source monorepo whose nested per-package CLAUDE.md / AGENTS.md files surfaced the false-positive size findings. The repository is used as a test fixture only; there is no affiliation.

This release replaces the discovery + classification pipeline, refreshes per-agent configs against verified upstream docs, and adds project-level config knobs for cases the framework cannot reliably model statically (such as Codex project_doc_fallback_filenames).

What changed

  • Project root for ails check <path> is now <path> itself. There is no walking up past it looking for .git or .ails/backbone.yml. Files outside the targeted subtree are out of scope. This eliminates the "fixture inside a parent repo leaks parent files" class of bug. engine_helpers._find_project_root retains its walk-up role for cache key derivation only and now recognizes IDE workspace markers (.vscode/, .idea/, .github/) alongside .git/ and .ails/backbone.yml.

  • Discovery dispatches by file_type properties.

    • scope: global + loading: session_start + loose leaf pattern (**/X.md or bare X.md) → ancestor walk bounded at cwd.
    • scope: nested (new enum value) → descendant walk excluding cwd. Replaces the previous overload of scope: path_scoped for surfaces whose subtree applicability comes from file LOCATION (subdirectory CLAUDE.md / AGENTS.md / GEMINI.md), not from in-file frontmatter.
    • scope: global + path-prefixed pattern (.github/copilot-instructions.md) → resolve relative to project root.
    • scope: path_scoped (frontmatter-filtered) → descendant walk from cwd.
  • Classification respects pattern shape. _location_matches_mode only enforces cwd's ancestor-chain placement for loose leaf patterns where ancestor-chain disambiguates main from nested_context. Path-prefixed patterns (.github/copilot-instructions.md) are pre-constrained by their prefix and skip the check.

  • Filename matching is case-sensitive per the Codex source (codex-rs/core/src/agents_md.rs's DEFAULT_AGENTS_MD_FILENAME = "AGENTS.md") and the agents.md spec ("Filenames not on this list are ignored for instruction discovery."). Wrong-case copies in skill-asset directories no longer surface as instruction files.

  • Symlinked instruction files dedupe by canonical path. Cursor scans both .cursor/skills/ and .claude/skills/ for cross-agent compatibility; when a project symlinks one to the other (.claude/skills -> ../.agents/skills), the same SKILL.md no longer surfaces twice. Circular symlinks are caught and skipped.

  • depends_on resolves through supersession. When CODEX:S:0003 supersedes CORE:S:0027, dependents on CORE:S:0027 (CORE:S:0030, CORE:G:0006) are now satisfied by CODEX:S:0003 instead of warning that the dependency is "not loaded".

  • Per-agent file_type configs refreshed against verified upstream docs:

    • official source URLs on every file_type
    • nested_context declared for codex / cursor / copilot / generic
    • cursor.rules corrected to scope: path_scoped (frontmatter globs: filter)
    • cursor.bugbot_rules corrected to scope: global (BugBot decides applicability)
    • Codex AGENTS.override.md, Windows requirements.toml path, and per-directory .codex/config.toml chain modeled
    • Gemini cross_read for AGENTS.md / CONTEXT.md

New configuration surfaces

  • .ails/config.yml gains two top-level keys:

  agents:
    codex:
      fallback_filenames: ["TEAM_GUIDE.md", ".agents.md"]

  surfaces:
    cursor.rules:
      exclude: ["**/draft/**"]
    claude.skills:
      include: [".github/skills/**/SKILL.md"]

  • agents.<id>.fallback_filenames mirrors Codex project_doc_fallback_filenames so per-project alternative instruction filenames are picked up by the validator without round-tripping through the user's home ~/.codex/config.toml.

  • surfaces.<agent>.<file_type>.include / .exclude adjusts per-surface globs without modifying the bundled framework configs.

  • .ails/config.local.yml (gitignored) now layers on top of the committed .ails/config.yml for personal/CI-specific overrides — object keys merge recursively, array keys extend, scalar keys are replaced.

  • ails config set now writes .ails/.gitignore listing .gitignore itself and config.local.yml so layered local config stays out of version control by default.

Display

  • The text formatter now distinguishes main (root-level instruction file) from nested (subdirectory copies). The scorecard and group renderer show a separate "Nested" section. Nested file paths display the full relative path (packages/web/CLAUDE.md) rather than the previous web/CLAUDE.md truncation, so users can locate the actual file.

npm packaging

  • The packages/npm/README.md symlink has been replaced with a prepack script (cp ../../README.md ./README.md). The previous symlink approach broke npmjs.com's per-version README display — only the latest version showed any README content. The prepack copy ships a real README in the tarball without a committed duplicate. packages/npm/README.md is now gitignored.

Schema additions

  • framework/schemas/agent.schema.yml and rule.schema.yml: added scope: nested enum value with documented semantics.

  • framework/schemas/project.schema.yml: added agents and surfaces keys.

Documentation

  • docs/configuration.md adds three sections: per-surface include/exclude with concrete examples, Codex fallback filenames, and .ails/config.local.yml layered overrides with an explanation of .ails/.gitignore. The docs are also available at https://reporails.com/docs

cleverhoods added 13 commits May 6, 2026 03:05
- Replace overloaded scope: path_scoped on subdirectory CLAUDE.md / AGENTS.md /
  GEMINI.md declarations with scope: nested for location-based subtree files
  (no frontmatter filter)
- Add nested_context declarations for codex / cursor / copilot / generic so
  per-package AGENTS.md files in monorepos are surfaced under the agent's
  on-demand loading model rather than being skipped
- Fix cursor.rules to scope: path_scoped (frontmatter-based path filter via
  globs) and cursor.bugbot_rules to scope: global (BugBot decides applicability)
- Restore each agent's source URL annotations on its file_type declarations
- Add Codex AGENTS.override.md, requirements.toml (Linux + Windows paths),
  and per-directory .codex/config.toml chain modeling
- Add Gemini cross_read for AGENTS.md / CONTEXT.md (via context.fileName)
- Add Cursor commands, plugins, BUGBOT.md, and ignore declarations
…iles

Project root for 'ails check <path>' is now <path> itself — no walking up
past it looking for .git or .ails/backbone.yml. Files outside the targeted
subtree are out of scope. _find_project_root retains its walk-up role for
cache key derivation only.

Discovery dispatches by file_type properties: scope: global + loading:
session_start runs an ancestor walk bounded at cwd; scope: nested runs a
descendant walk excluding cwd; everything else uses descendant walk from cwd.
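
The dispatch above can be sketched roughly as follows. This is a minimal illustration, not the actual reporails implementation: `FileType`, `ancestor_walk`, `descendant_walk`, and `discover` are hypothetical names, patterns are reduced to a bare filename, and "bounded at cwd" is assumed to mean the walk never leaves the targeted directory.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileType:
    # Hypothetical, simplified stand-in for a per-agent file_type declaration.
    name: str
    filename: str  # bare leaf name, e.g. "CLAUDE.md"
    scope: str     # "global", "nested", or "path_scoped"
    loading: str = "session_start"


def ancestor_walk(cwd: Path, filename: str) -> list[Path]:
    # Assumption: the project root is cwd itself, so a walk bounded at cwd
    # only ever inspects cwd.
    candidate = cwd / filename
    return [candidate] if candidate.is_file() else []


def descendant_walk(cwd: Path, filename: str, include_cwd: bool) -> list[Path]:
    hits = sorted(p for p in cwd.rglob(filename) if p.is_file())
    return hits if include_cwd else [p for p in hits if p.parent != cwd]


def discover(cwd: Path, ft: FileType) -> list[Path]:
    if ft.scope == "global" and ft.loading == "session_start":
        return ancestor_walk(cwd, ft.filename)
    if ft.scope == "nested":
        return descendant_walk(cwd, ft.filename, include_cwd=False)
    # Everything else: descendant walk starting at (and including) cwd.
    return descendant_walk(cwd, ft.filename, include_cwd=True)
```

With a root CLAUDE.md and a packages/web/CLAUDE.md, the global file_type in this sketch yields only the root file and the nested file_type yields only the per-package copy.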

Classification (_location_matches_mode) tags files in cwd's ancestor chain
as the eager file_type (main, override) and files outside as nested_context
/ child_instruction — so size and other 'match: {type: main}' rules fire
only on the actual root instruction file, not on per-package copies.

Filename matching is case-sensitive per Codex source (codex-rs/core/src/
agents_md.rs) and the agents.md spec. Wrong-case copies in skill asset
directories no longer surface as instruction candidates.
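
As an illustration of the case-sensitive rule, a minimal sketch (the constant and function names are hypothetical, and the set only lists the three filenames mentioned in this PR):

```python
# Hypothetical allow-list; the real per-agent lists live in the framework configs.
INSTRUCTION_FILENAMES = frozenset({"CLAUDE.md", "AGENTS.md", "GEMINI.md"})


def is_instruction_filename(name: str) -> bool:
    # Exact string comparison: no lower(), no casefold().
    return name in INSTRUCTION_FILENAMES
```

Comparing the name as listed in the directory, rather than probing whether a path exists, keeps the check case-sensitive even on case-insensitive filesystems.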

depends_on resolves through supersession: when CODEX:S:0003 supersedes
CORE:S:0027, dependents on CORE:S:0027 are satisfied by CODEX:S:0003 instead
of warning that the dependency is not loaded.
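
One way to picture the resolution is the sketch below. It assumes a flat mapping from superseding rule id to superseded rule id and handles a single level of supersession; the function name and data shapes are hypothetical, not the engine's actual structures.

```python
def unmet_dependencies(
    loaded: set[str],
    depends_on: dict[str, list[str]],
    supersedes: dict[str, str],
) -> dict[str, list[str]]:
    # A loaded rule also satisfies dependencies on the rule it supersedes.
    satisfied = set(loaded)
    for winner, replaced in supersedes.items():
        if winner in loaded:
            satisfied.add(replaced)

    unmet: dict[str, list[str]] = {}
    for rule, deps in depends_on.items():
        missing = [dep for dep in deps if dep not in satisfied]
        if rule in loaded and missing:
            unmet[rule] = missing
    return unmet
```

With CODEX:S:0003 loaded and declared as superseding CORE:S:0027, dependents such as CORE:S:0030 report no unmet dependency in this sketch.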

Symlinked instruction files dedupe via canonical (resolved) path, so
.cursor/skills -> .agents/skills doesn't surface the same SKILL.md twice.
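
A minimal sketch of deduplication by canonical path, using only `pathlib`. The function name is hypothetical; the actual discovery code may structure this differently.

```python
from pathlib import Path


def dedupe_by_canonical_path(paths: list[Path]) -> list[Path]:
    seen: set[Path] = set()
    unique: list[Path] = []
    for path in paths:
        try:
            # strict=True raises for broken links and for symlink loops
            # (RuntimeError before Python 3.13, OSError from 3.13 on).
            canonical = path.resolve(strict=True)
        except (OSError, RuntimeError):
            continue  # skip circular or dangling symlinks
        if canonical not in seen:
            seen.add(canonical)
            unique.append(path)  # keep the first spelling encountered
    return unique
```

Keeping the first spelling means the reported path is whichever surface was scanned first; that ordering choice is an assumption of this sketch.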
…onfig.yml

framework/schemas/project.schema.yml grows two new top-level keys:

  surfaces:
    <agent>.<file_type>:
      include: [glob...]
      exclude: [glob...]

Lets users adjust which globs an agent's surface scans without modifying
the bundled framework configs. Patterns under 'include' extend the bundled
list; matches under 'exclude' are dropped after globbing.
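
The include/exclude semantics can be sketched like this. It is a rough illustration under stated assumptions: `scan_surface` is a hypothetical name, the bundled pattern in the usage note is invented for the example, and `fnmatchcase` stands in for whatever glob matcher the validator actually uses for excludes.

```python
from collections.abc import Sequence
from fnmatch import fnmatchcase
from pathlib import Path


def scan_surface(
    root: Path,
    bundled: Sequence[str],
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> list[Path]:
    # 'include' extends the bundled pattern list before globbing.
    patterns = [*bundled, *include]
    hits = {p for pattern in patterns for p in root.glob(pattern) if p.is_file()}

    # 'exclude' is applied to the matches after globbing.
    def is_excluded(path: Path) -> bool:
        rel = path.relative_to(root).as_posix()
        return any(fnmatchcase(rel, pattern) for pattern in exclude)

    return sorted(p for p in hits if not is_excluded(p))
```

For example, `scan_surface(root, [".cursor/rules/**/*.mdc"], exclude=["**/draft/**"])` would drop anything under a draft/ directory while keeping the other matches.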

  agents:
    <id>:
      fallback_filenames: ["TEAM_GUIDE.md", ".agents.md"]

Mirrors Codex 'project_doc_fallback_filenames' so per-project alternative
instruction filenames are picked up by the validator without round-tripping
through the user's home ~/.codex/config.toml (which is fragile across CI).

src/reporails_cli/core/config.py reads both .ails/config.yml and the new
.ails/config.local.yml (gitignored), deep-merging object keys and extending
array keys so personal/CI-specific overrides layer cleanly on the committed
config.
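
The layering rule (objects merge recursively, arrays extend, scalars are replaced) can be sketched as below. `layer_config` is a hypothetical name, and whether the real merge deduplicates extended arrays is not stated in this PR, so the sketch does not.

```python
from typing import Any


def layer_config(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    merged = dict(base)
    for key, value in override.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            merged[key] = layer_config(current, value)  # objects merge recursively
        elif isinstance(current, list) and isinstance(value, list):
            merged[key] = [*current, *value]            # arrays extend
        else:
            merged[key] = value                         # scalars are replaced
    return merged
```

Here `base` would be the parsed .ails/config.yml and `override` the parsed .ails/config.local.yml; neither input is mutated.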

src/reporails_cli/interfaces/cli/config_command.py writes .ails/.gitignore
listing '.gitignore' itself and 'config.local.yml' whenever 'ails config set'
creates or updates .ails/config.yml — so layered local overrides stay out of
version control by default.

src/reporails_cli/core/results.py adds 'agents' and 'surfaces' fields to the
ProjectConfig dataclass.

The previous text formatter grouped root-level instruction files (CLAUDE.md
at project root) and subdirectory copies (packages/<x>/CLAUDE.md) under a
single 'Main' header — misleading users into thinking nested per-package
files were main file candidates.

display_constants.classify_file now returns 'nested' for subdirectory copies
of main-named files and 'main' only for the root-level file. Filename
matching is case-sensitive (CLAUDE.md, AGENTS.md, GEMINI.md uppercase per
agent specs) so wrong-case copies in skill asset directories don't false-
positive.

display._GROUP_ORDER and scorecard._SURFACE_ORDER add 'nested' as a separate
surface between 'main' and 'rule'.

friendly_name returns the full relative path for nested files
(packages/web/CLAUDE.md) rather than the previous parent/filename truncation
(web/CLAUDE.md), so users can locate the actual file in the tree.
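
To show why the truncated form was ambiguous, here is a small sketch contrasting the two display styles. Both function names are hypothetical and are not the signature of friendly_name.

```python
from pathlib import PurePosixPath


def truncated_name(path: PurePosixPath, root: PurePosixPath) -> str:
    # Previous style: parent directory plus filename only.
    return "/".join(path.relative_to(root).parts[-2:])


def full_relative_name(path: PurePosixPath, root: PurePosixPath) -> str:
    # New style: full path relative to the project root.
    return path.relative_to(root).as_posix()
```

In a monorepo with both packages/web/CLAUDE.md and apps/web/CLAUDE.md, the truncated form renders both as web/CLAUDE.md, while the full relative path keeps them distinct.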
…ack filenames

docs/configuration.md adds three sections:

- Per-surface include/exclude with concrete examples (cursor.rules,
  claude.skills, codex.main)
- Codex fallback filenames via agents.codex.fallback_filenames
- .ails/config.local.yml layered overrides + .ails/.gitignore explanation

Updates frontmatter version (0.5.6 → 0.5.7) and last_updated (2026-05-04 →
2026-05-06).

The previous setup symlinked packages/npm/README.md to the repo root README,
relying on pacote to resolve it during 'npm pack'. That approach broke
npmjs.com's per-version README display — only the latest published version
showed any README content, so each release dropped the prior version's
description from the package listing.

Replace with a prepack script in packages/npm/package.json:

  "scripts": {
    "prepack": "cp ../../README.md ./README.md"
  }

npm runs prepack before 'npm pack' and 'npm publish', so the tarball
contains a real README copied fresh from the repo root every time. No
committed duplicate (drifts), no symlink (broken on npmjs.com).

Add packages/npm/README.md to .gitignore so the local copy doesn't get
committed; the file only exists transiently during 'npm pack' / 'npm
publish' on the dev machine and in CI.

Those directories no longer exist in the cli/ tree. The exclude_dirs list
should reflect what actually needs filtering during discovery.

pyproject.toml, packages/npm/package.json, and the README heading move
together per scripts/check-config-sync.sh.

Add .codex/config.toml to the test fixture so the codex/generic
disambiguation (_disambiguate_codex_generic) consistently keeps codex
detected. Previously the test passed locally because ~/.codex/config.toml
existed in the dev HOME, but failed on a CI runner whose HOME had no
~/.codex/ — codex got disambiguated away and the fallback_filenames
patterns attached to the codex agent never fired.

_location_matches_mode previously required the file's parent directory to
be in cwd's ancestor chain for any scope: global + loading: session_start
file_type. That works for loose patterns like **/CLAUDE.md (where the
ancestor-chain check disambiguates main from nested_context), but breaks
path-prefixed patterns like .github/copilot-instructions.md whose parent
is by definition .github/ (not cwd or an ancestor).

Add _is_loose_leaf_pattern() helper: a pattern is 'loose' if it starts
with **/ or is a bare filename. _location_matches_mode now only enforces
ancestor-chain placement for loose patterns; path-prefixed patterns rely
on the prefix itself to constrain location.
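
A minimal sketch of the rule as described, using public-style names to make clear this is not the private helper itself. `location_matches` is a simplified, hypothetical stand-in for the ancestor-chain check.

```python
from pathlib import PurePosixPath


def is_loose_leaf_pattern(pattern: str) -> bool:
    # 'Loose' per the description above: starts with **/ or is a bare filename.
    return pattern.startswith("**/") or "/" not in pattern


def location_matches(pattern: str, path: PurePosixPath, cwd: PurePosixPath) -> bool:
    if not is_loose_leaf_pattern(pattern):
        # Path-prefixed patterns are already constrained by their prefix.
        return True
    # Loose patterns: the file must sit in cwd or one of cwd's ancestors.
    return path.parent == cwd or path.parent in cwd.parents
```

Under this sketch, .github/copilot-instructions.md passes even though its parent is .github/, while a **/CLAUDE.md match under packages/web/ does not count as main.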

Also expose _first_matching_pattern so classify_files can pass the
specific matched pattern through to _location_matches_mode for this
decision (the previous _matches_any_pattern returned only a bool).

Verified locally with HOME=/tmp/empty + no ONNX model (CI conditions):
test_agent_check_finds_files[copilot] now passes alongside all other
agents.

cleverhoods merged commit 22c02b6 into main on May 6, 2026
14 checks passed
cleverhoods deleted the 0.5.7 branch on May 6, 2026 17:31