0.5.7 - Discovery & classification fixes for monorepo instruction files #20
Merged
- Replace overloaded scope: path_scoped on subdirectory CLAUDE.md / AGENTS.md / GEMINI.md declarations with scope: nested for location-based subtree files (no frontmatter filter)
- Add nested_context declarations for codex / cursor / copilot / generic so per-package AGENTS.md files in monorepos are surfaced under the agent's on-demand loading model rather than being skipped
- Fix cursor.rules to scope: path_scoped (frontmatter-based path filter via globs) and cursor.bugbot_rules to scope: global (BugBot decides applicability)
- Restore each agent's source URL annotations on its file_type declarations
- Add Codex AGENTS.override.md, requirements.toml (Linux + Windows paths), and per-directory .codex/config.toml chain modeling
- Add Gemini cross_read for AGENTS.md / CONTEXT.md (via context.fileName)
- Cursor commands, plugins, BUGBOT.md, ignore declarations
…iles
Project root for 'ails check <path>' is now <path> itself — no walking up
past it looking for .git or .ails/backbone.yml. Files outside the targeted
subtree are out of scope. _find_project_root retains its walk-up role for
cache key derivation only.
Discovery dispatches by file_type properties: scope: global + loading:
session_start runs an ancestor walk bounded at cwd; scope: nested runs a
descendant walk excluding cwd; everything else uses descendant walk from cwd.
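The dispatch rule above can be sketched as a small selector. This is a minimal illustration, not the project's code: the `FileType` dataclass, the `pick_walk` function, and the returned strategy labels are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileType:
    """Hypothetical stand-in for a file_type declaration."""
    scope: str
    loading: str


def pick_walk(file_type: FileType) -> str:
    """Return the walk strategy implied by a file_type's properties."""
    # scope: global + loading: session_start -> ancestor walk bounded at cwd
    if file_type.scope == "global" and file_type.loading == "session_start":
        return "ancestor_bounded_at_cwd"
    # scope: nested -> descendant walk that skips cwd itself
    if file_type.scope == "nested":
        return "descendant_excluding_cwd"
    # everything else -> plain descendant walk from cwd
    return "descendant_from_cwd"
```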
Classification (_location_matches_mode) tags files in cwd's ancestor chain
as the eager file_type (main, override) and files outside as nested_context
/ child_instruction — so size and other 'match: {type: main}' rules fire
only on the actual root instruction file, not on per-package copies.
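As a rough sketch of the ancestor-chain test described above (assuming POSIX-style absolute paths; the function name `classify_by_location` is hypothetical and the real `_location_matches_mode` may differ), a file counts as the eager type only when its parent directory is cwd or one of cwd's ancestors:

```python
from pathlib import PurePosixPath


def classify_by_location(file_path: str, cwd: str) -> str:
    """Tag a main-named file as 'main' or 'nested_context' by its location."""
    cwd_path = PurePosixPath(cwd)
    # cwd plus every directory above it forms the ancestor chain.
    ancestor_chain = {cwd_path, *cwd_path.parents}
    parent = PurePosixPath(file_path).parent
    return "main" if parent in ancestor_chain else "nested_context"
```

With this shape, a rule matching `{type: main}` would see `/repo/CLAUDE.md` but not `/repo/packages/web/CLAUDE.md` when cwd is `/repo`.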
Filename matching is case-sensitive per Codex source (codex-rs/core/src/
agents_md.rs) and the agents.md spec. Wrong-case copies in skill asset
directories no longer surface as instruction candidates.
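A minimal sketch of case-sensitive filename matching, assuming the three main filenames named elsewhere in this PR. `fnmatch.fnmatchcase` is used because plain `fnmatch.fnmatch` normalizes case on some platforms; the helper name is hypothetical.

```python
from fnmatch import fnmatchcase

# Filenames taken from this PR's description; the tuple name is illustrative.
INSTRUCTION_FILENAMES = ("CLAUDE.md", "AGENTS.md", "GEMINI.md")


def is_instruction_filename(name: str) -> bool:
    """Match a bare filename against the list without case folding."""
    return any(fnmatchcase(name, pattern) for pattern in INSTRUCTION_FILENAMES)
```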
depends_on resolves through supersession: when CODEX:S:0003 supersedes
CORE:S:0027, dependents on CORE:S:0027 are satisfied by CODEX:S:0003 instead
of warning that the dependency is not loaded.
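One way to picture dependency resolution through supersession is the sketch below. It assumes a simple mapping from a superseded rule ID to its replacement; the function name and data shapes are hypothetical, not the project's actual structures.

```python
def is_dependency_satisfied(
    dependency: str,
    loaded: set[str],
    superseded_by: dict[str, str],
) -> bool:
    """Return True if the dependency, or a rule that supersedes it, is loaded."""
    seen: set[str] = set()
    current: str | None = dependency
    # Follow the supersession chain, guarding against cycles.
    while current is not None and current not in seen:
        if current in loaded:
            return True
        seen.add(current)
        current = superseded_by.get(current)
    return False
```

With `loaded = {"CODEX:S:0003"}` and `superseded_by = {"CORE:S:0027": "CODEX:S:0003"}`, a dependent on `CORE:S:0027` is treated as satisfied.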
Symlinked instruction files dedupe via canonical (resolved) path, so
.cursor/skills -> .agents/skills doesn't surface the same SKILL.md twice.
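Deduplication by canonical path might look roughly like this. The function name is hypothetical, and the sketch assumes the first discovered path for each real file is the one to keep.

```python
from pathlib import Path


def dedupe_by_canonical_path(paths: list[Path]) -> list[Path]:
    """Keep the first path seen for each resolved (symlink-free) location."""
    seen: set[Path] = set()
    unique: list[Path] = []
    for path in paths:
        try:
            canonical = path.resolve()
        except (OSError, RuntimeError):
            # Older Python versions raise on symlink loops; skip such entries.
            continue
        if canonical not in seen:
            seen.add(canonical)
            unique.append(path)
    return unique
```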
…onfig.yml
framework/schemas/project.schema.yml grows two new top-level keys:
  surfaces:
    <agent>.<file_type>:
      include: [glob...]
      exclude: [glob...]
Lets users adjust which globs an agent's surface scans without modifying
the bundled framework configs. Patterns under 'include' extend the bundled
list; matches under 'exclude' are dropped after globbing.
  agents:
    <id>:
      fallback_filenames: ["TEAM_GUIDE.md", ".agents.md"]
Mirrors Codex 'project_doc_fallback_filenames' so per-project alternative
instruction filenames are picked up by the validator without round-tripping
through the user's home ~/.codex/config.toml (which is fragile across CI).
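The `surfaces` include/exclude semantics described above (include extends the bundled patterns, exclude drops matches after globbing) can be sketched over a list of relative paths. This is an assumption-laden illustration: the function name is hypothetical, and it uses `fnmatch.fnmatchcase`, whose `*` also matches `/`, which may not be how the real globbing behaves.

```python
from fnmatch import fnmatchcase


def apply_surface_overrides(
    candidates: list[str],
    bundled: list[str],
    include: list[str],
    exclude: list[str],
) -> list[str]:
    """Glob with bundled + include patterns, then drop anything matching exclude."""
    patterns = [*bundled, *include]
    matched = [
        path for path in candidates
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
    return [
        path for path in matched
        if not any(fnmatchcase(path, pattern) for pattern in exclude)
    ]
```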
src/reporails_cli/core/config.py reads both .ails/config.yml and the new
.ails/config.local.yml (gitignored), deep-merging object keys and extending
array keys so personal/CI-specific overrides layer cleanly on the committed
config.
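The layering rule (objects merge recursively, arrays extend, everything else is replaced) can be sketched on plain dictionaries as below. The function name is hypothetical and YAML loading is left out; this only illustrates the merge step.

```python
from typing import Any


def layer_config(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Layer an override mapping on top of a base mapping without mutating either."""
    merged = dict(base)
    for key, value in override.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            # Object keys merge recursively.
            merged[key] = layer_config(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            # Array keys extend the committed list.
            merged[key] = [*current, *value]
        else:
            # Scalars (and type mismatches) are replaced by the override.
            merged[key] = value
    return merged
```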
src/reporails_cli/interfaces/cli/config_command.py writes .ails/.gitignore
listing '.gitignore' itself and 'config.local.yml' whenever 'ails config set'
creates or updates .ails/config.yml — so layered local overrides stay out of
version control by default.
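A minimal sketch of the `.ails/.gitignore` write described above, assuming the file simply lists the two entries one per line. The function name is hypothetical and the real command may format or merge the file differently.

```python
from pathlib import Path


def write_ails_gitignore(ails_dir: Path) -> Path:
    """Create .ails/.gitignore ignoring itself and config.local.yml."""
    ails_dir.mkdir(parents=True, exist_ok=True)
    target = ails_dir / ".gitignore"
    target.write_text(".gitignore\nconfig.local.yml\n", encoding="utf-8")
    return target
```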
src/reporails_cli/core/results.py adds 'agents' and 'surfaces' fields to the
ProjectConfig dataclass.
The previous text formatter grouped root-level instruction files (CLAUDE.md at project root) and subdirectory copies (packages/<x>/CLAUDE.md) under a single 'Main' header, misleading users into thinking nested per-package files were main file candidates.
display_constants.classify_file now returns 'nested' for subdirectory copies of main-named files and 'main' only for the root-level file. Filename matching is case-sensitive (CLAUDE.md, AGENTS.md, GEMINI.md uppercase per agent specs) so wrong-case copies in skill asset directories don't false-positive.
display._GROUP_ORDER and scorecard._SURFACE_ORDER add 'nested' as a separate surface between 'main' and 'rule'.
friendly_name returns the full relative path for nested files (packages/web/CLAUDE.md) rather than the previous parent/filename truncation (web/CLAUDE.md), so users can locate the actual file in the tree.
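The display grouping and naming change can be illustrated with project-relative POSIX paths. All names here are hypothetical, and the `"other"` label is an assumption added for files that are not main-named; the real `classify_file` returns other surface names not modeled here.

```python
from pathlib import PurePosixPath

MAIN_FILENAMES = frozenset({"CLAUDE.md", "AGENTS.md", "GEMINI.md"})


def display_group(rel_path: str) -> str:
    """'main' for a root-level main-named file, 'nested' for subdirectory copies."""
    path = PurePosixPath(rel_path)
    if path.name not in MAIN_FILENAMES:  # case-sensitive comparison
        return "other"
    return "main" if len(path.parts) == 1 else "nested"


def truncated_name(rel_path: str) -> str:
    """Previous behavior as described: keep only parent/filename."""
    return "/".join(PurePosixPath(rel_path).parts[-2:])


def full_name(rel_path: str) -> str:
    """New behavior as described: keep the full relative path."""
    return PurePosixPath(rel_path).as_posix()
```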
…ack filenames
docs/configuration.md adds three sections:
- Per-surface include/exclude with concrete examples (cursor.rules, claude.skills, codex.main)
- Codex fallback filenames via agents.codex.fallback_filenames
- .ails/config.local.yml layered overrides + .ails/.gitignore explanation
Updates frontmatter version (0.5.6 → 0.5.7) and last_updated (2026-05-04 → 2026-05-06).
The previous setup symlinked packages/npm/README.md to the repo root README,
relying on pacote to resolve it during 'npm pack'. That approach broke
npmjs.com's per-version README display — only the latest published version
showed any README content, so each release dropped the prior version's
description from the package listing.
Replace with a prepack script in packages/npm/package.json:
  "scripts": {
    "prepack": "cp ../../README.md ./README.md"
  }
npm runs prepack before 'npm pack' and 'npm publish', so the tarball
contains a real README copied fresh from the repo root every time. No
committed duplicate (drifts), no symlink (broken on npmjs.com).
Add packages/npm/README.md to .gitignore so the local copy doesn't get
committed; the file only exists transiently during 'npm pack' / 'npm
publish' on the dev machine and in CI.
Those directories no longer exist in the cli/ tree. The exclude_dirs list should reflect what actually needs filtering during discovery.
pyproject.toml, packages/npm/package.json, and the README heading move together per scripts/check-config-sync.sh.
Add .codex/config.toml to the test fixture so the codex/generic disambiguation (_disambiguate_codex_generic) consistently keeps codex detected. Previously the test passed locally because ~/.codex/config.toml existed in the dev HOME, but failed on a CI runner whose HOME had no ~/.codex/ — codex got disambiguated away and the fallback_filenames patterns attached to the codex agent never fired.
_location_matches_mode previously required the file's parent directory to be in cwd's ancestor chain for any scope: global + loading: session_start file_type. That works for loose patterns like **/CLAUDE.md (where the ancestor-chain check disambiguates main from nested_context), but breaks path-prefixed patterns like .github/copilot-instructions.md whose parent is by definition .github/ (not cwd or an ancestor).
Add _is_loose_leaf_pattern() helper: a pattern is 'loose' if it starts with **/ or is a bare filename. _location_matches_mode now only enforces ancestor-chain placement for loose patterns; path-prefixed patterns rely on the prefix itself to constrain location.
Also expose _first_matching_pattern so classify_files can pass the specific matched pattern through to _location_matches_mode for this decision (the previous _matches_any_pattern returned only a bool).
Verified locally with HOME=/tmp/empty + no ONNX model (CI conditions): test_agent_check_finds_files[copilot] now passes alongside all other agents.
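The 'loose' pattern test described above is small enough to sketch directly. This follows the stated definition (starts with `**/` or is a bare filename); the public-style function name is used here for illustration and the actual helper body may differ.

```python
def is_loose_leaf_pattern(pattern: str) -> bool:
    """A pattern is 'loose' if it starts with **/ or contains no directory part."""
    if pattern.startswith("**/"):
        return True
    # A bare filename has no path separator at all.
    return "/" not in pattern
```

Under this definition `**/CLAUDE.md` and `AGENTS.md` are loose, so the ancestor-chain check applies, while `.github/copilot-instructions.md` is path-prefixed and skips it.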
Summary
ails check against monorepos misclassified per-package CLAUDE.md / AGENTS.md / GEMINI.md files as main, tripping match: {type: main} rules (size limits, structural checks) on nested files that the agent itself loads on demand. The bug was reproduced against activepieces/activepieces, an open-source monorepo whose per-package nested CLAUDE.md / AGENTS.md files surfaced the false-positive size findings. It is used as a test fixture only; no affiliation.
This release replaces the discovery + classification pipeline, refreshes per-agent configs against verified upstream docs, and adds project-level config knobs for cases the framework cannot easily model statically (such as Codex project_doc_fallback_filenames).
What changed
Project root for ails check <path> is now <path> itself. No walking up past it looking for .git or .ails/backbone.yml. Files outside the targeted subtree are out of scope. This eliminates the "fixture inside a parent repo leaks parent files" class of bug. engine_helpers._find_project_root retains its walk-up role for cache key derivation only and now recognizes IDE workspace markers (.vscode/, .idea/, .github/) alongside .git/ and .ails/backbone.yml.
Discovery dispatches by file_type properties.
Classification respects pattern shape. _location_matches_mode only enforces cwd's ancestor-chain placement for loose leaf patterns where ancestor-chain disambiguates main from nested_context. Path-prefixed patterns (.github/copilot-instructions.md) are pre-constrained by their prefix and skip the check.
Filename matching is case-sensitive per the Codex source (codex-rs/core/src/agents_md.rs's DEFAULT_AGENTS_MD_FILENAME = "AGENTS.md") and the agents.md spec ("Filenames not on this list are ignored for instruction discovery."). Wrong-case copies in skill-asset directories no longer surface as instruction files.
Symlinked instruction files dedupe by canonical path. Cursor scans both .cursor/skills/ and .claude/skills/ for cross-agent compatibility; when a project symlinks one to the other (.claude/skills -> ../.agents/skills), the same SKILL.md no longer surfaces twice. Circular symlinks are caught and skipped.
depends_on resolves through supersession. When CODEX:S:0003 supersedes CORE:S:0027, dependents on CORE:S:0027 (CORE:S:0030, CORE:G:0006) are now satisfied by CODEX:S:0003 instead of warning that the dependency is "not loaded".
Per-agent file_type configs refreshed against verified upstream docs: official source URLs on every file_type, nested_context declared for codex / cursor / copilot / generic, cursor.rules corrected to scope: path_scoped (frontmatter globs: filter), cursor.bugbot_rules corrected to scope: global (BugBot decides applicability), Codex AGENTS.override.md + Windows requirements.toml path + per-directory .codex/config.toml chain modeled, Gemini cross_read for AGENTS.md/CONTEXT.md.
New configuration surfaces
.ails/config.yml gains two top-level keys:
- agents.<id>.fallback_filenames mirrors Codex project_doc_fallback_filenames so per-project alternative instruction filenames are picked up by the validator without round-tripping through the user's home ~/.codex/config.toml.
- surfaces.<agent>.<file_type>.include / .exclude adjusts per-surface globs without modifying the bundled framework configs.
.ails/config.local.yml (gitignored) now layers on top of the committed .ails/config.yml for personal/CI-specific overrides — object keys merge recursively, array keys extend, scalar keys are replaced.
ails config set now writes .ails/.gitignore listing .gitignore itself and config.local.yml so layered local config stays out of version control by default.
Display
npm packaging
Schema additions
framework/schemas/agent.schema.yml and rule.schema.yml: added scope: nested enum value with documented semantics.
framework/schemas/project.schema.yml: added agents and surfaces keys.
Documentation