Add knowledge base for coding agents #402
andre15silva wants to merge 1 commit into siddharthvaddem:main from
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
📝 Walkthrough

Adds comprehensive documentation infrastructure across the repository.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes

The Python scripts introduce two independent but similar implementations with dual-mode operation (CLI + hook), path normalization, JSON parsing with fallback handling, and markdown generation. While the logic isn't super dense, the heterogeneous mix of two parallel scripts, four markdown/JSON documentation files, rules entries, and structural patterns warrants careful cross-checking for consistency between the two implementations.
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 70d39cdf28
```python
if len(insights) > 15:
    insights = random.sample(insights, 15)
```
Make insight selection deterministic
_render_insights randomly selects 15 entries whenever a file has more than 15 insights, which makes get_context.py output nondeterministic for the same input path (e.g., src/components/video-editor/VideoEditor.tsx has 29 insights). Because this context is meant to drive required review/edit behavior, random omission causes flaky, non-reproducible guidance and can hide relevant pitfalls or tests on any given run; use a stable ordering/truncation strategy instead of random.sample.
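If the variety of sampling is worth keeping, one way to make it reproducible is to seed a local RNG from the file path, so the same input always yields the same 15 insights. A minimal sketch (not code from the PR; `pick_insights` and its signature are illustrative):

```python
import hashlib
import random


def pick_insights(insights, path, k=15):
    # Deterministically pick up to k insights for a given file path.
    # Seeding a local Random with a stable hash of the path keeps the
    # sampled "variety" while making repeated runs reproducible.
    if len(insights) <= k:
        return list(insights)
    seed = int(hashlib.sha256(path.encode("utf-8")).hexdigest(), 16) % (2 ** 32)
    return random.Random(seed).sample(insights, k)
```

Python's built-in `hash()` is salted per process, so a digest-based seed (or plain `insights[:15]`) is needed for cross-run stability.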
Actionable comments posted: 6
🧹 Nitpick comments (3)
.codex/docs/get_context.py (3)
90-118: non-deterministic output from random sampling

using `random.sample` without seeding means repeated runs on the same file produce different outputs. this could be confusing for agents or users expecting reproducibility. if intentional for variety, maybe worth a comment explaining the design choice? if you want determinism, could seed based on file path or just use `insights[:15]` for consistency.

deterministic alternative (if desired)

```diff
 def _render_insights(file_data):
     insights = file_data.get("insights", [])
     if not insights:
         return ""
     if len(insights) > 15:
-        insights = random.sample(insights, 15)
+        insights = insights[:15]  # take first 15 for deterministic output
     lines = ["### Historical Insights"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.codex/docs/get_context.py around lines 90 - 118, The _render_insights function uses random.sample to pick up to 15 insights which makes output non-deterministic; change this to produce deterministic output by either using a stable selection (e.g., take the first 15 via insights[:15]) or deterministically seeding the sampler using a stable key (e.g., a hash of file path/id) before calling random.sample; update the code around random.sample in _render_insights to implement one of these approaches and add a brief comment explaining the choice.
285-290: same random.sample non-determinism here

folder insights also uses random sampling - same comment applies about reproducibility.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.codex/docs/get_context.py around lines 285 - 290, The sampling in _render_folder_insights currently calls random.sample on the insights list which is non-deterministic; make the sampling reproducible by using a deterministic RNG or deterministic selection. Replace the direct random.sample(insights, 15) call with a deterministic approach (e.g., instantiate a local random.Random(seed) and call its sample method, or sort+slice by a stable key) and derive the seed from stable input such as folder_data.get("id") or a provided global seed so repeated runs produce the same 15 insights; ensure you update references to random.sample to use the deterministic RNG or selection method while keeping the rest of _render_folder_insights unchanged.
143-147: nit: sorting lambda is kinda fragile

`str(x).isdigit()` won't handle negative numbers or floats. probably fine here since line numbers should be positive ints, but if `ln` ever comes through as something unexpected (empty string, None coerced to string earlier), this could behave weirdly. lowkey fine for now, just flagging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.codex/docs/get_context.py around lines 143 - 147, The sorting lambda used when building line_nums (involving by_file and the comprehension for line_nums) is fragile because it uses str(x).isdigit() and int(x) fallback, which mishandles negative numbers, floats, empty strings or None; update the key function to robustly coerce ln to an integer when possible and otherwise use a safe fallback (e.g., try/except around int(ln) with a default like float('inf') or 0) so sorted(by_file[f], key=...) reliably orders numeric line numbers and places non-numeric entries consistently; adjust the code that builds line_nums and the lambda reference so it calls the new safe conversion helper inline or via a small helper function.
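A safe conversion helper along the lines the prompt suggests could look like this (a sketch; `line_key` is a hypothetical name, not code from the PR). Numeric values — including negatives, which `str(x).isdigit()` rejects — sort first in numeric order, and anything non-numeric sorts after them in a stable order:

```python
def line_key(ln):
    # Robust sort key for line numbers that may arrive as ints, numeric
    # strings, or junk (None, empty string). Numeric values sort first in
    # numeric order; everything else sorts last, ordered by its string form.
    try:
        return (0, int(ln), "")
    except (TypeError, ValueError):
        return (1, 0, str(ln))
```

Usage would be `sorted(by_file[f], key=line_key)` in place of the current lambda.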
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/docs/get_context.py:
- Around line 94-95: The selection of "insights" is non-deterministic because
random.sample is used; replace that with a deterministic strategy in
get_context.py by ordering the insights consistently (e.g., sort by a stable key
such as timestamp, score, filename, or an explicit priority field) and then
truncating with slicing (insights = sorted_insights[:15]) instead of random
sampling; apply the same deterministic change to the other occurrence around the
insights handling at the second location (lines ~289-290) so outputs are
repeatable across runs.
- Around line 498-518: The hook uses a contradictory "decision" field; update
the output behavior in get_context.py around the get_file_context call so you do
not emit the misleading "decision" key: remove the "decision" entries from both
the failure branch (where you currently print {"decision":"block", "reason":"No
information found..."}) and the success branch (where you print
{"decision":"block", ... "hookSpecificOutput": {...}}), and instead only emit a
clear "reason" string for the failure and the "reason" plus "hookSpecificOutput"
with file_context for the success case (leave get_file_context, file_context,
and hookSpecificOutput intact).
In @.codex/docs/get_context.py:
- Around line 520-522: The except block "except (json.JSONDecodeError,
ValueError, OSError): pass" silently swallows failures; update it to at minimum
log the exception (e.g., using logging.getLogger(__name__).debug/exception or
sys.stderr.write) so malformed JSON or unexpected OSErrors are visible in debug
runs, or re-raise non-expected exceptions; specifically modify the except block
in get_context.py (the "except (json.JSONDecodeError, ValueError, OSError)"
handler) to log the exception details and context (the raw input) before
continuing or to re-raise after logging for truly unexpected errors.
- Around line 500-506: The current branch builds an output dict with "decision":
"block" but a misleading "Proceed as normal" reason; change the behavior so that
when not success you either emit an empty JSON object to allow silent
continuation or provide a real block message—e.g., replace the output assignment
that sets {"decision":"block",...} with an empty dict (print(json.dumps({})) and
sys.exit(0)) to silently proceed, or keep "decision":"block" but change the
"reason" to a real corrective message; update the code that constructs/prints
the output variable (the output dict and the print/json.dumps call) accordingly.
In `@AGENTS.md`:
- Around line 204-205: The example line uses backticks nested inside an inline
code span which breaks rendering; update the example in AGENTS.md by replacing
the inline code span containing backticks with a fenced code block (triple
backticks) and move the example line `- `filename` - short description. Key:
`construct1`, `construct2` [CATEGORY]` into that block so the inner backticks
are shown literally and the markdown renders cleanly; ensure the surrounding
explanatory text and any details/summary HTML remain intact.
In `@CLAUDE.md`:
- Around line 204-205: The example entry using nested backticks (`- `filename` -
short description. Key: `construct1`, `construct2` [CATEGORY]`) renders poorly;
replace the inline nested code with a fenced code block example and remove the
inner backticks so the sample is shown verbatim. Locate the section that starts
"The file is organized by directory (## headers), with each file listed as:" and
replace the inline example with a fenced triple-backtick block containing the
example list line(s) (e.g., - filename - short description. Key: construct1,
construct2 [CATEGORY]) so the Markdown renders cleanly and is easier to scan.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0f34e98b-cc0d-486a-9f48-298a78225166
📒 Files selected for processing (11)

- .claude/docs/files.json
- .claude/docs/folders.json
- .claude/docs/general_index.md
- .claude/docs/get_context.py
- .codex/docs/files.json
- .codex/docs/folders.json
- .codex/docs/general_index.md
- .codex/docs/get_context.py
- .codex/rules/default.rules
- AGENTS.md
- CLAUDE.md
```python
if len(insights) > 15:
    insights = random.sample(insights, 15)
```
Make the context output deterministic.
random.sample(...) means the same file or folder can return different “critical” context on different runs. For a mandatory pre-edit tool, that’s lowkey risky: agents can miss different pitfalls each time. Prefer stable ordering + truncation, or an explicit priority field.
possible deterministic fix

```diff
-    if len(insights) > 15:
-        insights = random.sample(insights, 15)
+    insights = insights[:15]
```

Also applies to: 289-290
```python
success, file_context = get_file_context(file_path, docs_dir)

if not success:
    output = {
        "decision": "block",
        "reason": "No information found for this file in files.json. Proceed as normal.",
    }
    print(json.dumps(output))
    sys.exit(0)

# Output JSON with additional context for Claude
output = {
    "decision": "block",
    "reason": "Relevant information about the file you are reading was found. Appending it to the conversation.",
    "hookSpecificOutput": {
        "hookEventName": "PostToolUse",
        "additionalContext": file_context,
    },
}

print(json.dumps(output))
```
🧩 Analysis chain
🌐 Web query:
What are the valid response fields for Claude Code PostToolUse hooks, and what does decision: "block" do versus allowing the tool result to continue?
💡 Result:
Valid JSON response fields for Claude Code PostToolUse hooks

Universal fields (work for all hook events):

- `continue` (boolean, default `true`) — if `false`, Claude stops processing entirely after the hook runs (takes precedence over any event-specific decisions) [1]
- `stopReason` (string) — message shown to the user when `continue: false` (not shown to Claude) [1]
- `suppressOutput` (boolean, default `false`) — omits stdout from the debug log [1]
- `systemMessage` (string) — warning message shown to the user [1]

Top-level decision fields used by PostToolUse:

- `decision` — only supported value is `"block"`; omit `decision` to allow normal flow [1]
- `reason` — explanation shown to Claude when `decision: "block"` [2]

PostToolUse-specific fields:

- `hookSpecificOutput: { hookEventName: "PostToolUse", additionalContext: string }` — adds extra context for Claude to consider [2]
- `updatedMCPToolOutput` — MCP tools only; replaces the tool's output with the provided value [2]

What `decision: "block"` does (vs allowing the tool result to continue)

In a PostToolUse hook, `decision: "block"` does not undo or prevent the tool call (the tool already succeeded); it prompts Claude with your reason as feedback, effectively forcing Claude to address the issue before continuing its normal reasoning/next actions. If you omit `decision`, the run proceeds normally and `reason` is ignored. [2][1]

Sources: [1] https://code.claude.com/docs/en/hooks (JSON output / decision control table) • [2] https://code.claude.com/docs/en/hooks (PostToolUse decision control section)
Hook response semantics are contradictory here.
`decision: "block"` doesn't actually stop tool execution—the tool already ran. According to Claude Code docs, it prompts Claude with the reason as feedback before it continues reasoning. So using `decision: "block"` with reason "Proceed as normal" is kinda backwards. Either drop `decision` entirely to truly proceed without intervention, or if you're keeping `decision: "block"`, rethink the reason since it's supposed to present an issue Claude needs to address, not reassure it to move on.
```python
if not success:
    output = {
        "decision": "block",
        "reason": "No information found for this file in files.json. Proceed as normal.",
    }
    print(json.dumps(output))
    sys.exit(0)
```
🧩 Analysis chain
🌐 Web query:
Claude Code hooks PostToolUse decision types block proceed
💡 Result:
For Claude Code PostToolUse hooks, there aren't multiple "decision types" like block vs proceed.

- The only supported `decision` value is `"block"` (must be paired with a `reason`). In PostToolUse, this does not undo the tool call (it already ran); it just prompts Claude with your feedback (reason/additionalContext). [1]
- To proceed/allow, you omit `decision` entirely (or exit 0 with no JSON). There is no `decision: "proceed"`. [1]
- Separately, there's a universal top-level field `continue` (defaults to `true`). If you set `"continue": false`, Claude stops processing after the hook runs (takes precedence over event-specific decision fields). [2]

Minimal examples

Proceed (default):

```json
{}
```

Block-with-feedback (PostToolUse):

```json
{ "decision": "block", "reason": "Run tests / fix lint errors before continuing." }
```

Stop everything:

```json
{ "continue": false, "stopReason": "Build failed; stopping." }
```

Sources: Claude Code Hooks docs (Decision control + PostToolUse decision control). [1][2]
"block" decision when no context found is contradictory
returning "decision": "block" with reason "Proceed as normal" is semantically backwards. according to the PostToolUse protocol, "block" is meant to prompt Claude with feedback after the tool already ran — not to proceed silently. if the intent is to let things continue without feedback, you should return {} (empty JSON) instead, or omit the decision field entirely.
the current code signals a block while saying proceed, which is kinda cursed. either return empty to silently proceed, or keep "block" but provide actual corrective feedback for Claude.
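Putting the two branches together under the documented semantics, the hook's output could be built roughly like this (a sketch, not the PR's code; `build_hook_output` is an illustrative helper): return `{}` when nothing was found, and omit `decision` entirely when appending context.

```python
import json


def build_hook_output(success, file_context):
    # Per the documented PostToolUse semantics: omit "decision" entirely
    # to let the run proceed; attach context via hookSpecificOutput.
    if not success:
        return {}  # empty object: proceed silently, no feedback to Claude
    return {
        "hookSpecificOutput": {
            "hookEventName": "PostToolUse",
            "additionalContext": file_context,
        },
    }


if __name__ == "__main__":
    # No context found: empty JSON means "proceed normally" per the docs
    print(json.dumps(build_hook_output(False, "")))
```

The hook would then emit `json.dumps(build_hook_output(ok, ctx))` in both branches instead of the contradictory `"decision": "block"` + "Proceed as normal" pairing.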
```python
except (json.JSONDecodeError, ValueError, OSError):
    # Not valid JSON or not hook input
    pass
```
silent exception swallowing could mask real errors
catching `json.JSONDecodeError`, `ValueError`, `OSError` and just passing means if there's a legitimate error in hook input processing (like malformed JSON that's almost valid), it silently falls through to the usage error.
might want to at least log to stderr when this happens in debug scenarios, or only catch specific expected failure modes.
suggestion: log unexpected errors

```diff
-except (json.JSONDecodeError, ValueError, OSError):
-    # Not valid JSON or not hook input
-    pass
+except json.JSONDecodeError:
+    # No stdin or not valid JSON - expected in standalone mode
+    pass
+except (ValueError, OSError) as e:
+    # Unexpected error during hook processing
+    print("Warning: hook processing failed: {}".format(e), file=sys.stderr)
```
```text
The file is organized by directory (## headers), with each file listed as:
`- `filename` - short description. Key: `construct1`, `construct2` [CATEGORY]`
```
The example line won’t render right.
This nests backticks inside an inline code span, so the markdown comes out kinda cursed and the example is hard to read. A fenced block is cleaner here.
nit: cleaner markdown

````diff
 The file is organized by directory (## headers), with each file listed as:
-`- `filename` - short description. Key: `construct1`, `construct2` [CATEGORY]`
+
+```text
+- `filename` - short description. Key: `construct1`, `construct2` [CATEGORY]
+```
````
```text
The file is organized by directory (## headers), with each file listed as:
`- `filename` - short description. Key: `construct1`, `construct2` [CATEGORY]`
```
The example formatting is broken here.
Nested backticks inside the outer inline code span won’t render as intended, so the “shape” of the index entry is harder to scan than it should be. A fenced example block is cleaner.
nit: cleaner markdown

````diff
 The file is organized by directory (## headers), with each file listed as:
-`- `filename` - short description. Key: `construct1`, `construct2` [CATEGORY]`
+
+```text
+- `filename` - short description. Key: `construct1`, `construct2` [CATEGORY]
+```
````
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 205-205: Spaces inside code span elements (MD038, no-space-in-code)
[warning] 205-205: Spaces inside code span elements (MD038, no-space-in-code)
[warning] 205-205: Spaces inside code span elements (MD038, no-space-in-code)
What does this PR do?
If you use or plan to use AI coding agents on this repo, this PR will make them noticeably more effective.
The core contribution is the knowledge base — files under `.claude/docs/` and `.codex/docs/` that capture repo-specific knowledge extracted from the codebase and its history:

When an agent opens a file, it calls `.claude/docs/get_context.py <file>` to retrieve the relevant context before making changes. The `AGENTS.md` and `CLAUDE.md` files are entry points that instruct agents to do this.

For example, before touching `src/components/video-editor/VideoEditor.tsx`, an agent would immediately know:

For detailed results and evaluation artifacts, see:
We'd love to hear whether this is useful for your workflow, and we're happy to address any concerns (the content, the delivery method, or anything else). Happy to discuss here or at codeset.ai.
These files are plain text, do not affect runtime behavior, and introduce no dependency on Codeset services. They can be safely ignored or removed at any time.
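For context on how such a script is typically wired in, a Claude Code PostToolUse hook can be registered in a settings file. This fragment is a hypothetical sketch — the `Read` matcher and command path are assumptions for illustration, not configuration taken from this PR:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Read",
        "hooks": [
          {
            "type": "command",
            "command": "python .claude/docs/get_context.py"
          }
        ]
      }
    ]
  }
}
```

With a registration like this, the script's hook mode (reading tool input from stdin and emitting the JSON discussed in the review above) would run after each matched tool call.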
Summary by CodeRabbit
Documentation
Chores
Note: These are internal documentation and development tooling updates with no visible impact on the application.