Add deps_grep and deps_read tools for searching dependency jars#147
📝 Walkthrough
Adds three read-only MCP tools (deps-grep, deps-read, deps-list) plus supporting modules for classpath/jar inspection, sources downloading/caching, jar utilities, and a shared shell binary probe; also refactors grep availability to use the new shell helper.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Tool as deps-grep Tool
participant Core as deps-grep Core
participant ClassPath as Classpath Resolver
participant Sources as Sources Manager
participant Jars as Jar Inspector
participant Search as Search Engine
User->>Tool: execute(pattern, opts)
Tool->>Tool: validate-inputs(project-dir, pattern)
Tool->>Core: deps-grep(project-dir, pattern, opts)
Core->>ClassPath: cached-base-jars(project-dir)
ClassPath-->>Core: jar-list
alt needs Java sources
Core->>Sources: ensure-sources-jars!(jar-list)
Sources-->>Core: jars-with-sources
end
Core->>Jars: list-jar-entries(jar)
Jars-->>Core: entries
Core->>Jars: filter-entries(entries, glob/type)
loop each entry
Core->>Search: search-jar-entry(jar, entry, pattern, opts)
alt rg available
Search->>Search: search-jar-entry-rg (ripgrep)
else
Search->>Search: search-jar-entry-fallback (Clojure regex)
end
Search-->>Core: match-results
end
Core-->>Tool: aggregated-results
Tool->>Tool: format-results(mode)
Tool-->>User: formatted-output
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/clojure_mcp/tools/deps_grep/core.clj`:
- Around line 173-183: Replace the bash pipeline in the search-jar-entry block:
instead of building cmd and calling (shell/sh "bash" "-c" cmd), run unzip
separately to capture unzip-result (using jar-path and entry-path), then invoke
ripgrep by calling shell/sh with "rg" and rg-opts and pattern, passing the unzip
output via the :in keyword argument (use apply/concat to construct args) so
shell/sh receives :in as a proper keyword; reference rg-opts, pattern, jar-path,
entry-path and the surrounding let where cmd/result are defined to locate and
modify the code accordingly.
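A minimal sketch of the `:in` approach described above, assuming the unzipped entry text is already available as a string. The helper names `rg-stdin-args` and `search-text-with-rg` are hypothetical and not taken from the PR; the point is only that `:in` must reach `shell/sh` as a keyword argument rather than as part of a shell command string.

```clojure
(require '[clojure.java.shell :as shell])

(defn rg-stdin-args
  "Hypothetical helper: builds the argument list for `shell/sh` so that
  ripgrep reads `input` from stdin instead of from a bash pipeline.
  `-e` marks the pattern explicitly and `-` asks rg to search stdin."
  [rg-opts pattern input]
  (concat ["rg"] rg-opts ["-e" pattern "-" :in input]))

(defn search-text-with-rg
  "Runs ripgrep over `input` (for example, the :out of a separate unzip
  call). Returns the usual shell/sh map of :exit, :out and :err."
  [rg-opts pattern input]
  (apply shell/sh (rg-stdin-args rg-opts pattern input)))
```

Because no shell is involved, the pattern and paths never pass through shell quoting, which is the main reason to prefer this over `bash -c`.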
In `@src/clojure_mcp/tools/deps_read/core.clj`:
- Around line 44-59: Validate the :offset and :limit options at the start of the
function that accepts [jar-path entry-path & {:keys [offset limit
max-line-length] :or {offset 0 max-line-length 2000}}]: ensure offset is a
non-negative integer (>= 0) and if limit is provided it is a positive integer (>
0); if either check fails throw an ex-info with a clear message and include the
invalid value(s) in the ex-info map (e.g., {:offset offset :limit limit}) so
callers get a clear error instead of confusing negative indexing or silent empty
output.
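One possible shape for this check, as a sketch only. The function name `validate-slice-opts!` and the exact error messages are assumptions, not the PR's code.

```clojure
(defn validate-slice-opts!
  "Hypothetical validator: returns the normalized options or throws
  ex-info carrying the offending values in its data map."
  [{:keys [offset limit] :or {offset 0}}]
  (when-not (and (integer? offset) (>= offset 0))
    (throw (ex-info "offset must be a non-negative integer"
                    {:offset offset :limit limit})))
  ;; limit is optional, so only validate it when it was supplied
  (when (and (some? limit)
             (not (and (integer? limit) (pos? limit))))
    (throw (ex-info "limit must be a positive integer"
                    {:offset offset :limit limit})))
  {:offset offset :limit limit})
```

Calling it at the top of the reading function means a bad `:offset` fails immediately instead of surfacing later as empty output.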
In `@src/clojure_mcp/tools/deps_sources/core.clj`:
- Around line 32-35: The regex in the re-find call fails on Windows because
jar-path may contain backslashes; normalize jar-path by converting backslashes
to forward slashes before matching (e.g., create a normalized-path from jar-path
and use it in the re-find), then proceed with the existing destructuring (match
-> [_ group-path artifact version jar-name]) and group-id computation
(str/replace group-path "/" ".") so Windows .m2/repository paths are parsed
correctly.
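A sketch of the normalization step. The regex below is an assumed stand-in for the one in the PR (which is not shown in this thread), and `parse-maven-coords` is a hypothetical name; only the backslash-to-slash normalization and the `group-path` to `group-id` conversion come from the comment above.

```clojure
(require '[clojure.string :as str])

(defn parse-maven-coords
  "Hypothetical parser: extracts Maven coordinates from a jar path under
  .m2/repository, accepting both / and \\ as separators."
  [jar-path]
  (let [normalized-path (str/replace jar-path "\\" "/")]
    (when-let [[_ group-path artifact version jar-name]
               (re-find #"\.m2/repository/(.+)/([^/]+)/([^/]+)/([^/]+\.jar)$"
                        normalized-path)]
      {:group-id (str/replace group-path "/" ".")
       :artifact artifact
       :version  version
       :jar-name jar-name})))
```

With string arguments, `str/replace` performs a literal replacement, so `"\\"` here is a single backslash and needs no regex escaping.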
🧹 Nitpick comments (2)
src/clojure_mcp/tools/deps_read/core.clj (1)

54-68: Avoid reading entire entry when only a slice is requested.
Line 54 reads the full entry into memory and only then applies `:offset`/`:limit`, which can be costly for large sources. Consider streaming the entry (e.g., via `java.util.zip.ZipFile` + `line-seq`) and stopping once the limit is reached.

src/clojure_mcp/tools/deps_grep/tool.clj (1)

69-105: Consider renaming `count` binding to avoid shadowing `clojure.core/count`.
The destructured `count` binding on line 73 shadows `clojure.core/count`, requiring explicit qualification on line 82. A clearer approach would be to rename the binding.

♻️ Suggested rename to avoid shadowing
 (defmethod tool-system/format-results :deps-grep [_ result]
   (if (:error result)
     {:result [(:error result)] :error true}
-    (let [{:keys [results count truncated]} result]
+    (let [{:keys [results truncated] match-count :count} result]
       {:result [(cond
                   ;; Count mode
                   (contains? result :count)
-                  (str "Found " count " matches"
+                  (str "Found " match-count " matches"
                        (when truncated " (truncated)"))
                   ;; Files with matches mode
                   (and (seq results) (not (contains? (first results) :matches)))
-                  (str "Found " (clojure.core/count results) " files with matches"
+                  (str "Found " (count results) " files with matches"
- deps_grep: Search patterns in dependency jars on the classpath
  - Uses `clojure -Spath` to resolve exact dependency jars (cached)
  - Searches inside jars with unzip + regex matching
  - Supports glob/type filters, output modes (content/files/count)
  - Returns jar:entry paths for use with deps_read
- deps_read: Read files from inside dependency jars
  - Takes file_path in jar:entry format (from deps_grep results)
  - Supports offset/limit for large files (mirrors read_file API)
  - Returns content with line numbers

- Format line numbers with arrow (→) to match Read tool style
- Remove header from deps_read output (just content with line numbers)
- deps_grep now also searches *-sources.jar files when available
- Enables searching Java source code in dependencies

- New deps-sources namespace for Maven coordinate parsing and source jar downloading
- Downloads sources from Maven Central to ~/.clojure-mcp/deps_cache/
- Negative cache tracks jars without sources to avoid repeated download attempts
- Only fetches Java sources when --type java or --glob "*.java" is specified
- Memoizes jar lists by [project-dir java-sources?] for fast subsequent lookups
- Parallel downloads using pmap for performance

- Use ripgrep for searching with context/multiline support, fallback to Clojure regex
- Fix path separator for Windows compatibility (use File/pathSeparator)
- Add shared binary-available? utility in clojure-mcp.utils.shell
- Only cache 404s in negative cache, not transient network errors
- Add binary availability checks with helpful error messages
- Document external dependencies (clojure, unzip, rg, curl)
- Consolidate binary checking across grep, deps_grep, deps_sources

- binary-available? now accepts optional probe args (defaults to --help)
- Use -Sdescribe for clojure, -v for unzip
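
A rough sketch of what a probe with optional args could look like. This is an assumption about the general shape, not the code in `clojure-mcp.utils.shell`; in particular, the real utility may also inspect the exit code, which this version ignores.

```clojure
(require '[clojure.java.shell :as shell])

(defn binary-available?
  "Sketch: true when `binary` can be launched at all. `probe-args`
  defaults to [\"--help\"], matching the commit message above."
  ([binary] (binary-available? binary ["--help"]))
  ([binary probe-args]
   (try
     (apply shell/sh binary probe-args)
     true
     ;; the JVM raises IOException when the executable cannot be started
     (catch java.io.IOException _
       false))))
```

Under this sketch the per-tool probes would read `(binary-available? "clojure" ["-Sdescribe"])` and `(binary-available? "unzip" ["-v"])`.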
deps_grep now requires a library parameter (Maven group or group/artifact) to scope searches to specific dependencies, avoiding bulk source jar downloads. deps_list lets users discover available library coordinates.
b5d8784 to b96e455
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/clojure_mcp/tools/deps_grep/core.clj`:
- Around line 147-161: The glob-to-regex logic in glob-matches? currently only
escapes '.' and leaves other regex metacharacters unescaped, so update the
conversion to quote non-wildcard parts using java.util.regex.Pattern/quote (or
an equivalent escaping routine) before applying glob-specific replacements;
i.e., construct pattern-regex by first quoting the pattern, then unquoting or
replacing the quoted wildcard tokens so '*' becomes '.*' and handle the {a,b}
alternation by expanding or replacing the quoted brace section with a proper
'(a|b)' group, ensuring all other characters remain safely quoted; change
references in the function glob-matches? (pattern-regex construction and its use
in re-pattern) accordingly.
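A sketch of one way to do this, assuming the glob dialect is limited to `*` and a single-level `{a,b}` alternation as described above. Rather than quoting and then unquoting, it tokenizes the glob and quotes only the literal tokens; `glob->regex` is a hypothetical helper name.

```clojure
(require '[clojure.string :as str])
(import 'java.util.regex.Pattern)

(defn glob->regex
  "Hypothetical converter: `*` becomes `.*`, `{a,b}` becomes `(?:a|b)`,
  and every other run of characters is quoted literally."
  [glob]
  (let [tokens (re-seq #"\*|\{[^{}]*\}|[^*{]+|\{" glob)]
    (re-pattern
     (apply str
            (for [t tokens]
              (cond
                (= t "*") ".*"

                ;; a complete {a,b} group: quote each alternative separately
                (and (str/starts-with? t "{") (str/ends-with? t "}"))
                (str "(?:"
                     (str/join "|" (map #(Pattern/quote %)
                                        (str/split (subs t 1 (dec (count t))) #",")))
                     ")")

                ;; plain text, including regex metacharacters such as + or (
                :else (Pattern/quote t)))))))

(defn glob-matches?
  "True when the whole `path` matches `glob`."
  [glob path]
  (boolean (re-matches (glob->regex glob) path)))
```

As in the original conversion, `*` here also matches path separators.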
In `@src/clojure_mcp/tools/deps_list/core.clj`:
- Around line 23-24: The pattern compilation in the let binding (symbol
pattern-re using re-pattern on (:pattern opts)) can throw
java.util.regex.PatternSyntaxException for invalid user input; wrap the
re-pattern call in a try/catch that catches PatternSyntaxException and re-raises
a clearer error (e.g., via (throw (ex-info ... {:pattern p} e)) or returns a
user-friendly message) so callers see which pattern failed; update the binding
of pattern-re to perform this guarded compilation and include the offending
pattern string from opts in the error context.
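For illustration, the guarded compilation could look roughly like this. `compile-user-pattern` is a hypothetical name and the message wording is an assumption.

```clojure
(import 'java.util.regex.PatternSyntaxException)

(defn compile-user-pattern
  "Sketch: compiles a user-supplied regex string, rethrowing syntax
  errors as ex-info that names the offending pattern."
  [p]
  (try
    (re-pattern p)
    (catch PatternSyntaxException e
      ;; keep the original exception as the cause for debugging
      (throw (ex-info (str "Invalid regex pattern: " p)
                      {:pattern p}
                      e)))))
```

The `let` binding would then become something like `pattern-re (some-> (:pattern opts) compile-user-pattern)`, assuming the pattern is optional.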
In `@src/clojure_mcp/tools/deps_list/tool.clj`:
- Around line 1-6: The namespace clojure-mcp.tools.deps-list.tool is calling
clojure.string/join without declaring it; update the ns :require to include
[clojure.string :as string] and change the bare qualified call to use
string/join where used (e.g., in the function or expression around the core
usage that currently calls clojure.string/join). Ensure the new alias is used
consistently in this namespace (clojure-mcp.tools.deps-list.tool) for any other
clojure.string functions.
🧹 Nitpick comments (5)
src/clojure_mcp/tools/deps_sources/core.clj (3)

89-98: Indentation inconsistency: inner `let` is under-indented.
Lines 92–93 (outer `let` body) are at 6-space indent, but the inner `let` on line 94 drops to 4-space. This breaks the expected 2-space indentation alignment within the outer `let` body. As per coding guidelines: "Use 2-space indentation in Clojure code and maintain whitespace in edited forms."

♻️ Proposed fix
       (log/debug "Downloading sources jar:" url)
       (.mkdirs dest-dir)
-    (let [;; Download and capture HTTP status code
-          result (shell/sh "curl" "--silent" "--location"
+      (let [;; Download and capture HTTP status code
+            result (shell/sh "curl" "--silent" "--location"

147-150: `cached-sources-path` called twice: the `let` binding in the `cond` test is not visible in the result expression.
The `let` on line 148 creates a local `cached` binding for the existence check, but line 150 recomputes `cached-sources-path` because `cond` test/result pairs don't share bindings. Restructure to avoid the redundant computation.

♻️ Proposed fix
-        ;; Check our cache
-        (let [cached (cached-sources-path coords)]
-          (.exists cached))
-        (.getAbsolutePath (cached-sources-path coords))
+        ;; Check our cache
+        (.exists (cached-sources-path coords))
+        (.getAbsolutePath (cached-sources-path coords))

Or better, use an `if-let`/`let` to bind once:
-        ;; Check our cache
-        (let [cached (cached-sources-path coords)]
-          (.exists cached))
-        (.getAbsolutePath (cached-sources-path coords))
-
-        ;; Download from Maven Central
-        :else
-        (download-sources-jar! coords)))))))
+        ;; Check our cache or download
+        :else
+        (let [cached (cached-sources-path coords)]
+          (if (.exists cached)
+            (.getAbsolutePath cached)
+            (download-sources-jar! coords))))))))

59-71: `load-no-sources-set` reads without locking while `save-no-sources!` writes with `locking`.
Under `pmap` parallelism in `ensure-sources-jars!`, concurrent reads and writes to the same file can produce partial reads. The practical impact is low (worst case: a redundant 404 download), but for correctness you could use the same `locking` monitor in `load-no-sources-set` or load the set once before the `pmap` call in `ensure-sources-jars!`.

src/clojure_mcp/tools/deps_grep/core.clj (2)

193-198: Extraneous `|` in character class `[:|-]`.
Ripgrep output separators are `:` (match) and `-` (context). The `|` in the character class is unnecessary and slightly misleading. Use `[:\-]` or `[:-]` instead.

♻️ Proposed fix
-        (re-matches #"(\d+)([:|-])(.*)$" line)]
+        (re-matches #"(\d+)([:-])(.*)$" line)]

244-316: Core search loop with mutable atoms is functional but complex.
The `deps-grep` function uses three atoms (`all-results`, `result-count`, `limit-reached`) coordinated across nested `doseq` loops. This works correctly for single-threaded execution, but the cognitive complexity is notable. Consider extracting the per-jar search into a separate function and using `reduce`/`reduced` for early termination, which would be more idiomatic Clojure. This is optional; the current approach is correct.
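
To make the `reduce`/`reduced` suggestion concrete, here is a small sketch. `collect-matches` and its `search-fn` argument are hypothetical stand-ins for the per-jar search, and the result keys are illustrative rather than the tool's actual output shape.

```clojure
(defn collect-matches
  "Sketch: accumulates matches jar by jar and stops as soon as `limit`
  results are collected. `search-fn` (hypothetical) takes a jar and
  returns a sequence of matches."
  [search-fn jars limit]
  (reduce
   (fn [{:keys [results] :as acc} jar]
     (let [remaining (- limit (count results))
           updated   (into results (take remaining (search-fn jar)))]
       (if (>= (count updated) limit)
         ;; `reduced` ends the reduction early, so later jars are never searched
         (reduced {:results updated :limit-reached true})
         (assoc acc :results updated))))
   {:results [] :limit-reached false}
   jars))
```

The accumulator map replaces the three atoms: the results vector carries the count implicitly, and `:limit-reached` is set only on the early-exit path.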
Replace unzip/bash/curl shell dependencies with pure Java implementations for cross-platform compatibility. Add shared jar-utils namespace using java.util.zip.ZipFile for jar reading. Add Java HTTP fallback for source jar downloads with proper connection cleanup. Fix Maven path regex to handle Windows backslashes.
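
As a rough illustration of jar reading with `java.util.zip.ZipFile`, the sketch below streams one entry and keeps only the requested slice of lines. `read-jar-entry-lines` is a hypothetical name and is not the jar-utils API from this commit.

```clojure
(require '[clojure.java.io :as io])
(import 'java.util.zip.ZipFile)

(defn read-jar-entry-lines
  "Sketch: returns a vector of lines from `entry-path` inside `jar-path`,
  skipping `offset` lines and keeping at most `limit` (nil = no limit)."
  [jar-path entry-path offset limit]
  (with-open [zip (ZipFile. (io/file jar-path))]
    (if-let [entry (.getEntry zip entry-path)]
      (with-open [rdr (io/reader (.getInputStream zip entry))]
        ;; realize the slice into a vector before the reader is closed
        (cond->> (drop offset (line-seq rdr))
          limit (take limit)
          true  (into [])))
      (throw (ex-info "Entry not found in jar"
                      {:jar jar-path :entry entry-path})))))
```

Because `line-seq` is lazy, reading stops once `limit` lines are collected, so a large entry is not fully loaded when only a slice is requested.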

- Validate offset/limit in deps_read (reject negative values)
- Escape regex metacharacters in glob-to-regex conversion
- Wrap invalid regex pattern in deps_list with helpful error
- Add missing clojure.string require in deps_list tool
Summary
- `deps_grep` tool - Search for patterns inside dependency jar files on the classpath
- `deps_read` tool - Read files from inside dependency jars (using `jar-path:entry-path` format)
- `--type java`)
- `binary-available?` utility - Consolidated binary checking across grep, deps_grep, deps_sources

Features
- `~/.clojure-mcp/deps_cache/` with negative cache for 404s

Requirements
- `clojure` CLI, `unzip`
- `rg` (ripgrep) for context/multiline, `curl` for Java source downloads

Test plan