Add deps_grep and deps_read tools for searching dependency jars#147

Merged: bhauman merged 8 commits into main from deps-grep-read-tools on Feb 15, 2026

Owner

@bhauman bhauman commented Jan 22, 2026

Summary

  • New deps_grep tool - Search for patterns inside dependency jar files on the classpath
  • New deps_read tool - Read files from inside dependency jars (using jar-path:entry-path format)
  • Lazy Java source downloading - Automatically fetches source jars from Maven Central when searching Java files (--type java)
  • Shared binary-available? utility - Consolidated binary checking across grep, deps_grep, deps_sources
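
To make the jar-path:entry-path format concrete, here is a minimal sketch of how such a string could be split. The helper name `parse-jar-entry` and the choice to split on the first `.jar:` are assumptions for illustration, not the code from this PR; splitting there keeps a Windows drive letter such as `C:` from being mistaken for the separator.

```clojure
(require '[clojure.string :as str])

(defn parse-jar-entry
  "Hypothetical helper: splits a jar-path:entry-path string into its parts.
  Returns nil when the input contains no '.jar:' marker."
  [file-path]
  (when-let [idx (str/index-of file-path ".jar:")]
    ;; cut points at the ':' that follows the '.jar' suffix
    (let [cut (+ idx (count ".jar"))]
      {:jar-path (subs file-path 0 cut)
       :entry-path (subs file-path (inc cut))})))
```

For example, `(parse-jar-entry "/home/me/.m2/lib-1.0.jar:lib/core.clj")` would yield the jar path and `lib/core.clj` as separate keys.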

Features

  • Uses ripgrep for searching, with context and multiline support; falls back to Clojure regex when rg is unavailable
  • Downloads sources to ~/.clojure-mcp/deps_cache/ with negative cache for 404s
  • Platform-safe path separator for Windows compatibility
  • Memoized jar lists for fast subsequent searches
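
The negative-cache rule can be sketched as a pure function. This is an illustration only: the function names and the use of an in-memory set are assumptions, and the PR's actual cache lives on disk under ~/.clojure-mcp/deps_cache/. The point shown is that only a definitive 404 is remembered, so a transient failure does not block a later retry.

```clojure
(defn remember-missing-sources
  "Hypothetical helper: returns the updated negative-cache set.
  Only an HTTP 404 marks coords as having no sources jar; any other
  status (or nil for a network error) leaves the set unchanged."
  [no-sources coords http-status]
  (if (= 404 http-status)
    (conj no-sources coords)
    no-sources))

(defn should-download?
  "True when coords have not been recorded as lacking a sources jar."
  [no-sources coords]
  (not (contains? no-sources coords)))
```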

Requirements

  • Required: clojure CLI, unzip
  • Optional: rg (ripgrep) for context/multiline, curl for Java source downloads

Test plan

  • All 285 tests pass
  • Tested deps_grep with Clojure files
  • Tested deps_grep with Java files (source download)
  • Tested ripgrep fallback path
  • Verified negative cache only stores 404s
  • Verified binary availability checks

Summary by CodeRabbit

  • New Features

    • Search inside dependency jars with filters, multiple output modes, context and optional source-jar support.
    • Browse and read files inside dependency jars with pagination, line numbers and entry parsing.
    • Enumerate and filter resolved project dependencies.
  • Utilities

    • Memoized binary-availability check for external tooling.
    • Jar inspection and sources-jar discovery/downloading with caching and negative-cache for missing sources.
  • Chores

    • Registered new dependency tools as read-only tools.

coderabbitai bot commented Jan 22, 2026

Walkthrough

Adds three read-only MCP tools—deps-grep, deps-read, deps-list—plus supporting modules for classpath/jar inspection, sources downloading/caching, jar utilities, and a shared shell binary probe; also refactors grep availability to use the new shell helper.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Tool Registration: src/clojure_mcp/tools.clj | Appends read-only tool symbols: clojure-mcp.tools.deps-grep.tool/deps-grep-tool, clojure-mcp.tools.deps-read.tool/deps-read-tool, clojure-mcp.tools.deps-list.tool/deps-list-tool. |
| deps-grep (core + tool): src/clojure_mcp/tools/deps_grep/core.clj, src/clojure_mcp/tools/deps_grep/tool.clj | New feature: classpath resolution/caching, sources discovery/ensure, jar entry listing, ripgrep-based search with fallback, result aggregation, tool-system registration (schema, validation, execute, format). |
| deps-read (core + tool): src/clojure_mcp/tools/deps_read/core.clj, src/clojure_mcp/tools/deps_read/tool.clj | New feature: list/read jar entries, offset/limit and max-line-length handling, jar:entry parsing, and tool registration for validation/execution/formatting. |
| deps-list (core + tool): src/clojure_mcp/tools/deps_list/core.clj, src/clojure_mcp/tools/deps_list/tool.clj | New feature: resolve classpath jars, parse Maven coords, optional pattern filtering, sort and return dependency list; includes tool registration and formatting. |
| Sources downloader & cache: src/clojure_mcp/tools/deps_sources/core.clj | New module to locate/download -sources.jar files with local cache, negative (404) cache, URL construction, curl/Java download logic, and bulk resolution utilities. |
| Deps common jar utilities: src/clojure_mcp/tools/deps_common/jar_utils.clj | New Java-based jar helpers: list-jar-entries and read-jar-entry with try/catch and resource management. |
| Shell utilities: src/clojure_mcp/utils/shell.clj | New memoized binary-available? helper that probes binaries via shell execution and caches results. |
| Grep tool refactor: src/clojure_mcp/tools/grep/core.clj | Replaces in-file binary-availability cache with delegation to clojure-mcp.utils.shell/binary-available? and updates imports. |
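
As a rough illustration of the shell utility summarized above, a memoized probe might look like the sketch below. The exact signature, the default `--help` probe argument, and treating a zero exit code as "available" are assumptions based on the commit notes in this PR, not a copy of clojure-mcp.utils.shell.

```clojure
(require '[clojure.java.shell :as shell])

(def binary-available?
  "Sketch: returns true when running `binary` with the probe args exits 0.
  Results are memoized per argument list, so each binary is probed once."
  (memoize
   (fn [binary & probe-args]
     (try
       (let [args (or (seq probe-args) ["--help"])
             {:keys [exit]} (apply shell/sh binary args)]
         (zero? exit))
       ;; Launching a missing binary throws IOException
       (catch java.io.IOException _
         false)))))
```

Because `memoize` also caches negative results, a binary installed after the first probe would not be noticed until the process restarts; that trade-off seems acceptable for a long-running tool server but is worth keeping in mind.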

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Tool as deps-grep Tool
    participant Core as deps-grep Core
    participant ClassPath as Classpath Resolver
    participant Sources as Sources Manager
    participant Jars as Jar Inspector
    participant Search as Search Engine

    User->>Tool: execute(pattern, opts)
    Tool->>Tool: validate-inputs(project-dir, pattern)
    Tool->>Core: deps-grep(project-dir, pattern, opts)
    Core->>ClassPath: cached-base-jars(project-dir)
    ClassPath-->>Core: jar-list
    alt needs Java sources
        Core->>Sources: ensure-sources-jars!(jar-list)
        Sources-->>Core: jars-with-sources
    end
    Core->>Jars: list-jar-entries(jar)
    Jars-->>Core: entries
    Core->>Jars: filter-entries(entries, glob/type)
    loop each entry
        Core->>Search: search-jar-entry(jar, entry, pattern, opts)
        alt rg available
            Search->>Search: search-jar-entry-rg (ripgrep)
        else
            Search->>Search: search-jar-entry-fallback (Clojure regex)
        end
        Search-->>Core: match-results
    end
    Core-->>Tool: aggregated-results
    Tool->>Tool: format-results(mode)
    Tool-->>User: formatted-output
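
The regex fallback branch in the diagram can be approximated with plain JVM classes. The following is a sketch under assumptions: the function name `search-entry-lines` and the result shape are invented for illustration, and context lines and multiline matching are left out.

```clojure
(require '[clojure.java.io :as io])
(import '(java.util.zip ZipFile))

(defn search-entry-lines
  "Hypothetical fallback search: returns a vector of
  {:line-num n :content line} for each line of the jar entry that
  matches the regex string `pattern`. Returns [] for a missing entry."
  [jar-path entry-path pattern]
  (let [re (re-pattern pattern)]
    (with-open [zf (ZipFile. (str jar-path))]
      (if-let [entry (.getEntry zf entry-path)]
        (with-open [rdr (io/reader (.getInputStream zf entry))]
          ;; into is eager, so all lines are consumed before the reader closes
          (into []
                (comp (map-indexed (fn [i line]
                                     {:line-num (inc i) :content line}))
                      (filter #(re-find re (:content %))))
                (line-seq rdr)))
        []))))
```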

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇
I hopped through jars and tiny caches bright,
I sniffed for sources, peered at every line,
I chased each pattern in the bytewise night,
New tools now help the code and me align,
A rabbit cheers for deps and search divine.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main addition: two new tools (deps_grep and deps_read) for searching and reading dependency JAR files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Merge Conflict Detection | ✅ Passed | ✅ No merge conflicts detected when merging into main |

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
src/clojure_mcp/tools/deps_read/core.clj (2)

7-7: Unused require: taoensso.timbre.

log is never referenced in this file. Remove to keep imports clean.

Proposed fix
  (:require
   [clojure.string :as str]
   [clojure.java.io :as io]
-  [clojure-mcp.tools.deps-common.jar-utils :as jar-utils]
-  [taoensso.timbre :as log]))
+  [clojure-mcp.tools.deps-common.jar-utils :as jar-utils]))

114-118: Minor: defaults for offset and max-line-length are duplicated.

Lines 116/118 re-specify the same defaults (0 and 2000) that read-jar-entry already declares in its :or map on line 38. If these ever drift, you'll get inconsistent behavior depending on the call path. Consider passing through only the keys that are present in opts instead.

Suggested approach
  ([jar-path entry-path opts]
-  (read-jar-entry jar-path entry-path
-                  :offset (or (:offset opts) 0)
-                  :limit (:limit opts)
-                  :max-line-length (or (:max-line-length opts) 2000))))
+  (apply read-jar-entry jar-path entry-path
+         (mapcat identity (select-keys opts [:offset :limit :max-line-length])))))
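
To see why the select-keys approach keeps the defaults in one place, consider this self-contained sketch. `read-entry` is a stand-in that only echoes the options it received; it is not the PR's read-jar-entry.

```clojure
(defn read-entry
  "Stand-in for a keyword-args reader; returns the effective options."
  [jar-path entry-path & {:keys [offset limit max-line-length]
                          :or {offset 0 max-line-length 2000}}]
  {:offset offset :limit limit :max-line-length max-line-length})

(defn read-entry-with-opts
  "Forwards only the keys present in opts, so the :or defaults above
  apply to anything the caller left out."
  [jar-path entry-path opts]
  (apply read-entry jar-path entry-path
         (mapcat identity
                 (select-keys opts [:offset :limit :max-line-length]))))
```

One caveat: `:or` defaults apply only when a key is absent, so a caller passing `{:offset nil}` would forward the nil rather than get the default of 0.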
src/clojure_mcp/tools/deps_list/core.clj (1)

4-7: clojure.string is required but never used in this file.

The alias str for clojure.string is declared but no str/… qualified call appears anywhere in this namespace. The bare str calls on lines 25 and 27 resolve to clojure.core/str.

♻️ Proposed fix
 (ns clojure-mcp.tools.deps-list.core
   "Core implementation for listing project dependencies.
    Resolves the classpath and parses Maven coordinates from jar paths."
   (:require
-   [clojure.string :as str]
    [clojure-mcp.tools.deps-grep.core :as deps-grep]
    [clojure-mcp.tools.deps-sources.core :as deps-sources]))

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/clojure_mcp/tools/deps_grep/core.clj`:
- Around line 173-183: Replace the bash pipeline in the search-jar-entry block:
instead of building cmd and calling (shell/sh "bash" "-c" cmd), run unzip
separately to capture unzip-result (using jar-path and entry-path), then invoke
ripgrep by calling shell/sh with "rg" and rg-opts and pattern, passing the unzip
output via the :in keyword argument (use apply/concat to construct args) so
shell/sh receives :in as a proper keyword; reference rg-opts, pattern, jar-path,
entry-path and the surrounding let where cmd/result are defined to locate and
modify the code accordingly.

In `@src/clojure_mcp/tools/deps_read/core.clj`:
- Around line 44-59: Validate the :offset and :limit options at the start of the
function that accepts [jar-path entry-path & {:keys [offset limit
max-line-length] :or {offset 0 max-line-length 2000}}]: ensure offset is a
non-negative integer (>= 0) and if limit is provided it is a positive integer (>
0); if either check fails throw an ex-info with a clear message and include the
invalid value(s) in the ex-info map (e.g., {:offset offset :limit limit}) so
callers get a clear error instead of confusing negative indexing or silent empty
output.

In `@src/clojure_mcp/tools/deps_sources/core.clj`:
- Around line 32-35: The regex in the re-find call fails on Windows because
jar-path may contain backslashes; normalize jar-path by converting backslashes
to forward slashes before matching (e.g., create a normalized-path from jar-path
and use it in the re-find), then proceed with the existing destructuring (match
-> [_ group-path artifact version jar-name]) and group-id computation
(str/replace group-path "/" ".") so Windows .m2/repository paths are parsed
correctly.
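
The Windows path fix described in the last item could look roughly like this. The regex below is an assumption, since the original pattern is not shown in this thread; only the normalize-then-match order and the destructuring names come from the review comment.

```clojure
(require '[clojure.string :as str])

(defn parse-maven-coords
  "Sketch: extracts Maven coordinates from a jar path under
  .m2/repository. Backslashes are converted to forward slashes first
  so Windows paths match the same regex. Returns nil on no match."
  [jar-path]
  (let [normalized (str/replace jar-path "\\" "/")]
    (when-let [[_ group-path artifact version jar-name]
               (re-find #"\.m2/repository/(.+)/([^/]+)/([^/]+)/([^/]+\.jar)$"
                        normalized)]
      {:group-id (str/replace group-path "/" ".")
       :artifact artifact
       :version version
       :jar-name jar-name})))
```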
🧹 Nitpick comments (2)
src/clojure_mcp/tools/deps_read/core.clj (1)

54-68: Avoid reading entire entry when only a slice is requested.

Line 54 reads the full entry into memory and only then applies :offset/:limit, which can be costly for large sources. Consider streaming the entry (e.g., via java.util.zip.ZipFile + line-seq) and stopping once the limit is reached.
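
A streaming version of the slice read might be sketched as follows. The function name and positional arguments are hypothetical; the idea is that a `take` transducer stops the reduction once the limit is reached, so lines beyond `offset + limit` are never read from the entry.

```clojure
(require '[clojure.java.io :as io])
(import '(java.util.zip ZipFile))

(defn read-entry-slice
  "Sketch: returns a vector of lines from the jar entry, skipping
  `offset` lines and keeping at most `limit` lines (all remaining
  lines when limit is nil). Returns nil when the entry is missing."
  [jar-path entry-path offset limit]
  (with-open [zf (ZipFile. (str jar-path))]
    (when-let [entry (.getEntry zf entry-path)]
      (with-open [rdr (io/reader (.getInputStream zf entry))]
        (into []
              (if limit
                (comp (drop offset) (take limit))
                (drop offset))
              (line-seq rdr))))))
```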

src/clojure_mcp/tools/deps_grep/tool.clj (1)

69-105: Consider renaming count binding to avoid shadowing clojure.core/count.

The destructured count binding on line 73 shadows clojure.core/count, requiring explicit qualification on line 82. A clearer approach would be to rename the binding.

♻️ Suggested rename to avoid shadowing
 (defmethod tool-system/format-results :deps-grep [_ result]
   (if (:error result)
     {:result [(:error result)]
      :error true}
-    (let [{:keys [results count truncated]} result]
+    (let [{:keys [results truncated] match-count :count} result]
       {:result [(cond
                   ;; Count mode
                   (contains? result :count)
-                  (str "Found " count " matches"
+                  (str "Found " match-count " matches"
                        (when truncated " (truncated)"))

                   ;; Files with matches mode
                   (and (seq results) (not (contains? (first results) :matches)))
-                  (str "Found " (clojure.core/count results) " files with matches"
+                  (str "Found " (count results) " files with matches"

Bruce Hauman added 6 commits February 14, 2026 18:36
- deps_grep: Search patterns in dependency jars on the classpath
  - Uses `clojure -Spath` to resolve exact dependency jars (cached)
  - Searches inside jars with unzip + regex matching
  - Supports glob/type filters, output modes (content/files/count)
  - Returns jar:entry paths for use with deps_read

- deps_read: Read files from inside dependency jars
  - Takes file_path in jar:entry format (from deps_grep results)
  - Supports offset/limit for large files (mirrors read_file API)
  - Returns content with line numbers
- Format line numbers with arrow (→) to match Read tool style
- Remove header from deps_read output (just content with line numbers)
- deps_grep now also searches *-sources.jar files when available
- Enables searching Java source code in dependencies
- New deps-sources namespace for Maven coordinate parsing and source jar downloading
- Downloads sources from Maven Central to ~/.clojure-mcp/deps_cache/
- Negative cache tracks jars without sources to avoid repeated download attempts
- Only fetches Java sources when --type java or --glob "*.java" is specified
- Memoizes jar lists by [project-dir java-sources?] for fast subsequent lookups
- Parallel downloads using pmap for performance
- Use ripgrep for searching with context/multiline support, fallback to Clojure regex
- Fix path separator for Windows compatibility (use File/pathSeparator)
- Add shared binary-available? utility in clojure-mcp.utils.shell
- Only cache 404s in negative cache, not transient network errors
- Add binary availability checks with helpful error messages
- Document external dependencies (clojure, unzip, rg, curl)
- Consolidate binary checking across grep, deps_grep, deps_sources
- binary-available? now accepts optional probe args (defaults to --help)
- Use -Sdescribe for clojure, -v for unzip
deps_grep now requires a library parameter (Maven group or group/artifact)
to scope searches to specific dependencies, avoiding bulk source jar
downloads. deps_list lets users discover available library coordinates.
@bhauman bhauman force-pushed the deps-grep-read-tools branch from b5d8784 to b96e455 Compare February 15, 2026 01:05
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/clojure_mcp/tools/deps_grep/core.clj`:
- Around line 147-161: The glob-to-regex logic in glob-matches? currently only
escapes '.' and leaves other regex metacharacters unescaped, so update the
conversion to quote non-wildcard parts using java.util.regex.Pattern/quote (or
an equivalent escaping routine) before applying glob-specific replacements;
i.e., construct pattern-regex by first quoting the pattern, then unquoting or
replacing the quoted wildcard tokens so '*' becomes '.*' and handle the {a,b}
alternation by expanding or replacing the quoted brace section with a proper
'(a|b)' group, ensuring all other characters remain safely quoted; change
references in the function glob-matches? (pattern-regex construction and its use
in re-pattern) accordingly.

In `@src/clojure_mcp/tools/deps_list/core.clj`:
- Around line 23-24: The pattern compilation in the let binding (symbol
pattern-re using re-pattern on (:pattern opts)) can throw
java.util.regex.PatternSyntaxException for invalid user input; wrap the
re-pattern call in a try/catch that catches PatternSyntaxException and re-raises
a clearer error (e.g., via (throw (ex-info ... {:pattern p} e)) or returns a
user-friendly message) so callers see which pattern failed; update the binding
of pattern-re to perform this guarded compilation and include the offending
pattern string from opts in the error context.

In `@src/clojure_mcp/tools/deps_list/tool.clj`:
- Around line 1-6: The namespace clojure-mcp.tools.deps-list.tool is calling
clojure.string/join without declaring it; update the ns :require to include
[clojure.string :as string] and change the bare qualified call to use
string/join where used (e.g., in the function or expression around the core
usage that currently calls clojure.string/join). Ensure the new alias is used
consistently in this namespace (clojure-mcp.tools.deps-list.tool) for any other
clojure.string functions.
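
For the glob-escaping item above, one possible shape is to tokenize the glob and quote everything that is not a wildcard. This sketch is an assumption about how it could be done, supports only `*` and `{a,b}`, and is not the PR's glob-matches? implementation.

```clojure
(require '[clojure.string :as str])
(import '(java.util.regex Pattern))

(defn glob->regex
  "Sketch: converts a simple glob to a regex. '*' becomes '.*',
  '{a,b}' becomes a non-capturing alternation, and every other
  run of characters is passed through Pattern/quote."
  [glob]
  (->> (re-seq #"\*|\{[^{}]*\}|[^*{]+|\{" glob)
       (map (fn [tok]
              (cond
                (= tok "*") ".*"

                (and (str/starts-with? tok "{") (str/ends-with? tok "}"))
                (let [alts (str/split (subs tok 1 (dec (count tok))) #",")]
                  (str "(?:" (str/join "|" (map #(Pattern/quote %) alts)) ")"))

                ;; Literal text, including an unmatched '{'
                :else (Pattern/quote tok))))
       (apply str)
       re-pattern))
```

With this approach a glob such as `a+b.clj` matches only the literal file name, because `+` and `.` are quoted rather than interpreted as regex operators.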
🧹 Nitpick comments (5)
src/clojure_mcp/tools/deps_sources/core.clj (3)

89-98: Indentation inconsistency — inner let is under-indented.

Lines 92–93 (outer let body) are at 6-space indent, but the inner let on line 94 drops to 4-space. This breaks the expected 2-space indentation alignment within the outer let body. As per coding guidelines: "Use 2-space indentation in Clojure code and maintain whitespace in edited forms."

♻️ Proposed fix
       (log/debug "Downloading sources jar:" url)
       (.mkdirs dest-dir)
-    (let [;; Download and capture HTTP status code
-          result (shell/sh "curl" "--silent" "--location"
+      (let [;; Download and capture HTTP status code
+            result (shell/sh "curl" "--silent" "--location"

147-150: cached-sources-path called twice — the let binding in the cond test is not visible in the result expression.

The let on line 148 creates a local cached binding for the existence check, but line 150 recomputes cached-sources-path because cond test/result pairs don't share bindings. Restructure to avoid the redundant computation.

♻️ Proposed fix
-            ;; Check our cache
-            (let [cached (cached-sources-path coords)]
-              (.exists cached))
-            (.getAbsolutePath (cached-sources-path coords))
+            ;; Check our cache
+            (.exists (cached-sources-path coords))
+            (.getAbsolutePath (cached-sources-path coords))

Or better, use an if-let/let to bind once:

-            ;; Check our cache
-            (let [cached (cached-sources-path coords)]
-              (.exists cached))
-            (.getAbsolutePath (cached-sources-path coords))
-
-            ;; Download from Maven Central
-            :else
-            (download-sources-jar! coords)))))))
+            ;; Check our cache or download
+            :else
+            (let [cached (cached-sources-path coords)]
+              (if (.exists cached)
+                (.getAbsolutePath cached)
+                (download-sources-jar! coords))))))))

59-71: load-no-sources-set reads without locking while save-no-sources! writes with locking.

Under pmap parallelism in ensure-sources-jars!, concurrent reads and writes to the same file can produce partial reads. The practical impact is low (worst case: a redundant 404 download), but for correctness you could use the same locking monitor in load-no-sources-set or load the set once before the pmap call in ensure-sources-jars!.

src/clojure_mcp/tools/deps_grep/core.clj (2)

193-198: Extraneous | in character class [:|-].

Ripgrep output separators are : (match) and - (context). The | in the character class is unnecessary and slightly misleading. Use [:\-] or [:-] instead.

♻️ Proposed fix
-                                              (re-matches #"(\d+)([:|-])(.*)$" line)]
+                                              (re-matches #"(\d+)([:-])(.*)$" line)]

244-316: Core search loop with mutable atoms is functional but complex.

The deps-grep function uses three atoms (all-results, result-count, limit-reached) coordinated across nested doseq loops. This works correctly for single-threaded execution, but the cognitive complexity is notable. Consider extracting the per-jar search into a separate function and using reduce/reduced for early termination, which would be more idiomatic Clojure. This is optional — the current approach is correct.
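
The reduce/reduced suggestion can be illustrated with a small sketch. `search-fn` is a hypothetical per-jar search function supplied by the caller, and the result keys are invented for the example; the accumulator replaces the three atoms, and `reduced` stops the traversal once the limit is hit.

```clojure
(defn collect-matches
  "Sketch: calls (search-fn jar) for each jar in order, accumulating
  at most max-results matches. Stops visiting jars as soon as the
  limit is reached."
  [search-fn jars max-results]
  (reduce (fn [{:keys [results] :as acc} jar]
            (let [room (- max-results (count results))
                  results' (into results (take room) (search-fn jar))]
              (if (>= (count results') max-results)
                (reduced {:results results' :limit-reached true})
                (assoc acc :results results'))))
          {:results [] :limit-reached false}
          jars))
```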

Bruce Hauman added 2 commits February 15, 2026 11:21
Replace unzip/bash/curl shell dependencies with pure Java implementations
for cross-platform compatibility. Add shared jar-utils namespace using
java.util.zip.ZipFile for jar reading. Add Java HTTP fallback for source
jar downloads with proper connection cleanup. Fix Maven path regex to
handle Windows backslashes.
- Validate offset/limit in deps_read (reject negative values)
- Escape regex metacharacters in glob-to-regex conversion
- Wrap invalid regex pattern in deps_list with helpful error
- Add missing clojure.string require in deps_list tool
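
As an illustration of the offset/limit validation mentioned in this commit, a check along the lines the review requested might look like the sketch below. The function name and error messages are assumptions; only the rules (non-negative offset, positive limit when provided, ex-info carrying the bad value) come from the review comment.

```clojure
(defn validate-slice-opts!
  "Sketch: throws ex-info when offset is not a non-negative integer or
  when a provided limit is not a positive integer. Returns true otherwise."
  [offset limit]
  (when-not (and (integer? offset) (>= offset 0))
    (throw (ex-info "offset must be a non-negative integer"
                    {:offset offset})))
  (when (and (some? limit)
             (not (and (integer? limit) (pos? limit))))
    (throw (ex-info "limit must be a positive integer"
                    {:limit limit})))
  true)
```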
@bhauman bhauman merged commit 821050b into main Feb 15, 2026
2 checks passed