fix(cuga_lite): cap bind_tools count and shortlist over the limit by haroldship · Pull Request #203 · cuga-project/cuga-agent

haroldship · 2026-05-11T12:42:36Z

Summary

Adds a provider-safe cap on the number of tools sent to LLM.bind_tools() from cuga_lite, fixing a hard 400 from Groq/OpenAI when the candidate set exceeds the provider's per-request limit (e.g. m3 hockey: 206 tools, mode=all).
When the cap is exceeded, the existing LLM shortlister (PromptUtils.find_tools machinery) ranks the candidates against the first user message and the top-K tools are bound (reserving 1 slot for find_tools when include_find_tools=True).
When shortlisting is impossible (no query, shortlister failure, empty ranking, or hallucinated names), raise an actionable RuntimeError. Silent truncation would corrupt benchmark results comparing native tool-calling vs text-mode — failing loudly is intentional.

Closes #202

New settings

`advanced_features.cuga_lite_bind_tools_max_count`

Default: 128 (matches Groq/OpenAI).
Set to 0 (or negative) to disable the cap for permissive backends (WatsonX, Anthropic via LiteLLM) — also avoids the shortlister LLM round-trip entirely.
Override via DYNACONF_ADVANCED_FEATURES__CUGA_LITE_BIND_TOOLS_MAX_COUNT.

`advanced_features.cuga_lite_bind_tools_pad_to_cap`

Default: false. When the shortlister returns fewer tools than the cap allows, the default behavior is to bind only those tools (often 1–4 on the existing system prompt).
Set to true to pad the shortlister output with the remaining candidates so the cap is fully utilized. Useful for research scenarios that explicitly want mode=all to bind as many tools as the provider will accept.
Off by default because cuga_lite is a code-execution agent: padding pushes the model toward native tool_calls mode, which the code-mode flow doesn't fully exercise (measured on m3 hockey: 0 tool calls with padding vs 5–7 without).
Override via DYNACONF_ADVANCED_FEATURES__CUGA_LITE_BIND_TOOLS_PAD_TO_CAP.

Operational notes

Latency / cost: When the candidate count exceeds max_count, applying the cap incurs one extra LLM round-trip (the shortlister) per call_model invocation. To avoid the round-trip on permissive backends, set cuga_lite_bind_tools_max_count=0.
Loud failures (intentional): When shortlisting cannot run safely — empty user query, shortlister exception, empty ranking, or hallucinated tool names that don't match any candidate — a RuntimeError is raised rather than silently truncating. Benchmark/research runs comparing native tool-calling vs text-mode therefore can't degrade silently; downstream callers should expect to see this error and either widen the cap, fix the shortlister LLM, or refine the user query.

Where the change lives

src/cuga/backend/cuga_graph/nodes/cuga_lite/bind_tools/ (new subpackage)
- cap.py — bind_tools_max_count_from_settings, bind_tools_pad_to_cap_from_settings, apply_bind_tools_cap_and_merge, plus private helpers (_resolve_find_tools_overlay, _build_ranking_pool, _run_shortlister, _materialize_shortlist, _maybe_pad_to_cap).
- __init__.py — re-exports the three public names used by cuga_lite_graph.py.
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
- resolve_model_with_bind_tools stays as orchestration — it imports from bind_tools and uses a local _cap_merge_bound(bound) closure to DRY the four mode={all,apps,tools,apps_and_tools} call sites.
- resolve_model_with_bind_tools gains an optional query: Optional[str] kwarg.
- The top-level except Exception no longer swallows the actionable RuntimeError from the cap helper.
- call_model passes _first_user_message_text(state.chat_messages) through as query.
src/cuga/backend/cuga_graph/nodes/cuga_lite/prompt_utils.py
- New PromptUtils.shortlist_tool_names(query, all_tools, all_apps, llm, top_k, instructions) — returns ranked tool names. Internally reuses the existing ShortListerOutputLite chain (./prompts/shortlister/system.jinja2); injects a top_k instruction.
- New private PromptUtils._build_shortlister_payload(...) — shared payload builder between find_tools and shortlist_tool_names so the two callers can't drift.
src/cuga/config.py
- New validators: cuga_lite_bind_tools_max_count (default 128) and cuga_lite_bind_tools_pad_to_cap (default False).
src/cuga/settings.toml
- Documents both new settings inline with the existing cuga_lite_bind_tools_* block.

Backward compatibility

Users with ≤128 tools see no behavior change (cap path is a no-op, no shortlister call).
Users previously hitting the Groq 400 now get a working shortlisted bind.
Users explicitly wanting old behavior on permissive backends: set CUGA_LITE_BIND_TOOLS_MAX_COUNT=0.
Users explicitly wanting padding on top of shortlisting: set CUGA_LITE_BIND_TOOLS_PAD_TO_CAP=true.

Test plan

pytest tests/unit/test_cuga_lite_bind_tools.py -q — 18 passed (existing + new):
- shortlist runs when over cap (mode=all)
- actionable RuntimeError raised when over cap and no query
- actionable RuntimeError raised when shortlister hallucinates names
- no shortlist call when under cap
- cap=0 disables the cap and binds everything
- include_find_tools reserves a slot (top_k = cap − 1) and find_tools is appended after shortlisting
- pad_to_cap=true pads with remaining candidates up to the cap
ruff check + ruff format --check on all touched files: clean
End-to-end m3 hockey eval (./scripts/eval.sh --benchmark m3 --capability m3_task_2 --domain hockey --max-samples 1 with CUGA_LITE_BIND_TOOLS_MODE=all) — to be exercised by reporter

Summary by CodeRabbit

New Features
- Provider-safe configurable cap on bound tools (default 128, 0 disables).
- LLM-driven shortlisting of top-K tools when candidates exceed the cap; can reserve and always include a discovery tool.
- Shortlisting uses the initial user message for relevance and supports optional padding to fill to the cap.
Bug Fixes
- Replaces silent truncation with clear errors when safe shortlisting cannot run.
Tests
- Expanded tests for cap behavior, shortlisting, padding, discovery-tool inclusion, and shortlister failures.

`bind_tools` cap + `pad_to_cap` cross-provider verification

m3 hockey task (task_2, --max-samples 1) with CUGA_LITE_BIND_TOOLS_MODE=all, cap=128. pad_to_cap per m3.env at run time. Updated by scripts/m3_pad_to_cap_verify.sh.

provider	ran	no exceptions	cap path exercised	padded	pass/total	tool_calls	duration	cuga-agent HEAD	when (UTC)
`groq`	yes	yes	yes	yes (127 padded → 128 bound)	0 / 1 (0.0%)	7/1	47.9s	`6af34316`	2026-05-11T16:24Z
`litellm`	yes	yes	yes	yes (127 padded → 128 bound)	0 / 1 (0.0%)	0/1	--	`acdbf18b`	2026-05-11T16:43Z
`watsonx`	yes	yes	yes	yes (127 padded → 128 bound)	0 / 1 (0.0%)	1/1	184.2s	`acdbf18b`	2026-05-11T17:14Z

`resolve_model_with_bind_tools` blindly passed every candidate tool to `LLM.bind_tools()`, which fails hard on Groq/OpenAI (max 128 tools) for benchmarks like m3-hockey (206 tools). Add `cuga_lite_bind_tools_max_count` (default 128). When the candidate list exceeds the cap, run the existing LLM shortlister against the first user message and bind the top-K most relevant tools (reserving 1 slot for find_tools when `include_find_tools=True`). When shortlisting is impossible (no query / shortlister failure / empty ranking), raise an actionable RuntimeError naming the count, cap, and remedy — silent truncation would corrupt research comparing native tool-calling against text-mode. WatsonX / LiteLLM-to-Anthropic users on permissive backends can disable the cap via `DYNACONF_ADVANCED_FEATURES__CUGA_LITE_BIND_TOOLS_MAX_COUNT=0`. Closes #202 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-11T12:51:38Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR adds a configurable provider-safe cap on native tool binding (default 128). When candidates exceed the cap, an LLM shortlister ranks tools using the first user message as query to pick a top-K subset; the code reserves/merges find_tools when configured and raises errors when safe shortlisting cannot run.

Changes

Tool Binding Cap & LLM Shortlisting

Layer / File(s)	Summary
Configuration Schema `src/cuga/config.py`	Dynaconf validator adds `advanced_features.cuga_lite_bind_tools_max_count` with default value of 128.
Tool Shortlisting Utility `src/cuga/backend/cuga_graph/nodes/cuga_lite/prompt_utils.py`	New `PromptUtils._build_shortlister_payload` serializer and `PromptUtils.shortlist_tool_names` async method rank tools for a query using the shortlister LLM chain, returning deduplicated top-K tool names with support for custom instructions and optional LLM override. `find_tools` now reuses the shared payload builder.
Settings Reader Helpers `src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`	`_bind_tools_max_count_from_settings()` and `_bind_tools_pad_to_cap_from_settings()` read cap and padding opt-in from settings with defensive parsing/fallbacks.
Capped Tool Binding & Merge `src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`	`_apply_bind_tools_cap_and_merge(...)` enforces `max_count`, invokes `PromptUtils.shortlist_tool_names` when over-cap (reserving capacity for `find_tools` if enabled), optionally pads to the cap, merges `find_tools`, and raises `RuntimeError` on unsafe shortlisting.
resolve_model_with_bind_tools Signature & Doc `src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`	Function signature extended to accept optional `query: Optional[str]` parameter; `max_count` computed via settings helper; internal logic delegates capping/merging to `_apply_bind_tools_cap_and_merge`.
RuntimeError Re-raising `src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`	Exception handler in `resolve_model_with_bind_tools` explicitly re-raises `RuntimeError` from capping/shortlisting logic so cap failures propagate as actionable errors; other exceptions continue to be warned and fallbacked.
call_model Node Integration `src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`	`call_model` node now extracts first user message text via `_first_user_message_text()` and passes it as `query` argument to `resolve_model_with_bind_tools`, enabling shortlisting context when cap is exceeded.
Unit Tests & Imports `tests/unit/test_cuga_lite_bind_tools.py`	New async unit tests validate over-cap shortlisting, over-cap without query raising, under-cap behaviour, disabled-cap binding-all, padding opt-in behavior, and reserved slot for `find_tools`. Import consolidation includes `patch` and other mocks.

Sequence Diagram(s)

sequenceDiagram
  participant CallModel
  participant ResolveModel as resolve_model_with_bind_tools
  participant ApplyCap as _apply_bind_tools_cap_and_merge
  participant Shortlist as PromptUtils.shortlist_tool_names
  participant LLM
  participant Model
  CallModel->>ResolveModel: call with query (first user message)
  ResolveModel->>ApplyCap: candidate_tools, max_count, query
  alt candidates exceed cap
    ApplyCap->>Shortlist: query, all_tools, all_apps, top_k
    Shortlist->>LLM: shortlister prompt + serialized payload
    LLM-->>Shortlist: ranked tool names
    Shortlist-->>ApplyCap: top-K tool names
    ApplyCap->>ApplyCap: reserve slot for find_tools, pad to cap if enabled
    ApplyCap-->>ResolveModel: capped + merged tool list
  else candidates within cap
    ApplyCap-->>ResolveModel: all tools
  end
  ResolveModel->>Model: bind_tools(selected_tools)
  Model-->>ResolveModel: bound model

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

cuga-project/cuga-agent#202: Main changes implement the provider-safe cap and LLM shortlisting behavior described in this issue.

Possibly related PRs

cuga-project/cuga-agent#67: Related changes to the cuga-lite shortlister/find_tools flow and fallback behavior.
cuga-project/cuga-agent#178: Related modifications to cuga_lite tool-binding and resolve_model integration.

Suggested reviewers

sami-marreed
gjt-prog

Poem

🐰 I’m a rabbit with a ranking hat,
When tools overflow I pick just that,
A query guides which names to keep,
I merge find_tools and guard the heap,
Provider-safe and snug in my lap.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.76% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and concisely describes the main change: adding a provider-safe cap on bind_tools count and shortlisting tools when the cap is exceeded.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/bind-tools-cap-shortlist-202

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/cuga/backend/cuga_graph/nodes/cuga_lite/prompt_utils.py (1)

434-477: ⚡ Quick win

Share the shortlister payload builder with find_tools.

This method already diverges from PromptUtils.find_tools: it rebuilds the same prompt/serialization flow but drops _response_schemas and _param_constraints. That means bind-time shortlisting can rank a different top-K than the existing find_tools path, and the two implementations will keep drifting.

♻️ Suggested direction

+    `@staticmethod`
+    def _build_shortlister_payload(
+        all_tools: List[StructuredTool],
+        all_apps: List[AppDefinition],
+    ) -> tuple[Dict[str, Any], Dict[str, Any]]:
+        tools_as_dict: Dict[str, Any] = {}
+        for tool in all_tools:
+            tool_dict = tool.model_dump()
+            ...
+            if hasattr(tool, "func"):
+                if hasattr(tool.func, "_response_schemas"):
+                    tool_dict["_response_schemas"] = tool.func._response_schemas
+                if hasattr(tool.func, "_param_constraints"):
+                    tool_dict["_param_constraints"] = tool.func._param_constraints
+            tools_as_dict[tool.name] = tool_dict
+        apps_as_dict = {app.name: app.model_dump() for app in all_apps}
+        return tools_as_dict, apps_as_dict
+
     async def find_tools(...):
-        tools_as_dict = {}
-        ...
-        apps_as_dict = {app.name: app.model_dump() for app in all_apps}
+        tools_as_dict, apps_as_dict = PromptUtils._build_shortlister_payload(all_tools, all_apps)

     async def shortlist_tool_names(...):
-        tools_as_dict: Dict[str, Any] = {}
-        ...
-        apps_as_dict = {app.name: app.model_dump() for app in all_apps}
+        tools_as_dict, apps_as_dict = PromptUtils._build_shortlister_payload(all_tools, all_apps)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/prompt_utils.py` around lines 434
- 477, This shortlister duplicates the payload-building logic from
PromptUtils.find_tools but omits _response_schemas and _param_constraints
causing divergence; refactor to reuse the same builder (or call the existing
PromptUtils.find_tools payload/serialization helper) instead of reserializing
inline: locate the tools serialization block that builds tools_as_dict (and the
prompt created via create_chat_prompt_from_templates / ShortListerOutputLite /
BaseAgent.get_chain) and replace it with the shared payload builder used by
find_tools so the produced payload includes _response_schemas and
_param_constraints and both paths remain consistent.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 327-361: The code treats find_tools_tool inconsistently for
counting vs reserving and can produce max_count+1 tools; fix by treating
find_tools_tool as already counted if bound already contains it: when computing
cap checks (the branch with cap_disabled or len(bound) <= max_count) and when
computing reserve, first derive ft_name from find_tools_tool (ft_name =
getattr(find_tools_tool,"name","") ) and test membership against
{getattr(t,"name","") for t in bound}; only append via _append_find_tools when
ft_name is non-empty and not already present in bound, and set reserve = 1 only
if find_tools_tool is non-None and ft_name not in bound so target_k correctly
reflects available shortlist slots.

---

Nitpick comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/prompt_utils.py`:
- Around line 434-477: This shortlister duplicates the payload-building logic
from PromptUtils.find_tools but omits _response_schemas and _param_constraints
causing divergence; refactor to reuse the same builder (or call the existing
PromptUtils.find_tools payload/serialization helper) instead of reserializing
inline: locate the tools serialization block that builds tools_as_dict (and the
prompt created via create_chat_prompt_from_templates / ShortListerOutputLite /
BaseAgent.get_chain) and replace it with the shared payload builder used by
find_tools so the produced payload includes _response_schemas and
_param_constraints and both paths remain consistent.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90c0edbd-8f6e-4bd0-9b35-3aeba1c92a7d

📥 Commits

Reviewing files that changed from the base of the PR and between b5b72b2 and 83fd812.

📒 Files selected for processing (4)

src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
src/cuga/backend/cuga_graph/nodes/cuga_lite/prompt_utils.py
src/cuga/config.py
tests/unit/test_cuga_lite_bind_tools.py

Two follow-ups discovered while exercising the cap path on the m3 hockey benchmark (206 tools, Groq llama-3, mode=all): 1. Loguru log strings used %-style placeholders so the cap exceeded / shortlisted-to messages printed literal "%s" / "%d". Switch to {}. 2. The existing shortlister LLM system prompt is biased toward "the most relevant" tools and typically returns 1-4 entries even when asked for K=128. Padding the result with the remaining tools (provider order) does fill the cap, but binding ~128 native tools pushes the Groq llama model into native tool_calls mode, which cuga_lite's code-execution flow does not fully exercise — measured regression from 5-7 tool calls / pass to 0 tool calls / fail on the same task. Add `cuga_lite_bind_tools_pad_to_cap` (default False) and keep the shortlister's natural ranked prefix as the bound set. Result on the m3 hockey task: 0/1 to 1/1 pass, 17s, 4 tool calls in code-mode. Users explicitly chasing "bind every slot the provider allows" can flip the setting on for that experiment.

Per coderabbit review on #203: the under-cap fast path and the over-cap reserve logic treated find_tools inconsistently. - When len(bound) == max_count and include_find_tools=True with a non-overlapping find_tools, the fast path returned max_count+1 tools and the provider rejected the request. - When the overlay path had already placed find_tools into bound and include_find_tools=True, reserve=1 dropped a real shortlist slot even though find_tools was already counted in bound. Track whether find_tools is already in bound, compute an effective_count for the under-cap check, and only set reserve=1 when find_tools genuinely needs to be appended. Add two regression tests for the boundary cases.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)

356-390: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep find_tools guaranteed when include_find_tools=True.

When find_tools comes in via the overlay, need_to_append_find_tools becomes False. In the over-cap path that means reserve=0 and _append_find_tools() is a no-op, so the shortlister can rank find_tools out of the final list entirely. That breaks the include_find_tools=True contract and also makes max_count=1 fail even though binding only find_tools would still fit.

Proposed fix

-    find_tools_already_in_bound = bool(find_tools_name) and any(
+    find_tools_already_in_bound = bool(find_tools_name) and any(
         getattr(t, "name", "") == find_tools_name for t in bound
     )
-    need_to_append_find_tools = find_tools_tool is not None and not find_tools_already_in_bound
+    keep_find_tools = include_find_tools and find_tools_tool is not None
+    ranking_pool = (
+        [t for t in bound if getattr(t, "name", "") != find_tools_name]
+        if keep_find_tools and find_tools_already_in_bound
+        else bound
+    )

     def _append_find_tools(tools: List[StructuredTool]) -> List[StructuredTool]:
-        if not need_to_append_find_tools:
+        if not keep_find_tools or find_tools_tool is None:
             return tools
         if find_tools_name in {getattr(t, "name", "") for t in tools}:
             return tools
         return [*tools, find_tools_tool]

     cap_disabled = max_count <= 0
-    effective_count = len(bound) + (1 if need_to_append_find_tools else 0)
+    effective_count = len(ranking_pool) + (1 if keep_find_tools else 0)
     if cap_disabled or effective_count <= max_count:
-        return _append_find_tools(bound)
+        return _append_find_tools(ranking_pool)

-    reserve = 1 if need_to_append_find_tools else 0
+    reserve = 1 if keep_find_tools else 0
     target_k = max_count - reserve
     if target_k <= 0:
-        raise RuntimeError(
-            f"cuga_lite_bind_tools_max_count={max_count} is too small to fit even find_tools "
-            f"(reserve={reserve}). Raise the cap."
-        )
+        return _append_find_tools([])

     ...
-            all_tools=bound,
+            all_tools=ranking_pool,
             ...
-    by_name = {getattr(t, "name", ""): t for t in bound}
+    by_name = {getattr(t, "name", ""): t for t in ranking_pool}

Also applies to: 414-460

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines
356 - 390, The shortlister can drop find_tools when it's already present in
bound because reserve is only set based on need_to_append_find_tools; to
guarantee include_find_tools=True you must treat that tool as reserved even if
it was placed in-band. Change the reserve logic to reserve a slot whenever
include_find_tools is true (i.e. set reserve = 1 if include_find_tools else 0)
and ensure effective_count/shortlist decisions continue to reflect
need_to_append_find_tools for counting new appends; keep _append_find_tools and
find_tools_already_in_bound as-is so the append only happens when needed but the
shortlister cannot evict the in-band find_tools thanks to the new reserve
behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 397-403: The logger.warning call inside the block where all_apps
is populated from tool_provider.get_apps() uses printf-style "%s" formatting
which doesn't work with Loguru; change the call in the try/except that catches
Exception e to use Loguru's {} style (e.g., logger.warning("bind_tools cap:
tool_provider.get_apps() failed: {}", e)) so the exception message renders
correctly; update the logging invocation near the
all_apps/tool_provider/get_apps handling to use {} formatting and keep the same
message content.

---

Duplicate comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 356-390: The shortlister can drop find_tools when it's already
present in bound because reserve is only set based on need_to_append_find_tools;
to guarantee include_find_tools=True you must treat that tool as reserved even
if it was placed in-band. Change the reserve logic to reserve a slot whenever
include_find_tools is true (i.e. set reserve = 1 if include_find_tools else 0)
and ensure effective_count/shortlist decisions continue to reflect
need_to_append_find_tools for counting new appends; keep _append_find_tools and
find_tools_already_in_bound as-is so the append only happens when needed but the
shortlister cannot evict the in-band find_tools thanks to the new reserve
behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cde90c26-803b-4a75-9706-57b02fca574f

📥 Commits

Reviewing files that changed from the base of the PR and between b4b154c and 6af3431.

📒 Files selected for processing (2)

src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
tests/unit/test_cuga_lite_bind_tools.py

Per coderabbit nitpick on #203: shortlist_tool_names duplicated the tool serialization from find_tools but dropped _response_schemas and _param_constraints, so bind-time ranking could diverge from runtime find_tools ranking on the same tool set. Extract PromptUtils._build_shortlister_payload as the single source of truth for serializing (all_tools, all_apps) into the prompt-friendly dicts. Both callers now produce identical payloads including the response-schema and param-constraint fields the LLM uses to rank.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/cuga/backend/cuga_graph/nodes/cuga_lite/prompt_utils.py (1)

225-234: ⚡ Quick win

Narrow the blind exception in schema serialization.

On Line 232, except Exception can mask unexpected bugs and silently strip tool schema from ranking input. Catch expected serialization errors explicitly so unexpected failures surface.

Proposed diff

             if hasattr(tool, 'args_schema') and tool.args_schema:
                 try:
                     if hasattr(tool.args_schema, 'schema'):
                         tool_dict['args_schema'] = tool.args_schema.schema()
                     elif hasattr(tool.args_schema, 'model_json_schema'):
                         tool_dict['args_schema'] = tool.args_schema.model_json_schema()
                     else:
                         tool_dict['args_schema'] = {}
-                except Exception as e:
+                except (AttributeError, TypeError, ValueError) as e:
                     logger.debug(f"Failed to serialize args_schema for tool {tool.name}: {e}")
                     tool_dict['args_schema'] = {}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/prompt_utils.py` around lines 225
- 234, The try/except around args_schema serialization in the prompt_utils code
is too broad (it catches Exception) and can hide unexpected bugs; change the
except to only catch expected serialization-related errors (for example: except
(AttributeError, TypeError, ValueError) as e:) when accessing
tool.args_schema.schema() or model_json_schema(), log the error with
logger.debug (including the exception) and set tool_dict['args_schema'] = {},
and let any other exceptions propagate so unexpected failures surface; refer to
the block that checks hasattr(tool.args_schema, 'schema') / 'model_json_schema'
and the logger.debug call that mentions tool.name to locate the code to modify.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/prompt_utils.py`:
- Around line 427-428: The guard that returns [] currently checks only top_k and
all_tools but misses whitespace-only queries; update the shortlist entry point
(the function containing the if top_k <= 0 or not all_tools) to treat a query of
None or a query that is all whitespace as empty by trimming (e.g., query =
(query or "").strip()) and then checking if not query before proceeding to call
the LLM; use the existing variables query, top_k, and all_tools so the function
returns [] when the trimmed query is empty, preserving the current behavior for
top_k and all_tools.

---

Nitpick comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/prompt_utils.py`:
- Around line 225-234: The try/except around args_schema serialization in the
prompt_utils code is too broad (it catches Exception) and can hide unexpected
bugs; change the except to only catch expected serialization-related errors (for
example: except (AttributeError, TypeError, ValueError) as e:) when accessing
tool.args_schema.schema() or model_json_schema(), log the error with
logger.debug (including the exception) and set tool_dict['args_schema'] = {},
and let any other exceptions propagate so unexpected failures surface; refer to
the block that checks hasattr(tool.args_schema, 'schema') / 'model_json_schema'
and the logger.debug call that mentions tool.name to locate the code to modify.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 424758d9-3d74-4181-97e2-5287f364951e

📥 Commits

Reviewing files that changed from the base of the PR and between 6af3431 and acdbf18.

📒 Files selected for processing (1)

src/cuga/backend/cuga_graph/nodes/cuga_lite/prompt_utils.py

haroldship · 2026-05-12T03:46:25Z

This does NOT seem to improve M3, just fixes an issue with bind_tools where too many tools causes errors from LLM invoke.

Here are some results for hockey domain in capabilty 2, which are similar to previous:

============================== FINAL SUMMARY ==============================

================================================================================
EVALUATION COMPLETE — Vakra LLM judges (correctness × groundedness × exact-match)
================================================================================
Scoring mode:          live-mcp
Scored samples:        10
Pass / total:          7 / 10  (70.0%)
Mean dialogue score:   0.7000
Min  dialogue score:   0.0000
Max  dialogue score:   1.0000

Per-task results:
  ✓ 308738b8195d-5bd16a8893c5       score=1.00  tool_calls=5/1
  ✓ 308738b8195d-18bbc5dc131d       score=1.00  tool_calls=1/1
  ✓ 308738b8195d-05ec22ea67ac       score=1.00  tool_calls=3/1
  ✗ 308738b8195d-3e3d6db9ab3d       score=0.00  tool_calls=5/1
    turn 0 score=0.0 pred="The 9th oldest hockey player is **Cully Wilson**, and his position is **D**."
      ✗ exactmatch    score=0.0  Missing expected tool_responses. Missing: ['{\n  "positions": [\n    [\n      null\n    ],\n    [\n      null\n    ],\n    [\n      null\n    ],\n    [\n      null\n    ],\n    [\n      null\n    ],\n    [\n      null\n    ],\n    [\n      null\n    ],\n    [\n      "R"\n    ]...
      ✗ answer        score=0.0  The ground truth indicates the position is "r", but the prediction says the position is "d", which does not match. Therefore the prediction is incorrect.
  ✓ 308738b8195d-b11eeb38eded       score=1.00  tool_calls=1/1
  ✓ 308738b8195d-91d22b875555       score=1.00  tool_calls=1/1
  ✓ 308738b8195d-02960c8c0d16       score=1.00  tool_calls=1/1
  ✓ 308738b8195d-8b4b06bc3398       score=1.00  tool_calls=2/1
  ✗ 308738b8195d-9296bdde9ace       score=0.00  tool_calls=0/1
    turn 0 score=0.0 pred="I’ll run a series of broader `find_tools` queries to see if any hidden event‑level endpoint exists."
      ✗ exactmatch    score=0.0  Missing expected tool_responses. Missing: ['{\n  "count": 0\n}']
      ✗ answer        score=0.0  The prediction does not answer the question about the number of teams; it is unrelated text, so it receives a score of 0.
  ✗ 308738b8195d-56faa9f6bbd2       score=0.00  tool_calls=5/1
    turn 0 score=0.0 pred="One coach was recorded as having a temporary (interim) term in 2007. This count comes from the `task_2_hockey_get_count_coaches_by_year_and_notes` endpoint using the note “interim”; alternative not..."
      ✓ exactmatch    score=1.0  All expected tool_responses are present in actual.
      ✗ groundedness  score=0.0  The response mentions a specific API endpoint and maps zero counts to particular note types, neither of which are stated in the document, making those claims ungrounded.

Full judge details: /Users/harold/workspace/cuga-internal-evaluation/benchmarks/m3/results/_vakra/results.json
================================================================================

Four findings addressed in one commit: 1. (major) include_find_tools=True did not guarantee find_tools survived shortlisting when the overlay had already placed it into bound. The shortlister LLM is free to drop any tool from its ranking input. Fix: pull find_tools out of the ranking pool when present, always reserve a cap slot when include_find_tools=True, and append find_tools back after ranking. As a bonus, max_count=1 + include_find_tools=True now binds just find_tools instead of raising. 2. (minor) logger.warning("...get_apps() failed: %s", e) used printf-style formatting, which loguru does not interpolate — the actual error reason was lost. Switch to "{}". 3. (minor) shortlist_tool_names did not guard whitespace-only queries; the guard at the top only checked top_k and all_tools. A whitespace query would still hit the LLM and produce arbitrary rankings, bypassing the "no query" failure path in the caller. Add `if not query.strip(): return []`. 4. (nit) _build_shortlister_payload caught blind Exception when serializing args_schema, masking unexpected bugs. Narrow to (AttributeError, TypeError, ValueError). New regression tests: - find_tools survives shortlisting when in overlay bound (cap path) - max_count=1 + include_find_tools=True binds only find_tools

haroldship · 2026-05-12T03:58:37Z

@CodeRabbit review

coderabbitai · 2026-05-12T03:58:47Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 440-465: If ranked_names is non-empty but none match ranking_pool,
treat this as a hard failure: after building shortlisted from ranked_names (the
loop that uses by_name and seen_short), check if ranked_names and shortlisted is
empty and raise an exception (or return a clear error) before running the
padding logic (_bind_tools_pad_to_cap_from_settings) or calling
_append_find_tools; update the block that constructs shortlisted (and references
ranking_pool, ranked_names, by_name, seen_short) to perform this validation and
throw a descriptive error so we never silently proceed with padding or find-tool
append when the shortlist filtering produced no valid tools.
- Around line 350-371: The overlay-injected find_tools must be detected and
removed from the bound list regardless of include_find_tools; update
_indexed_tools_for_native_bind() / the block that sets find_tools_tool so that
you always extract candidate =
tools_context_ref.get("_lc_bind_tools_find_tools") and compute find_tools_name
and find_tools_already_in_bound unconditionally, then modify bound to strip any
tool whose name == find_tools_name unless keep_find_tools is True; after that,
build ranking_pool from the (possibly stripped) bound and only reserve/append
the find_tools_tool when keep_find_tools is True so it does not consume a capped
slot during ranking (refer to symbols: include_find_tools, find_tools_tool,
find_tools_name, find_tools_already_in_bound, keep_find_tools, bound,
ranking_pool).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a9fd89c-6a30-42d6-a4e7-c80283d67033

📥 Commits

Reviewing files that changed from the base of the PR and between acdbf18 and ded870c.

📒 Files selected for processing (3)

src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
src/cuga/backend/cuga_graph/nodes/cuga_lite/prompt_utils.py
tests/unit/test_cuga_lite_bind_tools.py

🚧 Files skipped from review as they are similar to previous changes (1)

src/cuga/backend/cuga_graph/nodes/cuga_lite/prompt_utils.py

A. Overlay-injected find_tools must respect include_find_tools=False. `_indexed_tools_for_native_bind` can place find_tools into `bound` independently of config. Previously, when `include_find_tools=False`, the code never even *read* the find_tools candidate from tools_context_ref, so the overlay-injected copy slipped through — consuming a capped slot, leaking through the shortlister, and defeating the user's explicit opt-out. Fix: resolve the find_tools candidate unconditionally; if `include_find_tools=False` and it's already in `bound`, strip it before any cap math runs. B. LLM-hallucinated shortlist names must fail loudly. The existing guard catches `ranked_names == []` but not the case where the LLM returns names that don't exist in `ranking_pool` (zero matches after the dictionary lookup). Without the new check, we'd silently pad with arbitrary `ranking_pool` prefix or return just find_tools — recreating exactly the silent degradation this cap path exists to prevent. Fix: raise RuntimeError when `shortlisted` is empty after the lookup loop, with diagnostic context. Two regression tests added: - overlay-injected find_tools stripped when include_find_tools=False - shortlister returning hallucinated names raises before pad / append 16/16 tests pass.

haroldship · 2026-05-12T06:39:46Z

@CodeRabbit review

coderabbitai · 2026-05-12T06:39:52Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/unit/test_cuga_lite_bind_tools.py (1)

472-579: ⚡ Quick win

Add regression test for overlong shortlister output.

Please add one case where PromptUtils.shortlist_tool_names returns more than top_k valid names, then assert the final bound list is still capped. This guards the provider-safe cap contract against non-compliant LLM output.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_cuga_lite_bind_tools.py` around lines 472 - 579, Add a
regression test that simulates PromptUtils.shortlist_tool_names returning more
valid tool names than the requested top_k and assert the binding still respects
the cap; specifically, create an async fake_shortlist that returns a list longer
than top_k, patch "PromptUtils.shortlist_tool_names" to use it, call
resolve_model_with_bind_tools with a configured
_bind_tools_max_count_from_settings (e.g., 4) and include_find_tools as needed,
then assert model.bind_tools was called and that the bound list (from
model.bind_tools.call_args) contains at most the capped number of tools (and
reserves a slot for "find_tools" if include_find_tools=True), referencing
PromptUtils.shortlist_tool_names and resolve_model_with_bind_tools to locate the
behavior to test.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 449-457: Shortlister collects tools from ranked_names into
shortlisted (using by_name, seen_short and StructuredTool) but never enforces
target_k/max_count; update the loop or post-process shortlisted to limit its
length to target_k (or max_count) before returning so it never contains more
than the requested number of tools, e.g., stop adding once shortlisted reaches
target_k or truncate shortlisted to target_k after the loop to ensure
provider-side bind limits are respected.

---

Nitpick comments:
In `@tests/unit/test_cuga_lite_bind_tools.py`:
- Around line 472-579: Add a regression test that simulates
PromptUtils.shortlist_tool_names returning more valid tool names than the
requested top_k and assert the binding still respects the cap; specifically,
create an async fake_shortlist that returns a list longer than top_k, patch
"PromptUtils.shortlist_tool_names" to use it, call resolve_model_with_bind_tools
with a configured _bind_tools_max_count_from_settings (e.g., 4) and
include_find_tools as needed, then assert model.bind_tools was called and that
the bound list (from model.bind_tools.call_args) contains at most the capped
number of tools (and reserves a slot for "find_tools" if
include_find_tools=True), referencing PromptUtils.shortlist_tool_names and
resolve_model_with_bind_tools to locate the behavior to test.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d3f39a6d-7c3d-4910-bd5f-0450fff229fc

📥 Commits

Reviewing files that changed from the base of the PR and between ded870c and 11feace.

📒 Files selected for processing (2)

src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
tests/unit/test_cuga_lite_bind_tools.py

) Defense-in-depth: the build-shortlisted loop in _apply_bind_tools_cap_and_merge trusted PromptUtils.shortlist_tool_names to truncate its own output at top_k. That trust is fragile — a custom shortlister, a future refactor, or a mocked path could return more names than requested, in which case shortlisted grew past target_k, the pad block was skipped (len already ≥ target_k), and the final bound list exceeded max_count, re-triggering the provider 400 the cap exists to prevent. Add an explicit `if len(shortlisted) >= target_k: break` so the call site enforces its own cap contract regardless of the shortlister's compliance. Two regression tests added: - overlong shortlister output clamped to target_k (mode=all, no find_tools) - same clamp with include_find_tools=True reserving 1 slot

sami-marreed · 2026-05-13T10:48:35Z

why tests failing here

haroldship · 2026-05-13T11:57:31Z

why tests failing here

Reran and they pass. They are unreliable as the LLM doesn't return consistent responses. All I did was rerun them. And the failure was in a totally different part of cuga (policies)

sami-marreed · 2026-05-14T09:47:50Z

Modularity (`cuga_lite_graph`)

The bind-tools cap fix looks right for strict provider limits, but cuga_lite_graph.py picks up a lot of logic (~180 lines in _apply_bind_tools_cap_and_merge plus settings helpers). Please pull this into a dedicated package so the graph file stays readable:

Placement: Under src/cuga/backend/cuga_graph/nodes/cuga_lite/ you already have subfolders (executors/, reflection/, prompts/, tests/). None of those fit “cap + shortlist + merge for bind_tools” well (executors/ is runtime/sandbox). Add a new subpackage, e.g.:

src/cuga/backend/cuga_graph/nodes/cuga_lite/bind_tools/

bind_tools/__init__.py — re-export only what cuga_lite_graph.py needs.
bind_tools/cap.py (split further if it grows) — _bind_tools_max_count_from_settings, _bind_tools_pad_to_cap_from_settings, _apply_bind_tools_cap_and_merge, and small private helpers (resolve/strip find_tools overlay, build ranking_pool, run shortlist + validate names, optional pad-to-cap).

Keep resolve_model_with_bind_tools in cuga_lite_graph.py as orchestration (call into bind_tools).

DRY: The four _apply_bind_tools_cap_and_merge(...) call sites repeat the same kwargs — factor a tiny closure or _cap_merge_bound(bound) that closes over query, tool_provider, llm, max_count, include_find_tools, tools_context_ref, mode.

prompt_utils.py already factored _build_shortlister_payload / shortlist_tool_names; matching that structure on the graph side (via bind_tools/) keeps responsibilities aligned.

Other notes (worth fixing, non-blocking)

Docs vs code: advanced_features.cuga_lite_bind_tools_pad_to_cap is in code + validators but the PR blurb mostly highlights cuga_lite_bind_tools_max_count. Document pad_to_cap (what it does, default off, env override) and add settings.toml / sample config if that's how operators discover advanced_features.*.
Operational: Over-cap paths add a shortlister LLM round-trip on bind — intentional; a one-line docstring or ops note on latency/cost helps.
Loud failures: RuntimeError when shortlisting can't run is intentional; just ensure benchmark/user docs mention it.

…203) Addresses Sami's review on #203 (modularity + DRY + docs). - New subpackage src/cuga/backend/cuga_graph/nodes/cuga_lite/bind_tools/ - cap.py: holds bind_tools_max_count_from_settings, bind_tools_pad_to_cap_from_settings, apply_bind_tools_cap_and_merge, plus small private helpers (_resolve_find_tools_overlay, _build_ranking_pool, _run_shortlister, _materialize_shortlist, _maybe_pad_to_cap). - __init__.py: re-exports only what cuga_lite_graph.py needs. - cuga_lite_graph.py: resolve_model_with_bind_tools stays as orchestration. The four mode={all,apps,tools,apps_and_tools} branches now share a local _cap_merge_bound(bound) closure (DRY ask), and import the helpers from bind_tools/. - Docstring on resolve_model_with_bind_tools now documents cuga_lite_bind_tools_pad_to_cap and adds an Operational cost paragraph (one extra LLM round-trip per call_model when cap exceeded; intentional RuntimeError when shortlisting cannot run safely). - settings.toml: documents both cuga_lite_bind_tools_max_count and cuga_lite_bind_tools_pad_to_cap inline with the other bind_tools_* knobs (env overrides + latency/RuntimeError semantics). - Tests: patch paths updated for the new module layout; pad_to_cap patches now target bind_tools.cap directly since the call site moved with the function. 18 tests pass.

haroldship · 2026-05-14T12:18:08Z

Thanks @sami-marreed — addressed all of this in c20968b (just pushed).

Modularity + DRY (your main asks)

New subpackage at the exact path you suggested:

src/cuga/backend/cuga_graph/nodes/cuga_lite/bind_tools/
├── __init__.py   # re-exports only what cuga_lite_graph.py needs
└── cap.py        # all the moved logic

cap.py holds the three module-level helpers (bind_tools_max_count_from_settings, bind_tools_pad_to_cap_from_settings, apply_bind_tools_cap_and_merge) and decomposes the previously-monolithic _apply_bind_tools_cap_and_merge into the small private helpers you outlined:

_resolve_find_tools_overlay — find/strip the overlay-injected find_tools candidate
_build_ranking_pool — pull find_tools out of the pool when we must guarantee it survives
_run_shortlister — invoke PromptUtils.shortlist_tool_names and raise on failure / empty ranking
_materialize_shortlist — map ranker output back to StructuredTool, defense-in-depth clamp to target_k, raise on hallucinated names
_maybe_pad_to_cap — opt-in padding

cuga_lite_graph.py:resolve_model_with_bind_tools now stays as orchestration. The four mode={all,apps,tools,apps_and_tools} call sites repeated the same kwargs, so I factored a local _cap_merge_bound(bound) closure (your DRY suggestion) — each branch is now one await _cap_merge_bound(bound).

Other notes — all done

Docs vs code (pad_to_cap):
- PR body now has a dedicated "New settings" subsection for cuga_lite_bind_tools_pad_to_cap (default, env override, why off by default with the m3-hockey measurement).
- src/cuga/settings.toml now documents both cuga_lite_bind_tools_max_count and cuga_lite_bind_tools_pad_to_cap inline next to the other cuga_lite_bind_tools_* knobs, including the env vars.
Operational (latency/cost):
- Added an "Operational notes" section to the PR body: when the candidate count exceeds max_count, applying the cap incurs one extra LLM round-trip (the shortlister) per call_model. Permissive backends can skip the round-trip with cuga_lite_bind_tools_max_count=0.
- Same note added to the resolve_model_with_bind_tools docstring and to cap.py's module docstring.
Loud failures (RuntimeError):
- PR body now explicitly documents the RuntimeError contract — when it fires (empty query / shortlister exception / empty ranking / hallucinated names) and how callers should react. Mentioned in both the bullet summary and "Operational notes".

Verification

pytest tests/unit/test_cuga_lite_bind_tools.py → 18 passed
ruff check + ruff format --check on all touched files → clean

(Test patch paths needed two adjustments: bind_tools_max_count_from_settings is still patchable at cuga_lite_graph.* since the call site lives there, but bind_tools_pad_to_cap_from_settings patches now target bind_tools.cap.* because that call site moved with the helper.)

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py Outdated

haroldship added 2 commits May 11, 2026 16:11

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py Outdated

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/prompt_utils.py

haroldship requested a review from sami-marreed May 11, 2026 17:22

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py Outdated

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py Outdated

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py Outdated

Conversation

haroldship commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

New settings

advanced_features.cuga_lite_bind_tools_max_count

advanced_features.cuga_lite_bind_tools_pad_to_cap

Operational notes

Where the change lives

Backward compatibility

Test plan

Summary by CodeRabbit

bind_tools cap + pad_to_cap cross-provider verification

Uh oh!

coderabbitai Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

haroldship commented May 12, 2026

Uh oh!

haroldship commented May 12, 2026

Uh oh!

coderabbitai Bot commented May 12, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

haroldship commented May 12, 2026

Uh oh!

coderabbitai Bot commented May 12, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sami-marreed commented May 13, 2026

Uh oh!

haroldship commented May 13, 2026

Uh oh!

sami-marreed commented May 14, 2026

Modularity (cuga_lite_graph)

Other notes (worth fixing, non-blocking)

Uh oh!

haroldship commented May 14, 2026

Modularity + DRY (your main asks)

Other notes — all done

Verification

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

haroldship commented May 11, 2026 •

edited

Loading

`advanced_features.cuga_lite_bind_tools_max_count`

`advanced_features.cuga_lite_bind_tools_pad_to_cap`

`bind_tools` cap + `pad_to_cap` cross-provider verification

coderabbitai Bot commented May 11, 2026 •

edited

Loading

Modularity (`cuga_lite_graph`)