Conversation

**Important: Review skipped.** Review was skipped due to path filters.

⛔ Files ignored due to path filters (1)

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters.

⚙️ Run configuration

**Configuration used**: Path: .coderabbit.yaml
**Review profile**: CHILL
**Plan**: Pro
📝 Walkthrough

**Walkthrough**

Adds a new standalone Nemotron-powered fraud-detection voice agent example with a custom `NemotronLLM` subclass.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant User as User/Voice
    participant STT as Deepgram STT
    participant Agent as Agent Edge
    participant LLM as NemotronLLM
    participant Tools as Tool Functions
    participant TTS as Deepgram TTS
    User->>STT: voice input
    STT->>Agent: transcript
    Agent->>LLM: messages + tool definitions
    LLM->>LLM: strip <think>, parse <tool_call> blocks
    alt tool_call found
        LLM->>Tools: execute tool_call (repeatable, ≤5 rounds)
        Tools-->>LLM: tool result (JSON)
        LLM->>LLM: append <tool_response>, re-invoke for final reply
    end
    LLM->>Agent: stream events (chunks, complete, timings)
    Agent->>TTS: response text
    TTS->>User: synthesized speech
```
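The `<think>`-stripping and `<tool_call>` loop in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the example's actual implementation; the tag names and the five-round cap come from the diagram, while the function names and JSON shapes are assumptions:

```python
import json
import re

THINK_END_TAG = "</think>"
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
MAX_TOOL_ROUNDS = 5  # matches the "≤5 rounds" cap in the diagram


def strip_thinking(text: str) -> str:
    """Drop everything up to and including the closing think tag."""
    end = text.find(THINK_END_TAG)
    if end == -1:
        return text
    return text[end + len(THINK_END_TAG):].lstrip()


def run_tool_loop(generate, execute_tool):
    """Call generate(messages) repeatedly until no <tool_call> block remains."""
    messages = []
    reply = strip_thinking(generate(messages))
    for _ in range(MAX_TOOL_ROUNDS):
        match = TOOL_CALL_RE.search(reply)
        if match is None:
            return reply  # final user-facing text
        call = json.loads(match.group(1))
        result = execute_tool(call["name"], call.get("arguments", {}))
        # Feed the result back wrapped in <tool_response>, then re-invoke.
        messages.append({
            "role": "tool",
            "content": f"<tool_response>{json.dumps(result)}</tool_response>",
        })
        reply = strip_thinking(generate(messages))
    return reply
```

A `generate` callable standing in for the Nemotron completion call can be swapped in for testing; the cap prevents unbounded tool recursion.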
**Estimated code review effort**: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 **Pre-merge checks** | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 3
🧹 Nitpick comments (2)
plugins/openai/examples/nemotron_example/nemotron_example.py (2)
**76-96: Remove embedded dead code from the prompt string.**

The `SUMMARIZE_PROMPT` string contains an entire duplicate of `INSTRUCTIONS` embedded in triple quotes, serving no purpose but confusion. This vestigial fragment should be removed.

🧹 Proposed fix

```diff
 SUMMARIZE_PROMPT = (
     "Rewrite the following as something a real person would say on the phone — casual, warm, natural. "
     "Two to three short sentences, around 25-35 words. No jargon, no IDs, no markdown, no em-dashes, no bullet points.\n\n"
     "Keep all key info about actions taken (frozen card, cancelled charge, replacement card, refund timeline). "
     "Example tone: 'Okay so I've frozen your card and cancelled that charge. The money should be back in a couple of days. Want me to send you a new card?'\n\n"
     "Do not use any special characters that only make sense in writing like brackets.\n"
-
-    "here's the prompt i gave to the llm that generates the text you review:\n\n"
-
-""" INSTRUCTIONS = (
-    "detailed thinking off\n\n"
-    "You are a bank fraud phone agent on a live phone call. Your output goes directly to TTS.\n\n"
-    "When you spot a suspicious transaction, explain WHY — e.g. 'There's a large charge in Miami but you live in London.' "
-    "Suspicious means: different city from the customer's home, or unusually large compared to their other transactions.\n"
-    "If you flag a transaction, the card should be frozen immediately to prevent further fraud, but only after you have explicitly confirmed with the customer that they did not make the transaction.\n\n"
-    "If you freeze a card due to a transaction, you should also offer to issue a replacement card. If the customer confirms, issue a virtual card immediately and a physical card that arrives in 3-5 business days.\n\n"
-    "If a transaction is fraudulent, you should also cancel the charge to return the money to the customer's account. This typically takes 24-48 hours.\n\n"
-    "The user cannot see what you can see. They are on the phone."
-    "RULES:\n"
-    "- MAXIMUM 15 words per response. One short sentence. Responses over 15 words get cut off.\n"
-    "- NEVER take action (flag, freeze, cancel) without the customer explicitly confirming.\n"
-    "- After taking any action, always tell the customer what you did.\n"
-    "- Work one step at a time: look up info, briefly tell the customer what you found, ask what to do.\n"
-    "- Do NOT list data, read IDs, dates, or dollar amounts aloud.\n"
-    "- Speak casually like a real person. No markdown, no bullet points.\n"
-)
-"""
 )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@plugins/openai/examples/nemotron_example/nemotron_example.py` around lines 76-96: the prompt string contains an extraneous duplicated block wrapped in triple quotes (a dead-code fragment) that should be removed; locate the duplicated `INSTRUCTIONS` text in nemotron_example.py and delete the entire triple-quoted duplicate so only the intended string literal remains, ensuring quotes and parentheses around the prompt remain balanced and no other prompt text is altered.
**207-211: Replace `print()` statements with logger calls.**

These debug artifacts should not persist. Use `logger.debug()` for development tracing, consistent with the module's logging pattern.

♻️ Proposed fix

```diff
-            print(f">>> THINK TAG FOUND at pos {think_end} / {len(full_text)} chars")
+            logger.debug("THINK TAG FOUND at pos %d / %d chars", think_end, len(full_text))
             full_text = full_text[think_end + len(THINK_END_TAG) :].lstrip()
-            print(f">>> AFTER STRIP: {full_text[:200]}")
+            logger.debug("AFTER STRIP: %s", full_text[:200])
         else:
-            print(f">>> NO THINK TAG. Raw: {full_text[:300]}")
+            logger.debug("NO THINK TAG. Raw: %s", full_text[:300])
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@plugins/openai/examples/nemotron_example/nemotron_example.py` around lines 207-211: replace the debug print statements that expose raw text with logger.debug calls; change the three print(...) occurrences around THINK_END_TAG handling (the messages showing the think_end position, the AFTER STRIP preview, and the NO THINK TAG raw preview) to logger.debug(...) using the same formatted strings; ensure the module-level logger (e.g., logger = logging.getLogger(__name__)) is used or imported so the logging calls follow the module's logging pattern and do not leak raw debug output.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/openai/examples/nemotron_example/nemotron_example.py`:
- Around line 428-436: The call to agent.create_call in join_call has its
arguments reversed; Edge.create_call expects the call_id as the first positional
argument and call_type as a keyword. Fix join_call by calling agent.create_call
with call_id first and pass call_type as a keyword (e.g.,
agent.create_call(call_id, call_type=call_type, **kwargs)) so the values map
correctly to Edge.create_call's signature; update the join_call function
accordingly.
- Line 322: The line TOOL_CALLS_LOG.write_text("") in create_agent() truncates
the persistent tool-call log on every agent creation; change this to either
append new entries (open and append or use write_text(existing + new) behavior)
or remove the truncation entirely and instead create a timestamped/log-rotated
filename when initializing TOOL_CALLS_LOG so previous records are preserved;
update the code paths that write to TOOL_CALLS_LOG to use append semantics or
the new timestamped file creation logic and ensure create_agent() no longer
clears the file.
- Around line 173-182: The try/except in the Nemotron call is too broad and
skips emitting observability events; replace the bare except with specific
exceptions raised by self._client.chat.completions.create (e.g.,
HTTPError/Timeout/Error types from the Nemotron client or underlying http
library) and handle them explicitly, emit the LLMRequestStartedEvent before the
call and emit an LLMErrorEvent with the exception details on failure, log the
error with logger.exception including the exception, and ensure you still return
LLMResponseEvent(original=None, text="") after emitting the error; keep the
successful path calling _process_streaming_response.
---
Nitpick comments:
In `@plugins/openai/examples/nemotron_example/nemotron_example.py`:
- Around line 76-96: The INSTRUCTIONS prompt string contains an extraneous
duplicated block wrapped in triple quotes (a dead-code fragment) that should be
removed; locate the INSTRUCTIONS variable in nemotron_example.py and delete the
entire triple-quoted duplicate so only the intended INSTRUCTIONS string literal
remains, ensuring quotes and parentheses around INSTRUCTIONS remain balanced and
no other prompt text is altered.
- Around line 207-211: Replace the debug print statements that expose raw text
with logger.debug calls: change the three print(...) occurrences around
THINK_END_TAG handling (the messages showing think_end position, AFTER STRIP
preview, and NO THINK TAG raw preview) to logger.debug(...) using the same
formatted strings; ensure the module-level logger (e.g., logger =
logging.getLogger(__name__)) is used or imported so the logging calls follow the
module's logging pattern and do not leak raw debug output.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d0f569cb-351d-4639-b03f-61534ff5bc87
📒 Files selected for processing (3)
* plugins/openai/examples/nemotron_example/__init__.py
* plugins/openai/examples/nemotron_example/nemotron_example.py
* plugins/openai/examples/nemotron_example/pyproject.toml
Force-pushed from `7eb2f02` to `bff56ee` (Compare).
♻️ Duplicate comments (2)
plugins/openai/examples/nemotron_example/nemotron_example.py (2)
**464-472: ⚠️ Potential issue | 🔴 Critical: Arguments reversed in `create_call()`.**

Per the `Edge.create_call()` signature, it expects `(call_id: str, **kwargs)` with `call_type` as a keyword argument. The current invocation passes them positionally in reverse order.

🐛 Proposed fix

```diff
 async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
     """Join the call and start the agent."""
-    call = await agent.create_call(call_type, call_id)
+    call = await agent.create_call(call_id, call_type=call_type)
     logger.info("Starting Fraud Detection Agent...")
```

```bash
#!/bin/bash
# Verify the Edge.create_call signature to confirm argument order
ast-grep --pattern $'async def create_call(self, call_id: str, $$$)'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/openai/examples/nemotron_example/nemotron_example.py` around lines 464 - 472, The call to agent.create_call in join_call is passing arguments in the wrong order; Edge.create_call expects call_id first and call_type as a keyword. Update the join_call implementation to call agent.create_call(call_id, call_type=call_type) (or pass call_type in kwargs) so the positional call_id and keyword call_type match the Edge.create_call signature; modify the invocation inside the async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) to use agent.create_call(call_id, call_type=call_type).
**150-180: ⚠️ Potential issue | 🟠 Major: Missing observability events.**

The base class emits `LLMRequestStartedEvent` before the API call and `LLMErrorEvent` on failure. This override emits neither, breaking observability for downstream consumers tracking request lifecycle and error rates.

🔧 Proposed fix

```diff
+from vision_agents.core.llm import events
+from vision_agents.core.llm.events import LLMRequestStartedEvent
+
 ...
+        # Emit request started event
+        self.events.send(
+            LLMRequestStartedEvent(
+                plugin_name=PLUGIN_NAME,
+                model=request_kwargs["model"],
+                streaming=stream,
+            )
+        )
+        request_start_time = time.perf_counter()
         try:
             response = await self._client.chat.completions.create(**request_kwargs)
         except (APIConnectionError, APIStatusError) as e:
             logger.exception("Failed to get a response from Nemotron")
+            self.events.send(
+                events.LLMErrorEvent(
+                    plugin_name=PLUGIN_NAME,
+                    error_message=str(e),
+                    event_data=e,
+                )
+            )
             return LLMResponseEvent(original=None, text="")
```

See `chat_completions_llm.py:162-189` for the expected event emission pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/openai/examples/nemotron_example/nemotron_example.py` around lines 150 - 180, This override of _create_response_internal suppresses the base-class observability events; before calling self._client.chat.completions.create(...) emit an LLMRequestStartedEvent (including model, messages, tools, and max_tokens) as the base does, and in the except block catch (APIConnectionError, APIStatusError) create and emit an LLMErrorEvent containing the exception and request metadata (then log the exception and return LLMResponseEvent(original=None, text="") as currently done); ensure emitted events use the same fields/shape as the base implementation and that _process_streaming_response is still called on success.
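The started/error event pattern described above looks roughly like this in isolation. The event classes, the `events` sink, and the exception type here are stand-ins for the real `vision_agents` and OpenAI-client symbols, not their actual definitions:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class LLMRequestStartedEvent:  # stand-in for the real event class
    plugin_name: str
    model: str


@dataclass
class LLMErrorEvent:  # stand-in for the real event class
    plugin_name: str
    error_message: str


class APIConnectionError(Exception):  # stand-in for the client's exception type
    pass


async def create_response(events, client_call, model="nemotron"):
    # Emit the "started" event before the API call, as the base class does.
    events.append(LLMRequestStartedEvent(plugin_name="nemotron", model=model))
    try:
        return await client_call()
    except APIConnectionError as e:  # narrow, not a bare except
        events.append(LLMErrorEvent(plugin_name="nemotron", error_message=str(e)))
        return ""  # degrade gracefully, mirroring LLMResponseEvent(text="")
```

The point is the shape: request-started before the call, a specific exception class in the except clause, and an error event plus a graceful return on failure.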
🧹 Nitpick comments (3)
plugins/openai/examples/nemotron_example/README.md (1)
**19-25: Add a language specifier to the fenced code block.**

The `.env` example block lacks a language identifier. Use `dotenv` or `shell` for syntax highlighting consistency.

✏️ Suggested fix

````diff
-```
+```dotenv
 BASETEN_API_KEY=...
 STREAM_API_KEY=...
 STREAM_API_SECRET=...
 DEEPGRAM_API_KEY=...
 OPENAI_API_KEY=...
 ```
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@plugins/openai/examples/nemotron_example/README.md` around lines 19-25: update the fenced code block that lists BASETEN_API_KEY, STREAM_API_KEY, STREAM_API_SECRET, DEEPGRAM_API_KEY, and OPENAI_API_KEY to include a language specifier (e.g., `dotenv` or `shell`) after the opening triple backticks so the .env example gets proper syntax highlighting.

plugins/openai/examples/nemotron_example/nemotron_example.py (2)

**77-89: Module-level client instantiation may fail at import time.**

The `_summarizer` client is created unconditionally when the module loads. If `OPENAI_API_KEY` is absent, this will raise before the script reaches `main`. For an example script this is tolerable, but consider lazy initialization if this pattern spreads.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@plugins/openai/examples/nemotron_example/nemotron_example.py` around lines 77-89: the module-level AsyncOpenAI instantiation stored in `_summarizer` can raise at import time if OPENAI_API_KEY is missing; change to lazy initialization by moving creation into a helper (e.g., `_get_summarizer`) or inside `_summarize_for_speech` so AsyncOpenAI is instantiated on first use and cached (use a module-level None check for `_summarizer`, then instantiate), and update `_summarize_for_speech` to call `_get_summarizer()` (or create if None) before using `chat.completions.create`; reference symbols: `_summarizer`, `_summarize_for_speech`, `AsyncOpenAI`.

**182-261: Streaming semantics altered: chunks are buffered before release.**

The base class emits `LLMResponseChunkEvent` incrementally as tokens arrive; this override collects everything, strips thinking, then emits one large chunk. This is intentional for Nemotron's XML parsing, but downstream consumers expecting true streaming for low-latency TTS may experience increased time-to-first-audio. Consider documenting this behavior in the docstring.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@plugins/openai/examples/nemotron_example/nemotron_example.py` around lines 182-261: the `_process_streaming_response` method deviates from the base class streaming semantics by buffering all incoming chunks, stripping THINK/tool XML and markdown, then emitting a single aggregated `LLMResponseChunkEvent` and completion event (affecting low-latency TTS consumers); update the method docstring to clearly state this behavior (that it accumulates chunks, strips THINK tags and tool XML, enforces a max tool recursion via `_tool_depth`, and emits only a final chunk), note the impact on time-to-first-audio/time-to-first-token for downstream TTS consumers, and optionally mention how to change behavior (e.g., make streaming vs buffered behavior configurable) so maintainers and integrators can find and adjust it.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In@plugins/openai/examples/nemotron_example/nemotron_example.py:
- Around line 464-472: The call to agent.create_call in join_call is passing
arguments in the wrong order; Edge.create_call expects call_id first and
call_type as a keyword. Update the join_call implementation to call
agent.create_call(call_id, call_type=call_type) (or pass call_type in kwargs) so
the positional call_id and keyword call_type match the Edge.create_call
signature; modify the invocation inside the async def join_call(agent: Agent,
call_type: str, call_id: str, **kwargs) to use agent.create_call(call_id,
call_type=call_type).
- Around line 150-180: This override of _create_response_internal suppresses the
base-class observability events; before calling
self._client.chat.completions.create(...) emit an LLMRequestStartedEvent
(including model, messages, tools, and max_tokens) as the base does, and in the
except block catch (APIConnectionError, APIStatusError) create and emit an
LLMErrorEvent containing the exception and request metadata (then log the
exception and return LLMResponseEvent(original=None, text="") as currently
done); ensure emitted events use the same fields/shape as the base
implementation and that _process_streaming_response is still called on success.
Nitpick comments:
In@plugins/openai/examples/nemotron_example/nemotron_example.py:
- Around line 77-89: The module-level AsyncOpenAI instantiation stored in
_summarizer can raise at import time if OPENAI_API_KEY is missing; change to
lazy initialization by moving creation into a helper (e.g., _get_summarizer) or
inside _summarize_for_speech so AsyncOpenAI is instantiated on first use and
cached (use a module-level None check for _summarizer then instantiate), and
update _summarize_for_speech to call _get_summarizer() (or create if None)
before using chat.completions.create; reference symbols: _summarizer,
_summarize_for_speech, AsyncOpenAI.
- Around line 182-261: The _process_streaming_response method deviates from the
base class streaming semantics by buffering all incoming chunks, stripping
THINK/tool XML and markdown, then emitting a single aggregated
LLMResponseChunkEvent and completion event (affecting low-latency TTS
consumers); update the method docstring for _process_streaming_response to
clearly state this behavior (that it accumulates chunks, strips THINK tags and
tool XML, enforces a max tool recursion via _tool_depth, and emits only a final
chunk), note the impact on time-to-first-audio/time-to-first-token for
downstream TTS consumers, and optionally mention how to change behavior (e.g.,
make streaming vs buffered behavior configurable) so maintainers and integrators
can find and adjust it.

In
@plugins/openai/examples/nemotron_example/README.md:
- Around line 19-25: Update the fenced code block that lists BASETEN_API_KEY,
STREAM_API_KEY, STREAM_API_SECRET, DEEPGRAM_API_KEY, and OPENAI_API_KEY to
include a language specifier (e.g., `dotenv` or `shell`) after the opening
triple backticks so the .env example gets proper syntax highlighting; edit the
README's fenced block containing those environment variables and add the
language identifier (for example, change the opening fence to include `dotenv`).

---

ℹ️ Review info

⚙️ Run configuration

**Configuration used**: Path: .coderabbit.yaml
**Review profile**: CHILL
**Plan**: Pro
**Run ID**: `d39908d5-2392-4bed-8b57-648e0eff8401`

📥 Commits

Reviewing files that changed from the base of the PR and between 7eb2f029cb6a0811113ebed90674d7f017816c38 and bff56ee39154d72dd64fc1cd82325b5d6c523d12.

📒 Files selected for processing (5)

* README.md
* plugins/openai/examples/nemotron_example/README.md
* plugins/openai/examples/nemotron_example/__init__.py
* plugins/openai/examples/nemotron_example/nemotron_example.py
* plugins/openai/examples/nemotron_example/pyproject.toml

✅ Files skipped from review due to trivial changes (2)

* README.md
* plugins/openai/examples/nemotron_example/pyproject.toml
Added a link to demo and clarified function calls in README.
Actionable comments posted: 1
🧹 Nitpick comments (1)
plugins/openai/examples/nemotron_example/README.md (1)
**9-9: Clarify tool-call behavior to avoid contradictory expectations.**

Line 9 currently reads as both "event-only" and "agent executes actions," which is confusing for readers trying to understand side effects.

Suggested wording

```diff
-It uses function calls that send events instead of actually blocking a card etc. The agent is empowered to do this itself.
+It uses function calls that emit events (for demo purposes) rather than directly executing irreversible account actions like card blocking.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/openai/examples/nemotron_example/README.md` at line 9, Update the README wording to remove the contradictory statement that tools both "send events" and "execute actions"; explicitly state that the example's function calls are event-only (they emit events rather than performing side effects) and that the agent itself is responsible for carrying out any blocking UI updates or card actions. Mention the terms "function calls", "event-only", and "agent executes actions" so readers understand that tool-call invocations generate events to be handled by the agent/runtime and do not directly mutate UI state.
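The "event-only" tool behavior this comment asks the README to spell out can be illustrated like this. The class and event names are invented for the sketch; the real example's tools differ:

```python
from dataclasses import dataclass, field


@dataclass
class DemoBank:
    """Tools emit events describing intent; nothing irreversible happens."""

    events: list = field(default_factory=list)

    def freeze_card(self, card_id: str) -> dict:
        # Event-only: record what *would* happen, don't call a bank API.
        event = {"type": "card.freeze_requested", "card_id": card_id}
        self.events.append(event)
        return {"status": "ok", "event": event["type"]}
```

A runtime handling the emitted events decides whether and how to act on them, which is the separation the suggested wording describes.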
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/openai/examples/nemotron_example/README.md`:
- Around line 23-29: The fenced code block containing environment variables is
missing a language tag (markdownlint MD040); update the triple-backtick fence
that surrounds the
BASETEN_API_KEY/STREAM_API_KEY/DEEPGRAM_API_KEY/OPENAI_API_KEY block to include
a language identifier such as `dotenv` (i.e., add `dotenv` after the opening backticks) so the
block is properly annotated in README.md.
---
Nitpick comments:
In `@plugins/openai/examples/nemotron_example/README.md`:
- Line 9: Update the README wording to remove the contradictory statement that
tools both "send events" and "execute actions"; explicitly state that the
example's function calls are event-only (they emit events rather than performing
side effects) and that the agent itself is responsible for carrying out any
blocking UI updates or card actions. Mention the terms "function calls",
"event-only", and "agent executes actions" so readers understand that tool-call
invocations generate events to be handled by the agent/runtime and do not
directly mutate UI state.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1a529ea9-8ba7-48ee-834a-62c0d350e03e
📒 Files selected for processing (1)
plugins/openai/examples/nemotron_example/README.md
This pull request adds a new `pyproject.toml` configuration file for the `nemotron-example` project. The file sets up the basic project metadata and specifies dependencies required for running a fraud detection demo using NVIDIA Nemotron with Vision Agents.

Project setup and dependency management:

* Added a `pyproject.toml` file for the `nemotron-example` project, including project metadata, Python version requirement, and dependencies such as `vision-agents` and related plugins.

**Summary by CodeRabbit**
New Features
Documentation
Chores