feat: semantic memory (user facts) for cuga-lite#180
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds evolve-backed async user-facts persistence/retrieval and preference/guideline formatting; introduces recursive checkpoint-safe normalization for complex types (NumPy, dataclasses, Pydantic-like, datetime, Enum, Path, bytes) in Cuga Lite and tool-call tracker; sets AgentState.user_id default to None; updates evolve dependency and related tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as CugaLiteGraph
participant Evolve as EvolveIntegration
participant Serializer as JsonPlusSerializer
Client->>Client: extract first HumanMessage (user_id, message)
par fire-and-forget
Client->>Evolve: schedule store_user_facts(user_id, message) -- non-blocking
and await retrieve
Client->>Evolve: await retrieve_user_facts(user_id, query)
activate Evolve
Evolve->>Evolve: is_enabled? validate inputs & call tool
Evolve-->>Client: dict(categories) or None
deactivate Evolve
end
alt categories present
Client->>Client: format preferences section
Client->>Client: inject "## Evolve User Preference" into prompt
end
Client->>Client: _sanitize_cuga_lite_update(state) (normalize complex types)
Client->>Serializer: dumps_typed(normalized_state)
Serializer-->>Client: JSON-serializable payload
Client->>Client: checkpoint write / return payload
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
2affb72 to
8b6f1a6
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 1152-1168: The current code schedules
EvolveIntegration.store_user_facts in background (_store_task via
asyncio.create_task) then immediately awaits
EvolveIntegration.retrieve_user_facts with the same memory_query, causing a race
where the just-stored facts may or may not be visible; to make ordering
deterministic, await the store call before calling retrieve (i.e., replace the
background scheduling of EvolveIntegration.store_user_facts with an await and
remove or adapt the _store_task/_log_store_error usage) so that store_user_facts
completes before retrieve_user_facts is invoked.
- Around line 1128-1130: Stop logging the full prompt contents: find the
logger.debug calls that reference special_instructions_final (the debug lines in
the cuga_lite_graph code that print "Evolve: Full special_instructions...") and
replace them so they do not emit the entire special_instructions_final; instead
log safe metadata only (e.g., a redacted snippet, its length, and/or a
hash/checksum). Update both debug occurrences that reference
special_instructions_final so logs contain no sensitive prompt text but still
provide useful diagnostics.
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py`:
- Around line 116-123: The test’s immutability check uses a shallow copy
(`before = list(state.chat_messages)`), which won’t detect in-place mutations of
message objects; replace the shallow snapshot with a deep snapshot (e.g.,
capture a deep copy of state.chat_messages using copy.deepcopy or equivalent)
before calling the helpers (_get_first_human_message_content,
_get_latest_memory_query, _format_evolve_user_preference,
_format_evolve_guidelines) and then compare the deep-copied snapshot to
state.chat_messages to ensure in-place mutations are detected.
In `@src/cuga/backend/cuga_graph/state/agent_state.py`:
- Line 924: The user_id field in AgentState is currently defaulting to
"default_user", causing unrelated sessions to share durable memory; change the
default for user_id in AgentState (user_id: Optional[str]) to None (or empty
string ""), and make the identical change in CugaLiteState.user_id so the
existing skip-on-missing-user logic triggers correctly; locate the user_id
declarations in agent_state.py (AgentState) and the CugaLiteState class and
update their defaults and any type hints/comments accordingly, and run tests to
ensure memory-keying and skip logic behave as intended.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c0f52109-e11a-4bf6-a3d5-5e4f0203eeac
📒 Files selected for processing (9)
.gitignoresrc/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_integration.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_tool_call_tracker.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.pysrc/cuga/backend/cuga_graph/state/agent_state.pysrc/cuga/backend/evolve/integration.pysrc/cuga/backend/evolve/tests/test_integration.py
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py (1)
116-123:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse a deep snapshot for the immutability assertion.
Line 116 uses a shallow copy, so in-place mutation of message objects could slip through while Line 123 still passes.
Suggested fix
+import copy import numpy as np @@ - before = list(state.chat_messages) + before = copy.deepcopy(state.chat_messages)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py` around lines 116 - 123, The test currently takes a shallow copy of state.chat_messages into before so in-place mutations of message objects can go undetected; replace the shallow copy with a deep snapshot (e.g., deepcopy or serializing/deserializing) of state.chat_messages before calling _get_first_human_message_content, _get_latest_memory_query, _format_evolve_user_preference, and _format_evolve_guidelines, then assert the deep-copied snapshot equals state.chat_messages to ensure true immutability of message objects.src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)
1414-1416:⚠️ Potential issue | 🟠 Major | ⚡ Quick winStop logging the full injected prompt content.
Line 1415 and Line 1465 dump
special_instructions_final, which can include user preferences/profile context and other sensitive prompt content.Suggested fix
- logger.debug( - f"Evolve: Full special_instructions with guidelines:\n{special_instructions_final}" - ) + logger.debug( + "Evolve: Injected guidelines section (chars={})", + len(evolve_section), + ) @@ - logger.debug( - f"Evolve: Full special_instructions with preferences:\n{special_instructions_final}" - ) + logger.debug( + "Evolve: Injected user preference section (chars={})", + len(preference_section), + )Also applies to: 1464-1466
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines 1414 - 1416, The code is logging full sensitive prompt content by dumping special_instructions_final via logger.debug; remove or redact the raw special_instructions_final output in the logger calls (the logger.debug calls that print "Evolve: Full special_instructions with guidelines" around special_instructions_final) and instead log a non-sensitive summary such as existence, length, SHA256/hash, or a masked snippet, or simply log that special_instructions_final was set without printing its contents; update all occurrences (including the other call at lines around 1464–1466 that also prints special_instructions_final) to use the redaction/summary approach.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py`:
- Around line 86-90: The dict-key normalization in _normalize_tool_payload can
turn tuple keys into unhashable lists causing record_call to raise; modify
_normalize_tool_payload so when normalizing dict keys (in the dict branch) you
preserve hashability: compute normalized_key = _normalize_tool_payload(key) and
if the original key was a tuple or the normalized_key is a list/frozenset-like,
convert it back to an immutable/hashable type (e.g., tuple) before using it as a
dict key, while still normalizing the value via _normalize_tool_payload(item);
ensure this preserves tuples as keys for record_call to work correctly.
In `@src/cuga/backend/evolve/integration.py`:
- Around line 78-80: The log in the Evolve success path currently emits raw
user_id (logger.info f"...{user_id}..."), which leaks PII; replace that by
logging an anonymized identifier instead (e.g., compute a deterministic hash or
token from user_id and log only a short prefix) so you still can correlate
events without storing the raw ID; update the call site that builds the message
(the logger.info line that references user_id and result.get) to use the
anonymized value (e.g., user_hash or anonymized_user_id) and ensure the
hashing/obfuscation is done via a stable function imported/defined beside this
code.
---
Duplicate comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 1414-1416: The code is logging full sensitive prompt content by
dumping special_instructions_final via logger.debug; remove or redact the raw
special_instructions_final output in the logger calls (the logger.debug calls
that print "Evolve: Full special_instructions with guidelines" around
special_instructions_final) and instead log a non-sensitive summary such as
existence, length, SHA256/hash, or a masked snippet, or simply log that
special_instructions_final was set without printing its contents; update all
occurrences (including the other call at lines around 1464–1466 that also prints
special_instructions_final) to use the redaction/summary approach.
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py`:
- Around line 116-123: The test currently takes a shallow copy of
state.chat_messages into before so in-place mutations of message objects can go
undetected; replace the shallow copy with a deep snapshot (e.g., deepcopy or
serializing/deserializing) of state.chat_messages before calling
_get_first_human_message_content, _get_latest_memory_query,
_format_evolve_user_preference, and _format_evolve_guidelines, then assert the
deep-copied snapshot equals state.chat_messages to ensure true immutability of
message objects.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b0ac16a7-83cb-4708-b4bb-997bc2775d8f
📒 Files selected for processing (9)
.gitignoresrc/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_integration.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_tool_call_tracker.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.pysrc/cuga/backend/cuga_graph/state/agent_state.pysrc/cuga/backend/evolve/integration.pysrc/cuga/backend/evolve/tests/test_integration.py
✅ Files skipped from review due to trivial changes (2)
- .gitignore
- src/cuga/backend/cuga_graph/state/agent_state.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_integration.py
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (2)
336-344: 💤 Low valueConsider logging conversion failures for debugging.
The
try-except-passblocks catch blind exceptions and silently continue, which can make debugging difficult when unexpected conversion failures occur. While the fallback behavior is intentional, a debug-level log would help diagnose issues without affecting the control flow.Suggested improvement
if value_module.startswith("numpy"): if hasattr(value, "item") and callable(value.item): try: return _normalize_checkpoint_value(value.item()) - except Exception: - pass + except Exception as e: + logger.debug("numpy .item() conversion failed: %s", e) if hasattr(value, "tolist") and callable(value.tolist): try: return _normalize_checkpoint_value(value.tolist()) - except Exception: - pass + except Exception as e: + logger.debug("numpy .tolist() conversion failed: %s", e)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines 336 - 344, The two silent except blocks around converting values using value.item() and value.tolist() should log the caught exception at debug level before continuing; update the try/except in the _normalize_checkpoint_value conversion code (the blocks that call value.item() and value.tolist()) to catch Exception as e and call the module or class logger (e.g., logger.debug or self.logger.debug) with a short message including which conversion failed and the exception (e.g., "normalize checkpoint: value.item() failed: %s", e) so behavior remains unchanged but failures are discoverable.
1263-1271: 💤 Low valueConsider logging the agent_id fallback failure.
The broad exception catch on Line 1270 silently swallows errors when accessing
app_state.agent_id. A debug log would help diagnose configuration issues without affecting the fallback behavior.Suggested improvement
try: from cuga.backend.server.main import app as _app _app_state = getattr(_app.state, "app_state", None) current_agent_id = str(getattr(_app_state, "agent_id", "") or "").strip() - except Exception: + except Exception as e: + logger.debug("Could not resolve agent_id from app_state: %s", e) current_agent_id = ""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines 1263 - 1271, The except block that falls back when reading agent_id from configurable/_app_state in cuga_lite_graph.py silently swallows errors; update the except Exception handler around the import and getattr block (where current_agent_id, _app, and _app_state are used) to log the caught exception at debug level (including context like "failed to read app_state.agent_id" and the exception info) before setting current_agent_id = ""; use the module or class logger already in scope (or create one if missing) so failures to access app_state.agent_id are recorded without changing the fallback behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 336-344: The two silent except blocks around converting values
using value.item() and value.tolist() should log the caught exception at debug
level before continuing; update the try/except in the
_normalize_checkpoint_value conversion code (the blocks that call value.item()
and value.tolist()) to catch Exception as e and call the module or class logger
(e.g., logger.debug or self.logger.debug) with a short message including which
conversion failed and the exception (e.g., "normalize checkpoint: value.item()
failed: %s", e) so behavior remains unchanged but failures are discoverable.
- Around line 1263-1271: The except block that falls back when reading agent_id
from configurable/_app_state in cuga_lite_graph.py silently swallows errors;
update the except Exception handler around the import and getattr block (where
current_agent_id, _app, and _app_state are used) to log the caught exception at
debug level (including context like "failed to read app_state.agent_id" and the
exception info) before setting current_agent_id = ""; use the module or class
logger already in scope (or create one if missing) so failures to access
app_state.agent_id are recorded without changing the fallback behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 07aa22d6-0cbc-49d3-8e37-f8d8c377109d
📒 Files selected for processing (3)
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.pysrc/cuga/backend/evolve/formatting.py
- Add user_id field to CugaLiteState so evolve memory calls are invoked - Align AgentState.user_id default to "default_user" matching DEFAULT_USER_ID - Change store_user_facts to asyncio.create_task (non-blocking) - Add debug log for full special_instructions after preference injection - Add integration tests for the full store/retrieve/prompt injection flow Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extracts 7 evolve-related formatting helpers from cuga_lite_graph.py into src/cuga/backend/evolve/formatting.py for better cohesion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Upgrades evolve SDK to 1.1.0 which adds user facts MCP tools, multi-user/namespace support, and SSE transport hardening. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ce741a5 to
206fc4b
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (4)
src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py (1)
118-125:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse a deep snapshot for the immutability assertion.
before = list(state.chat_messages)only clones the list container, so an in-place mutation of a message object would still pass this test.Suggested fix
+import copy import numpy as np from langchain_core.messages import HumanMessage, AIMessage from langgraph.checkpoint.serde.jsonplus import JsonPlusSerializer @@ - before = list(state.chat_messages) + before = copy.deepcopy(state.chat_messages)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py` around lines 118 - 125, The test currently captures only a shallow copy with before = list(state.chat_messages), which will miss in-place mutations of message objects; replace this with a deep snapshot of state.chat_messages (for example using copy.deepcopy or serializing each message) before calling get_first_human_message_content, get_latest_memory_query, format_evolve_user_preference, and format_evolve_guidelines, then assert the deep-copied snapshot equals the post-call state to ensure true immutability of the message objects referenced by state.chat_messages.src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py (1)
86-90:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPreserve hashable dict keys during payload normalization.
A tuple key will flow through the iterable branch and become a list here, so rebuilding the dict can raise
TypeError. Sincerecord_call()runs from the decorator'sfinally, that can turn an otherwise successful tool call into a failure.Suggested fix
- if isinstance(value, dict): - return {_normalize_tool_payload(key): _normalize_tool_payload(item) for key, item in value.items()} - - if isinstance(value, (list, tuple, set, frozenset)): - return [_normalize_tool_payload(item) for item in value] + if isinstance(value, dict): + normalized: dict[Any, Any] = {} + for key, item in value.items(): + normalized_key = _normalize_tool_payload(key) + if not isinstance(normalized_key, (str, int, float, bool, type(None), tuple)): + normalized_key = str(normalized_key) + normalized[normalized_key] = _normalize_tool_payload(item) + return normalized + + if isinstance(value, list): + return [_normalize_tool_payload(item) for item in value] + + if isinstance(value, tuple): + return tuple(_normalize_tool_payload(item) for item in value) + + if isinstance(value, (set, frozenset)): + return [_normalize_tool_payload(item) for item in value]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py` around lines 86 - 90, The dict branch in _normalize_tool_payload currently normalizes both keys and values which can convert hashable keys (e.g., tuple) into unhashable types and break dict reconstruction inside record_call; change the dict comprehension in _normalize_tool_payload to preserve original dict keys (do not run _normalize_tool_payload on the key) and only normalize the dict values (e.g., return {key: _normalize_tool_payload(item) for key, item in value.items()}) so tuple keys remain hashable and record_call() won't fail.src/cuga/backend/evolve/integration.py (1)
78-80:⚠️ Potential issue | 🟠 Major | ⚡ Quick winStop logging raw
user_idin the success path.This keeps a stable per-user identifier in INFO logs for every memory write. Dropping or anonymizing it preserves the useful signal (
stored_count) without retaining user-linked data in centralized logs.Suggested fix
- logger.info( - f"Evolve: Stored user facts for {user_id} ({result.get('stored_count', 0)} updates)" - ) + logger.info( + "Evolve: Stored user facts (stored_count={})", + result.get("stored_count", 0), + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/evolve/integration.py` around lines 78 - 80, The INFO log in integration.py currently prints the raw user_id (logger.info f"Evolve: Stored user facts for {user_id} ..."), which must be removed or anonymized; update the logger.info call in the function/module that handles the memory write (referencing user_id, result, and logger.info) so it no longer emits the raw user_id — either drop the user_id entirely and only log the stored_count (result.get('stored_count', 0)), or replace user_id with a stable anonymized token (e.g., a short deterministic hash of user_id) before including it in the message so logs retain a stable per-user signal without exposing raw identifiers.src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)
1256-1258:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAvoid dumping the fully injected prompt to debug logs.
Both debug lines emit
special_instructions_final, which now includes retrieved preference memory and other prompt-only context. That creates an unnecessary sensitive-data copy in logs on every turn.Also applies to: 1306-1308
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines 1256 - 1258, The debug statements currently logging special_instructions_final expose sensitive prompt-only context (including retrieved preference memory); change the logger.debug calls that reference special_instructions_final (the two occurrences around the Evolve block and the other at lines ~1306-1308) to log a redacted/summary version instead — e.g., log only a sanitized summary, length, or hash of special_instructions_final (create or call a small helper like redact_or_hash_prompt(special_instructions_final) or sanitize_prompt) so the code still provides traceability without dumping the full prompt contents.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 328-393: The helper _normalize_checkpoint_value currently returns
unknown objects unchanged which lets non-serializable SDK/ORM/tool objects
bypass sanitization for variables_storage, cuga_lite_metadata, and tool_calls;
change the final fallback so unknown objects are converted into a safe
serializable representation (e.g., try to convert via vars(value) or
getattr(value, "__dict__", None) and normalize that, otherwise return
repr(value) or str(value)) so every branch yields JSON/msgpack-friendly
primitives/containers rather than raw arbitrary objects.
- Around line 1260-1296: The stream endpoint extracts user_id (variable user_id
in the stream handler) but never assigns it onto the local_state, causing
per-request state to default to DEFAULT_USER_ID; after each local_state creation
or reload (where local_state is instantiated/loaded in the stream handler and
where state is reloaded inside the stream loop), set local_state.user_id =
user_id immediately after the local_state is created/loaded so subsequent code
(e.g., EvolveIntegration.store_user_facts / retrieve_user_facts and any usage of
state.user_id) sees the authenticated user; update all creation paths and reload
paths where local_state is constructed (the initial local_state instantiation
sites and the reload branches referenced in the comment) to assign the user_id
to local_state.
---
Duplicate comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 1256-1258: The debug statements currently logging
special_instructions_final expose sensitive prompt-only context (including
retrieved preference memory); change the logger.debug calls that reference
special_instructions_final (the two occurrences around the Evolve block and the
other at lines ~1306-1308) to log a redacted/summary version instead — e.g., log
only a sanitized summary, length, or hash of special_instructions_final (create
or call a small helper like redact_or_hash_prompt(special_instructions_final) or
sanitize_prompt) so the code still provides traceability without dumping the
full prompt contents.
In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py`:
- Around line 118-125: The test currently captures only a shallow copy with
before = list(state.chat_messages), which will miss in-place mutations of
message objects; replace this with a deep snapshot of state.chat_messages (for
example using copy.deepcopy or serializing each message) before calling
get_first_human_message_content, get_latest_memory_query,
format_evolve_user_preference, and format_evolve_guidelines, then assert the
deep-copied snapshot equals the post-call state to ensure true immutability of
the message objects referenced by state.chat_messages.
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py`:
- Around line 86-90: The dict branch in _normalize_tool_payload currently
normalizes both keys and values which can convert hashable keys (e.g., tuple)
into unhashable types and break dict reconstruction inside record_call; change
the dict comprehension in _normalize_tool_payload to preserve original dict keys
(do not run _normalize_tool_payload on the key) and only normalize the dict
values (e.g., return {key: _normalize_tool_payload(item) for key, item in
value.items()}) so tuple keys remain hashable and record_call() won't fail.
In `@src/cuga/backend/evolve/integration.py`:
- Around line 78-80: The INFO log in integration.py currently prints the raw
user_id (logger.info f"Evolve: Stored user facts for {user_id} ..."), which must
be removed or anonymized; update the logger.info call in the function/module
that handles the memory write (referencing user_id, result, and logger.info) so
it no longer emits the raw user_id — either drop the user_id entirely and only
log the stored_count (result.get('stored_count', 0)), or replace user_id with a
stable anonymized token (e.g., a short deterministic hash of user_id) before
including it in the message so logs retain a stable per-user signal without
exposing raw identifiers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 156cec38-18fe-4edb-b5df-d46c8d8175f0
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (11)
.gitignorepyproject.tomlsrc/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_integration.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_tool_call_tracker.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.pysrc/cuga/backend/cuga_graph/state/agent_state.pysrc/cuga/backend/evolve/formatting.pysrc/cuga/backend/evolve/integration.pysrc/cuga/backend/evolve/tests/test_integration.py
✅ Files skipped from review due to trivial changes (3)
- src/cuga/backend/cuga_graph/state/agent_state.py
- pyproject.toml
- src/cuga/backend/evolve/formatting.py
🚧 Files skipped from review as they are similar to previous changes (2)
- .gitignore
- src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_tool_call_tracker.py
- tool_call_tracker: keep dict keys hashable (tuple keys stay tuples) - integration.py: remove raw user_id from info logs (PII) - agent_state/CugaLiteState: default user_id to None instead of "default_user" to prevent cross-user memory bleed - main.py: wire authenticated user_id onto local_state - cuga_lite_graph: reorder retrieve before fire-and-forget store for deterministic ordering; redact full prompt from debug logs - _normalize_checkpoint_value: fallback to str() for unknown objects - test: use deepcopy for immutability assertion Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 402-414: The sanitizer _sanitize_cuga_lite_update currently
normalizes keys like "variables_storage" and "tool_calls" but omits
"task_todos", so any non-serializable values under the new task_todos payload
(e.g., datetime, Enum, Path, model objects in text/status) can break checkpoint
writes; update _sanitize_cuga_lite_update to include "task_todos" in the
for-loop so it calls _normalize_checkpoint_value(sanitized["task_todos"]) (same
normalization used for other keys) and ensure the change is also applied to the
duplicate occurrence referenced around lines 2124-2126.
- Around line 1252-1282: The EvolveIntegration calls
(EvolveIntegration.get_guidelines and EvolveIntegration.retrieve_user_facts) are
awaited directly in prompt preparation and must be made fail-open: wrap each
await in a bounded timeout (e.g., asyncio.wait_for) and catch
TimeoutError/Exception so that on timeout or error you log the failure and
continue with no injected context (leave special_instructions_final unchanged or
use empty string) rather than aborting the turn; apply this pattern around the
block that builds evolve_section (build_evolve_guidelines_section) and the
retrieved_preferences assignment so slow or failing Evolve lookups fall back
silently.
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py`:
- Around line 83-84: The dataclass handling in _normalize_tool_payload uses
asdict(value) without guarding against RecursionError from cyclic dataclasses;
wrap the asdict(...) call in a try/except that catches RecursionError (and
optionally Exception) and on failure falls back to returning str(value)
(matching existing fallback behavior), so that tracked_tool wrappers do not get
a new exception from normalization masking the original tool result/exception.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d62ade03-97b0-4b25-a513-e035a75e879e
📒 Files selected for processing (6)
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.pysrc/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.pysrc/cuga/backend/cuga_graph/state/agent_state.pysrc/cuga/backend/evolve/integration.pysrc/cuga/backend/server/main.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py
- Add task_todos to checkpoint sanitization keys - Wrap evolve get_guidelines and retrieve_user_facts with timeout for fail-open behavior - Guard dataclass asdict() against RecursionError in tool_call_tracker Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| return m | ||
|
|
||
|
|
||
| def _normalize_checkpoint_value(value: Any) -> Any: |
| return str(value) | ||
|
|
||
|
|
||
| def _sanitize_cuga_lite_update(update: Dict[str, Any]) -> Dict[str, Any]: |
There was a problem hiding this comment.
LangGraph's default checkpointer serializes state updates with msgpack, which only handles JSON-ish primitives (str/int/float/bool/None, list, dict, bytes). Tool results flowing into variables_storage, cuga_lite_metadata, tool_calls, and task_todos frequently contain datetime, Enum, Path, Pydantic models, dataclasses, or numpy scalars — any one of those crashes the checkpoint write with TypeError: Object of type X is not msgpack serializable, which fails the turn even though the tool call succeeded.
normalize_checkpoint_value converts those types to their serializable form (datetime.isoformat(), enum.value, model_dump(), asdict(), etc.) before we call Command(update=...). It's called from every node that returns a state update (lines 611, 1479, 1813, 1880, 2056, 2072). LangChain BaseMessage objects are deliberately passed through untouched since LangGraph has dedicated serializers for them.
Without this, any tool that returns e.g. a datetime field would break checkpoint-backed resumption.
Moves _normalize_checkpoint_value and _sanitize_cuga_lite_update into checkpoint_sanitizer.py so cuga_lite_graph.py no longer has to import dataclasses, datetime, enum, or pathlib solely for that logic. Also drops the __dict__/vars() fallback: payloads passed through _SANITIZE_KEYS are JSON-like, and the remaining branches already cover Pydantic, dataclasses, datetime, Enum, Path, bytes, and numpy. The fallback silently coerced arbitrary objects, which was not desired. Addresses PR #180 review feedback from @sami-marreed.
| updates.update(additional_updates) | ||
|
|
||
| return Command(goto=END, update=updates) | ||
| return Command(goto=END, update=_sanitize_cuga_lite_update(updates)) |
There was a problem hiding this comment.
why do we need this sanitize
There was a problem hiding this comment.
This sanitization is required for LangGraph checkpoint safety. The Command(update=...) payload flows through the msgpack checkpointer, which fails on non-JSON-ish values (dataclasses, enums, paths, datetimes, numpy types, pydantic models, etc.). Any one of variables_storage, cuga_lite_metadata, tool_calls, task_todos, metrics, or last_summarization_metrics — which is what additional_updates may carry on the error path — could contain such values. _sanitize_cuga_lite_update normalizes only those known keys and leaves message objects untouched. Without it, a runtime error that lands in additional_updates can poison the final checkpoint and break resumption.
There was a problem hiding this comment.
what tool not returning Json or dict?
There was a problem hiding this comment.
msgpack_error.txt
I saw this error once when running the crm demo
| # If policy returned metadata (e.g., playbook guidance), store it | ||
| if metadata: | ||
| state.cuga_lite_metadata = metadata | ||
| state.cuga_lite_metadata = _normalize_checkpoint_value(metadata) |
There was a problem hiding this comment.
why do we need normalize
There was a problem hiding this comment.
Same checkpoint-safety reason: cuga_lite_metadata is one of the fields the msgpack checkpointer serializes, and PolicyEnactment.check_and_enact returns policy metadata that can include dataclass/pydantic/enum instances (e.g. playbook guidance). Assigning raw metadata directly here would propagate non-JSON-ish values into state and fail the next checkpoint write. _normalize_checkpoint_value recursively converts them to plain dicts/lists/primitives while preserving LangChain messages.
There was a problem hiding this comment.
we didn't have issues with this before any reason why we do now?
Signed-off-by: Gaodan Fang <Gaodan.Fang@ibm.com> # Conflicts: # pyproject.toml # src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py # uv.lock
Checkpoint sanitization — prefer upstream fixesI'm not in favor of sanitizing state at the checkpoint boundary in I'd rather we guarantee msgpack-safe payloads at the edges that produce state (Evolve client/formatting, tool tracker inputs, policy metadata builders) and keep checkpoint writes transparent. If something isn't serializable, fail fast at the producer with a clear error instead of coercing at checkpoint time. Prefer fixing Evolve + tool payloads over graph wrappersBefore adding
That targets the actual sources of bad types instead of a second normalization pass right before persistence. Duplication doesn't justify checkpoint coercionEven if we keep some normalization for tool payloads, Pick one small shared helper used at ingestion, not at graph completion.
|
|
Next step after pr walkthrough meeting:
|
…producers Addresses PR #180 review feedback. Removes the boundary sanitization layer (checkpoint_sanitizer._sanitize_cuga_lite_update / _normalize_checkpoint_value) and the wrappers around every Command(update=...) site in cuga_lite_graph. Producers are now responsible for emitting JSON-friendly values; tool tracker's _normalize_tool_payload is the reference pattern (re-inlined since the shared module is gone). Also drops the `from cuga.backend.server.main import app` fallback in the evolve preference flow so cuga-lite stays server-agnostic; agent_id now flows in via `configurable` only. Refs: #180, #229 Signed-off-by: Gaodan Fang <Gaodan.Fang@ibm.com>
Continues PR #180 cleanup. Tool tracker now records arguments and result verbatim — no msgpack-safety coercion at the recording boundary. Producer contract: any tool whose value lands in checkpointed state (tool_calls, variables_storage) must already be JSON-clean. If not, fail fast at the producer instead of normalizing here. Drops the dedicated tracker normalization test as well; behavior is no longer asserted at this layer. Refs: #180, #229 Signed-off-by: Gaodan Fang <Gaodan.Fang@ibm.com>
| @@ -0,0 +1,243 @@ | |||
| """ | |||
There was a problem hiding this comment.
can you add these tests in sub folder of tests called memory also do you want them to run in CI?
| @@ -0,0 +1,121 @@ | |||
| import copy | |||
There was a problem hiding this comment.
can you add these tests in sub folder of tests called memory also do you want them to run in CI?
There was a problem hiding this comment.
yeah we want to run it in CI, does it need special setup?
| memory_query = state.sub_task or get_latest_memory_query(state.chat_messages) | ||
| current_user_id = str(getattr(state, "user_id", "") or "").strip() | ||
| if current_user_id and memory_query: | ||
| current_agent_id = str(configurable.get("agent_id") or "").strip() | ||
| thread_id_for_memory = str(configurable.get("thread_id") or state.thread_id or "").strip() | ||
|
|
||
| try: | ||
| retrieved_preferences = await asyncio.wait_for( | ||
| EvolveIntegration.retrieve_user_facts( | ||
| current_user_id, | ||
| memory_query, | ||
| ), | ||
| timeout=settings.evolve.timeout, | ||
| ) | ||
| except Exception: | ||
| logger.warning( | ||
| "Evolve: retrieve_user_facts timed out or failed; continuing without preferences" | ||
| ) | ||
| retrieved_preferences = None | ||
|
|
||
| def _log_store_error(task: asyncio.Task) -> None: | ||
| exc = task.exception() if not task.cancelled() else None | ||
| if exc: | ||
| logger.warning("Evolve: store_user_facts failed (non-blocking): %s", exc) | ||
|
|
||
| _store_task = asyncio.create_task( | ||
| EvolveIntegration.store_user_facts( | ||
| current_user_id, | ||
| memory_query, | ||
| metadata={ | ||
| "thread_id": thread_id_for_memory, | ||
| "agent_id": current_agent_id, | ||
| "source": "cuga-lite", | ||
| }, | ||
| ) | ||
| ) | ||
| _store_task.add_done_callback(_log_store_error) | ||
| categories = ( | ||
| retrieved_preferences.get("categories") | ||
| if isinstance(retrieved_preferences, dict) | ||
| else None | ||
| ) | ||
| preference_section = build_evolve_user_preference_section(categories) | ||
| if preference_section: | ||
| special_instructions_final = (special_instructions_final or "") + preference_section | ||
| logger.info("Evolve: Injected user preference context into system prompt") | ||
| logger.debug( |
There was a problem hiding this comment.
can you move this to nice helper files in memory sdk or client etc,
| class AgentState(BaseModel): | ||
| # pages: Annotated[Sequence[str], operator.add] # List of pages traversed | ||
| # page: Page # The Playwright web page lets us interact with the web environment | ||
| user_id: Optional[str] = "default" # TODO: this should be updated in multi user scenario |
There was a problem hiding this comment.
Can you restore this to the default, is it ensures in local its not null so ui doesn't file when showing it
… helper Move the inline guidelines + user-facts (store/retrieve) block out of cuga_lite_graph and into src/cuga/backend/evolve/memory.py exposing build_evolve_special_instructions_extension(). The graph call site shrinks from ~70 lines to a single helper invocation. Group the evolve-related tests under cuga_lite/tests/memory/ and wire that subfolder into the unit-tests CI step. Addresses PR #180 review comments (r3244165611, r3244165051, r3244174174). Signed-off-by: Gaodan Fang <Gaodan.Fang@ibm.com>
Reverts AgentState.user_id default from None back to "default" so the UI in local single-user mode keeps a non-null value to render. The Evolve memory flow already gates on truthy user_id, so behavior is unchanged when a real user_id is supplied. Addresses PR #180 review comment r3244176771. Signed-off-by: Gaodan Fang <Gaodan.Fang@ibm.com>
Summary
user_idfield toCugaLiteStateand alignAgentState.user_iddefault to"default_user"store_user_factsfire-and-forget (asyncio.create_task) to avoid blocking the critical path_normalize_checkpoint_valueand_sanitize_cuga_lite_updatefor msgpack-safe state serializationTest plan
uv run pytest src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/ -vpasses (28 tests)uv run pytest src/cuga/backend/evolve/tests/test_integration.py -vpasses (21 tests)cuga start demowithEVOLVE_ENABLED=trueandEVOLVE__LITE_MODE_ONLY=false, confirm log showsEvolve: Stored user factsandEvolve: Injected user preference contextPartially resolve #108
Summary by CodeRabbit
New Features
Improvements
Tests
Chores