Skip to content

feat: semantic memory (user facts) for cuga-lite#180

Open
gaodan-fang wants to merge 13 commits into
mainfrom
feat/semantic-mcp
Open

feat: semantic memory (user facts) for cuga-lite#180
gaodan-fang wants to merge 13 commits into
mainfrom
feat/semantic-mcp

Conversation

@gaodan-fang
Copy link
Copy Markdown
Collaborator

@gaodan-fang gaodan-fang commented Apr 29, 2026

Summary

  • Add Evolve user facts store/retrieve integration to cuga-lite graph for durable user preference memory
  • Add user_id field to CugaLiteState and align AgentState.user_id default to "default_user"
  • Make store_user_facts fire-and-forget (asyncio.create_task) to avoid blocking the critical path
  • Add _normalize_checkpoint_value and _sanitize_cuga_lite_update for msgpack-safe state serialization
  • Add helper functions for evolve guidelines/preferences formatting and prompt injection
  • Add comprehensive unit and integration tests (49 tests total)

Test plan

  • Verify uv run pytest src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/ -v passes (28 tests)
  • Verify uv run pytest src/cuga/backend/evolve/tests/test_integration.py -v passes (21 tests)
  • Run cuga start demo with EVOLVE_ENABLED=true and EVOLVE__LITE_MODE_ONLY=false, confirm log shows Evolve: Stored user facts and Evolve: Injected user preference context
  • Confirm store_user_facts does not add latency to turns (fire-and-forget)

Partially resolve #108

Summary by CodeRabbit

  • New Features

    • Durable Evolve-based user-facts storage and retrieval; prompts may include injected user-preference sections.
  • Improvements

    • Better guideline extraction/formatting for clearer guidance.
    • Tool-call and checkpoint payload normalization for more robust serialization.
    • Request-scoped user ID now propagated into local state for consistent personalization.
    • Optional evolve dependency updated to a newer minimum version.
  • Tests

    • New integration and unit tests covering evolve flows, preference formatting, and normalization.
  • Chores

    • VCS ignores updated to exclude additional local directories.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds evolve-backed async user-facts persistence/retrieval and preference/guideline formatting; introduces recursive checkpoint-safe normalization for complex types (NumPy, dataclasses, Pydantic-like, datetime, Enum, Path, bytes) in Cuga Lite and tool-call tracker; sets AgentState.user_id default to None; updates evolve dependency and related tests.

Changes

Cohort / File(s) Summary
Config
/.gitignore
Added .omc/ and .omx/ ignore patterns.
Evolve integration & tests
src/cuga/backend/evolve/integration.py, src/cuga/backend/evolve/tests/test_integration.py
Added async store_user_facts and retrieve_user_facts with enablement gating, tool-call wiring, JSON metadata encoding, return-shape handling, and failure suppression; tests cover enabled/disabled/error cases and payload shape.
Evolve formatting & tests
src/cuga/backend/evolve/formatting.py, src/cuga/backend/evolve/tests/...
New helpers to extract first/latest human message, parse/format guidelines, and render user-preference sections; tests validate selection logic, formatting, immutability, and empty-input behavior.
Cuga Lite graph core & tests
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py, src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_integration.py, .../test_cuga_lite_graph_evolve_preferences.py
Added _normalize_checkpoint_value and _sanitize_cuga_lite_update and applied sanitization across create/call_model/sandbox paths; extended CugaLiteState with optional user_id handling and injected formatted Evolve preference section when available; schedules non-blocking store_user_facts and awaits retrieve_user_facts; tests cover integration, injection, and serialization.
Tool call tracker & tests
src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py, src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_tool_call_tracker.py
Added _normalize_tool_payload to recursively convert arguments/results to builtin/msgpack-safe types (NumPy, datetime, Enum, Path, bytes, model_dump/asdict, mappings/iterables, fallback stringification); record_call persists normalized payloads; tests assert conversion and serializability.
Agent state
src/cuga/backend/cuga_graph/state/agent_state.py
Changed AgentState.user_id default from the string "default" to None.
Server
src/cuga/backend/server/main.py
event_stream now writes the request-scoped user_id into local_state.user_id when loading/creating local_state.
Project metadata
pyproject.toml
Bumped optional altk-evolve constraint for Python ≥3.12 from >=1.0.6 to >=1.1.0.

Sequence Diagram(s)

sequenceDiagram
    participant Client as CugaLiteGraph
    participant Evolve as EvolveIntegration
    participant Serializer as JsonPlusSerializer

    Client->>Client: extract first HumanMessage (user_id, message)
    par fire-and-forget
        Client->>Evolve: schedule store_user_facts(user_id, message)  -- non-blocking
    and await retrieve
        Client->>Evolve: await retrieve_user_facts(user_id, query)
        activate Evolve
        Evolve->>Evolve: is_enabled? validate inputs & call tool
        Evolve-->>Client: dict(categories) or None
        deactivate Evolve
    end
    alt categories present
        Client->>Client: format preferences section
        Client->>Client: inject "## Evolve User Preference" into prompt
    end
    Client->>Client: _sanitize_cuga_lite_update(state) (normalize complex types)
    Client->>Serializer: dumps_typed(normalized_state)
    Serializer-->>Client: JSON-serializable payload
    Client->>Client: checkpoint write / return payload
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • sami-marreed
  • gjt-prog

Poem

🐰 I hopped through bytes and smoothed each float,
Stored tiny facts while keeping prompts afloat,
NumPy grew gentle, checkpoints sang in tune,
Preferences tucked in — a soft paper moon,
The rabbit nudges code and hums a quiet rune.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main objective: adding semantic memory (user facts) integration for cuga-lite through Evolve.
Linked Issues check ✅ Passed PR addresses core foundation goals from #108: user profile integration via user_id field, semantic memory through store/retrieve user facts, preferences formatting, and prompt injection for context awareness.
Out of Scope Changes check ✅ Passed All changes align with #108 foundation phase: user_id field addition, Evolve integration for facts storage, preferences formatting utilities, and msgpack serialization normalization.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/semantic-mcp

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 1152-1168: The current code schedules
EvolveIntegration.store_user_facts in background (_store_task via
asyncio.create_task) then immediately awaits
EvolveIntegration.retrieve_user_facts with the same memory_query, causing a race
where the just-stored facts may or may not be visible; to make ordering
deterministic, await the store call before calling retrieve (i.e., replace the
background scheduling of EvolveIntegration.store_user_facts with an await and
remove or adapt the _store_task/_log_store_error usage) so that store_user_facts
completes before retrieve_user_facts is invoked.
- Around line 1128-1130: Stop logging the full prompt contents: find the
logger.debug calls that reference special_instructions_final (the debug lines in
the cuga_lite_graph code that print "Evolve: Full special_instructions...") and
replace them so they do not emit the entire special_instructions_final; instead
log safe metadata only (e.g., a redacted snippet, its length, and/or a
hash/checksum). Update both debug occurrences that reference
special_instructions_final so logs contain no sensitive prompt text but still
provide useful diagnostics.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py`:
- Around line 116-123: The test’s immutability check uses a shallow copy
(`before = list(state.chat_messages)`), which won’t detect in-place mutations of
message objects; replace the shallow snapshot with a deep snapshot (e.g.,
capture a deep copy of state.chat_messages using copy.deepcopy or equivalent)
before calling the helpers (_get_first_human_message_content,
_get_latest_memory_query, _format_evolve_user_preference,
_format_evolve_guidelines) and then compare the deep-copied snapshot to
state.chat_messages to ensure in-place mutations are detected.

In `@src/cuga/backend/cuga_graph/state/agent_state.py`:
- Line 924: The user_id field in AgentState is currently defaulting to
"default_user", causing unrelated sessions to share durable memory; change the
default for user_id in AgentState (user_id: Optional[str]) to None (or empty
string ""), and make the identical change in CugaLiteState.user_id so the
existing skip-on-missing-user logic triggers correctly; locate the user_id
declarations in agent_state.py (AgentState) and the CugaLiteState class and
update their defaults and any type hints/comments accordingly, and run tests to
ensure memory-keying and skip logic behave as intended.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c0f52109-e11a-4bf6-a3d5-5e4f0203eeac

📥 Commits

Reviewing files that changed from the base of the PR and between ee840d6 and 2affb72.

📒 Files selected for processing (9)
  • .gitignore
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_integration.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_tool_call_tracker.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py
  • src/cuga/backend/cuga_graph/state/agent_state.py
  • src/cuga/backend/evolve/integration.py
  • src/cuga/backend/evolve/tests/test_integration.py

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py Outdated
Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py Outdated
Comment thread src/cuga/backend/cuga_graph/state/agent_state.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py (1)

116-123: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use a deep snapshot for the immutability assertion.

Line 116 uses a shallow copy, so in-place mutation of message objects could slip through while Line 123 still passes.

Suggested fix
+import copy
 import numpy as np
@@
-    before = list(state.chat_messages)
+    before = copy.deepcopy(state.chat_messages)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py`
around lines 116 - 123, The test currently takes a shallow copy of
state.chat_messages into before so in-place mutations of message objects can go
undetected; replace the shallow copy with a deep snapshot (e.g., deepcopy or
serializing/deserializing) of state.chat_messages before calling
_get_first_human_message_content, _get_latest_memory_query,
_format_evolve_user_preference, and _format_evolve_guidelines, then assert the
deep-copied snapshot equals state.chat_messages to ensure true immutability of
message objects.
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)

1414-1416: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop logging the full injected prompt content.

Line 1415 and Line 1465 dump special_instructions_final, which can include user preferences/profile context and other sensitive prompt content.

Suggested fix
-                        logger.debug(
-                            f"Evolve: Full special_instructions with guidelines:\n{special_instructions_final}"
-                        )
+                        logger.debug(
+                            "Evolve: Injected guidelines section (chars={})",
+                            len(evolve_section),
+                        )
@@
-                        logger.debug(
-                            f"Evolve: Full special_instructions with preferences:\n{special_instructions_final}"
-                        )
+                        logger.debug(
+                            "Evolve: Injected user preference section (chars={})",
+                            len(preference_section),
+                        )

Also applies to: 1464-1466

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines
1414 - 1416, The code is logging full sensitive prompt content by dumping
special_instructions_final via logger.debug; remove or redact the raw
special_instructions_final output in the logger calls (the logger.debug calls
that print "Evolve: Full special_instructions with guidelines" around
special_instructions_final) and instead log a non-sensitive summary such as
existence, length, SHA256/hash, or a masked snippet, or simply log that
special_instructions_final was set without printing its contents; update all
occurrences (including the other call at lines around 1464–1466 that also prints
special_instructions_final) to use the redaction/summary approach.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py`:
- Around line 86-90: The dict-key normalization in _normalize_tool_payload can
turn tuple keys into unhashable lists causing record_call to raise; modify
_normalize_tool_payload so when normalizing dict keys (in the dict branch) you
preserve hashability: compute normalized_key = _normalize_tool_payload(key) and
if the original key was a tuple or the normalized_key is a list/frozenset-like,
convert it back to an immutable/hashable type (e.g., tuple) before using it as a
dict key, while still normalizing the value via _normalize_tool_payload(item);
ensure this preserves tuples as keys for record_call to work correctly.

In `@src/cuga/backend/evolve/integration.py`:
- Around line 78-80: The log in the Evolve success path currently emits raw
user_id (logger.info f"...{user_id}..."), which leaks PII; replace that by
logging an anonymized identifier instead (e.g., compute a deterministic hash or
token from user_id and log only a short prefix) so you still can correlate
events without storing the raw ID; update the call site that builds the message
(the logger.info line that references user_id and result.get) to use the
anonymized value (e.g., user_hash or anonymized_user_id) and ensure the
hashing/obfuscation is done via a stable function imported/defined beside this
code.

---

Duplicate comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 1414-1416: The code is logging full sensitive prompt content by
dumping special_instructions_final via logger.debug; remove or redact the raw
special_instructions_final output in the logger calls (the logger.debug calls
that print "Evolve: Full special_instructions with guidelines" around
special_instructions_final) and instead log a non-sensitive summary such as
existence, length, SHA256/hash, or a masked snippet, or simply log that
special_instructions_final was set without printing its contents; update all
occurrences (including the other call at lines around 1464–1466 that also prints
special_instructions_final) to use the redaction/summary approach.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py`:
- Around line 116-123: The test currently takes a shallow copy of
state.chat_messages into before so in-place mutations of message objects can go
undetected; replace the shallow copy with a deep snapshot (e.g., deepcopy or
serializing/deserializing) of state.chat_messages before calling
_get_first_human_message_content, _get_latest_memory_query,
_format_evolve_user_preference, and _format_evolve_guidelines, then assert the
deep-copied snapshot equals state.chat_messages to ensure true immutability of
message objects.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b0ac16a7-83cb-4708-b4bb-997bc2775d8f

📥 Commits

Reviewing files that changed from the base of the PR and between 2affb72 and 8b6f1a6.

📒 Files selected for processing (9)
  • .gitignore
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_integration.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_tool_call_tracker.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py
  • src/cuga/backend/cuga_graph/state/agent_state.py
  • src/cuga/backend/evolve/integration.py
  • src/cuga/backend/evolve/tests/test_integration.py
✅ Files skipped from review due to trivial changes (2)
  • .gitignore
  • src/cuga/backend/cuga_graph/state/agent_state.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_integration.py

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py Outdated
Comment thread src/cuga/backend/evolve/integration.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (2)

336-344: 💤 Low value

Consider logging conversion failures for debugging.

The try-except-pass blocks catch blind exceptions and silently continue, which can make debugging difficult when unexpected conversion failures occur. While the fallback behavior is intentional, a debug-level log would help diagnose issues without affecting the control flow.

Suggested improvement
     if value_module.startswith("numpy"):
         if hasattr(value, "item") and callable(value.item):
             try:
                 return _normalize_checkpoint_value(value.item())
-            except Exception:
-                pass
+            except Exception as e:
+                logger.debug("numpy .item() conversion failed: %s", e)
         if hasattr(value, "tolist") and callable(value.tolist):
             try:
                 return _normalize_checkpoint_value(value.tolist())
-            except Exception:
-                pass
+            except Exception as e:
+                logger.debug("numpy .tolist() conversion failed: %s", e)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines
336 - 344, The two silent except blocks around converting values using
value.item() and value.tolist() should log the caught exception at debug level
before continuing; update the try/except in the _normalize_checkpoint_value
conversion code (the blocks that call value.item() and value.tolist()) to catch
Exception as e and call the module or class logger (e.g., logger.debug or
self.logger.debug) with a short message including which conversion failed and
the exception (e.g., "normalize checkpoint: value.item() failed: %s", e) so
behavior remains unchanged but failures are discoverable.

1263-1271: 💤 Low value

Consider logging the agent_id fallback failure.

The broad exception catch on Line 1270 silently swallows errors when accessing app_state.agent_id. A debug log would help diagnose configuration issues without affecting the fallback behavior.

Suggested improvement
                         try:
                             from cuga.backend.server.main import app as _app

                             _app_state = getattr(_app.state, "app_state", None)
                             current_agent_id = str(getattr(_app_state, "agent_id", "") or "").strip()
-                        except Exception:
+                        except Exception as e:
+                            logger.debug("Could not resolve agent_id from app_state: %s", e)
                             current_agent_id = ""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines
1263 - 1271, The except block that falls back when reading agent_id from
configurable/_app_state in cuga_lite_graph.py silently swallows errors; update
the except Exception handler around the import and getattr block (where
current_agent_id, _app, and _app_state are used) to log the caught exception at
debug level (including context like "failed to read app_state.agent_id" and the
exception info) before setting current_agent_id = ""; use the module or class
logger already in scope (or create one if missing) so failures to access
app_state.agent_id are recorded without changing the fallback behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 336-344: The two silent except blocks around converting values
using value.item() and value.tolist() should log the caught exception at debug
level before continuing; update the try/except in the
_normalize_checkpoint_value conversion code (the blocks that call value.item()
and value.tolist()) to catch Exception as e and call the module or class logger
(e.g., logger.debug or self.logger.debug) with a short message including which
conversion failed and the exception (e.g., "normalize checkpoint: value.item()
failed: %s", e) so behavior remains unchanged but failures are discoverable.
- Around line 1263-1271: The except block that falls back when reading agent_id
from configurable/_app_state in cuga_lite_graph.py silently swallows errors;
update the except Exception handler around the import and getattr block (where
current_agent_id, _app, and _app_state are used) to log the caught exception at
debug level (including context like "failed to read app_state.agent_id" and the
exception info) before setting current_agent_id = ""; use the module or class
logger already in scope (or create one if missing) so failures to access
app_state.agent_id are recorded without changing the fallback behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 07aa22d6-0cbc-49d3-8e37-f8d8c377109d

📥 Commits

Reviewing files that changed from the base of the PR and between 8b6f1a6 and 8433c31.

📒 Files selected for processing (3)
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py
  • src/cuga/backend/evolve/formatting.py

gaodan-fang and others added 5 commits May 1, 2026 12:07
- Add user_id field to CugaLiteState so evolve memory calls are invoked
- Align AgentState.user_id default to "default_user" matching DEFAULT_USER_ID
- Change store_user_facts to asyncio.create_task (non-blocking)
- Add debug log for full special_instructions after preference injection
- Add integration tests for the full store/retrieve/prompt injection flow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extracts 7 evolve-related formatting helpers from cuga_lite_graph.py
into src/cuga/backend/evolve/formatting.py for better cohesion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Upgrades evolve SDK to 1.1.0 which adds user facts MCP tools,
multi-user/namespace support, and SSE transport hardening.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gaodan-fang gaodan-fang force-pushed the feat/semantic-mcp branch from ce741a5 to 206fc4b Compare May 1, 2026 16:08
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (4)
src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py (1)

118-125: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use a deep snapshot for the immutability assertion.

before = list(state.chat_messages) only clones the list container, so an in-place mutation of a message object would still pass this test.

Suggested fix
+import copy
 import numpy as np
 from langchain_core.messages import HumanMessage, AIMessage
 from langgraph.checkpoint.serde.jsonplus import JsonPlusSerializer
@@
-    before = list(state.chat_messages)
+    before = copy.deepcopy(state.chat_messages)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py`
around lines 118 - 125, The test currently captures only a shallow copy with
before = list(state.chat_messages), which will miss in-place mutations of
message objects; replace this with a deep snapshot of state.chat_messages (for
example using copy.deepcopy or serializing each message) before calling
get_first_human_message_content, get_latest_memory_query,
format_evolve_user_preference, and format_evolve_guidelines, then assert the
deep-copied snapshot equals the post-call state to ensure true immutability of
the message objects referenced by state.chat_messages.
src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py (1)

86-90: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve hashable dict keys during payload normalization.

A tuple key will flow through the iterable branch and become a list here, so rebuilding the dict can raise TypeError. Since record_call() runs from the decorator's finally, that can turn an otherwise successful tool call into a failure.

Suggested fix
-    if isinstance(value, dict):
-        return {_normalize_tool_payload(key): _normalize_tool_payload(item) for key, item in value.items()}
-
-    if isinstance(value, (list, tuple, set, frozenset)):
-        return [_normalize_tool_payload(item) for item in value]
+    if isinstance(value, dict):
+        normalized: dict[Any, Any] = {}
+        for key, item in value.items():
+            normalized_key = _normalize_tool_payload(key)
+            if not isinstance(normalized_key, (str, int, float, bool, type(None), tuple)):
+                normalized_key = str(normalized_key)
+            normalized[normalized_key] = _normalize_tool_payload(item)
+        return normalized
+
+    if isinstance(value, list):
+        return [_normalize_tool_payload(item) for item in value]
+
+    if isinstance(value, tuple):
+        return tuple(_normalize_tool_payload(item) for item in value)
+
+    if isinstance(value, (set, frozenset)):
+        return [_normalize_tool_payload(item) for item in value]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py` around
lines 86 - 90, The dict branch in _normalize_tool_payload currently normalizes
both keys and values which can convert hashable keys (e.g., tuple) into
unhashable types and break dict reconstruction inside record_call; change the
dict comprehension in _normalize_tool_payload to preserve original dict keys (do
not run _normalize_tool_payload on the key) and only normalize the dict values
(e.g., return {key: _normalize_tool_payload(item) for key, item in
value.items()}) so tuple keys remain hashable and record_call() won't fail.
src/cuga/backend/evolve/integration.py (1)

78-80: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop logging raw user_id in the success path.

This keeps a stable per-user identifier in INFO logs for every memory write. Dropping or anonymizing it preserves the useful signal (stored_count) without retaining user-linked data in centralized logs.

Suggested fix
-                logger.info(
-                    f"Evolve: Stored user facts for {user_id} ({result.get('stored_count', 0)} updates)"
-                )
+                logger.info(
+                    "Evolve: Stored user facts (stored_count={})",
+                    result.get("stored_count", 0),
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/evolve/integration.py` around lines 78 - 80, The INFO log in
integration.py currently prints the raw user_id (logger.info f"Evolve: Stored
user facts for {user_id} ..."), which must be removed or anonymized; update the
logger.info call in the function/module that handles the memory write
(referencing user_id, result, and logger.info) so it no longer emits the raw
user_id — either drop the user_id entirely and only log the stored_count
(result.get('stored_count', 0)), or replace user_id with a stable anonymized
token (e.g., a short deterministic hash of user_id) before including it in the
message so logs retain a stable per-user signal without exposing raw
identifiers.
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)

1256-1258: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid dumping the fully injected prompt to debug logs.

Both debug lines emit special_instructions_final, which now includes retrieved preference memory and other prompt-only context. That creates an unnecessary sensitive-data copy in logs on every turn.

Also applies to: 1306-1308

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines
1256 - 1258, The debug statements currently logging special_instructions_final
expose sensitive prompt-only context (including retrieved preference memory);
change the logger.debug calls that reference special_instructions_final (the two
occurrences around the Evolve block and the other at lines ~1306-1308) to log a
redacted/summary version instead — e.g., log only a sanitized summary, length,
or hash of special_instructions_final (create or call a small helper like
redact_or_hash_prompt(special_instructions_final) or sanitize_prompt) so the
code still provides traceability without dumping the full prompt contents.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 328-393: The helper _normalize_checkpoint_value currently returns
unknown objects unchanged which lets non-serializable SDK/ORM/tool objects
bypass sanitization for variables_storage, cuga_lite_metadata, and tool_calls;
change the final fallback so unknown objects are converted into a safe
serializable representation (e.g., try to convert via vars(value) or
getattr(value, "__dict__", None) and normalize that, otherwise return
repr(value) or str(value)) so every branch yields JSON/msgpack-friendly
primitives/containers rather than raw arbitrary objects.
- Around line 1260-1296: The stream endpoint extracts user_id (variable user_id
in the stream handler) but never assigns it onto the local_state, causing
per-request state to default to DEFAULT_USER_ID; after each local_state creation
or reload (where local_state is instantiated/loaded in the stream handler and
where state is reloaded inside the stream loop), set local_state.user_id =
user_id immediately after the local_state is created/loaded so subsequent code
(e.g., EvolveIntegration.store_user_facts / retrieve_user_facts and any usage of
state.user_id) sees the authenticated user; update all creation paths and reload
paths where local_state is constructed (the initial local_state instantiation
sites and the reload branches referenced in the comment) to assign the user_id
to local_state.

---

Duplicate comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 1256-1258: The debug statements currently logging
special_instructions_final expose sensitive prompt-only context (including
retrieved preference memory); change the logger.debug calls that reference
special_instructions_final (the two occurrences around the Evolve block and the
other at lines ~1306-1308) to log a redacted/summary version instead — e.g., log
only a sanitized summary, length, or hash of special_instructions_final (create
or call a small helper like redact_or_hash_prompt(special_instructions_final) or
sanitize_prompt) so the code still provides traceability without dumping the
full prompt contents.

In
`@src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py`:
- Around line 118-125: The test currently captures only a shallow copy with
before = list(state.chat_messages), which will miss in-place mutations of
message objects; replace this with a deep snapshot of state.chat_messages (for
example using copy.deepcopy or serializing each message) before calling
get_first_human_message_content, get_latest_memory_query,
format_evolve_user_preference, and format_evolve_guidelines, then assert the
deep-copied snapshot equals the post-call state to ensure true immutability of
the message objects referenced by state.chat_messages.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py`:
- Around line 86-90: The dict branch in _normalize_tool_payload currently
normalizes both keys and values which can convert hashable keys (e.g., tuple)
into unhashable types and break dict reconstruction inside record_call; change
the dict comprehension in _normalize_tool_payload to preserve original dict keys
(do not run _normalize_tool_payload on the key) and only normalize the dict
values (e.g., return {key: _normalize_tool_payload(item) for key, item in
value.items()}) so tuple keys remain hashable and record_call() won't fail.

In `@src/cuga/backend/evolve/integration.py`:
- Around line 78-80: The INFO log in integration.py currently prints the raw
user_id (logger.info f"Evolve: Stored user facts for {user_id} ..."), which must
be removed or anonymized; update the logger.info call in the function/module
that handles the memory write (referencing user_id, result, and logger.info) so
it no longer emits the raw user_id — either drop the user_id entirely and only
log the stored_count (result.get('stored_count', 0)), or replace user_id with a
stable anonymized token (e.g., a short deterministic hash of user_id) before
including it in the message so logs retain a stable per-user signal without
exposing raw identifiers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 156cec38-18fe-4edb-b5df-d46c8d8175f0

📥 Commits

Reviewing files that changed from the base of the PR and between 8433c31 and 206fc4b.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • .gitignore
  • pyproject.toml
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_integration.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_tool_call_tracker.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py
  • src/cuga/backend/cuga_graph/state/agent_state.py
  • src/cuga/backend/evolve/formatting.py
  • src/cuga/backend/evolve/integration.py
  • src/cuga/backend/evolve/tests/test_integration.py
✅ Files skipped from review due to trivial changes (3)
  • src/cuga/backend/cuga_graph/state/agent_state.py
  • pyproject.toml
  • src/cuga/backend/evolve/formatting.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • .gitignore
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_tool_call_tracker.py

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py Outdated
Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py Outdated
- tool_call_tracker: keep dict keys hashable (tuple keys stay tuples)
- integration.py: remove raw user_id from info logs (PII)
- agent_state/CugaLiteState: default user_id to None instead of
  "default_user" to prevent cross-user memory bleed
- main.py: wire authenticated user_id onto local_state
- cuga_lite_graph: reorder retrieve before fire-and-forget store for
  deterministic ordering; redact full prompt from debug logs
- _normalize_checkpoint_value: fallback to str() for unknown objects
- test: use deepcopy for immutability assertion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 402-414: The sanitizer _sanitize_cuga_lite_update currently
normalizes keys like "variables_storage" and "tool_calls" but omits
"task_todos", so any non-serializable values under the new task_todos payload
(e.g., datetime, Enum, Path, model objects in text/status) can break checkpoint
writes; update _sanitize_cuga_lite_update to include "task_todos" in the
for-loop so it calls _normalize_checkpoint_value(sanitized["task_todos"]) (same
normalization used for other keys) and ensure the change is also applied to the
duplicate occurrence referenced around lines 2124-2126.
- Around line 1252-1282: The EvolveIntegration calls
(EvolveIntegration.get_guidelines and EvolveIntegration.retrieve_user_facts) are
awaited directly in prompt preparation and must be made fail-open: wrap each
await in a bounded timeout (e.g., asyncio.wait_for) and catch
TimeoutError/Exception so that on timeout or error you log the failure and
continue with no injected context (leave special_instructions_final unchanged or
use empty string) rather than aborting the turn; apply this pattern around the
block that builds evolve_section (build_evolve_guidelines_section) and the
retrieved_preferences assignment so slow or failing Evolve lookups fall back
silently.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py`:
- Around line 83-84: The dataclass handling in _normalize_tool_payload uses
asdict(value) without guarding against RecursionError from cyclic dataclasses;
wrap the asdict(...) call in a try/except that catches RecursionError (and
optionally Exception) and on failure falls back to returning str(value)
(matching existing fallback behavior), so that tracked_tool wrappers do not get
a new exception from normalization masking the original tool result/exception.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d62ade03-97b0-4b25-a513-e035a75e879e

📥 Commits

Reviewing files that changed from the base of the PR and between 206fc4b and cb4d8da.

📒 Files selected for processing (6)
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py
  • src/cuga/backend/cuga_graph/state/agent_state.py
  • src/cuga/backend/evolve/integration.py
  • src/cuga/backend/server/main.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/tests/test_cuga_lite_graph_evolve_preferences.py

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py Outdated
Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py Outdated
Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/tool_call_tracker.py Outdated
- Add task_todos to checkpoint sanitization keys
- Wrap evolve get_guidelines and retrieve_user_facts with timeout
  for fail-open behavior
- Guard dataclass asdict() against RecursionError in tool_call_tracker

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
return m


def _normalize_checkpoint_value(value: Any) -> Any:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to some helper

return str(value)


def _sanitize_cuga_lite_update(update: Dict[str, Any]) -> Dict[str, Any]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is needed?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LangGraph's default checkpointer serializes state updates with msgpack, which only handles JSON-ish primitives (str/int/float/bool/None, list, dict, bytes). Tool results flowing into variables_storage, cuga_lite_metadata, tool_calls, and task_todos frequently contain datetime, Enum, Path, Pydantic models, dataclasses, or numpy scalars — any one of those crashes the checkpoint write with TypeError: Object of type X is not msgpack serializable, which fails the turn even though the tool call succeeded.

normalize_checkpoint_value converts those types to their serializable form (datetime.isoformat(), enum.value, model_dump(), asdict(), etc.) before we call Command(update=...). It's called from every node that returns a state update (lines 611, 1479, 1813, 1880, 2056, 2072). LangChain BaseMessage objects are deliberately passed through untouched since LangGraph has dedicated serializers for them.

Without this, any tool that returns e.g. a datetime field would break checkpoint-backed resumption.

Comment thread src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
Moves _normalize_checkpoint_value and _sanitize_cuga_lite_update into
checkpoint_sanitizer.py so cuga_lite_graph.py no longer has to import
dataclasses, datetime, enum, or pathlib solely for that logic.

Also drops the __dict__/vars() fallback: payloads passed through
_SANITIZE_KEYS are JSON-like, and the remaining branches already cover
Pydantic, dataclasses, datetime, Enum, Path, bytes, and numpy. The
fallback silently coerced arbitrary objects, which was not desired.

Addresses PR #180 review feedback from @sami-marreed.
@gaodan-fang gaodan-fang requested a review from sami-marreed May 4, 2026 14:53
updates.update(additional_updates)

return Command(goto=END, update=updates)
return Command(goto=END, update=_sanitize_cuga_lite_update(updates))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this sanitize

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sanitization is required for LangGraph checkpoint safety. The Command(update=...) payload flows through the msgpack checkpointer, which fails on non-JSON-ish values (dataclasses, enums, paths, datetimes, numpy types, pydantic models, etc.). Any one of variables_storage, cuga_lite_metadata, tool_calls, task_todos, metrics, or last_summarization_metrics — which is what additional_updates may carry on the error path — could contain such values. _sanitize_cuga_lite_update normalizes only those known keys and leaves message objects untouched. Without it, a runtime error that lands in additional_updates can poison the final checkpoint and break resumption.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what tool not returning Json or dict?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

msgpack_error.txt
I saw this error once when running the crm demo

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid it?

# If policy returned metadata (e.g., playbook guidance), store it
if metadata:
state.cuga_lite_metadata = metadata
state.cuga_lite_metadata = _normalize_checkpoint_value(metadata)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need normalize

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same checkpoint-safety reason: cuga_lite_metadata is one of the fields the msgpack checkpointer serializes, and PolicyEnactment.check_and_enact returns policy metadata that can include dataclass/pydantic/enum instances (e.g. playbook guidance). Assigning raw metadata directly here would propagate non-JSON-ish values into state and fail the next checkpoint write. _normalize_checkpoint_value recursively converts them to plain dicts/lists/primitives while preserving LangChain messages.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we didn't have issues with this before any reason why we do now?

@sami-marreed sami-marreed self-assigned this May 10, 2026
Signed-off-by: Gaodan Fang <Gaodan.Fang@ibm.com>

# Conflicts:
#	pyproject.toml
#	src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
#	uv.lock
@sami-marreed
Copy link
Copy Markdown
Contributor

Checkpoint sanitization — prefer upstream fixes

I'm not in favor of sanitizing state at the checkpoint boundary in cuga_lite_graph. It spreads serialization policy across many Command(update=...) call sites, relies on allowlists + str() fallbacks, and is easy to miss or drift when the graph changes.

I'd rather we guarantee msgpack-safe payloads at the edges that produce state (Evolve client/formatting, tool tracker inputs, policy metadata builders) and keep checkpoint writes transparent. If something isn't serializable, fail fast at the producer with a clear error instead of coercing at checkpoint time.


Prefer fixing Evolve + tool payloads over graph wrappers

Before adding checkpoint_sanitizer / _sanitize_cuga_lite_update, can we normalize only where data enters the system?

  • Evolve: retrieve_user_facts / get_guidelines / any metadata merged into prompt or state should return plain JSON-friendly structures (no numpy, Path, enums, etc.).
  • Tool tracking: normalize arguments / result when they're recorded, if those records are what lands in checkpointed tool_calls.

That targets the actual sources of bad types instead of a second normalization pass right before persistence.


Duplication doesn't justify checkpoint coercion

Even if we keep some normalization for tool payloads, normalize_checkpoint_value and _normalize_tool_payload shouldn't both exist as near-copies—and they don't justify a separate checkpoint sanitization layer on top.

Pick one small shared helper used at ingestion, not at graph completion.


str() fallback hides bugs

The checkpoint sanitizer's str(value) fallback makes persistence "succeed" but can destroy structure and hide mistakes after resume. That's another reason I'd avoid checkpoint-level sanitization: I'd prefer explicit serialization at producers or a hard error when an unsupported type appears.


Drop server.main import from the graph

Regardless of sanitization, from cuga.backend.server.main import app inside create_cuga_lite_graph couples the graph to the HTTP stack. Please pass agent_id / thread_id / user_id via configurable only so cuga-lite stays server-agnostic.

@gaodan-fang
Copy link
Copy Markdown
Collaborator Author

gaodan-fang commented May 14, 2026

Next step after pr walkthrough meeting:

…producers

Addresses PR #180 review feedback. Removes the boundary sanitization layer
(checkpoint_sanitizer._sanitize_cuga_lite_update / _normalize_checkpoint_value)
and the wrappers around every Command(update=...) site in cuga_lite_graph.
Producers are now responsible for emitting JSON-friendly values; tool tracker's
_normalize_tool_payload is the reference pattern (re-inlined since the shared
module is gone).

Also drops the `from cuga.backend.server.main import app` fallback in the
evolve preference flow so cuga-lite stays server-agnostic; agent_id now flows
in via `configurable` only.

Refs: #180, #229
Signed-off-by: Gaodan Fang <Gaodan.Fang@ibm.com>
Continues PR #180 cleanup. Tool tracker now records arguments and result
verbatim — no msgpack-safety coercion at the recording boundary. Producer
contract: any tool whose value lands in checkpointed state (tool_calls,
variables_storage) must already be JSON-clean. If not, fail fast at the
producer instead of normalizing here.

Drops the dedicated tracker normalization test as well; behavior is no
longer asserted at this layer.

Refs: #180, #229
Signed-off-by: Gaodan Fang <Gaodan.Fang@ibm.com>
@gaodan-fang gaodan-fang requested a review from sami-marreed May 14, 2026 16:00
@@ -0,0 +1,243 @@
"""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add these tests in sub folder of tests called memory also do you want them to run in CI?

@@ -0,0 +1,121 @@
import copy
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add these tests in sub folder of tests called memory also do you want them to run in CI?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we want to run it in CI, does it need special setup?

Comment on lines 1485 to 1531
memory_query = state.sub_task or get_latest_memory_query(state.chat_messages)
current_user_id = str(getattr(state, "user_id", "") or "").strip()
if current_user_id and memory_query:
current_agent_id = str(configurable.get("agent_id") or "").strip()
thread_id_for_memory = str(configurable.get("thread_id") or state.thread_id or "").strip()

try:
retrieved_preferences = await asyncio.wait_for(
EvolveIntegration.retrieve_user_facts(
current_user_id,
memory_query,
),
timeout=settings.evolve.timeout,
)
except Exception:
logger.warning(
"Evolve: retrieve_user_facts timed out or failed; continuing without preferences"
)
retrieved_preferences = None

def _log_store_error(task: asyncio.Task) -> None:
exc = task.exception() if not task.cancelled() else None
if exc:
logger.warning("Evolve: store_user_facts failed (non-blocking): %s", exc)

_store_task = asyncio.create_task(
EvolveIntegration.store_user_facts(
current_user_id,
memory_query,
metadata={
"thread_id": thread_id_for_memory,
"agent_id": current_agent_id,
"source": "cuga-lite",
},
)
)
_store_task.add_done_callback(_log_store_error)
categories = (
retrieved_preferences.get("categories")
if isinstance(retrieved_preferences, dict)
else None
)
preference_section = build_evolve_user_preference_section(categories)
if preference_section:
special_instructions_final = (special_instructions_final or "") + preference_section
logger.info("Evolve: Injected user preference context into system prompt")
logger.debug(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this to nice helper files in memory sdk or client etc,

class AgentState(BaseModel):
# pages: Annotated[Sequence[str], operator.add] # List of pages traversed
# page: Page # The Playwright web page lets us interact with the web environment
user_id: Optional[str] = "default" # TODO: this should be updated in multi user scenario
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you restore this to the default, is it ensures in local its not null so ui doesn't file when showing it

… helper

Move the inline guidelines + user-facts (store/retrieve) block out of
cuga_lite_graph and into src/cuga/backend/evolve/memory.py exposing
build_evolve_special_instructions_extension(). The graph call site
shrinks from ~70 lines to a single helper invocation.

Group the evolve-related tests under cuga_lite/tests/memory/ and wire
that subfolder into the unit-tests CI step.

Addresses PR #180 review comments (r3244165611, r3244165051, r3244174174).

Signed-off-by: Gaodan Fang <Gaodan.Fang@ibm.com>
Reverts AgentState.user_id default from None back to "default" so the UI
in local single-user mode keeps a non-null value to render. The Evolve
memory flow already gates on truthy user_id, so behavior is unchanged
when a real user_id is supplied.

Addresses PR #180 review comment r3244176771.

Signed-off-by: Gaodan Fang <Gaodan.Fang@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Personalization Integration for CUGA

2 participants