InMemoryMemoryService search doesn't work with non-Latin text (Japanese, CJK, etc.) #5501

@yodeee9

I was building a demo with InMemoryMemoryService and noticed that search_memory returns nothing when the conversation is in Japanese.

After digging in, I traced the cause to _extract_words_lower in in_memory_memory_service.py:

return set([word.lower() for word in re.findall(r'[A-Za-z]+', text)])

This regex only matches ASCII letters, so any Japanese/Chinese/Korean/Cyrillic text gets completely dropped. For example:

_extract_words_lower("私の名前は太郎です")  # returns set()
_extract_words_lower("太郎 works at ABC")   # returns {'works', 'at', 'abc'}; 太郎 is lost
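To reproduce this without installing ADK, the snippet below copies the quoted one-liner into a standalone function. The name `extract_words_lower_ascii` is hypothetical; only the regex and the set/lowercase logic come from the line above. A Cyrillic sentence is included to show that the problem is not limited to CJK scripts:

```python
import re


def extract_words_lower_ascii(text: str) -> set[str]:
    """Standalone copy of the quoted _extract_words_lower one-liner.

    [A-Za-z]+ only matches runs of ASCII letters, so every other
    character (kanji, kana, Hangul, Cyrillic, ...) acts as a separator
    and is discarded.
    """
    return set([word.lower() for word in re.findall(r'[A-Za-z]+', text)])


print(extract_words_lower_ascii("私の名前は太郎です"))         # set()
print(sorted(extract_words_lower_ascii("太郎 works at ABC")))  # ['abc', 'at', 'works']
print(extract_words_lower_ascii("Меня зовут Иван"))           # set()
```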

This means that if you store a session in Japanese and then search for it with a Japanese query, you'll never get a match.
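A simplified sketch of why the search comes back empty. It assumes, rather than copies from ADK, that search keeps a stored event when at least one query word appears in that event's word set; `tokenize` and `matches` are hypothetical names used only for illustration:

```python
import re


def tokenize(text: str) -> set[str]:
    # Same ASCII-only pattern as the quoted helper.
    return {word.lower() for word in re.findall(r'[A-Za-z]+', text)}


def matches(query: str, stored_text: str) -> bool:
    # Assumed matching rule: any overlap between query words and stored words is a hit.
    query_words = tokenize(query)
    stored_words = tokenize(stored_text)
    return any(word in stored_words for word in query_words)


stored = "私の名前は太郎です"
print(matches("太郎", stored))                # False: both word sets are empty
print(matches("name", "My name is Taro"))     # True
```

Under this assumed rule, an empty query word set can never overlap with anything, so a Japanese-only query returns nothing regardless of what is stored.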

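A simple fix would be to change [A-Za-z]+ to \w+, which matches word characters in all scripts. (For str patterns in Python 3, \w is already Unicode-aware, so re.UNICODE is accepted but redundant.) One caveat: Japanese and Chinese do not put spaces between words, so \w+ turns an unspaced run like 私の名前は太郎です into a single token, and a query for 太郎 alone would still not match it.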
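A quick check of the proposed pattern, using a hypothetical standalone function (`extract_words_lower_unicode`) rather than the ADK helper itself. Mixed-script text now keeps 太郎 and Cyrillic words survive, while an unspaced Japanese sentence still comes back as one token:

```python
import re


def extract_words_lower_unicode(text: str) -> set[str]:
    # Hypothetical variant of the helper using \w+ as proposed.
    # For str patterns, \w matches letters and digits in every script
    # (plus underscore), so no re.UNICODE flag is needed.
    return {word.lower() for word in re.findall(r'\w+', text)}


print(sorted(extract_words_lower_unicode("太郎 works at ABC")))  # ['abc', 'at', 'works', '太郎']
print(extract_words_lower_unicode("私の名前は太郎です"))         # {'私の名前は太郎です'}
print(sorted(extract_words_lower_unicode("Меня зовут Иван")))  # ['зовут', 'иван', 'меня']
```

The pattern change fixes space-separated scripts such as Cyrillic and mixed English/Japanese text, but matching substrings inside unspaced CJK text would need a different tokenization strategy.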

ADK 1.31.1, Python 3.11

Labels: request clarification, services, stale
