Fix Unicode escape sequences in tool input output #36

Open
tim-watcha wants to merge 1 commit into ZeroSumQuant:main from tim-watcha:fix/unicode-escape-in-tool-output

@tim-watcha

Summary

Fixes Unicode characters (Korean, Chinese, Japanese, etc.) being displayed as escape sequences like \uc0ac\uc6a9\uc790 instead of actual readable text in HTML, JSON, and Markdown exports when using the --detailed flag.

Problem

When exporting conversations with the --detailed flag, tool inputs containing non-ASCII characters were escaped:

Before:

{
  "thought": "\uc0ac\uc6a9\uc790\uac00 \ucf58\ud150\uce20 \uc81c\ubaa9\uc744 \ud568\uaed8 \uc54c\ub824\ub2ec\ub77c\uace0 \uc694\uccad\ud588\uc2b5\ub2c8\ub2e4",
  "thoughtNumber": 1
}

After:

{
  "thought": "사용자가 콘텐츠 제목을 함께 알려달라고 요청했습니다",
  "thoughtNumber": 1
}

The escaped output made exported conversations with non-ASCII tool inputs unreadable, which was especially problematic for international users.

Solution

Added ensure_ascii=False to the json.dumps() calls that serialize tool inputs. Python's json.dumps() defaults to ensure_ascii=True, which escapes every non-ASCII character as a \uXXXX sequence.
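
A minimal sketch of the difference, using only the standard library. The `tool_input` dict and the `indent=2` value are illustrative assumptions, not code copied from the repository:

```python
import json

# Hypothetical tool input containing Korean text (illustrative only).
tool_input = {"thought": "사용자", "thoughtNumber": 1}

# Default behaviour: ensure_ascii=True escapes every non-ASCII character.
escaped = json.dumps(tool_input, indent=2)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(tool_input, indent=2, ensure_ascii=False)

print(escaped)   # the "thought" value appears as \uXXXX escapes
print(readable)  # the "thought" value appears as readable Korean text
```

Both strings decode to the same data with json.loads(), so the change should only affect how the export reads, not what it contains.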

Changes

  • Line 125 (extract_conversation method): Added ensure_ascii=False to tool_use content formatting
  • Line 186 (_extract_text_content method): Added ensure_ascii=False to detailed mode tool_use formatting

Testing

  • ✅ Core unit tests pass (test_extract_text_content_list)
  • ✅ Manual verification with Unicode characters confirms proper display
  • ✅ No breaking changes to existing functionality
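
The manual verification could be captured as a small regression check along these lines. This is a sketch only: `format_tool_input` is a hypothetical stand-in for the project's tool_use formatting, and `indent=2` is an assumed value.

```python
import json


def format_tool_input(tool_input: dict) -> str:
    """Hypothetical helper standing in for the project's tool_use formatting."""
    return json.dumps(tool_input, indent=2, ensure_ascii=False)


def test_tool_input_keeps_unicode() -> None:
    text = "사용자가 콘텐츠 제목을 함께 알려달라고 요청했습니다"
    rendered = format_tool_input({"thought": text, "thoughtNumber": 1})

    # The original characters appear verbatim, with no \uXXXX escapes.
    assert text in rendered
    assert "\\u" not in rendered

    # The output is still valid JSON that round-trips to the same value.
    assert json.loads(rendered)["thought"] == text


test_tool_input_keeps_unicode()
```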

Impact

  • Improves readability for all non-ASCII characters in tool inputs
  • Affects HTML, JSON, and Markdown exports when using the --detailed flag
  • No impact on basic extraction without the --detailed flag
  • Backward compatible - no API or CLI changes

🤖 Generated with Claude Code

Added ensure_ascii=False parameter to json.dumps() calls when
serializing tool inputs in both extract_conversation() and
_extract_text_content() methods. This prevents non-ASCII characters
(e.g., Korean, Chinese, Japanese) from being escaped as \uXXXX
sequences in HTML, JSON, and Markdown exports.

Changes:
- Line 125: Added ensure_ascii=False to tool_use content formatting
- Line 186: Added ensure_ascii=False to detailed mode tool_use formatting

Fixes issue where tool inputs with Unicode characters were displaying
as escape sequences like \uc0ac\uc6a9\uc790 instead of actual text.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ZeroSumQuant
Owner

Thank you for submitting this! I'll be working on the repo more here soon and I'll check this out and get it sorted!

ZeroSumQuant pushed a commit that referenced this pull request Jan 1, 2026
Combined with PR #37's INDENT_NUMBER constant while preserving
ensure_ascii=False for proper Unicode handling.
ZeroSumQuant pushed a commit that referenced this pull request Jan 1, 2026
- Implement project filtering with `--project` flag (Issue #38)
- Improve Windows compatibility:
  - Add UTF-8 stdout reconfiguration
  - Remove Unix-specific code (`realtime_search.py`)
- Replace print with logging in `extract_claude_logs.py` (Issue #28)
- Add comprehensive type hints (Issue #27)
- Fix interactive UI tests by mocking `Path.stat`
- Add PDF/DOCX export capabilities (PRs #34, #36, #37 logic integrated)
- Support metadata (--title, --description, --tags) and todo extraction
- Clean up magic numbers into `constants.py`
sytelus added a commit to sytelus/claude-sessions that referenced this pull request Jan 7, 2026
…onstants

Combined with PR ZeroSumQuant#36's ensure_ascii=False for proper Unicode handling.