@vivche vivche commented Jan 27, 2026

Fixes #644 - Windows Unicode Encoding Issue Report

Problem

The application crashes on Windows when processing or displaying Unicode characters beyond the Western European character set. This critical cross-platform compatibility issue occurs because:

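  • Windows Default: Python typically encodes stdout/stderr with the system ANSI code page, commonly cp1252 (a single-byte encoding limited to 256 mostly Western European characters), in particular when output is piped or redirected
  • Modern Web/Cloud: Azure services and web applications overwhelmingly use UTF-8 encoding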
  • Result: Application crashes when logging or displaying emojis, international characters, IPA symbols, or special formatting

This affects multiple areas including:

  • ✅ Video transcripts with phonetic symbols
  • ✅ Chat messages containing emojis or international text
  • ✅ Agent responses with Unicode formatting
  • ✅ Debug logging across the entire application
  • ✅ Error messages and stack traces

Common Error: UnicodeEncodeError: 'charmap' codec can't encode character '\uXXXX'
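
The failure can be reproduced on any platform by simulating a cp1252 output stream. The snippet below is a minimal sketch for illustration only: `legacy_stdout` and `try_write` are hypothetical names, not code from app.py, and the real stream encoding depends on the system locale.

```python
import io

# Stand-in for a Windows stdout that encodes with cp1252.
# This is an assumption for illustration; the real encoding is locale-dependent.
buffer = io.BytesIO()
legacy_stdout = io.TextIOWrapper(buffer, encoding="cp1252")


def try_write(stream, text):
    """Write text to the stream; return None on success or the encoding error."""
    try:
        stream.write(text)
        stream.flush()
        return None
    except UnicodeEncodeError as exc:
        return exc


# "é" exists in cp1252, so this write succeeds.
ok = try_write(legacy_stdout, "café")

# IPA symbols and emojis have no cp1252 mapping, so this write fails.
err = try_write(legacy_stdout, "IPA: ʃ, emoji: 👍")
print(type(err).__name__)
```

The same write succeeds once the stream is encoded as UTF-8, which is what the fix relies on.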

Solution

Configured UTF-8 encoding globally at application startup for Windows platforms. This ensures:

  • Consistent encoding across all output streams
  • Support for the full Unicode range (1.1M+ code points vs 256 characters)
  • Cross-platform compatibility (matches Linux/macOS behavior)

Changes

  • Modified app.py to reconfigure sys.stdout and sys.stderr to UTF-8 on Windows
  • Applied at the top of the file, before any other imports or print statements
  • Includes fallback for older Python versions (<3.7)
  • Platform-specific fix (only applies on Windows)
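
The changes listed above could look roughly like the following. This is a sketch under stated assumptions, not the actual diff: the helper name `as_utf8` is hypothetical, and the fallback branch assumes the stream exposes a binary `buffer` attribute, as the standard console streams do.

```python
import io
import sys


def as_utf8(stream):
    """Return a text stream that encodes as UTF-8 (hypothetical helper)."""
    if stream is None:
        # sys.stdout/sys.stderr can be None, e.g. under pythonw.exe.
        return stream
    if hasattr(stream, "reconfigure"):
        # TextIOWrapper.reconfigure() is available from Python 3.7.
        stream.reconfigure(encoding="utf-8")
        return stream
    if hasattr(stream, "buffer"):
        # Fallback for older Python versions: wrap the underlying byte stream.
        return io.TextIOWrapper(stream.buffer, encoding="utf-8", line_buffering=True)
    # Unknown stream type: leave it untouched.
    return stream


# Platform-specific: only touch the standard streams on Windows.
if sys.platform == "win32":
    sys.stdout = as_utf8(sys.stdout)
    sys.stderr = as_utf8(sys.stderr)
```

Running this before any output is written avoids mixing two encodings in the same stream. An alternative that needs no code change is setting the `PYTHONUTF8=1` or `PYTHONIOENCODING=utf-8` environment variable, at the cost of relying on the deployment environment.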

Testing

  • ✅ Video processing with IPA phonetic symbols in transcripts
  • ✅ Chat messages with emojis and international characters
  • ✅ Debug logging with Unicode content
  • ✅ Verified no impact on Linux/macOS deployments

- Added explicit UTF-8 encoding when reading file content on Windows
- Prevents UnicodeDecodeError when processing non-ASCII filenames
- Ensures consistent file handling across different operating systems
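
Reading file content with an explicit encoding might look like the sketch below. The helper name `read_text_utf8` and the sample file are hypothetical and only illustrate the idea; without the `encoding` argument, `open()` falls back to the platform default, which on Windows is usually the ANSI code page.

```python
import tempfile
from pathlib import Path


def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the platform default encoding."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


# Round-trip a small sample containing IPA symbols and an emoji.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "transcript.txt"
    sample.write_text("IPA: ʃ ŋ, emoji: 👍", encoding="utf-8")
    content = read_text_utf8(sample)
```

Passing the encoding at every `open()` call keeps file handling identical on Windows, Linux, and macOS, independent of locale settings.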