fix: add explicit encoding='utf-8' to text-mode open() calls#7777
fix: add explicit encoding='utf-8' to text-mode open() calls#7777avoronov-explyt wants to merge 1 commit into
Conversation
|
@avoronov-explyt please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
|
t0ugh-sys
left a comment
There was a problem hiding this comment.
Code Review: fix: add explicit encoding='utf-8' to text-mode open() calls
Overall assessment: ✅ Looks good — this is a well-scoped and valuable fix.
Summary
This PR correctly addresses a real cross-platform issue. On systems where the default locale encoding is not UTF-8 (e.g., Windows with cp1252, cp950, cp1251, etc.), text-mode open() calls without an explicit encoding parameter can silently produce mojibake or raise UnicodeDecodeError. The changes are minimal, consistent, and targeted.
File-by-file observations
-
_docker_jupyter.py—open(path, "w", encoding="utf-8")- Writing HTML data that may contain non-ASCII characters. Correct fix.
-
chat_completion_client_recorder.py— Two changes:- Read (
json.load):open(..., "r", encoding="utf-8")— ensures JSON round-trip fidelity regardless of system locale. - Write (
json.dump):open(..., "w", encoding="utf-8")— ensures the session file is always written as UTF-8, which is the expected encoding for JSON. - ✅ Both directions are covered, preventing asymmetric encoding issues.
- Read (
-
page_logger.py— Three changes:- Hash file writing, call tree HTML, and page HTML all now use
encoding="utf-8". - These files may contain arbitrary user-generated text, so the fix is appropriate.
- Hash file writing, call tree HTML, and page HTML all now use
Minor suggestions (non-blocking)
-
Consistency note: This PR covers files in
autogen-ext. It might be worth a follow-up audit of other packages (autogen-core,autogen-agentchat, etc.) to ensure no similaropen()calls are missing explicit encoding. A lint rule (e.g.,pylint'sunspecified-encoding/ W1514) could help enforce this project-wide. -
pathlibconsideration (optional, not a blocker): Some of theseopen()calls could alternatively usepathlib.Path(...).write_text(data, encoding="utf-8")/.read_text(encoding="utf-8")for slightly cleaner code, but that's a style preference and not required for this fix.
Verdict
LGTM. The changes are correct, safe, and well-motivated by the PR description. The fix prevents a class of bugs that only manifests on non-UTF-8 default locales, making this a meaningful improvement for international users.
Several
open()calls in autogen-ext use text mode without specifyingencoding='utf-8'. On systems where the default encoding is not UTF-8 (e.g., cp950 on Chinese Windows, cp1251 on Russian Windows), these calls will raiseUnicodeDecodeErroror produce mojibake.Affected files:
_docker_jupyter.py: HTML output writingchat_completion_client_recorder.py: session file read/writepage_logger.py: hash file, call tree HTML, page HTML writingThis PR adds
encoding='utf-8'to all text-modeopen()calls in these files, ensuring consistent behavior across all platforms.