autogen-studio: pin utf-8 encoding on production text-file open() calls (refs #5566) #7723
Open
adv0r wants to merge 1 commit into
Refs microsoft#5566. Continuation of the same encoding sweep started in microsoft#6094 (which fixed the original `playwright_controller.py` site) and continued in the `magentic-one-cli` PR.

The reporter of microsoft#5566 explicitly flagged that *"there will be some similar issues in the codebase while using open function"*. This PR closes the autogen-studio production code paths that read or write text files without specifying an encoding.

On a non-UTF-8 default locale (e.g. cp950 on Traditional Chinese Windows, cp1252 on Western European Windows), Python's `open(..., "r")` falls back to the platform encoding and can crash with `UnicodeDecodeError` on non-ASCII bytes. For autogen-studio that manifests every time:

- `schema_manager.py` reads or writes Alembic templates (`env.py`, `script.py.mako`, `alembic.ini`) that may contain non-ASCII paths or comments
- `cli.py` / `lite/studio.py` write the runtime `.env` file (project paths can contain user/folder names with accented characters)
- `web/auth/manager.py` loads a user-supplied YAML config
- `gallery/builder.py` writes `gallery_default.json`

Files touched (11 lines, 5 files):

| File | open() sites fixed |
|------|--------------------|
| autogenstudio/cli.py | 1 |
| autogenstudio/lite/studio.py | 1 |
| autogenstudio/database/schema_manager.py | 6 |
| autogenstudio/web/auth/manager.py | 1 |
| autogenstudio/gallery/builder.py | 1 |

For every site the change is the same shape:

```python
- with open(path, "r") as f:
+ with open(path, "r", encoding="utf-8") as f:
```

Scope deliberately narrowed:

- **Production-code only**: no test fixtures.
- **Skipped `aiofiles.open` in `teammanager.py`**: the API is slightly different and that one deserves its own audited PR.
- **Did NOT sweep `agbench/benchmarks/*`**: those are user-facing scenario scripts that read JSONL produced by other agents; forcing UTF-8 there could mask issues upstream.

No behaviour change for users already on a UTF-8 locale (UTF-8 is what Python opens these files as on macOS/Linux today). All five files re-parsed cleanly via `ast.parse(...)` after the rewrite.

AI-assisted via Cursor (Claude Opus 4.7). Personal token-burn initiative by @adv0r to use up an expiring Cursor subscription budget on small, useful upstream contributions.

Co-authored-by: Cursor <cursoragent@cursor.com>
Author

Heads up: the CLA reply needs to come from the human account holder (@adv0r) directly, which I can't auto-post on their behalf in good conscience: the magic-phrase reply is a binding legal acceptance. I've flagged it on the user's side as a manual TODO and the CLA acceptance should land here shortly. The companion PR #7722 already shows

Author

@microsoft-github-policy-service agree
Why

Refs #5566. Continuation of the encoding sweep started in #6094 (`playwright_controller.py`) and continued in the `magentic-one-cli` PR.

The original report explicitly flagged that "there will be some similar issues in the codebase while using open function". On a non-UTF-8 default locale (cp950 on Traditional Chinese Windows, cp1252 on Western European Windows, …), Python's `open(..., "r")` falls back to the platform encoding and can crash with `UnicodeDecodeError` on non-ASCII bytes.

This PR closes the autogen-studio production code paths that read or write text files without an explicit encoding.
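The failure mode can be reproduced on any machine with a minimal sketch like the one below. It is an illustration only, not code from this PR: the file name and sample text are hypothetical, and the cp1252 locale is simulated by passing that codec explicitly instead of relying on the platform default.

```python
import os
import tempfile

# Hypothetical content: a comment containing accented folder names.
text = "# projects/Ávila/café\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # Pin the encoding on write so the bytes on disk are UTF-8 on every platform.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Simulate a cp1252 default locale. "Á" is encoded as C3 81 in UTF-8,
    # and byte 0x81 is undefined in Python's cp1252 codec, so decoding fails.
    try:
        with open(path, "r", encoding="cp1252") as f:
            f.read()
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Pinning the encoding on read gives the same result regardless of locale.
    with open(path, "r", encoding="utf-8") as f:
        roundtrip = f.read()

print(cp1252_failed, roundtrip == text)
```

Not every non-ASCII byte raises under cp1252: "é" (C3 A9) decodes without error to the wrong characters, which is a quieter form of the same problem and another reason to pin the encoding.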
What changed

5 files, 11 `open()` call sites, all of the same shape (adding `encoding="utf-8"`):

| File | Files read or written |
|------|-----------------------|
| autogenstudio/cli.py | `.env` |
| autogenstudio/lite/studio.py | `.env` |
| autogenstudio/database/schema_manager.py | `env.py` / `script.py.mako` / `alembic.ini` |
| autogenstudio/web/auth/manager.py | user-supplied YAML config |
| autogenstudio/gallery/builder.py | `gallery_default.json` |

Why these specific sites

- Alembic templates such as `env.py` can legitimately contain non-ASCII comments / paths and are read and rewritten on schema upgrades.
- `.env` writers are called with the user's project path. Folder names with accented characters (very common on Windows) would crash the first run.
- `gallery_default.json` can contain non-ASCII strings.

Scope deliberately narrowed

- Skipped `aiofiles.open` in `teammanager.py`: the async API signature is slightly different and deserves its own audited PR.
- Did not sweep `agbench/benchmarks/*`: those are scenario scripts that consume JSONL produced by other agents; forcing UTF-8 there could mask issues upstream.
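For follow-up sweeps, candidate sites could be located with a small AST walk. The sketch below is an assumption about how such an audit might be done, not the method used for this PR; `find_unpinned_open_calls` and `SAMPLE` are hypothetical names. It only matches the builtin `open(...)` called by bare name, so attribute calls such as `aiofiles.open` are left out, in line with the narrowed scope above.

```python
import ast


def find_unpinned_open_calls(source: str) -> list:
    """Return line numbers of text-mode builtin open() calls with no encoding."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only bare-name calls to open(); aiofiles.open is an ast.Attribute.
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "open"):
            continue

        # mode is the second positional argument or the mode= keyword.
        mode = node.args[1] if len(node.args) >= 2 else None
        for kw in node.keywords:
            if kw.arg == "mode":
                mode = kw.value
        is_binary = (isinstance(mode, ast.Constant)
                     and isinstance(mode.value, str)
                     and "b" in mode.value)

        # encoding is the fourth positional argument or the encoding= keyword.
        has_encoding = (len(node.args) >= 4
                        or any(kw.arg == "encoding" for kw in node.keywords))

        if not is_binary and not has_encoding:
            hits.append(node.lineno)
    return sorted(hits)


# Hypothetical input: only the first call is text mode without an encoding.
SAMPLE = """\
with open(p, "r") as f:
    data = f.read()
with open(p, "w", encoding="utf-8") as f:
    f.write(data)
with open(p, "rb") as f:
    raw = f.read()
handle = aiofiles.open(p)
"""

print(find_unpinned_open_calls(SAMPLE))
```

A mode passed as a variable instead of a string literal is treated as text mode here, so the output is a list of sites to review by hand, not a list of confirmed bugs.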
Verification

- `ast.parse(...)` clean on all 5 touched files (no syntax break).
- No `encoding=...encoding=` (double-add) anywhere.

Recommended next sweep (for a separate PR)

- `agbench` (mixed: some files read agent-emitted JSONL, others are user scripts; needs case-by-case audit)
- `aiofiles.open` sites

AI-assisted via Cursor (Claude Opus 4.7). Personal token-burn initiative by @adv0r to use up an expiring Cursor subscription budget on small, useful upstream contributions.

Made with Cursor