Avoid normalizing cloudpathlib providers#8869
Avoid normalizing cloudpathlib providers#8869johannesc wants to merge 1 commit intomarimo-team:mainfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
I have read the CLA Document and I hereby sign the CLA You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot. |
There was a problem hiding this comment.
Pull request overview
This PR updates path normalization behavior to avoid corrupting cloud/URI-style paths (e.g., s3://...) when used with cloudpathlib providers, enabling external providers without relying on module-name checks.
Changes:
- Replaces the
cloudpathlibmodule-prefix check innormalize_path()with anisinstance(..., pathlib.Path)guard so only local filesystemPathobjects are normalized. - Leaves non-
Pathpath-like objects unchanged to preventos.path.normpathfrom rewriting URI schemes.
| # Skip normalization for cloud paths (e.g., S3Path, GCSPath, AzurePath) | ||
| # os.path.normpath corrupts URI schemes like s3:// by reducing them to s3:/ | ||
| if path.__class__.__module__.startswith("cloudpathlib"): | ||
| if not isinstance(path, Path): | ||
| return path |
There was a problem hiding this comment.
normalize_path is still annotated as def normalize_path(path: Path) -> Path, but it now explicitly returns the input unchanged when it is not a pathlib.Path. That means the function can return non-Path objects (e.g. cloudpathlib paths, mocks), which contradicts the type hints and docstring contract and can confuse callers/static type checkers.
Consider updating the signature/docs to reflect the broader accepted/returned types (e.g., via overloads/TypeVar or a widened union type), and clarify in the comment that the early-return is intentionally for non-local/URI-like path implementations (not just cloudpathlib).
Bundle ReportChanges will increase total bundle size by 52 bytes (0.0%) ⬆️. This is within the configured threshold ✅ Detailed changes
Affected Assets, Files, and Routes:view changes for bundle: marimo-esmAssets Changed:
|
|
I have read the CLA Document and I hereby sign the CLA |
JiwaniZakir
left a comment
There was a problem hiding this comment.
The fix works correctly today because cloudpathlib.CloudPath does not inherit from pathlib.Path, so not isinstance(path, Path) correctly short-circuits for cloud paths. However, the guard is now broader than intended: any object that isn't a pathlib.Path subclass will be silently returned unchanged, which could mask bugs where an unexpected type is passed in. The original __module__ check was fragile (string-based), but it was at least explicit about the intent.
Worth noting: the docstring comment on line 74 still says # Skip normalization for cloud paths (e.g., S3Path, GCSPath, AzurePath), which is now misleading since the condition no longer has any cloudpathlib-specific knowledge. If cloudpathlib ever changes CloudPath to subclass pathlib.Path (unlikely but not impossible), this check would silently stop protecting cloud paths.
A middle ground worth considering: keep the isinstance check but add a narrower fallback, or at least update the comment to reflect that this now protects against any non-pathlib.Path subclass, not just cloudpathlib types. A test covering a mock CloudPath-like object that doesn't inherit pathlib.Path would also help ensure this assumption doesn't regress silently.
Cloudpathlib providers typically have a path like "s3://my-bucket/folder/" which, when normalized is converted to "s3:/my-bucket/folder/" which will not work. Instead of checking the module, which prevents external providers, isinstance() is now used to detect regular Path objects. Closes marimo-team#8868
I changed to using |
Cloudpathlib providers typically have a path like
"s3://my-bucket/folder/" which, when normalized is converted to "s3:/my-bucket/folder/" which will not work.
Instead of checking the module, which prevents external providers, isinstance() is now used to detect regular Path objects.
📝 Summary
Closes #8868
🔍 Description of Changes
For more details see #8868.
📋 Checklist
I have read the CLA Document and I hereby sign the CLA