Skip to content

Avoid normalizing cloudpathlib providers#8869

Open
johannesc wants to merge 1 commit intomarimo-team:mainfrom
johannesc:main
Open

Avoid normalizing cloudpathlib providers#8869
johannesc wants to merge 1 commit intomarimo-team:mainfrom
johannesc:main

Conversation

@johannesc
Copy link
Copy Markdown

@johannesc johannesc commented Mar 25, 2026

Cloudpathlib providers typically have a path like
"s3://my-bucket/folder/" which, when normalized is converted to "s3:/my-bucket/folder/" which will not work.

Instead of checking the module, which prevents external providers, isinstance() is now used to detect regular Path objects.

📝 Summary

Closes #8868

🔍 Description of Changes

For more details see #8868.

📋 Checklist

  • I have read the contributor guidelines.
  • ruff check and ruff format pass on changed files
  • Pull request title is a good summary of the changes - it will be used in the release notes.

I have read the CLA Document and I hereby sign the CLA

@vercel
Copy link
Copy Markdown

vercel bot commented Mar 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment Mar 27, 2026 0:50am

Request Review

@mscolnick mscolnick requested a review from Copilot March 25, 2026 14:46
@github-actions
Copy link
Copy Markdown


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates path normalization behavior to avoid corrupting cloud/URI-style paths (e.g., s3://...) when used with cloudpathlib providers, enabling external providers without relying on module-name checks.

Changes:

  • Replaces the cloudpathlib module-prefix check in normalize_path() with an isinstance(..., pathlib.Path) guard so only local filesystem Path objects are normalized.
  • Leaves non-Path path-like objects unchanged to prevent os.path.normpath from rewriting URI schemes.

Comment on lines 74 to 77
# Skip normalization for cloud paths (e.g., S3Path, GCSPath, AzurePath)
# os.path.normpath corrupts URI schemes like s3:// by reducing them to s3:/
if path.__class__.__module__.startswith("cloudpathlib"):
if not isinstance(path, Path):
return path
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normalize_path is still annotated as def normalize_path(path: Path) -> Path, but it now explicitly returns the input unchanged when it is not a pathlib.Path. That means the function can return non-Path objects (e.g. cloudpathlib paths, mocks), which contradicts the type hints and docstring contract and can confuse callers/static type checkers.

Consider updating the signature/docs to reflect the broader accepted/returned types (e.g., via overloads/TypeVar or a widened union type), and clarify in the comment that the early-return is intentionally for non-local/URI-like path implementations (not just cloudpathlib).

Copilot uses AI. Check for mistakes.
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 25, 2026

Bundle Report

Changes will increase total bundle size by 52 bytes (0.0%) ⬆️. This is within the configured threshold ✅

Detailed changes
Bundle name Size Change
marimo-esm 25.59MB 52 bytes (0.0%) ⬆️

Affected Assets, Files, and Routes:

view changes for bundle: marimo-esm

Assets Changed:

Asset Name Size Change Total Size Change (%)
assets/add-*.js 52 bytes 55.39kB 0.09%

@johannesc
Copy link
Copy Markdown
Author

I have read the CLA Document and I hereby sign the CLA

Copy link
Copy Markdown

@JiwaniZakir JiwaniZakir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix works correctly today because cloudpathlib.CloudPath does not inherit from pathlib.Path, so not isinstance(path, Path) correctly short-circuits for cloud paths. However, the guard is now broader than intended: any object that isn't a pathlib.Path subclass will be silently returned unchanged, which could mask bugs where an unexpected type is passed in. The original __module__ check was fragile (string-based), but it was at least explicit about the intent.

Worth noting: the docstring comment on line 74 still says # Skip normalization for cloud paths (e.g., S3Path, GCSPath, AzurePath), which is now misleading since the condition no longer has any cloudpathlib-specific knowledge. If cloudpathlib ever changes CloudPath to subclass pathlib.Path (unlikely but not impossible), this check would silently stop protecting cloud paths.

A middle ground worth considering: keep the isinstance check but add a narrower fallback, or at least update the comment to reflect that this now protects against any non-pathlib.Path subclass, not just cloudpathlib types. A test covering a mock CloudPath-like object that doesn't inherit pathlib.Path would also help ensure this assumption doesn't regress silently.

@mscolnick mscolnick added the bug Something isn't working label Mar 25, 2026
@mscolnick mscolnick requested a review from Copilot March 25, 2026 18:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

Cloudpathlib providers typically have a path like
"s3://my-bucket/folder/" which, when normalized is converted to
"s3:/my-bucket/folder/" which will not work.

Instead of checking the module, which prevents external providers,
isinstance() is now used to detect regular Path objects.

Closes marimo-team#8868
@johannesc
Copy link
Copy Markdown
Author

The fix works correctly today because cloudpathlib.CloudPath does not inherit from pathlib.Path, so not isinstance(path, Path) correctly short-circuits for cloud paths. However, the guard is now broader than intended: any object that isn't a pathlib.Path subclass will be silently returned unchanged, which could mask bugs where an unexpected type is passed in. The original __module__ check was fragile (string-based), but it was at least explicit about the intent.

Worth noting: the docstring comment on line 74 still says # Skip normalization for cloud paths (e.g., S3Path, GCSPath, AzurePath), which is now misleading since the condition no longer has any cloudpathlib-specific knowledge. If cloudpathlib ever changes CloudPath to subclass pathlib.Path (unlikely but not impossible), this check would silently stop protecting cloud paths.

A middle ground worth considering: keep the isinstance check but add a narrower fallback, or at least update the comment to reflect that this now protects against any non-pathlib.Path subclass, not just cloudpathlib types. A test covering a mock CloudPath-like object that doesn't inherit pathlib.Path would also help ensure this assumption doesn't regress silently.

I changed to using if "CloudPath" in [cls.__name__ for cls in type(path).__mro__] instead which is a bit tighter check. Also note that there is another pull-request (#8879) which perhaps is better (if so I can just close this one)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Not possible to use Path extended from cloudpathlib

5 participants