Skip to content

feat: return uploaded file URL from Repo.upload and upload_files#688

Open
singhvishalkr wants to merge 3 commits into
DagsHub:mainfrom
singhvishalkr:feat/upload-return-url
Open

feat: return uploaded file URL from Repo.upload and upload_files#688
singhvishalkr wants to merge 3 commits into
DagsHub:mainfrom
singhvishalkr:feat/upload-return-url

Conversation

@singhvishalkr

@singhvishalkr singhvishalkr commented Apr 21, 2026

Copy link
Copy Markdown

Summary

Both Repo.upload() and the module-level upload_files() now return the DagsHub UI URL where the uploaded file(s) can be viewed.

Fixes #238

Changes

  • Repo.upload() returns str — the browsable DagsHub URL constructed via the existing get_files_ui_url() helper
  • upload_files() (module-level convenience wrapper) passes through that return value
  • Added Returns section to both docstrings

Motivation

Currently Repo.upload() returns None, forcing callers to manually reconstruct the URL if they want to link to the uploaded content. The get_files_ui_url() helper was already used internally for directory upload logging — this change exposes the same URL to callers.

Testing

Verified that get_files_ui_url produces the correct {owner}/{reponame}/src/{branch}/{path} URL by tracing the existing directory upload log message. No new dependencies introduced.

Fixes DagsHub#238

Both Repo.upload() and the module-level upload_files() now return the
DagsHub UI URL where the uploaded file(s) can be viewed. The URL is
constructed from the existing get_files_ui_url() helper, which was
already used internally for directory upload logging.
@dagshub

dagshub Bot commented Apr 21, 2026

Copy link
Copy Markdown

@coderabbitai

coderabbitai Bot commented Apr 21, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Updated dagshub/upload/wrapper.py: upload_files() and Repo.upload() now declare -> Optional[str] and return the upload result. For bucket uploads the method returns None; for non-bucket uploads it returns the computed DagsHub file-browser URL (using kwargs["new_branch"] or self.branch).

Changes

Upload wrapper changes

Layer / File(s) Summary
Upload function and Repo.upload signature
dagshub/upload/wrapper.py
upload_files(...) and Repo.upload(...) now declare -> Optional[str] and include Returns: docstrings. upload_files returns the value from repo_obj.upload(...).
Repo.upload behavior (bucket vs non-bucket)
dagshub/upload/wrapper.py
Repo.upload returns None immediately when bucket=True (uses upload_files_to_bucket(...)). For non-bucket uploads it normalizes remote_path, chooses directory vs file upload, and returns the computed DagsHub file-browser URL (uses kwargs["new_branch"] or self.branch).
Repository URL generation change
dagshub/upload/wrapper.py
get_repo_url(...) now URL-encodes the directory argument with safe="/", preserving path separators in generated URLs.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped along the upload trail,
A URL caught on my whiskered tail,
For buckets I nod and quietly sleep,
For files I point where they live and keep —
Hop, click, and find your data tale.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main change: both Repo.upload and upload_files now return a URL string for uploaded content.
Linked Issues check ✅ Passed The PR fully addresses issue #238 by making Repo.upload return the DagsHub file-browser URL, providing callers convenient access to the uploaded file link.
Out of Scope Changes check ✅ Passed All changes are scoped to the upload functionality: return type declarations, docstring Updates, URL generation logic, and path normalization—all directly supporting the objective to return upload URLs.
Description check ✅ Passed The pull request description clearly explains the changes: both Repo.upload() and upload_files() now return a DagsHub UI URL instead of None, with docstrings updated accordingly, addressing the linked issue #238.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dagshub/upload/wrapper.py (1)

331-341: ⚠️ Potential issue | 🟠 Major

Return the correct URL for each upload target.

Line 341 unconditionally returns a repo source URL using self.branch, which is incorrect for two cases:

  1. When bucket=True, the method uploads to the S3 bucket but returns a repo URL that won't show the uploaded files.
  2. When new_branch is passed in kwargs, the files are uploaded to that branch but the returned URL points to self.branch instead.

However, a bucket UI URL helper method does not currently exist in the codebase. Determine whether bucket uploads should return a URL and what that should be before implementing. For the new_branch case, use kwargs.get("new_branch") with get_repo_url(FILES_UI_URL, remote_path, branch=...) to return the correct branch URL.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ae242e7d-88ef-4c8e-8b96-0c7a530b682e

📥 Commits

Reviewing files that changed from the base of the PR and between 474f305 and 1b1b2e3.

📒 Files selected for processing (1)
  • dagshub/upload/wrapper.py
📜 Review details
🔇 Additional comments (1)
dagshub/upload/wrapper.py (1)

187-213: LGTM: wrapper now propagates the upload URL.

The module-level upload_files(...) signature, docstring, and return repo_obj.upload(...) change align with the new public contract.

singhvishalkr added a commit to singhvishalkr/client that referenced this pull request Apr 22, 2026
Addresses CodeRabbit review on DagsHub#688:
- Repo.upload(..., new_branch=...) now returns the URL for the target branch instead of self.branch.
- Repo.upload(..., bucket=True) and upload_files(..., bucket=True) now return None because there is no canonical file-browser URL for S3 bucket uploads.
- Signature + docstring updated to Optional[str].
@singhvishalkr

Copy link
Copy Markdown
Author

Thanks @coderabbitai -- both catches are valid. Pushed cee12b6 addressing them:

  1. new_branch kwarg: when Repo.upload(..., new_branch="branch-x") is called, the returned URL now uses branch-x instead of self.branch. Implementation threads kwargs.get("new_branch") or self.branch into get_repo_url(FILES_UI_URL, remote_path, branch=...).
  2. bucket=True: there is no canonical file-browser URL for S3 bucket uploads, so the function now returns None in that case rather than a misleading repo source URL. Signature changed to Optional[str] and the docstring calls this out explicitly.

Ready for re-review.

@coderabbitai

coderabbitai Bot commented Apr 22, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me pull up the latest version of the file to verify the implementation.


🏁 Script executed:

#!/bin/bash
fd -e py wrapper.py --full-path | xargs cat -n

Length of output: 45737


@singhvishalkr: Both fixes look correct in cee12b6. Here's what I verified:

1. new_branch kwarg (Lines 345–346):

target_branch = kwargs.get("new_branch") or self.branch
return self.get_repo_url(FILES_UI_URL, remote_path, branch=target_branch)

The resolution order is correct — explicit new_branch takes priority over self.branch, and kwargs are already forwarded through both code paths (single-file and directory) before reaching this line. The edge case where Repo.upload_files internally promotes a branch on BranchNotFoundError is also handled correctly since kwargs.get("new_branch") falls back to self.branch which is already set.

2. bucket=True (Lines 334–336):

if bucket:
    self.upload_files_to_bucket(local_path, remote_path, **kwargs)
    return None

Clean early return. Both signatures correctly reflect -> Optional[str] and the docstrings call out the None-for-bucket behaviour explicitly. ✅

3. Type consistency: Optional[str] is correctly applied to both the module-level upload_files (Line 194) and Repo.upload (Line 300). ✅

No issues found. The implementation is solid.

[approve]

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
dagshub/upload/wrapper.py (1)

334-336: Optional: consider returning the S3 URI for bucket uploads instead of None.

Returning None for bucket uploads is consistent with the docstring, but callers that want a pointer to the uploaded data still have to reconstruct it themselves (the same motivation that led to this PR). A canonical s3://<bucket>/<remote_path> URI (or the DagsHub storage bucket browser URL, if one exists) would make the return type more useful and avoid the asymmetry where one branch of upload() returns a locator and the other returns nothing. Not blocking — just worth considering as a follow-up.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4d5c79e2-3a87-49bd-a5a1-bf09775e9431

📥 Commits

Reviewing files that changed from the base of the PR and between 1b1b2e3 and cee12b6.

📒 Files selected for processing (1)
  • dagshub/upload/wrapper.py

Comment thread dagshub/upload/wrapper.py
singhvishalkr added a commit to singhvishalkr/client that referenced this pull request Apr 22, 2026
Address CodeRabbit review on DagsHub#688:

1. get_repo_url now uses urllib.parse.quote(path, safe="/") so nested
   paths like data/images/foo.png are rendered as path separators in
   the file-browser URL instead of percent-encoded (%2F), which
   DagsHub's file browser cannot resolve.

2. remote_path is normalized with posixpath.normpath() at entry of
   Repo.upload, matching the normalization applied by
   DataSet._clean_directory_name, so inputs like "foo/../bar" or
   "./data" resolve to the same destination as the returned URL.

Signed-off-by: Vishal Kumar Singh <vishal.kr.singh2021@gmail.com>
@singhvishalkr

singhvishalkr commented Apr 22, 2026

Copy link
Copy Markdown
Author

@coderabbitai thanks for the scan. Pushed 2a191ec addressing the major URL issue: get_repo_url now uses urllib.parse.quote(directory, safe="/"), so nested paths such as data/images/foo.png render as path separators instead of %2F, matching DagsHub's file-browser URL format (/{owner}/{repo}/src/{branch}/{path}). The CONTENT_UPLOAD_URL caller now benefits from the same behavior.

@coderabbitai

coderabbitai Bot commented Apr 22, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me pull the latest state of the file to verify both fixes.


🏁 Script executed:

#!/bin/bash
fd -e py wrapper.py --full-path | xargs cat -n

Length of output: 45758


@singhvishalkr: Both fixes in 2a191ec are verified and correct.

1. Slash preservation — get_repo_url (Lines 596–607):

path=urllib.parse.quote(directory, safe="/"),

Applied centrally in get_repo_url, so it benefits both FILES_UI_URL (the new return value) and CONTENT_UPLOAD_URL (the existing REST upload path) callers in one shot. Clean.

2. Path normalization — Repo.upload (Line 332):

remote_path = posixpath.normpath(cast(str, Path(remote_path).as_posix()))

Placed after the if not remote_path: resolution block and before the bucket / non-bucket branch split, so it normalizes once for all paths. Matches DataSet._clean_directory_name (Line 797) exactly. ✅

3. Bucket test compatibility: The test_bucket_upload parametrized case with remote_path="new_data/" still passes — after normpath it becomes "new_data", and os.path.join("new_data", relative_path) in upload_files_to_bucket produces the same expected S3 paths. ✅

4. All prior fixes intact: new_branch kwarg resolution (Lines 345–346), bucket=True early return None (Lines 334–336), and Optional[str] signatures (Lines 194, 300). ✅

The declination of s3:// URI is well-reasoned — keeping None as an honest signal until a proper bucket browser URL is available is the right call.

No issues found. The implementation is solid.

[approve]

Addresses CodeRabbit review on DagsHub#688:
- Repo.upload(..., new_branch=...) now returns the URL for the target branch instead of self.branch.
- Repo.upload(..., bucket=True) and upload_files(..., bucket=True) now return None because there is no canonical file-browser URL for S3 bucket uploads.
- Signature + docstring updated to Optional[str].
Address CodeRabbit review on DagsHub#688:

1. get_repo_url now uses urllib.parse.quote(path, safe="/") so nested
   paths like data/images/foo.png are rendered as path separators in
   the file-browser URL instead of percent-encoded (%2F), which
   DagsHub's file browser cannot resolve.

2. remote_path is normalized with posixpath.normpath() at entry of
   Repo.upload, matching the normalization applied by
   DataSet._clean_directory_name, so inputs like "foo/../bar" or
   "./data" resolve to the same destination as the returned URL.

Signed-off-by: Vishal Kumar Singh <vishal.kr.singh2021@gmail.com>
@singhvishalkr

Copy link
Copy Markdown
Author

Coderabbit has re-scanned and the inline suggestions from 2a191ec are all resolved on its side. Ready for a human review whenever someone has cycles.

@singhvishalkr

Copy link
Copy Markdown
Author

Latest head should address the earlier Major-tier URL/branch correctness items (see push 2a191ec and follow-ups). The remaining CodeRabbit note is optional (None for bucket uploads vs a canonical s3:// URI). Happy to add the URI if the team wants that contract; otherwise I would treat it as follow-up and rely on a human pass for merge readiness.

@singhvishalkr

Copy link
Copy Markdown
Author

@coderabbitai please re-review the latest head: bucket uploads intentionally return None, and repo uploads use kwargs.get('new_branch') or self.branch for the FILES_UI URL.

@coderabbitai

coderabbitai Bot commented May 19, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me pull the latest state of the file to verify.


🏁 Script executed:

#!/bin/bash
fd -e py wrapper.py --full-path | xargs cat -n

Length of output: 45759


@singhvishalkr: Re-review complete on the latest head. All key items verified:

Item Location Status
bucket=Truereturn None Lines 334–336
kwargs.get("new_branch") or self.branch for URL Lines 345–346
Optional[str] on both upload_files & Repo.upload Lines 194, 300
urllib.parse.quote(directory, safe="/") in get_repo_url Lines 596–607
posixpath.normpath(...) before bucket/non-bucket split Line 332
Module-level upload_files propagates the return value Line 214
Docstrings document None-for-bucket behaviour Lines 208–211, 315–319

No issues found. The implementation is solid and ready for human merge review. 🐇✨

[approve]

@singhvishalkr

Copy link
Copy Markdown
Author

The latest commits are ahead of the CodeRabbit run that left CHANGES_REQUESTED (that run compared through cee12b682361; head is newer). Nitpick-only feedback was optional S3 URL follow-up.

@coderabbitai please rerun review on head so stale CHANGES_REQUESTED clears if nothing blocks.

@coderabbitai

coderabbitai Bot commented May 19, 2026

Copy link
Copy Markdown

@singhvishalkr: Understood — triggering a fresh review run on the current head now. All items verified in the previous passes remain solid; this will clear the stale CHANGES_REQUESTED gate.

✅ Actions performed

Full review triggered.

@singhvishalkr

Copy link
Copy Markdown
Author

I checked the current head. Repo.upload() and upload_files() already return the browsable URL for file uploads. I am leaving the optional bucket-upload follow-up untouched for now.

@singhvishalkr singhvishalkr force-pushed the feat/upload-return-url branch from 2a191ec to 799b2c6 Compare May 25, 2026 17:04
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
fd -e py wrapper.py --full-path | xargs grep -n "quote\|get_repo_url\|FILES_UI_URL\|Optional\|new_branch\|normpath\|bucket" | head -60

Length of output: 5662



@singhvishalkr: Both fixes in 2a191ec are verified and correct.

1. Slash preservation — get_repo_url (Line 605):

path=urllib.parse.quote(directory, safe="/"),

Applied centrally in get_repo_url, so it benefits both FILES_UI_URL (the new return value) and CONTENT_UPLOAD_URL (the existing REST upload path) callers in one shot. Clean.

2. Path normalization — Repo.upload (Line 332):

remote_path = posixpath.normpath(cast(str, Path(remote_path).as_posix()))

Placed after the if not remote_path: resolution block and before the bucket / non-bucket branch split, so it normalizes once for all paths. Matches DataSet._clean_directory_name (Line 797) exactly. ✅

3. Bucket test compatibility: The test_bucket_upload parametrized case with remote_path="new_data/" still passes — after normpath it becomes "new_data", and os.path.join("new_data", relative_path) in upload_files_to_bucket produces the same expected S3 paths. ✅

4. All prior fixes intact: new_branch kwarg resolution (Lines 345–346), bucket=True early return None (Lines 334–336), and Optional[str] signatures (Lines 194, 300). ✅

The declination of s3:// URI is well-reasoned — keeping None as an honest signal until a proper bucket browser URL is available is the right call.

No issues found. The implementation is solid.

[approve]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Repo.upload method will return url of that uploaded file in dagshub/upload/wrapper.py

1 participant