Fix/upload #12

Open
MigoXV wants to merge 4 commits into KohakuBlueleaf:main from MigoXV:fix/upload

MigoXV (Contributor) commented Mar 17, 2026

This pull request introduces several improvements to HuggingFace Hub compatibility in KohakuHub, focusing on commit hash normalization, response formatting, and expanded API endpoints. The most important changes include normalizing commit hashes to 40-character hexadecimal values on HF-compatible surfaces, updating documentation and response headers, and adding new endpoints for path information. These updates ensure better interoperability with HuggingFace clients and improve error handling and metadata consistency.

HuggingFace Compatibility & Commit Hash Normalization:

  • All HF-compatible API responses now expose commit hashes as 40-character hexadecimal values, regardless of internal LakeFS commit ID length. This affects metadata fields, response headers, and file endpoints (format_hf_commit_hash used throughout). [1] [2] [3] [4] [5]
  • Documentation updated to clarify hash normalization and response formats, including explicit notes about 40-character commit hashes and header changes. [1] [2] [3] [4] [5]
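The PR names format_hf_commit_hash, but its body is not shown on this page. The following minimal sketch shows one way such a helper could work. It assumes, without confirmation from the PR, that LakeFS commit IDs are 64-character SHA-256 hex strings and that normalization is a plain truncation to the first 40 characters. The function name to_hf_commit_hash and the SHA-1 fallback for non-hex input are hypothetical:

```python
import hashlib
import re

_HEX = re.compile(r"[0-9a-f]+")


def to_hf_commit_hash(commit_id: str) -> str:
    """Hypothetical sketch: return a 40-character lowercase hex commit hash.

    Assumption: long hex IDs (e.g. 64-char LakeFS IDs) are truncated; any
    other input is hashed with SHA-1, whose hex digest is 40 characters.
    """
    value = (commit_id or "").strip().lower()
    if len(value) >= 40 and _HEX.fullmatch(value):
        # Keep the prefix so a client-supplied 40-char hash can still be
        # prefix-matched against the full internal commit ID.
        return value[:40]
    return hashlib.sha1(value.encode("utf-8")).hexdigest()
```

Truncation works only if the server can resolve a 40-character prefix back to the full internal commit. The hashing fallback is one-way, so it would suit display-only fields at best.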

API Response & Metadata Improvements:

  • Revision metadata endpoints now return siblings and files fields with HuggingFace-compatible file lists, including LFS metadata. [1] [2]
  • Commit listing endpoints use improved date formatting and author metadata for better HF compatibility. [1] [2]

Error Handling & Path Info Endpoints:

  • Tree listing and path info endpoints now provide detailed HF-compatible error responses for non-existent paths, including new hf_entry_not_found handling. [1] [2]
  • Added new GET endpoint for /paths-info/{revision} to support query-string based path info requests, improving compatibility with HF clients. [1] [2]
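For reference, huggingface_hub classifies HTTP errors largely by the X-Error-Code response header (for example EntryNotFound or RevisionNotFound) and reads X-Error-Message for the message text. Below is a framework-neutral sketch of HF-style not-found responses. The HFErrorResponse container, the function names, and the message wording are hypothetical; they are not the PR's hf_entry_not_found or hf_revision_not_found helpers:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HFErrorResponse:
    """Hypothetical stand-in for an HTTP error response."""

    status_code: int
    headers: dict[str, str] = field(default_factory=dict)
    body: dict[str, str] = field(default_factory=dict)


def _hf_error(status_code: int, error_code: str, message: str) -> HFErrorResponse:
    # X-Error-Code selects the client-side exception class;
    # X-Error-Message carries the human-readable text.
    return HFErrorResponse(
        status_code=status_code,
        headers={"X-Error-Code": error_code, "X-Error-Message": message},
        body={"error": message},
    )


def make_entry_not_found_response(repo_id: str, path: str, revision: str) -> HFErrorResponse:
    message = f"Entry '{path}' not found in {repo_id} at revision {revision}"
    return _hf_error(404, "EntryNotFound", message)


def make_revision_not_found_response(repo_id: str, revision: str) -> HFErrorResponse:
    return _hf_error(404, "RevisionNotFound", f"Revision '{revision}' not found in {repo_id}")
```

In a FastAPI handler, the same three fields could be passed to fastapi.responses.JSONResponse(status_code=..., content=..., headers=...).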

Codebase & Utility Enhancements:

  • Introduced utility functions for path normalization and existence checking, used across tree and path info endpoints for robust behavior. [1] [2]
  • Refactored commit history endpoint to return a simple list of commits and improved pagination handling. [1] [2]

These changes collectively improve HuggingFace compatibility, metadata accuracy, and error handling in KohakuHub.

This pull request introduces several improvements to the repository API, focusing on compatibility with HuggingFace Hub, enhanced file and path handling, and improved commit and revision metadata. The changes include new utility functions, refactoring of endpoints for better code reuse, and more robust error handling for path and entry lookups.

HuggingFace Compatibility & Metadata Improvements:

  • Added _format_commit_date utility to standardize commit timestamps for HuggingFace compatibility in history.py.
  • Refactored the revision endpoint to build HuggingFace-compatible siblings metadata, including LFS file handling, and updated the response to include both siblings and files. [1] [2] [3]
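The body of _format_commit_date is not shown here. The sketch below is one plausible version. It assumes LakeFS reports commit creation dates as Unix epoch seconds and that HF clients expect ISO 8601 UTC timestamps. The millisecond precision, the trailing "Z", and the name format_commit_date are assumptions, not the PR's code:

```python
from __future__ import annotations

from datetime import datetime, timezone


def format_commit_date(value: int | float | str | None) -> str | None:
    """Hypothetical sketch: render a commit timestamp as ISO 8601 UTC ending in 'Z'."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        # Assumed input: Unix epoch seconds.
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        # Accept ISO strings; fromisoformat() on older Pythons rejects a bare 'Z'.
        dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
        if dt.tzinfo is None:
            # Treat naive timestamps as UTC rather than server-local time.
            dt = dt.replace(tzinfo=timezone.utc)
    utc = dt.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")
```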

Path and Entry Handling Enhancements:

  • Introduced normalize_repo_path and path_exists_in_revision utilities to standardize path handling and check for file/directory existence at a given revision.
  • Improved error handling in the repo tree endpoint to return HuggingFace-style entry-not-found errors instead of empty lists for non-existent paths, and added path existence checks for empty results. [1] [2]
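Below is a minimal sketch of what path normalization and an existence check could look like. It uses an in-memory list of object paths in place of a LakeFS listing. The rules applied here (drop empty and "." segments, reject "..", treat the root as always present) are assumptions, not the PR's normalize_repo_path or path_exists_in_revision:

```python
from __future__ import annotations

from collections.abc import Iterable


def normalize_repo_path(path: str | None) -> str:
    """Hypothetical sketch: repo-relative path with no leading/trailing slash."""
    if not path:
        return ""
    # Drop empty segments (from '//' or edge slashes) and '.' segments.
    parts = [p for p in path.split("/") if p not in ("", ".")]
    if ".." in parts:
        raise ValueError(f"path escapes the repository root: {path!r}")
    return "/".join(parts)


def path_exists_in_revision(path: str | None, object_paths: Iterable[str]) -> bool:
    """True if `path` is a file, or a directory prefix, among `object_paths`.

    `object_paths` stands in for the objects listed at one revision.
    """
    target = normalize_repo_path(path)
    if not target:
        return True  # the repository root always exists
    # Require a '/' after the prefix so "a/b" does not match "a/bc.txt".
    dir_prefix = target + "/"
    return any(p == target or p.startswith(dir_prefix) for p in object_paths)
```

Against LakeFS, the directory branch could plausibly map to a list_objects call with prefix=target + "/" and a small amount, similar to the listing code quoted further down this page. That mapping is an assumption about the real helper.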

API Endpoint Refactoring & Code Reuse:

  • Refactored the paths-info endpoints to use a shared implementation (get_paths_info_impl), supporting both POST and GET methods for better compatibility and maintainability. [1] [2] [3]
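Recent huggingface_hub versions call paths-info with a form-encoded POST that repeats the paths field and includes an expand flag. A query string uses the same URL encoding, so a single parser can feed both the POST route and the new GET route. The sketch below works under that assumption. It also assumes the GET parameter names mirror the POST form fields. All function names are hypothetical stand-ins, not get_paths_info_impl:

```python
from __future__ import annotations

from urllib.parse import parse_qs

_TRUE = {"1", "true", "yes"}


def parse_paths_info_params(encoded: str) -> tuple[list[str], bool]:
    """Parse repeated `paths` and an `expand` flag from a URL-encoded string."""
    params = parse_qs(encoded)
    paths = params.get("paths", [])
    expand = params.get("expand", ["false"])[-1].strip().lower() in _TRUE
    return paths, expand


def get_paths_info_sketch(revision: str, paths: list[str], expand: bool) -> list[dict]:
    # Placeholder for the shared implementation; a real one would stat each
    # path in storage. dict.fromkeys drops duplicates but keeps request order.
    return [{"path": p, "revision": revision, "expand": expand} for p in dict.fromkeys(paths)]


def handle_get(revision: str, query_string: str) -> list[dict]:
    return get_paths_info_sketch(revision, *parse_paths_info_params(query_string))


def handle_post(revision: str, form_body: bytes) -> list[dict]:
    return get_paths_info_sketch(revision, *parse_paths_info_params(form_body.decode("utf-8")))
```

Both routes reduce to the same (paths, expand) pair, so they can only diverge in the transport layer.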

Commit Listing Response Simplification:

  • Simplified the commit listing endpoint to return a flat list of commits instead of a paginated response object, and updated commit metadata formatting. [1] [2] [3]
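The flat list drops the hasMore and nextCursor fields that the old envelope carried, as the history.py diff further down this page shows. Below is a hypothetical client-side shim that accepts both shapes during a transition; the name extract_commits is illustrative only. huggingface_hub's list_repo_commits follows Link response headers to fetch further pages. If pagination is still needed, it would plausibly move into those headers rather than the response body:

```python
from __future__ import annotations

from typing import Any


def extract_commits(payload: Any) -> list[dict]:
    """Return the commit list from either response shape.

    New shape: a flat JSON array of commits.
    Old shape: {"commits": [...], "hasMore": ..., "nextCursor": ...}.
    """
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict) and isinstance(payload.get("commits"), list):
        return payload["commits"]
    raise TypeError(f"unexpected commits payload: {type(payload).__name__}")
```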

Minor Improvements:

  • Added missing imports and fixed minor issues to support new functionality and maintain code consistency. [1] [2] [3]

These changes collectively improve HuggingFace compatibility, error handling, and code maintainability in the repository API.

# Pull Request

What changed?

Why?

Fixes #

Testing

  • Tested locally
  • Tested in Docker (if relevant)

Checklist

  • Code follows project style (see CONTRIBUTING.md)
  • Updated docs if needed (README, API.md, CLI.md, etc.)
  • No breaking changes (or documented them)
  • Tested my changes

Screenshots

Copilot AI review requested due to automatic review settings (March 17, 2026, 11:01)

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 81795acff8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/kohakuhub/api/commit/routers/history.py

Copilot AI left a comment

Pull request overview

This PR updates KohakuHub’s repository APIs to better match HuggingFace Hub expectations, improving revision metadata (siblings), path handling, and error semantics, and refactoring duplicated endpoint logic.

Changes:

  • Added shared utilities for repo path normalization and path existence checks; updated /tree to return HF-style EntryNotFound errors instead of empty lists for missing paths.
  • Refactored paths-info into a shared implementation and added a GET variant for query-string based clients.
  • Updated revision metadata to include HF-style siblings (including LFS metadata) and simplified commit listing responses/commit date formatting.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

  • src/kohakuhub/api/repo/routers/tree.py: Adds path normalization/existence helpers, HF-style entry-not-found errors for tree listing, and a GET paths-info alias using the shared implementation.
  • src/kohakuhub/api/files.py: Adds build_revision_siblings and returns HF-compatible siblings/files in the revision response.
  • src/kohakuhub/api/commit/routers/history.py: Adds a commit date formatting helper and changes the commits endpoint to return a flat list instead of a pagination envelope.

Comment on lines 142 to +143
     logger.success(f"Returned {len(commits)} commits for {repo_id}/{branch}")
-    return response
+    return commits
Comment on lines +276 to +337
    """Build HuggingFace-compatible siblings metadata for a revision."""
    client = get_lakefs_client()
    siblings = []
    all_results = []
    after = ""
    has_more = True

    while has_more:
        result = await client.list_objects(
            repository=lakefs_repo,
            ref=revision,
            prefix="",
            delimiter="",
            amount=1000,
            after=after,
        )

        all_results.extend(result["results"])

        if result.get("pagination") and result["pagination"].get("has_more"):
            after = result["pagination"]["next_offset"]
            has_more = True
        else:
            has_more = False

    file_objects = [obj for obj in all_results if obj["path_type"] == "object"]
    lfs_files = [
        obj
        for obj in file_objects
        if should_use_lfs(repo, obj["path"], obj.get("size_bytes", 0))
    ]

    file_records = {}
    for obj in lfs_files:
        try:
            record = get_file(repo, obj["path"])
            if record:
                file_records[obj["path"]] = record
        except Exception:
            continue

    for obj in file_objects:
        sibling = {
            "rfilename": obj["path"],
            "size": obj.get("size_bytes", 0),
        }

        if should_use_lfs(repo, obj["path"], obj.get("size_bytes", 0)):
            file_record = file_records.get(obj["path"])
            checksum = (
                file_record.sha256
                if file_record and file_record.sha256
                else obj.get("checksum", "")
            )
            sibling["lfs"] = {
                "sha256": checksum,
                "size": obj.get("size_bytes", 0),
                "pointerSize": 134,
            }

        siblings.append(sibling)
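
The quoted code hard-codes "pointerSize": 134. A Git LFS v1 pointer consists of a version line, an oid sha256: line, and a size line, each newline-terminated. Its length is therefore 125 bytes plus the number of digits in the file size, so 134 is correct only for files of 100,000,000 to 999,999,999 bytes. The following sketch computes the value instead. It assumes KohakuHub's pointers use that canonical layout with no extension lines; the function names are hypothetical:

```python
LFS_SPEC_V1 = "https://git-lfs.github.com/spec/v1"


def lfs_pointer_text(sha256_hex: str, size: int) -> str:
    """Canonical three-line Git LFS v1 pointer (no extension lines)."""
    return f"version {LFS_SPEC_V1}\noid sha256:{sha256_hex}\nsize {size}\n"


def lfs_pointer_size(size: int) -> int:
    """Byte length of the pointer for a file of `size` bytes.

    The oid is always 64 hex characters, so only the digit count of
    `size` changes the result (125 + number of digits).
    """
    return len(lfs_pointer_text("0" * 64, size).encode("ascii"))
```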

Comment on lines +308 to +312
    file_records = {}
    for obj in lfs_files:
        try:
            record = get_file(repo, obj["path"])
            if record:
Comment on lines +421 to +422
    if is_lakefs_not_found_error(e) and is_lakefs_revision_error(e):
        return hf_revision_not_found(repo_id, revision)
Comment on lines 96 to 99
     if not log_result or not log_result.get("results"):
         logger.warning(f"No commits found for {lakefs_repo}/{branch}")
-        return {
-            "commits": [],
-            "hasMore": False,
-            "nextCursor": None,
-        }
+        return []

Labels

None yet

Projects

None yet

2 participants