💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 81795acff8
Pull request overview
This PR updates KohakuHub’s repository APIs to better match HuggingFace Hub expectations, improving revision metadata (siblings), path handling, and error semantics, and refactoring duplicated endpoint logic.
Changes:
- Added shared utilities for repo path normalization and path existence checks; updated `/tree` to return HF-style `EntryNotFound` errors instead of empty lists for missing paths.
- Refactored `paths-info` into a shared implementation and added a GET variant for query-string based clients.
- Updated revision metadata to include HF-style `siblings` (including LFS metadata) and simplified commit listing responses/commit date formatting.
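As a rough illustration of the first bullet, the sketch below shows one way path normalization and an HF-style entry-not-found response could fit together. The names `normalize_repo_path_sketch` and `entry_not_found_response`, the message wording, and the response dictionary shape are hypothetical and not taken from this PR. The `X-Error-Code: EntryNotFound` header is the signal `huggingface_hub` clients inspect on error responses.

```python
from posixpath import normpath


def normalize_repo_path_sketch(path: str) -> str:
    """Return a repo-relative path without leading or trailing slashes.

    Hypothetical helper; the PR's real normalize_repo_path may behave differently.
    """
    cleaned = path.strip().strip("/")
    if not cleaned:
        return ""  # an empty string stands for the repository root
    normalized = normpath(cleaned)  # collapses "//" and "./" segments
    # Reject paths that would escape the repository root.
    if normalized == ".." or normalized.startswith("../"):
        raise ValueError(f"path escapes repository root: {path!r}")
    return "" if normalized == "." else normalized


def entry_not_found_response(repo_id: str, revision: str, path: str) -> dict:
    """Describe a 404 response carrying the HF-style EntryNotFound error code."""
    message = f"Entry '{path}' not found in {repo_id} at revision '{revision}'"
    return {
        "status_code": 404,
        "headers": {"X-Error-Code": "EntryNotFound", "X-Error-Message": message},
        "body": {"error": message},
    }
```

Returning an explicit 404 with this header, rather than an empty list, lets a client distinguish "the directory is empty" from "the path does not exist".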
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/kohakuhub/api/repo/routers/tree.py | Adds path normalization/existence helpers, HF-style entry-not-found errors for tree listing, and a GET paths-info alias using shared implementation. |
| src/kohakuhub/api/files.py | Adds build_revision_siblings and returns HF-compatible siblings/files in the revision response. |
| src/kohakuhub/api/commit/routers/history.py | Adds commit date formatting helper and changes the commits endpoint to return a flat list instead of a pagination envelope. |
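The commit date helper itself is not shown in this excerpt. A minimal sketch of what such a helper might look like follows, assuming the backend reports commit creation time as a Unix timestamp in seconds and that clients expect an ISO 8601 UTC string with millisecond precision and a trailing `Z`. The name `format_commit_date_sketch` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def format_commit_date_sketch(creation_date: Optional[float]) -> Optional[str]:
    """Format a Unix timestamp (seconds) as e.g. 2023-11-14T22:13:20.000Z.

    Assumption: the input is seconds since the epoch; None is passed through.
    """
    if creation_date is None:
        return None
    moment = datetime.fromtimestamp(creation_date, tz=timezone.utc)
    millis = moment.microsecond // 1000  # truncate to millisecond precision
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"
```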
     logger.success(f"Returned {len(commits)} commits for {repo_id}/{branch}")
-    return response
+    return commits
    """Build HuggingFace-compatible siblings metadata for a revision."""
    client = get_lakefs_client()
    siblings = []
    all_results = []
    after = ""
    has_more = True

    while has_more:
        result = await client.list_objects(
            repository=lakefs_repo,
            ref=revision,
            prefix="",
            delimiter="",
            amount=1000,
            after=after,
        )

        all_results.extend(result["results"])

        if result.get("pagination") and result["pagination"].get("has_more"):
            after = result["pagination"]["next_offset"]
            has_more = True
        else:
            has_more = False

    file_objects = [obj for obj in all_results if obj["path_type"] == "object"]
    lfs_files = [
        obj
        for obj in file_objects
        if should_use_lfs(repo, obj["path"], obj.get("size_bytes", 0))
    ]

    file_records = {}
    for obj in lfs_files:
        try:
            record = get_file(repo, obj["path"])
            if record:
                file_records[obj["path"]] = record
        except Exception:
            continue

    for obj in file_objects:
        sibling = {
            "rfilename": obj["path"],
            "size": obj.get("size_bytes", 0),
        }

        if should_use_lfs(repo, obj["path"], obj.get("size_bytes", 0)):
            file_record = file_records.get(obj["path"])
            checksum = (
                file_record.sha256
                if file_record and file_record.sha256
                else obj.get("checksum", "")
            )
            sibling["lfs"] = {
                "sha256": checksum,
                "size": obj.get("size_bytes", 0),
                "pointerSize": 134,
            }

        siblings.append(sibling)
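The pagination loop in the hunk above could, as one option, be isolated into an async generator so that callers do not need to track `after` and `has_more` themselves. The sketch below assumes only the response shape visible in the diff (`results`, `pagination.has_more`, `pagination.next_offset`). `iter_objects`, `StubClient`, and `collect_paths` are hypothetical names, and the stub is an in-memory stand-in rather than the real lakeFS client.

```python
import asyncio
from typing import Any, AsyncIterator


async def iter_objects(
    client: Any, repository: str, ref: str, page_size: int = 1000
) -> AsyncIterator[dict]:
    """Yield every object at a ref, following pagination until exhausted."""
    after = ""
    while True:
        result = await client.list_objects(
            repository=repository,
            ref=ref,
            prefix="",
            delimiter="",
            amount=page_size,
            after=after,
        )
        for obj in result["results"]:
            yield obj
        pagination = result.get("pagination") or {}
        if not pagination.get("has_more"):
            break
        after = pagination["next_offset"]


class StubClient:
    """Hypothetical in-memory client used only to exercise the generator."""

    def __init__(self, paths):
        self._paths = sorted(paths)

    async def list_objects(self, repository, ref, prefix, delimiter, amount, after):
        remaining = [p for p in self._paths if p > after]
        page = remaining[:amount]
        return {
            "results": [
                {"path": p, "path_type": "object", "size_bytes": 1} for p in page
            ],
            "pagination": {
                "has_more": len(remaining) > amount,
                "next_offset": page[-1] if page else "",
            },
        }


async def collect_paths(client: Any, page_size: int) -> list:
    """Gather all object paths by draining the generator."""
    return [obj["path"] async for obj in iter_objects(client, "repo", "main", page_size)]
```

For example, `asyncio.run(collect_paths(StubClient(["a", "b", "c"]), 2))` walks two pages and returns all three paths in order.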
if is_lakefs_not_found_error(e) and is_lakefs_revision_error(e):
    return hf_revision_not_found(repo_id, revision)
     if not log_result or not log_result.get("results"):
         logger.warning(f"No commits found for {lakefs_repo}/{branch}")
-        return {
-            "commits": [],
-            "hasMore": False,
-            "nextCursor": None,
-        }
+        return []
This pull request introduces several improvements to HuggingFace Hub compatibility in KohakuHub, focusing on commit hash normalization, response formatting, and expanded API endpoints. The most important changes include normalizing commit hashes to 40-character hexadecimal values on HF-compatible surfaces, updating documentation and response headers, and adding new endpoints for path information. These updates ensure better interoperability with HuggingFace clients and improve error handling and metadata consistency.
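This excerpt does not show how `format_hf_commit_hash` is implemented. Purely as an illustration, and assuming the backend commit ID is a longer hexadecimal digest (for example 64 characters), one way to obtain a 40-character hexadecimal value is to lowercase and truncate it. The name `to_hf_commit_hash_sketch` is hypothetical and truncation is an assumed strategy, not necessarily what the PR does.

```python
import re

_HEX_RE = re.compile(r"[0-9a-f]+")


def to_hf_commit_hash_sketch(commit_id: str) -> str:
    """Return the first 40 lowercase hex characters of a commit ID.

    Assumption: the input is a hex digest of at least 40 characters.
    """
    value = commit_id.strip().lower()
    if not _HEX_RE.fullmatch(value):
        raise ValueError(f"not a hexadecimal commit ID: {commit_id!r}")
    if len(value) < 40:
        raise ValueError(f"commit ID shorter than 40 characters: {commit_id!r}")
    return value[:40]
```

If truncation is used, the server also needs a way to resolve a 40-character value back to the full commit ID when a client sends it as a revision.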
HuggingFace Compatibility & Commit Hash Normalization:
- […] (`format_hf_commit_hash` used throughout). [1] [2] [3] [4] [5]
API Response & Metadata Improvements:
- […] `siblings` and `files` fields with HuggingFace-compatible file lists, including LFS metadata. [1] [2]
Error Handling & Path Info Endpoints:
- […] `hf_entry_not_found` handling. [1] [2]
- […] `/paths-info/{revision}` to support query-string based path info requests, improving compatibility with HF clients. [1] [2]
Codebase & Utility Enhancements:
These changes collectively improve HuggingFace compatibility, metadata accuracy, and error handling in KohakuHub.

This pull request introduces several improvements to the repository API, focusing on compatibility with HuggingFace Hub, enhanced file and path handling, and improved commit and revision metadata. The changes include new utility functions, refactoring of endpoints for better code reuse, and more robust error handling for path and entry lookups.
HuggingFace Compatibility & Metadata Improvements:
- […] `_format_commit_date` utility to standardize commit timestamps for HuggingFace compatibility in `history.py`.
- […] `siblings` metadata, including LFS file handling, and updated the response to include both `siblings` and `files`. [1] [2] [3]
Path and Entry Handling Enhancements:
- […] `normalize_repo_path` and `path_exists_in_revision` utilities to standardize path handling and check for file/directory existence at a given revision.
API Endpoint Refactoring & Code Reuse:
- […] `paths-info` endpoints to use a shared implementation (`get_paths_info_impl`), supporting both POST and GET methods for better compatibility and maintainability. [1] [2] [3]
Commit Listing Response Simplification:
Minor Improvements:
These changes collectively improve HuggingFace compatibility, error handling, and code maintainability in the repository API.
# Pull Request
What changed?
Why?
Fixes #
Testing
Checklist
Screenshots