docs: record Hugging Face corpus audit findings by gaelic-ghost · Pull Request #24 · gaelic-ghost/SwiftlyFetch

gaelic-ghost · 2026-05-31T16:51:36Z

Summary

- Record the larger bounded Hugging Face corpus audit run and the maintainer decision.
- Link the findings from the fixture corpus note.
- Mark the larger corpus audit and temp-directory cleanup decision complete in the roadmap.

Verification

swift test
scripts/repo-maintenance/validate-all.sh
HF_CORPUS_AUDIT_TINYSTORIES_LENGTH=100 HF_CORPUS_AUDIT_SIMPLEWIKI_LENGTH=100 HF_CORPUS_AUDIT_POETRY_LENGTH=100 scripts/repo-maintenance/run-huggingface-corpus-audit.sh

coderabbitai · 2026-05-31T16:51:45Z

Warning

Review limit reached

@gaelic-ghost, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 18 minutes and 24 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 088a461d-8d9b-4a13-9dab-2d275b696f2a

📥 Commits

Reviewing files that changed from the base of the PR and between 00867d7 and f591b62.

📒 Files selected for processing (3)

ROADMAP.md
docs/maintainers/fixture-corpus.md
docs/maintainers/huggingface-corpus-audit-findings.md

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f591b620b1

chatgpt-codex-connector · 2026-05-31T16:53:47Z

 - [x] Refine conventional-search ranking and snippet behavior now that the first SearchKit backend works end to end.
 - [x] Validate the current refinement pass against a broader checked-in fixture corpus with near-miss ranking and longer-body snippet cases.
-- [ ] Validate whether the current refinement pass is enough for ordinary app callers against larger real app corpora.
+- [x] Validate whether the current refinement pass is enough for ordinary app callers against larger real app corpora.


Keep real-app corpus validation open

This newly checked roadmap item claims the refinement pass has been validated against larger real app corpora, but the recorded audit only uses the first 100 rows from three public Hugging Face datasets and the new findings doc explicitly says it does not stand in for a real app's private corpus. When maintainers use the roadmap to decide what Milestone 4 work remains, this marks a validation gap as completed even though the documented evidence says the next signal is still a caller-owned corpus.


docs: record Hugging Face corpus audit findings

f591b62

gaelic-ghost added the docs label May 31, 2026

gaelic-ghost merged commit a31dd02 into main May 31, 2026
2 checks passed

gaelic-ghost deleted the docs/corpus-audit-findings branch May 31, 2026 16:53
