feat: resolve org repos from DB in classify (no GitHub API call) #54
The classify command previously called get_org_repos (GitHub API) to expand --org into a list of repositories, making it impossible to run classify without network access.

Add get_repos_for_org() to database.py, which queries DISTINCT repository_name values matching the 'orgname/' prefix. No new column is needed, since repository_name is already stored as 'owner/repo'.

Add a use_db_for_orgs flag to _resolve_targets; classify passes True so org expansion reads the local SQLite DB. The fetch command continues to use the GitHub API (it needs to discover repos before any data is stored).

Closes #53

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
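The flag described in this commit can be pictured with a minimal sketch. The function below is not the project's `_resolve_targets`: the name `resolve_targets`, its parameters, and the injected `db_lookup` / `api_lookup` callables are hypothetical stand-ins, chosen so the example runs without a database or network.

```python
from typing import Callable, Optional

# Hypothetical type alias: a function that maps an org name to "owner/repo" strings.
OrgLookup = Callable[[str], list[str]]


def resolve_targets(
    repos: list[str],
    org: Optional[str],
    *,
    db_lookup: OrgLookup,
    api_lookup: OrgLookup,
    use_db_for_orgs: bool = False,
) -> list[str]:
    """Expand an optional org into repo names, using the DB or the API."""
    targets = list(repos)
    if org is not None:
        # classify would pass use_db_for_orgs=True; fetch would keep the default.
        lookup = db_lookup if use_db_for_orgs else api_lookup
        targets.extend(lookup(org))
    # Drop duplicates while preserving first-seen order.
    return list(dict.fromkeys(targets))


def fake_db_lookup(org: str) -> list[str]:
    # Stand-in for a local DB query (assumption, not the real helper).
    return [f"{org}/repo-a", f"{org}/repo-b"]


def fake_api_lookup(org: str) -> list[str]:
    # Stand-in for the GitHub API path; raising makes accidental use visible.
    raise RuntimeError("network access attempted")


offline_targets = resolve_targets(
    ["my-org/repo-a"],
    "my-org",
    db_lookup=fake_db_lookup,
    api_lookup=fake_api_lookup,
    use_db_for_orgs=True,
)
```

With `use_db_for_orgs=True` the API stand-in is never called, which is the property the commit is after for `classify`.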
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 52543ab766
statement = select(PullRequest).where(
    PullRequest.repository_name.like(f"{org_name}/%")  # type: ignore[union-attr]
)
prs = list(session.exec(statement).all())
return sorted({pr.repository_name for pr in prs})
Select distinct repo names instead of full PR rows
get_repos_for_org currently loads every matching PullRequest record and then deduplicates in Python, which scales poorly for large org histories (high memory and query/ORM overhead) and can noticeably slow classify --org. This helper should fetch only distinct repository_name values in SQL so org resolution remains lightweight regardless of PR volume.
Yes, this could be an issue. Most likely not very much of an issue though. I will look into it. :)
- Use col() from sqlmodel to get a proper column expression with .like() instead of calling .like() on the plain str field (attr-defined error)
- Remove now-unnecessary type: ignore[assignment] on the get_org_repos import in _resolve_targets (unused-ignore error)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add Engine return type to patched_engine fixture and Engine parameter type to all test functions that accept it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Select only the repository_name column with DISTINCT rather than fetching full PullRequest rows and deduplicating in Python. The DB now does both the filtering and the deduplication.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
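To illustrate what "the DB does both the filtering and the deduplication" looks like, here is a rough sketch using the standard-library `sqlite3` module rather than the project's SQLModel code. The table name `pull_request`, its schema, and the function name `distinct_repos_for_org` are assumptions made for the example; only the `repository_name` column and the `'orgname/'` prefix come from the PR.

```python
import sqlite3


def distinct_repos_for_org(conn: sqlite3.Connection, org_name: str) -> list[str]:
    """Return sorted, de-duplicated 'owner/repo' names for one org."""
    rows = conn.execute(
        # DISTINCT and the prefix filter both run inside SQLite,
        # so no full rows are loaded into Python.
        "SELECT DISTINCT repository_name FROM pull_request "
        "WHERE repository_name LIKE ? ORDER BY repository_name",
        (f"{org_name}/%",),
    ).fetchall()
    return [name for (name,) in rows]


# Hypothetical in-memory data: two PRs in the same repo, plus a look-alike org.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pull_request ("
    "id INTEGER PRIMARY KEY, repository_name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO pull_request (repository_name) VALUES (?)",
    [
        ("my-org/repo-a",),
        ("my-org/repo-a",),
        ("my-org/repo-b",),
        ("my-org-extra/repo-b",),
    ],
)

repos = distinct_repos_for_org(conn, "my-org")
empty = distinct_repos_for_org(conn, "no-such-org")
```

Including the trailing `/` in the pattern is what keeps `my-org-extra/...` out of the `my-org` result. One caveat worth keeping in mind: SQLite's `LIKE` is case-insensitive for ASCII by default and treats `%` and `_` as wildcards, so this sketch assumes org names do not contain those characters.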
Summary

Closes #53

The `classify` command previously called `get_org_repos` (GitHub API) to expand `--org` into a list of repositories. This made `classify` require network access and a valid token even when all the data was already local.

Approach

`repository_name` is already stored in the DB as `"owner/repo"`, so the org name is the prefix before `/`. Rather than adding a new column (as the issue suggested), we can query the distinct `repository_name` values that start with the `orgname/` prefix.

Changes

| File | Change |
| --- | --- |
| `sqlite/database.py` | `get_repos_for_org(org_name)`: queries distinct `repository_name` values for the given org prefix |
| `cli/app.py` | Add `use_db_for_orgs: bool = False` to `_resolve_targets`; `classify` passes `True`, `fetch` keeps the GitHub API path |
| `tests/sqlite/test_database.py` | Tests for prefix matching (`my-org` vs `my-org-extra`), empty result, and deduplication |

Test plan

- Tests pass (`uv run pytest -m "not integration"`)
- `get_repos_for_org("my-org")` returns `my-org/repo-a` but not `my-org-extra/repo-b`
- Run `review-classify classify --org my-org` with data already fetched and confirm no GitHub API call is made

🤖 Generated with Claude Code