Skip to content

feat: resolve org repos from DB in classify (no GitHub API call)#54

Merged
ghinks merged 4 commits into
mainfrom
feat/classify-org-from-db
May 1, 2026
Merged

feat: resolve org repos from DB in classify (no GitHub API call)#54
ghinks merged 4 commits into
mainfrom
feat/classify-org-from-db

Conversation

@ghinks
Copy link
Copy Markdown
Owner

@ghinks ghinks commented May 1, 2026

Summary

Closes #53

The classify command previously called get_org_repos (GitHub API) to expand --org into a list of repositories. This made classify require network access and a valid token even when all the data was already local.

Approach

repository_name is already stored in the DB as "owner/repo", so the org name is the prefix before /. Rather than adding a new column (as the issue suggested), we can query:

SELECT DISTINCT repository_name FROM pullrequest WHERE repository_name LIKE 'orgname/%'

This means:

  • No schema change — the information is already there
  • No migration needed

Changes

File Change
sqlite/database.py Add get_repos_for_org(org_name) — queries distinct repository_name values for the given org prefix
cli/app.py Add use_db_for_orgs: bool = False to _resolve_targets; classify passes True, fetch keeps the GitHub API path
tests/sqlite/test_database.py 4 new unit tests covering correct filtering, no prefix collisions (e.g. my-org vs my-org-extra), empty result, and deduplication

Test plan

  • 66 unit tests pass (uv run pytest -m "not integration")
  • get_repos_for_org("my-org") correctly filters my-org/repo-a but not my-org-extra/repo-b
  • Integration: review-classify classify --org my-org with data already fetched — confirm no GitHub API call is made

🤖 Generated with Claude Code

The classify command previously called get_org_repos (GitHub API) to
expand --org into a list of repositories, making it impossible to run
classify without network access.

Add get_repos_for_org() to database.py which queries DISTINCT
repository_name values matching the 'orgname/' prefix — no new column
needed since repository_name is already stored as 'owner/repo'.

Add use_db_for_orgs flag to _resolve_targets; classify passes True so
org expansion reads the local SQLite DB. The fetch command continues
to use the GitHub API (it needs to discover repos before any data is
stored).

Closes #53

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52543ab766

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/review_classification/cli/app.py
Comment on lines +126 to +130
statement = select(PullRequest).where(
PullRequest.repository_name.like(f"{org_name}/%") # type: ignore[union-attr]
)
prs = list(session.exec(statement).all())
return sorted({pr.repository_name for pr in prs})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Select distinct repo names instead of full PR rows

get_repos_for_org currently loads every matching PullRequest record and then deduplicates in Python, which scales poorly for large org histories (high memory and query/ORM overhead) and can noticeably slow classify --org. This helper should fetch only distinct repository_name values in SQL so org resolution remains lightweight regardless of PR volume.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this could be an issue. Most likely not very much of an issue though. I will look into it. :)

ghinks and others added 3 commits May 1, 2026 19:03
- Use col() from sqlmodel to get a proper column expression with .like()
  instead of calling .like() on the plain str field (attr-defined error)
- Remove now-unnecessary type: ignore[assignment] on the get_org_repos
  import in _resolve_targets (unused-ignore error)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add Engine return type to patched_engine fixture and Engine parameter
type to all test functions that accept it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Select only the repository_name column with DISTINCT rather than
fetching full PullRequest rows and deduplicating in Python. The DB
now does both the filtering and the deduplication.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ghinks ghinks merged commit 9c34cc0 into main May 1, 2026
1 check passed
@ghinks ghinks deleted the feat/classify-org-from-db branch May 1, 2026 23:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Limit use of _resolve_targets to the fetch command

1 participant