feat: resolve org repos from DB in classify (no GitHub API call) by ghinks · Pull Request #54 · ghinks/review-classification

ghinks · 2026-05-01T21:32:07Z

Summary

Closes #53

The classify command previously called get_org_repos (GitHub API) to expand --org into a list of repositories. This made classify require network access and a valid token even when all the data was already local.

Approach

repository_name is already stored in the DB as "owner/repo", so the org name is the prefix before /. Rather than adding a new column (as the issue suggested), we can query:

SELECT DISTINCT repository_name FROM pullrequest WHERE repository_name LIKE 'orgname/%'

This means:

No schema change — the information is already there
No migration needed

Changes

File	Change
`sqlite/database.py`	Add `get_repos_for_org(org_name)` — queries distinct `repository_name` values for the given org prefix
`cli/app.py`	Add `use_db_for_orgs: bool = False` to `_resolve_targets`; `classify` passes `True`, `fetch` keeps the GitHub API path
`tests/sqlite/test_database.py`	4 new unit tests covering correct filtering, no prefix collisions (e.g. `my-org` vs `my-org-extra`), empty result, and deduplication

Test plan

66 unit tests pass (uv run pytest -m "not integration")
get_repos_for_org("my-org") correctly filters my-org/repo-a but not my-org-extra/repo-b
Integration: review-classify classify --org my-org with data already fetched — confirm no GitHub API call is made

🤖 Generated with Claude Code

The classify command previously called get_org_repos (GitHub API) to expand --org into a list of repositories, making it impossible to run classify without network access. Add get_repos_for_org() to database.py which queries DISTINCT repository_name values matching the 'orgname/' prefix — no new column needed since repository_name is already stored as 'owner/repo'. Add use_db_for_orgs flag to _resolve_targets; classify passes True so org expansion reads the local SQLite DB. The fetch command continues to use the GitHub API (it needs to discover repos before any data is stored). Closes #53 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52543ab766

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-01T21:34:25Z

+        statement = select(PullRequest).where(
+            PullRequest.repository_name.like(f"{org_name}/%")  # type: ignore[union-attr]
+        )
+        prs = list(session.exec(statement).all())
+        return sorted({pr.repository_name for pr in prs})


Select distinct repo names instead of full PR rows

get_repos_for_org currently loads every matching PullRequest record and then deduplicates in Python, which scales poorly for large org histories (high memory and query/ORM overhead) and can noticeably slow classify --org. This helper should fetch only distinct repository_name values in SQL so org resolution remains lightweight regardless of PR volume.

Useful? React with 👍 / 👎.

Yes, this could be an issue. Most likely not very much of an issue though. I will look into it. :)

- Use col() from sqlmodel to get a proper column expression with .like() instead of calling .like() on the plain str field (attr-defined error) - Remove now-unnecessary type: ignore[assignment] on the get_org_repos import in _resolve_targets (unused-ignore error) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add Engine return type to patched_engine fixture and Engine parameter type to all test functions that accept it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Select only the repository_name column with DISTINCT rather than fetching full PullRequest rows and deduplicating in Python. The DB now does both the filtering and the deduplication. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ghinks mentioned this pull request May 1, 2026

Limit use of _resolve_targets to the fetch command #53

Closed

chatgpt-codex-connector Bot reviewed May 1, 2026

View reviewed changes

ghinks and others added 3 commits May 1, 2026 19:03

fix: add missing type annotations in test_database.py for mypy

d585271

Add Engine return type to patched_engine fixture and Engine parameter type to all test functions that accept it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ghinks merged commit 9c34cc0 into main May 1, 2026
1 check passed

ghinks deleted the feat/classify-org-from-db branch May 1, 2026 23:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: resolve org repos from DB in classify (no GitHub API call)#54

feat: resolve org repos from DB in classify (no GitHub API call)#54
ghinks merged 4 commits into
mainfrom
feat/classify-org-from-db

ghinks commented May 1, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot May 1, 2026

Uh oh!

ghinks May 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ghinks commented May 1, 2026

Summary

Approach

Changes

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

ghinks May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant