
refactor: prune orphaned utility variants #399

Merged
jerry609 merged 2 commits into dev from refactor/pr4-prune-orphaned-utils
Mar 15, 2026

jerry609 (Owner) commented Mar 15, 2026

Summary

  • delete orphaned utility files under src/paperbot/utils that have no surviving import chain in the repo
  • remove invalid / backup-style variants (CCS-DOWN.py, downloader - ccs.py, *_back.py, *_new.py) and several conference / experiment helpers that were only referenced by those dead variants or not referenced at all
  • tighten paperbot.utils.__init__ docs to reflect the utilities that are still part of the live surface
  • add a contract test to keep backup-style / invalid utility filenames from reappearing
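The "removed files stay removed" half of the contract test in the last bullet can be sketched as follows. This is a minimal illustration, not the PR's actual tests/unit/test_utils_cleanup_contracts.py: the helper `find_reintroduced_files` and the `REMOVED_UTILS` tuple are hypothetical, the filenames are a subset of the deleted files listed further down this page, and the test assumes pytest runs from the repository root.

```python
from pathlib import Path

# Hypothetical subset of the deleted files; not copied from the real test module.
REMOVED_UTILS = (
    "CCS-DOWN.py",
    "downloader - ccs.py",
    "downloader_back.py",
    "conference_parsers_new.py",
    "smart_downloader.py",
    "keyword_optimizer.py",
)


def find_reintroduced_files(utils_dir: Path) -> list[str]:
    """Return the removed utility filenames that exist again under utils_dir."""
    return sorted(name for name in REMOVED_UTILS if (utils_dir / name).exists())


def test_removed_utils_stay_removed() -> None:
    # Assumes the working directory is the repository root (src/paperbot/utils layout).
    utils_dir = Path("src/paperbot/utils")
    assert find_reintroduced_files(utils_dir) == []
```

Checking for exact filenames keeps the test cheap and explicit; the trade-off is that the list has to be extended by hand whenever another variant is deleted.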

Validation

  • python -m pytest -q tests/unit/test_utils_cleanup_contracts.py
  • python -m pytest -q tests/unit/test_retry_helper_async.py tests/unit/test_api_security_middleware.py
  • rg check confirms no remaining imports reference the removed utility modules
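The rg check in the last bullet can also be approximated in pure Python. This sketch is an assumption about what such a check might look like, not the command the author ran. `find_stale_imports` is a hypothetical helper, and `REMOVED` lists only the deleted modules whose names are valid Python identifiers, since files like `CCS-DOWN.py` cannot appear in an import statement anyway. Unlike a text search, it parses each file with `ast`, so commented-out imports are not flagged, but a file with a syntax error will raise.

```python
import ast
from pathlib import Path

# Deleted module names taken from the file lists on this page (importable ones only).
REMOVED = {
    "keyword_optimizer", "smart_downloader", "acm_extractor",
    "conference_parsers", "conference_parsers_new", "conference_helpers",
    "conference_downloader", "downloader_back", "downloader_ccs",
    "experiment_metrics", "experiment_runner",
}
PREFIX = "paperbot.utils"


def find_stale_imports(root: Path) -> list[tuple[str, int, str]]:
    """Return (file, line, module) for each absolute import of a removed paperbot.utils module."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # import paperbot.utils.keyword_optimizer
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                if node.module == PREFIX:
                    # from paperbot.utils import keyword_optimizer
                    targets = [f"{PREFIX}.{alias.name}" for alias in node.names]
                else:
                    # from paperbot.utils.keyword_optimizer import something
                    targets = [node.module]
            else:
                continue  # relative imports and non-import nodes are ignored
            for name in targets:
                parts = name.split(".")
                if name.startswith(PREFIX + ".") and len(parts) > 2 and parts[2] in REMOVED:
                    hits.append((str(path), node.lineno, name))
    return hits
```

Running it as `find_stale_imports(Path("src"))` and `find_stale_imports(Path("tests"))` would be the rough equivalent of the rg check; an empty list means no absolute import of a deleted module remains.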

Notes

  • tests/test_conference_agent_stats.py is already stale on origin/dev and still imports a dead top-level agents.* module; it was not changed in this PR.
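A stale import like the one described in this note can be detected without running the test itself. The sketch below is an illustration, not part of this PR: it collects the top-level package of every absolute import in a file and asks `importlib.util.find_spec` whether that package resolves on the current `sys.path`. `unresolved_top_level_imports` is a hypothetical name.

```python
import ast
import importlib.util
from pathlib import Path


def unresolved_top_level_imports(path: Path) -> list[str]:
    """Return top-level package names imported by `path` that cannot be found."""
    tree = ast.parse(path.read_text(encoding="utf-8"))
    roots = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            roots.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            roots.add(node.module.split(".")[0])
    # For a top-level name, find_spec returns None when nothing on sys.path provides it.
    return sorted(name for name in roots if importlib.util.find_spec(name) is None)
```

The result depends on the environment it runs in. Run from the repository root with the same `sys.path` pytest would use, it would be expected to report `agents` for tests/test_conference_agent_stats.py only if no top-level `agents` package is importable there.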

Summary by CodeRabbit

  • Revert

    • Removed conference paper downloading and parsing capabilities for IEEE S&P, NDSS, USENIX Security, and ACM CCS conferences.
    • Removed keyword optimization utilities and smart batch download management features.
  • Tests

    • Added cleanup validation tests to enforce removal of obsolete modules and prevent stale references.

Copilot AI review requested due to automatic review settings March 15, 2026 05:19
vercel bot commented Mar 15, 2026

The latest updates on your projects.

Project | Deployment | Actions | Updated (UTC)
paper-bot | Ready | Preview, Comment | Mar 15, 2026 6:48am

coderabbitai bot commented Mar 15, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

This change removes multiple redundant and backup utility modules for paper downloading, conference parsing, and keyword optimization. A new test file enforces that removed utilities are not imported elsewhere and validates naming conventions for remaining utility modules.

Changes

Cohort / File(s) Summary
Paper Downloading Utilities
src/paperbot/utils/CCS-DOWN.py, src/paperbot/utils/downloader - ccs.py, src/paperbot/utils/downloader_back.py, src/paperbot/utils/smart_downloader.py
Removed multiple paper downloader implementations (~2,592 lines total) including session management, async download workflows with retry logic, content validation, and progress tracking for multiple conferences (IEEE SP, NDSS, USENIX, ACM CCS).
Conference Parsing Utilities
src/paperbot/utils/conference_parsers.py, src/paperbot/utils/conference_parsers_new.py, src/paperbot/utils/conference_helpers.py, src/paperbot/utils/conference_downloader.py
Removed redundant conference-specific web scraping and parsing modules (~923 lines total) that extracted paper metadata (titles, PDFs, DOIs) via BeautifulSoup, including duplicate implementations and helper utilities.
ACM & Keyword Utilities
src/paperbot/utils/acm_extractor.py, src/paperbot/utils/keyword_optimizer.py
Removed ACM paper extraction functionality (~288 lines) and the keyword optimization system, including query expansion, LLM-assisted rewriting, and security-focused query builders (~491 lines).
Utilities Module Export
src/paperbot/utils/__init__.py
Updated the docstring header from "Contains:" to "Exposes the general-purpose utilities still used on the main code path:" (both written in Chinese in the source) and removed references to deleted utility modules; no code logic changes.
Cleanup Test Enforcement
tests/unit/test_utils_cleanup_contracts.py
Added new test module to enforce removal of stale utilities, verify no imports of deleted modules, and validate naming conventions for remaining utility files.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Suggested reviewers

  • ThankUYou
  • wen-placeholder

Poem

🐰 Hop, hop, hooray! The cleanup crew,
Swept away the old, left only the new,
Backup files and duplicates gone,
The codebase is cleaner, more lean and strong! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The PR title accurately and concisely describes the main change: removing orphaned utility variants that no longer have active import chains in the codebase.
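The check output does not say how the percentage is computed. As a rough illustration only, one plausible reading of "docstring coverage" is the share of function and class definitions that carry a docstring. The sketch below computes that for a single source string with `ast.get_docstring`. Both the definition and the `docstring_coverage` name are assumptions, so its number need not match CodeRabbit's.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Percentage of function/class definitions in `source` that have a docstring."""
    tree = ast.parse(source)
    defs = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    if not defs:
        return 100.0  # assumption: a file with nothing to document counts as fully covered
    documented = sum(1 for node in defs if ast.get_docstring(node) is not None)
    return 100.0 * documented / len(defs)
```

Under this reading, a new test module whose test functions have no docstrings scores 0.0, which matches the kind of result reported above.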

github-actions bot commented Mar 15, 2026

Vercel Preview

gemini-code-assist bot commented

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the src/paperbot/utils module by removing a large number of unused, orphaned, and improperly named utility files. The primary goal is to enhance code clarity, reduce technical debt, and improve the overall maintainability of the project. By eliminating dead code and enforcing stricter naming conventions, the project becomes easier to navigate and understand for current and future developers. A new test ensures these cleanup efforts are sustained.

Highlights

  • Codebase Cleanup: Orphaned utility files and backup-style variants within the src/paperbot/utils directory have been deleted to streamline the codebase.
  • Documentation Update: The paperbot.utils.__init__ docstring was updated to accurately reflect the utilities that are still actively used and exposed.
  • New Contract Test: A new contract test was added to prevent the reintroduction of backup-style or invalid utility filenames in the future, ensuring code hygiene.
  • Removal of Unreferenced Helpers: Several conference and experiment helper files that were either unreferenced or only used by the removed variants have also been removed.

Changelog
  • src/paperbot/utils/CCS-DOWN.py
    • Removed an orphaned downloader utility file.
  • src/paperbot/utils/__init__.py
    • Updated the module's docstring to accurately list exposed utilities.
  • src/paperbot/utils/acm_extractor.py
    • Removed an unused ACM paper extraction utility.
  • src/paperbot/utils/conference_downloader.py
    • Removed an unused conference downloader base class.
  • src/paperbot/utils/conference_helpers.py
    • Removed an unused conference helper utility.
  • src/paperbot/utils/conference_parsers.py
    • Removed an unused conference parsing utility.
  • src/paperbot/utils/conference_parsers_new.py
    • Removed another unused conference parsing utility variant.
  • src/paperbot/utils/downloader - ccs.py
    • Removed a backup-style downloader utility file with an invalid filename.
  • src/paperbot/utils/downloader_back.py
    • Removed a backup version of the downloader utility.
  • src/paperbot/utils/downloader_ccs.py
    • Removed a specific CCS downloader utility variant.
  • src/paperbot/utils/experiment_metrics.py
    • Removed an unused experiment metrics calculation utility.
  • src/paperbot/utils/experiment_runner.py
    • Removed an unused experiment runner utility.
  • src/paperbot/utils/keyword_optimizer.py
    • Removed an unused keyword optimizer utility.
  • src/paperbot/utils/smart_downloader.py
    • Removed an unused smart download manager utility.
  • tests/unit/test_utils_cleanup_contracts.py
    • Added a new contract test to ensure orphaned and invalid utility files are absent.
Activity
  • The author, jerry609, created this pull request to refactor and clean up utility files.
  • The PR description includes a summary of changes, validation steps, and notes regarding stale tests.
  • No human comments or reviews have been recorded in the provided context.

Copilot AI left a comment

Pull request overview

This PR cleans up src/paperbot/utils by removing unused/orphaned utility modules and adding a small contract test to prevent backup-style or invalid utility filenames from creeping back into the repo.

Changes:

  • Deleted multiple orphaned/variant utility modules under src/paperbot/utils.
  • Updated paperbot.utils.__init__ docstring to describe the remaining public surface.
  • Added a unit “cleanup contract” test to assert removed files stay removed and to flag invalid utility filenames.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

File Description
tests/unit/test_utils_cleanup_contracts.py Adds contract tests to ensure removed utility variants stay deleted and to prevent invalid utility filenames.
src/paperbot/utils/__init__.py Updates module docstring describing the utils surface.
src/paperbot/utils/smart_downloader.py Removes an orphaned smart download manager utility.
src/paperbot/utils/keyword_optimizer.py Removes an unused keyword/query optimization utility.
src/paperbot/utils/experiment_runner.py Removes an unused experiment runner helper.
src/paperbot/utils/experiment_metrics.py Removes an unused lightweight metrics helper.
src/paperbot/utils/downloader_ccs.py Removes a CCS-specific downloader variant.
src/paperbot/utils/downloader_back.py Removes a backup-style downloader variant.
src/paperbot/utils/conference_parsers_new.py Removes an unused “new” conference parser variant.
src/paperbot/utils/conference_parsers.py Removes an unused conference parser module.
src/paperbot/utils/conference_helpers.py Removes an unused conference helper module.
src/paperbot/utils/conference_downloader.py Removes an unused conference downloader base class.
src/paperbot/utils/acm_extractor.py Removes an unused ACM extractor helper.
src/paperbot/utils/CCS-DOWN.py Removes an invalidly-named utility variant file.

Comment on lines +35 to +40
    name = path.name
    if " " in name or name != name.lower():
        invalid.append(name)
    if name.endswith("_back.py") or name.endswith("_new.py"):
        invalid.append(name)
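The excerpt above can append the same filename twice (for example, a name that both contains a space and ends in `_new.py`), and it does not reject lowercase hyphenated names such as `downloader-ccs.py`, which cannot be imported with a plain `import` statement either. A possible variant, offered only as a sketch and not as the code merged here, checks each name once and requires the stem to be a lowercase Python identifier. `is_valid_util_filename` and `find_invalid_util_files` are hypothetical helpers.

```python
from pathlib import Path

# Suffixes that mark backup-style copies (same rule as the excerpt above).
BACKUP_SUFFIXES = ("_back.py", "_new.py")


def is_valid_util_filename(name: str) -> bool:
    """True if `name` is a lowercase, importable module filename that is not a backup copy."""
    stem = Path(name).stem
    return (
        name.endswith(".py")
        and stem.isidentifier()  # rejects spaces and hyphens
        and stem == stem.lower()  # rejects upper-case names such as CCS-DOWN.py
        and not name.endswith(BACKUP_SUFFIXES)
    )


def find_invalid_util_files(utils_dir: Path) -> list[str]:
    """Each offending filename appears exactly once, in sorted order."""
    return sorted(p.name for p in utils_dir.glob("*.py") if not is_valid_util_filename(p.name))
```

Collecting names through a single predicate avoids duplicate entries, and `str.isidentifier` covers spaces and hyphens in one rule instead of listing forbidden characters one by one.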

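Comment on lines +5 to 10
    Exposes the general-purpose utilities still used on the main code path:
    - logger: logging configuration
    - downloader: paper downloader
    - retry_helper: retry mechanism
    - json_parser: JSON parsing
    - text_processing: text processing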
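Because this docstring now claims to list only utilities that are still on the main code path, it can drift again as modules are removed. One way to guard it, sketched here as an assumption rather than something this PR adds, is to parse the `- name: description` lines and confirm that each name is still a module or subpackage in the package directory. `listed_but_missing` is a hypothetical helper.

```python
import re
from pathlib import Path

# Matches docstring bullets such as "- retry_helper: retry mechanism".
BULLET = re.compile(r"^\s*-\s*([A-Za-z_][A-Za-z0-9_]*)\s*:", re.MULTILINE)


def listed_but_missing(docstring: str, package_dir: Path) -> list[str]:
    """Return module names listed in the docstring that have no .py file or subpackage."""
    missing = []
    for name in BULLET.findall(docstring):
        is_module = (package_dir / f"{name}.py").is_file()
        is_package = (package_dir / name / "__init__.py").is_file()
        if not (is_module or is_package):
            missing.append(name)
    return missing
```

In a test it could be called as `listed_but_missing(paperbot.utils.__doc__ or "", Path(paperbot.utils.__file__).parent)`, assuming the package imports cleanly in the test environment.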

gemini-code-assist bot left a comment

Code Review

This pull request is a substantial refactoring that removes a large number of orphaned and backup utility files, which is a great improvement for maintainability. The addition of a contract test to prevent similar issues in the future is also an excellent practice. However, I've identified a critical issue where the deletion of downloader variants has broken the main downloader.py by removing a method it depends on. Please see the specific comment for details.




async def _parse_ccs_papers(self, base_url: str, year: str) -> List[Dict[str, Any]]:

critical

The deletion of this method (and its variants in other removed files) breaks the functionality of src/paperbot/utils/downloader.py.

The get_conference_papers method in downloader.py still contains a call to self._parse_ccs_papers for the 'ccs' conference type. This will now raise an AttributeError at runtime.

To resolve this, please consider one of the following:

  • Merge the implementation of _parse_ccs_papers from one of the deleted files into the PaperDownloader class in downloader.py.
  • Remove the logic for handling the 'ccs' conference from get_conference_papers if it is no longer supported.

This is a critical issue as it breaks existing functionality.

@sonarqubecloud

jerry609 merged commit 5a02ab3 into dev on Mar 15, 2026
15 of 16 checks passed
jerry609 deleted the refactor/pr4-prune-orphaned-utils branch on March 15, 2026 06:49
Labels

None yet

Projects

None yet

2 participants