ENH: Optimize GitHub API usage for issues metadata backup #3

@mmcky

Summary

The issues metadata backup feature (added in v0.3.0) works, but its API usage raises performance and rate-limit concerns that should be addressed before it is enabled in production.

Current Implementation

  • Uses PyGithub REST API
  • 1 API call per page of issues (30 issues/page)
  • 1 API call per issue to fetch comments
  • For a repo with 100 issues → ~104 API calls (4 page requests + 100 comment requests)
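The call count above follows from simple arithmetic. A minimal sketch, assuming the default page size of 30 and that each issue's comments fit in a single request (the function name is hypothetical, not part of the tool):

```python
import math


def rest_calls_for_repo(issue_count: int, per_page: int = 30) -> int:
    """Estimate REST requests for one repo under the current approach."""
    # One request per page of issues; even an empty repo costs one request.
    page_calls = max(1, math.ceil(issue_count / per_page))
    # One extra request per issue to fetch its comments
    # (assumption: comments fit in a single page).
    comment_calls = issue_count
    return page_calls + comment_calls


print(rest_calls_for_repo(100))  # 4 page calls + 100 comment calls
print(rest_calls_for_repo(29))   # 1 page call + 29 comment calls
```

The second call matches the ~30 requests observed for QuantEcon.manual below.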

Test Results

Backing up QuantEcon.manual (29 issues):

  • Time: ~38 seconds (~1.3s per issue)
  • API calls: ~30
  • Output: 43 KB JSON file

Concerns for Full Org Backup

  • QuantEcon has ~100 active repos
  • At an average of 50 issues per repo, that is 5,000+ API calls
  • GitHub Actions GITHUB_TOKEN limit: 1,000 requests/hour per repository
  • A full-org run would exceed that limit several times over
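To make the scale concrete, here is a rough estimate of how many one-hour rate-limit windows a full run would consume. It is only a sketch: the repo and issue counts are the approximations from this issue, and the function name is hypothetical.

```python
import math


def hourly_windows_needed(
    repos: int,
    avg_issues: int,
    hourly_limit: int = 1000,
    per_page: int = 30,
) -> tuple[int, int]:
    """Return (total REST calls, number of one-hour windows required)."""
    # Per repo: page requests plus one comment request per issue.
    calls_per_repo = math.ceil(avg_issues / per_page) + avg_issues
    total_calls = repos * calls_per_repo
    return total_calls, math.ceil(total_calls / hourly_limit)


total, windows = hourly_windows_needed(repos=100, avg_issues=50)
print(total, windows)  # 5200 calls, 6 hourly windows
```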

Proposed Solutions

Option 1: GraphQL API (Recommended)

Use the GitHub GraphQL API to fetch issues together with their comments in one query, instead of one REST call per issue.

query {
  repository(owner: "QuantEcon", name: "quantecon-py") {
    issues(first: 100, states: [OPEN, CLOSED]) {
      pageInfo { hasNextPage endCursor }
      nodes {
        number
        title
        body
        comments(first: 100) {
          nodes { author { login } body createdAt }
        }
      }
    }
  }
}

Benefits:

  • One request per page of up to 100 issues, comments included (repos with more than 100 issues need cursor pagination)
  • Dramatically fewer API calls
  • Faster execution
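The query above returns at most 100 issues per request, so a full backup has to follow cursors. Below is a minimal sketch of that loop, not a finished implementation. The names post_graphql, iter_issues and the run_query callable are hypothetical. The transport is injected so the pagination logic can be exercised without network access. Issues with more than 100 comments would still be truncated and would need a nested pagination pass.

```python
import json
import urllib.request
from typing import Callable, Iterator

GRAPHQL_URL = "https://api.github.com/graphql"

# Same fields as the query above, with variables and a cursor added.
ISSUES_QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    issues(first: 100, after: $cursor, states: [OPEN, CLOSED]) {
      pageInfo { hasNextPage endCursor }
      nodes {
        number
        title
        body
        comments(first: 100) {
          nodes { author { login } body createdAt }
        }
      }
    }
  }
}
"""


def post_graphql(token: str, query: str, variables: dict) -> dict:
    """Send one GraphQL request and return the decoded JSON response."""
    payload = json.dumps({"query": query, "variables": variables}).encode()
    request = urllib.request.Request(
        GRAPHQL_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_issues(
    run_query: Callable[[str, dict], dict], owner: str, name: str
) -> Iterator[dict]:
    """Yield every issue node, following cursors until the last page."""
    cursor = None  # None requests the first page
    while True:
        variables = {"owner": owner, "name": name, "cursor": cursor}
        result = run_query(ISSUES_QUERY, variables)
        issues = result["data"]["repository"]["issues"]
        yield from issues["nodes"]
        if not issues["pageInfo"]["hasNextPage"]:
            break
        cursor = issues["pageInfo"]["endCursor"]
```

In a real run this might be called as iter_issues(lambda q, v: post_graphql(token, q, v), "QuantEcon", "quantecon-py"), where token is whatever credential the workflow provides.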

Option 2: Add include_comments config option

backup_metadata:
  issues: true
  include_comments: false  # Skip comments, much faster
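One way the backup code might honour this flag is sketched below. It assumes the config has already been loaded into a dict, that include_comments defaults to true to preserve current behaviour, and that fetch_comments is a hypothetical callable wrapping the per-issue comments request.

```python
from typing import Callable, Iterable


def backup_issue(
    issue: dict,
    config: dict,
    fetch_comments: Callable[[int], Iterable[dict]],
) -> dict:
    """Build the backup record for one issue, fetching comments only if enabled."""
    record = {
        "number": issue["number"],
        "title": issue["title"],
        "body": issue["body"],
    }
    metadata_config = config.get("backup_metadata", {})
    # Assumption: comments stay on unless the option is explicitly false.
    if metadata_config.get("include_comments", True):
        record["comments"] = list(fetch_comments(issue["number"]))
    return record
```

With include_comments: false the per-issue request is skipped entirely, which removes the dominant term from the call count.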

Option 3: Rate limit handling

Add retry logic with exponential backoff when rate limited.
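A minimal sketch of such a retry wrapper follows. RateLimited is a hypothetical stand-in for the client's rate-limit error (with PyGithub the corresponding exception would be RateLimitExceededException), and the retry count and base delay are illustrative defaults, not tuned values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the API client's rate-limit error."""


def call_with_backoff(
    func: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, doubling the wait after each rate-limited attempt."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            # Delays grow as base_delay * 1, 2, 4, 8, ...
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The sleep function is a parameter so the wrapper can be tested without real waiting. Backoff only spreads requests out; it does not reduce the total number of calls, so it complements rather than replaces Options 1 and 2.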

Current Status

  • Feature implemented and tested ✅
  • Disabled by default (issues: false) until optimized
  • Config includes warning comment about API usage

Related

  • Issues backup JSON schema is finalized and working
  • Markdown recovery utility planned for future
