
tal7aouy/reporeaper


🔥 RepoReaper

RepoReaper is a high-performance, Go-based security research tool designed for mass acquisition and triage of potentially vulnerable repositories across GitHub, GitLab, and Bitbucket. Built for speed and scale using goroutines and concurrent processing.

⚠️ Disclaimer

This tool is intended for security research and educational purposes only. Use responsibly and only on repositories you have permission to scan. The authors are not responsible for misuse.

🎯 Features

1. Multi-Platform API Crawler

  • GitHub: Leverages GitHub Code Search API with token rotation
  • GitLab: Supports both gitlab.com and self-hosted instances
  • Bitbucket: Repository search with authentication
  • Concurrent crawling across all platforms simultaneously
  • Rate limiting and token rotation to avoid API bans
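Concurrent crawling across platforms follows a fan-out/fan-in pattern: one goroutine per platform, with the results merged and deduplicated by clone URL. The sketch below is illustrative only. The Crawler interface, the Repo type and the crawlAll function are hypothetical and are not taken from pkg/crawler. Canned results stand in for real API calls.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Repo is a hypothetical, trimmed-down repository record.
type Repo struct {
	Platform string
	FullName string
	CloneURL string
}

// Crawler is a hypothetical interface each platform crawler would implement.
type Crawler interface {
	Name() string
	Search(keyword string) ([]Repo, error)
}

// staticCrawler returns canned results in place of a real API client.
type staticCrawler struct {
	name  string
	repos []Repo
}

func (s staticCrawler) Name() string                   { return s.name }
func (s staticCrawler) Search(string) ([]Repo, error) { return s.repos, nil }

// crawlAll runs every crawler in its own goroutine and merges the results,
// dropping repositories whose clone URL has already been seen.
func crawlAll(crawlers []Crawler, keyword string) []Repo {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		seen = make(map[string]bool)
		out  []Repo
	)
	for _, c := range crawlers {
		wg.Add(1)
		go func(c Crawler) {
			defer wg.Done()
			repos, err := c.Search(keyword)
			if err != nil {
				fmt.Printf("%s: %v\n", c.Name(), err)
				return
			}
			mu.Lock()
			defer mu.Unlock()
			for _, r := range repos {
				if !seen[r.CloneURL] {
					seen[r.CloneURL] = true
					out = append(out, r)
				}
			}
		}(c)
	}
	wg.Wait()
	// Goroutines finish in any order, so sort for deterministic output.
	sort.Slice(out, func(i, j int) bool { return out[i].CloneURL < out[j].CloneURL })
	return out
}

func main() {
	gh := staticCrawler{"github", []Repo{
		{"github", "acme/internal-config", "https://github.com/acme/internal-config.git"},
		{"github", "acme/internal-config", "https://github.com/acme/internal-config.git"},
	}}
	gl := staticCrawler{"gitlab", []Repo{
		{"gitlab", "example/deploy", "https://gitlab.com/example/deploy.git"},
	}}
	repos := crawlAll([]Crawler{gh, gl}, "config")
	fmt.Println(len(repos)) // 2
}
```

The mutex guards both the seen map and the output slice, so deduplication stays correct no matter which platform responds first.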

2. High-Speed Concurrent Cloner

  • Uses git clone --mirror for efficient repository acquisition
  • Spawns configurable worker pools using goroutines
  • Timeout protection (2 minutes per clone)
  • Automatic deduplication of repositories

3. Intelligent Triage Scanner

  • Filename detection: Scans for suspicious files like .env, credentials, id_rsa, secrets.yml
  • Pattern matching: High-speed regex detection for:
    • AWS Access Keys (AKIA[0-9A-Z]{16})
    • GitHub Tokens (ghp_, gho_)
    • GitLab Tokens (glpat-)
    • Slack Tokens
    • Google API Keys
    • Private SSH Keys
    • JWT Tokens
    • Database connection strings
    • And 15+ more patterns
  • Fast scanning: Only scans text files, skips binaries
  • Size limits: Skips files over 10MB for performance

4. JSON Output

  • Produces structured JSON output with all findings
  • Includes repository metadata (stars, last updated, URLs)
  • Detailed findings with file paths, line numbers, and matched patterns

🚀 Installation

Prerequisites

  • Go 1.21 or higher
  • Git installed and available in PATH
  • API tokens for the platforms you want to scan

Build from Source

# Clone the repository
git clone https://github.com/tal7aouy/reporeaper.git
cd reporeaper

# Download dependencies
go mod download

# Build the binary
go build -o reporeaper .

# Or install globally
go install

βš™οΈ Configuration

Create a config.json file based on the example:

cp config.example.json config.json

Edit config.json with your API credentials:

{
  "github": {
    "enabled": true,
    "tokens": [
      "ghp_YOUR_GITHUB_TOKEN_HERE"
    ]
  },
  "gitlab": {
    "enabled": false,
    "tokens": ["glpat-YOUR_TOKEN"],
    "base_urls": ["https://gitlab.com"]
  },
  "bitbucket": {
    "enabled": false,
    "username": "your_username",
    "password": "your_app_password"
  },
  "search_keywords": [
    "config", "deploy", "internal", "prod", "aws_keys"
  ],
  "clone_directory": "./repos",
  "max_repos_per_api": 100
}
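In Go, you can load this file into a struct whose json tags match the keys above. The Go field names are assumptions made for illustration. So is the validation rule that an enabled GitHub or GitLab section must list at least one token.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// Config mirrors config.json; only the JSON tags come from the example file.
type Config struct {
	GitHub struct {
		Enabled bool     `json:"enabled"`
		Tokens  []string `json:"tokens"`
	} `json:"github"`
	GitLab struct {
		Enabled  bool     `json:"enabled"`
		Tokens   []string `json:"tokens"`
		BaseURLs []string `json:"base_urls"`
	} `json:"gitlab"`
	Bitbucket struct {
		Enabled  bool   `json:"enabled"`
		Username string `json:"username"`
		Password string `json:"password"`
	} `json:"bitbucket"`
	SearchKeywords []string `json:"search_keywords"`
	CloneDirectory string   `json:"clone_directory"`
	MaxReposPerAPI int      `json:"max_repos_per_api"`
}

// parseConfig decodes JSON and rejects an enabled platform with no tokens.
func parseConfig(data []byte) (*Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	if c.GitHub.Enabled && len(c.GitHub.Tokens) == 0 {
		return nil, errors.New("github enabled but no tokens configured")
	}
	if c.GitLab.Enabled && len(c.GitLab.Tokens) == 0 {
		return nil, errors.New("gitlab enabled but no tokens configured")
	}
	return &c, nil
}

func main() {
	data, err := os.ReadFile("config.json")
	if err != nil {
		fmt.Println("read config:", err)
		return
	}
	cfg, err := parseConfig(data)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("keywords:", cfg.SearchKeywords)
}
```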

Getting API Tokens

GitHub:

  1. Go to Settings → Developer settings → Personal access tokens
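  2. Generate a new token. Reading public repositories needs at most the public_repo scope; the repo scope (which includes public_repo) also grants full access to private repositories, so add it only if you need them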
  3. Add multiple tokens for rotation to avoid rate limits

GitLab:

  1. Go to User Settings → Access Tokens
  2. Create token with read_api and read_repository scopes

Bitbucket:

  1. Go to Personal settings → App passwords
  2. Create app password with repository read permissions

💻 Usage

Basic Usage

./reporeaper

Advanced Options

./reporeaper -config config.json -output results.json -workers 20

Flags:

  • -config: Path to configuration file (default: config.json)
  • -output: Path to output JSON file (default: suspicious_repos.json)
  • -workers: Number of concurrent workers (default: 10)
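The three flags and their defaults map directly onto Go's standard flag package. The sketch uses its own FlagSet so that parsing can be tested with arbitrary arguments. The options type and the check that -workers is at least 1 are assumptions, not documented behavior.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the three documented flags.
type options struct {
	ConfigPath string
	OutputPath string
	Workers    int
}

// parseFlags parses args with the documented names and defaults.
func parseFlags(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("reporeaper", flag.ContinueOnError)
	fs.StringVar(&o.ConfigPath, "config", "config.json", "path to configuration file")
	fs.StringVar(&o.OutputPath, "output", "suspicious_repos.json", "path to output JSON file")
	fs.IntVar(&o.Workers, "workers", 10, "number of concurrent workers")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.Workers < 1 {
		return o, fmt.Errorf("-workers must be at least 1, got %d", o.Workers)
	}
	return o, nil
}

func main() {
	o, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("Workers: %d | Output: %s\n", o.Workers, o.OutputPath)
}
```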

Example Output

🔍 RepoReaper - Starting mass repository scan
Workers: 10 | Output: suspicious_repos.json
🌐 Crawling GitHub...
🌐 Crawling GitLab...
GitHub keyword 'aws_keys': 45 unique repos
GitLab keyword 'credentials': 23 unique repos
📊 Total repositories discovered: 68
🔄 [Worker 1] Processing: acme/internal-config
🚨 [Worker 1] SUSPICIOUS: acme/internal-config (Findings: 3)
✓ [Worker 2] Clean: example/public-repo
💾 Results saved to: suspicious_repos.json
✨ RepoReaper scan complete!

Output Format

[
  {
    "repository": {
      "platform": "github",
      "full_name": "acme/internal-config",
      "clone_url": "https://github.com/acme/internal-config.git",
      "html_url": "https://github.com/acme/internal-config",
      "stars": 12,
      "updated_at": "2024-03-20T10:30:00Z"
    },
    "suspicious": true,
    "findings": [
      {
        "type": "secret_pattern",
        "file_path": "config/aws.yml",
        "line_number": 15,
        "match": "AKIAIOSFODNN7EXAMPLE",
        "description": "AWS Access Key detected"
      },
      {
        "type": "suspicious_filename",
        "file_path": ".env",
        "description": "Suspicious filename detected: .env"
      }
    ]
  }
]

🏗️ Architecture

RepoReaper/
├── main.go                 # Entry point, orchestration
├── pkg/
│   ├── config/            # Configuration management
│   │   └── config.go
│   ├── crawler/           # API crawlers
│   │   ├── types.go
│   │   ├── github.go      # GitHub API integration
│   │   ├── gitlab.go      # GitLab API integration
│   │   └── bitbucket.go   # Bitbucket API integration
│   ├── cloner/            # Repository cloning
│   │   └── cloner.go      # Concurrent git operations
│   └── scanner/           # Triage scanning
│       └── scanner.go     # Pattern matching & detection

🔧 Performance Tuning

Optimize Workers

  • Low bandwidth: Use 5-10 workers
  • High bandwidth: Use 20-50 workers
  • Rate limiting: Reduce workers or add more API tokens

Memory and Disk Usage

  • Repositories are cloned with --mirror --depth 1 to minimize disk usage
  • Large files (>10MB) are automatically skipped
  • Scanning stops at 10,000 lines per file

API Rate Limits

  • GitHub: 30 requests/minute per token for most search endpoints, but only 10 requests/minute for the Code Search endpoint (use multiple tokens)
  • GitLab: 10 requests/second per token
  • Bitbucket: Varies by account type

πŸ›‘οΈ Security Best Practices

  1. Never commit config.json with real tokens
  2. Use read-only tokens with minimal permissions
  3. Rotate tokens regularly
  4. Review findings manually before taking action
  5. Respect rate limits and terms of service

📊 Detection Patterns

RepoReaper scans for 18+ secret patterns including:

  • AWS credentials (access keys, secret keys)
  • Cloud provider tokens (GCP, Azure)
  • Version control tokens (GitHub, GitLab, Bitbucket)
  • Communication platform tokens (Slack, Discord)
  • Payment processor keys (Stripe)
  • Database connection strings
  • Private SSH/SSL keys
  • JWT tokens
  • Generic API keys and passwords

🀝 Contributing

Contributions are welcome! Areas for improvement:

  • Additional API platform support
  • More secret detection patterns
  • Performance optimizations
  • Better error handling

📝 License

MIT License - See LICENSE file for details


Built with ❤️ for security researchers by tal7aouy
