RepoReaper is a high-performance, Go-based security research tool designed for mass acquisition and triage of potentially vulnerable repositories across GitHub, GitLab, and Bitbucket. It is built for speed and scale using goroutines and concurrent processing.
This tool is intended for security research and educational purposes only. Use responsibly and only on repositories you have permission to scan. The authors are not responsible for misuse.
- GitHub: Leverages GitHub Code Search API with token rotation
- GitLab: Supports both gitlab.com and self-hosted instances
- Bitbucket: Repository search with authentication
- Concurrent crawling across all platforms simultaneously
- Rate limiting and token rotation to avoid API bans
- Uses git clone --mirror for efficient repository acquisition
- Spawns configurable worker pools using goroutines
- Timeout protection (2 minutes per clone)
- Automatic deduplication of repositories
- Filename detection: Scans for suspicious files like .env, credentials, id_rsa, secrets.yml
- Pattern matching: High-speed regex detection for:
  - AWS Access Keys (AKIA[0-9A-Z]{16})
  - GitHub Tokens (ghp_, gho_)
  - GitLab Tokens (glpat-)
  - Slack Tokens
  - Google API Keys
  - Private SSH Keys
  - JWT Tokens
  - Database connection strings
  - And 15+ more patterns
- Fast scanning: Only scans text files, skips binaries
- Size limits: Skips files over 10MB for performance
- Produces structured JSON output with all findings
- Includes repository metadata (stars, last updated, URLs)
- Detailed findings with file paths, line numbers, and matched patterns
- Go 1.21 or higher
- Git installed and available in PATH
- API tokens for the platforms you want to scan
# Clone the repository
git clone https://github.com/tal7aouy/reporeaper.git
cd reporeaper
# Download dependencies
go mod download
# Build the binary
go build -o reporeaper .
# Or install globally
go install
Create a config.json file based on the example:
cp config.example.json config.json
Edit config.json with your API credentials:
{
"github": {
"enabled": true,
"tokens": [
"ghp_YOUR_GITHUB_TOKEN_HERE"
]
},
"gitlab": {
"enabled": false,
"tokens": ["glpat-YOUR_TOKEN"],
"base_urls": ["https://gitlab.com"]
},
"bitbucket": {
"enabled": false,
"username": "your_username",
"password": "your_app_password"
},
"search_keywords": [
"config", "deploy", "internal", "prod", "aws_keys"
],
"clone_directory": "./repos",
"max_repos_per_api": 100
}
GitHub:
- Go to Settings → Developer settings → Personal access tokens
- Generate a new token with the repo and public_repo scopes
- Add multiple tokens for rotation to avoid rate limits
GitLab:
- Go to User Settings → Access Tokens
- Create a token with the read_api and read_repository scopes
Bitbucket:
- Go to Personal settings → App passwords
- Create app password with repository read permissions
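As a rough illustration, the configuration file described above could be decoded in Go along the following lines. The JSON keys come from the example config; the struct names, the ParseConfig helper, and the "at least one platform enabled" check are assumptions made for this sketch, not the project's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// PlatformConfig covers the fields shared by the "github" and "gitlab" sections.
// Type names here are hypothetical; only the JSON keys are taken from the example.
type PlatformConfig struct {
	Enabled  bool     `json:"enabled"`
	Tokens   []string `json:"tokens"`
	BaseURLs []string `json:"base_urls,omitempty"`
}

// BitbucketConfig mirrors the "bitbucket" section, which uses an app password.
type BitbucketConfig struct {
	Enabled  bool   `json:"enabled"`
	Username string `json:"username"`
	Password string `json:"password"`
}

// Config mirrors the top-level keys of config.json.
type Config struct {
	GitHub         PlatformConfig  `json:"github"`
	GitLab         PlatformConfig  `json:"gitlab"`
	Bitbucket      BitbucketConfig `json:"bitbucket"`
	SearchKeywords []string        `json:"search_keywords"`
	CloneDirectory string          `json:"clone_directory"`
	MaxReposPerAPI int             `json:"max_repos_per_api"`
}

// ParseConfig decodes raw JSON. Rejecting a config with no enabled platform
// is an assumption of this sketch.
func ParseConfig(data []byte) (*Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parse config: %w", err)
	}
	if !cfg.GitHub.Enabled && !cfg.GitLab.Enabled && !cfg.Bitbucket.Enabled {
		return nil, fmt.Errorf("no platform enabled")
	}
	return &cfg, nil
}

func main() {
	data, err := os.ReadFile("config.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	cfg, err := ParseConfig(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("keywords: %v, clone dir: %s\n", cfg.SearchKeywords, cfg.CloneDirectory)
}
```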
./reporeaper
./reporeaper -config config.json -output results.json -workers 20
Flags:
- -config: Path to configuration file (default: config.json)
- -output: Path to output JSON file (default: suspicious_repos.json)
- -workers: Number of concurrent workers (default: 10)
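For readers curious how flags like these are typically wired up, here is a minimal sketch using Go's standard flag package. The flag names and defaults are the ones listed above; the Options struct, the ParseOptions helper, and the lower bound on workers are assumptions for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Options holds the three documented command-line settings.
// The type and field names are hypothetical.
type Options struct {
	ConfigPath string
	OutputPath string
	Workers    int
}

// ParseOptions parses args (without the program name) using the documented defaults.
func ParseOptions(args []string) (Options, error) {
	var opts Options
	fs := flag.NewFlagSet("reporeaper", flag.ContinueOnError)
	fs.StringVar(&opts.ConfigPath, "config", "config.json", "path to configuration file")
	fs.StringVar(&opts.OutputPath, "output", "suspicious_repos.json", "path to output JSON file")
	fs.IntVar(&opts.Workers, "workers", 10, "number of concurrent workers")
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	// Assumption: a worker count below 1 is treated as a usage error.
	if opts.Workers < 1 {
		return opts, fmt.Errorf("workers must be at least 1, got %d", opts.Workers)
	}
	return opts, nil
}

func main() {
	opts, err := ParseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("Config: %s | Workers: %d | Output: %s\n", opts.ConfigPath, opts.Workers, opts.OutputPath)
}
```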
RepoReaper - Starting mass repository scan
Workers: 10 | Output: suspicious_repos.json
Crawling GitHub...
Crawling GitLab...
GitHub keyword 'aws_keys': 45 unique repos
GitLab keyword 'credentials': 23 unique repos
Total repositories discovered: 68
[Worker 1] Processing: acme/internal-config
[Worker 1] SUSPICIOUS: acme/internal-config (Findings: 3)
[Worker 2] Clean: example/public-repo
Results saved to: suspicious_repos.json
RepoReaper scan complete!
[
{
"repository": {
"platform": "github",
"full_name": "acme/internal-config",
"clone_url": "https://github.com/acme/internal-config.git",
"html_url": "https://github.com/acme/internal-config",
"stars": 12,
"updated_at": "2024-03-20T10:30:00Z"
},
"suspicious": true,
"findings": [
{
"type": "secret_pattern",
"file_path": "config/aws.yml",
"line_number": 15,
"match": "AKIAIOSFODNN7EXAMPLE",
"description": "AWS Access Key detected"
},
{
"type": "suspicious_filename",
"file_path": ".env",
"description": "Suspicious filename detected: .env"
}
]
}
]
RepoReaper/
├── main.go                 # Entry point, orchestration
├── pkg/
│   ├── config/             # Configuration management
│   │   └── config.go
│   ├── crawler/            # API crawlers
│   │   ├── types.go
│   │   ├── github.go       # GitHub API integration
│   │   ├── gitlab.go       # GitLab API integration
│   │   └── bitbucket.go    # Bitbucket API integration
│   ├── cloner/             # Repository cloning
│   │   └── cloner.go       # Concurrent git operations
│   └── scanner/            # Triage scanning
│       └── scanner.go      # Pattern matching & detection
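The concurrent cloning described for the cloner package (a goroutine worker pool, git clone --mirror, a 2-minute timeout per clone) might look roughly like the sketch below. RunPool and CloneMirror are hypothetical names, and this is an illustration under those assumptions rather than the contents of cloner.go.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"sync"
	"time"
)

// RunPool processes jobs with a fixed number of goroutines and returns one
// error slot per job, in the same order as the input.
func RunPool(workers int, jobs []string, process func(string) error) []error {
	if workers < 1 {
		workers = 1
	}
	errs := make([]error, len(jobs))
	idx := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is received by exactly one worker, so writes to errs never overlap.
			for i := range idx {
				errs[i] = process(jobs[i])
			}
		}()
	}
	for i := range jobs {
		idx <- i
	}
	close(idx)
	wg.Wait()
	return errs
}

// CloneMirror runs `git clone --mirror` and kills the process if the timeout expires.
func CloneMirror(url, dest string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return exec.CommandContext(ctx, "git", "clone", "--mirror", url, dest).Run()
}

func main() {
	urls := []string{"https://github.com/tal7aouy/reporeaper.git"}
	errs := RunPool(10, urls, func(u string) error {
		return CloneMirror(u, "./repos/reporeaper.git", 2*time.Minute)
	})
	for i, err := range errs {
		fmt.Println(urls[i], "error:", err)
	}
}
```

Feeding job indexes through an unbuffered channel keeps the number of in-flight clones equal to the worker count, which is the knob the -workers flag exposes.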
- Low bandwidth: Use 5-10 workers
- High bandwidth: Use 20-50 workers
- Rate limiting: Reduce workers or add more API tokens
- Repositories are cloned with --mirror --depth 1 to minimize disk usage
- Large files (>10MB) are automatically skipped
- Scanning stops at 10,000 lines per file
- GitHub: 30 requests/minute per token (use multiple tokens)
- GitLab: 10 requests/second per token
- Bitbucket: Varies by account type
- Never commit config.json with real tokens
- Use read-only tokens with minimal permissions
- Rotate tokens regularly
- Review findings manually before taking action
- Respect rate limits and terms of service
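One way to combine the advice above on multiple tokens and per-token rate limits is a round-robin rotator that also spaces out reuse of each token. The sketch below is hypothetical: TokenRotator and its methods are illustrative names, and the per-minute budget is passed in by the caller rather than taken from any platform API.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// TokenRotator hands out tokens in round-robin order and tracks when each
// token may next be used without exceeding its per-minute budget.
type TokenRotator struct {
	mu       sync.Mutex
	tokens   []string
	ready    []time.Time // earliest next use for each token
	next     int
	interval time.Duration
}

// NewTokenRotator spaces uses of a single token by time.Minute/perMinute.
func NewTokenRotator(tokens []string, perMinute int) (*TokenRotator, error) {
	if len(tokens) == 0 || perMinute < 1 {
		return nil, fmt.Errorf("need at least one token and a positive rate")
	}
	return &TokenRotator{
		tokens:   tokens,
		ready:    make([]time.Time, len(tokens)),
		interval: time.Minute / time.Duration(perMinute),
	}, nil
}

// Next returns the next token and how long the caller should sleep before using it.
func (r *TokenRotator) Next() (string, time.Duration) {
	r.mu.Lock()
	defer r.mu.Unlock()
	now := time.Now()
	i := r.next
	r.next = (r.next + 1) % len(r.tokens)
	var wait time.Duration
	if r.ready[i].After(now) {
		wait = r.ready[i].Sub(now)
	}
	// Reserve the slot: this token is busy until the wait plus one interval has passed.
	r.ready[i] = now.Add(wait + r.interval)
	return r.tokens[i], wait
}

func main() {
	rot, err := NewTokenRotator([]string{"ghp_TOKEN_A", "ghp_TOKEN_B"}, 30)
	if err != nil {
		panic(err)
	}
	for n := 0; n < 4; n++ {
		tok, wait := rot.Next()
		time.Sleep(wait)
		fmt.Println("request", n, "uses", tok)
	}
}
```

With two tokens at 30 requests per minute each, the first two calls return immediately and later calls wait only as long as needed to keep each token at one request per two seconds.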
RepoReaper scans for 18+ secret patterns including:
- AWS credentials (access keys, secret keys)
- Cloud provider tokens (GCP, Azure)
- Version control tokens (GitHub, GitLab, Bitbucket)
- Communication platform tokens (Slack, Discord)
- Payment processor keys (Stripe)
- Database connection strings
- Private SSH/SSL keys
- JWT tokens
- Generic API keys and passwords
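To make the pattern-matching step concrete, the following sketch scans text line by line with a few compiled regular expressions and records line numbers, stopping at the documented 10,000-line cap. Only the AKIA[0-9A-Z]{16} expression and the ghp_, gho_, and glpat- prefixes appear in this README; the suffix lengths for the GitHub and GitLab patterns, and the names Finding, Scan, and ScanString, are assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"regexp"
	"strings"
)

// Finding records one pattern match. Field names are hypothetical.
type Finding struct {
	LineNumber  int
	Match       string
	Description string
}

// patterns is a small illustrative subset. The token suffix lengths for
// GitHub and GitLab are assumptions, not taken from the project.
var patterns = []struct {
	desc string
	re   *regexp.Regexp
}{
	{"AWS Access Key detected", regexp.MustCompile(`AKIA[0-9A-Z]{16}`)},
	{"GitHub token detected", regexp.MustCompile(`gh[po]_[A-Za-z0-9]{36}`)},
	{"GitLab token detected", regexp.MustCompile(`glpat-[A-Za-z0-9_-]{20}`)},
}

// Scan reads at most maxLines lines from r and reports the first match of
// each pattern on each line.
func Scan(r io.Reader, maxLines int) ([]Finding, error) {
	var findings []Finding
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // tolerate lines up to 1 MiB
	for line := 1; line <= maxLines && sc.Scan(); line++ {
		text := sc.Text()
		for _, p := range patterns {
			if m := p.re.FindString(text); m != "" {
				findings = append(findings, Finding{LineNumber: line, Match: m, Description: p.desc})
			}
		}
	}
	return findings, sc.Err()
}

// ScanString is a convenience wrapper that applies the 10,000-line cap.
func ScanString(s string) []Finding {
	findings, _ := Scan(strings.NewReader(s), 10000)
	return findings
}

func main() {
	for _, f := range ScanString("region: us-east-1\naws_key: AKIAIOSFODNN7EXAMPLE\n") {
		fmt.Printf("line %d: %s (%s)\n", f.LineNumber, f.Description, f.Match)
	}
}
```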
Contributions are welcome! Areas for improvement:
- Additional API platform support
- More secret detection patterns
- Performance optimizations
- Better error handling
MIT License - See LICENSE file for details
Built with ❤️ for security researchers by tal7aouy