CodeGuard AI Testing Guide

This guide will help you test all the functionality of CodeGuard AI using the sample vulnerable code provided.

📋 What's Been Created

Sample Vulnerable Code (`sample_vulnerable_code/`)

Four Python files with intentionally vulnerable code:

File	Vulnerabilities	Description
`sql_injection.py`	4 patterns	String concat, f-strings, .format(), % operator
`xss_vulnerabilities.py`	6 patterns	Reflected, stored, DOM-based XSS
`command_injection.py`	8 patterns	os.system(), subprocess, eval(), exec()
`path_traversal.py`	9 patterns	File access, uploads, archives

Total: 27+ vulnerabilities to detect

Testing Tools

test_sample_code.py - Quick local vulnerability scanner test
create_test_repo.sh - Automated GitHub test repository setup
sample_vulnerable_code/README.md - Detailed documentation

🧪 Testing Methods

Method 1: Quick Local Test (Fastest - 30 seconds)

Test just the vulnerability detection without E2B or GitHub:

# Run the local test script
python test_sample_code.py

What it does:

Scans all sample files
Detects vulnerabilities using regex patterns
Shows detailed results for each file
Compares against expected counts
✅ No API keys required
✅ No GitHub needed
❌ Doesn't test exploit generation or MCP integration

Expected output:

📄 Scanning: sql_injection.py
✅ Found 4 vulnerabilities:
  🔴 SQL_INJECTION
     Line 16: query = "SELECT * FROM users WHERE username='" + username...
  ...

SUMMARY
Files scanned: 4
Total vulnerabilities found: 27+
🎉 All expected vulnerabilities detected!

Method 2: Manual GitHub PR Test (Moderate - 5 minutes)

Test the full pipeline with a real GitHub PR:

Step 1: Create a test repository

# Using GitHub CLI
gh repo create codeguard-test --public
cd /tmp
git clone https://github.com/<your-username>/codeguard-test
cd codeguard-test

Step 2: Add vulnerable code

# Copy sample files to your repo
mkdir api
cp ~/path/to/CodeGuardAI/sample_vulnerable_code/sql_injection.py ./api/database.py
cp ~/path/to/CodeGuardAI/sample_vulnerable_code/xss_vulnerabilities.py ./api/search.py

# Create a PR
git checkout -b add-api-endpoints
git add api/
git commit -m "Add API endpoints"
git push origin add-api-endpoints
gh pr create --title "Add API endpoints" --body "Testing CodeGuard AI"

Step 3: Run CodeGuard AI

# Get your PR number (usually #1)
gh pr list

# Run via CLI
cd ~/path/to/CodeGuardAI
python orchestrator.py <your-username> codeguard-test 1 <github-token>

# OR use the dashboard
streamlit run dashboard.py
# Navigate to "New Analysis" tab
# Enter: your-username / codeguard-test / 1
# Click "Launch Analysis"

What it tests:

✅ E2B sandbox creation
✅ Vulnerability detection
✅ Exploit generation
✅ Exploit execution (safe in sandbox)
✅ GitHub MCP integration
✅ PR comment posting
✅ Full end-to-end pipeline

Method 3: Automated GitHub Setup (Easiest - 2 minutes)

Use the automated script to set everything up:

# Run the setup script
./create_test_repo.sh

# Follow the prompts:
# - Enter repository name (or use default: codeguard-test)
# - Confirm creation
# - Script creates repo, commits files, and creates PR automatically

# Then run CodeGuard AI against the created PR
# (Script will give you the exact command to run)

What it does:

✅ Creates GitHub repository
✅ Adds all 4 vulnerable files
✅ Creates a pull request
✅ Gives you ready-to-run commands
✅ Fastest way to get a full test environment

📊 Expected Results

Local Test Results

When running test_sample_code.py, you should see:

Expected vs Actual:
  ✅ sql_injection.py: 4/4
  ✅ xss_vulnerabilities.py: 6/6
  ✅ command_injection.py: 8/8
  ✅ path_traversal.py: 9/9

Full Pipeline Results

When running the orchestrator or dashboard, you should see:

Console Output:

[ORCHESTRATOR] 🚀 Creating E2B sandbox...
[ORCHESTRATOR] ✓ Sandbox created
[ORCHESTRATOR] Setting up sandbox environment...
[ORCHESTRATOR] ✓ pip install httpx completed
[ORCHESTRATOR] 🔍 Starting security analysis inside sandbox...
[ORCHESTRATOR] Scanning api/database.py...
[ORCHESTRATOR] Found SQL injection vulnerability at line 16
[ORCHESTRATOR] Generating exploit...
[ORCHESTRATOR] Testing exploit...
[ORCHESTRATOR] ✅ Exploit successful!
...
[ORCHESTRATOR] ✅ Analysis complete!

GitHub PR Comment:

The PR should receive a comment like:

## 🛡️ CodeGuard AI Security Report

**Analysis Date:** 2024-01-15 10:30:00 UTC

### Summary
- **Files Scanned:** 4
- **Vulnerabilities Found:** 27
- **Severity:** 🔴 Critical

---

### 🔴 SQL Injection (4 instances)
**File:** `api/database.py`
**Severity:** High

**Line 16:**
```python
query = "SELECT * FROM users WHERE username='" + username + "' AND password='" + password + "'"

Exploit Proof: Payload: ' OR '1'='1' -- Result: Authentication bypass confirmed

Fix Suggestion:

# Use parameterized queries
cursor.execute("SELECT * FROM users WHERE username=? AND password=?", (username, password))

...


### Dashboard View

The Streamlit dashboard should show:

- **New Analysis Tab:**
  - Input form (filled)
  - "Launch Analysis" button
  - Success message with session ID

- **Live Monitor Tab:**
  - Real-time log streaming
  - Progress indicator
  - Vulnerability count
  - Timeline of events

- **History Tab:**
  - Analysis entry with results
  - Clickable to view detailed report
  - Statistics and charts

---

## 🐛 Troubleshooting

### Local Test Issues

**Error: "Could not import VulnerabilityScanner"**
```bash
# Make sure sandbox_agent/agent.py exists
ls sandbox_agent/agent.py

# Check Python path
python -c "import sys; print('\n'.join(sys.path))"

No vulnerabilities detected:

# Check if files exist
ls sample_vulnerable_code/*.py

# Verify file permissions
chmod 644 sample_vulnerable_code/*.py

# Check scanner patterns in sandbox_agent/agent.py

GitHub PR Test Issues

"E2B API key not found"

# Create config.json
cat > config.json << EOF
{
  "e2b_api_key": "your_e2b_api_key",
  "github_token": "your_github_token"
}
EOF

"GitHub MCP server not accessible"

# Start the MCP server
docker compose up -d

# Check if running
docker ps | grep github-mcp

# Check logs
docker compose logs github-mcp

"Sandbox creation failed"

# Verify E2B API key
python -c "import os; from e2b_code_interpreter import Sandbox; print('OK')"

# Check E2B dashboard for quota
# https://e2b.dev/dashboard

📝 Test Checklist

Use this checklist to ensure complete testing:

Basic Functionality

Local test script runs without errors
All 27+ vulnerabilities detected in local test
Sample files are readable and valid Python

Detection Capabilities

SQL injection detected (4 patterns)
XSS detected (6 patterns)
Command injection detected (8 patterns)
Path traversal detected (9 patterns)
Secure code examples NOT flagged

E2B Sandbox

Sandbox creates successfully
Dependencies install correctly
Agent code deploys to sandbox
Agent executes inside sandbox
Sandbox cleans up after analysis

MCP Integration

GitHub MCP server starts
Agent connects to MCP server
PR files fetched via MCP
Comments posted via MCP

Exploit Testing

Exploits generated for vulnerabilities
Exploits execute safely in sandbox
Exploit results captured
No false positives on secure code

Reporting

Security report generated
Report posted to GitHub PR
Report includes line numbers
Report includes fix suggestions with code snippets
Report includes remediation advice
Report properly formatted (markdown)

Dashboard

Dashboard loads successfully
New Analysis form works
Live monitoring shows real-time logs
History tab shows past analyses
All tabs render correctly

🎯 Performance Benchmarks

Expected timing for full pipeline test:

Phase	Expected Time
Sandbox creation	5-10 seconds
Environment setup	10-15 seconds
Agent deployment	2-5 seconds
Vulnerability scanning	5-10 seconds
Exploit generation	5-10 seconds
Exploit execution	10-20 seconds
Report generation	2-5 seconds
GitHub posting	2-5 seconds
Total	40-80 seconds

For 4 files with 27 vulnerabilities:

Expected: ~60 seconds
If slower: Check network, E2B quota, or MCP server

🚀 Next Steps

After successful testing:

Extend Detection:
- Add new vulnerability patterns to sandbox_agent/agent.py
- Test with different programming languages
- Add more complex exploit generation
Improve Accuracy:
- Reduce false positives with context analysis
- Add semantic analysis beyond regex
- Implement machine learning detection
Scale Up:
- Test with larger repositories
- Analyze multiple PRs concurrently
- Implement caching for faster re-scans
Production Use:
- Set up webhook for automatic PR monitoring
- Deploy dashboard to Streamlit Cloud
- Configure GitHub Action for CI/CD integration

📚 Additional Resources

README.md - Project overview and quick start
ARCHITECTURE.md - Detailed architecture documentation
SETUP_GUIDE.md - Complete setup instructions
sample_vulnerable_code/README.md - Vulnerability details and exploit examples

Happy Testing! 🛡️

If you find any issues or have questions, please check the troubleshooting section or refer to the documentation files.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CodeGuard AI Testing Guide

📋 What's Been Created

Sample Vulnerable Code (`sample_vulnerable_code/`)

Testing Tools

🧪 Testing Methods

Method 1: Quick Local Test (Fastest - 30 seconds)

Method 2: Manual GitHub PR Test (Moderate - 5 minutes)

Step 1: Create a test repository

Step 2: Add vulnerable code

Step 3: Run CodeGuard AI

Method 3: Automated GitHub Setup (Easiest - 2 minutes)

📊 Expected Results

Local Test Results

Full Pipeline Results

GitHub PR Test Issues

📝 Test Checklist

Basic Functionality

Detection Capabilities

E2B Sandbox

MCP Integration

Exploit Testing

Reporting

Dashboard

🎯 Performance Benchmarks

🚀 Next Steps

📚 Additional Resources

FilesExpand file tree

TESTING_GUIDE.md

Latest commit

History

TESTING_GUIDE.md

File metadata and controls

CodeGuard AI Testing Guide

📋 What's Been Created

Sample Vulnerable Code (sample_vulnerable_code/)

Testing Tools

🧪 Testing Methods

Method 1: Quick Local Test (Fastest - 30 seconds)

Method 2: Manual GitHub PR Test (Moderate - 5 minutes)

Step 1: Create a test repository

Step 2: Add vulnerable code

Step 3: Run CodeGuard AI

Method 3: Automated GitHub Setup (Easiest - 2 minutes)

📊 Expected Results

Local Test Results

Full Pipeline Results

GitHub PR Test Issues

📝 Test Checklist

Basic Functionality

Detection Capabilities

E2B Sandbox

MCP Integration

Exploit Testing

Reporting

Dashboard

🎯 Performance Benchmarks

🚀 Next Steps

📚 Additional Resources

Sample Vulnerable Code (`sample_vulnerable_code/`)