117 changes: 117 additions & 0 deletions tests/benchmark/SCORING_GUIDE.md
@@ -0,0 +1,117 @@
# Benchmark Scoring Guide

This document tracks the seeded bugs and InspectAI's detection performance.
Run the InspectAI commands on the PR and fill in the results below.

## Seeded Bugs Summary

| File | Total Bugs | Security | Logic | Resource | Concurrency | Error Handling |
|------|------------|----------|-------|----------|-------------|----------------|
| seeded_bugs_python.py | 15 | 5 | 7 | 1 | 1 | 1 |
| seeded_bugs_api.py | 12 | 7 | 4 | 0 | 1 | 0 |
| **TOTAL** | **27** | **12** | **11** | **1** | **2** | **1** |

---

## Detailed Bug List (Ground Truth)

### File 1: seeded_bugs_python.py

| # | Bug Type | Severity | Category | Line | Description |
|---|----------|----------|----------|------|-------------|
| 1 | SQL Injection | HIGH | Security | 31 | User input directly in SQL query |
| 2 | Hardcoded Secret | CRITICAL | Security | 41-42 | API key and password in source |
| 3 | Off-by-One | MEDIUM | Logic | 56 | range(len+1) causes IndexError |
| 4 | Missing Null Check | MEDIUM | Logic | 66 | No None check before access |
| 5 | Resource Leak | MEDIUM | Resource | 76 | File never closed |
| 6 | Wrong Operator | HIGH | Logic | 86 | Using 'is' instead of '==' |
| 7 | Wrong Formula | LOW | Logic | 99 | Dividing by (n-1) not n |
| 8 | Race Condition | HIGH | Concurrency | 107 | Non-atomic counter increment |
| 9 | XSS | HIGH | Security | 118 | Unescaped user input in HTML |
| 10 | Weak Crypto | CRITICAL | Security | 131 | MD5 for password hashing |
| 11 | Missing Return | MEDIUM | Logic | 141 | No return True for valid case |
| 12 | Mutable Default | MEDIUM | Logic | 152 | Default arg `[]` persists |
| 13 | Unhandled Exception | MEDIUM | Error | 160 | No try-except for json.loads |
| 14 | Path Traversal | HIGH | Security | 168 | User can access any file |
| 15 | Infinite Loop | HIGH | Logic | 181 | Binary search doesn't converge |

### File 2: seeded_bugs_api.py

| # | Bug Type | Severity | Category | Line | Description |
|---|----------|----------|----------|------|-------------|
| 16 | Command Injection | CRITICAL | Security | 29 | shell=True with user input |
| 17 | Insecure Deserialize | CRITICAL | Security | 41 | pickle.loads on untrusted data |
| 18 | Weak Session | HIGH | Security | 51 | Predictable session tokens |
| 19 | Missing AuthZ | HIGH | Security | 62 | No permission check for delete |
| 20 | ReDoS | MEDIUM | Security | 74 | Evil regex pattern |
| 21 | Float Comparison | MEDIUM | Logic | 88 | Using == for floats |
| 22 | TOCTOU | HIGH | Concurrency | 96 | Check-then-use race condition |
| 23 | Precision Loss | MEDIUM | Logic | 105 | int() truncates cents |
| 24 | Weak Regex | LOW | Logic | 116 | Email regex too permissive |
| 25 | Type Confusion | MEDIUM | Logic | 123 | No type validation before divide |
| 26 | Timing Attack | MEDIUM | Security | 126 | String compare short-circuits |
| 27 | Info Disclosure | LOW | Security | 137 | Error reveals username exists |

---

## Testing Procedure

1. Open PR from `test-benchmark` branch to `main`
2. Run these commands and record findings:

### Command 1: `/inspectai_review`
- [ ] Run command
- Findings count: ___
- True Positives: ___
- False Positives: ___

### Command 2: `/inspectai_bugs`
- [ ] Run command
- Findings count: ___
- True Positives: ___
- False Positives: ___

### Command 3: `/inspectai_security`
- [ ] Run command
- Findings count: ___
- True Positives: ___
- False Positives: ___

---

## Scoring Template

After running commands, fill in this table:

| Command | Bugs Found | True Positives | False Positives | Recall | Precision |
|---------|------------|----------------|-----------------|--------|-----------|
| /inspectai_review | | | | | |
| /inspectai_bugs | | | | | |
| /inspectai_security | | | | | |

### Formulas:
- **Recall** = True Positives / Total Seeded Bugs (27)
- **Precision** = True Positives / (True Positives + False Positives)
- **F1 Score** = 2 × (Precision × Recall) / (Precision + Recall)
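
The three formulas above can be sketched in a few lines of Python for filling in the scoring table. The class name `CommandScore` and the example counts are made up for illustration; only the total of 27 seeded bugs comes from this guide.

```python
from dataclasses import dataclass

TOTAL_SEEDED_BUGS = 27  # total from the Seeded Bugs Summary table


@dataclass
class CommandScore:
    """Counts recorded for one InspectAI command (hypothetical helper)."""
    true_positives: int
    false_positives: int

    @property
    def recall(self) -> float:
        # Share of the seeded bugs that the command reported.
        return self.true_positives / TOTAL_SEEDED_BUGS

    @property
    def precision(self) -> float:
        # Share of the reported findings that were real seeded bugs.
        reported = self.true_positives + self.false_positives
        return self.true_positives / reported if reported else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up counts, only to show the arithmetic.
example = CommandScore(true_positives=18, false_positives=6)
print(f"recall={example.recall:.3f} "
      f"precision={example.precision:.3f} f1={example.f1:.3f}")
```

Guarding the zero-denominator cases keeps the helper usable for a command that reports nothing.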

---

## Categories Breakdown (After Testing)

| Category | Total | Found | Recall |
|----------|-------|-------|--------|
| Security (CRITICAL) | 4 | | |
| Security (HIGH) | 5 | | |
| Security (MEDIUM) | 2 | | |
| Security (LOW) | 1 | | |
| Logic Errors | 11 | | |
| Resource Leaks | 1 | | |
| Concurrency | 2 | | |
| Error Handling | 1 | | |

---

## Notes
<!-- Record any observations during testing -->

205 changes: 205 additions & 0 deletions tests/benchmark/seeded_bugs_api.py
@@ -0,0 +1,205 @@
"""
BENCHMARK TEST FILE #2 - API/Web Application Bugs
=================================================
This file simulates a web API with common security and logic bugs.
Contains 12 additional seeded bugs (#16 to #27).

DO NOT FIX THESE BUGS - They are intentional for benchmarking purposes.
=================================================
"""

from typing import Optional, Dict, List, Any
import re
import pickle
import subprocess


class UserService:
    """Service for managing users."""

    def __init__(self):
        self.users: Dict[str, Dict] = {}
        self.session_tokens: Dict[str, str] = {}

    # =========================================================================
    # BUG #16: Command Injection (SECURITY - CRITICAL)
    # User input passed directly to shell command
    # =========================================================================
    def ping_server(self, hostname: str) -> str:
        """Ping a server to check if it's online."""

🔴 🔒 Injection Vulnerability (critical)

The ping_server function takes a hostname as input and passes it directly to the subprocess.run function with shell=True. This allows an attacker to inject arbitrary shell commands by manipulating the hostname. For example, a hostname like "127.0.0.1; rm -rf /" would execute the rm -rf / command on the server.

Remediation: Sanitize the hostname input before passing it to the subprocess.run function. Use shlex.quote to escape the hostname, or avoid shell=True and pass the command as a list of arguments.
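
As a rough illustration of the second remediation option (an argument list instead of `shell=True`), a sketch could look like the following. The names `build_ping_command` and `ping_server_safe` are hypothetical and not part of this PR, and the leading-dash check is an assumed extra precaution so that a hostname is never read as a `ping` option.

```python
import subprocess
from typing import List


def build_ping_command(hostname: str) -> List[str]:
    """Build an argument list for ping; no shell is involved (hypothetical helper)."""
    if not hostname or hostname.startswith("-"):
        # Refuse values that ping could mistake for an option.
        raise ValueError(f"invalid hostname: {hostname!r}")
    return ["ping", "-c", "1", hostname]


def ping_server_safe(hostname: str) -> str:
    """Hypothetical replacement for ping_server, shown for comparison only."""
    result = subprocess.run(
        build_ping_command(hostname),  # list form; shell=False is the default
        capture_output=True,
        text=True,
        timeout=10,  # assumed limit so a hung ping cannot block the caller
    )
    return result.stdout
```

With the list form, a value such as `127.0.0.1; rm -rf /` reaches `ping` as a single (invalid) hostname argument rather than being parsed by a shell.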

🔴 Command Injection (critical): The ping_server function is vulnerable to command injection because the hostname parameter is not sanitized before being used in a shell command. An attacker could inject arbitrary commands by providing a malicious hostname.
Fix: Sanitize the hostname input before passing it to the subprocess.run function. Use a library like shlex.quote to escape the hostname.

    # =========================================================================
    def ping_server(self, hostname: str) -> str:
        """Ping a server to check if it's online."""
        # BUG: Command injection - hostname not sanitized
        result = subprocess.run(

        # BUG: Command injection - hostname not sanitized
        result = subprocess.run(

🔴 security (critical): Command injection vulnerability. User-provided hostname is directly injected into the ping command without proper sanitization. The shell=True argument exacerbates the risk.
Fix: Use subprocess.run with a list of arguments and avoid shell=True. Sanitize the hostname before passing it to the command.

            f"ping -c 1 {hostname}",
            shell=True,  # BUG: shell=True with user input
            capture_output=True,
            text=True
        )
        return result.stdout

    # =========================================================================
    # BUG #17: Insecure Deserialization (SECURITY - CRITICAL)
    # Using pickle to deserialize untrusted data

🔴 Insecure Deserialization (critical): The load_user_preferences function uses pickle.loads to deserialize data, which is vulnerable to arbitrary code execution if the data is untrusted. An attacker could craft a malicious pickle payload to execute arbitrary code on the server.
Fix: Avoid using pickle for deserializing untrusted data. Use a safer serialization format like JSON or Protocol Buffers.

    # =========================================================================
    # BUG #17: Insecure Deserialization (SECURITY - CRITICAL)
    # Using pickle to deserialize untrusted data
    # =========================================================================
    def load_user_preferences(self, data: bytes) -> dict:

🔴 Security: Dependency/Library Security (critical): The code uses pickle.loads() to deserialize data without proper sanitization. This can lead to arbitrary code execution if the data is malicious.

    # =========================================================================
    # BUG #17: Insecure Deserialization (SECURITY - CRITICAL)
    # Using pickle to deserialize untrusted data
    # =========================================================================
    def load_user_preferences(self, data: bytes) -> dict:

🔴 Security: Injection Vulnerability (critical): The load_user_preferences function uses pickle.loads to deserialize data received as input. This is inherently unsafe as it allows arbitrary code execution if the data is crafted maliciously.

    # =========================================================================
    # BUG #17: Insecure Deserialization (SECURITY - CRITICAL)
    # Using pickle to deserialize untrusted data
    # =========================================================================
    def load_user_preferences(self, data: bytes) -> dict:

    # =========================================================================
    def load_user_preferences(self, data: bytes) -> dict:
        """Load user preferences from serialized data."""
        # BUG: Pickle deserialization of untrusted data - RCE vulnerability
        return pickle.loads(data)

🔴 security (critical): Insecure deserialization vulnerability. The pickle.loads function is used to deserialize data without any validation, allowing for arbitrary code execution if the data is malicious.
Fix: Avoid using pickle.loads with untrusted data. Use a safer serialization format like JSON or implement proper input validation and sanitization.
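
A minimal sketch of the JSON suggestion, assuming preferences are plain JSON objects encoded as UTF-8 (the source does not say what the pickled payload contains). The function name `load_user_preferences_json` is hypothetical.

```python
import json


def load_user_preferences_json(data: bytes) -> dict:
    """Hypothetical JSON-based variant of load_user_preferences."""
    # json.loads only builds dicts, lists, strings, numbers, booleans and None,
    # so parsing cannot trigger code execution the way unpickling can.
    prefs = json.loads(data.decode("utf-8"))
    if not isinstance(prefs, dict):
        raise ValueError("preferences must be a JSON object")
    return prefs


print(load_user_preferences_json(b'{"theme": "dark", "page_size": 20}'))
```

The trade-off is that only JSON-representable values survive the round trip; custom classes would need an explicit schema.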


    # =========================================================================
    # BUG #18: Broken Authentication (SECURITY - HIGH)
    # Weak session token generation
    # =========================================================================

🟠 Broken Authentication (high): The create_session function generates predictable session tokens based on the user ID and current time. An attacker could easily guess valid session tokens and impersonate other users.
Fix: Use a cryptographically secure random number generator to generate unpredictable session tokens.

    # BUG #18: Broken Authentication (SECURITY - HIGH)
    # Weak session token generation
    # =========================================================================
    def create_session(self, user_id: str) -> str:
        """Create a session token for a user."""

    def create_session(self, user_id: str) -> str:
        """Create a session token for a user."""
        # BUG: Predictable session token based on user_id
        import time
        token = f"{user_id}_{int(time.time())}"  # Easily guessable!

🟠 security (high): Broken authentication due to weak session token generation. The session token is predictable as it's based on the user ID and current timestamp, making it vulnerable to session hijacking.
Fix: Use a cryptographically secure random number generator to generate unpredictable session tokens.
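
One way to follow that fix, sketched with the standard-library `secrets` module. The module-level `session_tokens` dict stands in for `UserService.session_tokens`, and `create_session_secure` is a hypothetical name.

```python
import secrets
from typing import Dict

session_tokens: Dict[str, str] = {}  # stand-in for UserService.session_tokens


def create_session_secure(user_id: str) -> str:
    """Hypothetical variant of create_session with an unpredictable token."""
    # 32 random bytes from the OS CSPRNG, URL-safe base64 encoded.
    token = secrets.token_urlsafe(32)
    session_tokens[token] = user_id
    return token
```

The token no longer embeds the user ID or a timestamp, so knowing either gives an attacker nothing to guess from.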

        self.session_tokens[token] = user_id
        return token

    # =========================================================================
    # BUG #19: Missing Authorization Check (SECURITY - HIGH)
    # Any user can delete any other user

🟠 Missing Authorization Check (high): The delete_user function lacks an authorization check, allowing any user to delete any other user's account. This can lead to unauthorized data deletion and privilege escalation.
Fix: Implement an authorization check to ensure that only authorized users can delete user accounts. Check if the requesting user has the necessary permissions to delete the target user.

    # =========================================================================
    # BUG #19: Missing Authorization Check (SECURITY - HIGH)
    # Any user can delete any other user
    # =========================================================================
    def delete_user(self, target_user_id: str, requesting_user_id: str) -> bool:

    # =========================================================================
    def delete_user(self, target_user_id: str, requesting_user_id: str) -> bool:
        """Delete a user account."""
        # BUG: No authorization check - any user can delete any user
        if target_user_id in self.users:

🟠 security (high): Missing authorization check. Any user can delete any other user's account because there is no validation to ensure the requesting user has the necessary permissions.
Fix: Implement an authorization check to ensure that only authorized users can delete accounts. Verify that the requesting_user_id has permission to delete the target_user_id.
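
A possible shape for such a check is sketched here. The policy (users may delete themselves, admins may delete anyone) and the `admins` set are assumptions for illustration; the benchmark file defines no permission model.

```python
from typing import Dict, Set

users: Dict[str, Dict] = {"alice": {}, "bob": {}}  # stand-in for UserService.users
admins: Set[str] = {"root"}                        # hypothetical role store


def delete_user_checked(target_user_id: str, requesting_user_id: str) -> bool:
    """Hypothetical variant of delete_user with an authorization check."""
    allowed = requesting_user_id == target_user_id or requesting_user_id in admins
    if not allowed:
        raise PermissionError(
            f"{requesting_user_id!r} may not delete {target_user_id!r}"
        )
    if target_user_id in users:
        del users[target_user_id]
        return True
    return False
```

Raising instead of returning `False` lets callers tell "not permitted" apart from "no such user".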

            del self.users[target_user_id]
            return True
        return False

    # =========================================================================
    # BUG #20: ReDoS Vulnerability (SECURITY - MEDIUM)
    # Regex pattern vulnerable to catastrophic backtracking

🟡 ReDoS Vulnerability (medium): The validate_input function uses a regular expression that is vulnerable to catastrophic backtracking (ReDoS). An attacker could provide a specially crafted input string that causes the regex engine to consume excessive resources, leading to a denial-of-service.
Fix: Use a more efficient regular expression or a different input validation method that is not vulnerable to ReDoS.

    # =========================================================================
    # BUG #20: ReDoS Vulnerability (SECURITY - MEDIUM)
    # Regex pattern vulnerable to catastrophic backtracking
    # =========================================================================
    def validate_input(self, text: str) -> bool:

    # =========================================================================
    def validate_input(self, text: str) -> bool:
        """Validate user input format."""
        # BUG: ReDoS - evil regex with nested quantifiers
        pattern = r"^(a+)+$"

🟡 security (medium): ReDoS (Regular Expression Denial of Service) vulnerability. The regex pattern ^(a+)+$ is vulnerable to catastrophic backtracking, which can cause the application to hang or crash when processing specially crafted input.
Fix: Use a more efficient regex pattern or limit the input length to prevent excessive backtracking.
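
For this specific pattern the nested quantifier is redundant: `^(a+)+$` and `^a+$` accept the same strings, but the second has only one way to split a run of `a` characters and so does not backtrack exponentially. A sketch, with `validate_input_linear` as a hypothetical name:

```python
import re

# Same accepted strings as ^(a+)+$, without the nested quantifier.
SAFE_PATTERN = re.compile(r"^a+$")


def validate_input_linear(text: str) -> bool:
    """Hypothetical variant of validate_input that avoids catastrophic backtracking."""
    return bool(SAFE_PATTERN.match(text))


# Long almost-matching input: the kind of string that stalls the nested pattern.
print(validate_input_linear("a" * 50_000 + "b"))
```

Capping input length before matching is a reasonable second line of defence for patterns that cannot be simplified this way.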

        return bool(re.match(pattern, text))


class PaymentProcessor:
    """Service for processing payments."""

    def __init__(self):
        self.transactions: List[Dict] = []

🟡 Floating Point Comparison (medium): The verify_payment function compares floating-point numbers for equality using ==. Due to the imprecision of floating-point arithmetic, this comparison may fail even if the numbers are very close. An attacker could exploit this to bypass payment verification.
Fix: Avoid comparing floating-point numbers for equality directly. Use a tolerance value to check if the numbers are within a certain range of each other.

    def __init__(self):
        self.transactions: List[Dict] = []
    
    # =========================================================================
    # BUG #21: Floating Point Comparison (LOGIC - MEDIUM)

🟡 Floating Point Comparison (medium): The verify_payment function compares floating-point numbers for equality using ==. This is unreliable due to the way floating-point numbers are represented in computers. Small rounding errors can cause the comparison to fail even if the numbers are conceptually equal.
Fix: Avoid comparing floating-point numbers for equality directly. Use a tolerance value to check if the numbers are close enough to be considered equal.

    def __init__(self):
        self.transactions: List[Dict] = []
    
    # =========================================================================
    # BUG #21: Floating Point Comparison (LOGIC - MEDIUM)

🟡 Floating Point Comparison (medium): The verify_payment function compares floating-point numbers for equality using ==. This is unreliable due to the inherent imprecision of floating-point arithmetic. Small differences in the values can lead to incorrect results.
Fix: Avoid comparing floating-point numbers for equality directly. Use a tolerance value to check if the difference between the numbers is within an acceptable range.

    def __init__(self):
        self.transactions: List[Dict] = []
    
    # =========================================================================
    # BUG #21: Floating Point Comparison (LOGIC - MEDIUM)

🟡 Floating Point Comparison (medium): The verify_payment function compares floating-point numbers for equality using ==. Due to the nature of floating-point representation, this can lead to unexpected results and incorrect payment verification.
Fix: Avoid comparing floating-point numbers for exact equality. Use a tolerance value to compare if the difference between the numbers is within an acceptable range.

    def __init__(self):
        self.transactions: List[Dict] = []
    
    # =========================================================================
    # BUG #21: Floating Point Comparison (LOGIC - MEDIUM)

    # =========================================================================
    # BUG #21: Floating Point Comparison (LOGIC - MEDIUM)
    # Comparing floats for equality
    # =========================================================================
    def verify_payment(self, expected: float, received: float) -> bool:
        """Verify that the received payment matches expected amount."""
        # BUG: Floating point comparison - 0.1 + 0.2 != 0.3
        return expected == received

🟡 logic (medium): Floating-point comparison. Comparing floating-point numbers for equality can lead to unexpected results due to precision issues.
Fix: Use a tolerance value to compare floating-point numbers within a certain range of accuracy. For example, abs(expected - received) < tolerance.
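
The tolerance check mentioned in the fix can be sketched with `math.isclose`. The function name `verify_payment_tolerant` is hypothetical, and the default of half a cent is an assumed tolerance; the right value depends on the currency and business rules.

```python
import math


def verify_payment_tolerant(expected: float, received: float,
                            tolerance: float = 0.005) -> bool:
    """Hypothetical variant of verify_payment using an absolute tolerance."""
    # rel_tol=0.0 makes this a pure absolute comparison:
    # abs(expected - received) <= tolerance
    return math.isclose(expected, received, rel_tol=0.0, abs_tol=tolerance)


print(verify_payment_tolerant(0.1 + 0.2, 0.3))
print(verify_payment_tolerant(10.00, 10.01))
```

Storing amounts as integer cents or `decimal.Decimal` avoids the problem at the source and is usually preferable for money.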

🟠 TOCTOU Race Condition (high): The process_withdrawal function is vulnerable to a time-of-check to time-of-use (TOCTOU) race condition. The account balance is checked before the withdrawal is processed, but another thread could modify the balance in between the check and the update. This could allow an attacker to withdraw more funds than they have in their account.
Fix: Use a locking mechanism or a transactional operation to ensure that the balance check and update are performed atomically.

        """Verify that the received payment matches expected amount."""
        # BUG: Floating point comparison - 0.1 + 0.2 != 0.3
        return expected == received
    
    # =========================================================================

🟠 TOCTOU Race Condition (high): The process_withdrawal function is vulnerable to a Time-of-Check to Time-of-Use (TOCTOU) race condition. The balance is checked before the withdrawal is processed, but another thread could modify the balance in between the check and the update, leading to overdrafts or incorrect balances.
Fix: Use a locking mechanism or a transactional operation to ensure that the balance check and update are performed atomically.

        """Verify that the received payment matches expected amount."""
        # BUG: Floating point comparison - 0.1 + 0.2 != 0.3
        return expected == received
    
    # =========================================================================

🟡 logic (medium): Floating-point comparison using equality. Comparing floating-point numbers for equality is unreliable due to the way floating-point numbers are represented in computers. This can lead to unexpected behavior and incorrect results.
Fix: Use a tolerance value to compare floating-point numbers within a certain range of accuracy.


    # =========================================================================
    # BUG #22: TOCTOU Race Condition (CONCURRENCY - HIGH)
    # Time-of-check to time-of-use vulnerability
    # =========================================================================
    def process_withdrawal(self, account_id: str, amount: float, balance: Dict[str, float]) -> bool:
        """Process a withdrawal if sufficient balance exists."""
        # BUG: TOCTOU - balance can change between check and update
        if balance.get(account_id, 0) >= amount:

🟠 concurrency (high): TOCTOU race condition. There is a time gap between checking the account balance and updating it, which allows for a race condition where another thread could modify the balance in between.
Fix: Use a locking mechanism or atomic operations to ensure that the balance check and update are performed as a single atomic operation.

🟡 Precision Loss (medium): The calculate_total_with_fee function calculates the fee as amount * fee_percent / 100. This can lead to precision loss, especially when dealing with cents. The result is then cast to an integer, further exacerbating the precision loss.
Fix: Perform the calculation using integer arithmetic to avoid precision loss. For example, calculate the fee as amount * fee_percent // 100.

        """Process a withdrawal if sufficient balance exists."""
        # BUG: TOCTOU - balance can change between check and update
        if balance.get(account_id, 0) >= amount:
            # Gap here where another thread could modify balance
            balance[account_id] -= amount

🟡 Precision Loss (medium): The calculate_total_with_fee function calculates the fee as amount * fee_percent / 100. While Python avoids integer overflow, this calculation can lead to precision loss, especially when dealing with cents. The result is then cast to an integer, further truncating the value.
Fix: Perform the calculation using floating-point numbers or use integer arithmetic with appropriate scaling to preserve precision. Consider using the decimal module for precise decimal arithmetic.

        """Process a withdrawal if sufficient balance exists."""
        # BUG: TOCTOU - balance can change between check and update
        if balance.get(account_id, 0) >= amount:
            # Gap here where another thread could modify balance
            balance[account_id] -= amount

🟠 concurrency (high): TOCTOU (Time-of-Check Time-of-Use) race condition. The balance is checked and then used later, but another thread could modify the balance in between, leading to incorrect withdrawals.
Fix: Use a locking mechanism or atomic operation to ensure that the balance check and update are performed atomically.
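
A sketch of the locking approach using `threading.Lock`. The single module-level lock `_balance_lock` and the name `process_withdrawal_locked` are assumptions; a real service might use one lock per account or a database transaction instead.

```python
import threading
from typing import Dict

_balance_lock = threading.Lock()  # hypothetical lock guarding all balance updates


def process_withdrawal_locked(account_id: str, amount: float,
                              balance: Dict[str, float]) -> bool:
    """Hypothetical variant of process_withdrawal with an atomic check-and-update."""
    with _balance_lock:
        # The check and the update run while holding the lock, so no other
        # thread using this function can change the balance in between.
        if balance.get(account_id, 0) >= amount:
            balance[account_id] -= amount
            return True
        return False
```

The guarantee only holds if every code path that modifies `balance` takes the same lock.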

            # Gap here where another thread could modify balance
            balance[account_id] -= amount
            return True
        return False

    # =========================================================================
    # BUG #23: Integer Overflow (LOGIC - MEDIUM)
    # Not handling large numbers properly
    # =========================================================================
    def calculate_total_with_fee(self, amount: int, fee_percent: int) -> int:
        """Calculate total amount including fee."""

Weak Regex (low): The is_valid_email function uses an overly simple regular expression to validate email addresses. This regex accepts invalid email addresses, such as "a@b".
Fix: Use a more robust regular expression or a dedicated email validation library to validate email addresses.

    # =========================================================================
    def calculate_total_with_fee(self, amount: int, fee_percent: int) -> int:
        """Calculate total amount including fee."""
        # BUG: Potential overflow in multiplication before division
        # In Python this won't overflow but the logic is still wrong for cents

Weak Regex (low): The is_valid_email function uses an overly simple regular expression to validate email addresses. This regex accepts many invalid email addresses, such as 'a@b'.
Fix: Use a more robust regular expression or a dedicated email validation library to ensure that email addresses are properly validated.

    # =========================================================================
    def calculate_total_with_fee(self, amount: int, fee_percent: int) -> int:
        """Calculate total amount including fee."""
        # BUG: Potential overflow in multiplication before division
        # In Python this won't overflow but the logic is still wrong for cents

        # BUG: Potential overflow in multiplication before division
        # In Python this won't overflow but the logic is still wrong for cents
        fee = amount * fee_percent / 100

🟡 logic (medium): Potential precision loss when calculating the fee. The fee is calculated using floating-point arithmetic, which can lead to rounding errors and precision loss. The result is then cast to an integer, further exacerbating the problem.
Fix: Use integer arithmetic with appropriate scaling to avoid precision loss. For example, multiply the amount and fee percentage by 100 to work with cents, then divide by 10000 to get the final fee.

        return int(amount + fee)  # Precision loss!

🟡 logic (medium): Potential precision loss. Converting the result of the fee calculation to an integer can lead to precision loss, especially when dealing with cents.
Fix: Use a decimal type or perform calculations with integers representing cents to avoid precision loss.
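
A sketch of the decimal approach, assuming `amount` is already expressed in integer cents and that fees round half up to the nearest cent (both are assumptions; the source does not state the unit or the rounding policy). The name `calculate_total_with_fee_cents` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def calculate_total_with_fee_cents(amount_cents: int, fee_percent: int) -> int:
    """Hypothetical variant of calculate_total_with_fee with explicit rounding."""
    # Exact decimal arithmetic, then one explicit rounding step to whole cents.
    fee = (Decimal(amount_cents) * Decimal(fee_percent) / Decimal(100)).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    return amount_cents + int(fee)


# 1050 cents at 3% gives a fee of 31.5 cents, which rounds up to 32.
print(calculate_total_with_fee_cents(1050, 3))
```

The original `int(amount + fee)` truncates the same input to 1081, silently dropping the half cent.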



class DataValidator:

🟡 Type Confusion (medium): The safe_divide function does not validate the input types before performing the division. If a or b is not a number, a TypeError will be raised. This can lead to unexpected errors and potentially expose sensitive information.
Fix: Validate the input types before performing the division. Raise a more specific exception if the input types are invalid.

class DataValidator:
    """Utility class for data validation."""
    

🟡 Type Confusion (medium): The safe_divide function does not validate the input types before performing the division. If a or b are not numbers, a TypeError will be raised, which is not handled. The comment indicates that it will fail silently with strings, but it will actually raise an exception.
Fix: Add type validation to ensure that both a and b are numbers before performing the division. Handle the TypeError exception appropriately.

class DataValidator:
    """Utility class for data validation."""
    

    """Utility class for data validation."""

    # =========================================================================

🟡 Timing Attack (medium): The verify_api_key function compares API keys using the == operator, which is vulnerable to timing attacks. An attacker could measure the time it takes to compare the keys and use this information to guess the correct API key.
Fix: Use hmac.compare_digest to compare API keys securely and prevent timing attacks.

    """Utility class for data validation."""
    
    # =========================================================================
    # BUG #24: Incorrect Regex for Email (LOGIC - LOW)
    # Overly permissive email regex

🟡 Timing Attack (medium): The verify_api_key function compares the provided API key with the stored API key using ==. This comparison is vulnerable to timing attacks because it short-circuits on the first mismatch. An attacker could use timing information to determine the correct API key one character at a time.
Fix: Use hmac.compare_digest to compare the API keys in a way that is resistant to timing attacks.

🟡 Timing Attack (medium): The verify_api_key function compares the provided API key with the stored key using ==. This comparison is vulnerable to timing attacks because it short-circuits on the first mismatch. An attacker could measure the time it takes for the comparison to return and use this information to guess the API key one character at a time.
Fix: Use hmac.compare_digest to perform a constant-time comparison of the API keys.

    # BUG #24: Incorrect Regex for Email (LOGIC - LOW)
    # Overly permissive email regex
    # =========================================================================
    def is_valid_email(self, email: str) -> bool:
        """Check if email is valid."""
        # BUG: Overly simple regex - accepts invalid emails like "a@b"
        pattern = r".+@.+"

logic (low): Incorrect regex for email validation. The regex pattern .+@.+ is overly permissive and accepts invalid email addresses like a@b.
Fix: Use a more robust regex pattern for email validation that adheres to the RFC 5322 standard.

        return bool(re.match(pattern, email))

logic (low): Incorrect regex for email validation. The regex pattern .+@.+ is overly permissive and accepts invalid email addresses.
Fix: Use a more robust regular expression for email validation or use a dedicated email validation library.
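
Both comments on bug #24 ask for a stricter check without saying what it should look like. The sketch below is one possible shape, not the fix adopted in this PR: `EMAIL_PATTERN` is a hypothetical pattern that requires a single `@`, no whitespace, and a dotted domain, and it is written as a standalone function rather than a `DataValidator` method. It is a pragmatic filter and does not implement RFC 5322.

```python
import re

# Hypothetical stricter pattern (an assumption, not taken from the PR):
# a local part without "@" or whitespace, one "@", and a domain made of
# at least two dot-separated, non-empty labels.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s.]+(\.[^@\s.]+)+")


def is_valid_email(email: str) -> bool:
    """Return True if email has a local part, one '@', and a dotted domain."""
    # fullmatch anchors both ends; re.match only anchors the start,
    # so trailing garbage would otherwise be accepted.
    return EMAIL_PATTERN.fullmatch(email) is not None
```

With this pattern, `"a@b"` from the seeded bug comment is rejected because the domain has no dot. For production use, a dedicated validation library (as the second comment suggests) is usually the safer choice.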


    # =========================================================================
    # BUG #25: Type Confusion (LOGIC - MEDIUM)

Information Disclosure (low): The authenticate_user function raises a ValueError with a message that reveals whether the username exists. This information can be used by an attacker to enumerate valid usernames.
Fix: Provide a generic error message that does not reveal whether the username exists.

    
    # =========================================================================
    # BUG #25: Type Confusion (LOGIC - MEDIUM)
    # Not validating input type before operations
    # =========================================================================


Information Disclosure (low): The authenticate_user function's error message reveals whether the username exists in the database. This information can be used by attackers to enumerate valid usernames and launch targeted attacks.
Fix: Return a generic error message that does not reveal whether the username exists. For example, return 'Invalid username or password'.

    # Not validating input type before operations
    # =========================================================================
    def safe_divide(self, a: Any, b: Any) -> float:
        """Safely divide two numbers."""
        # BUG: No type validation - will fail silently with strings
        if b == 0:
            return 0.0
        return a / b  # Will raise TypeError if a or b is not a number

🟡 logic (medium): Type confusion. The safe_divide function does not validate the input types before performing the division, which can lead to a TypeError if the inputs are not numbers.
Fix: Add type validation to ensure that the inputs are numbers before performing the division. Raise a more informative error if the types are incorrect.


🟡 logic (medium): Type confusion vulnerability. The safe_divide function does not validate the input types of a and b before performing the division. This can lead to a TypeError if a or b is not a number.
Fix: Add type validation to ensure that a and b are numbers before performing the division.
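
The type validation requested for bug #25 could look roughly like the sketch below. It is a standalone function rather than the `DataValidator` method, and the choice of `numbers.Real` plus the explicit rejection of `bool` are assumptions about what "a number" should mean here. The zero-divisor branch is kept as in the seeded code.

```python
from numbers import Real
from typing import Any


def safe_divide(a: Any, b: Any) -> float:
    """Divide a by b, rejecting non-numeric input with a clear TypeError."""
    for name, value in (("a", a), ("b", b)):
        # bool is a subclass of int, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(
                f"{name} must be a real number, got {type(value).__name__}"
            )
    if b == 0:
        # Same behavior as the seeded code: a zero divisor yields 0.0.
        return 0.0
    return a / b
```

Invalid input still raises `TypeError`, as the original does at the division, but the check now happens up front and the message names the offending argument.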



# =============================================================================
# Additional standalone functions with bugs
# =============================================================================

# =========================================================================
# BUG #26: Timing Attack Vulnerability (SECURITY - MEDIUM)
# String comparison short-circuits on mismatch
# =========================================================================
def verify_api_key(provided_key: str, stored_key: str) -> bool:
    """Verify an API key."""
    # BUG: Timing attack - comparison short-circuits
    return provided_key == stored_key  # Should use hmac.compare_digest

🟡 security (medium): Timing attack vulnerability. Comparing strings using == short-circuits on the first mismatch, which can leak information about the correct API key.
Fix: Use hmac.compare_digest to perform a constant-time comparison of the API keys.


🟡 security (medium): Timing attack vulnerability. The string comparison provided_key == stored_key short-circuits on the first mismatch, which can be exploited to determine the correct API key one character at a time.
Fix: Use hmac.compare_digest to perform a constant-time comparison of the API keys.
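
Every comment on bug #26 recommends `hmac.compare_digest`. A minimal sketch of that change, keeping the signature from the seeded code, might be the following. Encoding to UTF-8 bytes is an added assumption, made because `compare_digest` only accepts `str` arguments when both are ASCII.

```python
import hmac


def verify_api_key(provided_key: str, stored_key: str) -> bool:
    """Compare two API keys without short-circuiting on the first mismatch."""
    # Encode first: compare_digest raises TypeError for non-ASCII str input,
    # while bytes of any content are accepted.
    return hmac.compare_digest(
        provided_key.encode("utf-8"), stored_key.encode("utf-8")
    )
```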



# =========================================================================
# BUG #27: Improper Error Message (SECURITY - LOW)
# Leaking sensitive information in error message
# =========================================================================
def authenticate_user(username: str, password: str, users_db: Dict) -> Dict:
    """Authenticate a user and return their profile."""
    user = users_db.get(username)

    if not user:
        # BUG: Information disclosure - reveals if username exists
        raise ValueError(f"User '{username}' does not exist")

security (low): Improper error message. The error message reveals whether the username exists, which can be used for account enumeration attacks.
Fix: Return a generic error message that does not reveal whether the username exists. For example, "Invalid username or password".


security (low): Information disclosure vulnerability. The error message reveals whether the username exists in the database, which can be used to enumerate valid usernames.
Fix: Return a generic error message that does not reveal whether the username exists.


    if user["password"] != password:
        # BUG: Should not differentiate between bad user and bad password
        raise ValueError("Incorrect password")

    return user
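
The review comments on bug #27 suggest a single generic error for both failure paths. One way to sketch that is shown below. The plaintext `password` field mirrors the seeded test file and is not a recommendation, and the constant-time comparison is an addition that goes beyond what the comments ask for.

```python
import hmac
from typing import Dict

GENERIC_AUTH_ERROR = "Invalid username or password"


def authenticate_user(username: str, password: str, users_db: Dict) -> Dict:
    """Return the user's profile, or raise one generic error on any failure."""
    user = users_db.get(username)
    # Compare against an empty string when the user is unknown so that
    # both failure paths do the same work and raise the same message.
    stored = user["password"] if user else ""
    password_ok = hmac.compare_digest(
        password.encode("utf-8"), stored.encode("utf-8")
    )
    if not user or not password_ok:
        raise ValueError(GENERIC_AUTH_ERROR)
    return user
```

An unknown username and a wrong password now produce the same `ValueError` message, so the response no longer reveals which usernames exist.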


# =============================================================================
# GROUND TRUTH - Bug Summary for Evaluation
# =============================================================================
SEEDED_BUGS_FILE2 = {
    "security": [
        {"id": 16, "type": "Command Injection", "severity": "CRITICAL", "line": 29},
        {"id": 17, "type": "Insecure Deserialization", "severity": "CRITICAL", "line": 41},
        {"id": 18, "type": "Weak Session Token", "severity": "HIGH", "line": 51},
        {"id": 19, "type": "Missing Authorization", "severity": "HIGH", "line": 62},
        {"id": 20, "type": "ReDoS", "severity": "MEDIUM", "line": 74},
        {"id": 26, "type": "Timing Attack", "severity": "MEDIUM", "line": 126},
        {"id": 27, "type": "Information Disclosure", "severity": "LOW", "line": 137},
    ],
    "logic": [
        {"id": 21, "type": "Float Comparison", "severity": "MEDIUM", "line": 88},
        {"id": 23, "type": "Precision Loss", "severity": "MEDIUM", "line": 105},
        {"id": 24, "type": "Weak Regex", "severity": "LOW", "line": 116},
        {"id": 25, "type": "Type Confusion", "severity": "MEDIUM", "line": 123},
    ],
    "concurrency": [
        {"id": 22, "type": "TOCTOU Race", "severity": "HIGH", "line": 96},
    ],
}

TOTAL_BUGS_FILE2 = 12
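
The ground-truth dictionary lends itself to a small consistency check and a detection-rate calculation. The sketch below assumes the `SEEDED_BUGS_FILE2` structure above, reduced to IDs for brevity. `seeded_ids` and `detection_rate` are hypothetical helper names, and the detected set contains only the four bug IDs that have review comments in this part of the review (24 through 27), so it is not the benchmark's full result.

```python
from typing import Dict, Iterable, List, Set

# Ground truth from the listing above; only the "id" field is kept here.
SEEDED_BUGS_FILE2: Dict[str, List[dict]] = {
    "security": [{"id": i} for i in (16, 17, 18, 19, 20, 26, 27)],
    "logic": [{"id": i} for i in (21, 23, 24, 25)],
    "concurrency": [{"id": 22}],
}
TOTAL_BUGS_FILE2 = 12


def seeded_ids(ground_truth: Dict[str, List[dict]]) -> Set[int]:
    """Collect every seeded bug ID across all categories."""
    return {bug["id"] for bugs in ground_truth.values() for bug in bugs}


def detection_rate(
    ground_truth: Dict[str, List[dict]], detected_ids: Iterable[int]
) -> float:
    """Fraction of seeded bugs whose ID appears in detected_ids."""
    ids = seeded_ids(ground_truth)
    return len(ids & set(detected_ids)) / len(ids)


# The declared total should match the number of distinct seeded IDs.
assert len(seeded_ids(SEEDED_BUGS_FILE2)) == TOTAL_BUGS_FILE2

# Illustration only: IDs with review comments in this excerpt.
commented_in_excerpt = {24, 25, 26, 27}
rate = detection_rate(SEEDED_BUGS_FILE2, commented_in_excerpt)
```

IDs that are not in the ground truth are ignored by the set intersection, so false positives do not inflate the rate; they would need to be counted separately.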