Test Files with seeded bugs for InspectAI Evaluation #19

Open
hj2713 wants to merge 1 commit into main from test-benchmark

Conversation

@hj2713
Owner

@hj2713 hj2713 commented Dec 10, 2025

Test Files with seeded bugs for InspectAI Evaluation

@hj2713
Owner Author

hj2713 commented Dec 10, 2025

/inspectai_help

@comse6998-inspectai

🤖 InspectAI Commands

Triggered by: @hj2713

Available Commands

| Command | Description |
| --- | --- |
| /inspectai_review | Quick Review - Reviews ONLY the changed lines in your PR. Posts inline comments on issues introduced by your changes. Fast and focused. |
| /inspectai_bugs | Deep Bug Scan - Analyzes entire files (not just diffs) for potential bugs, logic errors, and edge cases. More thorough but slower. |
| /inspectai_refactor | Refactor Suggestions - Suggests code improvements for readability, performance, and maintainability. |
| /inspectai_security | Security Audit - Scans for security vulnerabilities using 4 specialized sub-agents: Injection, Auth, Data Exposure, Dependencies. |
| /inspectai_tests | Test Generation - Generates unit tests for your changed code. |
| /inspectai_docs | Documentation - Generates/updates docstrings for changed Python files using Google-style format. |
| /inspectai_help | Help - Shows this message. |

Tips

  • 🚀 Start with /inspectai_review for quick feedback on your changes
  • 🐛 Use /inspectai_bugs when you want a deeper analysis of edge cases
  • 🔐 Run /inspectai_security before merging code that handles user input or authentication
  • Generate tests with /inspectai_tests to improve coverage

Feedback

React with 👍 or 👎 on any InspectAI comment to help improve future reviews!


InspectAI - Your AI Code Review Assistant

@hj2713
Owner Author

hj2713 commented Dec 10, 2025

/inspectai_review

@comse6998-inspectai comse6998-inspectai Bot left a comment

🔍 InspectAI Code Review

Triggered by: @hj2713
Files Reviewed: 2
Inline Comments: 29

I've added inline comments on the specific lines that need attention.
Only the changed lines in this PR were reviewed.


Use /inspectai_bugs to scan entire files for bugs.

Comment thread tests/benchmark/seeded_bugs_api.py
Comment thread tests/benchmark/seeded_bugs_api.py
Comment thread tests/benchmark/seeded_bugs_api.py
Comment thread tests/benchmark/seeded_bugs_api.py
Comment thread tests/benchmark/seeded_bugs_api.py
Comment thread tests/benchmark/seeded_bugs_python.py
Comment thread tests/benchmark/seeded_bugs_python.py
"""Read a file from the user uploads directory."""
# BUG: Path traversal - user can access any file with ../
base_path = "/var/uploads/"
file_path = base_path + filename # No sanitization!

🟠 security (high): Path traversal vulnerability. The code directly concatenates the filename with the base_path without any sanitization, allowing malicious users to access arbitrary files on the system by using path traversal sequences like ../ in the filename.
Fix: Sanitize the filename to prevent path traversal attacks: file_path = os.path.join(base_path, os.path.basename(filename))
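The `os.path.basename` fix above drops every directory component, which also rules out legitimate subdirectories. An alternative, sketched here under assumptions, resolves the full path and checks that it still lies under the uploads directory. The function name `resolve_upload_path` and the `base_path` parameter are hypothetical and do not appear in the reviewed file.

```python
import os


def resolve_upload_path(filename: str, base_path: str = "/var/uploads") -> str:
    """Return an absolute path under base_path, or raise ValueError.

    Hypothetical helper for illustration; not part of the reviewed file.
    """
    base = os.path.realpath(base_path)
    # realpath collapses ".." segments and follows symlinks before the check.
    candidate = os.path.realpath(os.path.join(base, filename))
    # commonpath equals base only when candidate is base itself or lies below it.
    if candidate == base or os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"unsafe filename: {filename!r}")
    return candidate
```

With this check, `"reports/q1.txt"` resolves inside the uploads directory, while `"../../etc/passwd"` and absolute paths such as `"/etc/passwd"` raise `ValueError`.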

if numbers[mid] == target:
return mid
elif numbers[mid] < target:
left = mid # BUG: Should be mid + 1

🟠 logic (high): Infinite loop risk. In the binary search, when numbers[mid] < target, the left pointer is updated to mid instead of mid + 1. This can lead to an infinite loop if the target is slightly larger than numbers[mid].
Fix: Update the left pointer to mid + 1: left = mid + 1

Comment thread tests/benchmark/seeded_bugs_python.py
@hj2713
Owner Author

hj2713 commented Dec 10, 2025

/inspectai_bugs

@comse6998-inspectai comse6998-inspectai Bot left a comment

🐛 InspectAI Bug Detection

Triggered by: @hj2713
Files Scanned: 2
Issues Found: 57

🔴 Critical: 13 | 🟠 High: 19 | 🟡 Medium: 22 | ⚪ Low: 3

I've added 57 inline comments on issues introduced by your changes.

Comment thread tests/benchmark/seeded_bugs_api.py
Comment thread tests/benchmark/seeded_bugs_api.py
Comment thread tests/benchmark/seeded_bugs_api.py
Comment thread tests/benchmark/seeded_bugs_api.py
Comment thread tests/benchmark/seeded_bugs_api.py
Comment thread tests/benchmark/seeded_bugs_python.py
"""Render a user profile as HTML."""
# BUG: XSS vulnerability - user input not escaped
html = f"""
<div class="profile">

🟡 Unhandled Exception (medium): No exception handling for JSON parsing, which can cause the program to crash on invalid JSON input.
Fix: Add a try-except block to handle potential JSONDecodeError exceptions.

    # BUG: XSS vulnerability - user input not escaped
    html = f"""
    <div class="profile">
        <h1>Welcome, {username}!</h1>
        <p class="bio">{bio}</p>

🟡 Unhandled Exception (medium): The code doesn't handle exceptions when parsing JSON user input, which can cause the program to crash on invalid JSON.
Fix: Use a try-except block to catch JSONDecodeError.

🟡 Runtime Issue (medium): The parse_user_input function does not handle potential json.JSONDecodeError exceptions that can occur when parsing invalid JSON data. This can cause the program to crash if the input is not valid JSON.
Fix: Wrap the json.loads call in a try-except block to catch json.JSONDecodeError and handle it appropriately (e.g., return an error message or a default value).

Comment thread tests/benchmark/seeded_bugs_python.py
Comment thread tests/benchmark/seeded_bugs_python.py
Comment thread tests/benchmark/seeded_bugs_python.py
Repository owner deleted a comment from Yeshitha-co Dec 10, 2025
@hj2713
Owner Author

hj2713 commented Dec 10, 2025

/inspectai_security

@comse6998-inspectai comse6998-inspectai Bot left a comment

🔒 InspectAI Security Scan

Triggered by: @hj2713
Files Scanned: 2
Vulnerabilities Found: 16
Risk Score: 🔴 8.7/10

🔴 Critical: 9 | 🟠 High: 7

I've added 16 inline comments on potential security issues.


Use /inspectai_review for code review or /inspectai_bugs for bug detection.

Comment thread tests/benchmark/seeded_bugs_api.py
Comment thread tests/benchmark/seeded_bugs_api.py
Comment thread tests/benchmark/seeded_bugs_api.py
# User input passed directly to shell command
# =========================================================================
def ping_server(self, hostname: str) -> str:
"""Ping a server to check if it's online."""

🔴 🔒 Injection Vulnerability (critical)

The ping_server function takes a hostname as input and passes it directly to the subprocess.run function with shell=True. This allows an attacker to inject arbitrary shell commands by manipulating the hostname. For example, a hostname like "127.0.0.1; rm -rf /" would execute the rm -rf / command on the server.

Remediation: Sanitize the hostname input before passing it to the subprocess.run function. Use shlex.quote to escape the hostname, or avoid shell=True and pass the command as a list of arguments.
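The second remediation option, dropping `shell=True` and passing an argument list, might look like the sketch below. The helper `build_ping_command`, the leading-dash check, and the 5-second timeout are assumptions added for illustration. The reviewed method also lives on `UserService`, whereas this version is a plain function.

```python
import subprocess
from typing import List


def build_ping_command(hostname: str) -> List[str]:
    """Build an argv list for ping; hostname is always one single argument."""
    if not hostname or hostname.startswith("-"):
        # Reject empty values and anything ping could read as an option.
        raise ValueError(f"invalid hostname: {hostname!r}")
    return ["ping", "-c", "1", hostname]


def ping_server(hostname: str) -> str:
    """Ping a host once and return the command's standard output."""
    # No shell is involved: the list goes straight to the OS, so characters
    # such as ";" or "&&" inside hostname are never interpreted as commands.
    result = subprocess.run(
        build_ping_command(hostname),
        capture_output=True,
        text=True,
        timeout=5,  # assumed limit; raises subprocess.TimeoutExpired if exceeded
    )
    return result.stdout
```

A hostname such as `"127.0.0.1; rm -rf /"` now reaches `ping` as one literal argument, which fails name resolution instead of running a second command.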

Comment thread tests/benchmark/seeded_bugs_api.py
Comment thread tests/benchmark/seeded_bugs_python.py
"""Fetch a user from the database by their ID."""
conn = sqlite3.connect("users.db")
cursor = conn.cursor()


🟠 🔒 Dependency/Library Security (high)

SQL Injection vulnerability due to direct string interpolation in the SQL query.

Remediation: Use parameterized queries to prevent SQL injection. For example: query = "SELECT * FROM users WHERE id = ?"; cursor.execute(query, (user_id,))


🟠 🔒 Injection Vulnerability (high)

The get_user_by_id function constructs a raw SQL query using f-strings and directly interpolates the user_id parameter without any sanitization or parameterization. This allows an attacker to inject arbitrary SQL code by providing a malicious user_id value, potentially compromising the entire database.

Remediation: Use parameterized queries to prevent SQL injection. Replace the line query = f"SELECT * FROM users WHERE id = '{user_id}'" with query = "SELECT * FROM users WHERE id = ?"; cursor.execute(query, (user_id,))
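A minimal runnable sketch of the parameterized form, using an in-memory SQLite database so it can be tried without `users.db`. The two-column `users` schema and the `conn` parameter are assumptions; the reviewed function opens its own connection.

```python
import sqlite3
from typing import Optional


def get_user_by_id(conn: sqlite3.Connection, user_id: str) -> Optional[dict]:
    """Fetch a user by ID with a bound parameter instead of string formatting."""
    # The "?" placeholder makes sqlite3 treat user_id strictly as data.
    cursor = conn.execute("SELECT id, name FROM users WHERE id = ?", (user_id,))
    row = cursor.fetchone()
    return {"id": row[0], "name": row[1]} if row else None


# Demo setup with an assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("123", "John Doe"))
```

An injection attempt such as `"' OR '1'='1"` is compared against the `id` column as a literal string, so it matches no row.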

Comment thread tests/benchmark/seeded_bugs_python.py
Comment thread tests/benchmark/seeded_bugs_python.py
Comment thread tests/benchmark/seeded_bugs_python.py
@hj2713
Owner Author

hj2713 commented Dec 10, 2025

/inspectai_docs

@comse6998-inspectai

📚 InspectAI Documentation Generator

Triggered by: @hj2713
Files Processed: 2
Files with New Documentation: 2

Updated Files with Docstrings

📝 tests/benchmark/seeded_bugs_api.py
"""
BENCHMARK TEST FILE #2 - API/Web Application Bugs
=================================================
This file simulates a web API with common security and logic bugs.
Contains 10 additional seeded bugs.

DO NOT FIX THESE BUGS - They are intentional for benchmarking purposes.
=================================================
"""

from typing import Optional, Dict, List, Any
import re
import pickle
import subprocess


class UserService:
    """Service for managing users."""
    
    def __init__(self):
        """Initializes the UserService with empty user and session dictionaries."""
        self.users: Dict[str, Dict] = {}
        self.session_tokens: Dict[str, str] = {}
    
    # =========================================================================
    # BUG #16: Command Injection (SECURITY - CRITICAL)
    # User input passed directly to shell command
    # =========================================================================
    def ping_server(self, hostname: str) -> str:
        """Ping a server to check if it's online.

        Args:
            hostname (str): The hostname or IP address to ping.

        Returns:
            str: The output of the ping command.

        Raises:
            subprocess.CalledProcessError: If the ping command returns a non-zero exit code.
            Exception: Any other exception raised by the subprocess.run call.

        Example:
            >>> service = UserService()
            >>> result = service.ping_server("127.0.0.1")
            >>> print(result)
            PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
            64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.024 ms

            --- 127.0.0.1 ping statistics ---
            1 packets transmitted, 1 received, 0% packet loss, time 0ms
            rtt min/avg/max/mdev = 0.024/0.024/0.024/0.000 ms
        """
        # BUG: Command injection - hostname not sanitized
        result = subprocess.run(
            f"ping -c 1 {hostname}",
            shell=True,  # BUG: shell=True with user input
            capture_output=True,
            text=True
        )
        return result.stdout
    
    # =========================================================================
    # BUG #17: Insecure Deserialization (SECURITY - CRITICAL)
    # Using pickle to deserialize untrusted data
    # =========================================================================
    def load_user_preferences(self, data: bytes) -> dict:
        """Load user preferences from serialized data.

        Args:
            data (bytes): The serialized data containing user preferences.

        Returns:
            dict: The deserialized user preferences.

        Raises:
            pickle.UnpicklingError: If the data cannot be deserialized.
            Exception: Any other exception raised by the pickle.loads call.

        Example:
            >>> import pickle
            >>> service = UserService()
            >>> data = pickle.dumps({"theme": "dark", "notifications": True})
            >>> preferences = service.load_user_preferences(data)
            >>> print(preferences)
            {'theme': 'dark', 'notifications': True}
        """
        # BUG: Pickle deserialization of untrusted data - RCE vulnerability
        return pickle.loads(data)
    
    # =========================================================================
    # BUG #18: Broken Authentication (SECURITY - HIGH)
    # Weak session token generation
    # =========================================================================
    def create_session(self, user_id: str) -> str:
        """Create a session token for a user.

        Args:
            user_id (str): The ID of the user.

        Returns:
            str: The generated session token.

        Example:
            >>> service = UserService()
            >>> token = service.create_session("testuser")
            >>> print(token)
            testuser_1678886400  # Example output, timestamp will vary
        ""

... truncated (full file is 15844 chars)

📝 tests/benchmark/seeded_bugs_python.py
"""
BENCHMARK TEST FILE - Contains intentionally seeded bugs for testing InspectAI
================================================================================
This file contains 15 seeded bugs across different categories:
- Security vulnerabilities (SQL injection, hardcoded secrets, XSS)
- Logic errors (off-by-one, wrong operator, missing return)
- Null/None handling issues
- Resource leaks
- Race conditions
- Type errors

DO NOT FIX THESE BUGS - They are intentional for benchmarking purposes.
================================================================================
"""

import os
import sqlite3
import hashlib
import threading
from typing import List, Optional, Dict, Any


# =============================================================================
# BUG #1: SQL Injection Vulnerability (SECURITY - HIGH)
# The user_id is directly interpolated into the SQL query
# =============================================================================
def get_user_by_id(user_id: str) -> dict:
    """Fetch a user from the database by their ID.

    Args:
        user_id (str): The ID of the user to fetch.  This is vulnerable to SQL injection.

    Returns:
        dict: A dictionary containing the user's ID and name, or None if the user is not found.
              Example: {"id": "123", "name": "John Doe"}
        
    Raises:
        sqlite3.Error: If there is an error executing the SQL query.
    """
    conn = sqlite3.connect("users.db")
    cursor = conn.cursor()
    
    # BUG: SQL Injection - user_id is not parameterized
    query = f"SELECT * FROM users WHERE id = '{user_id}'"
    cursor.execute(query)
    
    result = cursor.fetchone()
    conn.close()
    return {"id": result[0], "name": result[1]} if result else None


# =============================================================================
# BUG #2: Hardcoded Secret/API Key (SECURITY - CRITICAL)
# API keys should never be hardcoded in source code
# =============================================================================
API_KEY = "sk-live-abc123def456ghi789jkl012mno345pqr678"
DATABASE_PASSWORD = "super_secret_password_123!"

def make_api_request(endpoint: str) -> dict:
    """Make an authenticated API request.

    Args:
        endpoint (str): The API endpoint to request.

    Returns:
        dict: A dictionary containing the status and endpoint of the request.
              Example: {"status": "ok", "endpoint": "/users"}
    
    Raises:
        Exception: If the API request fails (simulated).
    """
    headers = {"Authorization": f"Bearer {API_KEY}"}
    # Simulated request
    return {"status": "ok", "endpoint": endpoint}


# =============================================================================
# BUG #3: Off-by-One Error (LOGIC - MEDIUM)
# Loop should use range(len(items)) or enumerate, not len(items) + 1
# =============================================================================
def process_items(items: List[str]) -> List[str]:
    """Process each item in the list.

    Args:
        items (List[str]): A list of strings to process.

    Returns:
        List[str]: A list of the processed strings (uppercase).

    Raises:
        IndexError: If the loop iterates beyond the bounds of the list.
    """
    results = []
    
    # BUG: Off-by-one - will cause IndexError on last iteration
    for i in range(len(items) + 1):
        results.append(items[i].upper())
    
    return results


# =============================================================================
# BUG #4: Missing Null Check (LOGIC - MEDIUM)
# Accessing attributes without checking if object is None
# =============================================================================
def get_user_email(user: Optional[Dict]) -> str:
    """Get the user's email address.

    Args:
        user (Optional[Dict]): A dictionary containing user information, including the email address.
                             Can be None.

    Returns:
        str: The user's email address i

... truncated (full file is 15141 chars)


Review the generated docstrings and apply them to your codebase.

comse6998-inspectai[bot]

This comment was marked as duplicate.

Repository owner deleted a comment from comse6998-inspectai Bot Dec 16, 2025
@hj2713
Owner Author

hj2713 commented Dec 16, 2025

/inspectai_help

@hj2713
Owner Author

hj2713 commented Dec 16, 2025

/inspectai_review

@comse6998-inspectai comse6998-inspectai Bot left a comment

🔍 InspectAI Code Review

Triggered by: @hj2713
Files Reviewed: 2
Inline Comments: 29

I've added inline comments on the specific lines that need attention.
Only the changed lines in this PR were reviewed.


Use /inspectai_bugs to scan entire files for bugs.

def ping_server(self, hostname: str) -> str:
"""Ping a server to check if it's online."""
# BUG: Command injection - hostname not sanitized
result = subprocess.run(

🔴 security (critical): Command injection vulnerability. User-provided hostname is directly injected into the ping command without proper sanitization. The shell=True argument exacerbates the risk.
Fix: Use subprocess.run with a list of arguments and avoid shell=True. Sanitize the hostname before passing it to the command.

def load_user_preferences(self, data: bytes) -> dict:
"""Load user preferences from serialized data."""
# BUG: Pickle deserialization of untrusted data - RCE vulnerability
return pickle.loads(data)

🔴 security (critical): Insecure deserialization vulnerability. The pickle.loads function is used to deserialize data without any validation, allowing for arbitrary code execution if the data is malicious.
Fix: Avoid using pickle.loads with untrusted data. Use a safer serialization format like JSON or implement proper input validation and sanitization.
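One way to follow the JSON suggestion is sketched below. It assumes preferences are plain JSON-compatible values (strings, numbers, booleans, lists, dicts) and that callers can switch the stored format; neither is confirmed by the reviewed file.

```python
import json


def load_user_preferences(data: bytes) -> dict:
    """Parse preferences from UTF-8 JSON bytes; never executes code."""
    try:
        prefs = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("preferences are not valid JSON") from exc
    # A JSON document can also be a list or scalar, so check the top-level type.
    if not isinstance(prefs, dict):
        raise ValueError("preferences must be a JSON object")
    return prefs
```

Unlike `pickle.loads`, `json.loads` only builds basic data types, so a crafted payload cannot trigger code execution during parsing.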

"""Create a session token for a user."""
# BUG: Predictable session token based on user_id
import time
token = f"{user_id}_{int(time.time())}" # Easily guessable!

🟠 security (high): Broken authentication due to weak session token generation. The session token is predictable as it's based on the user ID and current timestamp, making it vulnerable to session hijacking.
Fix: Use a cryptographically secure random number generator to generate unpredictable session tokens.
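In Python the standard library's `secrets` module provides such a generator. The sketch below is an illustration only: the module-level `session_tokens` dict and its token-to-user mapping are assumptions, since the reviewed class stores tokens on `self.session_tokens` and does not show which way the mapping goes.

```python
import secrets
from typing import Dict

# Assumed layout: token -> user_id.
session_tokens: Dict[str, str] = {}


def create_session(user_id: str) -> str:
    """Create an unpredictable session token for a user."""
    # 32 random bytes from the OS CSPRNG, URL-safe base64 encoded.
    token = secrets.token_urlsafe(32)
    session_tokens[token] = user_id
    return token
```

The token no longer contains the user ID or a timestamp, so knowing who logged in and when gives an attacker nothing to guess from.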

def delete_user(self, target_user_id: str, requesting_user_id: str) -> bool:
"""Delete a user account."""
# BUG: No authorization check - any user can delete any user
if target_user_id in self.users:

🟠 security (high): Missing authorization check. Any user can delete any other user's account because there is no validation to ensure the requesting user has the necessary permissions.
Fix: Implement an authorization check to ensure that only authorized users can delete accounts. Verify that the requesting_user_id has permission to delete the target_user_id.
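A sketch of one possible check follows. The policy (users may delete themselves, admins may delete anyone) and the `admins` set are assumptions made for illustration; the reviewed file does not define a permission model.

```python
from typing import Dict, Set


class UserService:
    """Trimmed-down stand-in for the reviewed class."""

    def __init__(self) -> None:
        self.users: Dict[str, Dict] = {}
        self.admins: Set[str] = set()  # hypothetical: IDs allowed to delete others

    def delete_user(self, target_user_id: str, requesting_user_id: str) -> bool:
        """Delete a user if the requester is that user or an admin."""
        is_self = requesting_user_id == target_user_id
        is_admin = requesting_user_id in self.admins
        if not (is_self or is_admin):
            raise PermissionError(
                f"{requesting_user_id!r} may not delete {target_user_id!r}"
            )
        if target_user_id not in self.users:
            return False  # nothing to delete
        del self.users[target_user_id]
        return True
```

Raising `PermissionError` keeps a denied request distinct from a missing target, which returns `False`.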

def validate_input(self, text: str) -> bool:
"""Validate user input format."""
# BUG: ReDoS - evil regex with nested quantifiers
pattern = r"^(a+)+$"

🟡 security (medium): ReDoS (Regular Expression Denial of Service) vulnerability. The regex pattern ^(a+)+$ is vulnerable to catastrophic backtracking, which can cause the application to hang or crash when processing specially crafted input.
Fix: Use a more efficient regex pattern or limit the input length to prevent excessive backtracking.
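Both suggestions can be combined, as in this sketch. The 256-character cap is an arbitrary assumed value, and `a+` is used because it accepts the same runs of `a` as the nested pattern without giving the engine multiple ways to split them.

```python
import re

MAX_INPUT_LENGTH = 256  # assumed cap; tune to the real input format

# One quantifier, no nesting: matching time grows linearly with input length.
_PATTERN = re.compile(r"a+")


def validate_input(text: str) -> bool:
    """Return True when text is a non-empty run of 'a' within the length cap."""
    if len(text) > MAX_INPUT_LENGTH:
        return False
    return _PATTERN.fullmatch(text) is not None
```

One behavioral difference to be aware of: `fullmatch` rejects a trailing newline, which `$` in the original pattern would tolerate.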

# BUG #12: Mutable Default Argument (LOGIC - MEDIUM)
# Using mutable default argument causes unexpected behavior
# =============================================================================
def add_item_to_list(item: str, item_list: List[str] = []) -> List[str]:

🟡 logic (medium): Mutable default argument. The item_list argument has a mutable default value ([]), which is shared across all calls to the function without an explicit item_list argument. This can lead to unexpected behavior where the list accumulates items from previous calls.
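Fix: Use None as the default value and create a new list only when item_list is None: def add_item_to_list(item: str, item_list: Optional[List[str]] = None) -> List[str]: if item_list is None: item_list = []; item_list.append(item); return item_list. Test with "is None" rather than "item_list or []", because an empty list passed by the caller is falsy and would be silently replaced by a new one.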
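A minimal sketch of the `None`-default pattern. It checks `is None` explicitly instead of relying on truthiness, so a caller who passes an empty list gets that same list back with the item appended.

```python
from typing import List, Optional


def add_item_to_list(item: str, item_list: Optional[List[str]] = None) -> List[str]:
    """Append item to item_list, creating a fresh list when none is given."""
    if item_list is None:
        # A new list per call, so nothing leaks between calls.
        item_list = []
    item_list.append(item)
    return item_list
```

Two successive calls without a list, `add_item_to_list("a")` and `add_item_to_list("b")`, now return `["a"]` and `["b"]` rather than `["a"]` and `["a", "b"]`.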

def parse_user_input(data: str) -> dict:
"""Parse JSON user input."""
# BUG: No try-except - will crash on invalid JSON
parsed = json.loads(data)

🟡 error_handling (medium): Unhandled exception. The code doesn't handle potential json.JSONDecodeError exceptions that can occur if the input data is not valid JSON. This will cause the program to crash if invalid JSON is provided.
Fix: Wrap the json.loads() call in a try-except block to handle json.JSONDecodeError exceptions: try: parsed = json.loads(data) except json.JSONDecodeError: return {}
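The suggested fix returns `{}` on bad input, which makes invalid JSON look the same as an empty object. The sketch below shows an alternative that raises a `ValueError` instead. Which behavior fits depends on the callers, which the review does not show, so treat this as one option rather than the intended design.

```python
import json


def parse_user_input(data: str) -> dict:
    """Parse a JSON object from user input, raising ValueError on bad input."""
    try:
        parsed = json.loads(data)
    except json.JSONDecodeError as exc:
        # Re-raise with a short message; the original error stays chained.
        raise ValueError(f"invalid JSON input: {exc.msg}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    return parsed
```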

"""Read a file from the user uploads directory."""
# BUG: Path traversal - user can access any file with ../
base_path = "/var/uploads/"
file_path = base_path + filename # No sanitization!

🟠 security (high): Path traversal vulnerability. The code directly concatenates the user-provided filename to the base_path without any sanitization, allowing malicious users to access arbitrary files on the system by using path traversal sequences like ../ in the filename.
Fix: Sanitize the filename to prevent path traversal attacks: file_path = os.path.join(base_path, os.path.basename(filename))

if numbers[mid] == target:
return mid
elif numbers[mid] < target:
left = mid # BUG: Should be mid + 1

🟠 logic (high): Infinite loop risk. In the binary search algorithm, when numbers[mid] < target, the left pointer is updated to mid instead of mid + 1. This can lead to an infinite loop if the target is greater than numbers[mid] but less than or equal to numbers[right].
Fix: Update the left pointer to mid + 1: left = mid + 1

elif numbers[mid] < target:
left = mid # BUG: Should be mid + 1
else:
right = mid # BUG: Should be mid - 1

🟠 logic (high): Infinite loop risk. In the binary search algorithm, when numbers[mid] > target, the right pointer is updated to mid instead of mid - 1. This can lead to an infinite loop if the target is less than numbers[mid] but greater than or equal to numbers[left].
Fix: Update the right pointer to mid - 1: right = mid - 1
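With both pointer updates corrected, the search could look like the sketch below. The inclusive bounds, the `left <= right` loop condition, and the `-1` return for a missing target are assumptions, since the review shows only the branch bodies.

```python
from typing import List


def binary_search(numbers: List[int], target: int) -> int:
    """Return the index of target in sorted numbers, or -1 if absent."""
    left, right = 0, len(numbers) - 1
    while left <= right:
        mid = (left + right) // 2
        if numbers[mid] == target:
            return mid
        elif numbers[mid] < target:
            left = mid + 1   # mid is ruled out, so skip past it
        else:
            right = mid - 1  # mid is ruled out, so stop before it
    return -1
```

Because each iteration excludes `mid`, the window shrinks by at least one element every pass and the loop always terminates.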
