Test Files with seeded bugs for InspectAI Evaluation by hj2713 · Pull Request #19 · hj2713/InspectAI

hj2713 · 2025-12-10T12:07:44Z

Test Files with seeded bugs for InspectAI Evaluation

hj2713 · 2025-12-10T14:38:14Z

/inspectai_help

comse6998-inspectai · 2025-12-10T14:38:19Z

🤖 InspectAI Commands

Triggered by: @hj2713

Available Commands

| Command | Description |
| --- | --- |
| `/inspectai_review` | Quick Review - Reviews ONLY the changed lines in your PR. Posts inline comments on issues introduced by your changes. Fast and focused. |
| `/inspectai_bugs` | Deep Bug Scan - Analyzes entire files (not just diffs) for potential bugs, logic errors, and edge cases. More thorough but slower. |
| `/inspectai_refactor` | Refactor Suggestions - Suggests code improvements for readability, performance, and maintainability. |
| `/inspectai_security` | Security Audit - Scans for security vulnerabilities using 4 specialized sub-agents: Injection, Auth, Data Exposure, Dependencies. |
| `/inspectai_tests` | Test Generation - Generates unit tests for your changed code. |
| `/inspectai_docs` | Documentation - Generates/updates docstrings for changed Python files using Google-style format. |
| `/inspectai_help` | Help - Shows this message. |

Tips

🚀 Start with /inspectai_review for quick feedback on your changes
🐛 Use /inspectai_bugs when you want a deeper analysis of edge cases
🔐 Run /inspectai_security before merging code that handles user input or authentication
✅ Generate tests with /inspectai_tests to improve coverage

Feedback

React with 👍 or 👎 on any InspectAI comment to help improve future reviews!

InspectAI - Your AI Code Review Assistant

hj2713 · 2025-12-10T14:38:42Z

/inspectai_review

comse6998-inspectai

🔍 InspectAI Code Review

Triggered by: @hj2713
Files Reviewed: 2
Inline Comments: 29

I've added inline comments on the specific lines that need attention.
Only the changed lines in this PR were reviewed.

Use /inspectai_bugs to scan entire files for bugs.

comse6998-inspectai · 2025-12-10T14:39:08Z

+    """Read a file from the user uploads directory."""
+    # BUG: Path traversal - user can access any file with ../
+    base_path = "/var/uploads/"
+    file_path = base_path + filename  # No sanitization!


🟠 security (high): Path traversal vulnerability. The code directly concatenates the filename with the base_path without any sanitization, allowing malicious users to access arbitrary files on the system by using path traversal sequences like ../ in the filename.
Fix: Sanitize the filename to prevent path traversal attacks: file_path = os.path.join(base_path, os.path.basename(filename))
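
The suggested fix can be sketched as follows. This is a minimal illustration, assuming uploads live in a single flat directory; `safe_upload_path` is a hypothetical helper name, not a function from the PR. It applies `os.path.basename` as the comment suggests and then, as an extra check, confirms that the resolved path still lies inside the base directory.

```python
import os

BASE_PATH = "/var/uploads"  # base directory taken from the snippet above


def safe_upload_path(filename: str, base_path: str = BASE_PATH) -> str:
    """Return a path inside base_path or raise ValueError (hypothetical helper)."""
    # Drop any directory components, so "../../etc/passwd" becomes "passwd".
    name = os.path.basename(filename)
    if name in ("", ".", ".."):
        raise ValueError(f"invalid filename: {filename!r}")

    base = os.path.realpath(base_path)
    candidate = os.path.realpath(os.path.join(base, name))
    # Defense in depth: reject anything that resolves outside the base
    # directory, for example through a symlink.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes upload directory: {filename!r}")
    return candidate
```

Note that taking the basename silently maps a traversal attempt onto a file in the uploads directory; rejecting any filename that contains a path separator is a stricter alternative.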

comse6998-inspectai · 2025-12-10T14:39:08Z

+        if numbers[mid] == target:
+            return mid
+        elif numbers[mid] < target:
+            left = mid  # BUG: Should be mid + 1


🟠 logic (high): Infinite loop risk. In the binary search, when numbers[mid] < target, the left pointer is updated to mid instead of mid + 1. Once mid equals left (for example, when left and right are adjacent), this assignment no longer shrinks the search interval, so the loop never terminates.
Fix: Update the left pointer to mid + 1: left = mid + 1

hj2713 · 2025-12-10T14:41:11Z

/inspectai_bugs

comse6998-inspectai

🐛 InspectAI Bug Detection

Triggered by: @hj2713
Files Scanned: 2
Issues Found: 57

🔴 Critical: 13 | 🟠 High: 19 | 🟡 Medium: 22 | ⚪ Low: 3

I've added 57 inline comments on issues introduced by your changes.

comse6998-inspectai · 2025-12-10T14:41:47Z

+    """Render a user profile as HTML."""
+    # BUG: XSS vulnerability - user input not escaped
+    html = f"""
+    <div class="profile">


🟡 Unhandled Exception (medium): No exception handling for JSON parsing, which can cause the program to crash on invalid JSON input.
Fix: Add a try-except block to handle potential JSONDecodeError exceptions.

# BUG: XSS vulnerability - user input not escaped
html = f"""
<div class="profile">
    <h1>Welcome, {username}!</h1>
    <p class="bio">{bio}</p>

🟡 Unhandled Exception (medium): The code doesn't handle exceptions when parsing JSON user input, which can cause the program to crash on invalid JSON.
Fix: Use a try-except block to catch JSONDecodeError.

🟡 Runtime Issue (medium): The parse_user_input function does not handle potential json.JSONDecodeError exceptions that can occur when parsing invalid JSON data. This can cause the program to crash if the input is not valid JSON.
Fix: Wrap the json.loads call in a try-except block to catch json.JSONDecodeError and handle it appropriately (e.g., return an error message or a default value).

hj2713 · 2025-12-10T14:46:39Z

/inspectai_security

comse6998-inspectai

🔒 InspectAI Security Scan

Triggered by: @hj2713
Files Scanned: 2
Vulnerabilities Found: 16
Risk Score: 🔴 8.7/10

🔴 Critical: 9 | 🟠 High: 7

I've added 16 inline comments on potential security issues.

Use /inspectai_review for code review or /inspectai_bugs for bug detection.

comse6998-inspectai · 2025-12-10T14:46:53Z

+    # User input passed directly to shell command
+    # =========================================================================
+    def ping_server(self, hostname: str) -> str:
+        """Ping a server to check if it's online."""


🔴 🔒 Injection Vulnerability (critical)

The ping_server function takes a hostname as input and passes it directly to the subprocess.run function with shell=True. This allows an attacker to inject arbitrary shell commands by manipulating the hostname. For example, a hostname like "127.0.0.1; rm -rf /" would execute the rm -rf / command on the server.

Remediation: Sanitize the hostname input before passing it to the subprocess.run function. Use shlex.quote to escape the hostname, or avoid shell=True and pass the command as a list of arguments.
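
A minimal sketch of the second remediation (no shell, arguments passed as a list) is shown below. The allow-list regex and the `build_ping_command` helper are assumptions introduced for illustration; the PR itself only contains the vulnerable version.

```python
import re
import subprocess
from typing import List

# Conservative allow-list for hostnames and IPv4 addresses (an assumption;
# widen it if IPv6 or internationalized names are needed).
_HOSTNAME_RE = re.compile(r"[A-Za-z0-9]([A-Za-z0-9.-]{0,251}[A-Za-z0-9])?")


def build_ping_command(hostname: str) -> List[str]:
    """Validate the hostname and return an argument list (hypothetical helper)."""
    if not _HOSTNAME_RE.fullmatch(hostname):
        raise ValueError(f"invalid hostname: {hostname!r}")
    # The list is handed to the program directly; no shell parses it, so
    # characters such as ';' or '|' carry no special meaning.
    return ["ping", "-c", "1", hostname]


def ping_server(hostname: str) -> str:
    """Ping a host once and return the captured standard output."""
    result = subprocess.run(
        build_ping_command(hostname),
        capture_output=True,
        text=True,
        timeout=10,  # avoid hanging on unreachable hosts
    )
    return result.stdout
```

Requiring the first character to be alphanumeric also blocks option injection, where a value such as `-f` would otherwise be read by `ping` as a flag.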

comse6998-inspectai · 2025-12-10T14:46:53Z

+    """Fetch a user from the database by their ID."""
+    conn = sqlite3.connect("users.db")
+    cursor = conn.cursor()
+


🟠 🔒 Dependency/Library Security (high)

SQL Injection vulnerability due to direct string interpolation in the SQL query.

Remediation: Use parameterized queries to prevent SQL injection. For example: query = "SELECT * FROM users WHERE id = ?"; cursor.execute(query, (user_id,))

🟠 🔒 Injection Vulnerability (high)

The get_user_by_id function constructs a raw SQL query using f-strings and directly interpolates the user_id parameter without any sanitization or parameterization. This allows an attacker to inject arbitrary SQL code by providing a malicious user_id value, potentially compromising the entire database.

Remediation: Use parameterized queries to prevent SQL injection. Replace the line query = f"SELECT * FROM users WHERE id = '{user_id}'" with query = "SELECT * FROM users WHERE id = ?"; cursor.execute(query, (user_id,))
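
The parameterized form can be sketched as below. For the sake of a self-contained example, this version takes an open connection as an argument and uses an in-memory database with an assumed two-column `users` table; neither detail comes from the PR.

```python
import sqlite3
from typing import Optional


def get_user_by_id(conn: sqlite3.Connection, user_id: str) -> Optional[dict]:
    """Fetch a user by ID using a bound parameter instead of string formatting."""
    cursor = conn.execute(
        "SELECT id, name FROM users WHERE id = ?",  # '?' is a placeholder
        (user_id,),  # the driver binds the value; it is never parsed as SQL
    )
    row = cursor.fetchone()
    return {"id": row[0], "name": row[1]} if row else None


# Usage with an in-memory database (the table layout is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("123", "John Doe"))

print(get_user_by_id(conn, "123"))          # {'id': '123', 'name': 'John Doe'}
print(get_user_by_id(conn, "' OR '1'='1"))  # None: the payload is just a string
```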

hj2713 · 2025-12-10T14:53:46Z

/inspectai_docs

comse6998-inspectai · 2025-12-10T14:54:42Z

📚 InspectAI Documentation Generator

Triggered by: @hj2713
Files Processed: 2
Files with New Documentation: 2

Updated Files with Docstrings

📝 tests/benchmark/seeded_bugs_api.py

"""
BENCHMARK TEST FILE #2 - API/Web Application Bugs
=================================================
This file simulates a web API with common security and logic bugs.
Contains 10 additional seeded bugs.

DO NOT FIX THESE BUGS - They are intentional for benchmarking purposes.
=================================================
"""

from typing import Optional, Dict, List, Any
import re
import pickle
import subprocess


class UserService:
    """Service for managing users."""
    
    def __init__(self):
        """Initializes the UserService with empty user and session dictionaries."""
        self.users: Dict[str, Dict] = {}
        self.session_tokens: Dict[str, str] = {}
    
    # =========================================================================
    # BUG #16: Command Injection (SECURITY - CRITICAL)
    # User input passed directly to shell command
    # =========================================================================
    def ping_server(self, hostname: str) -> str:
        """Ping a server to check if it's online.

        Args:
            hostname (str): The hostname or IP address to ping.

        Returns:
            str: The output of the ping command.

        Note:
            subprocess.run is called without check=True, so a non-zero exit
            code does not raise subprocess.CalledProcessError; the captured
            stdout is returned either way.

        Example:
            >>> service = UserService()
            >>> result = service.ping_server("127.0.0.1")
            >>> print(result)
            PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
            64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.024 ms

            --- 127.0.0.1 ping statistics ---
            1 packets transmitted, 1 received, 0% packet loss, time 0ms
            rtt min/avg/max/mdev = 0.024/0.024/0.024/0.000 ms
        """
        # BUG: Command injection - hostname not sanitized
        result = subprocess.run(
            f"ping -c 1 {hostname}",
            shell=True,  # BUG: shell=True with user input
            capture_output=True,
            text=True
        )
        return result.stdout
    
    # =========================================================================
    # BUG #17: Insecure Deserialization (SECURITY - CRITICAL)
    # Using pickle to deserialize untrusted data
    # =========================================================================
    def load_user_preferences(self, data: bytes) -> dict:
        """Load user preferences from serialized data.

        Args:
            data (bytes): The serialized data containing user preferences.

        Returns:
            dict: The deserialized user preferences.

        Raises:
            pickle.UnpicklingError: If the data cannot be deserialized.
            Exception: Any other exception raised by the pickle.loads call.

        Example:
            >>> import pickle
            >>> service = UserService()
            >>> data = pickle.dumps({"theme": "dark", "notifications": True})
            >>> preferences = service.load_user_preferences(data)
            >>> print(preferences)
            {'theme': 'dark', 'notifications': True}
        """
        # BUG: Pickle deserialization of untrusted data - RCE vulnerability
        return pickle.loads(data)
    
    # =========================================================================
    # BUG #18: Broken Authentication (SECURITY - HIGH)
    # Weak session token generation
    # =========================================================================
    def create_session(self, user_id: str) -> str:
        """Create a session token for a user.

        Args:
            user_id (str): The ID of the user.

        Returns:
            str: The generated session token.

        Example:
            >>> service = UserService()
            >>> token = service.create_session("testuser")
            >>> print(token)
            testuser_1678886400  # Example output, timestamp will vary
        ""

... truncated (full file is 15844 chars)

📝 tests/benchmark/seeded_bugs_python.py

"""
BENCHMARK TEST FILE - Contains intentionally seeded bugs for testing InspectAI
================================================================================
This file contains 15 seeded bugs across different categories:
- Security vulnerabilities (SQL injection, hardcoded secrets, XSS)
- Logic errors (off-by-one, wrong operator, missing return)
- Null/None handling issues
- Resource leaks
- Race conditions
- Type errors

DO NOT FIX THESE BUGS - They are intentional for benchmarking purposes.
================================================================================
"""

import os
import sqlite3
import hashlib
import threading
from typing import List, Optional, Dict, Any


# =============================================================================
# BUG #1: SQL Injection Vulnerability (SECURITY - HIGH)
# The user_id is directly interpolated into the SQL query
# =============================================================================
def get_user_by_id(user_id: str) -> dict:
    """Fetch a user from the database by their ID.

    Args:
        user_id (str): The ID of the user to fetch.  This is vulnerable to SQL injection.

    Returns:
        dict: A dictionary containing the user's ID and name, or None if the user is not found.
              Example: {"id": "123", "name": "John Doe"}
        
    Raises:
        sqlite3.Error: If there is an error executing the SQL query.
    """
    conn = sqlite3.connect("users.db")
    cursor = conn.cursor()
    
    # BUG: SQL Injection - user_id is not parameterized
    query = f"SELECT * FROM users WHERE id = '{user_id}'"
    cursor.execute(query)
    
    result = cursor.fetchone()
    conn.close()
    return {"id": result[0], "name": result[1]} if result else None


# =============================================================================
# BUG #2: Hardcoded Secret/API Key (SECURITY - CRITICAL)
# API keys should never be hardcoded in source code
# =============================================================================
API_KEY = "sk-live-abc123def456ghi789jkl012mno345pqr678"
DATABASE_PASSWORD = "super_secret_password_123!"

def make_api_request(endpoint: str) -> dict:
    """Make an authenticated API request.

    Args:
        endpoint (str): The API endpoint to request.

    Returns:
        dict: A dictionary containing the status and endpoint of the request.
              Example: {"status": "ok", "endpoint": "/users"}
    
    Raises:
        Exception: If the API request fails (simulated).
    """
    headers = {"Authorization": f"Bearer {API_KEY}"}
    # Simulated request
    return {"status": "ok", "endpoint": endpoint}


# =============================================================================
# BUG #3: Off-by-One Error (LOGIC - MEDIUM)
# Loop should use range(len(items)) or enumerate, not len(items) + 1
# =============================================================================
def process_items(items: List[str]) -> List[str]:
    """Process each item in the list.

    Args:
        items (List[str]): A list of strings to process.

    Returns:
        List[str]: A list of the processed strings (uppercase).

    Raises:
        IndexError: If the loop iterates beyond the bounds of the list.
    """
    results = []
    
    # BUG: Off-by-one - will cause IndexError on last iteration
    for i in range(len(items) + 1):
        results.append(items[i].upper())
    
    return results


# =============================================================================
# BUG #4: Missing Null Check (LOGIC - MEDIUM)
# Accessing attributes without checking if object is None
# =============================================================================
def get_user_email(user: Optional[Dict]) -> str:
    """Get the user's email address.

    Args:
        user (Optional[Dict]): A dictionary containing user information, including the email address.
                             Can be None.

    Returns:
        str: The user's email address i

... truncated (full file is 15141 chars)

Review the generated docstrings and apply them to your codebase.

hj2713 · 2025-12-16T02:43:43Z

/inspectai_help

comse6998-inspectai · 2025-12-16T02:43:47Z

🤖 InspectAI Commands

Triggered by: @hj2713


hj2713 · 2025-12-16T03:18:40Z

/inspectai_review

comse6998-inspectai

🔍 InspectAI Code Review

Triggered by: @hj2713
Files Reviewed: 2
Inline Comments: 29

I've added inline comments on the specific lines that need attention.
Only the changed lines in this PR were reviewed.

Use /inspectai_bugs to scan entire files for bugs.

comse6998-inspectai · 2025-12-16T03:19:01Z

+    def ping_server(self, hostname: str) -> str:
+        """Ping a server to check if it's online."""
+        # BUG: Command injection - hostname not sanitized
+        result = subprocess.run(


🔴 security (critical): Command injection vulnerability. User-provided hostname is directly injected into the ping command without proper sanitization. The shell=True argument exacerbates the risk.
Fix: Use subprocess.run with a list of arguments and avoid shell=True. Sanitize the hostname before passing it to the command.

comse6998-inspectai · 2025-12-16T03:19:01Z

+    def load_user_preferences(self, data: bytes) -> dict:
+        """Load user preferences from serialized data."""
+        # BUG: Pickle deserialization of untrusted data - RCE vulnerability
+        return pickle.loads(data)


🔴 security (critical): Insecure deserialization vulnerability. The pickle.loads function is used to deserialize data without any validation, allowing for arbitrary code execution if the data is malicious.
Fix: Avoid using pickle.loads with untrusted data. Use a safer serialization format like JSON or implement proper input validation and sanitization.
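
As a rough sketch of the JSON alternative, assuming preferences are a flat JSON object sent as UTF-8 bytes (the PR does not specify the wire format):

```python
import json


def load_user_preferences(data: bytes) -> dict:
    """Parse preferences from JSON bytes; JSON parsing never executes code."""
    try:
        prefs = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("preferences are not valid JSON") from exc
    # Only accept a JSON object, since callers expect a dict.
    if not isinstance(prefs, dict):
        raise ValueError("preferences must be a JSON object")
    return prefs
```

Unlike pickle, JSON can only produce plain data types (dict, list, str, numbers, bool, None), so a malicious payload cannot trigger code execution during loading.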

comse6998-inspectai · 2025-12-16T03:19:01Z

+        """Create a session token for a user."""
+        # BUG: Predictable session token based on user_id
+        import time
+        token = f"{user_id}_{int(time.time())}"  # Easily guessable!


🟠 security (high): Broken authentication due to weak session token generation. The session token is predictable as it's based on the user ID and current timestamp, making it vulnerable to session hijacking.
Fix: Use a cryptographically secure random number generator to generate unpredictable session tokens.
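
One way to follow this advice is Python's standard `secrets` module. The sketch below is an assumption about how the service might store tokens (a plain dict mapping token to user ID, passed in explicitly); the original method body is only partly visible in the PR.

```python
import secrets
from typing import Dict


def create_session(user_id: str, session_tokens: Dict[str, str]) -> str:
    """Create an unpredictable session token and record its owner."""
    # 32 random bytes from the OS CSPRNG, URL-safe base64 encoded.
    token = secrets.token_urlsafe(32)
    # Assumed storage layout: token -> user_id.
    session_tokens[token] = user_id
    return token
```

The token carries no user ID or timestamp, so knowing who logged in and when gives an attacker nothing to guess from.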

comse6998-inspectai · 2025-12-16T03:19:01Z

+    def delete_user(self, target_user_id: str, requesting_user_id: str) -> bool:
+        """Delete a user account."""
+        # BUG: No authorization check - any user can delete any user
+        if target_user_id in self.users:


🟠 security (high): Missing authorization check. Any user can delete any other user's account because there is no validation to ensure the requesting user has the necessary permissions.
Fix: Implement an authorization check to ensure that only authorized users can delete accounts. Verify that the requesting_user_id has permission to delete the target_user_id.
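
A possible shape for such a check is sketched below. The policy (users may delete their own account, members of an `admins` set may delete anyone) and the standalone function signature are assumptions made for illustration; the PR does not define a permission model.

```python
from typing import Dict, Set


def delete_user(
    users: Dict[str, dict],
    admins: Set[str],  # hypothetical set of privileged user IDs
    target_user_id: str,
    requesting_user_id: str,
) -> bool:
    """Delete target_user_id if the requester is that user or an admin."""
    is_self = requesting_user_id == target_user_id
    is_admin = requesting_user_id in admins
    if not (is_self or is_admin):
        raise PermissionError(
            f"{requesting_user_id!r} may not delete {target_user_id!r}"
        )
    if target_user_id in users:
        del users[target_user_id]
        return True
    return False
```

The check runs before the existence test, so an unauthorized caller also cannot use the return value to probe which user IDs exist.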

comse6998-inspectai · 2025-12-16T03:19:01Z

+    def validate_input(self, text: str) -> bool:
+        """Validate user input format."""
+        # BUG: ReDoS - evil regex with nested quantifiers
+        pattern = r"^(a+)+$"


🟡 security (medium): ReDoS (Regular Expression Denial of Service) vulnerability. The regex pattern ^(a+)+$ is vulnerable to catastrophic backtracking, which can cause the application to hang or crash when processing specially crafted input.
Fix: Use a more efficient regex pattern or limit the input length to prevent excessive backtracking.
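
Both suggestions can be combined, as in the sketch below. The pattern `^a+$` accepts the same strings as `^(a+)+$` but has no nested quantifier, so a failed match cannot trigger exponential backtracking. The length cap `MAX_INPUT_LENGTH` is a hypothetical value and does change behavior for very long inputs.

```python
import re

MAX_INPUT_LENGTH = 1024  # hypothetical cap; choose a limit that fits the API

# Equivalent to ^(a+)+$ but without the nested quantifier.
_PATTERN = re.compile(r"^a+$")


def validate_input(text: str) -> bool:
    """Validate user input format without catastrophic backtracking."""
    if len(text) > MAX_INPUT_LENGTH:
        return False
    return _PATTERN.match(text) is not None
```

With the original pattern, an input such as thirty or more `a` characters followed by `!` can take a very long time to reject; with the flattened pattern the match fails after a single linear scan.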

comse6998-inspectai · 2025-12-16T03:19:02Z

+# BUG #12: Mutable Default Argument (LOGIC - MEDIUM)
+# Using mutable default argument causes unexpected behavior
+# =============================================================================
+def add_item_to_list(item: str, item_list: List[str] = []) -> List[str]:


🟡 logic (medium): Mutable default argument. The item_list argument has a mutable default value ([]), which is shared across all calls to the function without an explicit item_list argument. This can lead to unexpected behavior where the list accumulates items from previous calls.
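Fix: Use None as the default value and create a new list inside the function only when item_list is None. Prefer an explicit "is None" check over item_list = item_list or [], because the latter also replaces an empty list passed in by the caller. The signature becomes: def add_item_to_list(item: str, item_list: Optional[List[str]] = None) -> List[str]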
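
Spelled out, the None-default idiom looks like this. The sketch uses an explicit `is None` check rather than `item_list or []`, so that an empty list supplied by the caller is still the list that gets mutated.

```python
from typing import List, Optional


def add_item_to_list(item: str, item_list: Optional[List[str]] = None) -> List[str]:
    """Append item to item_list, creating a fresh list when none is given."""
    if item_list is None:
        item_list = []  # a new list on every call, never shared
    item_list.append(item)
    return item_list


print(add_item_to_list("a"))  # ['a']
print(add_item_to_list("b"))  # ['b'], not ['a', 'b']
```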

comse6998-inspectai · 2025-12-16T03:19:02Z

+def parse_user_input(data: str) -> dict:
+    """Parse JSON user input."""
+    # BUG: No try-except - will crash on invalid JSON
+    parsed = json.loads(data)


🟡 error_handling (medium): Unhandled exception. The code doesn't handle potential json.JSONDecodeError exceptions that can occur if the input data is not valid JSON. This will cause the program to crash if invalid JSON is provided.
Fix: Wrap the json.loads() call in a try-except block to handle json.JSONDecodeError exceptions: try: parsed = json.loads(data) except json.JSONDecodeError: return {}
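
Laid out as a full function, the suggested handling might look like the sketch below. Returning an empty dict follows the bot's suggestion; the warning log is an addition for illustration, and whether callers would be better served by a raised error is a design choice the PR does not settle.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_user_input(data: str) -> dict:
    """Parse JSON user input, returning {} when the input is not valid JSON."""
    try:
        parsed = json.loads(data)
    except json.JSONDecodeError as exc:
        # Record the failure so that bad input is not silently swallowed.
        logger.warning("invalid JSON input: %s", exc.msg)
        return {}
    return parsed
```

A caller cannot tell an empty object apart from a parse failure with this approach, which is the main trade-off against re-raising as a ValueError.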

comse6998-inspectai · 2025-12-16T03:19:02Z

+    """Read a file from the user uploads directory."""
+    # BUG: Path traversal - user can access any file with ../
+    base_path = "/var/uploads/"
+    file_path = base_path + filename  # No sanitization!


🟠 security (high): Path traversal vulnerability. The code directly concatenates the user-provided filename to the base_path without any sanitization, allowing malicious users to access arbitrary files on the system by using path traversal sequences like ../ in the filename.
Fix: Sanitize the filename to prevent path traversal attacks: file_path = os.path.join(base_path, os.path.basename(filename))

comse6998-inspectai · 2025-12-16T03:19:02Z

+        if numbers[mid] == target:
+            return mid
+        elif numbers[mid] < target:
+            left = mid  # BUG: Should be mid + 1


🟠 logic (high): Infinite loop risk. In the binary search algorithm, when numbers[mid] < target, the left pointer is updated to mid instead of mid + 1. This can lead to an infinite loop if the target is greater than numbers[mid] but less than or equal to numbers[right].
Fix: Update the left pointer to mid + 1: left = mid + 1

comse6998-inspectai · 2025-12-16T03:19:02Z

+        elif numbers[mid] < target:
+            left = mid  # BUG: Should be mid + 1
+        else:
+            right = mid  # BUG: Should be mid - 1


🟠 logic (high): Infinite loop risk. In the binary search algorithm, when numbers[mid] > target, the right pointer is updated to mid instead of mid - 1. This can lead to an infinite loop if the target is less than numbers[mid] but greater than or equal to numbers[left].
Fix: Update the right pointer to mid - 1: right = mid - 1
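
With both pointer updates corrected, a standard binary search reads as follows. The loop condition, the inclusive bounds, and the `-1` return value for a missing target are assumptions, since the PR excerpt shows only the branch bodies.

```python
from typing import List


def binary_search(numbers: List[int], target: int) -> int:
    """Return the index of target in the sorted list numbers, or -1."""
    left, right = 0, len(numbers) - 1  # inclusive bounds
    while left <= right:
        mid = (left + right) // 2
        if numbers[mid] == target:
            return mid
        elif numbers[mid] < target:
            left = mid + 1   # skip mid: it is already known to be too small
        else:
            right = mid - 1  # skip mid: it is already known to be too large
    return -1


print(binary_search([1, 3, 5, 7, 9], 7))  # 3
print(binary_search([1, 3, 5, 7, 9], 4))  # -1
```

Because each iteration excludes `mid` from the next interval, the interval shrinks by at least one element per step and the loop is guaranteed to terminate.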

Add benchmark test files with 27 seeded bugs for InspectAI evaluation

d259460
