
Add factorial function#7

Closed
Sekiro4321 wants to merge 6 commits into main from feature/new-function

Conversation

@Sekiro4321
Owner

Testing AI test generation

@github-actions

🤖 AI Code Review

Here's a detailed review of the provided code diff:

1. .github/workflows/test-gen.yml

  • Severity: MEDIUM

    • Description: The "Commit generated tests" step contains a duplicate block of git add tests/, git commit, and git push commands. The second identical block will effectively re-execute the same actions immediately after the first, which is redundant and likely unintended.
    • Suggested fix: Remove the duplicate block.
    -          git add tests/
    -          git commit -m "Auto-generate tests for updated Python files [skip ci]"
    -          git push
  • Severity: MEDIUM

    • Description: The workflow builds its changed-file list with git diff --name-only origin/${{ github.base_ref }}..HEAD -- '*.py' | grep -v '^tests/' | tr '\n' ' ' and passes the result straight to python scripts/generate_tests.py as command-line arguments. Python's sys.argv does not perform shell expansion, but the shell splits the list on whitespace before the script ever sees it, so filenames containing spaces, quotes, or newlines will be mangled, and such paths could become dangerous if the script ever forwarded them to subprocess.run(shell=True) (generate_tests.py currently does not).
    • Suggested fix: Pass the paths through a channel that preserves them exactly, e.g. git diff -z --name-only piped through xargs -0 or into the script's stdin, or write them to a temporary file. Independently, ensure generate_tests.py tolerates unusual paths (quotes, spaces, newlines) and never re-interprets them as shell commands.
  • Severity: LOW

    • Description: The fetch-depth: 0 option will fetch the entire git history, which can be slow for very large repositories. For a PR-based workflow that only needs to compare the current branch to its base, a smaller fetch depth is often sufficient.
    • Suggested fix: If full history is not required, a shallower fetch (e.g. fetch-depth: 50) speeds up checkout on large repositories. Note, however, that git diff origin/${{ github.base_ref }}..HEAD needs the merge base of the two branches available locally, so fetch-depth: 0 is the safe default; treat this as an optimization only if checkout time becomes a problem.
  • Severity: LOW

    • Description: The generated tests are committed with [skip ci]. While this prevents an infinite loop of CI triggers, it means the generated tests themselves are never actually run by CI when they are first introduced. This reduces the immediate value and trustworthiness of the auto-generated tests.
    • Suggested fix: Consider a two-step approach:
      1. Generate tests and commit with [skip ci].
      2. Optionally, have a separate scheduled workflow or a separate job within the PR workflow (that doesn't trigger the [skip ci] mechanism) that runs all tests, including the newly generated ones, on a merge to the main branch or periodically. Or, alternatively, if the generation logic is very stable, accept that they'll be tested on subsequent PRs that include those files.
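
To make the file-list concern from the second finding concrete, here is a minimal Python sketch (a hypothetical helper, not part of this repo) for consuming NUL-separated output from git diff -z --name-only, which survives spaces, quotes, and newlines in filenames:

```python
def read_nul_separated_paths(data: bytes) -> list[str]:
    """Split raw `git diff -z --name-only` output into individual paths.

    NUL is the only byte that cannot appear in a filename, so splitting
    on b"\\0" is safe even for paths containing spaces or newlines.
    """
    return [chunk.decode("utf-8") for chunk in data.split(b"\0") if chunk]
```

The workflow step would then pipe `git diff -z ... | python scripts/generate_tests.py`, and the script would call this helper on `sys.stdin.buffer.read()` instead of trusting shell-split `sys.argv`.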

2. app.py

  • Severity: LOW
    • Description: The file app.py is missing a newline at the end. POSIX defines a line as ending in a newline, so some tools (diff, cat, several linters) warn on or mishandle files without one.
    • Suggested fix: Add a newline character at the end of the file.
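
This check is easy to automate; the sketch below (a hypothetical helper) flags non-empty files that lack a final newline:

```python
from pathlib import Path

def missing_trailing_newline(path: str) -> bool:
    """Return True if the file is non-empty and does not end with a newline."""
    data = Path(path).read_bytes()
    return bool(data) and not data.endswith(b"\n")
```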

3. scripts/dangerous.py

  • Severity: HIGH

    • Description: The run_command function uses subprocess.call(user_input, shell=True). When shell=True is used with user-provided input, it creates a severe command injection vulnerability. An attacker can craft the user_input string to execute arbitrary commands on the system where this script runs, potentially leading to remote code execution, data exfiltration, or system compromise.
    • Suggested fix: Never use shell=True with untrusted or user-supplied input. If you must execute an external command, pass the command and its arguments as a list to subprocess.call (or subprocess.run), e.g., subprocess.call(["command", "arg1", "arg2"]). If a shell is absolutely required, carefully sanitize and quote all user input using shlex.quote().
      import subprocess
      # import shlex # if shell is absolutely necessary and input must be quoted
      
      def run_command_safe(user_input_list): # Expect input as a list of command and arguments
          # Example for a simple command, assuming user_input_list is ["echo", "hello world"]
          subprocess.run(user_input_list, check=True) # Use subprocess.run for modern Python, check=True for error handling
      
      # If you truly need to use a shell, e.g., for pipes, and must incorporate user input:
      # def run_command_shell_safe(command_template, user_arg):
      #     # Example: command_template = "grep {} /var/log/syslog"
      #     # user_arg = "malicious_input"
      #     # This is still dangerous, prefer list-based args if possible.
      #     # shlex.quote() can help, but it's complex to get right for full commands.
      #     safe_arg = shlex.quote(user_arg)
      #     full_command = command_template.format(safe_arg)
      #     subprocess.run(full_command, shell=True, check=True)
    • Additional Context: This script is named dangerous.py, which suggests awareness of its nature. If this file is intentionally dangerous for testing purposes, it should be clearly documented as such and excluded from any build/deployment processes.
  • Severity: HIGH

    • Description: The API_KEY is hardcoded directly into the script (API_KEY = "sk-live-abc123def456"). Hardcoding sensitive credentials, especially "live" ones, into source code is a major security vulnerability. It exposes the API key to anyone with access to the repository, including its history. This key could be compromised if the repository is ever publicly exposed or accessed by an unauthorized individual.
    • Suggested fix: Never hardcode API keys or other sensitive credentials directly in source code. Instead, use environment variables, a secure secret management system (e.g., HashiCorp Vault, AWS Secrets Manager), or configuration files that are explicitly excluded from version control (e.g., using .gitignore). For development, a .env file loaded via python-dotenv is common. In the context of the GitHub Actions workflow, the GEMINI_API_KEY is already correctly handled as a GitHub secret, so the API_KEY in this file is a separate issue.
      import os
      
      # Instead of: API_KEY = "sk-live-abc123def456"
      API_KEY = os.environ.get("MY_API_KEY")
      if not API_KEY:
          raise ValueError("MY_API_KEY environment variable not set.")
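
To make the list-based execution and shlex-quoting advice above concrete, a short sketch (command names here are illustrative): argv-style execution avoids the shell entirely, and quoting every argument is the fallback when a shell is unavoidable.

```python
import shlex
import subprocess

def run_user_command(argv: list[str]) -> str:
    """Run a user-influenced command without any shell interpretation."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout

def shell_quote(argv: list[str]) -> str:
    """Quote every argument so the joined string is safe for shell=True."""
    # Python 3.8+; equivalent to " ".join(map(shlex.quote, argv))
    return shlex.join(argv)
```

With this, an injection attempt like `"hi; rm -rf /"` stays a single literal argument instead of becoming a second command.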

4. scripts/generate_tests.py

  • Severity: HIGH

    • Description: The extract_functions function has a critical bug. The functions.append(...) statement is outside the if isinstance(node, ast.FunctionDef): block. This means func_name, args, docstring, and source_code will only be assigned if a FunctionDef node is encountered. If no functions are found or if other nodes are iterated over before a function, these variables will not be defined, leading to an UnboundLocalError. Furthermore, if multiple functions exist, ast.walk will iterate over all nodes, and the append will repeatedly add the last processed function's data for every single node in the AST, leading to incorrect and duplicated function information.
    • Suggested fix: Move the functions.append(...) statement inside the if isinstance(node, ast.FunctionDef): block so that it only appends a function's data when a function definition node is actually found and its variables are correctly assigned.
      def extract_functions(file_path):
          """parse the python file and extract function definitions""" 
          with open(file_path, 'r') as f: 
              source = f.read()
          
          tree = ast.parse(source)
          functions = []
      
          for node in ast.walk(tree):
              if isinstance(node, ast.FunctionDef): 
                  func_name = node.name
                  args = [arg.arg for arg in node.args.args]
                  docstring = ast.get_docstring(node) or ""
                  source_code = ast.get_source_segment(source, node)
                  functions.append({ # Moved inside the if block
                      'name': func_name,
                      'args': args,
                      'docstring': docstring,
                      'source_code': source_code
                  })
          return functions
  • Severity: MEDIUM

    • Description: The generate_tests_for_functions function directly writes the raw text output from the Gemini LLM to a file without any validation, sanitization, or parsing. If the LLM generates malicious code, non-Python code, or code that significantly deviates from the prompt's requirements (e.g., introducing os.system calls), this code will be committed to the repository. This represents a supply chain risk, as an attacker could potentially "prompt-inject" the LLM to generate harmful code.
    • Suggested fix: Implement validation steps for the LLM output:
      1. Parse and validate: Attempt to parse the generated test code using ast.parse to ensure it's syntactically valid Python.
      2. Lint/Format: Run a linter (e.g., Black, Flake8) over the generated code.
      3. Keyword filtering (optional but good for high-risk scenarios): Implement basic keyword filtering for highly dangerous functions (e.g., os.system, subprocess.run(shell=True), eval, exec) if the risk model for your LLM prompts necessitates it.
      4. Review (manual or automated): The generated tests are not run, only committed. While [skip ci] prevents an immediate run, someone should review the code before merge. If fully automated, consider a separate job that does run the generated tests to catch errors.
  • Severity: MEDIUM

    • Description: The main function overwrites the same file, tests/test_generated.py, every time it runs. This means if the workflow is triggered multiple times for different PRs, or even for the same PR with different changes, any previously generated tests (even from other files or runs) will be deleted and replaced. This makes the generated tests volatile and prevents accumulation of test coverage.
    • Suggested fix: Instead of overwriting a single file, consider generating a unique test file for each Python file that had changes, or for each run.
      • Option A (Recommended): Create tests/test_<original_file_name>_generated.py.
        # Inside main() loop for file_path
        # ...
        if all_tests:
            os.makedirs('tests', exist_ok=True)
            # Create a unique test file name based on the original file
            # Replace dots and slashes to make it a valid filename
            base_name = os.path.basename(file_path).replace('.py', '')
            test_file_name = f"test_{base_name}_generated.py"
            test_file_path = os.path.join('tests', test_file_name)
        
            with open(test_file_path, 'w') as f:
                f.write("import pytest\n")
                # Import the module under test (e.g. "import app" for app.py);
                # the original f-string produced an invalid "from ..app.py" import.
                module_name = file_path[:-3].replace('/', '.')  # strip ".py" suffix
                f.write(f"import {module_name}\n\n")
                f.write("\n\n".join(all_tests))
            print(f"Generated tests written to: {test_file_path}")
            all_tests = []  # Clear for next file
      • Option B: Append to test_generated.py (less ideal as it could lead to very large files and duplicate imports).
      • Option C: Use a timestamped file name (tests/test_generated_<timestamp>.py) and potentially clean up old files.
  • Severity: LOW

    • Description: The script assumes GEMINI_API_KEY is present in the environment when genai.Client() is called. If the script were run outside the GitHub Actions environment without this environment variable set, it would raise a google.api_core.exceptions.GoogleAPIError or similar, potentially causing a less graceful failure.
    • Suggested fix: Explicitly check for the GEMINI_API_KEY environment variable and provide a clear error message if it's missing.
      import os
      from google import genai
      # ...
      
      def generate_tests_for_functions(func_info):
          """Use Gemini to generate pytest tests for a function"""
          api_key = os.environ.get("GEMINI_API_KEY")
          if not api_key:
              raise ValueError("GEMINI_API_KEY environment variable is not set.")
          
          client = genai.Client(api_key=api_key)  # same client style the script already uses
      
          # ... build the prompt as before ...
          response = client.models.generate_content(
              model="gemini-2.5-flash",
              contents=prompt,
          )
          return response.text
  • Severity: LOW

    • Description: The generate_tests_for_functions function lacks robust error handling for API calls. Network issues, rate limiting, or unexpected responses from the Gemini API could cause the script to crash or return empty/malformed results without proper logging or retry mechanisms.
    • Suggested fix: Wrap the API call in a try-except block to catch potential exceptions (e.g., google.api_core.exceptions.ClientError, requests.exceptions.RequestException). Implement basic retry logic for transient errors.
      from google.genai import errors
      import time
      
      def generate_tests_for_functions(func_info):
          # ... API key check and client setup ...
          
          retries = 3
          for attempt in range(retries):
              try:
                  response = client.models.generate_content(
                      model="gemini-2.5-flash", contents=prompt
                  )
                  # You might also want to check response.candidates for safety
                  # ratings and whether text exists before returning it.
                  return response.text
              except errors.APIError as e:
                  if e.code == 429:  # rate limited: back off and retry
                      wait = 2 ** (attempt + 1)
                      print(f"Rate limit hit, retrying in {wait} seconds... ({e})")
                      time.sleep(wait)
                  else:
                      print(f"API error during test generation for {func_info['name']}: {e}")
                      return f"# ERROR: Failed to generate tests due to API error: {e}"
              except Exception as e:
                  print(f"Unexpected error during test generation: {e}")
                  return f"# ERROR: Unexpected error generating tests: {e}"
          return "# ERROR: Failed to generate tests after multiple retries."

Severity Summary:

SEVERITY_SUMMARY: CRITICAL


Powered by Gemini AI

@github-actions github-actions bot added the ai-review: critical AI review found critical issues label Feb 12, 2026
@Sekiro4321 Sekiro4321 closed this Feb 12, 2026