Automated nightly codebase scanner #1573
base: main
Conversation
Might be good to try out some alloy models in this and other actions; it could give the agents deeper perspective and better overall results.
Adds a read-only agent that scans the codebase daily and creates GitHub issues for security vulnerabilities, bugs, and documentation gaps.

- Runs daily at 6am UTC (or manual trigger)
- Creates at most 2 issues per run to avoid flooding
- Deduplicates against existing open issues with the 'nightly-scan' label
- Dry-run mode for testing
- Strong anti-hallucination rules (read-only, must verify files exist)
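A minimal sketch of the trigger block this describes (the `dry-run` input name matches the `inputs.dry-run` reference later in the workflow; the description and default are assumptions):

```yaml
on:
  schedule:
    - cron: "0 6 * * *"   # daily at 6am UTC
  workflow_dispatch:
    inputs:
      dry-run:
        description: "Report findings without creating issues"
        type: boolean
        default: false
```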
- Root agent (claude-sonnet) orchestrates the scan across sub-agents
- Security sub-agent (claude-opus) for vulnerability detection
- Bugs sub-agent (claude-sonnet) for logic errors and resource leaks
- Documentation sub-agent (claude-haiku) for doc gap detection
- Add GitHub Actions cache for persistent scanner memory
- Memory stores skip patterns, context, and feedback across runs
- Each sub-agent has strict grounding rules to prevent hallucinations
Agent improvements:
- Documentation agent reads all markdown files before analysis
- All sub-agents explicitly accept empty results as a valid outcome
- Added read_multiple_files tool to the documentation agent

Workflow improvements:
- Pin actions/cache to the v4.2.0 SHA for security
- Use a static cache key (matches the cagent-action pattern)
- Replace custom JSON memory with cagent's SQLite memory toolset
- Agent uses get_memories/add_memory tools instead of file I/O
- Memory path resolves to .github/agents/scanner-memory.db
- Removes the manual JSON initialization step
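A hedged sketch of what that toolset declaration might look like (the exact cagent schema keys are an assumption; the tool names and database path come from the commit message):

```yaml
toolsets:
  - type: memory                             # exposes get_memories/add_memory
    path: .github/agents/scanner-memory.db   # persisted across runs via the Actions cache
```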
Sub-agents now return findings in a simple text format:

```
FILE: path/to/file.go
LINE: 123
SEVERITY: high
...
```

Benefits over JSON:
- More natural for LLM output
- Less prone to formatting errors
- Matches the cagent-action PR review pattern

The root agent still outputs JSON for workflow parsing.
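For illustration, a complete finding in this format might look as follows (the file path, line, and content are hypothetical; the field names are those used in the agent config):

```
FILE: pkg/server/handler.go
LINE: 42
SEVERITY: high
TITLE: HTTP response body never closed
CODE: resp, err := client.Get(url)
PROBLEM: The response body is not closed on the success path, leaking connections.
```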
Move issue creation from the workflow to the agent:
- New `reporter` sub-agent uses the `gh` CLI to create issues
- Checks for duplicates before creating
- Selects appropriate labels based on category
- Workflow reduced from 175 lines to 55 lines

Dry-run mode is now passed as a prompt to the agent.
Model assignments:
- Security: openai/o3-mini (reasoning model for subtle vulnerabilities)
- Bugs: google/gemini-2.5-flash (fast, good at Go code analysis)
- Documentation: anthropic/claude-haiku (sufficient for the simpler task)
- Orchestrator: anthropic/claude-sonnet (coordination)
- Reporter: anthropic/claude-haiku (formatting + gh commands)

The workflow now passes all three API keys:
- ANTHROPIC_API_KEY
- OPENAI_API_KEY
- GOOGLE_API_KEY (mapped from the GEMINI_API_KEY secret)
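For illustration, the per-agent wiring might look roughly like this in the agent config (the agent names and schema are assumptions; the provider/model strings come from the commit message):

```yaml
agents:
  root:
    model: anthropic/claude-sonnet   # orchestrator
  security:
    model: openai/o3-mini            # reasoning model for subtle vulnerabilities
  bugs:
    model: google/gemini-2.5-flash   # fast Go code analysis
  documentation:
    model: anthropic/claude-haiku    # simpler task
  reporter:
    model: anthropic/claude-haiku    # formatting + gh commands
```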
Fixes from code review:
1. Restrict shell permissions
   - Changed `gh *` to `gh issue list *` and `gh issue create *`
   - Principle of least privilege
2. CATEGORY field mismatch
   - Root agent now adds CATEGORY when forwarding to the reporter
   - Added an explicit "Forwarding to reporter" section with an example
3. Inconsistent array/NO_ISSUES terminology
   - Changed "Return an empty array `[]`" to "Output `NO_ISSUES`"
   - Consistent across all three analysis agents
4. Documentation trigger clarity
   - Changed to "ONLY run if BOTH security AND bugs returned `NO_ISSUES`"
   - Unambiguous trigger condition
5. Better duplicate detection
   - Changed from downloading 100 issues to `gh issue list --search`
   - Searches by file path in the issue body
6. Sub-agent failure handling
   - Added an explicit error handling strategy
   - Log errors and continue with other agents
   - Report partial results if some agents fail
   - Added a FAILED status to the reporter output
0d1c6d7 to 23e46fb
/review
Review Summary
Found several issues in the new nightly scanner configuration, including a critical cache key bug that will prevent memory persistence from working after the first run, security concerns with command patterns, and reliance on LLM instruction-following for critical error handling.
Key Issues:
- 🔴 HIGH: Cache key prevents memory updates after first run (see the sketch after this list)
- 🟡 MEDIUM: Shell permission wildcards too broad
- 🟡 MEDIUM: Potential command injection via file paths
- 🟡 MEDIUM: Error handling relies on LLM instructions only
- 🟢 LOW: Issue creation limit not enforced programmatically
- 🟢 LOW: Timeout may be insufficient for large repos
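The HIGH cache-key issue stems from `actions/cache` only saving the cache when the key misses: a static key hits on every run after the first, so the updated memory database is never re-saved. A minimal sketch of the usual fix (the database path comes from the commit history; `@v4` stands in for the pinned SHA):

```yaml
- uses: actions/cache@v4
  with:
    path: .github/agents/scanner-memory.db
    key: scanner-memory-${{ github.run_id }}   # unique per run, so the cache is saved again
    restore-keys: |
      scanner-memory-                           # restore the newest previous run's memory
```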
```yaml
- type: shell
  permissions:
    allow:
```
Overly permissive shell command wildcards
The permissions `shell:cmd=gh issue list *` and `shell:cmd=gh issue create *` are overly broad and allow the agent to execute these GitHub CLI commands with ANY arguments. While the GITHUB_TOKEN permissions provide some protection by limiting repository access, the wildcard patterns still allow potentially problematic commands like:
- Exfiltration attempts via `--repo` flags to other accessible repositories
- Resource exhaustion via extreme `--limit` values
- Creation of issues with arbitrary/malicious content, labels, or assignees
Fix: Use more restrictive patterns that only allow the specific flag combinations needed:
```yaml
permissions:
  allow:
    - shell:cmd=gh issue list --state * --label * --search *
    - shell:cmd=gh issue create --title * --label * --body *
```

This maintains functionality while reducing the attack surface.
```
## Workflow
```
Shell command injection vulnerability in duplicate detection
The instruction demonstrates constructing a `gh` command with file paths embedded in the search query:

```sh
gh issue list --label automated --state open --search "in:body \"path/to/file.go\" \"line 123\""
```

If a malicious file is named with shell metacharacters (e.g., `file$(whoami).go`, `file;rm -rf.go`, or files with backticks), these could be interpreted by the shell when the LLM-generated command executes.
While line 430 instructs to "quote file paths and line numbers", this is merely a natural language instruction to an LLM, not enforced code-level protection. LLMs can make mistakes with complex escaping requirements.
Fix: Use structured approaches that avoid shell interpolation:
- Use environment variables for dynamic values
- Use the `gh` CLI with `--json` output and programmatic filtering
- Implement explicit sanitization/escaping before command construction
- Add validation to reject files with suspicious characters before processing
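For illustration, a minimal sketch of the environment-variable plus `--json` approach (the label and path reuse the example above; the `jq` filtering is an assumption about how the output would be consumed):

```sh
# Pass the path as data rather than interpolating it into the command string
FILE_PATH='path/to/file.go'
gh issue list --label automated --state open --json number,body \
  | jq --arg path "$FILE_PATH" '[.[] | select(.body | contains($path)) | .number]'
```

The shell never re-parses the file path, and `jq` compares it as an opaque string, so metacharacters in file names stay inert.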
```
## Your workflow

1. Call `get_memories` to understand what to skip from previous runs
```
Error handling relies on LLM instruction-following
Lines 52-75 define error handling logic purely through natural language instructions: "If a sub-agent fails (timeout, API error), log the error and continue with other agents".
This is not actual error handling code - it's text instructions that rely on the LLM correctly detecting errors, formatting them properly, and continuing execution. If a sub-agent API call fails, the orchestrator agent might:
- Fail to detect the error
- Hallucinate success
- Skip remaining agents despite instructions
- Produce malformed output
For an automated nightly scanner that creates GitHub issues, this could lead to silent failures, missed findings, or incorrect issue creation.
Recommendation: Add system-level error boundaries, retry logic, or at minimum schema-validated output requirements rather than relying solely on LLM instruction-following for critical error handling.
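As a concrete (hypothetical) example of such a boundary, the workflow could validate the orchestrator's JSON output before any issues are created; the `scan-results.json` file name is an assumption:

```yaml
- name: Validate scanner output
  run: |
    # Fail loudly on missing or malformed output instead of trusting LLM formatting
    if ! jq empty scan-results.json 2>/dev/null; then
      echo "::error::Nightly scanner produced missing or malformed output"
      exit 1
    fi
```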
```
SEVERITY: critical|high|medium
TITLE: Brief description
CODE: exact code snippet
PROBLEM: explanation
```
Issue creation limit not enforced programmatically
Line 346 contains only an LLM instruction ("ENFORCE: Process at most 2 findings") without programmatic enforcement. While the PR description claims a "strict 2-issue-per-run limit," the implementation relies entirely on the LLM following instructions correctly.
LLMs can fail to follow instructions due to prompt injection, model drift, context confusion, or simple misunderstanding, potentially leading to runaway issue creation.
Recommendation: Implement a hard limit at the system level:
- Add a counter/validation layer before issue creation
- Configure the agent runtime to enforce hard limits on tool usage
- Add workflow-level validation of created issues
Note: Severity is low because duplicate detection may catch excess issues, and impact is limited to issue spam.
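A workflow-level backstop might look like the following sketch (the `nightly-scan` label comes from the PR description; the step placement and token wiring are assumptions):

```yaml
- name: Enforce issue cap
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    # Count issues the scanner created today and fail the run if over the limit
    created=$(gh issue list --label nightly-scan \
      --search "created:>=$(date -u +%Y-%m-%d)" --json number --jq 'length')
    if [ "$created" -gt 2 ]; then
      echo "::error::Scanner created $created issues (limit is 2)"
      exit 1
    fi
```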
```yaml
prompt: ${{ inputs.dry-run && 'DRY RUN MODE: Do not create any issues. Just report what you would create.' || '' }}
anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
openai-api-key: ${{ secrets.OPENAI_API_KEY }}
google-api-key: ${{ secrets.GEMINI_API_KEY }}
```
Timeout may be insufficient for large repositories
The timeout of 600 seconds (10 minutes) controls the entire nightly scan workflow execution. While this may be sufficient for small-to-medium repositories, it is likely insufficient for large repositories due to:
- Multi-agent architecture with 5 specialized agents
- Security analyzer uses OpenAI o3-mini (reasoning model with 30-60+ second response times)
- Directory tree traversal and file I/O for large codebases
- SQLite database operations
- Multiple API calls to different providers
If scans consistently timeout, no issues will ever be reported.
Recommendation: Increase timeout to 30-60 minutes, or make it configurable based on repository size, especially given the use of reasoning models.
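One hedged way to make the limit configurable (the input name and default are assumptions):

```yaml
on:
  workflow_dispatch:
    inputs:
      timeout-minutes:
        description: "Job timeout in minutes"
        default: "30"

jobs:
  scan:
    timeout-minutes: ${{ fromJSON(inputs.timeout-minutes || '30') }}
```

Scheduled runs fall back to the 30-minute default, since `inputs` is empty for `schedule` events.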
Summary
Adds an automated nightly codebase scanner that uses a multi-agent architecture to detect code quality issues and automatically create GitHub issues for findings.
Key features:
- Five specialized agents: orchestrator (Claude Sonnet), security analyzer (OpenAI o3-mini), bug analyzer (Gemini 2.5 Flash), documentation analyzer (Claude Haiku), and issue reporter (Claude Haiku)
- Issue creation via the `gh` CLI with duplicate detection and a strict 2-issue-per-run limit

Architecture
Files Changed
- `.github/agents/nightly-scanner.yaml` - Multi-agent configuration with instructions, toolsets, and permissions
- `.github/workflows/nightly-scan.yml` - GitHub Actions workflow (runs daily at 6am UTC, supports dry-run mode)

Test plan
- Run with `dry-run: true` to verify agents run without creating issues
- Run with `dry-run: false` on a branch with a known issue to verify issue creation