Add CWE-Bench coding-agent security benchmark + Trent AI #20

Open
v0id-space wants to merge 1 commit into efij:main from v0id-space:patch-1

Conversation

@v0id-space

Adds Trent AI's benchmark of five tools (Claude Code Opus 4.7, OpenAI Codex GPT-5.3, Semgrep, CodeQL, Trent) on 28 production CVEs from the CWE-Bench dataset, with 3 runs per tool and variance reported. Methodology, caveats, and per-tool breakdown are in the post. Disclosure: I work at Trent AI, whose tool is one of the five benchmarked (and scores best on the strict metric). I'm flagging that clearly so you can weigh it. Happy to reword or move the entry.

Also added Trent AI under 🤖 Agent Orchestration and Loop Safety.

