OWASP ZAP Benchmark Integration#641
Conversation
📝 WalkthroughWalkthroughThis PR adds comprehensive OWASP ZAP benchmarking support to VulnerableApp. It includes a Python conversion script that transforms ZAP traditional-json reports into benchmark input JSON, a ZAP automation configuration for scanning, complete documentation with workflows and mapping tables, sample data artifacts, and supporting configuration updates. ChangesZAP Benchmarking Integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes The PR adds substantial new functionality across multiple files with mixed artifact types: a conversion algorithm with dense logic (deduplication, ID normalization), comprehensive test coverage (~235 lines with multiple test categories), configuration files, sample JSON data, and extensive documentation. The changes are coherent and focused on one feature (ZAP benchmarking), but demand reasoning about the conversion logic correctness, test completeness, documentation accuracy, and integration points. Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #641 +/- ##
=========================================
Coverage 54.88% 54.88%
Complexity 663 663
=========================================
Files 91 91
Lines 3573 3573
Branches 395 395
=========================================
Hits 1961 1961
Misses 1435 1435
Partials 177 177 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
benchmarks/scripts/test_convert_zap_to_benchmark.py (1)
228-235: ⚡ Quick winAdd a regression test for top-level list input.
Given the converter’s stated support for array-style report roots, add a test for
convert([{"alerts": [...]}])so this contract stays protected.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmarks/scripts/test_convert_zap_to_benchmark.py` around lines 228 - 235, Add a regression test that verifies convert handles a top-level list report root: create a new test (e.g., test_convert_top_level_list_input_returns_findings) in benchmarks/scripts/test_convert_zap_to_benchmark.py that calls convert([{"alerts": [...]}]) using a minimal alert object (matching the keys convert expects) and assert the returned result["findings"] contains the expected finding(s) (or at least is non-empty). Reference the convert function and use a representative alerts payload so the contract for array-style report roots is protected.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@benchmarks/scripts/convert_zap_to_benchmark.py`:
- Around line 94-97: The code assumes zap_report is a dict and calls
zap_report.get("site", []), which breaks when zap_report is a list; update the
logic to first check zap_report's type: if isinstance(zap_report, list) set
sites = zap_report, elif isinstance(zap_report, dict) set sites =
zap_report.get("site", []), else set sites = []; then keep the existing
normalization that if isinstance(sites, dict) sites = [sites]. This change
touches the site-extraction logic around the zap_report and sites variables to
safely handle top-level list reports.
In `@benchmarks/ZAP.md`:
- Around line 21-36: Add explicit fence languages to the Markdown code blocks
that contain the ASCII diagrams—specifically the fenced block beginning with
"┌─────────────┐ zap-automation.yaml ┌─────────────────────────┐" and the
other fenced block later (lines 104-124) so they pass MD040; update the opening
fences from ``` to ```text (or another suitable language like ```plain) for both
blocks to improve linter compatibility and readability.
- Around line 202-204: In the benchmarks/ZAP.md descriptive sentence about
`benchmarks/samples/zap-full-findings-sample.json`, fix the wording typo by
changing "includes one deliberately invalid finding" to a clearer phrase such as
"contains one intentionally invalid finding" (or "contains one deliberately
invalid finding") so the sentence reads naturally and avoids the static-analysis
flagged wording issue.
---
Nitpick comments:
In `@benchmarks/scripts/test_convert_zap_to_benchmark.py`:
- Around line 228-235: Add a regression test that verifies convert handles a
top-level list report root: create a new test (e.g.,
test_convert_top_level_list_input_returns_findings) in
benchmarks/scripts/test_convert_zap_to_benchmark.py that calls
convert([{"alerts": [...]}]) using a minimal alert object (matching the keys
convert expects) and assert the returned result["findings"] contains the
expected finding(s) (or at least is non-empty). Reference the convert function
and use a representative alerts payload so the contract for array-style report
roots is protected.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 989a3377-d426-4e75-85be-130048f4d495
📒 Files selected for processing (9)
.gitignorebenchmarks/README.mdbenchmarks/ZAP.mdbenchmarks/samples/zap-benchmark-output-example.jsonbenchmarks/samples/zap-full-findings-sample.jsonbenchmarks/scripts/convert_zap_to_benchmark.pybenchmarks/scripts/test_convert_zap_to_benchmark.pybenchmarks/zap-automation.yamldocs/HOW-TO-USE-VulnerableApp.md
| ``` | ||
| ┌─────────────┐ zap-automation.yaml ┌─────────────────────────┐ | ||
| │ VulnerableApp│ ◄────── ZAP scans ────── │ OWASP ZAP (Docker) │ | ||
| │ :9090 │ │ writes zap-raw-report.json│ | ||
| └─────────────┘ └─────────────────────────┘ | ||
| │ | ||
| convert_zap_to_benchmark.py | ||
| │ | ||
| ▼ | ||
| zap-benchmark-input.json | ||
| │ | ||
| POST /scanner/benchmark | ||
| │ | ||
| ▼ | ||
| benchmarks/zap-results.json | ||
| ``` |
There was a problem hiding this comment.
Specify fence languages for markdown code blocks.
The diagram/code fences at Line 21 and Line 104 are missing language identifiers (MD040), which can break lint gates and reduce readability.
Also applies to: 104-124
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 21-21: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@benchmarks/ZAP.md` around lines 21 - 36, Add explicit fence languages to the
Markdown code blocks that contain the ASCII diagrams—specifically the fenced
block beginning with "┌─────────────┐ zap-automation.yaml
┌─────────────────────────┐" and the other fenced block later (lines 104-124) so
they pass MD040; update the opening fences from ``` to ```text (or another
suitable language like ```plain) for both blocks to improve linter compatibility
and readability.
|
|
||
| ### JWT and some cryptographic vulnerability types are not detectable by ZAP | ||
| `CLIENT_SIDE_VULNERABLE_JWT`, `SERVER_SIDE_VULNERABLE_JWT`, | ||
| `INSECURE_CONFIGURATION_JWT`, and `WEB_CACHE_POISONING` have no CWE or WASC |
There was a problem hiding this comment.
can you please create tickets to update these vulnerabilityTypes to add CWE or WASC or ways to compare them?
|
|
||
| The report is also written to `benchmarks/zap-results.json`. | ||
|
|
||
| > **Note on filename:** The current benchmark framework names the output file |
There was a problem hiding this comment.
please create ticket for this. I think we should not have all this in a file. it is difficult to maintain.
| parameters: | ||
| context: VulnerableApp | ||
| url: http://localhost:9090/VulnerableApp | ||
| maxDuration: 5 # minutes |
There was a problem hiding this comment.
can you run zap in insane mode and try to see how many vulnerabulities are covered by zap?
| parameters: | ||
| context: VulnerableApp | ||
| url: http://localhost:9090/VulnerableApp | ||
| maxDuration: 3 # minutes |
There was a problem hiding this comment.
I think we should run zap in insane mode.
| - type: spiderAjax | ||
| parameters: | ||
| context: VulnerableApp | ||
| url: http://localhost:9090/VulnerableApp |
There was a problem hiding this comment.
My suggestion is running zap against modern UI instead of legacy UI
| - type: activeScan | ||
| parameters: | ||
| context: VulnerableApp | ||
| policy: Default Policy |
There was a problem hiding this comment.
I think we should use insane policy
| # Run convert_zap_to_benchmark.py afterwards to produce the benchmark input. | ||
| # --------------------------------------------------------------------------- | ||
|
|
||
| env: |
There was a problem hiding this comment.
did we ran ZAP with all the plugins? if not, we should run it with all the plugins as a lot of vulnerabilities are covered by plugins/addons of zap.
| @@ -0,0 +1,102 @@ | |||
| { | |||
There was a problem hiding this comment.
can you please put under benchmark/zap/ ?
also we need to publish final numbers for ZAP benchmark and publicly announce coverage done by ZAP. This is the final report.
preetkaran20
left a comment
There was a problem hiding this comment.
@antriksh-9 Thanks a lot for the PR. can you please trim down a lot of redundant details from ZAP.md. we want ZAP.md to be how to benchmark ZAP steps.
Also ensure that we will be publicizing the numbers so please use the best zap mode with all plugins, we can choose insane mode if needed or we can have a timeframe for which we run each of the DAST tool and run for that time.
You can also add Github action if you think that running locally is complex and run in github action and we can commit it in repo and keep on running to find out how well ZAP is doing against vulnerableapp.
|
@antriksh-9 looking at the code, this is great progress. Thanks a lot for the PR. |
Description
This PR adds a complete OWASP ZAP integration for the benchmarking framework. It enables users to run ZAP against VulnerableApp, convert the scan results, and evaluate ZAP's detection coverage against the application's ground truth using the
POST /scanner/benchmarkendpoint.Closes: #603
Changes
benchmarks/zap-automation.yaml— ZAP Automation Framework plan for scanning VulnerableAppbenchmarks/scripts/convert_zap_to_benchmark.py— converts ZAP JSON report to benchmark input format via CWE/WASC IDsbenchmarks/scripts/test_convert_zap_to_benchmark.py— 25 unit tests for the conversion scriptbenchmarks/samples/zap-full-findings-sample.json— comprehensive sample input validated against real /scanner ground truthbenchmarks/samples/zap-benchmark-output-example.json— real benchmark response showing coverage, missed, and unmatched findingsbenchmarks/ZAP.md— end-to-end ZAP integration and benchmarking guidebenchmarks/README.md— added ZAP to tool-specific guides tabledocs/HOW-TO-USE-VulnerableApp.md— added ZAP benchmark quickstart sectionBenchmark results (validated locally)
Running zap-full-findings-sample.json against a live VulnerableApp instance:
coverage: 35.97% totalExpected: 139 detected: 50 missed: 89 unmatched: 1 ← deliberate false positive (/NonExistentEndpoint/LEVEL_1)NOTE:
The framework currently writes
benchmarks/zap-results.jsonregardless ofscanTypeTests Added:
benchmarks/scripts/test_convert_zap_to_benchmark.py— 25 unit testsRun with:
Expected: valid JSON with
"tool": "ZAP", "scanType": "DAST", and a findings array.Expected:
{ "tool": "ZAP", "coverage": 35.97, "totalExpected": 139, "detected": 50, "missed": 89, "unmatched": 1, "unmatchedItems": [ { "url": "/NonExistentEndpoint/LEVEL_1", ... } ] }Summary by CodeRabbit
New Features
Documentation
Tests
Chores