OWASP ZAP Benchmark Integration by antriksh-9 · Pull Request #641 · SasanLabs/VulnerableApp

antriksh-9 · 2026-05-24T12:28:11Z

Description

This PR adds a complete OWASP ZAP integration for the benchmarking framework. It enables users to run ZAP against VulnerableApp, convert the scan results, and evaluate ZAP's detection coverage against the application's ground truth using the POST /scanner/benchmark endpoint.

Closes: #603

Changes

benchmarks/zap-automation.yaml — ZAP Automation Framework plan for scanning VulnerableApp
benchmarks/scripts/convert_zap_to_benchmark.py — converts ZAP JSON report to benchmark input format via CWE/WASC IDs
benchmarks/scripts/test_convert_zap_to_benchmark.py — 25 unit tests for the conversion script
benchmarks/samples/zap-full-findings-sample.json — comprehensive sample input validated against real /scanner ground truth
benchmarks/samples/zap-benchmark-output-example.json — real benchmark response showing coverage, missed, and unmatched findings
benchmarks/ZAP.md — end-to-end ZAP integration and benchmarking guide
benchmarks/README.md — added ZAP to tool-specific guides table
docs/HOW-TO-USE-VulnerableApp.md — added ZAP benchmark quickstart section

Benchmark results (validated locally)

Running zap-full-findings-sample.json against a live VulnerableApp instance:

coverage:   35.97%
totalExpected: 139
detected:   50
missed:     89
unmatched:  1  ← deliberate false positive (/NonExistentEndpoint/LEVEL_1)

NOTE:

The framework currently writes benchmarks/zap-results.json regardless of scanType

Tests Added:

benchmarks/scripts/test_convert_zap_to_benchmark.py — 25 unit tests

Run with:

Unit tests

python3 -m pytest benchmarks/scripts/test_convert_zap_to_benchmark.py -v

Conversion script test

python3 benchmarks/scripts/convert_zap_to_benchmark.py \
  --input  benchmarks/samples/zap-full-findings-sample.json \
  --output /tmp/zap-test-output.json

cat /tmp/zap-test-output.json | python3 -m json.tool | head -15

Expected: valid JSON with "tool": "ZAP", "scanType": "DAST", and a findings array.

Benchmark endpoint integration

curl -s -X POST http://localhost:9090/VulnerableApp/scanner/benchmark \
  -H "Content-Type: application/json" \
  -d @benchmarks/samples/zap-full-findings-sample.json | python3 -m json.tool

Expected:

{
  "tool": "ZAP",
  "coverage": 35.97,
  "totalExpected": 139,
  "detected": 50,
  "missed": 89,
  "unmatched": 1,
  "unmatchedItems": [
    { "url": "/NonExistentEndpoint/LEVEL_1", ... }
  ]
}

Summary by CodeRabbit

New Features
- Added benchmarking support for OWASP ZAP DAST scanning tool with automated report conversion and result analysis.
Documentation
- Added comprehensive guides for benchmarking OWASP ZAP, including setup instructions, workflow details, and troubleshooting tips.
Tests
- Added test suite validating report conversion and deduplication logic.
Chores
- Updated configuration to ignore Python cache artifacts.

coderabbitai · 2026-05-24T12:28:22Z

📝 Walkthrough

Walkthrough

This PR adds comprehensive OWASP ZAP benchmarking support to VulnerableApp. It includes a Python conversion script that transforms ZAP traditional-json reports into benchmark input JSON, a ZAP automation configuration for scanning, complete documentation with workflows and mapping tables, sample data artifacts, and supporting configuration updates.

Changes

ZAP Benchmarking Integration

Layer / File(s)	Summary
ZAP Report Conversion Implementation `benchmarks/scripts/convert_zap_to_benchmark.py`, `benchmarks/scripts/test_convert_zap_to_benchmark.py`	Core conversion script normalizes ZAP `cweid`/`wascid` values, deduplicates findings by `(url, cwe, wascId, method)`, and generates benchmark-formatted JSON. Comprehensive test suite covers ID normalization, CWE tag formatting, finding expansion, deduplication, field omission rules, and edge cases with empty/missing inputs.
ZAP Automation Plan `benchmarks/zap-automation.yaml`	YAML configuration orchestrates ZAP scanning: spider crawl for endpoint discovery, AJAX spider for JavaScript-rendered pages, active scan with Default Policy, and traditional-json report generation targeting local VulnerableApp instance at `http://localhost:9090/VulnerableApp`.
Sample Data, Mapping, and End-to-End Guide `benchmarks/samples/zap-full-findings-sample.json`, `benchmarks/samples/zap-benchmark-output-example.json`, `benchmarks/ZAP.md`, `benchmarks/README.md`, `docs/HOW-TO-USE-VulnerableApp.md`, `.gitignore`	Complete ZAP guide covering prerequisites, step-by-step Docker/desktop workflows, conversion process, benchmark submission, result interpretation, sample data artifacts, ZAP alert-to-VulnerabilityType mapping table, known limitations (JWT/crypto detection gaps, SSL findings as unmatched), troubleshooting tips, and CI pipeline automation. README and HOW-TO-USE updated with integration links and quick-start steps. .gitignore extended with Python cache exclusions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The PR adds substantial new functionality across multiple files with mixed artifact types: a conversion algorithm with dense logic (deduplication, ID normalization), comprehensive test coverage (~235 lines with multiple test categories), configuration files, sample JSON data, and extensive documentation. The changes are coherent and focused on one feature (ZAP benchmarking), but demand reasoning about the conversion logic correctness, test completeness, documentation accuracy, and integration points.

Poem

🐰 A ZAP bounces through VulnerableApp's walls,
Converting alerts to benchmark gold,
Spider crawls, AJAX calls,
Finding truths the scanners don't behold!
Now coverage tales can be told. 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 12.90% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'OWASP ZAP Benchmark Integration' directly and clearly summarizes the main change: adding ZAP integration for benchmarking against VulnerableApp.
Linked Issues check	✅ Passed	The PR comprehensively addresses all coding objectives from issue `#603`: ZAP automation configuration, conversion script with CWE/WASC mapping, unit tests, sample inputs/outputs, benchmark endpoint integration, and complete documentation.
Out of Scope Changes check	✅ Passed	All changes are directly related to ZAP benchmark integration. The .gitignore update for Python artifacts is a minor supporting change to maintain repository cleanliness, which is reasonable infrastructure work.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov-commenter · 2026-05-24T12:29:41Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.88%. Comparing base (08a2d33) to head (4474237).

Additional details and impacted files

@@            Coverage Diff            @@
##             master     #641   +/-   ##
=========================================
  Coverage     54.88%   54.88%           
  Complexity      663      663           
=========================================
  Files            91       91           
  Lines          3573     3573           
  Branches        395      395           
=========================================
  Hits           1961     1961           
  Misses         1435     1435           
  Partials        177      177

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

benchmarks/scripts/test_convert_zap_to_benchmark.py (1)

228-235: ⚡ Quick win

Add a regression test for top-level list input.

Given the converter’s stated support for array-style report roots, add a test for convert([{"alerts": [...]}]) so this contract stays protected.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmarks/scripts/test_convert_zap_to_benchmark.py` around lines 228 - 235,
Add a regression test that verifies convert handles a top-level list report
root: create a new test (e.g.,
test_convert_top_level_list_input_returns_findings) in
benchmarks/scripts/test_convert_zap_to_benchmark.py that calls
convert([{"alerts": [...]}]) using a minimal alert object (matching the keys
convert expects) and assert the returned result["findings"] contains the
expected finding(s) (or at least is non-empty). Reference the convert function
and use a representative alerts payload so the contract for array-style report
roots is protected.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/scripts/convert_zap_to_benchmark.py`:
- Around line 94-97: The code assumes zap_report is a dict and calls
zap_report.get("site", []), which breaks when zap_report is a list; update the
logic to first check zap_report's type: if isinstance(zap_report, list) set
sites = zap_report, elif isinstance(zap_report, dict) set sites =
zap_report.get("site", []), else set sites = []; then keep the existing
normalization that if isinstance(sites, dict) sites = [sites]. This change
touches the site-extraction logic around the zap_report and sites variables to
safely handle top-level list reports.

In `@benchmarks/ZAP.md`:
- Around line 21-36: Add explicit fence languages to the Markdown code blocks
that contain the ASCII diagrams—specifically the fenced block beginning with
"┌─────────────┐    zap-automation.yaml    ┌─────────────────────────┐" and the
other fenced block later (lines 104-124) so they pass MD040; update the opening
fences from ``` to ```text (or another suitable language like ```plain) for both
blocks to improve linter compatibility and readability.
- Around line 202-204: In the benchmarks/ZAP.md descriptive sentence about
`benchmarks/samples/zap-full-findings-sample.json`, fix the wording typo by
changing "includes one deliberately invalid finding" to a clearer phrase such as
"contains one intentionally invalid finding" (or "contains one deliberately
invalid finding") so the sentence reads naturally and avoids the static-analysis
flagged wording issue.

---

Nitpick comments:
In `@benchmarks/scripts/test_convert_zap_to_benchmark.py`:
- Around line 228-235: Add a regression test that verifies convert handles a
top-level list report root: create a new test (e.g.,
test_convert_top_level_list_input_returns_findings) in
benchmarks/scripts/test_convert_zap_to_benchmark.py that calls
convert([{"alerts": [...]}]) using a minimal alert object (matching the keys
convert expects) and assert the returned result["findings"] contains the
expected finding(s) (or at least is non-empty). Reference the convert function
and use a representative alerts payload so the contract for array-style report
roots is protected.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 989a3377-d426-4e75-85be-130048f4d495

📥 Commits

Reviewing files that changed from the base of the PR and between 08a2d33 and 4474237.

📒 Files selected for processing (9)

.gitignore
benchmarks/README.md
benchmarks/ZAP.md
benchmarks/samples/zap-benchmark-output-example.json
benchmarks/samples/zap-full-findings-sample.json
benchmarks/scripts/convert_zap_to_benchmark.py
benchmarks/scripts/test_convert_zap_to_benchmark.py
benchmarks/zap-automation.yaml
docs/HOW-TO-USE-VulnerableApp.md

coderabbitai · 2026-05-24T12:33:50Z

+```
+┌─────────────┐    zap-automation.yaml    ┌─────────────────────────┐
+│ VulnerableApp│ ◄────── ZAP scans ──────  │ OWASP ZAP (Docker)      │
+│ :9090        │                           │ writes zap-raw-report.json│
+└─────────────┘                           └─────────────────────────┘
+                                                       │
+                              convert_zap_to_benchmark.py
+                                                       │
+                                                       ▼
+                                          zap-benchmark-input.json
+                                                       │
+                              POST /scanner/benchmark
+                                                       │
+                                                       ▼
+                                          benchmarks/zap-results.json
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Specify fence languages for markdown code blocks.

The diagram/code fences at Line 21 and Line 104 are missing language identifiers (MD040), which can break lint gates and reduce readability.

Also applies to: 104-124

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 21-21: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmarks/ZAP.md` around lines 21 - 36, Add explicit fence languages to the Markdown code blocks that contain the ASCII diagrams—specifically the fenced block beginning with "┌─────────────┐ zap-automation.yaml ┌─────────────────────────┐" and the other fenced block later (lines 104-124) so they pass MD040; update the opening fences from ``` to ```text (or another suitable language like ```plain) for both blocks to improve linter compatibility and readability.

preetkaran20 · 2026-05-24T21:23:52Z

+
+### JWT and some cryptographic vulnerability types are not detectable by ZAP
+`CLIENT_SIDE_VULNERABLE_JWT`, `SERVER_SIDE_VULNERABLE_JWT`,
+`INSECURE_CONFIGURATION_JWT`, and `WEB_CACHE_POISONING` have no CWE or WASC


can you please create tickets to update these vulnerabilityTypes to add CWE or WASC or ways to compare them?

preetkaran20 · 2026-05-24T21:30:38Z

+
+The report is also written to `benchmarks/zap-results.json`.
+
+> **Note on filename:** The current benchmark framework names the output file


please create ticket for this. I think we should not have all this in a file. it is difficult to maintain.

preetkaran20 · 2026-05-24T21:32:43Z

+    parameters:
+      context: VulnerableApp
+      url: http://localhost:9090/VulnerableApp
+      maxDuration: 5          # minutes


can you run zap in insane mode and try to see how many vulnerabulities are covered by zap?

preetkaran20 · 2026-05-24T21:33:09Z

+    parameters:
+      context: VulnerableApp
+      url: http://localhost:9090/VulnerableApp
+      maxDuration: 3          # minutes


I think we should run zap in insane mode.

preetkaran20 · 2026-05-24T21:33:27Z

+  - type: spiderAjax
+    parameters:
+      context: VulnerableApp
+      url: http://localhost:9090/VulnerableApp


My suggestion is running zap against modern UI instead of legacy UI

preetkaran20 · 2026-05-24T21:33:43Z

+  - type: activeScan
+    parameters:
+      context: VulnerableApp
+      policy: Default Policy


I think we should use insane policy

preetkaran20 · 2026-05-24T21:34:38Z

+# Run convert_zap_to_benchmark.py afterwards to produce the benchmark input.
+# ---------------------------------------------------------------------------
+
+env:


did we ran ZAP with all the plugins? if not, we should run it with all the plugins as a lot of vulnerabilities are covered by plugins/addons of zap.

preetkaran20 · 2026-05-24T21:38:13Z

@@ -0,0 +1,102 @@
+{


can you please put under benchmark/zap/ ?

also we need to publish final numbers for ZAP benchmark and publicly announce coverage done by ZAP. This is the final report.

preetkaran20

@antriksh-9 Thanks a lot for the PR. can you please trim down a lot of redundant details from ZAP.md. we want ZAP.md to be how to benchmark ZAP steps.

Also ensure that we will be publicizing the numbers so please use the best zap mode with all plugins, we can choose insane mode if needed or we can have a timeframe for which we run each of the DAST tool and run for that time.

You can also add Github action if you think that running locally is complex and run in github action and we can commit it in repo and keep on running to find out how well ZAP is doing against vulnerableapp.

preetkaran20 · 2026-05-24T22:05:07Z

@antriksh-9 looking at the code, this is great progress. Thanks a lot for the PR.

zap-benchmark-integration

4474237

coderabbitai Bot reviewed May 24, 2026

View reviewed changes