Skip to content

OWASP ZAP Benchmark Integration#641

Open
antriksh-9 wants to merge 1 commit into
SasanLabs:masterfrom
antriksh-9:zap-benchmark
Open

OWASP ZAP Benchmark Integration#641
antriksh-9 wants to merge 1 commit into
SasanLabs:masterfrom
antriksh-9:zap-benchmark

Conversation

@antriksh-9
Copy link
Copy Markdown
Member

@antriksh-9 antriksh-9 commented May 24, 2026

Description

This PR adds a complete OWASP ZAP integration for the benchmarking framework. It enables users to run ZAP against VulnerableApp, convert the scan results, and evaluate ZAP's detection coverage against the application's ground truth using the POST /scanner/benchmark endpoint.

Closes: #603

Changes

  • benchmarks/zap-automation.yaml — ZAP Automation Framework plan for scanning VulnerableApp
  • benchmarks/scripts/convert_zap_to_benchmark.py — converts ZAP JSON report to benchmark input format via CWE/WASC IDs
  • benchmarks/scripts/test_convert_zap_to_benchmark.py — 25 unit tests for the conversion script
  • benchmarks/samples/zap-full-findings-sample.json — comprehensive sample input validated against real /scanner ground truth
  • benchmarks/samples/zap-benchmark-output-example.json — real benchmark response showing coverage, missed, and unmatched findings
  • benchmarks/ZAP.md — end-to-end ZAP integration and benchmarking guide
  • benchmarks/README.md — added ZAP to tool-specific guides table
  • docs/HOW-TO-USE-VulnerableApp.md — added ZAP benchmark quickstart section

Benchmark results (validated locally)

Running zap-full-findings-sample.json against a live VulnerableApp instance:

coverage:   35.97%
totalExpected: 139
detected:   50
missed:     89
unmatched:  1  ← deliberate false positive (/NonExistentEndpoint/LEVEL_1)

NOTE:

The framework currently writes benchmarks/zap-results.json regardless of scanType

Tests Added:

benchmarks/scripts/test_convert_zap_to_benchmark.py — 25 unit tests

Run with:

  1. Unit tests
python3 -m pytest benchmarks/scripts/test_convert_zap_to_benchmark.py -v
  1. Conversion script test
python3 benchmarks/scripts/convert_zap_to_benchmark.py \
  --input  benchmarks/samples/zap-full-findings-sample.json \
  --output /tmp/zap-test-output.json

cat /tmp/zap-test-output.json | python3 -m json.tool | head -15

Expected: valid JSON with "tool": "ZAP", "scanType": "DAST", and a findings array.

  1. Benchmark endpoint integration
curl -s -X POST http://localhost:9090/VulnerableApp/scanner/benchmark \
  -H "Content-Type: application/json" \
  -d @benchmarks/samples/zap-full-findings-sample.json | python3 -m json.tool

Expected:

{
  "tool": "ZAP",
  "coverage": 35.97,
  "totalExpected": 139,
  "detected": 50,
  "missed": 89,
  "unmatched": 1,
  "unmatchedItems": [
    { "url": "/NonExistentEndpoint/LEVEL_1", ... }
  ]
}

Summary by CodeRabbit

  • New Features

    • Added benchmarking support for OWASP ZAP DAST scanning tool with automated report conversion and result analysis.
  • Documentation

    • Added comprehensive guides for benchmarking OWASP ZAP, including setup instructions, workflow details, and troubleshooting tips.
  • Tests

    • Added test suite validating report conversion and deduplication logic.
  • Chores

    • Updated configuration to ignore Python cache artifacts.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 24, 2026

📝 Walkthrough

Walkthrough

This PR adds comprehensive OWASP ZAP benchmarking support to VulnerableApp. It includes a Python conversion script that transforms ZAP traditional-json reports into benchmark input JSON, a ZAP automation configuration for scanning, complete documentation with workflows and mapping tables, sample data artifacts, and supporting configuration updates.

Changes

ZAP Benchmarking Integration

Layer / File(s) Summary
ZAP Report Conversion Implementation
benchmarks/scripts/convert_zap_to_benchmark.py, benchmarks/scripts/test_convert_zap_to_benchmark.py
Core conversion script normalizes ZAP cweid/wascid values, deduplicates findings by (url, cwe, wascId, method), and generates benchmark-formatted JSON. Comprehensive test suite covers ID normalization, CWE tag formatting, finding expansion, deduplication, field omission rules, and edge cases with empty/missing inputs.
ZAP Automation Plan
benchmarks/zap-automation.yaml
YAML configuration orchestrates ZAP scanning: spider crawl for endpoint discovery, AJAX spider for JavaScript-rendered pages, active scan with Default Policy, and traditional-json report generation targeting local VulnerableApp instance at http://localhost:9090/VulnerableApp.
Sample Data, Mapping, and End-to-End Guide
benchmarks/samples/zap-full-findings-sample.json, benchmarks/samples/zap-benchmark-output-example.json, benchmarks/ZAP.md, benchmarks/README.md, docs/HOW-TO-USE-VulnerableApp.md, .gitignore
Complete ZAP guide covering prerequisites, step-by-step Docker/desktop workflows, conversion process, benchmark submission, result interpretation, sample data artifacts, ZAP alert-to-VulnerabilityType mapping table, known limitations (JWT/crypto detection gaps, SSL findings as unmatched), troubleshooting tips, and CI pipeline automation. README and HOW-TO-USE updated with integration links and quick-start steps. .gitignore extended with Python cache exclusions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The PR adds substantial new functionality across multiple files with mixed artifact types: a conversion algorithm with dense logic (deduplication, ID normalization), comprehensive test coverage (~235 lines with multiple test categories), configuration files, sample JSON data, and extensive documentation. The changes are coherent and focused on one feature (ZAP benchmarking), but demand reasoning about the conversion logic correctness, test completeness, documentation accuracy, and integration points.

Poem

🐰 A ZAP bounces through VulnerableApp's walls,
Converting alerts to benchmark gold,
Spider crawls, AJAX calls,
Finding truths the scanners don't behold!
Now coverage tales can be told. 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.90% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'OWASP ZAP Benchmark Integration' directly and clearly summarizes the main change: adding ZAP integration for benchmarking against VulnerableApp.
Linked Issues check ✅ Passed The PR comprehensively addresses all coding objectives from issue #603: ZAP automation configuration, conversion script with CWE/WASC mapping, unit tests, sample inputs/outputs, benchmark endpoint integration, and complete documentation.
Out of Scope Changes check ✅ Passed All changes are directly related to ZAP benchmark integration. The .gitignore update for Python artifacts is a minor supporting change to maintain repository cleanliness, which is reasonable infrastructure work.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.88%. Comparing base (08a2d33) to head (4474237).

Additional details and impacted files
@@            Coverage Diff            @@
##             master     #641   +/-   ##
=========================================
  Coverage     54.88%   54.88%           
  Complexity      663      663           
=========================================
  Files            91       91           
  Lines          3573     3573           
  Branches        395      395           
=========================================
  Hits           1961     1961           
  Misses         1435     1435           
  Partials        177      177           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
benchmarks/scripts/test_convert_zap_to_benchmark.py (1)

228-235: ⚡ Quick win

Add a regression test for top-level list input.

Given the converter’s stated support for array-style report roots, add a test for convert([{"alerts": [...]}]) so this contract stays protected.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmarks/scripts/test_convert_zap_to_benchmark.py` around lines 228 - 235,
Add a regression test that verifies convert handles a top-level list report
root: create a new test (e.g.,
test_convert_top_level_list_input_returns_findings) in
benchmarks/scripts/test_convert_zap_to_benchmark.py that calls
convert([{"alerts": [...]}]) using a minimal alert object (matching the keys
convert expects) and assert the returned result["findings"] contains the
expected finding(s) (or at least is non-empty). Reference the convert function
and use a representative alerts payload so the contract for array-style report
roots is protected.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/scripts/convert_zap_to_benchmark.py`:
- Around line 94-97: The code assumes zap_report is a dict and calls
zap_report.get("site", []), which breaks when zap_report is a list; update the
logic to first check zap_report's type: if isinstance(zap_report, list) set
sites = zap_report, elif isinstance(zap_report, dict) set sites =
zap_report.get("site", []), else set sites = []; then keep the existing
normalization that if isinstance(sites, dict) sites = [sites]. This change
touches the site-extraction logic around the zap_report and sites variables to
safely handle top-level list reports.

In `@benchmarks/ZAP.md`:
- Around line 21-36: Add explicit fence languages to the Markdown code blocks
that contain the ASCII diagrams—specifically the fenced block beginning with
"┌─────────────┐    zap-automation.yaml    ┌─────────────────────────┐" and the
other fenced block later (lines 104-124) so they pass MD040; update the opening
fences from ``` to ```text (or another suitable language like ```plain) for both
blocks to improve linter compatibility and readability.
- Around line 202-204: In the benchmarks/ZAP.md descriptive sentence about
`benchmarks/samples/zap-full-findings-sample.json`, fix the wording typo by
changing "includes one deliberately invalid finding" to a clearer phrase such as
"contains one intentionally invalid finding" (or "contains one deliberately
invalid finding") so the sentence reads naturally and avoids the static-analysis
flagged wording issue.

---

Nitpick comments:
In `@benchmarks/scripts/test_convert_zap_to_benchmark.py`:
- Around line 228-235: Add a regression test that verifies convert handles a
top-level list report root: create a new test (e.g.,
test_convert_top_level_list_input_returns_findings) in
benchmarks/scripts/test_convert_zap_to_benchmark.py that calls
convert([{"alerts": [...]}]) using a minimal alert object (matching the keys
convert expects) and assert the returned result["findings"] contains the
expected finding(s) (or at least is non-empty). Reference the convert function
and use a representative alerts payload so the contract for array-style report
roots is protected.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 989a3377-d426-4e75-85be-130048f4d495

📥 Commits

Reviewing files that changed from the base of the PR and between 08a2d33 and 4474237.

📒 Files selected for processing (9)
  • .gitignore
  • benchmarks/README.md
  • benchmarks/ZAP.md
  • benchmarks/samples/zap-benchmark-output-example.json
  • benchmarks/samples/zap-full-findings-sample.json
  • benchmarks/scripts/convert_zap_to_benchmark.py
  • benchmarks/scripts/test_convert_zap_to_benchmark.py
  • benchmarks/zap-automation.yaml
  • docs/HOW-TO-USE-VulnerableApp.md

Comment thread benchmarks/scripts/convert_zap_to_benchmark.py
Comment thread benchmarks/ZAP.md
Comment on lines +21 to +36
```
┌─────────────┐ zap-automation.yaml ┌─────────────────────────┐
│ VulnerableApp│ ◄────── ZAP scans ────── │ OWASP ZAP (Docker) │
│ :9090 │ │ writes zap-raw-report.json│
└─────────────┘ └─────────────────────────┘
convert_zap_to_benchmark.py
zap-benchmark-input.json
POST /scanner/benchmark
benchmarks/zap-results.json
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Specify fence languages for markdown code blocks.

The diagram/code fences at Line 21 and Line 104 are missing language identifiers (MD040), which can break lint gates and reduce readability.

Also applies to: 104-124

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 21-21: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmarks/ZAP.md` around lines 21 - 36, Add explicit fence languages to the
Markdown code blocks that contain the ASCII diagrams—specifically the fenced
block beginning with "┌─────────────┐    zap-automation.yaml   
┌─────────────────────────┐" and the other fenced block later (lines 104-124) so
they pass MD040; update the opening fences from ``` to ```text (or another
suitable language like ```plain) for both blocks to improve linter compatibility
and readability.

Comment thread benchmarks/ZAP.md
Comment thread docs/HOW-TO-USE-VulnerableApp.md
Comment thread benchmarks/ZAP.md
Comment thread benchmarks/ZAP.md

### JWT and some cryptographic vulnerability types are not detectable by ZAP
`CLIENT_SIDE_VULNERABLE_JWT`, `SERVER_SIDE_VULNERABLE_JWT`,
`INSECURE_CONFIGURATION_JWT`, and `WEB_CACHE_POISONING` have no CWE or WASC
Copy link
Copy Markdown
Member

@preetkaran20 preetkaran20 May 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please create tickets to update these vulnerabilityTypes to add CWE or WASC or ways to compare them?

Comment thread benchmarks/ZAP.md
Comment thread benchmarks/ZAP.md
Comment thread benchmarks/ZAP.md
Comment thread benchmarks/ZAP.md
Comment thread benchmarks/ZAP.md
Comment thread benchmarks/ZAP.md
Comment thread benchmarks/ZAP.md

The report is also written to `benchmarks/zap-results.json`.

> **Note on filename:** The current benchmark framework names the output file
Copy link
Copy Markdown
Member

@preetkaran20 preetkaran20 May 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please create ticket for this. I think we should not have all this in a file. it is difficult to maintain.

Comment thread benchmarks/ZAP.md
Comment thread benchmarks/ZAP.md
Comment thread benchmarks/ZAP.md
parameters:
context: VulnerableApp
url: http://localhost:9090/VulnerableApp
maxDuration: 5 # minutes
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you run zap in insane mode and try to see how many vulnerabulities are covered by zap?

parameters:
context: VulnerableApp
url: http://localhost:9090/VulnerableApp
maxDuration: 3 # minutes
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should run zap in insane mode.

- type: spiderAjax
parameters:
context: VulnerableApp
url: http://localhost:9090/VulnerableApp
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is running zap against modern UI instead of legacy UI

- type: activeScan
parameters:
context: VulnerableApp
policy: Default Policy
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use insane policy

# Run convert_zap_to_benchmark.py afterwards to produce the benchmark input.
# ---------------------------------------------------------------------------

env:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we ran ZAP with all the plugins? if not, we should run it with all the plugins as a lot of vulnerabilities are covered by plugins/addons of zap.

Comment thread benchmarks/scripts/convert_zap_to_benchmark.py
@@ -0,0 +1,102 @@
{
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please put under benchmark/zap/ ?

also we need to publish final numbers for ZAP benchmark and publicly announce coverage done by ZAP. This is the final report.

Copy link
Copy Markdown
Member

@preetkaran20 preetkaran20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antriksh-9 Thanks a lot for the PR. can you please trim down a lot of redundant details from ZAP.md. we want ZAP.md to be how to benchmark ZAP steps.

Also ensure that we will be publicizing the numbers so please use the best zap mode with all plugins, we can choose insane mode if needed or we can have a timeframe for which we run each of the DAST tool and run for that time.

You can also add Github action if you think that running locally is complex and run in github action and we can commit it in repo and keep on running to find out how well ZAP is doing against vulnerableapp.

@preetkaran20
Copy link
Copy Markdown
Member

@antriksh-9 looking at the code, this is great progress. Thanks a lot for the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add benchmark of ZAP against vulnerableapp using benchmark support

3 participants