Add Garak (NVIDIA LLM vulnerability scanner) parser by Dashtid · Pull Request #15013 · DefectDojo/django-DefectDojo

Dashtid · 2026-06-15T18:48:12Z

Description

Adds a parser for garak, NVIDIA's LLM vulnerability scanner, implementing the request in #14878 (and aligned with the AI-testing direction in discussion #13242).

The parser ingests garak's JSON Lines hit log (garak.<run_id>.hitlog.jsonl). Every line in a hit log is a detector hit, so each record becomes a Finding; hits for the same probe/target/detector are aggregated into one Finding (nb_occurences, keeping the most severe rung seen).
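The aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the parser in this PR: the record keys `probe`, `generator`, and `detector` (with `generator` standing in for the target) are assumptions about the hit-log layout, the contents of `SEVERITY_LADDER` are assumed, and `aggregate_hits` and `severity_for` are hypothetical names.

```python
import json

# Assumed ordering, least to most severe (the PR's actual ladder may differ).
SEVERITY_LADDER = ["Info", "Low", "Medium", "High", "Critical"]


def aggregate_hits(lines, severity_for):
    """Group hit-log lines by (probe, generator, detector).

    `severity_for` is a hypothetical callable mapping one decoded hit to a
    severity name from SEVERITY_LADDER.
    """
    grouped = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        hit = json.loads(raw)
        # Assumed key names; "generator" stands in for the scan target.
        key = (hit.get("probe"), hit.get("generator"), hit.get("detector"))
        severity = severity_for(hit)
        entry = grouped.get(key)
        if entry is None:
            grouped[key] = {"severity": severity, "nb_occurences": 1}
            continue
        entry["nb_occurences"] += 1
        # Escalate only: a later, lower-scored hit never downgrades.
        if SEVERITY_LADDER.index(severity) > SEVERITY_LADDER.index(entry["severity"]):
            entry["severity"] = severity
    return grouped
```

Each value in the returned dict would then map onto one Finding, with `nb_occurences` counting the hits that were folded into it.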

Severity is derived from the detector score (0–1) and adjusted by probe family — active-attack/jailbreak families (e.g. promptinject, dan, malwaregen, xss) nudge up; content/quality families (e.g. continuation, misleading) nudge down. Note: many garak detectors emit a binary 1.0, so in practice the probe-family adjustment does most of the differentiating. Happy to tune this to maintainer preference.
CWE is mapped from the probe family as a deliberately coarse starter, verified against MITRE: prompt-injection → CWE-1427, XSS → CWE-79, info-leak → CWE-200, default → CWE-1426 (Improper Validation of Generative AI Output).
Prompt/output text is extracted from garak's nested Conversation/Message structures (per garak/evaluators/base.py), with fallbacks for plain-string payloads.
Registered for hash_code deduplication on title + severity + component_name. description is intentionally excluded because garak samples the prompt/output non-deterministically per run, so including it would prevent the same weakness from deduplicating across repeated scans.
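To make the severity and CWE rules above concrete, here is a hedged sketch. The score thresholds, the one-rung nudge, the `leakreplay` family name for info-leak, and the assumption that a probe name looks like `family.ClassName` are all illustrative guesses rather than values taken from the PR; only the family examples and the CWE numbers come from the description.

```python
SEVERITY_LADDER = ["Info", "Low", "Medium", "High", "Critical"]

# Families named as examples in the description; the real sets may be larger.
NUDGE_UP = {"promptinject", "dan", "malwaregen", "xss"}
NUDGE_DOWN = {"continuation", "misleading"}

# CWE numbers from the description; "leakreplay" as the info-leak family is assumed.
CWE_BY_FAMILY = {"promptinject": 1427, "xss": 79, "leakreplay": 200}
DEFAULT_CWE = 1426


def probe_family(probe: str) -> str:
    """Return the text before the first dot, assuming 'family.ClassName'."""
    return probe.split(".", 1)[0].lower()


def severity_for(score: float, probe: str) -> str:
    """Map a 0-1 detector score to a rung, then nudge by probe family."""
    # Hypothetical thresholds, chosen only for illustration.
    if score >= 0.8:
        rung = 3
    elif score >= 0.5:
        rung = 2
    elif score > 0:
        rung = 1
    else:
        rung = 0
    family = probe_family(probe)
    if family in NUDGE_UP:
        rung += 1
    elif family in NUDGE_DOWN:
        rung -= 1
    # Clamp to the ends of the ladder.
    rung = max(0, min(rung, len(SEVERITY_LADDER) - 1))
    return SEVERITY_LADDER[rung]


def cwe_for(probe: str) -> int:
    """Look up the CWE for a probe's family, falling back to the default."""
    return CWE_BY_FAMILY.get(probe_family(probe), DEFAULT_CWE)
```

With binary 1.0 scores, every hit lands on the same base rung under this scheme, so the family nudge is what separates findings, which matches the note in the description.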

Test results

Adds unittests/tools/test_garak_parser.py covering: zero/one/many findings, the full score→severity matrix, the CWE mapping, aggregation severity-escalation (and that a later lower-scored hit does not downgrade), nested prompt/output extraction, bytes + UTF-8 BOM + non-ASCII input, and rejection of non-JSONL input. Sample hit logs are under unittests/scans/garak/.
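The input-handling cases listed above (bytes, UTF-8 BOM, non-ASCII text, rejection of non-JSONL input) could be handled along these lines. This is a sketch under assumptions, not the code under test: `read_hitlog` is a hypothetical helper, and raising `ValueError` on bad input is an assumed convention.

```python
import json


def read_hitlog(content):
    """Decode a hit log given as bytes or str and return its records."""
    if isinstance(content, bytes):
        # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
        text = content.decode("utf-8-sig")
    else:
        text = content.lstrip("\ufeff")
    records = []
    # Split on "\n" only, so unusual separators inside JSON strings survive.
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {number} is not valid JSON") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {number} is not a JSON object")
        records.append(record)
    return records
```

Rejecting lines that parse as JSON but are not objects keeps a stray list or number from being treated as a hit.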

Marked as a draft so CI can validate the full DojoTestCase suite before review.

Documentation

Adds docs/content/supported_tools/parsers/file/garak.md.

Checklist

Submitted against dev
Ruff-compliant (ruff.toml)
Python 3.13 compliant
Documentation included
No model changes (no migration needed)
Unit tests added
Labels (for maintainers): suggest Import Scans and settings_changes (touches settings.dist.py for deduplication)

Closes #14878

Adds a "Garak Scan" parser that ingests garak's JSON Lines hit log (garak.<run_id>.hitlog.jsonl). Each detector hit becomes a Finding; hits for the same probe/target/detector are aggregated (nb_occurences, keeping the most severe rung).

- Severity derived from the detector score and adjusted by probe family
- Probe family mapped to CWE (1427 prompt-injection, 79 xss, 200 leak, 1426 default = Improper Validation of Generative AI Output)
- Prompt/output text extracted from garak's nested Conversation/Message
- hash_code deduplication on title/severity/component_name (settings.dist.py)

Includes unit tests (0/1/many, severity matrix, aggregation escalation, nested extraction, bytes/BOM/unicode, invalid input) and documentation.

Signed-off-by: David Dashti <dashti.dat@gmail.com>

Dashtid · 2026-06-16T17:02:23Z

This implements the garak parser requested in #14878. @mcdonaid1379 — you'd mentioned wanting to take it on; I had a working implementation ready, so opened this. Happy to coordinate if you'd already started, and of course defer to the maintainers on direction.

A note for reviewers: I couldn't run the full DojoTestCase suite locally (a network issue pulling the test-DB image), so the unit tests here are pending the first-time-contributor CI approval — keen to see them run and will fix anything that surfaces.

valentijnscholten · 2026-06-18T18:05:48Z

+            if SEVERITY_LADDER.index(severity) > SEVERITY_LADDER.index(finding.severity):
+                finding.severity = severity


Does this mean the severity could change over time when it finds more or fewer occurrences in a certain "probe"? If yes, this could affect deduplication in its current config. Do we really need severity in the hash code config? It is rarely useful and rarely stable enough to be reliable.
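The concern can be illustrated with a toy hash. This is not DefectDojo's actual hash_code computation: `toy_hash_code` is a hypothetical stand-in that hashes the selected field values, and the finding values are made up. It only shows that any hash including severity changes when an aggregated finding escalates between scans.

```python
import hashlib


def toy_hash_code(finding: dict, fields: list[str]) -> str:
    """Hash the selected field values (illustrative stand-in only)."""
    joined = "|".join(str(finding.get(name, "")) for name in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Made-up findings: same weakness, but the second scan saw a more severe hit.
first_scan = {
    "title": "garak: dan.Dan_11_0 / dan.DAN",
    "severity": "High",
    "component_name": "test model",
}
second_scan = {**first_scan, "severity": "Critical"}

with_severity = ["title", "severity", "component_name"]
without_severity = ["title", "component_name"]

# Including severity makes the two scans look like different findings.
changed = toy_hash_code(first_scan, with_severity) != toy_hash_code(second_scan, with_severity)
# Dropping severity keeps the hash stable across the escalation.
stable = toy_hash_code(first_scan, without_severity) == toy_hash_code(second_scan, without_severity)
```

Under these assumptions, hashing on title and component_name alone would let the same weakness deduplicate across scans even when the aggregated severity moves.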

github-actions Bot added the settings_changes (Needs changes to settings.py based on changes in settings.dist.py included in this PR), docs, unittests, and parser labels Jun 15, 2026

Dashtid marked this pull request as ready for review June 16, 2026 17:02

Dashtid requested review from Maffooch and mtesauro as code owners June 16, 2026 17:02

Maffooch approved these changes Jun 18, 2026

Maffooch requested review from dogboat and valentijnscholten June 18, 2026 15:40

Maffooch added this to the 3.1.0 milestone Jun 18, 2026

valentijnscholten reviewed Jun 18, 2026

Dashtid wants to merge 1 commit into DefectDojo:dev from Dashtid:garak-parser

3 participants
