feat: LangGraph agentic orchestrator: state machine, LLM backends, CLI run, human feedback by Marc-cn · Pull Request #137 · kusari-oss/darnit

Marc-cn · 2026-03-25T18:20:55Z

Summary

Extends Darnit into a self-driving agentic orchestrator. Adds a LangGraph state machine that drives the full audit pipeline autonomously, bring-your-own LLM key support for standalone mode, a darnit run CLI command, and a pluggable human feedback mechanism.

Type of Change

Bug fix (non-breaking change fixing an issue)
New feature (non-breaking change adding functionality)
Breaking change (fix or feature causing existing functionality to change)
Documentation update
Refactoring (no functional changes)

Framework Changes Checklist

If this PR modifies the darnit framework (packages/darnit/):

Updated framework spec (openspec/specs/framework-design/spec.md) if behavior changed
Ran uv run python scripts/validate_sync.py --verbose and it passes
Ran uv run python scripts/generate_docs.py and committed any doc changes

Control/TOML Changes Checklist

Not applicable — no controls or TOML modified in this PR.

If this PR modifies controls or TOML configuration:

Control metadata defined in TOML (not Python code)
SARIF fields (description, severity, help_url) included where appropriate
Ran validation to confirm TOML schema compliance

Testing

Tests pass locally (uv run pytest tests/ -v)
Added tests for new functionality (if applicable)
Linting passes (uv run ruff check .)

What was built

LangGraph state machine — darnit/agent/graph.py drives: load context → run checks → collect context → remediate → finish
Bring-your-own LLM — Anthropic, OpenAI, Ollama backends. API key from environment variable, never hardcoded
darnit run CLI command — triggers the full pipeline from the terminal
Pluggable human feedback — interactive (CLI prompts mid-run) and noninteractive (collects questions and prints them at the end). The mode is auto-detected from whether the process is running in a terminal or in CI
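The terminal-versus-CI auto-detection in the last item could look roughly like the sketch below. It is not taken from the PR: the name detect_feedback_mode, the injectable is_tty and env parameters, and the use of the CI environment variable as the CI signal are all assumptions made for illustration.

```python
import os
import sys
from typing import Mapping, Optional


def detect_feedback_mode(
    is_tty: Optional[bool] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a feedback mode (hypothetical helper, not the PR's code).

    Returns "interactive" only when stdin is a terminal and no CI marker
    is set; otherwise returns "noninteractive".
    """
    if env is None:
        env = os.environ
    if is_tty is None:
        # sys.stdin can be None in some embedded or detached environments.
        is_tty = sys.stdin is not None and sys.stdin.isatty()
    # Assumption: a non-empty CI variable means the run is happening in CI.
    in_ci = bool(env.get("CI"))
    return "interactive" if is_tty and not in_ci else "noninteractive"


# A terminal inside a CI job still gets the noninteractive mode.
print(detect_feedback_mode(is_tty=True, env={"CI": "true"}))
```

Passing is_tty and env explicitly keeps the helper testable without a real terminal; an explicit --feedback flag would simply bypass it.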

Usage

darnit run .
darnit run . --llm-backend openai
darnit run . --feedback interactive
darnit run . --feedback noninteractive

Verified on this repo

62 controls checked: 27 passed, 13 failed (gh CLI not installed), 1 warned
4 plugins discovered: openssf-baseline, example-hygiene, gittuf, reproducibility

Known gaps

Remediate node logs fixes but does not call RemediationExecutor yet
Human feedback answers stored but do not trigger a re-audit of the control

Additional Notes

validate_sync.py and generate_docs.py fail on Windows due to a pre-existing cp1252 encoding issue unrelated to this PR

mlieberman85

PR Review: LangGraph Agentic Orchestrator

Tested end-to-end — darnit run . --feedback noninteractive successfully discovers 2 implementations, checks 62 controls (41 pass, 15 fail, 6 warn), queues feedback questions, and logs remediation candidates. The state machine works.

Bugs

DarnitState.check_results type mismatch (state.py:22): Annotated as list but default_factory=dict. Doesn't crash because run_checks overwrites it, but any code using check_results before that node runs will get a dict.

plugin.py indentation (lines 114, 136, 151, 165): Same issue as #136 — register_controls and the 3 new handler methods are dedented out of the ComplianceImplementation Protocol class. They become orphaned module-level functions.
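To make the first bug concrete, here is a minimal reproduction of an annotation and default_factory that disagree. It assumes DarnitState is a dataclass, which the review does not state; the class names BuggyState and FixedState are made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class BuggyState:
    # Annotation says list, but a fresh instance gets a dict.
    check_results: list = field(default_factory=dict)


@dataclass
class FixedState:
    # Annotation and default now agree.
    check_results: list = field(default_factory=list)


buggy = BuggyState()
fixed = FixedState()

# Dataclasses do not enforce annotations, so nothing fails at construction.
print(type(buggy.check_results).__name__)
print(type(fixed.check_results).__name__)

# List-only operations fail on the buggy default before run_checks runs.
try:
    buggy.check_results.append({"control_id": "EXAMPLE-01"})
except AttributeError as exc:
    print(f"buggy default breaks list code: {exc}")
```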

Design issues

langgraph is a hard dependency for all CLI commands. cli.py:34 imports darnit_graph at module level, so darnit serve, darnit audit, darnit list all require langgraph installed. This also means uvx darnit run fails unless langgraph happens to be in the environment. Fix: lazy import inside cmd_run() + move langgraph to [project.optional-dependencies] as an agent extra.

darnit_graph compiles at import time (graph.py:226): build_graph() runs as a module-level side effect. Makes testing harder and means any LangGraph initialization failure breaks the entire module.
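One way to apply the lazy-import suggestion is sketched below. The load_optional helper, the simplified cmd_run signature, and the error wording are assumptions for illustration; only the idea of importing inside cmd_run() and the agent extra come from the review.

```python
import importlib
from types import ModuleType


def load_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on demand (hypothetical helper)."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise RuntimeError(
            f"{module_name!r} is required for this command; "
            f"install it with: pip install 'darnit[{extra}]'"
        ) from exc


def cmd_run(module_name: str = "langgraph") -> int:
    """Stand-in for the CLI handler: only this command needs the extra."""
    try:
        load_optional(module_name, extra="agent")
    except RuntimeError as err:
        print(err)
        return 1
    # ... build the graph and invoke it here ...
    return 0
```

Because nothing is imported at module level, other subcommands never touch langgraph, and graph construction happens only when cmd_run() is called.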

Scope gaps

These are acknowledged in the PR description, but they should either be fixed before merge or have tracking issues opened so they don't get lost:

remediate node is a placeholder — logs what it would fix but doesn't call RemediationExecutor
collect_context doesn't act on answers — human feedback is stored in state but doesn't trigger re-audit
Feedback answers are write-only — answers are collected but never read back by any downstream node

Without these, the darnit run pipeline discovers problems but can't close the loop on any of them. If the intent is to merge now and iterate, please open issues for each so they're tracked.

Minor

run_checks catches all exceptions broadly (except Exception) — bugs in the audit pipeline get swallowed into state.errors
plugin.py, loader.py, detectors.py changes are identical to #136 — should be a shared base PR
No tests in the diff — I wrote 61 covering state, feedback, LLM backends, graph nodes, and routing (all pass). Happy to contribute.
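For the first minor point, one possible shape is to catch only the failures the audit is expected to produce and let everything else propagate. This is a sketch under assumptions: AuditError is a hypothetical exception type, and run_checks here takes the audit as a callable purely so the example is self-contained.

```python
from typing import Callable, List


class AuditError(Exception):
    """Hypothetical: an expected, recoverable failure raised by the audit."""


def run_checks(state: dict, audit: Callable[[], List[dict]]) -> dict:
    """Run the audit, recording only expected failures in state['errors']."""
    try:
        state["check_results"] = audit()
    except AuditError as exc:
        # Expected failure: record it and let the graph continue.
        state.setdefault("errors", []).append(str(exc))
    # Anything else (TypeError, KeyError, ...) propagates as a real bug.
    return state
```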

What's good

Clean state machine with clear node separation
LLM backend abstraction is solid (prompt building, response parsing, 3 backends + factory)
Feedback system nicely handles interactive vs CI with auto-detection
Conditional routing logic is simple and correct

Marc-cn · 2026-03-28T20:27:53Z

Fixes pushed:

check_results type mismatch: changed default_factory=dict to default_factory=list
plugin.py indentation: same fix as #136 (feat: Gittuf plugin — policy checks and commit signing); optional handlers moved out of the Protocol to avoid breaking isinstance checks on existing plugins
Lazy langgraph import: moved import inside cmd_run(), moved langgraph to [project.optional-dependencies] as darnit[agent]
darnit_graph at module level: removed the singleton, build_graph() now called lazily inside cmd_run()

For the scope gaps (remediate placeholder, feedback answers not triggering re-audit, answers write-only), agreed these need tracking. Should I open issues on the main repo or would you prefer to track them differently?
Tests are now in the diff.

Marc-cn · 2026-04-02T15:26:46Z

Merge conflicts resolved and uv.lock regenerated. Opened tracking issues for the three scope gaps:
#144 — remediate node does not call RemediationExecutor
#145 — human feedback answers do not trigger re-audit
#146 — feedback answers are write-only

mlieberman85 · 2026-04-04T17:54:42Z

Review: Rebased & Fixed Test Failures

I've rebased this branch onto upstream/main (resolved 5 merge conflicts) and fixed a test collection failure caused by a top-level langgraph import in graph.py. All 25 PR tests now pass, and the full suite is green (1225/1226 — the 1 failure is a pre-existing upstream spec hash drift).

Fixes applied

Lazy langgraph import — moved the statement "from langgraph.graph import END, StateGraph" from module level into build_graph(). Without this, any test that imports routing functions from graph.py crashes with ModuleNotFoundError, since langgraph is optional.
langgraph dependency group — moved from [attestation] extras to a new [agent] extras group so pip install darnit[agent] works correctly.
Protocol conflict — kept optional handlers as comments (per commit 4358400 "move optional handlers out of Protocol to avoid breaking isinstance checks") rather than adding them as concrete Protocol methods.
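The isinstance concern behind the Protocol decision can be illustrated as follows. This sketch assumes ComplianceImplementation is a runtime_checkable Protocol (the thread mentions isinstance checks but not the decorator); the protocol names and the handle_feedback method are hypothetical.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class StrictPlugin(Protocol):
    """Protocol with an optional handler promoted to a required member."""

    def register_controls(self) -> None: ...
    def handle_feedback(self) -> None: ...  # hypothetical optional handler


@runtime_checkable
class LenientPlugin(Protocol):
    """Protocol that lists only the required method."""

    def register_controls(self) -> None: ...


class ExistingPlugin:
    """A plugin written before the optional handler existed."""

    def register_controls(self) -> None:
        pass


plugin = ExistingPlugin()
# isinstance() on a runtime-checkable Protocol requires every member.
print(isinstance(plugin, StrictPlugin))
print(isinstance(plugin, LenientPlugin))

# Optional handlers can instead be looked up when needed.
handler = getattr(plugin, "handle_feedback", None)
print(handler is None)
```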

Two bugs still present in the code

Bug 1: cmd_run() crashes on error (cli.py:541–548)

If graph.invoke() raises an exception, the except block logs the error and then falls through to line 548, which reads final_state, a variable that was never assigned. The command then crashes with UnboundLocalError.

try:
    graph = build_graph()
    final_state = graph.invoke(state)
except Exception as e:
    logger.error(f"Agent run failed: {e}")
    # BUG: falls through, final_state is unbound → UnboundLocalError

# line 548 — uses final_state unconditionally
check_results = final_state.get("check_results") or []

Fix: add return 1 in the except block, or initialize final_state = {} before the try.
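A self-contained sketch of the first option (return 1 from the except block) follows. The stub graph classes and the build_graph parameter are assumptions so the example runs without langgraph; the real cmd_run() has a different signature.

```python
import logging

logger = logging.getLogger("darnit.example")


class FailingGraph:
    """Stub graph whose invoke() always raises."""

    def invoke(self, state: dict) -> dict:
        raise RuntimeError("boom")


class WorkingGraph:
    """Stub graph that returns a minimal final state."""

    def invoke(self, state: dict) -> dict:
        return {"check_results": [{"control_id": "EXAMPLE-01", "status": "pass"}]}


def cmd_run(build_graph, state: dict) -> int:
    try:
        graph = build_graph()
        final_state = graph.invoke(state)
    except Exception as e:
        logger.error("Agent run failed: %s", e)
        return 1  # stop here so final_state is never read while unbound

    check_results = final_state.get("check_results") or []
    print(f"{len(check_results)} control(s) checked")
    return 0
```

Returning early also gives the CLI a non-zero exit code on failure, which initializing final_state = {} alone would not.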

Bug 2: collect_context() uses wrong key name (graph.py:122)

run_checks() stores results with key "control_id" (line 94), but collect_context() reads result.get("id") (line 122). This means control_id will always be "unknown" in feedback messages.

# graph.py:122 — should be "control_id", not "id"
control_id = result.get("id", "unknown")
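The effect of the key mismatch, and the one-word fix, can be seen in isolation. The result record and the feedback_label helper below are hypothetical; only the "control_id" and "id" key names come from the review.

```python
def feedback_label(result: dict) -> str:
    """Return the control identifier used in feedback messages."""
    # run_checks() stores the identifier under "control_id", so read that key.
    return result.get("control_id", "unknown")


# Hypothetical record shaped like the ones run_checks() produces.
result = {"control_id": "EXAMPLE-01", "status": "fail"}

print(result.get("id", "unknown"))  # wrong key: falls back to the default
print(feedback_label(result))       # correct key: the real identifier
```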

Marc-cn requested a review from mlieberman85 as a code owner March 25, 2026 18:43

mlieberman85 reviewed Mar 28, 2026

mlieberman85 mentioned this pull request Mar 28, 2026

feat: scientific reproducibility plugin #138

Open

10 tasks

Marc-cn added a commit to Marc-cn/darnit that referenced this pull request Mar 28, 2026

Proactive fixes from kusari-oss#136/kusari-oss#137 reviews: plugin protocol, loader forge/build storage, add tests

2d7d334

Marc-cn mentioned this pull request Mar 28, 2026

feat: pluggable storage backends — file, Archivista, memory #139

Merged

7 tasks

mlieberman85 force-pushed the feature/langgraph-agent branch from e377933 to 9b44608 Compare April 4, 2026 17:54

mlieberman85 mentioned this pull request Apr 4, 2026

feat: agentic orchestrator — forge detector, plugin protocol, LangGraph agent, Gittuf (example) and reproducibility plugins #130

Closed

11 tasks

feat: LangGraph agentic orchestrator — fixes from review (lazy import, key name, conflict resolution)

f609473

Marc-cn force-pushed the feature/langgraph-agent branch from c4f1d65 to f609473 Compare April 9, 2026 18:05

Jaydeep869 mentioned this pull request Apr 15, 2026

fix(agent): resolve graph compilation and cli bugs from review Marc-cn/darnit#2

Open

14 tasks

Marc-cn wants to merge 1 commit into kusari-oss:main from Marc-cn:feature/langgraph-agent

2 participants
