feat: LangGraph agentic orchestrator: state machine, LLM backends, CLI run, human feedback #137

Open
Marc-cn wants to merge 1 commit into kusari-oss:main from Marc-cn:feature/langgraph-agent

Conversation

@Marc-cn
Collaborator

@Marc-cn Marc-cn commented Mar 25, 2026

Summary

Extends Darnit into a self-driving agentic orchestrator. Adds a LangGraph state machine that drives the full audit pipeline autonomously, bring-your-own LLM key support for standalone mode, a darnit run CLI command, and a pluggable human feedback mechanism.

Type of Change

  • Bug fix (non-breaking change fixing an issue)
  • New feature (non-breaking change adding functionality)
  • Breaking change (fix or feature causing existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)

Framework Changes Checklist

If this PR modifies the darnit framework (packages/darnit/):

  • Updated framework spec (openspec/specs/framework-design/spec.md) if behavior changed
  • Ran uv run python scripts/validate_sync.py --verbose and it passes
  • Ran uv run python scripts/generate_docs.py and committed any doc changes

Control/TOML Changes Checklist

Not applicable — no controls or TOML modified in this PR.

If this PR modifies controls or TOML configuration:

  • Control metadata defined in TOML (not Python code)
  • SARIF fields (description, severity, help_url) included where appropriate
  • Ran validation to confirm TOML schema compliance

Testing

  • Tests pass locally (uv run pytest tests/ -v)
  • Added tests for new functionality (if applicable)
  • Linting passes (uv run ruff check .)

What was built

  • LangGraph state machine: darnit/agent/graph.py drives load context → run checks → collect context → remediate → finish
  • Bring-your-own LLM — Anthropic, OpenAI, Ollama backends. API key from environment variable, never hardcoded
  • darnit run CLI command — triggers the full pipeline from the terminal
  • Pluggable human feedback — interactive (CLI prompts mid-run) and noninteractive (collects questions, prints at end). Auto-detects mode based on whether running in a terminal or CI
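
To illustrate the auto-detection described in the last bullet, here is a minimal sketch of how a feedback mode could be chosen. The helper name `detect_feedback_mode`, its parameters, and the use of the `CI` environment variable are assumptions for illustration, not the code in this PR.

```python
import os
import sys


def detect_feedback_mode(stdin_is_tty=None, environ=None):
    """Return "interactive" or "noninteractive" (hypothetical helper)."""
    if environ is None:
        environ = os.environ
    if stdin_is_tty is None:
        # sys.stdin can be None when the process has no attached stdin
        stdin_is_tty = sys.stdin is not None and sys.stdin.isatty()
    # Assumption: many CI systems export CI=true, so treat that as a CI run
    if environ.get("CI", "").lower() in ("1", "true"):
        return "noninteractive"
    return "interactive" if stdin_is_tty else "noninteractive"


print(detect_feedback_mode(stdin_is_tty=True, environ={}))
print(detect_feedback_mode(stdin_is_tty=True, environ={"CI": "true"}))
```

Passing the terminal flag and the environment as parameters keeps the decision testable without a real terminal.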

Usage

darnit run .
darnit run . --llm-backend openai
darnit run . --feedback interactive
darnit run . --feedback noninteractive

Verified on this repo

62 controls checked: 27 passed, 13 failed (gh CLI not installed), 1 warned
4 plugins discovered: openssf-baseline, example-hygiene, gittuf, reproducibility

Known gaps

  • Remediate node logs fixes but does not call RemediationExecutor yet
  • Human feedback answers stored but do not trigger a re-audit of the control

Additional Notes

  • validate_sync.py and generate_docs.py fail on Windows due to a pre-existing cp1252 encoding issue unrelated to this PR

@Marc-cn Marc-cn requested a review from mlieberman85 as a code owner March 25, 2026 18:43
Contributor

@mlieberman85 mlieberman85 left a comment

PR Review: LangGraph Agentic Orchestrator

Tested end-to-end — darnit run . --feedback noninteractive successfully discovers 2 implementations, checks 62 controls (41 pass, 15 fail, 6 warn), queues feedback questions, and logs remediation candidates. The state machine works.

Bugs

DarnitState.check_results type mismatch (state.py:22): Annotated as list but default_factory=dict. Doesn't crash because run_checks overwrites it, but any code using check_results before that node runs will get a dict.
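
A minimal reproduction of the mismatch, assuming `DarnitState` is a standard-library dataclass (the real class may use a different mechanism that also accepts `default_factory`). The class names `BuggyState` and `FixedState` are made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class BuggyState:
    # Annotation says list, but the factory builds a dict
    check_results: list = field(default_factory=dict)


@dataclass
class FixedState:
    # Annotation and factory agree
    check_results: list = field(default_factory=list)


buggy = BuggyState()
fixed = FixedState()
print(type(buggy.check_results).__name__)  # dict
print(type(fixed.check_results).__name__)  # list

# Code that trusts the annotation fails on the buggy default:
# buggy.check_results.append(...) raises AttributeError
fixed.check_results.append({"control_id": "EXAMPLE-01", "status": "pass"})
```

Dataclasses do not validate annotations at runtime, which is why the mismatch goes unnoticed until something calls a list method on the default value.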

plugin.py indentation (lines 114, 136, 151, 165): Same issue as #136. register_controls and the 3 new handler methods are dedented out of the ComplianceImplementation Protocol class, so they become orphaned module-level functions.

Design issues

langgraph is a hard dependency for all CLI commands. cli.py:34 imports darnit_graph at module level, so darnit serve, darnit audit, darnit list all require langgraph installed. This also means uvx darnit run fails unless langgraph happens to be in the environment. Fix: lazy import inside cmd_run() + move langgraph to [project.optional-dependencies] as an agent extra.
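
A rough sketch of the lazy-import approach suggested above. The helper `load_optional` and the body of `cmd_run` are hypothetical; only the idea (import inside the command, point the user at an `agent` extra) comes from this review.

```python
import importlib


def load_optional(module_name, extra):
    """Import module_name on demand, or raise with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise RuntimeError(
            f"'{module_name}' is required for this command; "
            f"install it with: pip install 'darnit[{extra}]'"
        ) from exc


def cmd_run(path="."):
    # The import happens only when `run` is invoked, so other
    # commands never need langgraph to be installed.
    try:
        load_optional("langgraph.graph", "agent")
    except RuntimeError as exc:
        print(exc)
        return 1
    # ... build and invoke the graph here (omitted in this sketch) ...
    return 0
```

With this shape, a missing optional dependency turns into an actionable message and a non-zero exit code instead of a traceback at CLI startup.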

darnit_graph compiles at import time (graph.py:226): build_graph() runs as a module-level side effect. Makes testing harder and means any LangGraph initialization failure breaks the entire module.

Scope gaps

These are acknowledged in the PR description, but they should either be fixed before merge or have tracking issues opened so they don't get lost:

  • remediate node is a placeholder — logs what it would fix but doesn't call RemediationExecutor
  • collect_context doesn't act on answers — human feedback is stored in state but doesn't trigger re-audit
  • Feedback answers are write-only — answers are collected but never read back by any downstream node

Without these, the darnit run pipeline discovers problems but can't close the loop on any of them. If the intent is to merge now and iterate, please open issues for each so they're tracked.

Minor

  • run_checks catches all exceptions broadly (except Exception) — bugs in the audit pipeline get swallowed into state.errors
  • plugin.py, loader.py, detectors.py changes are identical to #136 — should be a shared base PR
  • No tests in the diff — I wrote 61 covering state, feedback, LLM backends, graph nodes, and routing (all pass). Happy to contribute.
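
On the first point in the list above, one way to avoid swallowing genuine bugs is to catch only an expected failure type and let everything else propagate. This is a sketch under assumptions: `AuditError`, `run_checks_node`, and the `run_audit` callable are hypothetical names, and the state is modelled as a plain dict.

```python
class AuditError(Exception):
    """Hypothetical: an expected, reportable audit failure."""


def run_checks_node(state, run_audit):
    """Run the audit; record expected failures, let real bugs surface."""
    try:
        results = run_audit()
    except AuditError as exc:
        # Expected failure: keep going and report it in state["errors"]
        errors = list(state.get("errors", [])) + [str(exc)]
        return {**state, "errors": errors}
    # Any other exception (KeyError, TypeError, ...) is not caught here,
    # so programming errors show up with a full traceback.
    return {**state, "check_results": results}
```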

What's good

  • Clean state machine with clear node separation
  • LLM backend abstraction is solid (prompt building, response parsing, 3 backends + factory)
  • Feedback system nicely handles interactive vs CI with auto-detection
  • Conditional routing logic is simple and correct

@Marc-cn
Collaborator Author

Marc-cn commented Mar 28, 2026

Fixes pushed:

  • check_results type mismatch: changed default_factory=dict to default_factory=list
  • plugin.py indentation: same fix as feat: Gittuf plugin — policy checks and commit signing #136, optional handlers moved out of Protocol to avoid breaking isinstance checks on existing plugins
  • Lazy langgraph import: moved import inside cmd_run(), moved langgraph to [project.optional-dependencies] as darnit[agent]
  • darnit_graph at module level: removed the singleton, build_graph() now called lazily inside cmd_run()

For the scope gaps (remediate placeholder, feedback answers not triggering re-audit, answers write-only), agreed these need tracking. Should I open issues on the main repo or would you prefer to track them differently?
Tests are now in the diff.

Marc-cn added a commit to Marc-cn/darnit that referenced this pull request Mar 28, 2026
…otocol, loader forge/build storage, add tests
@Marc-cn
Collaborator Author

Marc-cn commented Apr 2, 2026

Merge conflicts resolved and uv.lock regenerated. Opened tracking issues for the three scope gaps:
#144 — remediate node does not call RemediationExecutor
#145 — human feedback answers do not trigger re-audit
#146 — feedback answers are write-only

@mlieberman85 mlieberman85 force-pushed the feature/langgraph-agent branch from e377933 to 9b44608 Compare April 4, 2026 17:54
@mlieberman85
Contributor

Review: Rebased & Fixed Test Failures

I've rebased this branch onto upstream/main (resolved 5 merge conflicts) and fixed a test collection failure caused by a top-level langgraph import in graph.py. All 25 PR tests now pass, and the full suite is green (1225/1226 — the 1 failure is a pre-existing upstream spec hash drift).

Fixes applied

  1. Lazy langgraph import — moved from langgraph.graph import END, StateGraph from module-level into build_graph(). Without this, any test importing routing functions from graph.py crashes with ModuleNotFoundError since langgraph is optional.
  2. langgraph dependency group — moved from [attestation] extras to a new [agent] extras group so pip install darnit[agent] works correctly.
  3. Protocol conflict — kept optional handlers as comments (per commit 4358400 "move optional handlers out of Protocol to avoid breaking isinstance checks") rather than adding them as concrete Protocol methods.
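
For context on item 3, the sketch below shows why adding methods to a runtime-checkable Protocol can break `isinstance` checks for existing plugins. It assumes the Protocol is decorated with `typing.runtime_checkable`; the class names and the `register_handlers` method are hypothetical, while `register_controls` is the method named in this thread.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class StrictImplementation(Protocol):
    def register_controls(self) -> None: ...
    def register_handlers(self) -> None: ...  # newly added hook


@runtime_checkable
class LenientImplementation(Protocol):
    def register_controls(self) -> None: ...


class ExistingPlugin:
    """A plugin written before the new hook existed."""

    def register_controls(self) -> None:
        pass


plugin = ExistingPlugin()
print(isinstance(plugin, StrictImplementation))   # False
print(isinstance(plugin, LenientImplementation))  # True

# Optional hooks can instead be looked up dynamically:
handler = getattr(plugin, "register_handlers", None)
if handler is not None:
    handler()
```

A runtime `isinstance` check against a Protocol requires every declared member to be present, so each new method silently excludes plugins that predate it.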

Two bugs still present in the code

Bug 1: cmd_run() crashes on error (cli.py:541–548)

If graph.invoke() raises an exception, the except block logs the error but falls through to line 548 which accesses final_state — a variable that was never assigned. This will crash with UnboundLocalError.

try:
    graph = build_graph()
    final_state = graph.invoke(state)
except Exception as e:
    logger.error(f"Agent run failed: {e}")
    # BUG: falls through, final_state is unbound → UnboundLocalError

# line 548 — uses final_state unconditionally
check_results = final_state.get("check_results") or []

Fix: add return 1 in the except block, or initialize final_state = {} before the try.

Bug 2: collect_context() uses wrong key name (graph.py:122)

run_checks() stores results with key "control_id" (line 94), but collect_context() reads result.get("id") (line 122). This means control_id will always be "unknown" in feedback messages.

# graph.py:122 — should be "control_id", not "id"
control_id = result.get("id", "unknown")
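
One way to keep the writer and reader from drifting apart again is to share the key through a single constant. The sketch below is illustrative only: `CONTROL_ID_KEY`, `make_result`, and `feedback_label` are hypothetical names, and the result is modelled as a plain dict.

```python
CONTROL_ID_KEY = "control_id"


def make_result(control_id, status):
    """Build a result the way run_checks() is described as storing it."""
    return {CONTROL_ID_KEY: control_id, "status": status}


def feedback_label(result):
    """Read the ID with the same key the writer used."""
    return result.get(CONTROL_ID_KEY, "unknown")


result = make_result("EXAMPLE-01", "fail")
print(result.get("id", "unknown"))  # unknown (the mismatch described above)
print(feedback_label(result))       # EXAMPLE-01
```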
