feat: LangGraph agentic orchestrator: state machine, LLM backends, CLI run, human feedback #137

Marc-cn wants to merge 1 commit into kusari-oss:main from
Conversation
mlieberman85
left a comment
PR Review: LangGraph Agentic Orchestrator
Tested end-to-end — `darnit run . --feedback noninteractive` successfully discovers 2 implementations, checks 62 controls (41 pass, 15 fail, 6 warn), queues feedback questions, and logs remediation candidates. The state machine works.
Bugs
`DarnitState.check_results` type mismatch (state.py:22): annotated as `list` but `default_factory=dict`. Doesn't crash because `run_checks` overwrites it, but any code using `check_results` before that node runs will get a dict.
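A minimal sketch of the corrected field, assuming the state is a dataclass (the real `state.py` may use a TypedDict with an `Annotated` reducer instead; only the mismatched field is shown here):

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class DarnitState:
    # Hypothetical reduced sketch, not the real state.py.
    # The fix: default_factory=list, so the default value matches the
    # list annotation instead of starting life as a dict.
    check_results: list[dict[str, Any]] = field(default_factory=list)
```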
`plugin.py` indentation (lines 114, 136, 151, 165): same issue as #136 — `register_controls` and the 3 new handler methods are dedented out of the `ComplianceImplementation` Protocol class. They become orphaned module-level functions.
Design issues
`langgraph` is a hard dependency for all CLI commands. `cli.py:34` imports `darnit_graph` at module level, so `darnit serve`, `darnit audit`, `darnit list` all require langgraph installed. This also means `uvx darnit run` fails unless langgraph happens to be in the environment. Fix: lazy import inside `cmd_run()` + move langgraph to `[project.optional-dependencies]` as an `agent` extra.
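One way to sketch the lazy import (the loader function and error message are illustrative, not from the PR):

```python
import importlib

def load_agent_graph():
    # Resolve darnit.agent.graph only when `darnit run` actually executes,
    # so serve/audit/list never pay the langgraph import cost. Pairs with
    # moving langgraph under [project.optional-dependencies] as an
    # "agent" extra in pyproject.toml.
    try:
        return importlib.import_module("darnit.agent.graph").darnit_graph
    except ImportError as exc:
        raise SystemExit(
            "darnit run requires the agent extra: pip install 'darnit[agent]'"
        ) from exc
```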
`darnit_graph` compiles at import time (graph.py:226): `build_graph()` runs as a module-level side effect. This makes testing harder and means any LangGraph initialization failure breaks the entire module.
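A deferred-compilation sketch — the `build_graph` stub here stands in for the real builder in `graph.py`, and `get_graph` is an assumed accessor name:

```python
from functools import lru_cache

def build_graph():
    # Stand-in for the PR's real builder; returns the compiled graph.
    return object()

@lru_cache(maxsize=1)
def get_graph():
    # Compile on first use instead of at import time. Repeated calls
    # return the same compiled graph, and a LangGraph initialization
    # failure surfaces at the call site, not during module import.
    return build_graph()
```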
Scope gaps
These are acknowledged in the PR description, but they should either be fixed before merge or have tracking issues opened so they don't get lost:
- `remediate` node is a placeholder — logs what it would fix but doesn't call `RemediationExecutor`
- `collect_context` doesn't act on answers — human feedback is stored in state but doesn't trigger re-audit
- Feedback answers are write-only — answers are collected but never read back by any downstream node
Without these, the darnit run pipeline discovers problems but can't close the loop on any of them. If the intent is to merge now and iterate, please open issues for each so they're tracked.
Minor
- `run_checks` catches all exceptions broadly (`except Exception`) — bugs in the audit pipeline get swallowed into `state.errors`
- `plugin.py`, `loader.py`, `detectors.py` changes are identical to #136 — should be a shared base PR
- No tests in the diff — I wrote 61 covering state, feedback, LLM backends, graph nodes, and routing (all pass). Happy to contribute.
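A sketch of narrower handling for the first point — the function name and the "expected" exception set are assumptions, not the PR's actual code:

```python
def run_checks_safely(checks, errors):
    # Only convert *expected* failures into state errors; let programmer
    # errors (TypeError, AttributeError, ...) propagate so bugs in the
    # audit pipeline surface instead of being swallowed.
    results = []
    for check in checks:
        try:
            results.append(check())
        except (OSError, ValueError) as exc:  # assumed "expected" failures
            errors.append(f"{getattr(check, '__name__', check)}: {exc}")
    return results
```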
What's good
- Clean state machine with clear node separation
- LLM backend abstraction is solid (prompt building, response parsing, 3 backends + factory)
- Feedback system nicely handles interactive vs CI with auto-detection
- Conditional routing logic is simple and correct
Fixes pushed:
For the scope gaps (remediate placeholder, feedback answers not triggering re-audit, answers write-only), agreed these need tracking. Should I open issues on the main repo or would you prefer to track them differently?
Force-pushed from e377933 to 9b44608 (commit: …otocol, loader forge/build storage, add tests).
Review: Rebased & Fixed Test Failures

I've rebased this branch onto the latest `main`.

Fixes applied

Two bugs still present in the code

Bug 1: If `graph.invoke` raises, `final_state` is unbound:

```python
try:
    graph = build_graph()
    final_state = graph.invoke(state)
except Exception as e:
    logger.error(f"Agent run failed: {e}")
# BUG: falls through, final_state is unbound → UnboundLocalError
# line 548 — uses final_state unconditionally
check_results = final_state.get("check_results") or []
```

Fix: add an early return (or re-raise) in the `except` block.

Bug 2:

```python
# graph.py:122 — should be "control_id", not "id"
control_id = result.get("id", "unknown")
```
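For Bug 1, one way the fix could look — bail out in the `except` branch so `final_state` is never read after a failed invoke (`run_agent` and `graph_builder` are illustrative names, not the PR's actual signatures):

```python
import logging

logger = logging.getLogger("darnit.agent")

def run_agent(graph_builder, state):
    # Sketch: on failure, log and return early instead of falling
    # through to code that reads the unbound final_state.
    try:
        graph = graph_builder()
        final_state = graph.invoke(state)
    except Exception as e:
        logger.error(f"Agent run failed: {e}")
        return []  # early return: final_state is unbound past this point
    return final_state.get("check_results") or []
```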
Force-pushed from c4f1d65 to f609473 (commit: …, key name, conflict resolution).
Summary
Extends Darnit into a self-driving agentic orchestrator. Adds a LangGraph state machine that drives the full audit pipeline autonomously, bring-your-own LLM key support for standalone mode, a `darnit run` CLI command, and a pluggable human feedback mechanism.

Type of Change
Framework Changes Checklist
If this PR modifies the darnit framework (`packages/darnit/`):

- Updated the spec (`openspec/specs/framework-design/spec.md`) if behavior changed
- Ran `uv run python scripts/validate_sync.py --verbose` and it passes
- Ran `uv run python scripts/generate_docs.py` and committed any doc changes
If this PR modifies controls or TOML configuration: not applicable — no controls or TOML modified in this PR.
Testing
- Tests pass (`uv run pytest tests/ -v`)
- Lint passes (`uv run ruff check .`)

What was built
- `darnit/agent/graph.py` drives: load context → run checks → collect context → remediate → finish
- `darnit run` CLI command — triggers the full pipeline from the terminal

Usage
Verified on this repo
Known gaps
Additional Notes