Wolo/workflow hitl integration test #980
Draft
wolo-lab wants to merge 9 commits into
…tion Workflow-engine support for human-in-the-loop, unified on a single mechanism (history rehydration) matching adk-python: no persisted run-state event, no PendingRequest field.
- scheduler: per-event back-pressure handshake (a non-partial function-response is persisted before the node's flow rebuilds the next model request, fixing a non-deterministic re-issue race); pause a node on accumulated Event.LongRunningToolIDs (RequestInput rides on them); stamp NodeInfo.Path = node name on static node events so rehydration can attribute interrupts (dynamic children fold into their static ancestor).
- persistence: ReconstructRunState ports adk-python's _reconstruct_node_states + _infer_node_state: per-node scan (interrupts, resolved user responses, schemas, output), status inference (WAITING / PENDING+ResumedInputs re-entry / COMPLETED+Output handoff), backward-edge predecessor input, and schema validation on the surviving (last-wins) response.
- resume: single path over the rehydrated state, gated on the current turn's responses for idempotency; already-run handoff successors are skipped (RunState.completed).
- state: NodeState.Interrupts + unexported interruptSchemas; RunState.completed; HasWaiting. No PendingRequest, no persisted run-state blob.
- workflowagent: detectResume uses ReconstructRunState and surfaces reconstruction (schema-validation) errors.
A node may raise multiple interrupts per activation. The workflow and workflowagent suites pass with -race.
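The status-inference step of history rehydration can be sketched as a single pass over session events that groups interrupts and replies per node, then derives a status. This is a minimal illustration, not ReconstructRunState itself: the Event fields, the nodeScan type, the inferStates helper and the rerunOnResume map are all hypothetical, and schemas, backward edges and output handoff values are left out.

```go
package main

import "fmt"

// Status mirrors the node states named in the commit message (hypothetical type).
type Status string

const (
	StatusWaiting   Status = "WAITING"
	StatusPending   Status = "PENDING"
	StatusCompleted Status = "COMPLETED"
)

// Event is a hypothetical, trimmed stand-in for a persisted session event.
type Event struct {
	NodePath     string         // stands in for NodeInfo.Path ("" for user events)
	InterruptIDs []string       // stands in for LongRunningToolIDs
	Responses    map[string]any // function-call ID -> user response payload
	Output       any            // non-nil when the node emitted an output
}

// nodeScan collects what one pass over history reveals about a single node.
type nodeScan struct {
	interrupts map[string]bool
	resolved   map[string]any // last response per interrupt ID wins
	output     any
}

// inferStates scans history once and infers a status for every node it saw.
func inferStates(events []Event, rerunOnResume map[string]bool) map[string]Status {
	scans := map[string]*nodeScan{}
	owner := map[string]string{} // interrupt ID -> node path that raised it
	for _, ev := range events {
		if ev.NodePath != "" {
			s, ok := scans[ev.NodePath]
			if !ok {
				s = &nodeScan{interrupts: map[string]bool{}, resolved: map[string]any{}}
				scans[ev.NodePath] = s
			}
			for _, id := range ev.InterruptIDs {
				s.interrupts[id] = true
				owner[id] = ev.NodePath
			}
			if ev.Output != nil {
				s.output = ev.Output
			}
		}
		for id, resp := range ev.Responses {
			if path, ok := owner[id]; ok {
				scans[path].resolved[id] = resp // a later reply overwrites an earlier one
			}
		}
	}

	states := map[string]Status{}
	for path, s := range scans {
		switch {
		case len(s.interrupts) == 0:
			if s.output != nil {
				states[path] = StatusCompleted // ran to completion without pausing
			}
		case len(s.resolved) < len(s.interrupts):
			states[path] = StatusWaiting // at least one interrupt is still unanswered
		case rerunOnResume[path]:
			states[path] = StatusPending // re-entry: the node runs again with its resumed inputs
		default:
			states[path] = StatusCompleted // handoff: every interrupt answered, node counts as done
		}
	}
	return states
}

func main() {
	events := []Event{
		{NodePath: "fetch", Output: "data"},
		{NodePath: "ask", InterruptIDs: []string{"call-1"}},
		{NodePath: "confirm", InterruptIDs: []string{"call-2"}},
		{Responses: map[string]any{"call-1": "yes"}},
	}
	states := inferStates(events, nil)
	fmt.Println(states["fetch"], states["ask"], states["confirm"]) // COMPLETED COMPLETED WAITING
}
```

Because the status is derived from history on every turn, nothing beyond the events themselves needs to be persisted, which is the point of dropping the run-state blob.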
…, Routes)
AppendEvent (in-memory) and the database storage layer dropped Event's workflow fields when persisting: the in-memory copy omitted NodeInfo, RequestedInput and Routes, and the database layer never serialized NodeInfo or RequestedInput. History-based resume attributes interrupts by NodeInfo.Path, so losing it broke HITL resume: a RequestInput workflow (e.g. examples/workflow/hitl_simple) would re-prompt instead of continuing after the reply. Persist all three fields in both backends and add round-trip regression tests for each.
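The shape of such a round-trip regression test can be sketched with encoding/json standing in for a storage backend. The Event, NodeInfo and RequestedInput types and their JSON tags here are hypothetical simplifications, and lossyCopy only mimics the kind of field-by-field copy that caused the bug; it is not the real AppendEvent code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Hypothetical, trimmed stand-ins for the session types; only the
// workflow fields matter for this sketch.
type NodeInfo struct {
	Path string `json:"path"`
}

type RequestedInput struct {
	Message string `json:"message"`
}

type Event struct {
	ID             string          `json:"id"`
	NodeInfo       *NodeInfo       `json:"node_info,omitempty"`
	RequestedInput *RequestedInput `json:"requested_input,omitempty"`
	Routes         []string        `json:"routes,omitempty"`
}

// roundTrip serializes and deserializes an event the way a storage
// backend would; every workflow field must survive.
func roundTrip(ev Event) (Event, error) {
	raw, err := json.Marshal(ev)
	if err != nil {
		return Event{}, err
	}
	var out Event
	err = json.Unmarshal(raw, &out)
	return out, err
}

// lossyCopy mimics the bug: a copy that forgets the workflow fields.
func lossyCopy(ev Event) Event { return Event{ID: ev.ID} }

func main() {
	in := Event{
		ID:             "e1",
		NodeInfo:       &NodeInfo{Path: "ask"},
		RequestedInput: &RequestedInput{Message: "Approve?"},
		Routes:         []string{"approved"},
	}
	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(reflect.DeepEqual(in, out), reflect.DeepEqual(in, lossyCopy(in))) // true false
}
```

Asserting on each field individually (rather than only on the event ID) is what lets a test like this catch a backend that silently drops NodeInfo.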
Two resume-correctness fixes for dynamic orchestrators and HITL:
1. Cross-resume dedup. A dynamic node body re-runs from the top on resume, so every RunNode before the pause point would re-execute its child. rehydrateCache rebuilds the sub-scheduler's resultByPath from session events (child terminal events carry NodeInfo.Path + Output), so completed children with a stable WithRunID are served from cache. Mirrors adk-python's _rehydrate_from_events / DynamicNodeScheduler.
2. Terminal handoff asker now resumes. Resume only bumped its scheduled counter per scheduled successor, so a single-asker workflow (no successors) wrongly returned ErrNothingToResume. A matched handoff asker now counts as an effective resume itself, gated on answeredThisTurn (derived from a per-interrupt resolvedCount during rehydration), so a duplicate resume stays an idempotent no-op.
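The cross-resume dedup idea can be sketched as a path-keyed memo table that is seeded from history before the dynamic body re-runs. The ChildEvent and scheduler types, the rehydrateCache and runNode helpers, and the "orchestrator/child1" path format are hypothetical simplifications; the real sub-scheduler and WithRunID handling are not shown.

```go
package main

import "fmt"

// ChildEvent is a hypothetical terminal event of a dynamic child node.
type ChildEvent struct {
	Path   string // stands in for NodeInfo.Path (parent path + stable run ID)
	Output any    // non-nil only on the child's terminal event
}

// scheduler is a hypothetical sub-scheduler that memoizes child results by path.
type scheduler struct {
	resultByPath map[string]any
	executions   int // counts real (non-cached) child runs
}

// rehydrateCache seeds resultByPath from session history so that children
// completed in an earlier turn are not re-executed when the body re-runs.
func rehydrateCache(events []ChildEvent) *scheduler {
	s := &scheduler{resultByPath: map[string]any{}}
	for _, ev := range events {
		if ev.Output != nil {
			s.resultByPath[ev.Path] = ev.Output
		}
	}
	return s
}

// runNode returns the cached output for a known path, or runs the child once.
func (s *scheduler) runNode(path string, run func() any) any {
	if out, ok := s.resultByPath[path]; ok {
		return out
	}
	s.executions++
	out := run()
	s.resultByPath[path] = out
	return out
}

func main() {
	// Turn 1 finished child1 and paused inside child2.
	s := rehydrateCache([]ChildEvent{{Path: "orchestrator/child1", Output: "summary"}})

	// Turn 2: the orchestrator body re-runs from the top.
	a := s.runNode("orchestrator/child1", func() any { return "recomputed" })
	b := s.runNode("orchestrator/child2", func() any { return "answer" })
	fmt.Println(a, b, s.executions) // summary answer 1
}
```

The dedup only works when the path is stable across turns, which is why the fix relies on children having a stable WithRunID.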
17aa0ce to 6861671
634b9aa to 4f7cd27
Replaced deprecated tool.Context with agent.ToolContext
Completes the symmetric input/output validation contract on the Node interface, alongside the existing ValidateInput. The scheduler is expected to invoke ValidateOutput on every yielded event whose output is non-nil before forwarding the event to the consumer (wired up in a follow-up).
Interface and conformance
- Add ValidateOutput(output any) (any, error) to the Node interface.
- Add explicit stubs on the two implementations that do not embed BaseNode: startNode and the test-only dummyNode.
- Extend the compile-time Node-conformance assertions in base_node_test.go to cover AgentNode, ToolNode, JoinNode, ParallelWorker, and WorkflowNode.
Default implementation on BaseNode
- BaseNode.ValidateOutput delegates to a shared defaultValidateOutput helper that validates the output against the node's outputSchema field (added in #911) when set, and otherwise returns the output unchanged.
- The default deliberately performs no type coercion or Content/JSON fallback handling; ToolNode will override ValidateOutput to add its FunctionTool {"result": X} unwrap fallback in a follow-up.
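The default-plus-override pattern described here can be sketched with Go struct embedding: BaseNode supplies the default through a shared helper, and an embedding type shadows the method to add a fallback. The Schema interface, stringSchema, and the exact unwrap logic in ToolNode.ValidateOutput are assumptions for illustration; the real follow-up may differ, and the interface is trimmed to ValidateOutput only.

```go
package main

import (
	"errors"
	"fmt"
)

// Schema is a hypothetical stand-in for whatever type backs outputSchema.
type Schema interface {
	Validate(v any) error
}

// Node is trimmed to the output half of the validation contract.
type Node interface {
	ValidateOutput(output any) (any, error)
}

// defaultValidateOutput validates against schema when one is set and
// otherwise returns the output unchanged. It never coerces types.
func defaultValidateOutput(schema Schema, output any) (any, error) {
	if schema == nil {
		return output, nil
	}
	if err := schema.Validate(output); err != nil {
		return nil, fmt.Errorf("output validation: %w", err)
	}
	return output, nil
}

type BaseNode struct {
	outputSchema Schema
}

func (b *BaseNode) ValidateOutput(output any) (any, error) {
	return defaultValidateOutput(b.outputSchema, output)
}

// ToolNode shadows the promoted method with a hypothetical {"result": X} unwrap fallback.
type ToolNode struct {
	BaseNode
}

func (t *ToolNode) ValidateOutput(output any) (any, error) {
	v, err := t.BaseNode.ValidateOutput(output)
	if err == nil {
		return v, nil
	}
	// Retry on the wrapped value when the output is exactly {"result": X}.
	if m, ok := output.(map[string]any); ok && len(m) == 1 {
		if inner, ok := m["result"]; ok {
			return t.BaseNode.ValidateOutput(inner)
		}
	}
	return nil, err
}

// Compile-time conformance assertions, in the style of base_node_test.go.
var (
	_ Node = (*BaseNode)(nil)
	_ Node = (*ToolNode)(nil)
)

// stringSchema accepts only string outputs (illustrative).
type stringSchema struct{}

func (stringSchema) Validate(v any) error {
	if _, ok := v.(string); !ok {
		return errors.New("expected a string")
	}
	return nil
}

func main() {
	tn := &ToolNode{BaseNode{outputSchema: stringSchema{}}}
	v, err := tn.ValidateOutput(map[string]any{"result": "ok"})
	fmt.Println(v, err) // ok <nil>
}
```

Keeping the fallback out of the shared helper means every other node type gets strict validation by default, and only the node that knows about the {"result": X} envelope pays for the extra branch.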
Extends the console launcher's HITL prompt dispatch to handle
tool confirmation interrupts (toolconfirmation.FunctionCallName)
alongside the workflow input path added in the previous commit.
Detection path is unchanged — collectPendingInterrupts already
walks events name-agnostically via Event.LongRunningToolIDs.
This commit only adds a per-name case to the render and response
switches:
* renderToolConfirmationPrompt prints the confirmation hint after
the standard "Agent -> " banner, or "Confirm <name>?" derived
from the original function call as fallback.
* toolConfirmationResponseFromUserInput maps yes/y/true/confirm
(case-insensitive) to {"confirmed": true}, everything else
(including blank lines) to {"confirmed": false}.
Without this commit, tool confirmation hits the generic fallback,
which wraps the reply as {"result": <text>}. The transport works
(the reply routes back by FunctionCall.ID), but the envelope does not
match what ctx.ToolConfirmation() expects, so the confirmation is
effectively unparseable.
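The reply mapping described for toolConfirmationResponseFromUserInput can be sketched as below. This follows the rules stated in the commit message (yes/y/true/confirm, case-insensitive, map to confirmed; everything else, blank lines included, maps to not confirmed), but it is not the launcher's actual code, and trimming surrounding whitespace first is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// toolConfirmationResponseFromUserInput maps a console reply to the
// {"confirmed": bool} payload. Trimming whitespace is an assumption here.
func toolConfirmationResponseFromUserInput(line string) map[string]any {
	switch strings.ToLower(strings.TrimSpace(line)) {
	case "yes", "y", "true", "confirm":
		return map[string]any{"confirmed": true}
	default:
		// Anything else, including a blank line, is treated as a rejection.
		return map[string]any{"confirmed": false}
	}
}

func main() {
	for _, in := range []string{"Y", "confirm\n", "", "nope"} {
		fmt.Printf("%q -> %v\n", in, toolConfirmationResponseFromUserInput(in))
	}
}
```

Defaulting to {"confirmed": false} makes an accidental Enter a safe no, which suits a prompt that gates a tool call.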
Three end-to-end tests exercising the full pause/resume round-trip through a real runner.Runner, not the lightweight mocks the agent/workflowagent unit tests use. They verify that the contract the engine relies on (FunctionCall.ID round-trips into a follow-up FunctionResponse, runner.findAgentToRun routes by that ID, RunState survives the turn boundary via session.State delta) actually holds when the runner, session.InMemoryService, and the workflow agent are wired together as a production user would wire them.
* TestRunner_WorkflowHITL_Roundtrip_Handoff exercises the default handoff resume path: turn 1 yields an event with LongRunningToolIDs and a synthesised adk_request_workflow_input FunctionCall part; turn 2 sends a matching FunctionResponse, the runner routes it back to the same agent, and the asker's successor receives the response payload as its input.
* TestRunner_WorkflowHITL_Roundtrip_ReEntry covers the re-entry path (NodeConfig.RerunOnResume = true) with the same runner setup: the asker is re-activated, observes the response via ctx.ResumedInput, and emits it as an output that flows to the successor.
* TestRunner_WorkflowHITL_FunctionResponseRoutedByID pins the runner-level routing contract: it asserts that the interrupt event's Author equals the workflow agent name (used by findAgentToRun) and that the second turn does not produce a fresh interrupt (it would if findAgentToRun fell back to the root agent and treated the FunctionResponse as a new user message).
Tests are in runner_test (an external test package), so they exercise only the public Runner API, no internals.
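The ID-based routing contract pinned by TestRunner_WorkflowHITL_FunctionResponseRoutedByID can be sketched as a backward scan of history for the event that issued the matching FunctionCall. This is not runner.findAgentToRun: the Event and FunctionCall types and the routeByFunctionCallID helper are hypothetical, and the fallback to the root agent models the failure mode the test guards against.

```go
package main

import "fmt"

// FunctionCall and Event are hypothetical, trimmed stand-ins for session types.
type FunctionCall struct {
	ID   string
	Name string
}

type Event struct {
	Author        string
	FunctionCalls []FunctionCall
}

// routeByFunctionCallID returns the author of the most recent event that
// issued a call with the given ID, or root when no event matches.
func routeByFunctionCallID(history []Event, responseID, root string) string {
	for i := len(history) - 1; i >= 0; i-- {
		for _, fc := range history[i].FunctionCalls {
			if fc.ID == responseID {
				return history[i].Author
			}
		}
	}
	return root
}

func main() {
	history := []Event{
		{Author: "user"},
		{Author: "hitl_workflow", FunctionCalls: []FunctionCall{
			{ID: "call-7", Name: "adk_request_workflow_input"},
		}},
	}
	fmt.Println(routeByFunctionCallID(history, "call-7", "root"))  // hitl_workflow
	fmt.Println(routeByFunctionCallID(history, "unknown", "root")) // root
}
```

This is why the interrupt event's Author must be the workflow agent's name: if it were anything else, the reply would be routed elsewhere and the workflow would see a fresh user message instead of a resume.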
The re-entry asker emitted the resumed response via the obsolete Event.Actions.StateDelta["output"] channel, which the v2 scheduler no longer reads (node output now flows through Event.Output). On v2 the handler received an empty input and the test failed. Switch to ev.Output, matching the canonical HITL tests in workflowagent.
Adds TestRunner_WorkflowHITL_DynamicOrchestrator_DedupAndResume covering the end-to-end acceptance scenario for b/515644762: a dynamic orchestrator runs two children via RunNode, the second suspends on a HITL interrupt, and on resume the first child must be served from cache (not re-executed) while the second observes the user's response.
4f7cd27 to 681eb17
Please ensure you have read the contribution guide before creating a pull request.
Link to Issue or Description of Change
1. Link to an existing issue (if applicable):
2. Or, if no issue exists, describe the change:
If applicable, please follow the issue templates to provide as much detail as
possible.
Problem:
A clear and concise description of what the problem is.
Solution:
A clear and concise description of what you want to happen and why you choose
this solution.
Testing Plan
Please describe the tests that you ran to verify your changes. This is required
for all PRs that are not small documentation or typo fixes.
Unit Tests:
Please include a summary of passed go test results.
Manual End-to-End (E2E) Tests:
Please provide instructions on how to manually test your changes, including any
necessary setup or configuration. Please provide logs or screenshots to help
reviewers better understand the fix.
Checklist
Additional context
Add any other context or screenshots about the feature request here.