feat(#130): on_result expression: matcher + matches regex operator #157

Merged
AlexChesser merged 2 commits into main from
claude/investigate-confidence-scores-NhS78
Apr 15, 2026
@AlexChesser
Owner

Closes part of #130 — the consumption half (how users gate on a signal). The production half (capturing logprobs/confidence from native runners) is deliberately deferred until the native LLM runner (#128) lands with tool support.

Summary

Decouples two concerns that #130 conflated:

  1. Producing a confidence signal: runner-specific, requires wire-format support (blocked on #128, "Design & implement native LLM provider support (§21)"). Claude CLI and Codex CLI don't expose logprobs at all; only a native HTTP-based runner can.
  2. Consuming a signal to gate flow — a generic matcher usable today for exit codes, stdout/stderr checks, cross-step conditions, and eventually confidence scores once producers exist.

This PR ships (2). When (1) lands, expression: "{{ step.x.confidence }} >= 0.75" slots into the same expression: matcher, once the numeric operators listed under "Deliberately out of scope" are added to the §12.2 grammar.

What's new

  • §5.4 expression: matcher — arbitrary §12.2 condition against any template variable in the turn log. Unlocks branching on stdout/stderr (which contains: can't reach on context steps), specific exit codes beyond 0/any, and cross-step references.
  • §5.4 matches: /PAT/FLAGS named matcher — regex shorthand for the response.
  • §12.2 matches operator — regex comparison, shared with condition: so the two grammars cannot drift.
  • §12.3 Regex Syntax — single source of truth for /PAT/FLAGS. Flags i/m/s accepted. g rejected at parse time with a specific diagnostic (boolean matching — "global" is meaningless). Other Perl flags rejected; guidance points at inline (?x) for verbose.
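
To illustrate the §12.3 rules above, here is a minimal stdlib-only sketch of splitting a /PAT/FLAGS literal and validating its flags. The names `split_regex_literal`, `RegexLiteral`, and `LiteralError` are hypothetical, and treating the last unescaped slash as the closing delimiter is an assumption. The real parser also compiles the pattern with the regex crate, which this sketch leaves out.

```rust
/// Hypothetical result type: the raw pattern and its validated flags.
#[derive(Debug, PartialEq)]
struct RegexLiteral {
    pattern: String,
    flags: String,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum LiteralError {
    NotDelimited,
    EmptyPattern,
    GlobalFlag,
    UnsupportedFlag(char),
}

/// Splits `/PAT/FLAGS` into its parts and validates the flags.
/// Assumption: the closing delimiter is the last slash not preceded by a backslash escape.
fn split_regex_literal(src: &str) -> Result<RegexLiteral, LiteralError> {
    let body = src.strip_prefix('/').ok_or(LiteralError::NotDelimited)?;

    // Scan for the last unescaped '/', skipping any character that follows a backslash.
    let mut close = None;
    let mut escaped = false;
    for (i, c) in body.char_indices() {
        if escaped {
            escaped = false;
            continue;
        }
        match c {
            '\\' => escaped = true,
            '/' => close = Some(i),
            _ => {}
        }
    }
    let close = close.ok_or(LiteralError::NotDelimited)?;
    let (pattern, flags) = (&body[..close], &body[close + 1..]);

    if pattern.is_empty() {
        return Err(LiteralError::EmptyPattern);
    }
    for flag in flags.chars() {
        match flag {
            'i' | 'm' | 's' => {}
            // `g` gets its own diagnostic instead of being silently dropped.
            'g' => return Err(LiteralError::GlobalFlag),
            other => return Err(LiteralError::UnsupportedFlag(other)),
        }
    }
    Ok(RegexLiteral {
        pattern: pattern.to_string(),
        flags: flags.to_string(),
    })
}
```

Under these assumptions, `split_regex_literal("/foo/gi")` returns `Err(LiteralError::GlobalFlag)` rather than quietly parsing as a case-insensitive `foo`.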

Design choices worth flagging

  • Regex compiled at parse time. Malformed patterns fail pipeline load, not match time. A regex sitting dormant in on_result for months can't surprise you at 3am when a specific response finally triggers it.
  • Condition drops PartialEq. regex::Regex has no PartialEq; the one existing assert_eq! on Option<Condition> was rewritten as a matches! pattern. No production code compared Condition values for equality.
  • ResultMatcher::Expression { source, condition } — the source field preserves the original literal for materialize round-trips and diagnostics. condition reuses the full Condition union, so comparison and regex forms go through one evaluator.
  • Named matches: desugars at parse time into the expression form. Runtime has exactly one regex evaluation path.
  • evaluate_on_result signature change — now takes &Session and returns Result<Option<ResultAction>, AilError>. Unresolvable template in an expression: LHS aborts via CONDITION_INVALID, matching the §11 template-resolution contract. No silent-non-match fallback.
  • g flag rejected with a specific error, not silently ignored. Half the web has muscle memory for /foo/gi; silently dropping it would let patterns look like they mean something.
  • Unanchored, case-sensitive by default. /PASS/ matches "tests PASSED". Matches JavaScript/Perl/Ruby convention rather than inventing a different default. (?i) or /foo/i for case-insensitive.
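
The PartialEq trade-off above can be sketched without the regex crate. This is a simplified stand-in, not the project's actual types: `CompiledPattern`, `parse_condition`, and `is_regex_with_source` are hypothetical, and the real `Condition` union has more variants than shown here.

```rust
// Stand-in for `regex::Regex`: like the real type, it does not implement PartialEq.
#[derive(Debug)]
struct CompiledPattern {
    source: String,
}

// Because one variant holds a non-PartialEq value, the enum cannot derive PartialEq.
#[derive(Debug)]
enum Condition {
    Always,
    Never,
    Regex(CompiledPattern),
}

// Hypothetical toy parser, only here to produce values to inspect.
fn parse_condition(src: &str) -> Option<Condition> {
    match src.trim() {
        "" => None,
        "always" => Some(Condition::Always),
        "never" => Some(Condition::Never),
        other => Some(Condition::Regex(CompiledPattern {
            source: other.to_string(),
        })),
    }
}

// `assert_eq!` needs PartialEq; `matches!` only needs a pattern, plus an
// optional guard when a field value has to be checked as well.
fn is_regex_with_source(cond: &Option<Condition>, expected: &str) -> bool {
    matches!(cond, Some(Condition::Regex(p)) if p.source == expected)
}
```

A test that previously read `assert_eq!(parsed, Some(Condition::Always))` becomes `assert!(matches!(parsed, Some(Condition::Always)))`, which compiles without any equality impl on `Condition`.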

Deliberately out of scope

  • DoWhile::exit_when stays ConditionExpr — no regex support in loop exits yet. Widening it to the full Condition union needs its own validation rules (exit_when rejects Always/Never) and is a separate follow-up.
  • Numeric operators (<, <=, >, >=) and boolean combinators (&&, ||). The spec's planned-extensions block flags these as the joint §12.2 + §5.4 extension that unlocks confidence-score gating once native runners surface the signals.
  • Logprobs / confidence signals themselves: blocked on #128 ("Design & implement native LLM provider support (§21)"), the native LLM runner with tool support.

Test plan

  • 911 tests pass (419 lib + 492 integration)
  • New regex literal parser: 20 unit tests (valid literals, i/m/s flags, embedded slashes, \/ escaping, g rejection, invalid patterns, empty-pattern guard)
  • Condition::Regex evaluation: 4 new unit tests
  • §12.2 matches integration tests: basic, case-sensitivity defaults, i flag, full parser path, invalid regex → parse failure, g-flag rejection
  • §5.4 matcher integration tests: expression: on stderr, named matches: shorthand, expression: with matches operator, unresolvable template aborts, parse-time multi-matcher rejection
  • cargo clippy — zero new warnings introduced (29 pre-existing errors in unrelated files unchanged)
  • cargo fmt --check clean

Commit history on the branch

| Commit | Type | Change |
|---|---|---|
| a434577 | spec | §5.4 expression: matcher |
| 0c4e660 | spec | §12.2 matches operator |
| 17450dd | spec | /PATTERN/FLAGS regex-literal syntax |
| 0086ac5 | spec | §12.3 regex spec as single source of truth |
| e1e1540 | feat | Full implementation |

https://claude.ai/code/session_01GX2TW85n2yzAyTZ8TyM1Wj

…ex operator

Implements the spec committed earlier on this branch:
- §5.4 `expression:` matcher — arbitrary §12.2 condition against any
  template variable accessible in the turn log.
- §5.4 `matches: /PAT/FLAGS` named matcher — shorthand for
  `expression: '{{ step.<id>.response }} matches /.../flags'`.
- §12.2 `matches` operator — regex comparison, shared with `condition:`.
- §12.3 regex syntax — single source of truth for `/PAT/FLAGS` form;
  flags i/m/s accepted, g rejected at parse time, other Perl flags
  rejected with a specific error that points to inline `(?x)` for
  verbose mode.

Design notes:
- Regex is compiled at parse time (in `parse_regex_literal`) so
  malformed patterns fail pipeline load, not match time. Source literal
  preserved alongside the compiled `regex::Regex` for diagnostics and
  materialize output.
- `Condition` gains a `Regex(RegexCondition)` variant. `PartialEq` is
  dropped from `Condition` (regex::Regex has no PartialEq); the one
  existing `assert_eq!` on `Option<Condition>` was rewritten as a
  `matches!` pattern match. No production code compared Condition
  values for equality.
- `ResultMatcher::Expression { source, condition }` reuses the
  condition evaluator for both comparison and regex forms, so the two
  grammars cannot drift apart.
- `evaluate_on_result()` now takes `&Session` and returns
  `Result<Option<ResultAction>, AilError>`. Unresolvable template
  variables in an `expression:` LHS abort the pipeline via
  CONDITION_INVALID — same contract as `condition:` (SPEC §11).
- Named `matches:` is desugared at parse time into the expression form,
  so the runtime has exactly one regex evaluation path.
- Materialize round-trips `expression:` using the preserved source.
- do_while's `exit_when` deliberately NOT extended — it stays
  ConditionExpr-only for now. Regex in loop exits is out of scope for
  this change; can be added later by widening exit_when to `Condition`.
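
The `Result<Option<...>, ...>` shape described above separates three outcomes: a matcher fired, nothing matched, or the expression could not be evaluated at all. The sketch below is a simplified model and not the actual `evaluate_on_result()`: `evaluate_rules`, `Rule`, `EvalError`, and the `ResultAction` variants are hypothetical, a plain map stands in for the session's template variables, and a substring check stands in for the condition evaluator.

```rust
use std::collections::HashMap;

// Hypothetical error type; the real code reports CONDITION_INVALID via AilError.
#[derive(Debug, PartialEq)]
enum EvalError {
    ConditionInvalid(String),
}

// Hypothetical action variants, for illustration only.
#[derive(Debug, PartialEq, Clone)]
enum ResultAction {
    Continue,
    AbortPipeline,
}

// One simplified rule: "if variable `lhs_var` contains `needle`, take `action`".
struct Rule {
    lhs_var: String,
    needle: String,
    action: ResultAction,
}

// Ok(Some(action)): the first matching rule fired.
// Ok(None):         every rule evaluated cleanly, none matched.
// Err(..):          a left-hand side could not be resolved, so evaluation aborts
//                   instead of being treated as a silent non-match.
fn evaluate_rules(
    vars: &HashMap<String, String>,
    rules: &[Rule],
) -> Result<Option<ResultAction>, EvalError> {
    for rule in rules {
        let value = vars
            .get(&rule.lhs_var)
            .ok_or_else(|| EvalError::ConditionInvalid(rule.lhs_var.clone()))?;
        if value.contains(rule.needle.as_str()) {
            return Ok(Some(rule.action.clone()));
        }
    }
    Ok(None)
}
```

Keeping "no match" (`Ok(None)`) distinct from "could not evaluate" (`Err`) is what lets a caller abort on an unresolvable template while still falling through normally when nothing matches.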

Testing: 911 tests pass (419 lib + 492 integration). New coverage:
- regex_literal: 20 unit tests (parsing, flags, error cases)
- condition.rs: 4 new tests for Condition::Regex evaluation
- s12_step_conditions: 6 new integration tests (matches operator,
  case sensitivity, parser path, invalid regex, g-flag rejection)
- s05_3_on_result: 5 new integration tests (expression: matcher on
  stderr, named matches: shorthand, expression with matches op,
  unresolvable template, parse-time matcher count enforcement)
@AlexChesser force-pushed the claude/investigate-confidence-scores-NhS78 branch from e1e1540 to 19c0a8c on April 15, 2026 02:32
…l site

The cherry-pick missed the second `evaluate_on_result` call site, which
was added by PR #154 (parallel execution) after this branch originally
forked. The §29 join-step code path needs the same signature update
the sequential dispatch path already got: pass `&Session` + `&step_id`,
route `Err` through the parallel outcome cell instead of `?`.

The borrow shape differs slightly — the parallel path can't just
re-borrow the turn_log entry while also passing `&session`, because the
enclosing closure captures session by mutable reference. Clone the
last entry up front to release the immutable borrow before calling the
evaluator.
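
The clone-before-evaluate fix can be illustrated with a small stand-alone sketch. Everything here is hypothetical (`TurnEntry`, `Outcome`, `Session::record`, `join_step`, and the toy `evaluate`), and the real §29 code path differs. The sketch only reproduces the general shape: a closure that holds the session mutably, an entry read from the turn log, and errors routed into an outcome instead of propagated with `?`.

```rust
#[derive(Clone, Debug)]
struct TurnEntry {
    step_id: String,
    response: String,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Matched(String),
    NoMatch(String),
    Failed(String),
}

struct Session {
    turn_log: Vec<TurnEntry>,
    outcomes: Vec<Outcome>,
}

impl Session {
    // Takes `&mut self`, so no reference into `self.turn_log` may be alive at the call.
    fn record(&mut self, entry: &TurnEntry, result: Result<bool, String>) {
        let outcome = match result {
            Ok(true) => Outcome::Matched(entry.step_id.clone()),
            Ok(false) => Outcome::NoMatch(entry.step_id.clone()),
            // The error is stored as an outcome rather than returned with `?`.
            Err(e) => Outcome::Failed(e),
        };
        self.outcomes.push(outcome);
    }
}

// Toy evaluator: needs shared access to the session plus the entry under test.
fn evaluate(session: &Session, entry: &TurnEntry) -> Result<bool, String> {
    if entry.response.is_empty() {
        return Err(format!(
            "{}: empty response after {} turn(s)",
            entry.step_id,
            session.turn_log.len()
        ));
    }
    Ok(entry.response.contains("PASS"))
}

fn join_step(session: &mut Session) {
    // The closure captures `session` by mutable reference.
    let mut finish = || {
        // Clone up front: the borrow of `turn_log` ends with this statement, so
        // `record(&mut self, &entry, ..)` below is accepted. Passing a reference
        // obtained from `turn_log.last()` instead would be rejected (E0502).
        let entry = match session.turn_log.last().cloned() {
            Some(e) => e,
            None => return,
        };
        let result = evaluate(&*session, &entry);
        session.record(&entry, result);
    };
    finish();
}
```

The clone costs one copy of the last entry per join, which is usually a small price for not restructuring the closure's captures.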
@AlexChesser merged commit c7852a6 into main on Apr 15, 2026
6 checks passed