feat(#130): on_result `expression:` matcher + `matches` regex operator by AlexChesser · Pull Request #157 · AlexChesser/ail

AlexChesser · 2026-04-15T02:19:11Z

Closes part of #130 — the consumption half (how users gate on a signal). The production half (capturing logprobs/confidence from native runners) is deliberately deferred until the native LLM runner (#128) lands with tool support.

Summary

Decouples two concerns that #130 conflated:

Producing a confidence signal — runner-specific, requires wire-format support (blocked on Design & implement native LLM provider support (§21) #128). Claude CLI and Codex CLI don't expose logprobs at all; only a native HTTP-based runner can.
Consuming a signal to gate flow — a generic matcher usable today for exit codes, stdout/stderr checks, cross-step conditions, and eventually confidence scores once producers exist.

This PR ships (2). When (1) lands, expression: "{{ step.x.confidence }} >= 0.75" drops in without further spec or grammar work on on_result.

What's new

§5.4 expression: matcher — arbitrary §12.2 condition against any template variable in the turn log. Unlocks branching on stdout/stderr (which contains: can't reach on context steps), specific exit codes beyond 0/any, and cross-step references.
§5.4 matches: /PAT/FLAGS named matcher — regex shorthand for the response.
§12.2 matches operator — regex comparison, shared with condition: so the two grammars cannot drift.
§12.3 Regex Syntax — single source of truth for /PAT/FLAGS. Flags i/m/s accepted. g rejected at parse time with a specific diagnostic (boolean matching — "global" is meaningless). Other Perl flags rejected; guidance points at inline (?x) for verbose.

Design choices worth flagging

Regex compiled at parse time. Malformed patterns fail pipeline load, not match time. A regex sitting dormant in on_result for months can't surprise you at 3am when a specific response finally triggers it.
Condition drops PartialEq. regex::Regex has no PartialEq; the one existing assert_eq! on Option<Condition> was rewritten as a matches! pattern. No production code compared Condition values for equality.
ResultMatcher::Expression { source, condition } — the source field preserves the original literal for materialize round-trips and diagnostics. condition reuses the full Condition union, so comparison and regex forms go through one evaluator.
Named matches: desugars at parse time into the expression form. Runtime has exactly one regex evaluation path.
evaluate_on_result signature change — now takes &Session and returns Result<Option<ResultAction>, AilError>. Unresolvable template in an expression: LHS aborts via CONDITION_INVALID, matching the §11 template-resolution contract. No silent-non-match fallback.
g flag rejected with a specific error, not silently ignored. Half the web has muscle memory for /foo/gi; silently dropping it would let patterns look like they mean something.
Unanchored, case-sensitive by default. /PASS/ matches "tests PASSED". Matches JavaScript/Perl/Ruby convention rather than inventing a different default. (?i) or /foo/i for case-insensitive.

Deliberately out of scope

DoWhile::exit_when stays ConditionExpr — no regex support in loop exits yet. Widening it to the full Condition union needs its own validation rules (exit_when rejects Always/Never) and is a separate follow-up.
Numeric operators (<, <=, >, >=) and boolean combinators (&&, ||). The spec's planned-extensions block flags these as the joint §12.2 + §5.4 extension that unlocks confidence-score gating once native runners surface the signals.
Logprobs / confidence signals themselves — blocked on Design & implement native LLM provider support (§21) #128 native LLM runner with tool support.

Test plan

911 tests pass (419 lib + 492 integration)
New regex literal parser: 20 unit tests (valid literals, i/m/s flags, embedded slashes, \/ escaping, g rejection, invalid patterns, empty-pattern guard)
Condition::Regex evaluation: 4 new unit tests
§12.2 matches integration tests: basic, case-sensitivity defaults, i flag, full parser path, invalid regex → parse failure, g-flag rejection
§5.4 matcher integration tests: expression: on stderr, named matches: shorthand, expression: with matches operator, unresolvable template aborts, parse-time multi-matcher rejection
cargo clippy — zero new warnings introduced (29 pre-existing errors in unrelated files unchanged)
cargo fmt --check clean

Commit history on the branch

Commit	Type	Change
`a434577`	spec	§5.4 `expression:` matcher
`0c4e660`	spec	§12.2 `matches` operator
`17450dd`	spec	`/PATTERN/FLAGS` regex-literal syntax
`0086ac5`	spec	§12.3 regex spec as single source of truth
`e1e1540`	feat	Full implementation

https://claude.ai/code/session_01GX2TW85n2yzAyTZ8TyM1Wj

…ex operator Implements the spec committed earlier on this branch: - §5.4 `expression:` matcher — arbitrary §12.2 condition against any template variable accessible in the turn log. - §5.4 `matches: /PAT/FLAGS` named matcher — shorthand for `expression: '{{ step.<id>.response }} matches /.../flags'`. - §12.2 `matches` operator — regex comparison, shared with `condition:`. - §12.3 regex syntax — single source of truth for `/PAT/FLAGS` form; flags i/m/s accepted, g rejected at parse time, other Perl flags rejected with a specific error that points to inline `(?x)` for verbose mode. Design notes: - Regex is compiled at parse time (in `parse_regex_literal`) so malformed patterns fail pipeline load, not match time. Source literal preserved alongside the compiled `regex::Regex` for diagnostics and materialize output. - `Condition` gains a `Regex(RegexCondition)` variant. `PartialEq` is dropped from `Condition` (regex::Regex has no PartialEq); the one existing `assert_eq!` on `Option<Condition>` was rewritten as a `matches!` pattern match. No production code compared Condition values for equality. - `ResultMatcher::Expression { source, condition }` reuses the condition evaluator for both comparison and regex forms, so the two grammars cannot drift apart. - `evaluate_on_result()` now takes `&Session` and returns `Result<Option<ResultAction>, AilError>`. Unresolvable template variables in an `expression:` LHS abort the pipeline via CONDITION_INVALID — same contract as `condition:` (SPEC §11). - Named `matches:` is desugared at parse time into the expression form, so the runtime has exactly one regex evaluation path. - Materialize round-trips `expression:` using the preserved source. - do_while's `exit_when` deliberately NOT extended — it stays ConditionExpr-only for now. Regex in loop exits is out of scope for this change; can be added later by widening exit_when to `Condition`. Testing: 911 tests pass (419 lib + 492 integration). New coverage: - regex_literal: 20 unit tests (parsing, flags, error cases) - condition.rs: 4 new tests for Condition::Regex evaluation - s12_step_conditions: 6 new integration tests (matches operator, case sensitivity, parser path, invalid regex, g-flag rejection) - s05_3_on_result: 5 new integration tests (expression: matcher on stderr, named matches: shorthand, expression with matches op, unresolvable template, parse-time matcher count enforcement)

…l site The cherry-pick missed the second `evaluate_on_result` call site, which was added by PR #154 (parallel execution) after this branch originally forked. The §29 join-step code path needs the same signature update the sequential dispatch path already got: pass `&Session` + `&step_id`, route `Err` through the parallel outcome cell instead of `?`. The borrow shape differs slightly — the parallel path can't just re-borrow the turn_log entry while also passing `&session`, because the enclosing closure captures session by mutable reference. Clone the last entry up front to release the immutable borrow before calling the evaluator.

AlexChesser force-pushed the claude/investigate-confidence-scores-NhS78 branch from e1e1540 to 19c0a8c Compare April 15, 2026 02:32

AlexChesser merged commit c7852a6 into main Apr 15, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(#130): on_result `expression:` matcher + `matches` regex operator#157

feat(#130): on_result `expression:` matcher + `matches` regex operator#157
AlexChesser merged 2 commits intomainfrom
claude/investigate-confidence-scores-NhS78

AlexChesser commented Apr 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

AlexChesser commented Apr 15, 2026

Summary

What's new

Design choices worth flagging

Deliberately out of scope

Test plan

Commit history on the branch

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants