
feat(verify): add provider enumeration mode to ans-verify #32

Merged
csnitker-godaddy merged 4 commits into godaddy:main from nicknacnic:feat/ans-verify-list-provider
Jun 10, 2026

Conversation

@nicknacnic

Contributor

Summary

Adds ans-verify list -provider <host> — a client-side enumeration mode that walks the log's entry tiles, decodes V1/V2 producer envelopes, and reports every agent whose agent.host falls under the given provider suffix. Optional flags collapse to currently-live agents, verify each match's SCITT receipt, and bound concurrency. No server-side changes; this is purely a new CLI surface on the existing cmd/ans-verify binary.

Use case: an operator running a provider domain (an enterprise, a demo zone, a CTF) wants to see every agent the TL has logged under that domain without registering each agentId out-of-band. The reference verifier already exposes per-agent reads; this lets an offline verifier reconstruct the by-provider view from the log itself.

What's new

CLI:

ans-verify list -provider <host> [-url <tl>] [-live=false] [-verify] [-concurrency N]
  • -provider — host suffix to match. Exact-match plus strict subdomain (x.suffix); rejects substring spoofs like evilsuffix.com against suffix.com. Case-insensitive, trailing-dot tolerant.
  • -live (default true) — collapse to one row per ansName keeping the latest leaf, drop AGENT_REVOKED and AGENT_DEPRECATED. Unknown event types are kept (forward-compat against future active states).
  • -verify (default false) — for each match, fetch and verify the SCITT receipt.
  • -concurrency N (default 8, clamped 1–64) — parallelism for tile fetches and verify fetches.

The existing single-agent ans-verify [-agent] <uuid> form is unchanged.
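
The -provider matching rule described above (exact match or strict subdomain, case-insensitive, trailing-dot tolerant) can be sketched as follows. This is a minimal illustration, not the code in walk.go; the names normalizeHost and hostUnderProvider are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeHost lowercases a host and strips one trailing dot so that
// "X.Suffix.COM." and "x.suffix.com" compare equal.
// (Hypothetical helper, not taken from the PR.)
func normalizeHost(h string) string {
	return strings.TrimSuffix(strings.ToLower(strings.TrimSpace(h)), ".")
}

// hostUnderProvider reports whether host equals suffix or is a strict
// subdomain of it. Requiring the "." separator is what rejects
// substring spoofs such as "evilsuffix.com" against "suffix.com".
func hostUnderProvider(host, suffix string) bool {
	h, s := normalizeHost(host), normalizeHost(suffix)
	if h == "" || s == "" {
		return false
	}
	return h == s || strings.HasSuffix(h, "."+s)
}

func main() {
	hosts := []string{"suffix.com", "x.suffix.com", "X.Suffix.COM.", "evilsuffix.com"}
	for _, h := range hosts {
		fmt.Printf("%q under suffix.com: %v\n", h, hostUnderProvider(h, "suffix.com"))
	}
}
```

Comparing against "." + suffix rather than the bare suffix is the design choice that matters here: a plain strings.HasSuffix on the suffix alone would accept evilsuffix.com.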

Security properties

The walker fails closed on every checkable property. The threat model assumes the TL endpoint could be hostile or compromised — the verifier is the trust anchor, not the server.

  1. Checkpoint signature is verified before any tile fetch. A new verifiedCheckpoint fetches /checkpoint (raw signed note), parses it, and verifies the C2SP ECDSA-P256 signature against /root-keys via logstore.VerifyC2SPECDSA. Without this step, a hostile TL could lie about logSize and hide tiles containing agents the attacker wants to omit. Pure parser is verifyCheckpointNote.

  2. Leaf-substitution guard. When -verify is on, the walker stores the JCS-canonical envelope bytes it read from each tile. After receipt verification, it asserts bytes.Equal(receipt.ExtractPayload(rec), m.LeafBytes). Mismatch is a hard per-match failure with a clear "possible leaf substitution" error. Without this, a TL could serve a forged tile (fake host claim) plus a real receipt for an unrelated agent and the receipt-only check would pass.

  3. Path-injection guard on agentId. The agentId on each match comes from a TL leaf the verifier doesn't fully trust. Anything that isn't a UUID is refused before being interpolated into the receipt URL.

  4. Response-size cap. All HTTP bodies are read through io.LimitReader(body, 32 MiB+1); oversize responses are rejected, not truncated.

  5. Log-injection-safe output. All TL-supplied strings (ansName, host, eventType, agentId) are printed via %q, so ANSI escapes and embedded newlines cannot spoof CLI output.

  6. External context cancellation surfaces an error. A cancelled or timed-out parent context returns the ctx error rather than a silently truncated match list. The concurrent fetcher captures the first triggering error atomically so the user sees the root cause, not whichever tile happened to be lowest-indexed at the moment of cancel.
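
Properties 3 and 5 can be illustrated together. The sketch below assumes a regexp-based UUID check and a made-up receipt route; the path "/agents/<id>/receipt" and the names isUUID, receiptPath, and formatRow are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// uuidRE matches the canonical 8-4-4-4-12 hex form and nothing else.
// In Go's regexp syntax, $ without the m flag matches only at end of text.
var uuidRE = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// isUUID reports whether s looks like a canonical UUID.
func isUUID(s string) bool { return uuidRE.MatchString(s) }

// receiptPath refuses to build a URL path from anything that is not a UUID.
// The route shape is an assumption made for this example only.
func receiptPath(agentID string) (string, error) {
	if !isUUID(agentID) {
		return "", fmt.Errorf("refusing non-UUID agentId %q", agentID)
	}
	return "/agents/" + agentID + "/receipt", nil
}

// formatRow prints log-supplied strings with %q so control characters
// are escaped instead of being interpreted by the terminal.
func formatRow(ansName, host string) string {
	return fmt.Sprintf("ansName=%q host=%q", ansName, host)
}

func main() {
	if _, err := receiptPath("../../admin"); err != nil {
		fmt.Println(err)
	}
	p, _ := receiptPath("123e4567-e89b-12d3-a456-426614174000")
	fmt.Println(p)
	fmt.Println(formatRow("agent\x1b[31m", "host.example\nFAKE LINE"))
}
```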

Out of scope

  • The pre-existing ans-verify <uuid> single-agent path still fetches its receipt directly without first verifying a checkpoint. That's a parallel gap worth closing in a follow-up; this PR doesn't change that path.
  • EQUIVALENCE_LINK events (PR #20, "[AI assisted] feat(event): EquivalenceLink V2 event for cross-anchor binding") are deliberately skipped by extractAgentIdentity since they carry no agent block. A future -include-links mode could surface them; not needed today.
  • No new TL server route. A future TL-side index of agents-by-host is on the table for scale; this client-side walk is the right primitive while logs are small to mid-sized and lets verifiers reconstruct without trusting any server index.

Files touched

  • cmd/ans-verify/walk.go (new) — walker, dedup, verifier, checkpoint parser, list subcommand.
  • cmd/ans-verify/walk_test.go (new) — unit coverage for every helper plus integration of walker + verifier against httptest.Server fixtures.
  • cmd/ans-verify/main.go — three-line subcommand dispatch at the top of main(). No other changes to existing behavior.

Test plan

  • make check passes (fmt, vet, golangci-lint, test-cover).
  • go test -race ./cmd/ans-verify/... passes.
  • Walker correctness across tile boundary (256 → 257 leaves, parallel fetch with out-of-order completion).
  • Checkpoint verification: happy path, tampered body, unknown key, malformed body (subtests for each parse failure).
  • Leaf-substitution catch verified with a deliberately forged tile-vs-receipt mismatch.
  • Path-injection guard tested by failing the test if the server is ever hit with a non-UUID agentId.
  • Body-size cap exercised by streaming past the limit.
  • External context.Cancel surfaces a non-nil error rather than partial results.
  • End-to-end run against a live local TL: registered 6 agents across two providers via scripts/demo/start.sh + register.sh, ran list and list -verify, revoked one and confirmed live-mode reduction drops it, confirmed raw mode preserves the full lifecycle. Checkpoint signature verified on every invocation, all receipts verified including leaf-substitution cross-check.
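
The body-size cap and its "stream past the limit" test can be sketched as below. This is an illustration of the io.LimitReader(body, limit+1) pattern named in the description, not the PR's code; readCapped, errBodyTooLarge, zeros, and streamOf are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

const maxBodyBytes = 32 << 20 // 32 MiB, the cap stated in the PR description

var errBodyTooLarge = errors.New("response body exceeds size cap")

// readCapped reads up to limit+1 bytes. Seeing the extra byte proves the
// body is oversize, so it is rejected rather than silently truncated.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, errBodyTooLarge
	}
	return data, nil
}

// zeros is an endless stream of zero bytes, standing in for a hostile
// server that never stops sending.
type zeros struct{}

func (zeros) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = 0
	}
	return len(p), nil
}

// streamOf returns a reader that yields exactly n zero bytes.
func streamOf(n int64) io.Reader { return io.LimitReader(zeros{}, n) }

func main() {
	_, err := readCapped(streamOf(maxBodyBytes+1), maxBodyBytes)
	fmt.Println("one byte over the cap:", err)

	body, err := readCapped(streamOf(maxBodyBytes), maxBodyBytes)
	fmt.Println("exactly at the cap:", len(body), err)
}
```

Reading limit+1 rather than limit is what distinguishes "exactly at the cap" from "over the cap"; with a plain limit-byte reader the two cases look identical.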

New `ans-verify list -provider <host>` subcommand walks the log's
entry tiles via the tlog-tiles spec, decodes V1/V2 producer
envelopes, and reports every agent whose `agent.host` falls under
the given suffix. Optional flags collapse to currently-live agents,
verify each match's SCITT receipt, and bound concurrency. No
server-side changes; this is purely a new CLI surface on the
existing cmd/ans-verify binary.

The walker fails closed on every checkable property:

  * Checkpoint signature verified against /root-keys via
    logstore.VerifyC2SPECDSA before any tile fetch (omission-attack
    guard against a tampered logSize).
  * Receipt payload cross-checked byte-for-byte against the tile
    leaf bytes during -verify (leaf-substitution guard).
  * agentId interpolated into URLs only after passing a UUID guard.
  * All HTTP bodies read through io.LimitReader at 32 MiB.
  * TL-supplied strings printed via %q so embedded ANSI/newlines
    cannot spoof CLI output.

The existing single-agent ans-verify [-agent] <uuid> form is
unchanged.

Signed-off-by: Layer8 <NWillAU900@gmail.com>
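
For readers unfamiliar with the checkpoint format, the body of a C2SP checkpoint note is three lines (origin, decimal tree size, base64 root hash), followed by a blank line and the signature lines. The sketch below parses only that body, assuming a 32-byte SHA-256 root hash. It is not verifyCheckpointNote from the PR, it performs no signature verification, and the names checkpoint, parseCheckpointBody, and sampleNote are hypothetical.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// checkpoint holds the three mandatory body fields of a checkpoint note.
type checkpoint struct {
	Origin   string
	Size     uint64
	RootHash []byte
}

// parseCheckpointBody splits the note at the blank line that precedes the
// signature lines and parses the body. The result must not be trusted
// until the note's signature has been verified separately.
func parseCheckpointBody(note string) (checkpoint, error) {
	body, _, ok := strings.Cut(note, "\n\n")
	if !ok {
		return checkpoint{}, errors.New("checkpoint: missing blank line before signatures")
	}
	lines := strings.Split(body, "\n")
	if len(lines) < 3 || lines[0] == "" {
		return checkpoint{}, errors.New("checkpoint: want origin, size, and root hash lines")
	}
	size, err := strconv.ParseUint(lines[1], 10, 64)
	if err != nil {
		return checkpoint{}, fmt.Errorf("checkpoint: bad tree size: %w", err)
	}
	hash, err := base64.StdEncoding.DecodeString(lines[2])
	if err != nil {
		return checkpoint{}, fmt.Errorf("checkpoint: bad root hash: %w", err)
	}
	if len(hash) != 32 {
		return checkpoint{}, fmt.Errorf("checkpoint: root hash is %d bytes, want 32", len(hash))
	}
	return checkpoint{Origin: lines[0], Size: size, RootHash: hash}, nil
}

// sampleNote builds an illustrative note with an all-zero root hash and a
// placeholder where the real signature lines would be.
func sampleNote() string {
	return "example.test/log\n3\n" + strings.Repeat("A", 43) + "=\n\n<signature lines>\n"
}

func main() {
	cp, err := parseCheckpointBody(sampleNote())
	fmt.Println(cp.Origin, cp.Size, len(cp.RootHash), err)
}
```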

Copilot AI left a comment

Pull request overview

Adds a new ans-verify list -provider <host> subcommand to the existing offline verifier CLI, enabling client-side enumeration of agents under a provider suffix by walking transparency-log entry tiles (optionally collapsing to “live” state and verifying SCITT receipts).

Changes:

  • Introduces a tile-walker that fetches/decodes entry bundles, filters by provider host suffix, and optionally reduces results to “live” agents.
  • Adds checkpoint-note parsing + signature verification against /root-keys, and optional per-match receipt verification with a leaf-substitution cross-check.
  • Adds comprehensive unit/integration-style tests for helper logic and the walker/verification flows, plus minimal subcommand dispatch in main.
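
The "live" reduction mentioned above can be sketched as follows: keep the latest leaf per ansName, then drop agents whose latest event is AGENT_REVOKED or AGENT_DEPRECATED, while keeping unknown event types. The match struct, the reduceLive name, and the AGENT_REGISTERED and AGENT_FUTURE_STATE event names are assumptions for illustration; only the two dropped event types come from the PR description.

```go
package main

import (
	"fmt"
	"sort"
)

// match is a simplified stand-in for one decoded leaf.
type match struct {
	AnsName   string
	EventType string
	LeafIndex uint64
}

// reduceLive collapses matches to one row per ansName (highest leaf index
// wins) and then drops agents whose latest event marks them as not live.
// Unknown event types are kept for forward compatibility.
func reduceLive(ms []match) []match {
	latest := make(map[string]match)
	for _, m := range ms {
		if cur, ok := latest[m.AnsName]; !ok || m.LeafIndex > cur.LeafIndex {
			latest[m.AnsName] = m
		}
	}
	out := make([]match, 0, len(latest))
	for _, m := range latest {
		switch m.EventType {
		case "AGENT_REVOKED", "AGENT_DEPRECATED":
			continue // skip to the next agent
		}
		out = append(out, m)
	}
	// Map iteration order is random; sort for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].LeafIndex < out[j].LeafIndex })
	return out
}

func main() {
	raw := []match{
		{"ans://a", "AGENT_REGISTERED", 0},
		{"ans://b", "AGENT_REGISTERED", 1},
		{"ans://a", "AGENT_REVOKED", 2},
		{"ans://b", "AGENT_FUTURE_STATE", 3},
	}
	for _, m := range reduceLive(raw) {
		fmt.Printf("%q %q leaf=%d\n", m.AnsName, m.EventType, m.LeafIndex)
	}
}
```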

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| cmd/ans-verify/walk.go | Implements provider enumeration (list), tile walking, live-reduction, checkpoint verification, and optional receipt verification. |
| cmd/ans-verify/walk_test.go | Adds tests for provider matching, tile walking, checkpoint note verification, response caps, and receipt verification behaviors. |
| cmd/ans-verify/main.go | Adds subcommand dispatch to route ans-verify list ... to the new implementation without changing the existing single-agent path. |

Comment thread cmd/ans-verify/walk.go
Comment on lines +222 to +229
entries, err := decodeEntryBundle(raw)
if err != nil {
	wrapped := fmt.Errorf("decode %s: %w", path, err)
	results[tileIdx] = tileResult{err: wrapped}
	recordErr(wrapped)
	continue
}
results[tileIdx] = tileResult{entries: entries}
Comment thread cmd/ans-verify/walk.go
Comment on lines +233 to +237
for tileIdx := range nTiles {
	jobs <- tileIdx
}
close(jobs)
wg.Wait()
Comment thread cmd/ans-verify/walk.go Outdated
Comment on lines +614 to +635
fs := flag.NewFlagSet("list", flag.ExitOnError)
var (
	baseURL     string
	provider    string
	live        bool
	doVerify    bool
	concurrency int
)
fs.StringVar(&baseURL, "url", "http://localhost:18081",
	"Base URL of the transparency log")
fs.StringVar(&provider, "provider", "",
	"Provider host suffix to filter on (e.g. darknetian.com)")
fs.BoolVar(&live, "live", true,
	"Collapse to one row per agent and drop revoked/deprecated agents")
fs.BoolVar(&doVerify, "verify", false,
	"After listing, fetch and verify each matched agent's SCITT receipt")
fs.IntVar(&concurrency, "concurrency", 8,
	"Number of parallel HTTP workers (1-64)")
if err := fs.Parse(args); err != nil {
	fmt.Fprintln(os.Stderr, err)
	os.Exit(1)
}
…ware producer, flag handling

Three findings from the automated review on PR godaddy#32:

1. Tile-size validation. After decoding an entry bundle, assert the
   leaf count matches the expected width — EntryBundleWidth (256) for
   a full tile, or the path's `.p/<N>` width for a partial tile. The
   checkpoint signature binds the tree shape but not the contents of
   any individual tile; without this guard a hostile or buggy TL can
   serve a truncated bundle (omitting leaves) or an oversized one
   (injecting extras) and the walker would silently accept it,
   undermining the "fail closed" property even after the checkpoint
   passes. Two new regression tests pin both directions.

2. Producer respects cancellation. The producer loop now selects on
   `wctx.Done()` between sends, so once a worker records the first
   error and cancels the context, the producer breaks out of the
   enqueue loop instead of pushing every remaining tile index. For a
   large log this avoids significant churn after the first failure.
   Workers still drain whatever is already in the channel via their
   existing wctx.Err() check, so close(jobs) never deadlocks.

3. Flag-parse handling. flag.NewFlagSet was constructed with
   flag.ExitOnError, which calls os.Exit(2) internally before Parse
   returns — the `if err := fs.Parse(...); err != nil` block was
   dead code. Switched to flag.ContinueOnError with an explicit
   fs.Usage so a parse failure prints consistent usage text and
   exits 1, matching the rest of the binary's error handling.

Signed-off-by: Layer8 <NWillAU900@gmail.com>
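
Finding 1 in the commit message above can be sketched as a width check. The commit says the expected width comes from the tile path's `.p/<N>` suffix; the sketch below instead derives it from the verified log size, which is an assumption made here to keep the example self-contained. The names expectedWidth and checkTileWidth are hypothetical.

```go
package main

import "fmt"

const entryBundleWidth = 256 // leaves in a full entry bundle

// expectedWidth returns how many leaves tile tileIdx must contain in a log
// of logSize leaves: 256 for a full tile, the remainder for the final
// partial tile, and an error for a tile past the end of the log.
func expectedWidth(logSize, tileIdx uint64) (int, error) {
	full := logSize / entryBundleWidth
	rem := logSize % entryBundleWidth
	switch {
	case tileIdx < full:
		return entryBundleWidth, nil
	case tileIdx == full && rem > 0:
		return int(rem), nil
	default:
		return 0, fmt.Errorf("tile %d is beyond a log of %d leaves", tileIdx, logSize)
	}
}

// checkTileWidth fails closed when a decoded bundle is truncated
// (omitted leaves) or oversized (injected extras).
func checkTileWidth(got int, logSize, tileIdx uint64) error {
	want, err := expectedWidth(logSize, tileIdx)
	if err != nil {
		return err
	}
	if got != want {
		return fmt.Errorf("tile %d: decoded %d entries, want %d", tileIdx, got, want)
	}
	return nil
}

func main() {
	// A 257-leaf log has one full tile and one partial tile of width 1.
	fmt.Println(checkTileWidth(256, 257, 0))
	fmt.Println(checkTileWidth(1, 257, 1))
	fmt.Println(checkTileWidth(255, 257, 0))
	fmt.Println(checkTileWidth(2, 257, 1))
}
```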
@nicknacnic

Contributor Author

Thanks for the review — all three are good catches. Addressed in d7c288f.

1. Tile-size validation (walk.go:229). Agreed — the checkpoint signature binds tree shape but not the contents of any individual tile, so without an explicit length assertion a hostile TL can truncate or inflate a bundle silently. Added a guard that compares decoded entry count against the expected width: EntryBundleWidth (256) for a full tile, the .p/<N> width for a partial tile. Both directions covered by new regression tests (TestWalkProviderAgents_RejectsWrongTileSize, TestWalkProviderAgents_RejectsOversizedTile).

2. Producer respects cancellation (walk.go:237). Right — the producer was pushing all N indices through the channel regardless of cancel state. Workers were drain-only on the consumer side, so for a large log this added unnecessary channel traffic and delayed teardown after the first failure. Producer now selects on <-wctx.Done() between sends and breaks out of the loop on cancel. Workers still drain whatever's already buffered via their existing wctx.Err() check, so close(jobs) never deadlocks.

3. Flag handling (walk.go:635). Good catch — flag.ExitOnError calls os.Exit(2) before Parse returns, so the err != nil block was dead. Switched to flag.ContinueOnError with an explicit fs.Usage so parse failures print consistent usage text and exit with code 1, matching the rest of the binary.
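
The producer change in item 2 can be sketched as below. This is a simplified model, assuming an unbuffered jobs channel and a simulated failure at one tile index; enqueue and run are hypothetical names, and the real walker's worker body is not reproduced here.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// enqueue sends tile indices until all are sent or ctx is cancelled,
// then closes jobs. It returns how many indices were actually sent.
func enqueue(ctx context.Context, jobs chan<- int, n int) int {
	defer close(jobs)
	for i := 0; i < n; i++ {
		select {
		case jobs <- i:
		case <-ctx.Done():
			return i // stop producing once a worker has cancelled
		}
	}
	return n
}

// run starts workers, simulates a failure at tile failAt (use -1 for no
// failure), and reports how many indices were enqueued plus the first error.
func run(nTiles, failAt, workers int) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	jobs := make(chan int)
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for idx := range jobs {
				if ctx.Err() != nil {
					continue // drain without doing work after cancel
				}
				if idx == failAt {
					once.Do(func() { firstErr = fmt.Errorf("tile %d: simulated failure", idx) })
					cancel()
				}
			}
		}()
	}
	sent := enqueue(ctx, jobs, nTiles)
	wg.Wait() // firstErr is safe to read after all workers have exited
	return sent, firstErr
}

func main() {
	sent, err := run(100000, 3, 4)
	fmt.Printf("enqueued %d of 100000 tiles, err: %v\n", sent, err)
}
```

Because select picks randomly among ready cases, a few extra indices may still be sent after the cancel; the workers' ctx.Err() check is what makes those harmless.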

Diff: nicknacnic:feat/ans-verify-list-provider → d7c288f

@csnitker-godaddy

Collaborator

Thanks for the contribution! This PR actually surfaced a bug in the checkpoint signing that differed from the intended signature format. Could you please update your PR with the latest changes from main and update the parsing to handle the correct format?

csnitker-godaddy and others added 2 commits June 5, 2026 09:13
Updates signTestCheckpoint in walk_test.go to produce ASN.1 DER
ECDSA signatures, matching the production wire shape restored in
godaddy/ans PR godaddy#38 ("fix(tl): emit DER C2SP checkpoint signatures").

VerifyC2SPECDSA on main still accepts IEEE P1363 r||s as a legacy
fallback for older local-dev checkpoints, so the previous P1363
fixture continued to pass — but the tests should pin the format
verifiers will see in production, not the deprecated one.

Walker production code is unchanged: verifyCheckpointNote delegates
ECDSA verification to logstore.VerifyC2SPECDSA, which after PR godaddy#38
transparently accepts DER (primary) and P1363 (legacy). No other
adjustments needed.

Signed-off-by: Layer8 <NWillAU900@gmail.com>
@nicknacnic

Contributor Author

Thanks Connor — pulled in 9717c0e (your merge of main) which brings PR #38 under us.

The walker's production code didn't need changes: verifyCheckpointNote delegates ECDSA verification to logstore.VerifyC2SPECDSA, and after #38 that function accepts ASN.1 DER as primary plus IEEE P1363 as a legacy fallback. So your merge alone made the walker correct against the corrected wire format.

What did need updating was the test fixture: signTestCheckpoint in walk_test.go was synthesizing checkpoint notes with P1363 r||s signatures (the old buggy format). It was still passing because of the P1363 fallback, but the test was pinning the deprecated shape, not the production one. Updated it to emit DER in d0fa648, on top of your merge.
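
To make the two signature shapes concrete, here is a sketch of a "DER first, P1363 fallback" ECDSA-P256 check using only the Go standard library. It is not the implementation of logstore.VerifyC2SPECDSA; the SHA-256 digest over the message and the names verifyEither and demo are assumptions for illustration.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
	"math/big"
)

// verifyEither accepts an ASN.1 DER signature first and falls back to a
// 64-byte IEEE P1363 r||s signature.
func verifyEither(pub *ecdsa.PublicKey, msg, sig []byte) bool {
	digest := sha256.Sum256(msg)
	if ecdsa.VerifyASN1(pub, digest[:], sig) {
		return true
	}
	if len(sig) != 64 {
		return false
	}
	r := new(big.Int).SetBytes(sig[:32])
	s := new(big.Int).SetBytes(sig[32:])
	return ecdsa.Verify(pub, digest[:], r, s)
}

// demo signs one message in both encodings and reports which checks pass,
// plus whether a signature over a different message is wrongly accepted.
func demo() (der, p1363, tampered bool, err error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return
	}
	msg := []byte("example.test/log\n3\nplaceholder-root-hash\n")
	digest := sha256.Sum256(msg)

	derSig, err := ecdsa.SignASN1(rand.Reader, key, digest[:])
	if err != nil {
		return
	}
	r, s, err := ecdsa.Sign(rand.Reader, key, digest[:])
	if err != nil {
		return
	}
	raw := make([]byte, 64) // fixed-width big-endian r||s
	r.FillBytes(raw[:32])
	s.FillBytes(raw[32:])

	der = verifyEither(&key.PublicKey, msg, derSig)
	p1363 = verifyEither(&key.PublicKey, msg, raw)
	tampered = verifyEither(&key.PublicKey, append([]byte("x"), msg...), derSig)
	return
}

func main() {
	der, p1363, tampered, err := demo()
	fmt.Println("DER:", der, "P1363:", p1363, "tampered accepted:", tampered, "err:", err)
}
```

A fixture that signs only in the r||s shape would keep passing against such a verifier, which is why the test was switched to DER.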

Verified end-to-end against a fresh build of this branch: brought up scripts/demo/start.sh, registered three agents under connor-test.example, ran ans-verify list -verify. Output:

Tree size:   3 leaves (checkpoint signature ✓)
Matched:     3 live agents (from 3 raw leaves)
...
Verified 3/3 receipts (0 failed)

Branch tip: d0fa648.

@csnitker-godaddy csnitker-godaddy left a comment


Thanks for getting that updated! The changes look good and I verified they work against the local and production transparency log.

I think we probably need to come back to this CLI sooner rather than later and adopt something like cobra / viper to manage the CLI flags and help text, but I can circle back on that when I get a bit of time.

@csnitker-godaddy csnitker-godaddy added this pull request to the merge queue Jun 10, 2026
Merged via the queue into godaddy:main with commit 0d80e66 Jun 10, 2026
2 checks passed

3 participants