Skip to content

feat: batch profiles + backpressure for lazy-parallel runs#37

Open
lopadova wants to merge 40 commits intotask/enterprise-operations-scalabilityfrom
task/enterprise-operations-scalability-batch-profiles
Open

feat: batch profiles + backpressure for lazy-parallel runs#37
lopadova wants to merge 40 commits intotask/enterprise-operations-scalabilityfrom
task/enterprise-operations-scalability-batch-profiles

Conversation

@lopadova
Copy link
Copy Markdown
Contributor

@lopadova lopadova commented May 4, 2026

Summary

Macro Task 8 (Enterprise Operations And Scalability Add-On) v1.x slice 1.
Adds named operational batch profiles plus producer-side backpressure
controls for lazy-parallel runs, without taking a hard Horizon
dependency.

  • BatchProfile + BatchProfileResolver with built-in ci, smoke,
    nightly defaults and optional eval-harness.batches.profiles.*
    host-app overrides. Explicit CLI options always win over profile
    defaults; lazy-parallel-only profile fields drop silently when the
    resolved mode is serial. Single-field overrides can flip a built-in
    lazy profile to serial without nulling every inherited field.
    Profile validation mirrors BatchOptions cross-validation, rejects
    unknown profile keys (typos like concurency), rejects profiles
    that set chunk_size without an explicit concurrency, and
    rejects a misdeclared (non-array) eval-harness.batches.profiles
    config so host-app overrides cannot silently fall back to built-ins.
  • BatchOptions extended with chunkSize, rateLimit,
    rateWindowSeconds, checkpointEvery. Serial mode rejects them all,
    lazy-parallel mode rejects rateWindowSeconds without rateLimit,
    and chunkSize cannot exceed concurrency so --concurrency stays
    a real fan-out cap. Validation messages stay CLI-neutral so
    programmatic EvalEngine::runBatch() / runEvalSet() callers do
    not see CLI-only diagnostics.
  • RateLimitWindow (pure sliding-window math, head-offset queue with
    lazy compaction so prune is amortized O(1) per dispatch) and a
    BatchProgressReporter contract with a no-op default plus an
    optional BatchTerminalProgressReporter sub-contract that adds
    reportTerminal(batchId, samplesCompleted, totalSamples, status)
    with STATUS_SUCCESS / STATUS_FAILURE / STATUS_EMPTY constants
    for dashboards that need to distinguish a finished failed batch
    from a stalled one. The service provider prefers the sub-contract
    binding when both keys are present so host apps can register under
    either.
  • LazyParallelBatch::run() throttles dispatch via the limiter, uses
    effectiveChunkSize() for the producer window and a tailored TTL
    estimate, and emits checkpoints once per crossed multiple of
    checkpointEvery plus a guaranteed terminal end-of-batch event
    (forced on the failure path so dashboards distinguish failed from
    stalled even when totalSamples is an exact multiple of
    checkpointEvery). The wait timeout is a hard per-window
    wall-clock cap: the chunk deadline propagates into
    dispatchSampleJobs() so the loop aborts early when exceeded, and
    throttleDispatch() caps its sleep at the remaining deadline budget.
  • LazyParallelBatch::dispatch() (external collection flow)
    intentionally does NOT throttle on rate limits and uses a
    concurrency-keyed drain-time TTL: `ceil(sampleCount / concurrency)
    • timeout, lifted to the safety floor max(default, drainSeconds, timeout, configuredTTL). --batch-timeoutis intentionally NOT in the floor: it boundsrun()s producer wait, not worker drain time, so leaving it in the dispatch path TTL would inflate cache retention by hours/days for any caller raising --batch-timeout. Operators with larger or smaller worker pools should override the floor via BatchOptions::lazyParallel(resultTtlSeconds: ...)`.
  • New CLI flags on eval-harness:run and eval-harness:adversarial:
    --batch-profile, --chunk-size, --rate-limit,
    --rate-window-seconds, --checkpoint-every,
    --result-ttl-seconds. The --concurrency help text describes the
    new contract: producer fan-out cap with --chunk-size only narrowing
    the dispatch window further. Pass none (or null) on a numeric
    flag (--timeout, --batch-timeout, --chunk-size,
    --rate-limit, --rate-window-seconds, --result-ttl-seconds,
    --checkpoint-every) to explicitly clear an inherited profile
    value for a one-off run; queue names stay arbitrary strings (no
    sentinel for --queue) so apps already using a queue literally
    called none continue to work.
  • Cross-field reconciliation in BuildsBatchOptions keeps the
    documented "explicit CLI wins" intuitive even when the explicit
    override changes the validity of an inherited profile field: an
    explicit lower --concurrency caps the inherited chunk_size, and
    an explicit --rate-limit=none drops the inherited
    rate_window_seconds. Explicit-vs-explicit conflicts (e.g.
    --rate-limit=none --rate-window-seconds=30) still surface through
    BatchOptions validation.
  • Docs: README + docs/HORIZON_BATCH_QUEUES.md gain operational
    profile and backpressure sections; the producer-window paragraphs
    call out that the runner waits after each chunk so --chunk-size
    is the effective in-flight limit when smaller than --concurrency.
    Both docs document the --flag=none escape hatch and the optional
    BatchTerminalProgressReporter sub-contract for terminal-status
    dashboards. The README adversarial section clarifies that
    --outputs bypasses the batch contract entirely.

No new env vars in this slice; sync queue and queue fakes still cover
the test suite. No Horizon dependency added.

Test plan

  • composer validate --strict => valid
  • vendor/bin/phpunit => OK (693 tests, 1940 assertions)
  • vendor/bin/phpstan analyse --memory-limit=512M --no-progress => no errors
  • vendor/bin/pint --test => passed

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants