feat: batch profiles + backpressure for lazy-parallel runs#37
Open
lopadova wants to merge 40 commits intotask/enterprise-operations-scalabilityfrom
Open
Conversation
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Macro Task 8 (Enterprise Operations And Scalability Add-On) v1.x slice 1.
Adds named operational batch profiles plus producer-side backpressure
controls for lazy-parallel runs, without taking a hard Horizon
dependency.
BatchProfile+BatchProfileResolverwith built-inci,smoke,nightlydefaults and optionaleval-harness.batches.profiles.*host-app overrides. Explicit CLI options always win over profile
defaults; lazy-parallel-only profile fields drop silently when the
resolved mode is serial. Single-field overrides can flip a built-in
lazy profile to serial without nulling every inherited field.
Profile validation mirrors BatchOptions cross-validation, rejects
unknown profile keys (typos like
concurency), rejects profilesthat set
chunk_sizewithout an explicitconcurrency, andrejects a misdeclared (non-array)
eval-harness.batches.profilesconfig so host-app overrides cannot silently fall back to built-ins.
BatchOptionsextended withchunkSize,rateLimit,rateWindowSeconds,checkpointEvery. Serial mode rejects them all,lazy-parallel mode rejects
rateWindowSecondswithoutrateLimit,and
chunkSizecannot exceedconcurrencyso--concurrencystaysa real fan-out cap. Validation messages stay CLI-neutral so
programmatic
EvalEngine::runBatch()/runEvalSet()callers donot see CLI-only diagnostics.
RateLimitWindow(pure sliding-window math, head-offset queue withlazy compaction so prune is amortized O(1) per dispatch) and a
BatchProgressReportercontract with a no-op default plus anoptional
BatchTerminalProgressReportersub-contract that addsreportTerminal(batchId, samplesCompleted, totalSamples, status)with
STATUS_SUCCESS/STATUS_FAILURE/STATUS_EMPTYconstantsfor dashboards that need to distinguish a finished failed batch
from a stalled one. The service provider prefers the sub-contract
binding when both keys are present so host apps can register under
either.
LazyParallelBatch::run()throttles dispatch via the limiter, useseffectiveChunkSize()for the producer window and a tailored TTLestimate, and emits checkpoints once per crossed multiple of
checkpointEveryplus a guaranteed terminal end-of-batch event(forced on the failure path so dashboards distinguish failed from
stalled even when
totalSamplesis an exact multiple ofcheckpointEvery). The wait timeout is a hard per-windowwall-clock cap: the chunk deadline propagates into
dispatchSampleJobs()so the loop aborts early when exceeded, andthrottleDispatch()caps its sleep at the remaining deadline budget.LazyParallelBatch::dispatch()(external collection flow)intentionally does NOT throttle on rate limits and uses a
concurrency-keyed drain-time TTL: `ceil(sampleCount / concurrency)
, lifted to the safety floormax(default, drainSeconds, timeout, configuredTTL).--batch-timeoutis intentionally NOT in the floor: it boundsrun()s producer wait, not worker drain time, so leaving it in the dispatch path TTL would inflate cache retention by hours/days for any caller raising--batch-timeout. Operators with larger or smaller worker pools should override the floor viaBatchOptions::lazyParallel(resultTtlSeconds: ...)`.eval-harness:runandeval-harness:adversarial:--batch-profile,--chunk-size,--rate-limit,--rate-window-seconds,--checkpoint-every,--result-ttl-seconds. The--concurrencyhelp text describes thenew contract: producer fan-out cap with
--chunk-sizeonly narrowingthe dispatch window further. Pass
none(ornull) on a numericflag (
--timeout,--batch-timeout,--chunk-size,--rate-limit,--rate-window-seconds,--result-ttl-seconds,--checkpoint-every) to explicitly clear an inherited profilevalue for a one-off run; queue names stay arbitrary strings (no
sentinel for
--queue) so apps already using a queue literallycalled
nonecontinue to work.BuildsBatchOptionskeeps thedocumented "explicit CLI wins" intuitive even when the explicit
override changes the validity of an inherited profile field: an
explicit lower
--concurrencycaps the inherited chunk_size, andan explicit
--rate-limit=nonedrops the inheritedrate_window_seconds. Explicit-vs-explicit conflicts (e.g.
--rate-limit=none --rate-window-seconds=30) still surface throughBatchOptions validation.
docs/HORIZON_BATCH_QUEUES.mdgain operationalprofile and backpressure sections; the producer-window paragraphs
call out that the runner waits after each chunk so
--chunk-sizeis the effective in-flight limit when smaller than
--concurrency.Both docs document the
--flag=noneescape hatch and the optionalBatchTerminalProgressReportersub-contract for terminal-statusdashboards. The README adversarial section clarifies that
--outputsbypasses the batch contract entirely.No new env vars in this slice;
syncqueue and queue fakes still coverthe test suite. No Horizon dependency added.
Test plan
composer validate --strict=> validvendor/bin/phpunit=>OK (693 tests, 1940 assertions)vendor/bin/phpstan analyse --memory-limit=512M --no-progress=> no errorsvendor/bin/pint --test=> passed