chore(deps): bump actions/checkout from 4 to 6 by dependabot[bot] · Pull Request #2 · psychopathdev/parley

dependabot · 2026-06-09T10:00:46Z

Release notes

Sourced from actions/checkout's releases.

v6.0.0

What's Changed

Update README to include Node.js 24 support details and requirements by @salmanmkc in actions/checkout#2248

Persist creds to a separate file by @ericsciple in actions/checkout#2286

v6-beta by @ericsciple in actions/checkout#2298

update readme/changelog for v6 by @ericsciple in actions/checkout#2311

Full Changelog: actions/checkout@v5.0.0...v6.0.0

v6-beta

What's Changed

Updated persist-credentials to store the credentials under $RUNNER_TEMP instead of directly in the local git config.

This requires a minimum Actions Runner version of v2.329.0 to access the persisted credentials for Docker container action scenarios.

v5.0.1

What's Changed

Port v6 cleanup to v5 by @ericsciple in actions/checkout#2301

Full Changelog: actions/checkout@v5...v5.0.1

v5.0.0

What's Changed

Update actions checkout to use node 24 by @salmanmkc in actions/checkout#2226

Prepare v5.0.0 release by @salmanmkc in actions/checkout#2238

⚠️ Minimum Compatible Runner Version

v2.327.1
Release Notes

Make sure your runner is updated to this version or newer to use this release.

Full Changelog: actions/checkout@v4...v5.0.0

v4.3.1

What's Changed

Port v6 cleanup to v4 by @ericsciple in actions/checkout#2305

Full Changelog: actions/checkout@v4...v4.3.1

v4.3.0

What's Changed

docs: update README.md by @motss in actions/checkout#1971

Add internal repos for checking out multiple repositories by @mouismail in actions/checkout#1977

Documentation update - add recommended permissions to Readme by @benwells in actions/checkout#2043

... (truncated)

Changelog

Sourced from actions/checkout's changelog.

Changelog

v6.0.3

Fix checkout init for SHA-256 repositories by @yaananth in actions/checkout#2439

fix: expand merge commit SHA regex and add SHA-256 test cases by @yaananth in actions/checkout#2414

v6.0.2

Fix tag handling: preserve annotations and explicit fetch-tags by @ericsciple in actions/checkout#2356

v6.0.1

Add worktree support for persist-credentials includeIf by @ericsciple in actions/checkout#2327

v6.0.0

Persist creds to a separate file by @ericsciple in actions/checkout#2286

Update README to include Node.js 24 support details and requirements by @salmanmkc in actions/checkout#2248

v5.0.1

Port v6 cleanup to v5 by @ericsciple in actions/checkout#2301

v5.0.0

Update actions checkout to use node 24 by @salmanmkc in actions/checkout#2226

v4.3.1

Port v6 cleanup to v4 by @ericsciple in actions/checkout#2305

v4.3.0

docs: update README.md by @motss in actions/checkout#1971

Add internal repos for checking out multiple repositories by @mouismail in actions/checkout#1977

Documentation update - add recommended permissions to Readme by @benwells in actions/checkout#2043

Adjust positioning of user email note and permissions heading by @joshmgross in actions/checkout#2044

Update README.md by @nebuk89 in actions/checkout#2194

Update CODEOWNERS for actions by @TingluoHuang in actions/checkout#2224

Update package dependencies by @salmanmkc in actions/checkout#2236

v4.2.2

url-helper.ts now leverages well-known environment variables by @jww3 in actions/checkout#1941

Expand unit test coverage for isGhes by @jww3 in actions/checkout#1946

v4.2.1

Check out other refs/* by commit if provided, fall back to ref by @orhantoy in actions/checkout#1924

v4.2.0

Add Ref and Commit outputs by @lucacome in actions/checkout#1180

Dependency updates by @dependabot- actions/checkout#1777, actions/checkout#1872

v4.1.7

Bump the minor-npm-dependencies group across 1 directory with 4 updates by @dependabot in actions/checkout#1739

Bump actions/checkout from 3 to 4 by @dependabot in actions/checkout#1697

Check out other refs/* by commit by @orhantoy in actions/checkout#1774

... (truncated)

Commits

df4cb1c Update changelog for v6.0.3 (#2446)
1cce339 Fix checkout init for SHA-256 repositories (#2439)
900f221 fix: expand merge commit SHA regex and add SHA-256 test cases (#2414)
0c366fd Update changelog (#2357)
de0fac2 Fix tag handling: preserve annotations and explicit fetch-tags (#2356)
064fe7f Add orchestration_id to git user-agent when ACTIONS_ORCHESTRATION_ID is set (...
8e8c483 Clarify v6 README (#2328)
033fa0d Add worktree support for persist-credentials includeIf (#2327)
c2d88d3 Update all references from v5 and v4 to v6 (#2314)
1af3b93 update readme/changelog for v6 (#2311)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

MIT-licensed Python 3.10+ project laid out with hatchling, ruff, mypy, pytest+coverage, and a versioned single-source __version__ module. First README is a stub; it will be rewritten as the toolkit grows.

These five modules are the wire format between every later subsystem (speech, grounding, policy, env, metrics). They're deliberately small and dependency-free so importing parley.core stays cheap.
- types: Audio/Transcript/Grounding/Frame/Action/Trace dataclasses.
- errors: ParleyError tree (ConfigError / RegistryError / ValidationError).
- rng: RngManager with BLAKE2b-derived named sub-streams (process-stable).
- registry: typed per-kind registries (speech / perturb / policy / metric / ...).
- config: pydantic v2 strict models + YAML loader with source-path errors.
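The named sub-stream idea behind the rng module can be sketched roughly as follows. This is only an illustration: the name derive_seed appears in the test commit, but the signature used here, the `seed:name` hashing scheme, the helper named_stream, and the use of the stdlib random.Random (standing in for a numpy Generator) are all assumptions rather than Parley's actual code.

```python
import hashlib
import random


def derive_seed(global_seed: int, name: str) -> int:
    """Hash (global_seed, name) into a 64-bit child seed (hypothetical scheme)."""
    payload = f"{global_seed}:{name}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def named_stream(global_seed: int, name: str) -> random.Random:
    """Return an independent generator for one named stage (hypothetical helper)."""
    return random.Random(derive_seed(global_seed, name))


# Two stages share a global seed but draw from unrelated streams.
noise_rng = named_stream(1234, "perturb.noise")
env_rng = named_stream(1234, "env.spawn")
```

Unlike Python's built-in hash(), which is salted per process for strings, a BLAKE2b digest is the same on every run, which is one plausible reading of "process-stable".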

Including determinism checks for derive_seed, stream-identity invariants, extra-field rejection, and YAML/config error paths.

The runtime-checkable Protocol nails down the audio->Transcript contract. The mock frontend (perfect-ASR) lets users isolate downstream errors; the Whisper adapter is import-lazy and gated behind the 'whisper' extra so the default install stays light.

Encodes text as a sequence of windowed sine tones on a log-spaced frequency grid; decodes by FFT peak-picking over vocab bins. Clean audio round-trips exactly, while noise / mu-law / clipping perturbations collapse bin SNR and yield realistic substitutions. This is the trick that lets Parley measure WER end-to-end in CI without shipping a real acoustic model — and it's deliberately separate from the frontend wrapper so other tools can call encode/decode directly.
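A minimal sketch of the tone-codec idea described above, under stated assumptions: the sample rate, frame length, frequency range, and the function names build_bins/encode/decode are illustrative choices, not Parley's real API. Frequencies are snapped to FFT bin centers so that clean audio lands exactly on one bin.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz (assumed)
FRAME = 1024          # samples per token (assumed)


def build_bins(vocab, f_lo=300.0, f_hi=3400.0):
    """Map each token to an FFT bin on a log-spaced frequency grid."""
    freqs = np.geomspace(f_lo, f_hi, num=len(vocab))
    bins = np.round(freqs * FRAME / SAMPLE_RATE).astype(int).tolist()
    if len(set(bins)) != len(vocab):
        raise ValueError("vocab too large for this grid: bins collide")
    return dict(zip(vocab, bins))


def encode(text, bins):
    """Emit one Hann-windowed sine tone per whitespace-separated token."""
    n = np.arange(FRAME)
    window = np.hanning(FRAME)
    tones = [window * np.sin(2 * np.pi * bins[tok] * n / FRAME)
             for tok in text.split()]
    return np.concatenate(tones) if tones else np.zeros(0)


def decode(audio, bins):
    """Per frame, pick the vocab token whose bin has the largest magnitude."""
    tokens = list(bins)
    idx = np.array([bins[t] for t in tokens])
    n_frames = len(audio) // FRAME
    if n_frames == 0:
        return ""
    frames = audio[: n_frames * FRAME].reshape(n_frames, FRAME)
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    return " ".join(tokens[k] for k in spectrum[:, idx].argmax(axis=1))
```

In this sketch, `decode(encode("pick the red cube", bins), bins)` returns the input text for a vocab containing those four words. Perturbing the audio between the two calls is what would produce substitutions.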

…ntend

Audio: Gain, Clip, AdditiveNoise (SNR-calibrated), MuLawCodec, Reverb, TimeStretch, PitchShift. All implemented in pure numpy so the toolkit keeps zero heavy deps; quality is benchmark-grade, not production audio.

Linguistic: Disfluency (word stutter), FillerInsertion ("uhm"/"uh"), AccentSubstitution (configurable lexical remap). Each respects the rate parameter and uses only the supplied numpy Generator, with no hidden state.

Compose threads a single RNG through a list and is identity on empty, so the "clean" baseline row is just Compose([]).
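To make "SNR-calibrated" and "identity on empty" concrete, here is a small sketch. The function names additive_noise and compose, and the choice to rescale the noise so the measured SNR matches the target exactly, are assumptions for illustration and may differ from Parley's classes.

```python
import numpy as np


def additive_noise(snr_db):
    """Return a perturbation that adds white noise at the requested SNR (dB)."""
    def apply(audio, rng):
        if audio.size == 0:
            return audio
        signal_power = float(np.mean(audio ** 2))
        if signal_power == 0.0:
            return audio  # silence: SNR is undefined, leave untouched
        noise = rng.standard_normal(audio.shape)
        target_power = signal_power / (10.0 ** (snr_db / 10.0))
        # Rescale so the realised noise power equals the target power.
        noise *= np.sqrt(target_power / np.mean(noise ** 2))
        return audio + noise
    return apply


def compose(perturbations):
    """Thread one RNG through a list of perturbations, in order."""
    def apply(audio, rng):
        for perturb in perturbations:
            audio = perturb(audio, rng)
        return audio
    return apply
```

With an empty list the loop body never runs, so `compose([])` hands back its input unchanged, which is what makes it usable as the clean baseline.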

Covers VERB <COLOR> <SHAPE> [to the DIRECTION] with verb canonicalization and a graceful <unknown> sentinel for failed parses. Six tests covering full / minimal / canonical / unknown / partial inputs. Acts as the reference baseline an LLM grounder can be benched against.
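A rough sketch of a grammar-based grounder for this pattern, assuming an optional "the" as in the synth templates. The vocabularies, the verb alias table, the Grounding fields, and the function name ground are hypothetical stand-ins, not the parser shipped in Parley.

```python
import re
from dataclasses import dataclass
from typing import Optional

UNKNOWN = "<unknown>"
# Hypothetical vocabularies and verb aliases, for illustration only.
VERB_ALIASES = {"pick": "pick", "grab": "pick", "take": "pick",
                "place": "place", "put": "place", "push": "push"}
COLORS = {"red", "green", "blue", "yellow"}
SHAPES = {"cube", "ball", "cone"}
DIRECTIONS = {"left", "right", "top", "bottom"}

PATTERN = re.compile(
    r"^(?P<verb>\w+)(?: the)? (?P<color>\w+) (?P<shape>\w+)"
    r"(?: to the (?P<direction>\w+))?$"
)


@dataclass(frozen=True)
class Grounding:
    verb: str
    target: str
    dest: Optional[str]


def ground(text: str) -> Grounding:
    """Parse an instruction; fall back to the <unknown> sentinel on failure."""
    match = PATTERN.match(" ".join(text.lower().split()))
    if match is None:
        return Grounding(UNKNOWN, UNKNOWN, None)
    verb = VERB_ALIASES.get(match["verb"], UNKNOWN)  # canonicalize synonyms
    color, shape = match["color"], match["shape"]
    target = f"{color} {shape}" if color in COLORS and shape in SHAPES else UNKNOWN
    direction = match["direction"]
    if direction is not None and direction not in DIRECTIONS:
        direction = UNKNOWN
    return Grounding(verb, target, direction)
```

Each slot degrades independently, so a partially understood instruction still yields the slots that did parse.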

The bare 'env/' rule was eating the parley.env package. Anchor it.

A unit-square workspace populated with colored shape objects. Continuous xy_delta moves; explicit pick/place actions interact with whichever object the effector is on. Success predicates compare the final state against a GoalSpec (pick: holding the right object, place/push: target inside the named direction zone). This is *not* a robotics simulator — it's a controllable testbed that keeps the whole speech->ASR->grounding->policy->action->success chain runnable in milliseconds. Real sims (LIBERO/ManiSkill/RLBench) plug in via the same Environment Protocol.

Three reference policies implementing the VLAPolicy Protocol:
- ScriptedPolicy: small approach->pick->transit->place state machine parameterised by Grounding+Scene. Acts as the success-rate ceiling on synthetic data given a particular grounder.
- RandomPolicy: uniform xy + 5% pick/place; the floor baseline.
- NoisyPolicy: Gaussian-perturbs another policy's actions. Useful for probing metric sensitivity and as an 'imperfect VLA' surrogate.

Real models (OpenVLA/Octo/pi-0) plug in here via the same protocol.

…ormat

generate_dataset() emits Episodes whose audio is the codec-encoded form of their instruction, closing the loop with the codec ASR: clean audio decodes to the original text, so any non-zero WER is attributable purely to perturbations.

Templates: pick/place/push the <COLOR> <SHAPE> [to the <DIRECTION>]. The closed vocab returned by vocab_for() also covers filler/disfluency/accent tokens so perturbed text round-trips cleanly.

On-disk format: a jsonl index + a sibling .audio.npz blob. The split lets index-only operations (count, list, validate) skip the audio entirely and keeps random-access cheap.

…ness, bootstrap

Per stage:
- ASR: WER (with sub/ins/del breakdown), CER, KeywordRecall.
- Grounding: GroundingExactMatch and GroundingSlotF1 over verb/target/dest.
- Action: SuccessRate, ActionMSE/MAE (gated on a reference action sequence in trace.metadata), DTW with path-length normalization.
- Efficiency: LatencyPercentiles (p50/p95/p99 + total + RTF).
- Robustness: RobustnessDelta, i.e. clean-vs-perturbed deltas + mean/max degradation (post-hoc aggregator over per-perturbation means).
- Aggregate: summarize() returns mean/SEM/percentile-bootstrap CI; paired_bootstrap_pvalue() for marking significant pipeline pairs.

WER/CER use a textbook DP edit-distance with backpointer reconstruction so we can report sub/ins/del counts. Pure numpy; zero new deps. 27 tests covering each error type, partial-credit, and bootstrap edge cases.
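The edit-distance computation with a sub/ins/del breakdown can be sketched as below. This is a plain-Python illustration of the textbook technique; the function name wer_breakdown and its return shape are assumptions, and the backtrace here re-derives the path from the distance table rather than storing explicit backpointers.

```python
def wer_breakdown(ref: list, hyp: list) -> dict:
    """Word error rate plus substitution/insertion/deletion counts."""
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match / substitute
                             dist[i - 1][j] + 1,         # delete from ref
                             dist[i][j - 1] + 1)         # insert into hyp
    # Walk back from the corner to recover one optimal alignment.
    sub = ins = dele = 0
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = int(i > 0 and j > 0 and ref[i - 1] != hyp[j - 1])
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + mismatch:
            sub += mismatch
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dele += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    errors = sub + ins + dele
    wer = errors / n if n else float(m > 0)
    return {"wer": wer, "sub": sub, "ins": ins, "del": dele}
```

When several alignments are optimal, the counts depend on tie-breaking order (diagonal first here), although the total error count does not.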

The orchestrator that turns a config into per-episode results:
- Pipeline composes a SpeechFrontend + Grounder + VLAPolicy by name and records per-stage wall-clock timings on the Trace.
- run_episode walks an Episode through Perturbation -> Speech -> Grounding -> Env-rollout, populating Trace fields and timings_ms used by the latency metric. Per-stage RNG streams are derived from the global seed with distinct names so e.g. additive noise can't leak into env spawn.
- ContentCache: file-backed (atomic-write) JSON cache keyed by a (pipeline, perturbation, episode_id, seed) hash; cache_dir=None means disabled.
- expand_suite: cartesian product of pipelines x (clean + perturb groups) x episodes. Clean is always row 0 so robustness deltas have a baseline.
- BenchmarkEngine builds pipelines lazily (codec frontend gets the dataset vocab injected automatically), persists per-run trace JSON to output_dir/traces/, and supports an optional ThreadPoolExecutor for workers > 1.

Smoke run on synth(n=4) with 2 pipelines x 3 perturb groups = 24 results; scripted policy is the success-rate ceiling and random is the floor as expected. 13 tests cover cache, suite expansion, end-to-end runs, the threadpool path, and the unknown-pipeline error path.
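The suite expansion described above amounts to a cartesian product with the clean condition forced to the front. A minimal sketch, assuming plain string names and a dict per run unit (the real expand_suite signature and return type are not shown in this PR and may differ):

```python
from itertools import product


def expand_suite(pipelines, perturb_groups, episode_ids):
    """Cartesian product of pipelines x (clean + groups) x episodes."""
    # "clean" always comes first so every pipeline has a baseline row.
    conditions = ["clean"] + [g for g in perturb_groups if g != "clean"]
    return [
        {"pipeline": p, "perturbation": c, "episode_id": e}
        for p, c, e in product(pipelines, conditions, episode_ids)
    ]


units = expand_suite(["scripted", "random"], ["snr_10", "mu_law"], range(4))
```

Here two pipelines, three conditions (clean plus two hypothetical groups), and four episodes give 2 x 3 x 4 = 24 run units.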

…d JSON

- aggregate_results() groups EpisodeResults by (pipeline, perturbation) and runs each metric through summarize() for mean/SEM/bootstrap CI. Skipped metrics (e.g. action_mse without a reference) are tracked per episode so the n in each cell is honest.
- render_markdown() emits a GitHub-friendly table with mean [low, high] cells; render_csv() emits one row per cell with explicit ci_low/high columns for spreadsheet use.
- build_leaderboard() ranks pipelines by clean success-rate, ties broken by lower mean degradation across perturbation groups.
- dump_report/load_report use a versioned JSON schema (schema_version=1) carrying parley_version + suite_name + rows + leaderboard. Refusing unknown schema_versions on load means future format changes don't silently misread old reports.
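As an illustration of what a mean/SEM/percentile-bootstrap summary involves, here is a stdlib-only sketch. The name summarize matches the commit message, but the parameters, defaults, and returned keys below are assumptions, not the toolkit's actual interface.

```python
import math
import random
import statistics


def summarize(values, n_boot=2000, alpha=0.05, seed=0):
    """Mean, standard error, and a percentile-bootstrap confidence interval."""
    if not values:
        raise ValueError("need at least one value")
    rng = random.Random(seed)
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n) if n > 1 else 0.0
    # Resample with replacement and collect the resampled means.
    boot = sorted(statistics.fmean(rng.choices(values, k=n))
                  for _ in range(n_boot))
    low = boot[int((alpha / 2) * (n_boot - 1))]
    high = boot[int((1 - alpha / 2) * (n_boot - 1))]
    return {"n": n, "mean": mean, "sem": sem, "ci_low": low, "ci_high": high}
```

Reporting n alongside the interval is what lets a table cell stay honest when some episodes skipped a metric.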

Thin glue over the library. Highlights:
- run: loads YAML config, optionally overrides dataset path / seed, runs the engine, dumps report.json + a config.resolved.yaml snapshot next to it, and prints a Rich-rendered table inline.
- report: re-renders previously-written report.json as markdown / csv / json. The CSV path reconstitutes Summary objects from the JSON dump so it's available without re-running the suite.
- list: enumerates registered plugins by kind for discoverability.
- validate: parses a YAML and prints a one-line summary.

`parley` is exposed as a console script via [project.scripts].

… smoke

Includes a synth -> run -> report --format json end-to-end round-trip that exercises the full toolkit through the CLI.

Adds whitespace and line-break normalization. No semantic changes — all 130 tests stay green and mypy strict is still clean. Future commits can assume `ruff format` is the canonical layout, which is what CI enforces.

Dependabot is grouped so dev-tooling bumps (ruff, mypy, pytest) arrive as one PR per week instead of a stream of singletons. CODEOWNERS keeps review routing trivial for now (single maintainer).

- ci.yml: ruff check, ruff format --check, mypy on a Python 3.12 lint job; pytest+coverage matrix across Python 3.10..3.13 on Ubuntu plus a macOS 3.12 sanity job; a separate smoke-cli job runs the full `parley synth + run + validate` round trip against the quickstart example so CI breaks loudly if any end-to-end glue regresses.
- codeql.yml: weekly security scan + on every push/PR.
- release.yml: tag-triggered build + PyPI publish via trusted publishing (id-token: write, no API tokens needed).

Mirrors what CI enforces so committers catch lint/type breakage before push instead of waiting on the runners.

…cimate) + sweep helpers

Three channel-flavored perturbations cribbed from the VoIP / telephony literature:
- PacketLoss: drop contiguous packets at a configurable rate. Zeros rather than concealment; we're not doing PLC.
- BandLimit: brick-wall FFT band-pass; default 300-3400 Hz matches the ITU-T G.712 narrowband telephony passband.
- SpectralDecimate: zero out the top fraction of FFT bins, a poor-man's perceptual-codec proxy that's deterministic and dep-free.

suites.py adds three programmatic sweep builders:
- snr_sweep over CHiME/MUSAN-style SNR ladders.
- codec_sweep: mu_law / telephone / spectral_decimate / packet_loss.
- linguistic_sweep: disfluency / filler / accent_subst.

These are for ad-hoc / notebook use; YAML configs can still list each PerturbationGroup explicitly when that's clearer. 9 new tests.
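A brick-wall FFT band-pass of the kind BandLimit describes can be sketched in a few lines of numpy. The function name band_limit and its parameters are illustrative assumptions; only the 300-3400 Hz default comes from the commit message.

```python
import numpy as np


def band_limit(audio, sample_rate, f_lo=300.0, f_hi=3400.0):
    """Zero every FFT bin outside [f_lo, f_hi] and transform back."""
    if audio.size == 0:
        return audio
    spectrum = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(audio.size, d=1.0 / sample_rate)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    # Pass n explicitly so odd-length inputs keep their length.
    return np.fft.irfft(spectrum, n=audio.size)
```

A tone inside the passband comes back essentially unchanged, while one above 3400 Hz is removed. The operation uses no randomness, so it is deterministic by construction.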

Two robustness-science staples cribbed from the speech/fairness eval literature (see docs/design-notes.md):
- sensitivity_index() computes ΔTask / ΔInput per (pipeline, perturbation) using each pipeline's own "clean" row as baseline. The slope is interpretable: a 1-point increase in the upstream metric (WER, ...) costs N points of downstream task success. A degenerate ΔInput=0 with non-zero ΔTask becomes math.inf, which surfaces 'perturbation didn't move the input metric but the task collapsed anyway' (a pipeline brittleness signal that survives WER staying flat).
- worst_group_report(): per pipeline, returns the minimum value of a target metric across grouped rows. Currently grouped by perturbation but the surface accommodates future axes (accent stratum, speaker id) without redesign.

7 tests cover fragility comparison, zero-delta handling, missing-baseline skip, and the secondary-metric path.
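The two quantities can be illustrated with scalar inputs. The signatures below are simplified assumptions (the real functions operate on report rows), and returning 0.0 when both deltas are zero is a choice made for this sketch, not something the commit message specifies.

```python
import math


def sensitivity_index(clean_input, pert_input, clean_task, pert_task):
    """Slope dTask / dInput relative to the clean baseline."""
    d_input = pert_input - clean_input
    d_task = pert_task - clean_task
    if d_input == 0:
        # Input metric did not move; a task change is flagged as infinite.
        return math.inf if d_task != 0 else 0.0
    return d_task / d_input


def worst_group(metric_by_group):
    """Return the (group, value) pair with the lowest metric value."""
    return min(metric_by_group.items(), key=lambda kv: kv[1])
```

For example, if WER rises from 0.0 to 0.25 while success falls from 1.0 to 0.5, the index is -2.0: each point of WER costs two points of task success.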

The synth generator was rolling a 10% chance of omitting the direction for place/push verbs, but the env's success predicate requires a destination zone for those verbs, making those episodes unsatisfiable and pushing the scripted policy's clean success-rate to ~80% instead of 100%. Found it via the programmatic example's worst-group report — a nice example of the toolkit catching its own bugs. Now: pick verbs have no destination; place/push always carry one. include_directions=False degrades place/push to picks rather than emitting an unsatisfiable goal.
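The corrected sampling rule can be sketched as follows. The vocabularies and the function name sample_instruction are hypothetical; only the invariant (pick never has a destination, place/push always do, and include_directions=False degrades to pick) comes from the commit message.

```python
import random

# Hypothetical closed vocabularies, for illustration only.
VERBS = ["pick", "place", "push"]
COLORS = ["red", "green", "blue"]
SHAPES = ["cube", "ball"]
DIRECTIONS = ["left", "right", "top", "bottom"]


def sample_instruction(rng, include_directions=True):
    """Sample (verb, text, direction) so that every goal is satisfiable."""
    verb = rng.choice(VERBS)
    if verb != "pick" and not include_directions:
        verb = "pick"  # degrade rather than emit a goal with no destination
    text = f"{verb} the {rng.choice(COLORS)} {rng.choice(SHAPES)}"
    direction = None
    if verb in ("place", "push"):
        direction = rng.choice(DIRECTIONS)
        text += f" to the {direction}"
    return verb, text, direction
```

Because the destination is tied to the verb rather than rolled independently, no episode can ask for a place or push without naming a zone.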

…sh sequence

- quickstart.yaml: smallest interesting run (1 pipeline, 16 episodes, one mild perturbation).
- robustness_panel.yaml: scripted vs random across 11 perturbations covering acoustic / channel / linguistic axes.
- snr_sweep.yaml: a five-rung SNR ladder, the canonical degradation curve.
- programmatic/custom_suite.py: builds the same kind of config in code using snr_sweep() / codec_sweep(), runs, prints headline table + sensitivity index + worst-group report.

All three configs pass `parley validate`. examples/README.md indexes them.

ASCII diagram of the call graph from CLI through runner into the four pluggable subsystems, with a wire-format table mapping each parley.core.types dataclass to its origin -> sink, and sections on determinism, caching, parallelism, and the eval-toolkit lineage the design borrows from (HELM, lm-eval, Inspect).

- usage.md: install, the five subcommands, what `parley run` writes, programmatic usage, plugging in a real frontend, reproducibility.
- metrics.md: every metric: what it measures, what scale, when to use, grouped by stage (ASR / grounding / action / efficiency / robustness / bookkeeping).
- api-reference.md: curated public surface across core / data / runner / report / metrics / perturb. Anything not listed there is internal.

Grounds Parley's design in current (2023-2026) systems with verifiable URLs: VLA policies (RT-1/2, OpenVLA, Octo, pi_0, GR00T N1, RDT-1B...), speech LLMs (Whisper, Qwen2-Audio, SALMONN, Moshi, SeamlessM4T...), robot benchmarks (LIBERO, CALVIN, RLBench, ManiSkill, SimplerEnv, VLABench...), and eval-harness architecture patterns (HELM, lm-evaluation-harness, MTEB, Inspect). Also documents what the design is deliberately *not* good at — synth env isn't a physics sim, the codec ASR is robust to surprisingly low SNRs, action chunking and real RIRs are out of scope for v0. Better to be honest than oversell.

- README: badges, why-this-toolkit-exists, 60-second tour (CLI + Python), feature summary, doc index.
- CHANGELOG: Keep-a-Changelog format with the 0.1.0 inventory under Added.
- CONTRIBUTING: the four CI gates, plugin-registration recipe, commit style guide.
- CODE_OF_CONDUCT: Contributor Covenant 2.1.
- SECURITY: GitHub private-vuln-reporting flow, plus an honest note on the actual attack surface (npz + YAML).

…key, per-thread pipelines

Three correctness issues found in review:
1. Linguistic perturbations were silent. They rewrite instruction.text, but run_episode kept feeding the *original* codec audio to the ASR, so disfluency/filler/accent_subst all measured WER=0. Now: when the frontend is the codec, re-encode the perturbed text before transcribe. (Real-audio adapters like Whisper are unaffected; they don't touch instruction.text in their perturbation path.)
2. The cache key was (pipeline, perturbation, episode, seed) by NAME only. Two suites reusing a group name 'noise' with different snr_db collided. cache_key now takes a config_fingerprint folding in the resolved pipeline + perturbation params + env + metrics + max_steps.
3. The engine shared one lazily-built Pipeline instance across the ThreadPoolExecutor, but policies hold per-episode state (reset/act). Under workers>1 we now build a private pipeline per run unit.
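The cache-key fix in item 2 can be illustrated with a small sketch. The names cache_key and config_fingerprint come from the commit message, but the signatures, the canonical-JSON encoding, and the use of SHA-256 are assumptions made here for illustration.

```python
import hashlib
import json


def config_fingerprint(resolved_config: dict) -> str:
    """Hash the resolved parameters, independent of dict key order."""
    canonical = json.dumps(resolved_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def cache_key(pipeline, perturbation, episode_id, seed, fingerprint) -> str:
    """Combine names with the parameter fingerprint into one cache key."""
    payload = json.dumps([pipeline, perturbation, episode_id, seed, fingerprint])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two groups both named "noise" but with different snr_db no longer collide.
loud = config_fingerprint({"perturbation": {"name": "noise", "snr_db": 5}})
quiet = config_fingerprint({"perturbation": {"name": "noise", "snr_db": 20}})
```

Sorting keys before hashing means two configs that differ only in field order map to the same fingerprint, while any change to a parameter value produces a different key.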

n_in=0 produced a negative linspace and out-of-bounds indexing in TimeStretch/PitchShift. Guard and return the empty array unchanged.
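A guarded linear resampler along these lines might look like the sketch below. The name resample_linear mirrors the commit title's _resample_linear, but the signature and the use of np.interp are assumptions, not the project's actual implementation.

```python
import numpy as np


def resample_linear(x: np.ndarray, n_out: int) -> np.ndarray:
    """Linearly resample a 1-D signal to n_out samples."""
    n_in = x.shape[0]
    if n_in == 0:
        return x  # guard: nothing to interpolate, return the input unchanged
    # Sample positions spread evenly over the valid index range [0, n_in - 1].
    positions = np.linspace(0.0, n_in - 1, num=n_out)
    return np.interp(positions, np.arange(n_in), x)
```

Without the guard, the upper endpoint n_in - 1 would be -1 for an empty input, which is the negative linspace the commit message refers to.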

- linguistic perturbation moves WER
- cache key param fingerprint + no cross-param collision
- empty-input resample
- threaded run matches serial (policy-state race guard)

Plus CHANGELOG entries under [Unreleased].

Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot · 2026-06-09T10:58:10Z

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version. You can also ignore all major, minor, or patch releases for a dependency by adding an ignore condition with the desired update_types to your config file.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.

Ka Yiu Cheung added 30 commits June 9, 2026 16:05

chore: bootstrap parley package skeleton

962a6ba

MIT-licensed Python 3.10+ project laid out with hatchling, ruff, mypy, pytest+coverage, and a versioned single-source __version__ module. First README is a stub; it will be rewritten as the toolkit grows.

test(core): cover types, RNG, registry, config (26 tests)

8d461a3

Including determinism checks for derive_seed, stream-identity invariants, extra-field rejection, and YAML/config error paths.

test(speech): cover codec round-trip, noise degradation, and mock fro…

e8f5a17

…ntend

test(perturb): cover SNR calibration, replay determinism, and Compose

86339cb

fix(gitignore): scope 'env/' to top-level only

ff6906e

The bare 'env/' rule was eating the parley.env package. Anchor it.

test(env,grounding): cover scripted-success, random, noise wrap, parser

820ea32

test(data): determinism, round-trip, codec-clean transcribability

8eab186

test(report,cli): aggregation, leaderboard ranking, schema check, CLI…

a7c6346

… smoke

Includes a synth -> run -> report --format json end-to-end round-trip that exercises the full toolkit through the CLI.

chore: apply ruff-format house style across the tree

02b5db3

Adds whitespace and line-break normalization. No semantic changes — all 130 tests stay green and mypy strict is still clean. Future commits can assume `ruff format` is the canonical layout, which is what CI enforces.

chore(github): add issue + PR templates, CODEOWNERS, dependabot config

ede55f8

Dependabot is grouped so dev-tooling bumps (ruff, mypy, pytest) arrive as one PR per week instead of a stream of singletons. CODEOWNERS keeps review routing trivial for now (single maintainer).

chore: add pre-commit config (ruff, ruff-format, mypy, hygiene hooks)

b15ed74

Mirrors what CI enforces so committers catch lint/type breakage before push instead of waiting on the runners.

test(runner): bump max_steps to 40 to accommodate the longer place/pu…

7443621

…sh sequence

Ka Yiu Cheung and others added 8 commits June 9, 2026 17:03

fix(perturb): _resample_linear no longer crashes on empty input

f234ed3

n_in=0 produced a negative linspace and out-of-bounds indexing in TimeStretch/PitchShift. Guard and return the empty array unchanged.

test(runner): regression tests for the four review fixes

23a55c3

- linguistic perturbation moves WER
- cache key param fingerprint + no cross-param collision
- empty-input resample
- threaded run matches serial (policy-state race guard)

Plus CHANGELOG entries under [Unreleased].

docs: point repo URLs at the publishing account

57f8fe6

dependabot Bot added the dependencies and github_actions labels Jun 9, 2026

psychopathdev closed this Jun 9, 2026

psychopathdev force-pushed the main branch from 57f8fe6 to 57b5b40 Compare June 9, 2026 10:58

dependabot Bot deleted the dependabot/github_actions/actions/checkout-6 branch June 9, 2026 10:58

dependabot[bot] wants to merge 38 commits into main from dependabot/github_actions/actions/checkout-6
