chore(deps): bump actions/upload-artifact from 4 to 7 by dependabot[bot] · Pull Request #4 · psychopathdev/parley

dependabot · 2026-06-09T10:00:54Z

Bumps actions/upload-artifact from 4 to 7.

Release notes

Sourced from actions/upload-artifact's releases.

v7.0.0

v7 What's new

Direct Uploads

Adds support for uploading single files directly (unzipped). Callers can set the new archive parameter to false to skip zipping the file during upload. Right now, we only support single files. The action will fail if the glob passed resolves to multiple files. The name parameter is also ignored with this setting. Instead, the name of the artifact will be the name of the uploaded file.

ESM

To support new versions of the @actions/* packages, we've upgraded the package to ESM.

What's Changed

Add proxy integration test by @Link- in actions/upload-artifact#754

Upgrade the module to ESM and bump dependencies by @danwkennedy in actions/upload-artifact#762

Support direct file uploads by @danwkennedy in actions/upload-artifact#764

New Contributors

@Link- made their first contribution in actions/upload-artifact#754

Full Changelog: actions/upload-artifact@v6...v7.0.0

v6.0.0

v6 - What's new

[!IMPORTANT] actions/upload-artifact@v6 now runs on Node.js 24 (runs.using: node24) and requires a minimum Actions Runner version of 2.327.1. If you are using self-hosted runners, ensure they are updated before upgrading.

Node.js 24

This release updates the runtime to Node.js 24. v5 had preliminary support for Node.js 24, however this action was by default still running on Node.js 20. Now this action by default will run on Node.js 24.

What's Changed

Upload Artifact Node 24 support by @salmanmkc in actions/upload-artifact#719

fix: update @actions/artifact for Node.js 24 punycode deprecation by @salmanmkc in actions/upload-artifact#744

prepare release v6.0.0 for Node.js 24 support by @salmanmkc in actions/upload-artifact#745

Full Changelog: actions/upload-artifact@v5.0.0...v6.0.0

v5.0.0

What's Changed

BREAKING CHANGE: this update supports Node v24.x. This is not a breaking change per-se but we're treating it as such.

Update README.md by @GhadimiR in actions/upload-artifact#681

Update README.md by @nebuk89 in actions/upload-artifact#712

Readme: spell out the first use of GHES by @danwkennedy in actions/upload-artifact#727

Update GHES guidance to include reference to Node 20 version by @patrikpolyak in actions/upload-artifact#725

Bump @actions/artifact to v4.0.0

Prepare v5.0.0 by @danwkennedy in actions/upload-artifact#734

... (truncated)

Commits

043fb46 Merge pull request #797 from actions/yacaovsnc/update-dependency
634250c Include changes in typespec/ts-http-runtime 0.3.5
e454baa Readme: bump all the example versions to v7 (#796)
74fad66 Update the readme with direct upload details (#795)
bbbca2d Support direct file uploads (#764)
589182c Upgrade the module to ESM and bump dependencies (#762)
47309c9 Merge pull request #754 from actions/Link-/add-proxy-integration-tests
02a8460 Add proxy integration test
b7c566a Merge pull request #745 from actions/upload-artifact-v6-release
e516bc8 docs: correct description of Node.js 24 support in README
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

MIT-licensed Python 3.10+ project laid out with hatchling, ruff, mypy, pytest+coverage, and a versioned single-source __version__ module. First README is a stub; it will be rewritten as the toolkit grows.

These five modules are the wire format between every later subsystem (speech, grounding, policy, env, metrics). They're deliberately small and dependency-free so importing parley.core stays cheap.

- types: Audio/Transcript/Grounding/Frame/Action/Trace dataclasses.
- errors: ParleyError tree (ConfigError / RegistryError / ValidationError).
- rng: RngManager with BLAKE2b-derived named sub-streams, stable across processes.
- registry: typed per-kind registries (speech / perturb / policy / metric / ...).
- config: pydantic v2 strict models + YAML loader with source-path errors.
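The named sub-stream idea from the rng bullet can be sketched as follows. This is a minimal illustration, not the RngManager implementation: the `derive_seed` signature, the `NamedStreams` class, and the use of the standard-library `random.Random` are all assumptions made for the example.

```python
import hashlib
import random


def derive_seed(root_seed: int, name: str) -> int:
    """Map (root_seed, stream name) to a 64-bit seed via BLAKE2b.

    Unlike the built-in hash() of a str, BLAKE2b output does not depend on
    PYTHONHASHSEED, so the same inputs give the same seed in every process.
    """
    payload = f"{root_seed}:{name}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=8).digest()
    return int.from_bytes(digest, "big")


class NamedStreams:
    """Hypothetical stand-in for RngManager: one generator per stream name."""

    def __init__(self, root_seed: int) -> None:
        self.root_seed = root_seed
        self._streams: dict[str, random.Random] = {}

    def stream(self, name: str) -> random.Random:
        # Create each sub-stream lazily and hand back the same object afterwards,
        # so draws on "noise" never advance the state of "env".
        if name not in self._streams:
            self._streams[name] = random.Random(derive_seed(self.root_seed, name))
        return self._streams[name]
```

Because every stream is seeded from its own name, adding a new consumer of randomness does not shift the values that existing streams produce.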

Including determinism checks for derive_seed, stream-identity invariants, extra-field rejection, and YAML/config error paths.

The runtime-checkable Protocol nails down the audio->Transcript contract. The mock frontend (perfect-ASR) lets users isolate downstream errors; the Whisper adapter is import-lazy and gated behind the 'whisper' extra so the default install stays light.
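A rough sketch of what a runtime-checkable audio->Transcript contract with a perfect-ASR mock could look like. The field names, the `reference_text` side channel, and the `load_whisper_frontend` helper are hypothetical; only the Protocol, mock, and lazy-import ideas come from the description above.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class Audio:
    samples: tuple[float, ...]
    sample_rate: int
    reference_text: str = ""  # hypothetical field that lets the mock "cheat"


@dataclass(frozen=True)
class Transcript:
    text: str


@runtime_checkable
class SpeechFrontend(Protocol):
    def transcribe(self, audio: Audio) -> Transcript: ...


class MockFrontend:
    """Perfect ASR: echoes the reference text carried alongside the audio."""

    def transcribe(self, audio: Audio) -> Transcript:
        return Transcript(text=audio.reference_text)


def load_whisper_frontend() -> SpeechFrontend:
    """Import-lazy factory: the heavy import only happens when this is called."""
    try:
        import whisper  # noqa: F401  (optional dependency, assumed module name)
    except ImportError as exc:
        raise ImportError("install the 'whisper' extra to use this frontend") from exc
    raise NotImplementedError("adapter body omitted in this sketch")
```

Note that `isinstance` against a runtime-checkable Protocol only verifies that a `transcribe` attribute exists; it does not check the signature, so type checking with mypy remains the stronger guarantee.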

Encodes text as a sequence of windowed sine tones on a log-spaced frequency grid; decodes by FFT peak-picking over vocab bins. Clean audio round-trips exactly, while noise / mu-law / clipping perturbations collapse bin SNR and yield realistic substitutions. This is the trick that lets Parley measure WER end-to-end in CI without shipping a real acoustic model — and it's deliberately separate from the frontend wrapper so other tools can call encode/decode directly.
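To make the tone-codec idea concrete, here is a small standard-library sketch. It is not the Parley codec: the sample rate, frame length, frequency range, and function names are assumptions, and it evaluates the DFT directly at each vocab frequency instead of running a full FFT and peak-picking over bins.

```python
import math

SAMPLE_RATE = 8000  # Hz (assumed)
FRAME = 400         # samples per token, i.e. 50 ms (assumed)


def freq_grid(n: int, lo: float = 400.0, hi: float = 2000.0) -> list[float]:
    """n log-spaced frequencies from lo to hi inclusive."""
    if n == 1:
        return [lo]
    return [lo * (hi / lo) ** (i / (n - 1)) for i in range(n)]


def hann(i: int) -> float:
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * i / (FRAME - 1))


def encode(text: str, vocab: list[str]) -> list[float]:
    """One Hann-windowed sine frame per word; raises ValueError on unknown words."""
    freqs = freq_grid(len(vocab))
    samples: list[float] = []
    for word in text.split():
        f = freqs[vocab.index(word)]
        samples.extend(
            hann(i) * math.sin(2.0 * math.pi * f * i / SAMPLE_RATE) for i in range(FRAME)
        )
    return samples


def bin_energy(frame: list[float], f: float) -> float:
    """Squared magnitude of the DFT of `frame` evaluated at frequency f."""
    re = sum(x * math.cos(2.0 * math.pi * f * i / SAMPLE_RATE) for i, x in enumerate(frame))
    im = sum(x * math.sin(2.0 * math.pi * f * i / SAMPLE_RATE) for i, x in enumerate(frame))
    return re * re + im * im


def decode(samples: list[float], vocab: list[str]) -> str:
    """Pick, for each frame, the vocab word whose frequency carries the most energy."""
    freqs = freq_grid(len(vocab))
    words = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[start : start + FRAME]
        energies = [bin_energy(frame, f) for f in freqs]
        words.append(vocab[energies.index(max(energies))])
    return " ".join(words)
```

With a six-word vocab the grid frequencies sit far enough apart that clean audio decodes back to the input text; added noise or clipping spreads energy across the grid, which is where substitutions would come from.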

Audio: Gain, Clip, AdditiveNoise (SNR-calibrated), MuLawCodec, Reverb, TimeStretch, PitchShift. All implemented in pure numpy so the toolkit keeps zero heavy deps; quality is benchmark-grade, not production audio.

Linguistic: Disfluency (word stutter), FillerInsertion ("uhm"/"uh"), AccentSubstitution (configurable lexical remap). Each respects the rate parameter and uses only the supplied numpy Generator, with no hidden state.

Compose threads a single RNG through a list and is identity on empty, so the "clean" baseline row is just Compose([]).
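The composition pattern can be illustrated with a dependency-free sketch. It swaps the numpy Generator for the standard-library `random.Random`, and the function names and signatures are assumptions for the example, not the toolkit's classes.

```python
import math
import random
from typing import Callable, Sequence

Signal = list[float]
Perturbation = Callable[[Signal, random.Random], Signal]


def gain(db: float) -> Perturbation:
    """Scale amplitude by a fixed number of decibels."""
    scale = 10.0 ** (db / 20.0)

    def apply(x: Signal, rng: random.Random) -> Signal:
        return [scale * v for v in x]

    return apply


def additive_noise(snr_db: float) -> Perturbation:
    """Add white Gaussian noise whose expected power matches the target SNR."""

    def apply(x: Signal, rng: random.Random) -> Signal:
        if not x:
            return []
        signal_power = sum(v * v for v in x) / len(x)
        sigma = math.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
        return [v + rng.gauss(0.0, sigma) for v in x]

    return apply


def compose(steps: Sequence[Perturbation]) -> Perturbation:
    """Chain perturbations, passing the same RNG to each step in order."""

    def apply(x: Signal, rng: random.Random) -> Signal:
        out = list(x)
        for step in steps:
            out = step(out, rng)
        return out

    return apply
```

Since each step draws only from the RNG it is handed, replaying a chain with the same seed reproduces the same output, and `compose([])` returns the input unchanged.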

Covers VERB <COLOR> <SHAPE> [to the DIRECTION] with verb canonicalization and a graceful <unknown> sentinel for failed parses. Six tests covering full / minimal / canonical / unknown / partial inputs. Acts as the reference baseline an LLM grounder can be benched against.
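A rule-based parser of this shape might look roughly like the sketch below. The synonym table, slot vocabularies, and the `ground` function name are illustrative assumptions; the real grounder's tables and partial-parse behavior may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional

UNKNOWN = "<unknown>"

# Hypothetical tables, chosen only for this example.
VERB_CANON = {
    "pick": "pick", "grab": "pick", "take": "pick",
    "place": "place", "put": "place",
    "push": "push", "shove": "push",
}
COLORS = {"red", "green", "blue", "yellow"}
SHAPES = {"cube", "ball", "cone"}
DIRECTIONS = {"left", "right", "top", "bottom"}

_PATTERN = re.compile(
    r"^(?P<verb>[a-z]+)\s+(?:the\s+)?(?P<color>[a-z]+)\s+(?P<shape>[a-z]+)"
    r"(?:\s+to\s+the\s+(?P<direction>[a-z]+))?$"
)


@dataclass(frozen=True)
class Grounding:
    verb: str
    color: str
    shape: str
    direction: Optional[str] = None


def _slot(value: str, allowed: set[str]) -> str:
    return value if value in allowed else UNKNOWN


def ground(text: str) -> Grounding:
    """Parse VERB [the] COLOR SHAPE [to the DIRECTION]; never raises."""
    m = _PATTERN.match(text.strip().lower())
    if m is None:
        return Grounding(UNKNOWN, UNKNOWN, UNKNOWN)
    direction = m["direction"]  # None when the optional clause is absent
    return Grounding(
        verb=VERB_CANON.get(m["verb"], UNKNOWN),
        color=_slot(m["color"], COLORS),
        shape=_slot(m["shape"], SHAPES),
        direction=None if direction is None else _slot(direction, DIRECTIONS),
    )
```

Returning a sentinel instead of raising keeps downstream stages running on garbled transcripts, so a failed parse shows up as a grounding error in the metrics.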

The bare 'env/' rule was eating the parley.env package. Anchor it.

A unit-square workspace populated with colored shape objects. Continuous xy_delta moves; explicit pick/place actions interact with whichever object the effector is on. Success predicates compare the final state against a GoalSpec (pick: holding the right object, place/push: target inside the named direction zone). This is *not* a robotics simulator — it's a controllable testbed that keeps the whole speech->ASR->grounding->policy->action->success chain runnable in milliseconds. Real sims (LIBERO/ManiSkill/RLBench) plug in via the same Environment Protocol.

Three reference policies implementing the VLAPolicy Protocol:

- ScriptedPolicy: small approach->pick->transit->place state machine parameterised by Grounding+Scene. Acts as the success-rate ceiling on synthetic data given a particular grounder.
- RandomPolicy: uniform xy + 5% pick/place; the floor baseline.
- NoisyPolicy: Gaussian-perturbs another policy's actions. Useful for probing metric sensitivity and as an 'imperfect VLA' surrogate.

Real models (OpenVLA/Octo/pi-0) plug in here via the same protocol.

…ormat

generate_dataset() emits Episodes whose audio is the codec-encoded form of their instruction, closing the loop with the codec ASR: clean audio decodes to the original text, so any non-zero WER is attributable purely to perturbations.

Templates: pick/place/push the <COLOR> <SHAPE> [to the <DIRECTION>]. The closed vocab returned by vocab_for() also covers filler/disfluency/accent tokens so perturbed text round-trips cleanly.

On-disk format: a jsonl index + a sibling .audio.npz blob. The split lets index-only operations (count, list, validate) skip the audio entirely and keeps random-access cheap.

…ness, bootstrap

Per stage:

- ASR: WER (with sub/ins/del breakdown), CER, KeywordRecall.
- Grounding: GroundingExactMatch and GroundingSlotF1 over verb/target/dest.
- Action: SuccessRate, ActionMSE/MAE (gated on a reference action sequence in trace.metadata), DTW with path-length normalization.
- Efficiency: LatencyPercentiles (p50/p95/p99 + total + RTF).
- Robustness: RobustnessDelta, i.e. clean-vs-perturbed deltas + mean/max degradation (post-hoc aggregator over per-perturbation means).
- Aggregate: summarize() returns mean/SEM/percentile-bootstrap CI; paired_bootstrap_pvalue() for marking significant pipeline pairs.

WER/CER use a textbook DP edit-distance with backpointer reconstruction so we can report sub/ins/del counts. Pure numpy; zero new deps. 27 tests covering each error type, partial-credit, and bootstrap edge cases.
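For readers unfamiliar with the edit-distance-with-backpointers approach, a plain-Python sketch follows. The `EditCounts` type, the `align` function, and the tie-breaking order are assumptions of this example; the commit's own implementation is numpy-based and may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditCounts:
    substitutions: int
    insertions: int
    deletions: int
    ref_len: int

    @property
    def wer(self) -> float:
        errors = self.substitutions + self.insertions + self.deletions
        if self.ref_len == 0:
            return 0.0 if errors == 0 else float("inf")
        return errors / self.ref_len


def align(ref: list[str], hyp: list[str]) -> EditCounts:
    n, m = len(ref), len(hyp)
    # cost[i][j] is the edit distance between ref[:i] and hyp[:j];
    # back[i][j] records which operation produced that optimum.
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], back[i][0] = i, "del"
    for j in range(1, m + 1):
        cost[0][j], back[0][j] = j, "ins"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = ref[i - 1] == hyp[j - 1]
            candidates = [
                (cost[i - 1][j - 1] + (0 if same else 1), "ok" if same else "sub"),
                (cost[i - 1][j] + 1, "del"),
                (cost[i][j - 1] + 1, "ins"),
            ]
            cost[i][j], back[i][j] = min(candidates, key=lambda c: c[0])
    # Walk the backpointers from the bottom-right corner to count each error type.
    subs = ins = dels = 0
    i, j = n, m
    while i > 0 or j > 0:
        op = back[i][j]
        if op in ("ok", "sub"):
            subs += 1 if op == "sub" else 0
            i, j = i - 1, j - 1
        elif op == "del":
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return EditCounts(subs, ins, dels, n)
```

The same routine gives CER when called on lists of characters instead of lists of words.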

The orchestrator that turns a config into per-episode results:

- Pipeline composes a SpeechFrontend + Grounder + VLAPolicy by name and records per-stage wall-clock timings on the Trace.
- run_episode walks an Episode through Perturbation -> Speech -> Grounding -> Env-rollout, populating Trace fields and timings_ms used by the latency metric. Per-stage RNG streams are derived from the global seed with distinct names so e.g. additive noise can't leak into env spawn.
- ContentCache: file-backed (atomic-write) JSON cache keyed by a (pipeline, perturbation, episode_id, seed) hash; cache_dir=None means disabled.
- expand_suite: cartesian product of pipelines x (clean + perturb groups) x episodes. Clean is always row 0 so robustness deltas have a baseline.
- BenchmarkEngine builds pipelines lazily (codec frontend gets the dataset vocab injected automatically), persists per-run trace JSON to output_dir/traces/, and supports an optional ThreadPoolExecutor for workers > 1.

Smoke run on synth(n=4) with 2 pipelines x 3 perturb groups = 24 results; scripted policy is the success-rate ceiling and random is the floor as expected. 13 tests cover cache, suite expansion, end-to-end runs, the threadpool path, and the unknown-pipeline error path.
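The atomic-write cache mentioned above can be approximated with the usual write-to-temp-then-rename pattern. `JsonFileCache` is a hypothetical name and the one-file-per-key layout is an assumption; the sketch takes an already-computed key string and says nothing about how ContentCache derives it.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Optional


class JsonFileCache:
    """Hypothetical file-backed cache; cache_dir=None disables it."""

    def __init__(self, cache_dir: Optional[str]) -> None:
        self.root = Path(cache_dir) if cache_dir is not None else None
        if self.root is not None:
            self.root.mkdir(parents=True, exist_ok=True)

    def get(self, key: str) -> Optional[Any]:
        if self.root is None:
            return None
        try:
            return json.loads((self.root / f"{key}.json").read_text(encoding="utf-8"))
        except FileNotFoundError:
            return None

    def put(self, key: str, value: Any) -> None:
        if self.root is None:
            return
        # Write to a temp file in the same directory, then rename it over the
        # target. On POSIX the rename is atomic, so a concurrent reader sees
        # either the old entry or the new one, never a half-written file.
        fd, tmp_path = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(value, fh)
            os.replace(tmp_path, self.root / f"{key}.json")
        except BaseException:
            with contextlib.suppress(FileNotFoundError):
                os.unlink(tmp_path)
            raise
```

Creating the temp file inside the cache directory matters: a rename across filesystems is not atomic, so a temp file under the system temp directory would defeat the purpose.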

…d JSON

- aggregate_results() groups EpisodeResults by (pipeline, perturbation) and runs each metric through summarize() for mean/SEM/bootstrap CI. Skipped metrics (e.g. action_mse without a reference) are tracked per episode so the n in each cell is honest.
- render_markdown() emits a GitHub-friendly table with mean [low, high] cells; render_csv() emits one row per cell with explicit ci_low/high columns for spreadsheet use.
- build_leaderboard() ranks pipelines by clean success-rate, ties broken by lower mean degradation across perturbation groups.
- dump_report/load_report use a versioned JSON schema (schema_version=1) carrying parley_version + suite_name + rows + leaderboard. Refusing unknown schema_versions on load means future format changes don't silently misread old reports.
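The versioned-schema guard is simple to sketch. The string-based signatures, the `ReportSchemaError` name, and the exact payload layout below are assumptions for illustration; only the field names and the refuse-unknown-version rule come from the description.

```python
import json
from typing import Any

SCHEMA_VERSION = 1


class ReportSchemaError(ValueError):
    """Raised when a report was written with a schema this code does not know."""


def dump_report(
    rows: list[dict[str, Any]],
    leaderboard: list[str],
    suite_name: str,
    parley_version: str,
) -> str:
    payload = {
        "schema_version": SCHEMA_VERSION,
        "parley_version": parley_version,
        "suite_name": suite_name,
        "rows": rows,
        "leaderboard": leaderboard,
    }
    return json.dumps(payload, indent=2, sort_keys=True)


def load_report(text: str) -> dict[str, Any]:
    data = json.loads(text)
    version = data.get("schema_version") if isinstance(data, dict) else None
    # Fail loudly rather than guessing at the layout of an unknown version.
    if version != SCHEMA_VERSION:
        raise ReportSchemaError(
            f"unsupported schema_version {version!r}; expected {SCHEMA_VERSION}"
        )
    return data
```

A missing `schema_version` is treated the same as an unknown one, which avoids accepting files that predate the versioning scheme.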

Thin glue over the library. Highlights:

- run: loads YAML config, optionally overrides dataset path / seed, runs the engine, dumps report.json + a config.resolved.yaml snapshot next to it, and prints a Rich-rendered table inline.
- report: re-renders previously-written report.json as markdown / csv / json. The CSV path reconstitutes Summary objects from the JSON dump so it's available without re-running the suite.
- list: enumerates registered plugins by kind for discoverability.
- validate: parses a YAML and prints a one-line summary.

`parley` is exposed as a console script via [project.scripts].

… smoke

Includes a synth -> run -> report --format json end-to-end round-trip that exercises the full toolkit through the CLI.

Adds whitespace and line-break normalization. No semantic changes — all 130 tests stay green and mypy strict is still clean. Future commits can assume `ruff format` is the canonical layout, which is what CI enforces.

Dependabot is grouped so dev-tooling bumps (ruff, mypy, pytest) arrive as one PR per week instead of a stream of singletons. CODEOWNERS keeps review routing trivial for now (single maintainer).

- ci.yml: ruff check, ruff format --check, mypy on a Python 3.12 lint job; pytest+coverage matrix across Python 3.10..3.13 on Ubuntu plus a macOS 3.12 sanity job; a separate smoke-cli job runs the full `parley synth + run + validate` round trip against the quickstart example so CI breaks loudly if any end-to-end glue regresses.
- codeql.yml: weekly security scan + on every push/PR.
- release.yml: tag-triggered build + PyPI publish via trusted publishing (id-token: write, no API tokens needed).

Mirrors what CI enforces so committers catch lint/type breakage before push instead of waiting on the runners.

…cimate) + sweep helpers

Three channel-flavored perturbations cribbed from the VoIP / telephony literature:

- PacketLoss: drop contiguous packets at a configurable rate. Zeros rather than concealment; we're not doing PLC.
- BandLimit: brick-wall FFT band-pass; default 300-3400 Hz matches the ITU-T G.712 narrowband telephony passband.
- SpectralDecimate: zero out the top fraction of FFT bins, a poor-man's perceptual-codec proxy that's deterministic and dep-free.

suites.py adds three programmatic sweep builders:

- snr_sweep over CHiME/MUSAN-style SNR ladders.
- codec_sweep: mu_law / telephone / spectral_decimate / packet_loss.
- linguistic_sweep: disfluency / filler / accent_subst.

These are for ad-hoc / notebook use; YAML configs can still list each PerturbationGroup explicitly when that's clearer. 9 new tests.
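As an illustration of the zero-fill PacketLoss behavior, the sketch below drops fixed-size packets independently with probability `rate`. The function name, the `packet_len` parameter, the independent-drop model, and the use of `random.Random` in place of a numpy Generator are all assumptions.

```python
import random


def packet_loss(
    samples: list[float], rate: float, packet_len: int, rng: random.Random
) -> list[float]:
    """Zero out whole packets of `packet_len` samples with probability `rate`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    if packet_len <= 0:
        raise ValueError("packet_len must be positive")
    out = list(samples)
    for start in range(0, len(out), packet_len):
        if rng.random() < rate:
            end = min(start + packet_len, len(out))
            # Zero fill: no attempt at packet-loss concealment.
            out[start:end] = [0.0] * (end - start)
    return out
```

Output length always equals input length, so downstream framing stays aligned with the clean signal.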

Two robustness-science staples cribbed from the speech/fairness eval literature (see docs/design-notes.md):

- sensitivity_index() computes ΔTask / ΔInput per (pipeline, perturbation) using each pipeline's own "clean" row as baseline. The slope is interpretable: a 1-point increase in the upstream metric (WER, ...) costs N points of downstream task success. A degenerate ΔInput=0 with non-zero ΔTask becomes math.inf, which surfaces 'perturbation didn't move the input metric but the task collapsed anyway' (a pipeline brittleness signal that survives WER staying flat).
- worst_group_report(): per pipeline, returns the minimum value of a target metric across grouped rows. Currently grouped by perturbation but the surface accommodates future axes (accent stratum, speaker id) without redesign.

7 tests cover fragility comparison, zero-delta handling, missing-baseline skip, and the secondary-metric path.
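A small numeric sketch of both ideas follows. The argument layout, the sign convention (ΔTask as the drop in task success, ΔInput as the rise in the input metric), the 0.0 result when both deltas are zero, and the `worst_group` helper are assumptions; only the ratio and the math.inf rule come from the description.

```python
import math


def sensitivity_index(
    clean_task: float, perturbed_task: float, clean_input: float, perturbed_input: float
) -> float:
    """Drop in task success per unit rise in the input metric (e.g. WER)."""
    delta_task = clean_task - perturbed_task
    delta_input = perturbed_input - clean_input
    if delta_input == 0:
        # Input metric did not move: flat if the task held, infinite if it collapsed.
        return 0.0 if delta_task == 0 else math.inf
    return delta_task / delta_input


def worst_group(rows: list[tuple[str, str, float]]) -> dict[str, tuple[str, float]]:
    """For each pipeline, return the (perturbation, value) pair with the lowest value."""
    worst: dict[str, tuple[str, float]] = {}
    for pipeline, perturbation, value in rows:
        if pipeline not in worst or value < worst[pipeline][1]:
            worst[pipeline] = (perturbation, value)
    return worst
```

For example, a pipeline whose success falls from 1.0 to 0.8 while WER rises from 0.0 to 0.1 has an index of about 2: each point of WER costs roughly two points of success.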

The synth generator was rolling a 10% chance of omitting the direction for place/push verbs, but the env's success predicate requires a destination zone for those verbs, making those episodes unsatisfiable and pushing the scripted policy's clean success-rate to ~80% instead of 100%. Found it via the programmatic example's worst-group report — a nice example of the toolkit catching its own bugs. Now: pick verbs have no destination; place/push always carry one. include_directions=False degrades place/push to picks rather than emitting an unsatisfiable goal.

- quickstart.yaml: smallest interesting run (1 pipeline, 16 episodes, one mild perturbation).
- robustness_panel.yaml: scripted vs random across 11 perturbations covering acoustic / channel / linguistic axes.
- snr_sweep.yaml: a five-rung SNR ladder, the canonical degradation curve.
- programmatic/custom_suite.py: builds the same kind of config in code using snr_sweep() / codec_sweep(), runs, prints headline table + sensitivity index + worst-group report.

All three configs pass `parley validate`. examples/README.md indexes them.

ASCII diagram of the call graph from CLI through runner into the four pluggable subsystems, with a wire-format table mapping each parley.core.types dataclass to its origin -> sink, and sections on determinism, caching, parallelism, and the eval-toolkit lineage the design borrows from (HELM, lm-eval, Inspect).

- usage.md: install, the five subcommands, what `parley run` writes, programmatic usage, plugging in a real frontend, reproducibility.
- metrics.md: every metric (what it measures, what scale, when to use), grouped by stage (ASR / grounding / action / efficiency / robustness / bookkeeping).
- api-reference.md: curated public surface across core / data / runner / report / metrics / perturb. Anything not listed there is internal.

Grounds Parley's design in current (2023-2026) systems with verifiable URLs: VLA policies (RT-1/2, OpenVLA, Octo, pi_0, GR00T N1, RDT-1B...), speech LLMs (Whisper, Qwen2-Audio, SALMONN, Moshi, SeamlessM4T...), robot benchmarks (LIBERO, CALVIN, RLBench, ManiSkill, SimplerEnv, VLABench...), and eval-harness architecture patterns (HELM, lm-evaluation-harness, MTEB, Inspect). Also documents what the design is deliberately *not* good at — synth env isn't a physics sim, the codec ASR is robust to surprisingly low SNRs, action chunking and real RIRs are out of scope for v0. Better to be honest than oversell.

- README: badges, why-this-toolkit-exists, 60-second tour (CLI + Python), feature summary, doc index.
- CHANGELOG: Keep-a-Changelog format with the 0.1.0 inventory under Added.
- CONTRIBUTING: the four CI gates, plugin-registration recipe, commit style guide.
- CODE_OF_CONDUCT: Contributor Covenant 2.1.
- SECURITY: GitHub private-vuln-reporting flow, plus an honest note on the actual attack surface (npz + YAML).

…key, per-thread pipelines

Three correctness issues found in review:

1. Linguistic perturbations were silent. They rewrite instruction.text, but run_episode kept feeding the *original* codec audio to the ASR, so disfluency/filler/accent_subst all measured WER=0. Now: when the frontend is the codec, re-encode the perturbed text before transcribe. (Real-audio adapters like Whisper are unaffected; they don't touch instruction.text in their perturbation path.)
2. The cache key was (pipeline, perturbation, episode, seed) by NAME only. Two suites reusing a group name 'noise' with different snr_db collided. cache_key now takes a config_fingerprint folding in the resolved pipeline + perturbation params + env + metrics + max_steps.
3. The engine shared one lazily-built Pipeline instance across the ThreadPoolExecutor, but policies hold per-episode state (reset/act). Under workers>1 we now build a private pipeline per run unit.
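The cache-key fix in item 2 can be illustrated with a canonical-JSON fingerprint. The function signatures, the BLAKE2b choice, and the digest size are assumptions of this sketch, not the actual cache_key implementation.

```python
import hashlib
import json
from typing import Any


def config_fingerprint(resolved: dict[str, Any]) -> str:
    """Hash the resolved config; key order must not affect the result."""
    canonical = json.dumps(resolved, sort_keys=True, separators=(",", ":"))
    return hashlib.blake2b(canonical.encode("utf-8"), digest_size=16).hexdigest()


def cache_key(
    pipeline: str, perturbation: str, episode_id: str, seed: int, fingerprint: str
) -> str:
    """Combine the names with the fingerprint so equal names with different params differ."""
    payload = json.dumps([pipeline, perturbation, episode_id, seed, fingerprint])
    return hashlib.blake2b(payload.encode("utf-8"), digest_size=16).hexdigest()
```

With this shape, two groups that are both named 'noise' but use different snr_db values produce different fingerprints and therefore different cache entries.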

n_in=0 produced a negative linspace and out-of-bounds indexing in TimeStretch/PitchShift. Guard and return the empty array unchanged.
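A pure-Python sketch of a linear resampler with the empty-input guard is shown below. The signature and the list-based implementation are assumptions; the real `_resample_linear` operates on numpy arrays.

```python
def resample_linear(x: list[float], n_out: int) -> list[float]:
    """Linearly interpolate x onto n_out evenly spaced positions."""
    n_in = len(x)
    # Guard: with n_in == 0 the position grid would run from 0 to -1 and every
    # index below would be out of bounds, so return the empty input as is.
    if n_in == 0 or n_out <= 0:
        return []
    if n_in == 1 or n_out == 1:
        return [x[0]] * n_out
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)  # fractional index into x
        lo = min(int(pos), n_in - 1)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(x[lo] * (1.0 - frac) + x[hi] * frac)
    return out
```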

- linguistic perturbation moves WER
- cache key param fingerprint + no cross-param collision
- empty-input resample
- threaded run matches serial (policy-state race guard)

Plus CHANGELOG entries under [Unreleased].

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v4...v7)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot · 2026-06-09T10:58:10Z

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version. You can also ignore all major, minor, or patch releases for a dependency by adding an ignore condition with the desired update_types to your config file.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.

Ka Yiu Cheung added 30 commits June 9, 2026 16:05

chore: bootstrap parley package skeleton

962a6ba

MIT-licensed Python 3.10+ project laid out with hatchling, ruff, mypy, pytest+coverage, and a versioned single-source __version__ module. First README is a stub; it will be rewritten as the toolkit grows.

test(core): cover types, RNG, registry, config (26 tests)

8d461a3

Including determinism checks for derive_seed, stream-identity invariants, extra-field rejection, and YAML/config error paths.

test(speech): cover codec round-trip, noise degradation, and mock fro…

e8f5a17

…ntend

test(perturb): cover SNR calibration, replay determinism, and Compose

86339cb

fix(gitignore): scope 'env/' to top-level only

ff6906e

The bare 'env/' rule was eating the parley.env package. Anchor it.

test(env,grounding): cover scripted-success, random, noise wrap, parser

820ea32

test(data): determinism, round-trip, codec-clean transcribability

8eab186

test(report,cli): aggregation, leaderboard ranking, schema check, CLI…

a7c6346

… smoke

Includes a synth -> run -> report --format json end-to-end round-trip that exercises the full toolkit through the CLI.

chore: apply ruff-format house style across the tree

02b5db3

Adds whitespace and line-break normalization. No semantic changes — all 130 tests stay green and mypy strict is still clean. Future commits can assume `ruff format` is the canonical layout, which is what CI enforces.

chore(github): add issue + PR templates, CODEOWNERS, dependabot config

ede55f8

Dependabot is grouped so dev-tooling bumps (ruff, mypy, pytest) arrive as one PR per week instead of a stream of singletons. CODEOWNERS keeps review routing trivial for now (single maintainer).

chore: add pre-commit config (ruff, ruff-format, mypy, hygiene hooks)

b15ed74

Mirrors what CI enforces so committers catch lint/type breakage before push instead of waiting on the runners.

test(runner): bump max_steps to 40 to accommodate the longer place/pu…

7443621

…sh sequence

Ka Yiu Cheung and others added 8 commits June 9, 2026 17:03

fix(perturb): _resample_linear no longer crashes on empty input

f234ed3

n_in=0 produced a negative linspace and out-of-bounds indexing in TimeStretch/PitchShift. Guard and return the empty array unchanged.

test(runner): regression tests for the four review fixes

23a55c3

- linguistic perturbation moves WER
- cache key param fingerprint + no cross-param collision
- empty-input resample
- threaded run matches serial (policy-state race guard)

Plus CHANGELOG entries under [Unreleased].

docs: point repo URLs at the publishing account

57f8fe6

dependabot Bot added the dependencies (Pull requests that update a dependency file) and github_actions (Pull requests that update GitHub Actions code) labels Jun 9, 2026

psychopathdev closed this Jun 9, 2026

psychopathdev force-pushed the main branch from 57f8fe6 to 57b5b40 Compare June 9, 2026 10:58

dependabot Bot deleted the dependabot/github_actions/actions/upload-artifact-7 branch June 9, 2026 10:58

dependabot[bot] wants to merge 38 commits into main from dependabot/github_actions/actions/upload-artifact-7
