chore(deps): bump actions/upload-artifact from 4 to 7 #4

Closed
dependabot[bot] wants to merge 38 commits into main from dependabot/github_actions/actions/upload-artifact-7

Conversation

dependabot[bot] commented on behalf of github, Jun 9, 2026

Bumps actions/upload-artifact from 4 to 7.

Release notes

Sourced from actions/upload-artifact's releases.

v7.0.0

v7 What's new

Direct Uploads

Adds support for uploading single files directly (unzipped). Callers can set the new archive parameter to false to skip zipping the file during upload. Right now, we only support single files. The action will fail if the glob passed resolves to multiple files. The name parameter is also ignored with this setting. Instead, the name of the artifact will be the name of the uploaded file.

ESM

To support new versions of the @actions/* packages, we've upgraded the package to ESM.

What's Changed

New Contributors

Full Changelog: actions/upload-artifact@v6...v7.0.0

v6.0.0

v6 - What's new

[!IMPORTANT] actions/upload-artifact@v6 now runs on Node.js 24 (runs.using: node24) and requires a minimum Actions Runner version of 2.327.1. If you are using self-hosted runners, ensure they are updated before upgrading.

Node.js 24

This release updates the runtime to Node.js 24. v5 had preliminary support for Node.js 24, however this action was by default still running on Node.js 20. Now this action by default will run on Node.js 24.

What's Changed

Full Changelog: actions/upload-artifact@v5.0.0...v6.0.0

v5.0.0

What's Changed

BREAKING CHANGE: this update supports Node v24.x. This is not a breaking change per-se but we're treating it as such.

... (truncated)

Commits
  • 043fb46 Merge pull request #797 from actions/yacaovsnc/update-dependency
  • 634250c Include changes in typespec/ts-http-runtime 0.3.5
  • e454baa Readme: bump all the example versions to v7 (#796)
  • 74fad66 Update the readme with direct upload details (#795)
  • bbbca2d Support direct file uploads (#764)
  • 589182c Upgrade the module to ESM and bump dependencies (#762)
  • 47309c9 Merge pull request #754 from actions/Link-/add-proxy-integration-tests
  • 02a8460 Add proxy integration test
  • b7c566a Merge pull request #745 from actions/upload-artifact-v6-release
  • e516bc8 docs: correct description of Node.js 24 support in README
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Ka Yiu Cheung added 30 commits June 9, 2026 16:05
MIT-licensed Python 3.10+ project laid out with hatchling, ruff, mypy,
pytest+coverage, and a versioned single-source __version__ module.
First README is a stub; it will be rewritten as the toolkit grows.
These five modules are the wire format between every later subsystem
(speech, grounding, policy, env, metrics). They're deliberately small
and dependency-free so importing parley.core stays cheap.

- types: Audio/Transcript/Grounding/Frame/Action/Trace dataclasses.
- errors: ParleyError tree (ConfigError / RegistryError / ValidationError).
- rng: RngManager with BLAKE2b-derived named sub-streams — process-stable.
- registry: typed per-kind registries (speech / perturb / policy / metric / ...).
- config: pydantic v2 strict models + YAML loader with source-path errors.
Including determinism checks for derive_seed, stream-identity invariants,
extra-field rejection, and YAML/config error paths.
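
The named sub-stream idea from the rng module can be sketched roughly as follows. This is a minimal illustration, not Parley's actual code: the `derive_seed` signature, the `stream` method, and the use of `random.Random` in place of a numpy Generator are assumptions made to keep the sketch stdlib-only.

```python
import hashlib
import random


def derive_seed(root_seed: int, name: str) -> int:
    """Map (root seed, stream name) to a 64-bit seed via BLAKE2b."""
    payload = f"{root_seed}:{name}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=8).digest()
    return int.from_bytes(digest, "big")


class RngManager:
    """Hands out named sub-streams derived from one root seed (hypothetical API)."""

    def __init__(self, root_seed: int) -> None:
        self.root_seed = root_seed

    def stream(self, name: str) -> random.Random:
        # A fresh generator each call, always seeded the same way for a given name.
        return random.Random(derive_seed(self.root_seed, name))


rng = RngManager(1234)
noise_a = rng.stream("perturb.noise").random()
noise_b = rng.stream("perturb.noise").random()
```

Because BLAKE2b is a fixed function of its input, the derived seed is the same in every process, unlike Python's built-in `hash()` on strings, which is salted per process by default.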
The runtime-checkable Protocol nails down the audio->Transcript contract.
The mock frontend (perfect-ASR) lets users isolate downstream errors;
the Whisper adapter is import-lazy and gated behind the 'whisper' extra
so the default install stays light.
Encodes text as a sequence of windowed sine tones on a log-spaced
frequency grid; decodes by FFT peak-picking over vocab bins. Clean audio
round-trips exactly, while noise / mu-law / clipping perturbations
collapse bin SNR and yield realistic substitutions. This is the trick
that lets Parley measure WER end-to-end in CI without shipping a real
acoustic model — and it's deliberately separate from the frontend wrapper
so other tools can call encode/decode directly.
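
A rough sketch of the tone-codec idea, under stated assumptions: the sample rate, window length, frequency range, and function names are all hypothetical, and the decoder probes the spectrum only at the vocabulary frequencies with a direct DFT sum instead of running a full FFT, so that the example needs nothing beyond the standard library.

```python
import math

SAMPLE_RATE = 8000  # Hz (assumed)
WINDOW = 256        # samples per token (assumed)


def freq_grid(n_tokens, lo=300.0, hi=3000.0):
    """One log-spaced frequency per vocabulary entry."""
    if n_tokens == 1:
        return [lo]
    ratio = (hi / lo) ** (1.0 / (n_tokens - 1))
    return [lo * ratio ** i for i in range(n_tokens)]


def encode(text, vocab):
    """Emit one Hann-windowed sine tone per word."""
    freqs = freq_grid(len(vocab))
    samples = []
    for word in text.split():
        f = freqs[vocab.index(word)]
        for i in range(WINDOW):
            hann = 0.5 - 0.5 * math.cos(2 * math.pi * i / (WINDOW - 1))
            samples.append(hann * math.sin(2 * math.pi * f * i / SAMPLE_RATE))
    return samples


def tone_power(frame, f):
    """Spectral power of the frame at frequency f (a single DFT probe)."""
    re = sum(x * math.cos(2 * math.pi * f * i / SAMPLE_RATE) for i, x in enumerate(frame))
    im = sum(x * math.sin(2 * math.pi * f * i / SAMPLE_RATE) for i, x in enumerate(frame))
    return re * re + im * im


def decode(samples, vocab):
    """Pick the strongest vocabulary frequency in each window."""
    freqs = freq_grid(len(vocab))
    words = []
    for start in range(0, len(samples) - WINDOW + 1, WINDOW):
        frame = samples[start:start + WINDOW]
        powers = [tone_power(frame, f) for f in freqs]
        words.append(vocab[powers.index(max(powers))])
    return " ".join(words)


vocab = ["pick", "place", "push", "the", "red", "cube"]
round_trip = decode(encode("pick the red cube", vocab), vocab)
```

With clean audio the strongest probe is the encoded frequency, so `round_trip` equals the input text. Adding noise or clipping to `samples` before decoding is what would produce substitutions.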
Audio: Gain, Clip, AdditiveNoise(SNR-calibrated), MuLawCodec, Reverb,
TimeStretch, PitchShift. All implemented in pure numpy so the toolkit
keeps zero heavy deps; quality is benchmark-grade, not production audio.

Linguistic: Disfluency (word stutter), FillerInsertion ("uhm"/"uh"),
AccentSubstitution (configurable lexical remap). Each respects the rate
parameter and uses the supplied numpy Generator only — no hidden state.

Compose threads a single RNG through a list and is identity on empty,
so the "clean" baseline row is just Compose([]).
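
The Compose contract described above (one RNG threaded through every step, identity on an empty list) might look like this in miniature. The function names and the text-only signature are assumptions for illustration, and `random.Random` stands in for the numpy Generator the commit mentions.

```python
import random
from typing import Callable, List, Sequence

# A text perturbation takes (text, rng) and returns new text.
TextPerturbation = Callable[[str, random.Random], str]


def filler_insertion(rate: float, fillers: Sequence[str] = ("uhm", "uh")) -> TextPerturbation:
    """Insert a filler before each word with probability `rate`."""
    def apply(text: str, rng: random.Random) -> str:
        out: List[str] = []
        for word in text.split():
            if rng.random() < rate:
                out.append(rng.choice(list(fillers)))
            out.append(word)
        return " ".join(out)
    return apply


def compose(steps: Sequence[TextPerturbation]) -> TextPerturbation:
    """Chain perturbations, passing the same RNG to each one in order."""
    def apply(text: str, rng: random.Random) -> str:
        for step in steps:
            text = step(text, rng)
        return text
    return apply


clean = compose([])("pick the red cube", random.Random(0))
noisy = compose([filler_insertion(1.0)])("pick the red cube", random.Random(0))
```

Since all randomness comes from the supplied generator, two runs with the same seed give the same perturbed text.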
Covers VERB <COLOR> <SHAPE> [to the DIRECTION] with verb canonicalization
and a graceful <unknown> sentinel for failed parses. Six tests covering
full / minimal / canonical / unknown / partial inputs. Acts as the
reference baseline an LLM grounder can be benched against.
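
For readers who want a feel for the grammar, here is a hedged sketch of a rule-based parser for it. The synonym table, the `Grounding` fields, and the choice to fall back to the sentinel on any partial input are assumptions; the source does not show how the real grounder handles those cases.

```python
import re
from dataclasses import dataclass
from typing import Optional

UNKNOWN = "<unknown>"

# Hypothetical synonym table; the real canonicalization list is not in the source.
VERB_CANON = {"pick": "pick", "grab": "pick", "place": "place", "put": "place", "push": "push"}

# VERB [the] COLOR SHAPE [to the DIRECTION]; the lookahead stops "the" being read as a color.
PATTERN = re.compile(
    r"^(?P<verb>\w+) (?:the )?(?P<color>(?!the\b)\w+) (?P<shape>\w+)"
    r"(?: to the (?P<direction>\w+))?$"
)


@dataclass(frozen=True)
class Grounding:
    verb: str
    color: str
    shape: str
    direction: Optional[str] = None


def parse(text: str) -> Grounding:
    """Parse an instruction, returning the sentinel when it does not fit the grammar."""
    match = PATTERN.match(" ".join(text.lower().split()))
    if match is None or match["verb"] not in VERB_CANON:
        return Grounding(UNKNOWN, UNKNOWN, UNKNOWN)
    return Grounding(VERB_CANON[match["verb"]], match["color"], match["shape"], match["direction"])
```

For example, `parse("Put the blue ball to the left")` canonicalizes the verb to `place`, while `parse("dance wildly")` yields the sentinel in every slot.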
The bare 'env/' rule was eating the parley.env package. Anchor it.
A unit-square workspace populated with colored shape objects. Continuous
xy_delta moves; explicit pick/place actions interact with whichever
object the effector is on. Success predicates compare the final state
against a GoalSpec (pick: holding the right object, place/push: target
inside the named direction zone).

This is *not* a robotics simulator — it's a controllable testbed that
keeps the whole speech->ASR->grounding->policy->action->success chain
runnable in milliseconds. Real sims (LIBERO/ManiSkill/RLBench) plug in
via the same Environment Protocol.
Three reference policies implementing the VLAPolicy Protocol:

- ScriptedPolicy: small approach->pick->transit->place state machine
  parameterised by Grounding+Scene. Acts as the success-rate ceiling
  on synthetic data given a particular grounder.
- RandomPolicy: uniform xy + 5% pick/place — the floor baseline.
- NoisyPolicy: Gaussian-perturbs another policy's actions. Useful for
  probing metric sensitivity and as an 'imperfect VLA' surrogate.

Real models (OpenVLA/Octo/pi-0) plug in here via the same protocol.
…ormat

generate_dataset() emits Episodes whose audio is the codec-encoded form
of their instruction, closing the loop with the codec ASR: clean audio
decodes to the original text, so any non-zero WER is attributable purely to
perturbations. Templates: pick/place/push the <COLOR> <SHAPE> [to the
<DIRECTION>]. The closed vocab returned by vocab_for() also covers
filler/disfluency/accent tokens so perturbed text round-trips cleanly.

On-disk format: a jsonl index + a sibling .audio.npz blob. The split
lets index-only operations (count, list, validate) skip the audio
entirely and keeps random-access cheap.
…ness, bootstrap

Per stage:
- ASR: WER (with sub/ins/del breakdown), CER, KeywordRecall.
- Grounding: GroundingExactMatch and GroundingSlotF1 over verb/target/dest.
- Action: SuccessRate, ActionMSE/MAE (gated on a reference action sequence
  in trace.metadata), DTW with path-length normalization.
- Efficiency: LatencyPercentiles (p50/p95/p99 + total + RTF).
- Robustness: RobustnessDelta — clean-vs-perturbed deltas + mean/max
  degradation (post-hoc aggregator over per-perturbation means).
- Aggregate: summarize() returns mean/SEM/percentile-bootstrap CI;
  paired_bootstrap_pvalue() for marking significant pipeline pairs.
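
As an illustration of the aggregate row, a percentile-bootstrap summary could be sketched like this. The return shape, the default resample count, and the seed handling are assumptions rather than the actual `summarize()` signature.

```python
import random
import statistics


def summarize(values, n_boot=1000, alpha=0.05, seed=0):
    """Mean, standard error, and a percentile-bootstrap confidence interval."""
    if not values:
        raise ValueError("summarize() needs at least one value")
    rng = random.Random(seed)
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / n ** 0.5 if n > 1 else 0.0
    # Resample with replacement and collect the mean of each resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    low = boot_means[int((alpha / 2) * (n_boot - 1))]
    high = boot_means[int((1 - alpha / 2) * (n_boot - 1))]
    return {"mean": mean, "sem": sem, "ci_low": low, "ci_high": high, "n": n}


successes = [0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]
summary = summarize(successes)
```

The interval endpoints are simply the `alpha / 2` and `1 - alpha / 2` quantiles of the resampled means, which is what makes this the percentile variant.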

WER/CER use a textbook DP edit-distance with backpointer reconstruction
so we can report sub/ins/del counts. Pure numpy; zero new deps. 27
tests covering each error type, partial-credit, and bootstrap edge cases.
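
The edit-distance breakdown can be sketched as below. This is an illustrative version, not the project's code: it recovers the alignment by walking the distance table backwards instead of storing explicit backpointers, and dividing by `max(len(ref), 1)` for an empty reference is an assumed convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WerResult:
    wer: float
    substitutions: int
    insertions: int
    deletions: int


def word_error_rate(reference: str, hypothesis: str) -> WerResult:
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # dist[i][j] is the edit distance between ref[:i] and hyp[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    # Walk back from the corner, counting each kind of edit on one optimal path.
    subs = ins = dels = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            subs += ref[i - 1] != hyp[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return WerResult((subs + ins + dels) / max(n, 1), subs, ins, dels)


result = word_error_rate("pick the red cube", "pick the uh red cup")
```

Here `result` reports one insertion ("uh") and one substitution ("cube" to "cup") against a four-word reference, so the WER is 0.5.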
The orchestrator that turns a config into per-episode results:

- Pipeline composes a SpeechFrontend + Grounder + VLAPolicy by name and
  records per-stage wall-clock timings on the Trace.
- run_episode walks an Episode through Perturbation -> Speech -> Grounding
  -> Env-rollout, populating Trace fields and timings_ms used by the
  latency metric. Per-stage RNG streams are derived from the global seed
  with distinct names so e.g. additive noise can't leak into env spawn.
- ContentCache: file-backed (atomic-write) JSON cache keyed by a
  (pipeline, perturbation, episode_id, seed) hash; cache_dir=None means
  disabled.
- expand_suite: cartesian product of pipelines x (clean + perturb groups)
  x episodes. Clean is always row 0 so robustness deltas have a baseline.
- BenchmarkEngine builds pipelines lazily (codec frontend gets the
  dataset vocab injected automatically), persists per-run trace JSON to
  output_dir/traces/, and supports an optional ThreadPoolExecutor for
  workers > 1.

Smoke run on synth(n=4) with 2 pipelines x 3 perturb groups = 24 results;
scripted policy is the success-rate ceiling and random is the floor as
expected. 13 tests cover cache, suite expansion, end-to-end runs, the
threadpool path, and the unknown-pipeline error path.
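
The suite expansion is essentially a cartesian product with the clean group pinned first. A minimal sketch, assuming a hypothetical `RunUnit` tuple and plain string names; the example reads the 24-result smoke run as three groups per pipeline with clean counted among them, which is an interpretation of the source's count rather than something it states.

```python
from itertools import product
from typing import List, NamedTuple, Sequence


class RunUnit(NamedTuple):
    pipeline: str
    perturbation: str
    episode_id: str


def expand_suite(
    pipelines: Sequence[str],
    perturb_groups: Sequence[str],
    episode_ids: Sequence[str],
) -> List[RunUnit]:
    """Pipelines x (clean + perturbation groups) x episodes, clean first."""
    groups = ["clean"] + [g for g in perturb_groups if g != "clean"]
    return [RunUnit(p, g, e) for p, g, e in product(pipelines, groups, episode_ids)]


units = expand_suite(
    ["scripted", "random"],
    ["noise_20db", "mu_law"],
    ["ep0", "ep1", "ep2", "ep3"],
)
```

Putting clean at the front of each pipeline's groups means a baseline row always exists before any robustness delta is computed.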
…d JSON

- aggregate_results() groups EpisodeResults by (pipeline, perturbation)
  and runs each metric through summarize() for mean/SEM/bootstrap CI.
  Skipped metrics (e.g. action_mse without a reference) are tracked per
  episode so the n in each cell is honest.
- render_markdown() emits a GitHub-friendly table with mean [low, high]
  cells; render_csv() emits one row per cell with explicit ci_low/high
  columns for spreadsheet use.
- build_leaderboard() ranks pipelines by clean success-rate, ties broken
  by lower mean degradation across perturbation groups.
- dump_report/load_report use a versioned JSON schema (schema_version=1)
  carrying parley_version + suite_name + rows + leaderboard. Refusing
  unknown schema_versions on load means future format changes don't
  silently misread old reports.
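
Two of the behaviours listed above, the leaderboard ordering and the schema-version check on load, are small enough to sketch. The row keys and the function signatures are assumptions; only the ranking rule and the `schema_version=1` value come from the commit message.

```python
import json

SCHEMA_VERSION = 1


def build_leaderboard(rows):
    """Rank by clean success-rate (high first), then mean degradation (low first)."""
    ordered = sorted(rows, key=lambda r: (-r["clean_success"], r["mean_degradation"]))
    return [{"rank": i + 1, **row} for i, row in enumerate(ordered)]


def load_report(text):
    """Parse a report and refuse any schema version this code does not know."""
    report = json.loads(text)
    version = report.get("schema_version")
    if version != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return report


board = build_leaderboard([
    {"pipeline": "random", "clean_success": 0.1, "mean_degradation": 0.0},
    {"pipeline": "a", "clean_success": 1.0, "mean_degradation": 0.3},
    {"pipeline": "b", "clean_success": 1.0, "mean_degradation": 0.1},
])
```

In this example `a` and `b` tie on clean success, so `b` ranks first because it degrades less under perturbation.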
Thin glue over the library. Highlights:
- run: loads YAML config, optionally overrides dataset path / seed,
  runs the engine, dumps report.json + a config.resolved.yaml snapshot
  next to it, and prints a Rich-rendered table inline.
- report: re-renders previously-written report.json as markdown / csv /
  json. The CSV path reconstitutes Summary objects from the JSON dump
  so it's available without re-running the suite.
- list: enumerates registered plugins by kind for discoverability.
- validate: parses a YAML and prints a one-line summary.

`parley` is exposed as a console script via [project.scripts].
… smoke

Includes a synth -> run -> report --format json end-to-end round-trip
that exercises the full toolkit through the CLI.
Adds whitespace and line-break normalization. No semantic changes — all
130 tests stay green and mypy strict is still clean. Future commits can
assume `ruff format` is the canonical layout, which is what CI
enforces.
Dependabot is grouped so dev-tooling bumps (ruff, mypy, pytest) arrive
as one PR per week instead of a stream of singletons. CODEOWNERS keeps
review routing trivial for now (single maintainer).
- ci.yml: ruff check, ruff format --check, mypy on a Python 3.12 lint
  job; pytest+coverage matrix across Python 3.10..3.13 on Ubuntu plus a
  macOS 3.12 sanity job; a separate smoke-cli job runs the full `parley
  synth + run + validate` round trip against the quickstart example so
  CI breaks loudly if any end-to-end glue regresses.
- codeql.yml: weekly security scan + on every push/PR.
- release.yml: tag-triggered build + PyPI publish via trusted publishing
  (id-token: write, no API tokens needed).
Mirrors what CI enforces so committers catch lint/type breakage before
push instead of waiting on the runners.
…cimate) + sweep helpers

Three channel-flavored perturbations cribbed from the VoIP / telephony
literature:

- PacketLoss: drop contiguous packets at a configurable rate. Zeros
  rather than concealment — we're not doing PLC.
- BandLimit: brick-wall FFT band-pass; default 300-3400 Hz matches the
  ITU-T G.712 narrowband telephony passband.
- SpectralDecimate: zero out the top fraction of FFT bins, a poor-man's
  perceptual-codec proxy that's deterministic and dep-free.

suites.py adds three programmatic sweep builders:
- snr_sweep over CHiME/MUSAN-style SNR ladders.
- codec_sweep: mu_law / telephone / spectral_decimate / packet_loss.
- linguistic_sweep: disfluency / filler / accent_subst.

These are for ad-hoc / notebook use; YAML configs can still list each
PerturbationGroup explicitly when that's clearer. 9 new tests.
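
To make the PacketLoss behaviour concrete, here is a minimal sketch that zeroes whole packets with no concealment. The parameter names are hypothetical, each packet is dropped independently (the source does not say whether losses are bursty), and plain lists with `random.Random` stand in for numpy arrays and a numpy Generator.

```python
import random
from typing import List, Sequence


def packet_loss(
    samples: Sequence[float],
    rate: float,
    packet_len: int,
    rng: random.Random,
) -> List[float]:
    """Zero out each fixed-length packet with probability `rate`."""
    out = list(samples)
    for start in range(0, len(out), packet_len):
        if rng.random() < rate:
            end = min(start + packet_len, len(out))
            out[start:end] = [0.0] * (end - start)
    return out


audio = [1.0] * 1000
lossy = packet_loss(audio, rate=0.3, packet_len=160, rng=random.Random(0))
```

The output always has the same length as the input, so downstream windowed decoding stays aligned with the original signal.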
Two robustness-science staples cribbed from the speech/fairness eval
literature (see docs/design-notes.md):

- sensitivity_index() computes ΔTask / ΔInput per (pipeline, perturbation)
  using each pipeline's own "clean" row as baseline. The slope is
  interpretable: a 1-point increase in the upstream metric (WER, ...)
  costs N points of downstream task success. A degenerate ΔInput=0 with
  non-zero ΔTask becomes math.inf — surfaces 'perturbation didn't move
  the input metric but the task collapsed anyway' (a pipeline brittleness
  signal that survives WER staying flat).
- worst_group_report() — per pipeline, returns the minimum value of a
  target metric across grouped rows. Currently grouped by perturbation
  but the surface accommodates future axes (accent stratum, speaker id)
  without redesign.

7 tests cover fragility comparison, zero-delta handling, missing-baseline
skip, and the secondary-metric path.
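
The two reports can be illustrated with a short sketch. The argument names, the sign convention (task delta measured as a drop in success), and returning 0.0 when neither metric moved are assumptions; the `math.inf` case follows the commit message.

```python
import math


def sensitivity_index(clean_input, pert_input, clean_task, pert_task):
    """Task points lost per point of upstream metric increase."""
    d_input = pert_input - clean_input   # e.g. rise in WER
    d_task = clean_task - pert_task      # e.g. drop in success rate
    if d_input == 0:
        # Input metric flat: infinite if the task still degraded, else no effect.
        return math.inf if d_task != 0 else 0.0
    return d_task / d_input


def worst_group_report(rows, metric):
    """Per pipeline, the row with the minimum value of `metric`."""
    worst = {}
    for row in rows:
        name = row["pipeline"]
        if name not in worst or row[metric] < worst[name][metric]:
            worst[name] = row
    return worst


rows = [
    {"pipeline": "scripted", "perturbation": "clean", "success_rate": 1.0},
    {"pipeline": "scripted", "perturbation": "noise", "success_rate": 0.5},
    {"pipeline": "random", "perturbation": "clean", "success_rate": 0.1},
]
slope = sensitivity_index(clean_input=0.0, pert_input=0.2, clean_task=1.0, pert_task=0.5)
worst = worst_group_report(rows, "success_rate")
```

In this example WER rises by 0.2 while success drops by 0.5, giving a slope of about 2.5 task points per WER point.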
The synth generator was rolling a 10% chance of omitting the direction
for place/push verbs, but the env's success predicate requires a
destination zone for those verbs, making those episodes unsatisfiable
and pushing the scripted policy's clean success-rate to ~80% instead of
100%. Found it via the programmatic example's worst-group report — a
nice example of the toolkit catching its own bugs.

Now: pick verbs have no destination; place/push always carry one.
include_directions=False degrades place/push to picks rather than
emitting an unsatisfiable goal.
- quickstart.yaml: smallest interesting run (1 pipeline, 16 episodes,
  one mild perturbation).
- robustness_panel.yaml: scripted vs random across 11 perturbations
  covering acoustic / channel / linguistic axes.
- snr_sweep.yaml: a five-rung SNR ladder — the canonical degradation
  curve.
- programmatic/custom_suite.py: builds the same kind of config in code
  using snr_sweep() / codec_sweep(), runs, prints headline table +
  sensitivity index + worst-group report.

All three configs pass `parley validate`. examples/README.md indexes them.
ASCII diagram of the call graph from CLI through runner into the four
pluggable subsystems, with a wire-format table mapping each
parley.core.types dataclass to its origin -> sink, and sections on
determinism, caching, parallelism, and the eval-toolkit lineage the
design borrows from (HELM, lm-eval, Inspect).
Ka Yiu Cheung and others added 8 commits June 9, 2026 17:03
- usage.md: install, the five subcommands, what `parley run` writes,
  programmatic usage, plugging in a real frontend, reproducibility.
- metrics.md: every metric — what it measures, what scale, when to use,
  grouped by stage (ASR / grounding / action / efficiency / robustness /
  bookkeeping).
- api-reference.md: curated public surface across core / data / runner /
  report / metrics / perturb. Anything not listed there is internal.
Grounds Parley's design in current (2023-2026) systems with verifiable
URLs: VLA policies (RT-1/2, OpenVLA, Octo, pi_0, GR00T N1, RDT-1B...),
speech LLMs (Whisper, Qwen2-Audio, SALMONN, Moshi, SeamlessM4T...),
robot benchmarks (LIBERO, CALVIN, RLBench, ManiSkill, SimplerEnv,
VLABench...), and eval-harness architecture patterns (HELM,
lm-evaluation-harness, MTEB, Inspect).

Also documents what the design is deliberately *not* good at — synth env
isn't a physics sim, the codec ASR is robust to surprisingly low SNRs,
action chunking and real RIRs are out of scope for v0. Better to be
honest than oversell.
- README: badges, why-this-toolkit-exists, 60-second tour (CLI + Python),
  feature summary, doc index.
- CHANGELOG: Keep-a-Changelog format with the 0.1.0 inventory under
  Added.
- CONTRIBUTING: the four CI gates, plugin-registration recipe, commit
  style guide.
- CODE_OF_CONDUCT: Contributor Covenant 2.1.
- SECURITY: GitHub private-vuln-reporting flow, plus an honest note on
  the actual attack surface (npz + YAML).
…key, per-thread pipelines

Three correctness issues found in review:

1. Linguistic perturbations were silent. They rewrite instruction.text,
   but run_episode kept feeding the *original* codec audio to the ASR,
   so disfluency/filler/accent_subst all measured WER=0. Now: when the
   frontend is the codec, re-encode the perturbed text before transcribe.
   (Real-audio adapters like Whisper are unaffected — they don't touch
   instruction.text in their perturbation path.)

2. The cache key was (pipeline, perturbation, episode, seed) by NAME
   only. Two suites reusing a group name 'noise' with different snr_db
   collided. cache_key now takes a config_fingerprint folding in the
   resolved pipeline + perturbation params + env + metrics + max_steps.

3. The engine shared one lazily-built Pipeline instance across the
   ThreadPoolExecutor, but policies hold per-episode state (reset/act).
   Under workers>1 we now build a private pipeline per run unit.
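
The cache-key fix in item 2 can be sketched as follows. The function signatures, the JSON canonicalization, and the choice of BLAKE2b as the hash are assumptions for illustration; the source only states that a config fingerprint is folded into the key.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Hash the resolved config in a key-order-independent way."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.blake2b(canonical.encode("utf-8"), digest_size=16).hexdigest()


def cache_key(pipeline: str, perturbation: str, episode_id: str, seed: int, fingerprint: str) -> str:
    """Combine the run identifiers with the parameter fingerprint."""
    raw = "|".join([pipeline, perturbation, episode_id, str(seed), fingerprint])
    return hashlib.blake2b(raw.encode("utf-8"), digest_size=16).hexdigest()


# Two suites that both call their group "noise" but use different SNRs.
mild = config_fingerprint({"perturbation": {"kind": "additive_noise", "snr_db": 20}})
harsh = config_fingerprint({"perturbation": {"kind": "additive_noise", "snr_db": 5}})
key_mild = cache_key("scripted", "noise", "ep0", 0, mild)
key_harsh = cache_key("scripted", "noise", "ep0", 0, harsh)
```

With the fingerprint included, `key_mild` and `key_harsh` differ even though the pipeline, group name, episode, and seed are identical.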
n_in=0 produced a negative linspace and out-of-bounds indexing in
TimeStretch/PitchShift. Guard and return the empty array unchanged.
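
A hedged sketch of the empty-input guard, using a plain-Python linear resampler as a stand-in for whatever numpy-based resampling TimeStretch and PitchShift actually share. The function name and signature are hypothetical.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float], n_out: int) -> List[float]:
    """Linearly interpolate `samples` onto `n_out` evenly spaced points."""
    n_in = len(samples)
    if n_in == 0:
        # Guard: with no input, the index range 0..n_in-1 would run to -1.
        return list(samples)
    if n_out <= 0:
        return []
    if n_in == 1:
        return [samples[0]] * n_out
    out = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        i = min(int(pos), n_in - 2)  # keep i + 1 in bounds
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out
```

For example, `resample_linear([0.0, 1.0], 3)` gives `[0.0, 0.5, 1.0]`, and an empty input comes back as an empty list without touching any index.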
- linguistic perturbation moves WER
- cache key param fingerprint + no cross-param collision
- empty-input resample
- threaded run matches serial (policy-state race guard)

Plus CHANGELOG entries under [Unreleased].
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v4...v7)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies (Pull requests that update a dependency file) and github_actions (Pull requests that update GitHub Actions code) labels, Jun 9, 2026

dependabot Bot commented on behalf of github Jun 9, 2026

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version. You can also ignore all major, minor, or patch releases for a dependency by adding an ignore condition with the desired update_types to your config file.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.

dependabot[bot] deleted the dependabot/github_actions/actions/upload-artifact-7 branch, June 9, 2026 10:58