Add time-series-forecasting task schema #2126

Open
pjhul wants to merge 13 commits into huggingface:main from pjhul:task/time-series-forecasting

Conversation

@pjhul pjhul commented Apr 25, 2026

Scaffolds the new time-series-forecasting task with input/output JSON schemas, generated TypeScript types, task metadata, and registry wiring. pnpm check, test, and lint:check on the tasks package all pass locally.

Design decisions worth flagging

  1. series is a list of structured series objects, not a string or binary payload. This diverges from existing HF tasks but mirrors how mature time-series libraries represent a batch: GluonTS datasets are iterables of {target, start, …} entries, Darts uses List[TimeSeries], and Chronos takes a list of series. Parallel-array encodings don't survive contact with ragged lengths or per-series optional fields (timestamps, covariates, metadata).

  2. target is always 2D [num_timesteps][num_channels]. Univariate series use a length-1 channel axis. This is a deliberate normalization, not a shape any single library uses — GluonTS/DeepAR are 1D for univariate, Darts is 3D (time, component, sample), and Chronos-1 is univariate-1D (Chronos-2 is the one that adds multivariate). Fixing target at 2D avoids oneOf 1D/2D ambiguity at length-1 inputs and keeps output indexing uniform (out[t][c]).

  3. Missing observations are null inline inside target, not a separate observed_mask. This matches the pandas, R, SQL, and Arrow conventions. HF's TimeSeriesTransformer uses past_observed_mask because PyTorch tensors can't hold null, but for API consumers a separate mask is much more cumbersome than inline nulls.

  4. Input uses start + parameters.frequency rather than an explicit timestamps array. Matches the Darts TimeSeries model.

  5. quantile_predictions is an array of {level, values} objects rather than a dict keyed by string-float. Matches HF's pattern for enumerated scored outputs (text-generation logprobs, classification scores, fill-mask tokens). AWS Chronos uses a dict-keyed-by-string; we diverge intentionally for HF-ecosystem consistency.

  6. Three uncertainty channels on output: mean (required), quantile_predictions, and samples. The latter two are optional, and all three may coexist. No parametric distribution parameters. samples is the universal fallback for any distributional output.
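
Decisions 2 and 3 together mean that a client holding a plain univariate array has to do a small amount of marshalling before sending it. A minimal sketch, assuming a hypothetical helper named toTarget2D (not part of this PR) and assuming that any non-finite value should be treated as a missing observation:

```typescript
// Hypothetical helper, not part of the PR: wraps a univariate 1D series into
// the always-2D [num_timesteps][num_channels] layout with a length-1 channel axis.
type Target2D = (number | null)[][];

function toTarget2D(values: ReadonlyArray<number | null>): Target2D {
  return values.map((v) => {
    // Assumption: NaN and +/-Infinity are treated as missing and become null,
    // because they cannot be written as JSON numbers.
    const cell = v === null || !Number.isFinite(v) ? null : v;
    return [cell];
  });
}

// The DeepAR-style example from the review thread, with NaN as the gap.
const demoTarget = toTarget2D([4.0, 10.0, NaN, 100.0, 113.0]);
const demoTargetJson = JSON.stringify(demoTarget);
```

The result serializes to a 5x1 array with null in the third row, so the same out[t][c] indexing applies to univariate and multivariate series alike.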

Validation

  • pnpm --filter tasks-gen inference-codegen regenerates clean inference.ts from the JSON schemas
  • pnpm --filter @huggingface/tasks check (tsc)
  • pnpm --filter @huggingface/tasks test
  • pnpm --filter @huggingface/tasks lint:check
  • Registered in packages/tasks/src/tasks/index.ts

Note

Low Risk
Additive task-package schemas and registry wiring only; no auth, runtime inference, or existing task behavior changes.

Overview
Introduces a full time-series-forecasting task definition in @huggingface/tasks: JSON Schema input/output specs, codegen’d TypeScript inference types, task metadata (data.ts), and docs (about.md).

Registry: time-series-forecasting moves from a placeholder (undefined) to a real task page via getData(...), with public exports for TimeSeriesForecastingInput / Output and related types.

API shape (new contract): Requests use a series array of structured objects (not a flat tensor). Each item requires target as 2D [timesteps][channels] (univariate = single channel per step), optional start, covariates, and echoed metadata. Parameters cover prediction_length, frequency, quantile_levels, num_samples, and seed. Responses are outputs aligned 1:1 with input order: required mean, optional quantile_predictions ({level, values}), samples, and timestamps when start + frequency are set.
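
To make the contract above concrete, here is a rough sketch of the request and response shapes. The interface names are hypothetical, the optionality of parameters and the frequency string format ("D") are assumptions, covariates and samples are left out, and the response values are hand-written placeholders rather than model output. The generated inference.ts in the PR remains the authoritative definition.

```typescript
// Illustrative types only; field names follow the PR overview.
interface SeriesSketch {
  target: (number | null)[][]; // [num_timesteps][num_channels], null = missing
  start?: string; // timestamp of the first observation, kept as a string
  metadata?: Record<string, unknown>; // echoed metadata
}

interface ParametersSketch {
  prediction_length?: number;
  frequency?: string; // format assumed for illustration
  quantile_levels?: number[];
  num_samples?: number;
  seed?: number;
}

interface RequestSketch {
  series: SeriesSketch[];
  parameters?: ParametersSketch;
}

interface OutputSketch {
  mean: number[][]; // indexed as mean[t][c]
  quantile_predictions?: { level: number; values: number[][] }[];
  timestamps?: string[];
}

interface ResponseSketch {
  outputs: OutputSketch[]; // aligned 1:1 with request.series
}

// Two univariate series of different (ragged) lengths in one batch.
const demoRequest: RequestSketch = {
  series: [
    { target: [[12.0], [13.5], [null], [15.1]], start: "2026-01-01T00:00:00Z" },
    { target: [[3.0], [2.5]], metadata: { id: "store-7" } },
  ],
  parameters: { prediction_length: 2, frequency: "D", quantile_levels: [0.1, 0.9] },
};

// Placeholder numbers, written by hand to show the shape only.
const demoResponse: ResponseSketch = {
  outputs: [
    {
      mean: [[15.0], [15.2]],
      quantile_predictions: [
        { level: 0.1, values: [[14.1], [14.0]] },
        { level: 0.9, values: [[16.0], [16.5]] },
      ],
      timestamps: ["2026-01-05T00:00:00Z", "2026-01-06T00:00:00Z"],
    },
    { mean: [[2.4], [2.3]] },
  ],
};
```

Only the first series sets start, so only the first output carries timestamps.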

Reviewed by Cursor Bugbot for commit 9317092. Bugbot is set up for automated code reviews on this repo.

@pjhul pjhul marked this pull request as ready for review April 25, 2026 19:58
Contributor

@hanouticelina hanouticelina left a comment

Hi @pjhul, did a first pass on the PR, thanks a lot for working on this!

could you share links to the schemas/docs you used as reference?
When I asked an AI agent to cross-check, some of the claims didn't quite hold against current docs:

  • Darts is 3D, not 2D. docs say "Return a 3-D array of dimension (time,
    component, sample)"
  • GluonTS is 1D for univariate and 2D (dim, num_time_steps) for multivariate
  • Chronos-1 is 1D-only, predict accepts "a 1D tensor, or a list of 1D
    tensors, or a 2D tensor whose first dimension is batch". Chronos-2 adds
    multivariate, so if you used Chronos-2 the prose just needs narrowing.
  • DeepAR is 1D with "NaN" strings: "target": [4.0, 10.0, "NaN", 100.0, 113.0]

want to make sure it's comparing against the same references you used.

Comment thread packages/tasks/src/tasks/time-series-forecasting/spec/input.json Outdated
Comment thread packages/tasks/src/tasks/time-series-forecasting/spec/output.json Outdated
Comment thread packages/tasks/src/tasks/time-series-forecasting/spec/input.json
Comment thread packages/tasks/src/tasks/time-series-forecasting/spec/output.json
Comment thread packages/tasks/src/tasks/time-series-forecasting/data.ts Outdated
Comment thread packages/tasks/src/tasks/time-series-forecasting/data.ts Outdated
Comment thread packages/tasks/src/tasks/time-series-forecasting/data.ts Outdated
Comment thread packages/tasks/src/tasks/time-series-forecasting/spec/input.json
Comment thread packages/tasks/src/tasks/time-series-forecasting/spec/input.json Outdated
Comment thread packages/tasks/src/tasks/time-series-forecasting/spec/input.json
Comment thread packages/tasks/src/tasks/time-series-forecasting/about.md Outdated
pjhul and others added 3 commits May 29, 2026 11:42
format: "date-time" makes quicktype generate `Date` types, but
@huggingface/inference parses responses with a bare JSON.parse and
never hydrates fields into Date objects. Drop the format so the
generated types (`start`, `timestamps`) match the raw string values
returned at runtime.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
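
The runtime behavior this commit relies on can be checked directly: JSON.parse returns timestamp fields as plain strings, and turning them into Date objects is an explicit step the caller has to take. A minimal sketch, using a made-up payload and hypothetical variable names:

```typescript
// A made-up payload with an ISO 8601 timestamp in `start`.
const rawPayload = '{"start":"2026-01-01T00:00:00Z"}';

// JSON.parse does no date hydration: `start` stays a string.
const parsedPayload: { start: string } = JSON.parse(rawPayload);
const startIsString = typeof parsedPayload.start === "string";

// Callers who want a Date have to construct one themselves.
const hydratedStart = new Date(parsedPayload.start);
```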
monash_tsf only resolves via a 307 redirect and Salesforce/gift-eval
returns 401 (not a public repo). Use the canonical ids
Monash-University/monash_tsf and Salesforce/GiftEval, both verified 200.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add minItems:1 at both the time and channel axes so the schemas reject
nonsensical empty inputs/outputs (target:[], target:[[]], mean:[]).
prediction_length already enforces minimum:1, so a non-empty mean is
always expected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
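
The effect of minItems: 1 on both axes can be mirrored in a small client-side check. This is a sketch only; isNonEmpty2D is a hypothetical helper, not something the PR adds, and it checks shape alone, not element types.

```typescript
// Hypothetical helper: true only when there is at least one timestep
// and every timestep has at least one channel.
function isNonEmpty2D(rows: unknown): boolean {
  return (
    Array.isArray(rows) &&
    rows.length >= 1 &&
    rows.every((row) => Array.isArray(row) && row.length >= 1)
  );
}

const rejectsEmptyTime = !isNonEmpty2D([]); // target: []
const rejectsEmptyChannel = !isNonEmpty2D([[]]); // target: [[]]
const acceptsSingleCell = isNonEmpty2D([[1.0]]);
```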
Comment thread packages/tasks/src/tasks/time-series-forecasting/spec/output.json
pjhul and others added 5 commits May 29, 2026 12:09
Align with Darts and GluonTS, where future covariates must cover the
target's historical window plus the forecast horizon (total length
num_timesteps + prediction_length), not just the horizon. Allow null in
the historical portion to match past_covariates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
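
The length rule described in this commit (num_timesteps + prediction_length values per future covariate) can be sketched as a split into a historical part and a horizon part. splitFutureCovariate is a hypothetical helper used for illustration only:

```typescript
// Hypothetical helper: validates the total length of one future covariate
// and splits it into the part aligned with `target` and the horizon part.
function splitFutureCovariate<T>(
  values: readonly T[],
  numTimesteps: number,
  predictionLength: number,
): { history: T[]; horizon: T[] } {
  const expected = numTimesteps + predictionLength;
  if (values.length !== expected) {
    throw new RangeError(`expected ${expected} values, got ${values.length}`);
  }
  return {
    history: values.slice(0, numTimesteps), // aligns 1:1 with target; may contain null
    horizon: values.slice(numTimesteps), // covers the forecast horizon
  };
}

// 3 observed timesteps plus a horizon of 2, with a gap in the history.
const covariateParts = splitFutureCovariate([1, null, 3, 4, 5], 3, 2);
```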
The demo input/output were flat 1D arrays, but the spec and about.md
example use the always-2D [time][channel] shape. Show the univariate
demo as a length-1 channel axis so the example is internally consistent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reject duplicate quantile levels via uniqueItems; duplicates would
produce redundant quantile_predictions entries. Keep the (0,1) range
(matches Darts) and document that the response sorts predictions
ascending by level, so callers needn't pre-sort.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
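
As a rough illustration of the constraints in this commit (levels strictly inside (0, 1), no duplicates, predictions sorted ascending by level), the following sketch uses two hypothetical helpers. It mirrors the rules in plain code and is not the schema validation itself.

```typescript
// Hypothetical helper: collects human-readable problems with the requested levels.
function quantileLevelErrors(levels: readonly number[]): string[] {
  const errors: string[] = [];
  const seen = new Set<number>();
  for (const level of levels) {
    // Written as a negated range check so that NaN is also rejected.
    if (!(level > 0 && level < 1)) {
      errors.push(`level ${level} is outside the open interval (0, 1)`);
    }
    if (seen.has(level)) {
      errors.push(`level ${level} is duplicated`);
    }
    seen.add(level);
  }
  return errors;
}

// Hypothetical helper: returns a copy sorted ascending by level.
function sortByLevel<T extends { level: number }>(predictions: readonly T[]): T[] {
  return [...predictions].sort((a, b) => a.level - b.level);
}

const sortedLevels = sortByLevel([{ level: 0.9 }, { level: 0.1 }, { level: 0.5 }])
  .map((p) => p.level)
  .join(",");
```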
Add a seed parameter so sample-based probabilistic forecasts can be
reproduced across retries. Unconstrained integer to match the repo
convention (text-generation, text-to-image). Note that support is
best-effort: not all probabilistic models/providers honor seeding.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add Hugging Face transformers (PatchTST, PatchTSMixer, Autoformer,
Informer, Time Series Transformer) and transformers.js (PatchTST/
PatchTSMixer via ONNX) to the Direct Inference section.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit f7fdc57.

Comment thread packages/tasks/src/tasks/time-series-forecasting/spec/output.json
pjhul and others added 2 commits May 29, 2026 13:01
Unlike other tasks where `inputs` is the data itself, here it is a batch
container whose elements are each one time series. Rename it to `series`
to make that explicit, matching the term of art in Darts (TimeSeries)
and GluonTS (each entry is one time series). Also rename the element
type TimeSeriesForecastingInputItem -> TimeSeriesForecastingSeries.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror the minItems:1 constraints from mean/series onto the parallel
output fields: quantile_predictions.values and samples (which are
documented as "same shape as mean"), and the outputs array itself
(1:1 with series). Closes the gaps flagged by Cursor Bugbot.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Author

pjhul commented Jun 2, 2026

Hey @hanouticelina!

Thank you for the detailed review 🙏 I think I addressed all the comments you left on the PR, but let me know if I missed anything.

Regarding your comment, happy to expand here!

> Darts is 3D, not 2D. docs say "Return a 3-D array of dimension (time,
> component, sample)"

Yup this is true - I should have been more specific here when I referenced Darts. The reason for the third axis is to support stochastic time series [ref]. While this makes sense for time series output, it doesn't really apply for input time series. Since we already have an output structure to wrap quantiles/samples there wasn't a need for this axis.

When calling .values() the return is a 2-D (time, component) array [ref].

> GluonTS is 1D for univariate and 2D (dim, num_time_steps) for multivariate

Great callout - this was actually something I went back and forth on: whether or not to use a 1D array for univariate. I eventually landed on 2D for everything to avoid the need for consumers to deal with 'polymorphic' return types. Basically to avoid calling something like Array.isArray(series[0]) to narrow types.

Honestly I'd love to get your take on how big of a deal something like this is in your experience. There's also the counterargument that needing to marshal 1D series into a 2D array is painful, so I could go either way here.
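
For reference, the narrowing cost being weighed here looks roughly like the sketch below. The type and function names are hypothetical, and both functions assume a non-empty series.

```typescript
// With a 1D-or-2D union, every consumer has to branch on the element type.
type PolymorphicTarget = number[] | number[][];

function channelsPolymorphic(t: PolymorphicTarget): number {
  const first = t[0];
  // Array.isArray is the runtime check that narrows number | number[].
  return Array.isArray(first) ? first.length : 1;
}

// With the always-2D layout there is a single code path.
function channelsUniform(t: number[][]): number {
  return t[0]?.length ?? 0;
}

const univariateChannels = channelsPolymorphic([1, 2, 3]);
const multivariateChannels = channelsPolymorphic([[1, 2], [3, 4]]);
const uniformChannels = channelsUniform([[1], [2], [3]]);
```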

> Chronos-1 is 1D-only, predict accepts "a 1D tensor, or a list of 1D
> tensors, or a 2D tensor whose first dimension is batch". Chronos-2 adds
> multivariate, so if you used Chronos-2 the prose just needs narrowing.

Great flag - I updated that reference.

> DeepAR is 1D with "NaN" strings: "target": [4.0, 10.0, "NaN", 100.0, 113.0]

Yup went back and forth a bit on this one as well. Unfortunately because NaN isn't a valid token in strict JSON that didn't seem like an option. I decided to go with null instead, once again to avoid the polymorphic array element type. Along with that, I couldn't think of a need to distinguish between NaN and null as they both typically represent empty/missing values.
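
The strict-JSON constraint is easy to confirm with the built-in JSON object: a bare NaN token is rejected on parse, and serializing a NaN value already yields null. A short sketch with hypothetical variable names:

```typescript
// Serializing NaN silently produces null, so the wire format cannot carry NaN.
const encodedTarget = JSON.stringify({ target: [4.0, 10.0, NaN, 100.0, 113.0] });

// Parsing a bare NaN token fails, because NaN is not part of the JSON grammar.
let nanTokenRejected = false;
try {
  JSON.parse('{"target": [4.0, NaN]}');
} catch (err) {
  nanTokenRejected = err instanceof SyntaxError;
}
```

Here encodedTarget is the string {"target":[4,10,null,100,113]}, which is the same encoding the schema uses for missing observations.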

Hopefully this helps out and please let me know if you have any follow up questions!

@pjhul pjhul requested review from hanouticelina and julien-c June 2, 2026 08:18
kashif commented Jun 2, 2026

yes all seem like sensible choices. one can think of univariate as multivariate with just 1 time series so that abstraction also works out I believe. I am checking in regards to the probabilistic forecasts and quantile based forecasts next and will report back

{
description:
"Continuous Ranked Probability Score, which evaluates probabilistic forecasts by measuring the difference between forecast and observation CDFs.",
id: "crps",

Minor: crps and wql end up being essentially the same number once you're scoring at fixed quantile levels. GIFT-Eval pairs MASE with MSIS as the interval metric — might be worth msis here instead so the three aren't redundant. No strong opinion though.
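
For context on why the two metrics track each other: when only a fixed set of quantile levels is available, both are commonly computed from the pinball (quantile) loss at those levels. A minimal sketch of that loss follows. The function names are hypothetical, and the normalization any particular benchmark applies on top is not shown.

```typescript
// Pinball loss for one observation at one quantile level.
function pinballLoss(level: number, actual: number, predicted: number): number {
  const diff = actual - predicted;
  // Under-prediction is weighted by `level`, over-prediction by `1 - level`.
  return Math.max(level * diff, (level - 1) * diff);
}

// Average pinball loss across levels for a single observation.
function meanPinballLoss(
  levels: readonly number[],
  actual: number,
  predictedByLevel: readonly number[],
): number {
  if (levels.length === 0 || levels.length !== predictedByLevel.length) {
    throw new RangeError("levels and predictions must be non-empty and the same length");
  }
  let total = 0;
  levels.forEach((level, i) => {
    total += pinballLoss(level, actual, predictedByLevel[i] as number);
  });
  return total / levels.length;
}

// Observation 10 with a 0.1-quantile forecast of 8 and a 0.9-quantile forecast of 12.
const demoLoss = meanPinballLoss([0.1, 0.9], 10, [8, 12]);
```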

"items": { "type": ["number", "null"] }
}
},
"static_covariates": {

Should this allow categorical statics too, or is numeric-only intentional for v1? transformers' own TimeSeriesTransformer splits these into static_categorical + static_real, and item/store-id type features are pretty common. Totally fine to punt — just flagging it has nowhere to go right now.

}
},
"future_covariates": {
"description": "Optional named covariates known over both the historical window and the forecast horizon. Each key maps to a 1D array spanning the full timeline: the first `num_timesteps` values align 1:1 with `target`, followed by `parameters.prediction_length` values over the horizon (total length `num_timesteps + prediction_length`). Missing values in the historical portion are encoded as `null`.",

Small ambiguity: if prediction_length is omitted (native horizon), the client can't tell how long the future portion here needs to be. Maybe require prediction_length when future_covariates is present?
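
One way to express the suggested rule outside the schema is a request-level guard. The sketch below is hypothetical: it assumes future_covariates sits directly on each series item and prediction_length under parameters, and it is not part of the PR.

```typescript
interface SeriesWithCovariates {
  future_covariates?: Record<string, (number | null)[]>;
}

// Returns an error message when the horizon length cannot be determined, else null.
function missingHorizonError(
  series: readonly SeriesWithCovariates[],
  predictionLength: number | undefined,
): string | null {
  if (predictionLength !== undefined) return null;
  const index = series.findIndex((s) => s.future_covariates !== undefined);
  return index === -1
    ? null
    : `series[${index}] sets future_covariates, so parameters.prediction_length is required`;
}

const guardResult = missingHorizonError(
  [{}, { future_covariates: { promo: [0, 1, 1, 0] } }],
  undefined,
);
```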

"type": "object",
"required": ["level", "values"],
"properties": {
"level": {

Nice — numeric level reads cleaner than the stringified-float keys gluonts/chronos use internally. Avoids the 0.1 → "0.1" round-trip.
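
A related practical benefit of numeric levels is that a lookup can use a tolerance instead of exact string equality. The sketch below is illustrative only: findQuantile and the other names are hypothetical, and the values are placeholders.

```typescript
// A level computed in floating point does not stringify to "0.3".
const computedLevel = 0.1 + 0.2;
const stringKey = String(computedLevel); // "0.30000000000000004"

// With string keys, an exact-match lookup misses.
const byStringKey: Record<string, number[][]> = { "0.3": [[1.5]] };
const stringLookupMissed = !(stringKey in byStringKey);

interface QuantilePrediction {
  level: number;
  values: number[][];
}

// With numeric levels, the caller can match within a tolerance.
function findQuantile(
  predictions: readonly QuantilePrediction[],
  level: number,
  tolerance = 1e-9,
): QuantilePrediction | undefined {
  return predictions.find((p) => Math.abs(p.level - level) <= tolerance);
}

const matchedQuantile = findQuantile([{ level: 0.3, values: [[1.5]] }], computedLevel);
```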

4 participants