
SEG-52: add v2 async helpers (submit_async, run_async, AsyncJob) #1

Merged: shrey-rajvanshi merged 4 commits into `main` from `shrey/seg-52-v2-async` on Jun 14, 2026


@shrey-rajvanshi (Contributor)

Summary

Adds v2 async support to the Python SDK — submit-and-poll for any heimdall `/v2/{slug}` model — so callers can stop wrapping their own request/poll loops around `requests.post`.

```python
import segmind

# One-shot: submit + poll until COMPLETED
result = segmind.run_async("seedance-1-pro", prompt="A sunset", timeout=300)

# Or split for finer control (parallelism, request_id tracking)
job = segmind.submit_async("seedance-1-pro", prompt="A sunset")
print(job.request_id)
result = job.wait(timeout=300)
```

Defaults: `interval=1.0s`, `timeout=600s`. Override per call for very slow models (Veo, Seedance video) or use webhooks (SEG-93) instead.

What's in

  • New module `segmind/v2.py` — `AsyncJob`, `InferenceFailed`, `InferenceTimeout`, internal `submit()` / `run()` / `_v2_base()`.
  • `SegmindClient.submit_async()` and `.run_async()` methods on the existing client.
  • Module-level `segmind.submit_async()` / `segmind.run_async()` resolving through the lazy default client.
  • 11 respx-mocked unit tests covering submit, wait, polling progression, FAILED, TIMEOUT, one-shot, staging URL derivation, and module-level exports.

What's deliberately NOT in

  • Polling-hints consumption — SEG-243 was cancelled; this is a plain client. Caller tunes `timeout`/`interval` per call.
  • Async/await variant (`httpx.AsyncClient`) — separate ticket if/when wanted; the SDK is otherwise sync.
  • Webhook helpers — SEG-93 is the heimdall side; SDK webhook surface is a different concern.

Design notes

  • Exception names skip the `Error` suffix (`InferenceFailed` / `InferenceTimeout`) so they read naturally in caller code; the resulting N818 lint is suppressed per file with `# ruff: noqa: N818`.
  • `_v2_base()` derives the v2 prefix from `client.base_url` — callers using `api-latest.segmind.com/v1` for staging automatically get the matching `/v2` host with no extra config.
  • Fail loudly when a 2xx submit response is missing `request_id`, `status_url`, or `response_url`, instead of polling forever on a missing URL. The server contract is to always return all three.
  • The `FAILED` path fetches the full result body (not just the status body), so the exception carries metrics and `request_id` alongside the error string.
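One way the base-URL derivation could work, as a hedged sketch: `v2_base` is a hypothetical name, and it assumes the configured `base_url` ends in `/v1` (or carries no version segment at all). Scheme and host are preserved, which is what keeps a staging override on staging.

```python
from urllib.parse import urlsplit, urlunsplit


def v2_base(base_url: str) -> str:
    """Hypothetical: map a v1 base URL to its v2 counterpart on the same host."""
    parts = urlsplit(base_url.rstrip("/"))
    path = parts.path
    # Drop a trailing /v1 segment, keep any path prefix in front of it.
    if path.endswith("/v1"):
        path = path[: -len("/v1")]
    return urlunsplit((parts.scheme, parts.netloc, path + "/v2", "", ""))
```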

Smoke

  • 11 / 11 unit tests pass via `pytest tests/test_v2.py`.
  • `ruff check` clean.
  • `black --check` clean.
  • Live one-shot vs `api-latest.segmind.com/v2/mock-inference` (sleep=1, credits=1e-6) returned `COMPLETED` with `inference_time=1.013s`.
  • Live timeout path (`sleep=5, timeout=1`) raised `InferenceTimeout` cleanly with the expected request_id in the message.

Tickets

Parent: SEG-52 "SDK async" (Phase 1 — Async core).
Cancelled sibling: SEG-243 (server-side polling hints — decided we don't need them; client picks defaults).
Related: SEG-93 (webhooks, for slow models).

Adds support for the v2 submit-then-poll inference path:

  client.submit_async(slug, **params) -> AsyncJob
  client.run_async(slug, **params, timeout=600, interval=1.0) -> dict
  AsyncJob.wait(timeout, interval) -> dict
  AsyncJob.status() / AsyncJob.result()

Plus module-level shortcuts segmind.submit_async / segmind.run_async
that resolve through the lazy default SegmindClient.

Design choices:

  * 1.0s poll interval, 600s timeout defaults. No consumption of
    server-side polling hints (SEG-243 was cancelled in favour of a
    plain client). Callers tune timeout/interval per call for slow
    models, or use webhooks (SEG-93).
  * Two new exceptions, InferenceFailed + InferenceTimeout, both
    subclassing the existing SegmindError so callers can broad-catch.
    Names deliberately omit the 'Error' suffix for natural reading
    (per-file ruff noqa).
  * _v2_base() derives the v2 prefix from client.base_url so callers
    who override for staging (api-latest.segmind.com/v1) keep working
    without a separate v2 base_url.
  * If a 2xx submit response lacks request_id/status_url/response_url
    we raise SegmindError immediately rather than polling forever on a
    missing URL.
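A sketch of the loud-fail check, assuming a stand-in `SegmindError` and a hypothetical `Job` record and `job_from_submit()` function in place of the real `AsyncJob` construction. An empty string counts as missing here, which is a choice of this sketch:

```python
from dataclasses import dataclass, field
from typing import Any, Dict


class SegmindError(Exception):
    """Stand-in for the SDK's base exception (not imported from segmind)."""


REQUIRED_SUBMIT_KEYS = ("request_id", "status_url", "response_url")


@dataclass
class Job:
    # Hypothetical minimal AsyncJob-like record built from a submit response.
    request_id: str
    status_url: str
    response_url: str
    submit_response: Dict[str, Any] = field(default_factory=dict)


def job_from_submit(body: Dict[str, Any]) -> Job:
    # Missing or empty values both count as absent.
    missing = [k for k in REQUIRED_SUBMIT_KEYS if not body.get(k)]
    if missing:
        # Fail at submit time instead of polling a URL that does not exist.
        raise SegmindError(f"v2 submit response missing {', '.join(missing)}: {body!r}")
    return Job(
        request_id=body["request_id"],
        status_url=body["status_url"],
        response_url=body["response_url"],
        submit_response=body,
    )
```

Keeping the raw body on the record mirrors the `submit_response` field discussed later in the thread, so unknown server keys are not lost.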

Tests (11/11 pass, respx-mocked, no network):
  * submit returns AsyncJob with the right URLs
  * submit propagates 4xx via raise_for_status
  * submit raises on missing request_id in a 2xx body
  * wait returns result on COMPLETED
  * wait polls through QUEUED -> PROCESSING -> COMPLETED
  * wait raises InferenceFailed on FAILED with server error string
  * wait raises InferenceTimeout when deadline elapses
  * run_async one-shot end-to-end
  * v2 URL derives correctly from a staging base_url override
  * module-level run_async uses the lazy default client

Live smoke against api-latest.segmind.com /v2/mock-inference:
  * run_async(sleep=1) -> status=COMPLETED, inference_time=1.013s
  * run_async(sleep=5, timeout=1) -> InferenceTimeout raised cleanly

Linear: SEG-52 (parent, Phase 1 - Async core).

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces v2 async inference capabilities to the Segmind Python SDK, adding submit_async and run_async methods alongside an AsyncJob class for polling and retrieving results. The feedback focuses on optimizing the polling loop in AsyncJob.wait to avoid sleeping past the timeout deadline, removing the unused _TERMINAL_STATES constant, and exposing SegmindError at the package level to simplify exception handling and unit tests.


Comment thread `segmind/v2.py` (outdated), on lines +123 to +144:

```python
deadline = time.monotonic() + timeout
while True:
    status_body = self.status()
    state = status_body.get("status")

    if state == "COMPLETED":
        return self.result()

    if state == "FAILED":
        # /status carries the error for FAILED; pull the full body
        # so the exception caller has metrics + request_id alongside.
        final = self.result()
        err = final.get("error") or status_body.get("error")
        raise InferenceFailed(detail=err, response_body=final)

    if time.monotonic() >= deadline:
        raise InferenceTimeout(
            request_id=self.request_id,
            elapsed_s=timeout,
        )

    time.sleep(interval)
```


medium

The current polling loop can sleep past the deadline and make an unnecessary HTTP request after the timeout has already expired. Additionally, if the timeout is reached during the sleep, the loop will still perform another status check before raising InferenceTimeout. We can optimize this by checking the deadline at the start of the loop, checking if the remaining time is exceeded before sleeping, and capping the sleep interval to the remaining time.

```python
deadline = time.monotonic() + timeout
while True:
    if time.monotonic() >= deadline:
        raise InferenceTimeout(
            request_id=self.request_id,
            elapsed_s=timeout,
        )

    status_body = self.status()
    state = status_body.get("status")

    if state == "COMPLETED":
        return self.result()

    if state == "FAILED":
        # /status carries the error for FAILED; pull the full body
        # so the exception caller has metrics + request_id alongside.
        final = self.result()
        err = final.get("error") or status_body.get("error")
        raise InferenceFailed(detail=err, response_body=final)

    remaining = deadline - time.monotonic()
    if remaining <= 0:
        raise InferenceTimeout(
            request_id=self.request_id,
            elapsed_s=timeout,
        )
    time.sleep(min(interval, remaining))
```
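To check the timing behaviour without a network, here is a self-contained sketch of a deadline-capped loop with the clock and sleep injected. `poll_until_done`, its parameters, and the stand-in `InferenceTimeout` are hypothetical, not the SDK's code. It returns the terminal status body instead of raising on `FAILED`, reports real elapsed time rather than the configured timeout, and, unlike the suggestion above, makes one last status check at the deadline before giving up.

```python
import time
from typing import Any, Callable, Dict


class InferenceTimeout(Exception):
    """Stand-in; carries request_id and real elapsed seconds."""

    def __init__(self, request_id: str, elapsed_s: float) -> None:
        super().__init__(f"request {request_id} not finished after {elapsed_s:.3f}s")
        self.request_id = request_id
        self.elapsed_s = elapsed_s


def poll_until_done(
    get_status: Callable[[], Dict[str, Any]],
    request_id: str,
    timeout: float = 600.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    start = clock()
    deadline = start + timeout
    while True:
        body = get_status()  # first poll is immediate: no initial sleep
        if body.get("status") in ("COMPLETED", "FAILED"):
            return body
        remaining = deadline - clock()
        if remaining <= 0:
            # Report wall time actually spent, not the configured timeout.
            raise InferenceTimeout(request_id, elapsed_s=clock() - start)
        # Cap the sleep so the loop never oversleeps the deadline.
        sleep(min(interval, remaining))
```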

Comment thread `segmind/v2.py` (outdated), on lines +36 to +38:

```python
# Status strings reported by the v2 status endpoint. Anything outside
# this set is treated as in-progress (forward-compat with future states).
_TERMINAL_STATES = ("COMPLETED", "FAILED")
```


medium

The constant _TERMINAL_STATES is defined but never used anywhere in the module. It should be removed to keep the codebase clean and maintainable.

Comment thread `segmind/__init__.py`, on lines +33 to +39:

```python
from segmind.v2 import (
    DEFAULT_POLL_INTERVAL_S,
    DEFAULT_POLL_TIMEOUT_S,
    AsyncJob,
    InferenceFailed,
    InferenceTimeout,
)
```


medium

Exposing SegmindError at the package level (segmind.SegmindError) is highly recommended so that users of the SDK can easily import and catch the base exception class without needing to import from internal modules. This also avoids workarounds in the test suite.

Suggested change:

```diff
+from segmind.exceptions import SegmindError
 from segmind.v2 import (
     DEFAULT_POLL_INTERVAL_S,
     DEFAULT_POLL_TIMEOUT_S,
     AsyncJob,
     InferenceFailed,
     InferenceTimeout,
 )
```

Comment thread `segmind/__init__.py`, on lines +190 to +194:

```python
    "DEFAULT_POLL_INTERVAL_S",
    "DEFAULT_POLL_TIMEOUT_S",
    "AsyncJob",
    "InferenceFailed",
    "InferenceTimeout",
```


medium

Add SegmindError to `__all__` to explicitly export it as part of the public API.

Suggested change:

```diff
     "DEFAULT_POLL_INTERVAL_S",
     "DEFAULT_POLL_TIMEOUT_S",
     "AsyncJob",
     "InferenceFailed",
     "InferenceTimeout",
+    "SegmindError",
```

Comment thread `tests/test_v2.py`:

```python
    return_value=httpx.Response(401, json={"error": "Invalid API key"})
)

with pytest.raises(segmind.SegmindError if hasattr(segmind, "SegmindError") else Exception):
```


medium

With SegmindError exposed at the package level, we can simplify this assertion and remove the conditional hasattr check.

Suggested change:

```diff
-with pytest.raises(segmind.SegmindError if hasattr(segmind, "SegmindError") else Exception):
+with pytest.raises(segmind.SegmindError):
```

Self-review pass before the tester session reports back.

Simplicity:
  * Remove _TERMINAL_STATES constant — defined but never referenced.
  * Drop the SEG-243 self-reference from the module docstring;
    replace with product-facing 'use larger timeout/interval for
    slow models, or webhooks for fire-and-forget'.

Optimization:
  * FAILED path no longer makes a second HTTP round-trip to the
    response URL. heimdall's /v2/requests/{id}/status already carries
    the error string on FAILED (SEG-97), so we can build the
    InferenceFailed exception from the status body alone.
  * Rename InferenceFailed.response_body -> .status_body to reflect
    that it now holds the status payload, not the full result.
    Callers who want server metrics on failure can still call
    AsyncJob.result() themselves after catching.

Test:
  * test_wait_raises_inference_failed_on_failed asserts via
    result_route.called == False that the optimization holds —
    any regression that re-introduces the extra GET fails this test.

All 11 tests still pass; ruff + black clean.

@shrey-rajvanshi (Contributor, Author)

Self-review pass — pushed 2c34e83

Two cleanups before the tester session reports back. Both are pure simplifications — no public-API surface changes other than one field rename on the failure exception.

1. Removed unused _TERMINAL_STATES

The constant was defined and never referenced; `wait()` checks `state == "COMPLETED"` / `state == "FAILED"` directly. Deleted.

2. FAILED path no longer does an extra HTTP round-trip

Before:
```python
if state == "FAILED":
    final = self.result()  # extra GET /v2/requests/{id}
    err = final.get("error") or status_body.get("error")
    raise InferenceFailed(detail=err, response_body=final)
```

After:
```python
if state == "FAILED":
    raise InferenceFailed(
        detail=status_body.get("error"),
        status_body=status_body,
    )
```

`/v2/requests/{id}/status` already carries the error string on FAILED (heimdall SEG-97), so the second GET was redundant. Saves one round-trip per failure path.

Trade-off: the exception now carries the status payload (not the full result body). Callers who want server-side metrics on failure can still call `AsyncJob.result()` themselves after catching. The simpler default felt right for a no-over-engineering pass.

Field rename: `InferenceFailed.response_body` → `InferenceFailed.status_body`. The new test enforces this with `assert result_route.called is False` so any regression that re-introduces the extra GET fails the suite.
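A sketch of the new failure path, assuming stand-in `SegmindError` / `InferenceFailed` classes (the constructor keywords follow the snippet above; `check_status` and the fallback message are illustrative). The exception is built from the status payload alone, and it subclasses the base error so a broad `except SegmindError` still catches it.

```python
from typing import Any, Dict, Optional


class SegmindError(Exception):
    """Stand-in for the SDK's base exception."""


class InferenceFailed(SegmindError):
    """Stand-in: raised when /status reports FAILED."""

    def __init__(self, detail: Optional[str], status_body: Dict[str, Any]) -> None:
        super().__init__(detail or "inference failed")
        self.detail = detail
        # Holds the /status payload, not the full result body.
        self.status_body = status_body


def check_status(status_body: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: raise on FAILED, otherwise hand the body back."""
    if status_body.get("status") == "FAILED":
        # No extra GET to the response URL: /status already carries the error.
        raise InferenceFailed(detail=status_body.get("error"), status_body=status_body)
    return status_body
```

A caller that wants server metrics after a failure would catch `InferenceFailed` and then call `AsyncJob.result()` itself, as described above.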

3. Module docstring polish

Dropped the `SEG-243 cancelled` reference — internal noise for SDK readers. Replaced with product-facing guidance.

Verified

  • 11 / 11 tests pass.
  • `ruff check` + `black --check` clean.
  • Live sanity against api-latest: a bogus slug surfaces as `SegmindError(404, "Model information not found")` at submit time (caught by `raise_for_status`), confirming the FAILED-state path is reachable separately for a queued-then-failed task.

What I considered but did NOT change

  • `submit_response` field on `AsyncJob` — keeps forward-compat for new server response keys; cheap to carry.
  • `_v2_base()` URL derivation — robust already; cosmetic micro-rewrite not worth the loss of clarity.
  • First-poll latency: the loop calls `status()` immediately (no initial sleep), so a fast model pays at most one poll interval when its result isn't ready on the first round-trip.
  • `result()` caching — YAGNI; users rarely call it twice.

Shrey Kant Rajvanshi added 2 commits June 13, 2026 15:26
…ED + nits

Tester-session findings on SEG-52 (scenario 5 and the scenario 4 nit). Reproduced
end-to-end against api-latest.

Headline bug — InferenceFailed was unreachable for worker-side failures.
  * heimdall returns HTTP 422 on /v2/requests/{id}/status when the task
    is in terminal FAILED state, while still carrying the
    {status: 'FAILED', error: '...'} body. The SDK's
    _request -> raise_for_status raised SegmindError(422) before
    wait() could inspect the body, so the FAILED branch never fired.
  * Fix: add AsyncJob._fetch_terminal_tolerant(url) which uses the
    underlying httpx client directly (no raise_for_status). If the body
    announces status COMPLETED or FAILED, return it as a valid payload
    regardless of HTTP code; otherwise fall through to the existing
    raise_for_status so genuine 401/404/5xx still surface as
    SegmindError. status() and result() both route through the helper.
  * _TERMINAL_STATES constant re-added (now used by the helper).
  * Two new tests:
      - 4xx-with-FAILED-body -> InferenceFailed
      - genuine 401 with no terminal body -> SegmindError(401),
        NOT a wrapped InferenceFailed
  * Live verified end-to-end:
      - sleep=99999 -> InferenceFailed('Validation error...sleep must be between 0 and 900.0 seconds')
      - bad slug (submit-time)  -> SegmindError(404, 'Model information not found')
      - happy path unchanged    -> COMPLETED, inference_time=1.013s
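The tolerant-fetch rule above comes down to one decision over (HTTP code, body). A minimal sketch of that decision, assuming a hypothetical `FakeResponse` in place of the httpx response and a stand-in `SegmindError`; the real `_fetch_terminal_tolerant(url)` also performs the GET, which is omitted here:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

TERMINAL_STATES = ("COMPLETED", "FAILED")


class SegmindError(Exception):
    """Stand-in for the SDK's HTTP error."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response: status code plus parsed JSON."""

    status_code: int
    body: Optional[Dict[str, Any]]


def accept_terminal_or_raise(resp: FakeResponse) -> Dict[str, Any]:
    body = resp.body if isinstance(resp.body, dict) else {}
    # A body that announces a terminal state is a valid payload whatever the
    # HTTP code (per the commit, heimdall answers FAILED tasks with 422).
    if body.get("status") in TERMINAL_STATES:
        return body
    # Anything else keeps raise_for_status semantics: 4xx/5xx become errors.
    if resp.status_code >= 400:
        raise SegmindError(resp.status_code, str(body.get("error", "request failed")))
    return body
```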

Nits from the tester comment:
  * InferenceTimeout.elapsed_s was stamped as the *configured* timeout
    arg, not real wall time. Now computed via
    time.monotonic() - start (live: timeout=2.0 -> elapsed_s=2.264,
    including last poll's sleep).
  * wait() docstring now notes that the result dict shape is
    model-dependent (status/output/metrics are reliable; the rest is
    model-specific).

Tests: 13/13 pass. ruff + black clean.

Tester verdict: 'requires one change before ship' — this commit
addresses that change. Host URL leak (staging returns prod URLs) is
heimdall-side, not SDK; will note as a separate follow-up.

The Documentation workflow auto-failed on this PR because
actions/upload-pages-artifact@v2 transitively pulls in
actions/upload-artifact@v3, which GitHub auto-fails as of 2024-04-16.
The if: refs/heads/main gates don't help; the deprecation check runs
at workflow parse time, before any step runs.

Bump to the current major versions, all of which use
actions/upload-artifact@v4 internally:
  actions/setup-python          @v4 -> @v5
  actions/configure-pages       @v3 -> @v5
  actions/upload-pages-artifact @v2 -> @v3
  actions/deploy-pages          @v2 -> @v4

No behaviour change in the steps themselves; the docs build job still
runs on PR (build smoke) and the deploy steps still gate on
github.ref == 'refs/heads/main'.
shrey-rajvanshi merged commit 104f176 into main on Jun 14, 2026 (6 checks passed)
shrey-rajvanshi pushed a commit that referenced this pull request Jun 14, 2026
shrey-rajvanshi added a commit that referenced this pull request Jun 14, 2026
* feat(client): send X-Initiator: SDK-PY so SDK traffic is attributable (SEG-319)

The SDK sent X-Initiator: segmind-python-sdk/0.1.0, which spot-backend's
SQS worker rejects (not in InitiatorType) and coerces to OTHERS — so SDK
calls are indistinguishable from raw requests/curl in the DB.

Send the stable token X-Initiator: SDK-PY instead. Heimdall passes it
through verbatim on the sync path (-> SDK-PY) and suffixes -V2 on the
v2-async path (-> SDK-PY-V2). Both are added to InitiatorType in the
paired spot-backend PR.

Version detail stays in the User-Agent header (segmind-python-sdk/0.1.0),
which heimdall logs — so we don't lose version telemetry.
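The header split can be sketched as below. `sdk_headers` and `recorded_initiator` are hypothetical helpers; the second only models the heimdall pass-through and `-V2` suffix behaviour described above and is not server code.

```python
from typing import Dict


def sdk_headers(version: str) -> Dict[str, str]:
    """Hypothetical helper: the two attribution headers described above."""
    return {
        # Stable token the backend can map to an InitiatorType value.
        "X-Initiator": "SDK-PY",
        # Version telemetry stays in the User-Agent, which heimdall logs.
        "User-Agent": f"segmind-python-sdk/{version}",
    }


def recorded_initiator(x_initiator: str, via_v2: bool) -> str:
    """Hypothetical model: verbatim on the sync path, -V2 appended on v2-async."""
    return f"{x_initiator}-V2" if via_v2 else x_initiator
```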

Updated test_http_client_headers assertions accordingly. Full suite:
256 passed, 7 skipped.

* chore(release): back-merge main + bump version to 1.1.0

Back-merged origin/main (PR #1 v2-async + the docs.yml deprecated-actions
CI fix) into this branch so build-and-deploy passes — it was red only
because the branch predated the Pages-actions bump now on main.

Bump __version__ 1.0.0 -> 1.1.0: 1.0.0 is already on PyPI, and main
gained the v2 async feature (submit_async / run_async / AsyncJob) since
1.0.0 — a minor bump. This branch also carries the SEG-319
X-Initiator: SDK-PY change, so 1.1.0 ships both.

Full suite green after the merge.

* chore(release): use patch bump 1.0.1 (not 1.1.0)

---------

Co-authored-by: Shrey Kant Rajvanshi <shrey@segmind.com>