Feature/viewer asset flow #16

Open
darrell-d wants to merge 12 commits into main from feature/viewer-asset-flow

Conversation


@darrell-d darrell-d commented Apr 30, 2026

Switched the post-processor to a viewer-asset-aware flow: it now creates a viewer asset in packages-service, links each input package to it, creates channels via pennsieve-api with the asset id added to the row, uploads chunk files directly to S3 using STS credentials returned by packages-service, and registers ranges through timeseries-service's new POST /package/{id}/ranges endpoint.
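The flow above, as a minimal sketch. All client classes and method names here (`create_asset`, `link_package`, `get_sts_credentials`, ...) are illustrative stand-ins for the real service clients, not their actual APIs:

```python
def import_timeseries_via_assets(packages, chunk_files, assets, channels, uploader, ranges):
    asset = assets.create_asset()                      # viewer asset in packages-service
    for pkg in packages:
        assets.link_package(asset["id"], pkg)          # link each input package to the asset
    for pkg in packages:
        channels.create_channel(pkg, viewer_asset_id=asset["id"])  # via pennsieve-api
    creds = assets.get_sts_credentials(asset["id"])    # STS creds returned by packages-service
    uploader.upload(chunk_files, creds)                # chunk files straight to S3
    ranges.create_ranges(asset["id"], chunk_files)     # POST /package/{id}/ranges
    assets.update_asset(asset["id"], status="active")  # activate only after everything succeeds
    return asset
```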

The previous legacy flow is still reachable via a LEGACY_IMPORT_FLOW env var for rollback.

Re-running the same workflow is now idempotent (existing active assets short-circuit; failed prior runs are deleted and replaced), failures are recoverable (channels created in a failed run are torn down before the asset itself, so the next run starts clean), and the HTTP clients are safer (explicit timeouts on every request, retries on 5xx and connection-level errors, fail-fast on 4xx and on any malformed input file). Asset metadata is configurable per converter via ASSET_NAME.

Also lots of unit tests. Thanks Claude

darrell-d and others added 12 commits April 29, 2026 17:37
- new clients (PackagesAssetsClient and TimeSeriesRangesClient)
- Create and add viewer asset ids to channels table
- Attempts to fail fast and retry gracefully
Covers the new clients and end-to-end orchestration:
- PackagesAssetsClient: create/list/update/delete plus retry behavior
  (5xx-with-recovery, no-retry-on-4xx).
- TimeSeriesRangesClient: create + batched submission, oversize batch
  rejection, abort-on-first-failure, snake_case body shape.
- AssetUploader: STS-bound boto3 client construction, parallel uploads
  with prefixed keys, ClientError propagation, empty-input no-op.
- Integration tests for import_timeseries_via_assets: happy path,
  idempotent skip on active asset, stale-asset delete-and-recreate,
  upload failure triggers asset DELETE for cleanup, empty directory
  short-circuit.
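The batched-submission behavior the TimeSeriesRangesClient tests cover can be sketched roughly like this; the batch limit, the snake_case field names, and the `post_batch` callable are illustrative assumptions, not the real client surface:

```python
MAX_BATCH = 100  # illustrative; the real limit lives in the client

def submit_ranges(ranges, post_batch, batch_size=MAX_BATCH):
    if batch_size > MAX_BATCH:
        # oversize batch rejection: refuse up front rather than let the service 4xx
        raise ValueError(f"batch size {batch_size} exceeds limit {MAX_BATCH}")
    for start in range(0, len(ranges), batch_size):
        batch = ranges[start:start + batch_size]
        body = [{"channel_id": r[0], "start_time": r[1], "end_time": r[2]}  # snake_case body shape
                for r in batch]
        if not post_batch(body):
            raise RuntimeError(f"batch starting at {start} failed; aborting")  # abort on first failure
```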

225 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Endpoint can accept 10x what we were sending.
Keys were committed to dev.env. That key was found and removed, so it's no longer viable. Still embarrassingly in history, but useless to anyone who tries it.

Using a dev.env.example that can remain committed, and git-ignoring dev.env going forward.
_find_or_create_asset was not looking at children for assets, only the parent, which would always come back blank.
Covers the bug fix in 6f49f5f: when a workflow has multiple packages,
re-running an ingest finds the existing asset by iterating
workflow.package_ids rather than looking up by the parent
collection (which is never linked to the asset).

The test sets up 3 children + 1 parent, registers an active asset
linked only to the children, and asserts:
  - lookup-by-first-child returns empty
  - lookup-by-second-child returns the asset and short-circuits
  - third child is never queried (iteration stops on first hit)
  - no upload, no POST /assets — purely idempotent
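The fixed lookup can be sketched like this; `list_assets` is an illustrative stand-in for the packages-service list call, not its real signature:

```python
def find_existing_asset(package_ids, list_assets):
    """Iterate the workflow's package ids, not the parent collection
    (which is never linked to the asset), and stop on the first hit."""
    for pkg_id in package_ids:
        assets = list_assets(pkg_id)
        if assets:
            return assets[0]  # short-circuit: later children are never queried
    return None
```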

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Items 4-8 from review feedback, plus the fail-fast changes for
unparseable filenames.

Failure-recovery fixes:
- _create_or_resolve_channels now returns the list of channels it
  *created* (not reused). On exception, the orchestrator deletes
  those channels first, then the asset. Without this, partial
  failure left channels with viewer_asset_id pointing at a deleted
  asset, breaking the next re-run on the mismatch check.
- New delete_channel method on TimeSeriesClient for that cleanup
  path. Failures are logged but do not block asset deletion —
  best-effort.
- update_asset(status='active') failures no longer swallowed. They
  propagate, trigger cleanup, and let the next run start clean.
  Swallowing left assets stuck in 'created', misclassified as stale
  on re-run, and triggered delete-and-recreate that compounded the
  channel-mismatch problem.
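The cleanup ordering described above, as a hedged sketch with injected callables standing in for the real orchestrator and clients (all names here are hypothetical):

```python
def run_with_cleanup(create_channels, activate, delete_channel, delete_asset, asset_id):
    """On failure, tear down only the channels *created* by this run,
    channels first, then the asset; channel-delete failures are
    best-effort and never block asset deletion."""
    created = []
    try:
        created = create_channels()   # returns only newly created channels, not reused ones
        activate()                    # update_asset(status='active'); errors propagate
    except Exception:
        for ch in created:
            try:
                delete_channel(ch)    # best-effort; logged in the real code
            except Exception:
                pass                  # never blocks asset deletion
        delete_asset(asset_id)        # asset goes last, so the next run starts clean
        raise
```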

Fail-fast on malformed input:
- Unparseable channel metadata filename → RuntimeError (was warning
  + skip).
- Unparseable chunk filename → RuntimeError.
- Chunk pointing at unknown channel index → RuntimeError.
- Empty channels_by_index or renamed_data_files after the helpers
  → RuntimeError. Defensive belt-and-suspenders to ensure the
  asset never reaches 'active' with zero data.
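A minimal sketch of the fail-fast parsing; the filename pattern here is invented for illustration (the real naming convention lives in the converter):

```python
import re

# Hypothetical chunk-file naming convention, e.g. "channel-3_chunk-7.bin"
CHUNK_NAME = re.compile(r"^channel-(\d+)_chunk-(\d+)\.bin$")

def parse_chunk_filename(name, known_channel_indices):
    m = CHUNK_NAME.match(name)
    if m is None:
        raise RuntimeError(f"unparseable chunk filename: {name!r}")  # was warning + skip
    channel_idx = int(m.group(1))
    if channel_idx not in known_channel_indices:
        raise RuntimeError(f"chunk {name!r} points at unknown channel index {channel_idx}")
    return channel_idx, int(m.group(2))
```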

HTTP client robustness:
- @backoff decorators now catch requests.RequestException (was just
  HTTPError). Covers timeouts, connection resets, DNS failures —
  they retry instead of failing immediately.
- _is_client_error gives up only on 4xx responses; everything else
  retries.
- DEFAULT_TIMEOUT=(5,30) constant in base_client; threaded into
  every request in packages_assets_client and
  timeseries_ranges_client. No more indefinite hangs on stalled
  services.
- Dropped unused asdict import.
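The retry policy above boils down to: give up only on 4xx, retry everything else. A stdlib-only stand-in for the @backoff-decorated calls (the exception's `status_code` attribute and the helper names are illustrative):

```python
import time

DEFAULT_TIMEOUT = (5, 30)  # (connect, read) seconds, as added in base_client

def _is_client_error(exc):
    """Give-up predicate: only a 4xx response stops retrying."""
    status = getattr(exc, "status_code", None)
    return status is not None and 400 <= status < 500

def call_with_backoff(fn, max_tries=3, delay=0.0):
    """Retries any exception (timeouts, connection resets, 5xx responses)
    unless _is_client_error says give up or the tries are exhausted."""
    for attempt in range(1, max_tries + 1):
        try:
            return fn()
        except Exception as exc:
            if _is_client_error(exc) or attempt == max_tries:
                raise
            time.sleep(delay)
```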

Tests:
- TestFailureCleanup split into two: channel-cleanup-on-failure
  asserts both DELETEs (channel then asset) fire in order;
  reused-channels-not-deleted asserts an existing channel reused
  by this run is left alone on cleanup.
- TestFailFastOnMalformedFiles covers the three RuntimeError
  paths (bad metadata filename, bad chunk filename, orphan chunk).

230 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Also cleaned up some comments for readability and accuracy.
@darrell-d darrell-d requested a review from rohanshah April 30, 2026 20:22

@rohanshah rohanshah left a comment


@darrell-d this is interesting because it would be cleaner to just use the viewer asset data target but we need the viewer asset ID to create the channels...

not sure if you've discussed with @jwagenaar but I wonder if he has any thoughts around that
