fix(platform-import): adapt to new pyro-api sequence/detection schema #121

Merged
MateoLostanlen merged 1 commit into main from fix/platform-import-schema-drift
May 1, 2026
@MateoLostanlen
Member

Summary

The platform API restructured its sequence/detection fields, which silently broke make import-platform: every sequence raised a KeyError under the new schema, the error was swallowed by a per-sequence try/except, and the run ended with 0 records from N sequences and no diagnostic. This PR adapts the import script to the current schema and tightens a few related correctness issues that a Codex review surfaced.
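
To illustrate the failure mode described above, here is a minimal sketch of how a per-sequence try/except can hide schema drift. The function names and the record shape are hypothetical and are not taken from the repository; only the `camera_azimuth` field name comes from this PR.

```python
from __future__ import annotations


def to_record(sequence: dict) -> dict:
    # Hypothetical converter: reads a field by name and raises KeyError if it is missing.
    return {"id": sequence["id"], "camera_azimuth": sequence["camera_azimuth"]}


def import_swallowing(sequences: list[dict]) -> list[dict]:
    """Old pattern: a failing sequence contributes nothing and reports nothing."""
    records = []
    for seq in sequences:
        try:
            records.append(to_record(seq))
        except Exception:
            continue  # a renamed field disappears here without a trace
    return records


def import_strict(sequences: list[dict]) -> list[dict]:
    """No catch-all: a missing field surfaces as a KeyError on the first bad sequence."""
    return [to_record(seq) for seq in sequences]


# A payload that still uses a field name the converter does not know about.
drifted = [{"id": 1, "azimuth": 90.0}]
print(len(import_swallowing(drifted)))  # 0 records, no diagnostic
```

With the strict variant the same payload raises `KeyError: 'camera_azimuth'`, which points directly at the mismatched field.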

Schema changes addressed

  • Sequence: camera_azimuth (= pose.azimuth at creation), sequence_azimuth (smoke cone direction, ignored), pose_id, cone_angle. Old single azimuth gone.
  • Detection: bbox (string "[(x,y,x,y,c), ...]") + others_bboxes (siblings of the same image). Old bboxes and azimuth gone.
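
The detection format above can be parsed roughly as follows. This is a sketch under assumptions, not the code in utils.py: the helper names are hypothetical, and it assumes `others_bboxes` uses the same string encoding as `bbox` and may be missing or empty.

```python
from __future__ import annotations

import ast
from typing import Tuple

Box = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, confidence)


def _is_number_seq(item: object) -> bool:
    return isinstance(item, (list, tuple)) and all(
        isinstance(v, (int, float)) for v in item
    )


def parse_bboxes(raw: object) -> list[Box]:
    """Parse a string such as "[(x, y, x, y, c), ...]" into a list of 5-tuples."""
    if not raw:
        return []
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        return []  # malformed value: treat as "no boxes"
    # A flat singular tuple "(x, y, x, y, c)" is wrapped into a one-element list.
    if isinstance(value, tuple) and value and _is_number_seq(value):
        value = [value]
    if not isinstance(value, (list, tuple)):
        return []
    boxes: list[Box] = []
    for item in value:
        if _is_number_seq(item) and len(item) == 5:
            boxes.append(tuple(float(v) for v in item))
    return boxes


def merge_detection_boxes(detection: dict) -> list[Box]:
    """Combine the row's own box with its siblings from the same image."""
    return parse_bboxes(detection.get("bbox")) + parse_bboxes(
        detection.get("others_bboxes")
    )
```

`ast.literal_eval` only evaluates Python literals, so a malformed or unexpected string results in an empty list rather than executed code.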

What's in the patch

  • utils.py:to_record is rewritten for the new schema. The internal record key is renamed sequence_azimuth → camera_azimuth (the value is the camera/pose direction; the platform's sequence_azimuth is the smoke direction, which we don't want). Bbox parsing uses ast.literal_eval with defensive handling for flat singular tuples and malformed values, and merges bbox + others_bboxes so each image carries all detected boxes.
  • sequence_fetching.py:process_single_sequence_detections — dropped the bare try/except that returned [] and hid the schema drift. Detections are deduped by bucket_key (the platform emits one row per bbox, so without dedupe a multi-box image would be imported N times with N-box predictions each). Fetches min(limit + 10, 100) so dedupe doesn't under-fill, and preserves 0 = no limit.
  • client.py:list_cameras — passes include_non_trustable=true. Without this, sequences from non-trustable cameras of the same org couldn't be matched against the camera index (your sdis-40 has 5 cameras total, not 4).
  • shared.py:transform_sequence_data — rounds float camera_azimuth to int and wraps with % 360 (annotation API stores Optional[int]; round(359.6) = 360 is out of range).
  • import.py / batch_import_local_yolo.py — clone path uses camera_azimuth; CSV consumer falls back to legacy keys for older exports.
  • .gitignore — ignore local .claude/ worktree state.
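
The dedupe and fetch-buffer logic from the sequence_fetching.py item can be sketched as below. The function names are hypothetical, and returning `None` to mean "no limit" is an assumption made for illustration; only the `bucket_key` field, the `min(limit + 10, 100)` formula, and the "0 = no limit" convention come from this PR.

```python
from __future__ import annotations

from typing import Optional


def fetch_size(limit: int) -> Optional[int]:
    """Rows to request: a small buffer above the limit, capped at 100.

    A limit of 0 (or less) means "no limit" and is returned as None.
    """
    if limit <= 0:
        return None
    return min(limit + 10, 100)


def dedupe_by_bucket_key(detections: list[dict], limit: int) -> list[dict]:
    """Keep the first row per image (bucket_key), stopping once the limit is reached."""
    seen: set[str] = set()
    unique: list[dict] = []
    for det in detections:
        key = det["bucket_key"]
        if key in seen:
            continue  # another bbox row for an image we already have
        seen.add(key)
        unique.append(det)
        if limit > 0 and len(unique) >= limit:
            break
    return unique


rows = [
    {"id": 1, "bucket_key": "a.jpg"},
    {"id": 2, "bucket_key": "a.jpg"},  # second box on the same image
    {"id": 3, "bucket_key": "b.jpg"},
]
print([d["id"] for d in dedupe_by_bucket_key(rows, limit=0)])  # [1, 3]
```

Requesting a few extra rows matters because several rows can collapse into one image after dedupe, which would otherwise leave fewer images than the requested limit.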

Codex review

Multiple rounds. Net result: codex went from blocker findings → P2 → P2 false-positive. Findings addressed in commits: defensive bbox parser, bucket-key dedupe with fetch buffer, azimuth modulo, missed clone-path key rename, detections_limit <= 0 short-circuit. Final P2 (claim that platform doesn't return camera_azimuth) is incorrect — verified directly against the API response.

Test plan

  • make import-platform DATE_FROM=2026-04-15 DATE_END=2026-04-15 --max-sequences 2 (smoke) — 1 sequence + 30 detections + 1 annotation, end-to-end clean
  • make import-platform DATE_FROM=2026-04-15 DATE_END=2026-04-17 MAX_SEQUENCES=0 (3-day on clean DB) — 48/48 sequences, 978/978 detections, 48/48 annotations
  • Re-run on populated DB — duplicates skipped cleanly, no errors
  • Annotation API state verified: sequences == annotations in ready_to_annotate (no orphans)
  • Pull pipeline (make pull-sequences) end-to-end — pending local stack bring-up (postgres volume mismatch unrelated to this PR)

🤖 Generated with Claude Code

The platform API restructured its sequence/detection fields, breaking the
import pipeline (silent KeyErrors masked by a per-sequence try/except,
ending with "0 records from N sequences"). Updates the script to the
current schema and tightens correctness around bbox parsing and azimuth
handling.

Schema changes addressed:
- Sequence: `camera_azimuth` (= pose.azimuth at sequence creation),
  `sequence_azimuth` (smoke cone direction, ignored), `pose_id`,
  `cone_angle`. The old single `azimuth` is gone.
- Detection: `bbox` (string `"[(x,y,x,y,c), ...]"`) plus `others_bboxes`
  (siblings of the same image). The old `bboxes` and `azimuth` are gone.
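
Because the azimuth field was renamed, a consumer of older exports needs to look in more than one place. The sketch below shows one way to do that. The helper name, the exact set of legacy keys, and their priority order are assumptions for illustration; the PR only states that the CSV consumer falls back to legacy keys.

```python
from __future__ import annotations

# Preferred key first, then keys that older exports may still carry (assumed order).
AZIMUTH_KEYS = ("camera_azimuth", "sequence_azimuth", "azimuth")


def read_camera_azimuth(row: dict) -> float | None:
    """Return the camera azimuth from a CSV row, trying legacy keys if needed."""
    for key in AZIMUTH_KEYS:
        value = row.get(key)
        if value not in (None, ""):
            return float(value)
    return None


print(read_camera_azimuth({"camera_azimuth": "12.5"}))  # 12.5
print(read_camera_azimuth({"sequence_azimuth": "90"}))  # 90.0
```

Note that `sequence_azimuth` is treated here as the old internal record key from earlier exports, not as the platform's new smoke-direction field of the same name.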

Changes:
- utils.py: rewrite `to_record` for the new fields. The internal record
  key is renamed `sequence_azimuth` → `camera_azimuth` to reflect what it
  actually is (camera/pose direction, not smoke direction). Bbox parsing
  is hardened (`ast.literal_eval`, defensive against flat singular tuples
  and malformed values), and merges `bbox + others_bboxes` so each image
  carries all detected boxes.
- sequence_fetching.py: stop swallowing per-sequence exceptions. Dedupe
  detections by `bucket_key` (the platform emits one row per bbox even
  for shared images, which would otherwise duplicate predictions). Fetch
  a small buffer above `--detections-limit` so the dedupe doesn't
  under-fill, and preserve the "0 = no limit" convention.
- client.py: `list_cameras` now passes `include_non_trustable=true` so
  the camera index covers every camera that can show up on a sequence,
  not just the trustable subset.
- shared.py: round float `camera_azimuth` to int and wrap with `% 360`
  (annotation API stores `Optional[int]`; rounding `359.6` would land on
  an out-of-range 360).
- import.py / batch_import_local_yolo.py: rename to `camera_azimuth` in
  the clone path; CSV consumer falls back to legacy keys.
- .gitignore: ignore local `.claude/` worktree state.
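
The azimuth handling described for shared.py comes down to a round followed by a modulo. A minimal sketch, with a hypothetical function name:

```python
from __future__ import annotations


def normalize_azimuth(value: float | None) -> int | None:
    """Round a float azimuth to an int in the range [0, 359]."""
    if value is None:
        return None
    # round() alone maps 359.6 to 360, which is out of range; % 360 wraps it to 0.
    return round(value) % 360


print(normalize_azimuth(359.6))  # 0
print(normalize_azimuth(123.4))  # 123
```

Python's `%` returns a non-negative result for a positive modulus, so a negative input such as `-10.2` also lands in range (350).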
@MateoLostanlen MateoLostanlen merged commit 0f73ee1 into main May 1, 2026
2 checks passed