- `meta/info.json`: features schema, counters, path templates
- `meta/stats.json`: aggregated normalization stats
- `meta/tasks.parquet`: task index mapping
- `meta/episodes/chunk-*/file-*.parquet`: per-episode metadata
- `data/chunk-*/file-*.parquet`: frame data (snappy, dictionary)
- `videos/{video_key}/chunk-*/file-*.mp4`: encoded camera streams
timestamp, frame_index, episode_index, index, task_index
- Data parquet: rotate when uncompressed size estimate exceeds `data_files_size_in_mb` (default 100)
- Video: rotate when file size exceeds `video_files_size_in_mb` (default 200)
- Use `UpdateChunkFileIndices(chunk, file, chunks_size)` when `file == chunks_size-1`
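
The rotation rules above can be sketched in Python as follows. This is a minimal illustration, not the project's implementation: `should_rotate` is a hypothetical helper, the MB-to-bytes conversion (1024 × 1024) is an assumption, and the rollover behaviour of `update_chunk_file_indices` (start a new chunk at file 0 once `file == chunks_size-1`) is assumed from the condition named in the list.

```python
def should_rotate(size_bytes: int, limit_mb: int) -> bool:
    """Return True when a file has grown past its size limit.

    Assumption: the limit is expressed in MiB (1024 * 1024 bytes).
    """
    return size_bytes > limit_mb * 1024 * 1024


def update_chunk_file_indices(chunk: int, file: int, chunks_size: int) -> tuple[int, int]:
    """Advance to the next (chunk, file) slot.

    Assumed behaviour: when the current file is the last one allowed in
    the chunk (file == chunks_size - 1), move to file 0 of the next chunk;
    otherwise just move to the next file in the same chunk.
    """
    if file == chunks_size - 1:
        return chunk + 1, 0
    return chunk, file + 1


# Example: a data file that exceeded the 100 MB default triggers rotation.
chunk, file = 0, 0
if should_rotate(size_bytes=120 * 1024 * 1024, limit_mb=100):
    chunk, file = update_chunk_file_indices(chunk, file, chunks_size=1000)
```

Keeping the size check separate from the index update lets the same rollover logic serve both data parquet and video files, which differ only in their limit.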
`pq.ParquetWriter(path, schema=table.schema, compression="snappy", use_dictionary=True)`

Go uses `apache/arrow-go/v18` with `parquet.WithCompression(Snappy)` and `WithDictionaryDefault(true)`.
- Each episode → isolated `staging/ep_XXXXXX/` with `frames.parquet`, `videos/*.mp4`, `episode_meta.json`
- Merger sorts by `episode_index`, assigns global `index`, appends parquet, concatenates MP4 with DTS-safe ffmpeg
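
The ordering and indexing part of the merge step might look like the sketch below. It is only an illustration under stated assumptions: staged episodes are represented as plain dicts rather than parquet files, `assign_global_index` is a hypothetical name, and the parquet append and ffmpeg concatenation are omitted.

```python
def assign_global_index(episodes):
    """Merge staged episodes into one row list with a global `index`.

    `episodes` is a list of dicts, each with an `episode_index` and a
    `frames` list of per-frame dicts (an in-memory stand-in for each
    staging directory's frames.parquet).
    """
    merged = []
    next_index = 0
    # Sort by episode_index so the merged output is independent of the
    # order in which episodes finished staging.
    for episode in sorted(episodes, key=lambda e: e["episode_index"]):
        for frame in episode["frames"]:
            row = dict(frame)  # copy so the staged data is not mutated
            row["episode_index"] = episode["episode_index"]
            row["index"] = next_index  # dataset-wide running counter
            next_index += 1
            merged.append(row)
    return merged


# Example: episodes staged out of order are merged in episode order.
staged = [
    {"episode_index": 1, "frames": [{"timestamp": 0.0}, {"timestamp": 0.1}]},
    {"episode_index": 0, "frames": [{"timestamp": 0.0}]},
]
rows = assign_global_index(staged)
```

Because each episode is staged in isolation, the global `index` can only be assigned at merge time, once the final episode order is known.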