It turns out this is an open issue:
#4
It worked on the PVC queue using --num-workers 12. Adding an option to switch to the PVC queue on timeout would solve it, but we should do that only once it is clear that there is no easier solution.
I am trying to ingest a 29GB file. For --num-workers 3 I get this memory error:
hdfeos5_2json_mbtiles.py miaplpy_201410_202603/network_delaunay_4/S1_asc_149_miaplpy_20141027_XXXXXXXX_S2343W06835_S2338W06810_S2306W06819_S2311W06843_filtDel4DS.he5 miaplpy_201410_202603/network_delaunay_4/JSON --num-workers 3
reading displacement data from file: miaplpy_201410_202603/network_delaunay_4/S1_asc_149_miaplpy_20141027_XXXXXXXX_S2343W06835_S2338W06810_S2306W06819_S2311W06843_filtDel4DS.he5 ...
reading mask data from file: miaplpy_201410_202603/network_delaunay_4/S1_asc_149_miaplpy_20141027_XXXXXXXX_S2343W06835_S2338W06810_S2306W06819_S2311W06843_filtDel4DS.he5 ...
Masking displacement
Creating shared memory for multiple processes
miaplpy_201410_202603/network_delaunay_4/JSON already exists
columns: 6669
rows: 2599
converted chunk 0
converted chunk 1
converted chunk 2
converted chunk 3
converted chunk 4
converted chunk 5
converted chunk 6
converted chunk 7
converted chunk 8
Process ForkPoolWorker-3:
Traceback (most recent call last):
File "/work2/05861/tg851601/stampede2/code/minsar/tools/miniforge3/envs/minsar/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/work2/05861/tg851601/stampede2/code/minsar/tools/miniforge3/envs/minsar/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/work2/05861/tg851601/stampede2/code/minsar/tools/miniforge3/envs/minsar/lib/python3.10/multiprocessing/pool.py", line 114, in worker
task = get()
File "/work2/05861/tg851601/stampede2/code/minsar/tools/miniforge3/envs/minsar/lib/python3.10/multiprocessing/queues.py", line 367, in get
return _ForkingPickler.loads(res)
MemoryError
For --num-workers 1 I don't get a memory error, but it does not finish: it gets through chunk 90 in 2 hours of walltime. It uses 49% of the 198 GB of available RAM, going up to 63%:
NODES CPU LOAD L/CPU IDLE% WA% MEM% GB/core STATUS
-------------------------------------------------------------------------------
c454-024 48 1.5 0.03 97.6 0.0 49.7% 1.94 UNDERUSED
c454-024 48 1.3 0.03 97.7 0.0 49.7% 1.94 UNDERUSED
c454-024 48 1.5 0.03 95.8 0.0 63.1% 2.46 UNDERUSED
I then used the PVC queue (1 TB memory) with --num-workers 12. Memory went up to 22% (220 GB), and it finished in 30 minutes.
Here is what Cursor says:
How parallelization works
The .he5 cube is loaded into shared memory; timeseries_datasets holds one 2D array per date (slices of that cube).
The grid (WIDTH × LENGTH) is split into chunks of 20,000 flat indices; each chunk is one task.
multiprocessing.Pool(...).starmap(create_json, worker_args) runs those tasks with --num-workers processes.
After all chunk_*.json files exist, tippecanoe is run once via os.system on *.json (not parallelized in Python).
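As a rough sketch of that chunk/starmap pattern (the create_json name and the 20,000-index chunk size come from the description above; the start/end/out_dir signature is illustrative, not the script's actual one):

```python
from multiprocessing import Pool

CHUNK_SIZE = 20_000  # flat indices per task, as described above

def create_json(start, end, out_dir):
    # Stand-in for the real worker, which would convert pixels
    # [start, end) into a chunk_<n>.json file in out_dir.
    return (start, end, out_dir)

def run(width, length, num_workers, out_dir="JSON"):
    total = width * length
    # One argument tuple per chunk; starmap pickles each tuple
    # and sends it to a worker process over a queue.
    worker_args = [
        (start, min(start + CHUNK_SIZE, total), out_dir)
        for start in range(0, total, CHUNK_SIZE)
    ]
    with Pool(num_workers) as pool:
        return pool.starmap(create_json, worker_args)
```

For the 6669 × 2599 grid in the log above this yields 867 tasks.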
Why --num-workers 3 can OOM
starmap pickles every task’s arguments to send them to workers.
Each task’s tuple still includes the full timeseries_datasets dict and large lats / lons arrays (same for every chunk), so there are hundreds of near-duplicate large payloads in the queue.
The MemoryError in ForkPoolWorker-3 at _ForkingPickler.loads matches unpickling those queued tasks under memory pressure.
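One common way to avoid queuing those near-duplicate payloads is to hand the large read-only objects to each worker once, via a Pool initializer, so each task tuple carries only index ranges. A minimal sketch (the _init, _shared, and convert_chunk names are mine, not the script's):

```python
import numpy as np
from multiprocessing import Pool

_shared = {}

def _init(lats, lons, datasets):
    # Runs once per worker process: the big arrays are transferred
    # one time per worker, not once per task.
    _shared["lats"] = lats
    _shared["lons"] = lons
    _shared["datasets"] = datasets

def convert_chunk(start, end):
    # Stand-in for the JSON-building work; reads the shared arrays.
    return float(_shared["lats"][start:end].sum())

def run(num_workers):
    lats = np.ones(1000)
    lons = np.zeros(1000)
    datasets = {"20141027": np.zeros((10, 10))}
    tasks = [(i, i + 100) for i in range(0, 1000, 100)]  # tiny tuples
    with Pool(num_workers, initializer=_init,
              initargs=(lats, lons, datasets)) as pool:
        return sum(pool.starmap(convert_chunk, tasks))
```

With fork, the initializer arguments are also shared copy-on-write, so the per-task queue traffic shrinks to a few integers per chunk.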
Choosing --num-workers
Use the largest value that does not OOM on your node; if 2 works and 3 fails, 2 is the right setting for that dataset.
There is no reliable formula from pixels/dates/file size alone because multiprocessing overhead dominates, not raw array size.
Pixels vs dates
Pixels drive the size of each 2D slice and how many points you iterate.
Dates multiply the depth of the 3D cube and the length of each feature’s d array in JSON.
Both matter for peak RAM; workers mainly add serialization/queue pressure on top.
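A back-of-envelope for the raw cube alone (assuming float32 samples; the actual dtype and overheads may differ, and the 100-date count below is only a placeholder):

```python
def cube_gib(rows, cols, dates, bytes_per_sample=4):
    """In-memory size of a rows x cols x dates cube, in GiB."""
    return rows * cols * dates * bytes_per_sample / 1024**3

# Grid from the log above (2599 rows x 6669 columns); 100 dates
# is illustrative, not the real stack depth.
print(round(cube_gib(2599, 6669, 100), 1))  # ~6.5 GiB
```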
Tippecanoe
It is often a real bottleneck (single external process, large input).
Running it only in a second job would require not running it in the first step (e.g. a --skip-tippecanoe flag). The current script always runs it; the doc records the exact commands so you can mirror that in a second step if you add a skip path later or use a wrapper.
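A sketch of what the --skip-tippecanoe flag mentioned above could look like, using argparse and subprocess in place of os.system (the tippecanoe arguments shown are placeholders, not the script's real command line):

```python
import argparse
import subprocess

def parse_args(argv=None):
    p = argparse.ArgumentParser()
    p.add_argument("json_dir")
    p.add_argument("--num-workers", type=int, default=1)
    p.add_argument("--skip-tippecanoe", action="store_true",
                   help="write chunk_*.json only; run tippecanoe in a later job")
    return p.parse_args(argv)

def maybe_run_tippecanoe(args):
    if args.skip_tippecanoe:
        return None  # a second job can pick up the JSON files later
    cmd = ["tippecanoe", "-o", "out.mbtiles"]  # placeholder arguments
    return subprocess.run(cmd, check=True)
```

The second job would then run only the tippecanoe step against the chunk_*.json files the first job left behind.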
Code was not modified; only MemoryIssues.md was added.
Does memory go up with more workers?
Usually yes for this script, but not as “N workers ⇒ N × full dataset in RAM.”
Base usage — Shared cube, timeseries_datasets, lat/lon, and task list depend mainly on pixels × dates, not strongly on worker count.
More worker processes — With fork, memory is shared at first (copy-on-write). Each worker that runs create_json touches data and builds Python/JSON, which can duplicate pages per process. More workers ⇒ more of that at once, so RSS can grow.
Main effect here — starmap pickles every chunk’s arguments, and each payload still includes the full timeseries_datasets and large arrays. Several workers unpickling in parallel and queue buffers raise peak memory when you increase --num-workers.
So: higher --num-workers tends to increase peak memory (especially from IPC/unpickling), which matches seeing OOM at 3 but not at 2. It is not a simple linear rule; if you are tight on RAM, fewer workers is safer.
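The per-task IPC cost described above can be measured directly with pickle.dumps on a sample argument tuple (shapes here are illustrative; the real payload also includes the full timeseries_datasets dict, so it is much larger):

```python
import pickle
import numpy as np

lats = np.zeros(17_332_731, dtype=np.float32)  # one value per grid pixel

# What starmap would pickle per task in each style:
task_with_arrays = (0, 20_000, lats)   # arrays repeated in every task
task_indices_only = (0, 20_000)        # indices only

big = len(pickle.dumps(task_with_arrays))
small = len(pickle.dumps(task_indices_only))
print(big // 1024**2, "MiB vs", small, "bytes")  # tens of MiB vs tens of bytes
```

Multiply the large payload by hundreds of chunks and several workers unpickling at once, and the OOM at --num-workers 3 is consistent.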