Child process with PID terminated with code 2 #38

Description

@leotizzei I'm hitting this issue when running `terratorch iterate` with `strategy: ddp` or `strategy: fsdp`. The same strategies work with regular `terratorch`, but not with `iterate`. The problem seems to be that the `--hpo` flag is also passed down to the child processes, which don't recognize it.
```
[I 2025-08-28 12:49:37,247] Using an existing study with name 'so2sat' instead of creating a new one.
Seed set to 42
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
usage: terratorch [-h] [-c CONFIG] [--print_config[=flags]]
         {fit,validate,test,predict,compute_statistics} ...
error: Unrecognized arguments: --hpo
[rank: 1] Child process with PID 1253489 terminated with code 2. Forcefully terminating all other processes to avoid zombies :zombie:
/u/pedrohc/.lsbatch/1756401057.1774575: line 8: 1248255 Killed         terratorch iterate --hpo --config /u/pedrohc/terratorch-iterate/configs/geobench_v1_prithvi.yaml
```
```yaml
defaults:
  trainer_args:
    precision: bf16-mixed
    max_epochs: 1
    strategy: ddp
    devices: -1
    num_nodes: 1
```
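
If the diagnosis above is right, and the DDP/FSDP launcher starts each extra rank by re-running the parent's command line (Lightning's subprocess launcher generally reuses `sys.argv` for this), one possible workaround is to drop iterate-only flags from the argument list before the trainer is created. The snippet below is a minimal sketch of that idea, not the project's fix: `strip_flags` is a hypothetical helper that does not exist in terratorch or terratorch-iterate, and `config.yaml` is a placeholder path.

```python
from typing import Iterable, List


def strip_flags(argv: List[str], flags: Iterable[str]) -> List[str]:
    """Return a copy of argv without the given boolean flags.

    Hypothetical helper: it only handles flags that take no separate
    value, such as "--hpo" (or the "--hpo=<value>" spelling).
    """
    blocked = set(flags)
    cleaned = []
    for arg in argv:
        # Compare only the part before "=" so "--hpo=true" is dropped too.
        name = arg.split("=", 1)[0]
        if name in blocked:
            continue
        cleaned.append(arg)
    return cleaned


if __name__ == "__main__":
    # Placeholder command line modelled on the one in the log above.
    argv = ["terratorch", "iterate", "--hpo", "--config", "config.yaml"]
    print(strip_flags(argv, ["--hpo"]))
```

In the parent process this could be applied as `sys.argv = strip_flags(sys.argv, ["--hpo"])` once iterate has parsed its own arguments, so that ranks spawned afterwards no longer receive `--hpo`. Whether the remaining arguments are then accepted depends on how the child entry point parses them, which the log does not show.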
