compute_distance_matrix_zernike3deep - ValueError: All arrays must be of the same length #30

@geoffwoollard

I can run `compute_distance_matrix_zernike3deep.py --references_file ${TMP_DIR_ZERNIKE}/reference_maps.npy --targets_file ${TMP_DIR_ZERNIKE}/target_maps.npy --out_path ${TMP_DIR_ZERNIKE} --gpu 0 --num_projections 20 --thr 20` on a remote interactive cluster job, but when I submit via sbatch, I get `ValueError: All arrays must be of the same length`. This is independent of the cryo heterogeneity challenge wrapper. The shapes of the refs and targets are both (80, 11239424) = (80, 224**3).

error trace

::::::::::::::
slurm/logs/4713549.err
::::::::::::::
2025-05-06 23:42:09.330840: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-06 23:42:09.795816: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-05-06 23:42:09.796165: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-05-06 23:42:09.828360: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-06 23:42:09.929514: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-06 23:42:15.686854: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Projecting volumes:  14%|█▍        | 11/80 [00:06<00:26,  2.38it/s]
Projecting volumes:  28%|██▊       | 22/80 [00:10<00:21,  2.65it/s]
Projecting volumes:  41%|████▏     | 33/80 [00:15<00:19,  2.50it/s]
Projecting volumes:  55%|█████▍    | 43/80 [00:18<00:13,  2.80it/s]
Projecting volumes:  68%|██████▋   | 53/80 [00:22<00:09,  2.90it/s]
Projecting volumes:  80%|███████▉  | 63/80 [00:26<00:06,  2.65it/s]
Projecting volumes:  91%|█████████▏| 73/80 [00:29<00:02,  2.54it/s]
Projecting volumes: 100%|██████████| 80/80 [00:32<00:00,  2.44it/s]
Traceback (most recent call last):
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/bin/compute_distance_matrix_zernike3deep.py", line 8, in <module>
    sys.exit(main())
  File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/compute_distance_matrix_zernike3deep.py", line 167, in main
    compute_distance_matrix(**inputs)
  File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/compute_distance_matrix_zernike3deep.py", line 70, in compute_distance_matrix
    md = XmippMetaData(os.path.join(outPath, "projections.mrcs"), angles=angles_all,
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/xmipp_metadata/metadata/xmipp_metadata.py", line 86, in __init__
    self.table = pd.DataFrame.from_dict(COLUMN_DICT)
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/pandas/core/frame.py", line 1813, in from_dict
    return cls(data, index=index, columns=columns, dtype=dtype)
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/pandas/core/frame.py", line 733, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 503, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 114, in arrays_to_mgr
    index = _extract_index(arrays)
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 677, in _extract_index
    raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length
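
For context, pandas raises this message when the dict given to `pd.DataFrame.from_dict` holds columns of different lengths, so one of the arrays passed to `XmippMetaData` (possibly `angles_all`) may not match the number of projection images. Below is a minimal, stdlib-only sketch of a check that could be run just before the `XmippMetaData(...)` call. The helper name `column_lengths_report`, the column names, and the example counts are hypothetical and are not taken from xmipp_metadata.

```python
from collections import Counter


def column_lengths_report(columns):
    """Return (lengths, mismatched) for a dict of column name -> sequence.

    `mismatched` lists the columns whose length differs from the most
    common length; it is empty when every column has the same length.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if not lengths:
        return lengths, []
    expected, _ = Counter(lengths.values()).most_common(1)[0]
    mismatched = [name for name, n in lengths.items() if n != expected]
    return lengths, mismatched


# Hypothetical example: 80 volumes x 20 projections = 1600 images,
# but one angle column is 20 rows short.
columns = {
    "image": [f"{i:06d}@projections.mrcs" for i in range(1, 1601)],
    "angleRot": [0.0] * 1600,
    "angleTilt": [0.0] * 1580,
}
lengths, mismatched = column_lengths_report(columns)
print(lengths)      # length of each column
print(mismatched)   # columns that disagree with the majority length
```

Logging this report in both the interactive and the sbatch run would show which array differs between the two environments.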

submission script

#!/bin/bash
#SBATCH --job-name=map_to_map_zernike
#SBATCH --output=slurm/logs/%j.out
#SBATCH --error=slurm/logs/%j.err
#SBATCH --partition=gpu
#SBATCH -c 1
#SBATCH --time=99:00:00
#SBATCH --gpus=1
#SBATCH --constraint="a100|h100"

# for N in {1..24}; do sbatch submission_zernike.sh $N ; sleep 2; done

N=${1:-1}  # default to 1 if no argument is given

cd /mnt/home/smbp/ceph/smbpchallenge/preprocessing_submissions/set2/

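# Pick the N-th submission name; escape the dot and anchor the suffix so only a trailing ".pt" is stripped
for ICE_CREAM in $(ls submission*pt | sed -e 's/submission_//' -e 's/\.pt$//' | head -n "$N" | tail -n 1);
do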
    cd /mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/
    PATH_TO_SUBMISSION_FILE=/mnt/home/smbp/ceph/smbpchallenge/preprocessing_submissions/set2/submission_${ICE_CREAM}.pt
    PATH_TO_OUTPUT=/mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/map_to_map_zernike_${ICE_CREAM}.pkl
    TMP_DIR_ZERNIKE=/mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/tmpdir_zernike_${ICE_CREAM}
    sed -e "s|PATH_TO_SUBMISSION_FILE|${PATH_TO_SUBMISSION_FILE}|" -e "s|PATH_TO_OUTPUT|${PATH_TO_OUTPUT}|" -e "s|TMP_DIR_ZERNIKE|${TMP_DIR_ZERNIKE}|" config_files/config_map_to_map_template_zernike.yaml > config_files/config_map_to_map_zernike_${ICE_CREAM}.yaml
    run_map_to_map_pipeline --config config_files/config_map_to_map_zernike_${ICE_CREAM}.yaml
    # wraps the command: compute_distance_matrix_zernike3deep.py --references_file ${TMP_DIR_ZERNIKE}/reference_maps.npy --targets_file ${TMP_DIR_ZERNIKE}/target_maps.npy --out_path ${TMP_DIR_ZERNIKE} --gpu 0 --num_projections 20 --thr 20
done 

Running the command directly produces this error trace:

(flexutils-tensorflow) (base) [gwoollard@workergpu001 tmpdir2_zernike_cherry_1]$     compute_distance_matrix_zernike3deep.py --references_file ${TMP_DIR_ZERNIKE}/reference_maps.npy --targets_file ${TMP_DIR_ZERNIKE}/target_maps.npy --out_path ${TMP_DIR_ZERNIKE} --gpu 0 --num_projections 20 --thr 20
2025-05-07 00:02:10.658147: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-07 00:02:10.696375: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-05-07 00:02:10.696403: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-05-07 00:02:10.697491: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-07 00:02:10.703892: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-07 00:02:11.498334: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-05-07 00:02:15.825978: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-07 00:02:15.863871: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-05-07 00:02:15.863896: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-05-07 00:02:15.864970: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-07 00:02:15.871257: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-07 00:02:16.633121: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Open3D has not been installed. The program will continue without this package
/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/site-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning: 

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://github.com/tensorflow/addons/issues/2807 

  warnings.warn(
Open3D has not been installed. The program will continue without this package
2025-05-07 00:02:17.855187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38548 MB memory:  -> device: 0, name: NVIDIA A100-SXM4-40GB, pci bus id: 0000:31:00.0, compute capability: 8.0
Traceback (most recent call last):
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/bin/train_zernike3deep.py", line 8, in <module>
    sys.exit(main())
  File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/train_zernike3deep.py", line 226, in main
    train(**inputs)
  File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/train_zernike3deep.py", line 133, in train
    autoencoder.build(input_shape=[(None, autoencoder.xsize, autoencoder.xsize, 1),
AttributeError: 'AutoEncoder' object has no attribute 'xsize'
Traceback (most recent call last):
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/bin/compute_distance_matrix_zernike3deep.py", line 8, in <module>
    sys.exit(main())
  File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/compute_distance_matrix_zernike3deep.py", line 167, in main
    compute_distance_matrix(**inputs)
  File "/mnt/ceph/users/gwoollard/repos/Cryo-EM-Heterogeneity-Challenge-1/src/tensorflow-toolkit/tensorflow_toolkit/scripts/compute_distance_matrix_zernike3deep.py", line 102, in compute_distance_matrix
    subprocess.check_call(f'eval "$({conda_base}/bin/conda shell.bash hook)" && '
  File "/mnt/home/gwoollard/software/mambaforge/envs/flexutils-tensorflow/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'eval "$(/mnt/home/gwoollard/software/mambaforge/bin/conda shell.bash hook)" && conda activate flexutils-tensorflow && train_zernike3deep.py --md_file /mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/tmpdir2_zernike_cherry_1/proj_metadata.xmd --out_path /mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/tmpdir2_zernike_cherry_1 --L1 7 --L2 7 --batch_size 1024 --lr 0.001 --epochs 40 --architecture mlpnn --cost corr --regNorm 0.001 --apply_ctf 0 --shuffle --step 1 --split_train 1 --ctf_type apply --sr 1.0 --pose_reg 0.0 --ctf_reg 0.0 --gpu 0' returned non-zero exit status 1.
(flexutils-tensorflow) (base) [gwoollard@workergpu001 tmpdir2_zernike_cherry_1]$ more /mnt/home/smbp/ceph/smbpchallenge/map_to_map/set2/tmpdir2_zernike_cherry_1/proj_metadata.xmd 
