Problem
Stage3_Interpretation/A_Plotting/src/k_quality_plots.py:140 defines a helper that builds:
'{output_dir}/{run_name}/adata/cNMF_{k}_{sel_thresh}.h5mu'
But the modern PerturbNMF Stage 1 inference (Stage1_Inference/torch-cNMF/Slurm_Version/torch_cnmf_inference_pipeline.py) writes h5mu files to <run>/Inference/adata/cNMF_*.h5mu (note the Inference/ subdirectory).
So programs_dotplots fails with FileNotFoundError: '<run>/adata/cNMF_30_2_0.h5mu' whenever cNMF_k_selection.py is run on output produced by the new layout.
Why this slipped through
Pre-existing Engreitz-lab cNMF results (from the older cNMF_benchmarking tool) put h5mus at <run>/adata/ directly with no Inference/ subdir. The programs_dotplots helper was written for that layout and not updated when PerturbNMF restructured outputs into <run>/Inference/. We seem to be the first external users running the full new pipeline end-to-end.
Reproduction
# Run inference, then K-selection plot. The latter crashes:
python Stage3_Interpretation/A_Plotting/Slurm_Version/cNMF_k_selection.py \
--output_directory <out_dir> \
--run_name <run_name> \
--eval_folder_name <run_dir>/Evaluation \
--stability_file <run_dir>/Inference/Inference.k_selection_stats.df.npz \
--K 30 50 ... \
--sel_threshs 2.0 \
--samples all
Stack trace:
File ".../Stage3_Interpretation/A_Plotting/src/k_quality_plots.py", line 150, in programs_dotplots
mdata = mu.read_h5mu(get_gene_path(output_dir, run_name, k, sel_thresh))
FileNotFoundError: [Errno 2] No such file or directory: '<run>/adata/cNMF_30_2_0.h5mu'
Workaround in place
We've been creating a symlink <run>/adata -> Inference/adata for each dataset. This works, but it has to be repeated for every run and is easy to forget.
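For anyone else hitting this before a fix lands, the symlink workaround can be scripted. This is a minimal sketch, not part of PerturbNMF: the helper name link_legacy_adata is hypothetical, and it assumes the run directory contains Inference/adata as described above.

```python
from pathlib import Path


def link_legacy_adata(run_dir):
    """Create <run>/adata -> Inference/adata if missing.

    Hypothetical helper (not part of PerturbNMF).
    Returns True if a symlink was created, False otherwise.
    """
    run_dir = Path(run_dir)
    target = run_dir / "Inference" / "adata"
    link = run_dir / "adata"

    # Leave existing directories or links alone, and skip runs that
    # have no Inference/adata directory to point at.
    if link.exists() or link.is_symlink() or not target.is_dir():
        return False

    # Use a relative target so the run directory stays relocatable.
    link.symlink_to(Path("Inference") / "adata", target_is_directory=True)
    return True
```

Calling it once per run directory before cNMF_k_selection.py should reproduce the manual workaround; calling it again is a no-op.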
Proposed fix
Update get_gene_path in k_quality_plots.py to use the modern layout:
- return '{output_dir}/{run_name}/adata/cNMF_{k}_{sel_thresh}.h5mu'.format(
+ return '{output_dir}/{run_name}/Inference/adata/cNMF_{k}_{sel_thresh}.h5mu'.format(
Alternatively, expose the h5mu directory as a CLI argument that defaults to Inference/adata, so that results in the older <run>/adata/ layout can still be plotted.
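A third option would be to keep both layouts working by checking the new location first and falling back to the old one. The sketch below is only an illustration of that idea. The adata_subdir parameter is hypothetical, and the threshold formatting (2.0 becomes "2_0") is an assumption inferred from the file name in the traceback, since the real .format(...) arguments are not shown here.

```python
import os


def get_gene_path(output_dir, run_name, k, sel_thresh, adata_subdir=None):
    """Resolve the h5mu path, preferring the new Inference/adata layout.

    Sketch only: adata_subdir is a hypothetical override, and the
    '.' -> '_' threshold formatting is inferred from the traceback.
    """
    thresh = str(sel_thresh).replace(".", "_")
    fname = "cNMF_{}_{}.h5mu".format(k, thresh)
    run_dir = os.path.join(output_dir, run_name)

    # An explicit override wins, e.g. a value passed from a CLI argument.
    if adata_subdir is not None:
        return os.path.join(run_dir, adata_subdir, fname)

    candidates = [
        os.path.join(run_dir, "Inference", "adata", fname),  # new layout
        os.path.join(run_dir, "adata", fname),               # legacy layout
    ]
    for path in candidates:
        if os.path.exists(path):
            return path

    # Nothing found: return the new-layout path so the eventual
    # FileNotFoundError points at the expected location.
    return candidates[0]
```

With this shape, existing Engreitz-lab results and new PerturbNMF runs would both resolve without a symlink, at the cost of one extra filesystem check per call.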