programs_dotplots looks for h5mu at <run>/adata/ but Stage 1 inference writes to <run>/Inference/adata/ #6

@adamklie

Problem

Stage3_Interpretation/A_Plotting/src/k_quality_plots.py:140 defines a helper that builds:

'{output_dir}/{run_name}/adata/cNMF_{k}_{sel_thresh}.h5mu'

But the modern PerturbNMF Stage 1 inference (Stage1_Inference/torch-cNMF/Slurm_Version/torch_cnmf_inference_pipeline.py) writes h5mu files to <run>/Inference/adata/cNMF_*.h5mu (note the Inference/ subdirectory).

So programs_dotplots fails with FileNotFoundError: '<run>/adata/cNMF_30_2_0.h5mu' whenever cNMF_k_selection.py is run on output produced by the new layout.

Why this slipped through

Pre-existing Engreitz-lab cNMF results (from the older cNMF_benchmarking tool) store the h5mu files directly under <run>/adata/, with no Inference/ subdirectory. The programs_dotplots helper was written for that layout and was not updated when PerturbNMF restructured its outputs into <run>/Inference/. We seem to be the first external users running the full new pipeline end-to-end.

Reproduction

# Run inference, then K-selection plot. The latter crashes:
python Stage3_Interpretation/A_Plotting/Slurm_Version/cNMF_k_selection.py \
    --output_directory <out_dir> \
    --run_name <run_name> \
    --eval_folder_name <run_dir>/Evaluation \
    --stability_file <run_dir>/Inference/Inference.k_selection_stats.df.npz \
    --K 30 50 ... \
    --sel_threshs 2.0 \
    --samples all

Stack trace:

File ".../Stage3_Interpretation/A_Plotting/src/k_quality_plots.py", line 150, in programs_dotplots
    mdata = mu.read_h5mu(get_gene_path(output_dir, run_name, k, sel_thresh))
FileNotFoundError: [Errno 2] No such file or directory: '<run>/adata/cNMF_30_2_0.h5mu'

Workaround in place

We've been creating a symlink <run>/adata -> Inference/adata for each dataset. It works, but it has to be repeated for every new run and is easy to forget.
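
For anyone who wants to script the workaround, here is a minimal sketch. The helper name link_legacy_adata is hypothetical and not part of PerturbNMF; it only assumes the two directory names mentioned above.

```python
import os
from pathlib import Path


def link_legacy_adata(run_dir):
    """Create <run_dir>/adata -> Inference/adata if it is missing.

    Hypothetical helper illustrating the workaround. Returns True if a
    link was created, False if <run_dir>/adata already exists.
    """
    run_dir = Path(run_dir)
    link = run_dir / "adata"
    target = run_dir / "Inference" / "adata"
    if link.exists() or link.is_symlink():
        return False
    if not target.is_dir():
        raise FileNotFoundError(f"{target} does not exist")
    # Relative target, so the link still resolves if the run directory is moved.
    os.symlink(Path("Inference") / "adata", link, target_is_directory=True)
    return True
```

Calling it once per run directory before cNMF_k_selection.py should be enough. It is still a stopgap, not a substitute for fixing the path in the plotting code.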

Proposed fix

Update get_gene_path in k_quality_plots.py to use the modern layout:

-        return '{output_dir}/{run_name}/adata/cNMF_{k}_{sel_thresh}.h5mu'.format(
+        return '{output_dir}/{run_name}/Inference/adata/cNMF_{k}_{sel_thresh}.h5mu'.format(

Alternatively, accept the h5mu directory pattern as a CLI argument that defaults to Inference/adata.
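
If older <run>/adata/ results need to keep working, a third option is to try both layouts. The following is a rough sketch only: resolve_h5mu_path and H5MU_SUBDIRS are hypothetical names, not the existing get_gene_path, and sel_thresh is assumed to arrive already formatted the way the file name expects (e.g. "2_0").

```python
from pathlib import Path

# Candidate sub-directories, newest layout first (both taken from this issue).
H5MU_SUBDIRS = ("Inference/adata", "adata")


def resolve_h5mu_path(output_dir, run_name, k, sel_thresh, subdirs=H5MU_SUBDIRS):
    """Return the first existing cNMF h5mu path among the known layouts.

    Hypothetical helper. `sel_thresh` is assumed to be pre-formatted for
    the file name (e.g. "2_0"); no conversion is done here.
    """
    run_dir = Path(output_dir) / run_name
    filename = f"cNMF_{k}_{sel_thresh}.h5mu"
    candidates = [run_dir / sub / filename for sub in subdirs]
    for path in candidates:
        if path.is_file():
            return path
    # List every location tried so the error is actionable.
    tried = ", ".join(str(p) for p in candidates)
    raise FileNotFoundError(f"No h5mu file found; tried: {tried}")
```

The subdirs parameter could also be fed from a CLI argument, which would cover the alternative above without dropping support for the old layout.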
