
compute_fake_perturbation_tests crashes with AttributeError: 'Namespace' object has no attribute 'reference_targets' #2

@adamklie

Description


Summary

Running the U-test calibration script with --compute_fake_perturbation_tests causes the script to crash at line 160 because it references args.reference_targets, which is never defined on the argparse namespace.

Reproduction

python src/Stage2_Evaluation/B_Calibration/Slurm_version/U-test_perturbation_calibration/U-test_perturbation_calibration.py \
    --out_dir <out> \
    --run_name <run> \
    --mdata_guide_path <h5mu> \
    --guide_annotation_path <tsv> \
    --guide_annotation_key "non-targeting" \
    ... \
    --components 30 50 60 80 100 200 250 300 \
    --sel_thresh 2.0 \
    --compute_fake_perturbation_tests

The fake-test loop runs the first iteration successfully (loads the h5mu, picks 6 fake-targeting guides, subsets to NT guides), then crashes when it tries to call compute_perturbation_association:

Processing K=30, sel_thresh=2.0
  Running iteration 1/50
  Found 600 valid non-targeting guides out of 14151 total
Traceback (most recent call last):
  File ".../U-test_perturbation_calibration.py", line 385, in main
    test_stats_fake_df = compute_fake_perturbation_tests()
  File ".../U-test_perturbation_calibration.py", line 160, in compute_fake_perturbation_tests
    reference_targets=args.reference_targets,
AttributeError: 'Namespace' object has no attribute 'reference_targets'. Did you mean: 'reference_gtf_path'?

Root cause

src/Stage2_Evaluation/B_Calibration/Slurm_version/U-test_perturbation_calibration/U-test_perturbation_calibration.py:160:

test_stats_df = compute_perturbation_association(
    mdata_samp,
    prog_key=args.prog_key,
    collapse_targets=True,
    pseudobulk=False,
    reference_targets=args.reference_targets,  # <-- args.reference_targets is undefined
    FDR_method=args.FDR_method,
    n_jobs=-1,
    inplace=False
)

The argparser defines --guide_annotation_key (default ['non-targeting']) but never --reference_targets. The real-test path at lines 44–49 uses a local variable reference_targets, computed from either the annotation TSV or args.guide_annotation_key. The fake-test path was likely copy-pasted from the real-test path, but reference_targets was left pointing at the non-existent namespace attribute instead of the local variable.
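For context, the real-test fallback described above amounts to something like the following sketch. This is a hypothetical reconstruction (the function name `resolve_reference_targets` and the TSV column name `target` are assumptions; the actual code at lines 44–49 may differ):

```python
# Hypothetical sketch of the real-test fallback logic (cf. lines 44-49).
# Assumption: the annotation TSV has a 'target' column; exact names in
# the script may differ.
import argparse

import pandas as pd


def resolve_reference_targets(args: argparse.Namespace) -> list:
    """Derive reference_targets the way the real-test path does."""
    if getattr(args, "guide_annotation_path", None):
        # Derive the reference-target list from the annotation TSV.
        annot = pd.read_csv(args.guide_annotation_path, sep="\t")
        mask = annot["target"].isin(args.guide_annotation_key)
        return sorted(annot.loc[mask, "target"].unique().tolist())
    # Otherwise fall back to the CLI value, e.g. ['non-targeting'].
    return list(args.guide_annotation_key)
```

Reusing a helper like this in both code paths would also prevent the two paths from drifting apart again.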

git blame shows this line has been present since the initial commit (5ca68dc, 2026-04-23), so it predates the "Fix 7 bugs in U-test_perturbation_calibration.py" commit (a37a60b, 2026-05-01). It's an 8th typo in the same file that the fix-7 commit didn't catch.

Why this wasn't caught earlier

This bug has existed since day 1 of the repo and only surfaces when running the new PerturbNMF script end-to-end on a real dataset with --compute_fake_perturbation_tests.

Pre-existing fake-test outputs from the Engreitz lab (e.g., 5_fake_perturbation_association_results.txt-style files) appear to come from the older internal cNMF_benchmarking tool (referenced in commented-out paths inside cNMF_evaluation_pipeline.py, e.g. /oak/.../cNMF_benchmarking/cNMF_benchmarking_pipeline/Evaluation/...). PerturbNMF is a publishable rewrite of that internal tool, and the rewrite introduced typo-class bugs (this one + the 7 fixed in a37a60b) that wouldn't have been triggered by running the original tool. We appear to be the first external users to drive the new PerturbNMF U-test code end-to-end on a fresh dataset, which is why this and adjacent issues are surfacing now rather than being caught during the rewrite.

Proposed fix

Mirror the fallback logic from the real-test path (lines 44–49). If a TSV is provided, derive the reference-target list from it; otherwise fall back to args.guide_annotation_key.

@@ -157,7 +157,7 @@ def compute_fake_perturbation_tests():
                         prog_key=args.prog_key,
                         collapse_targets=True,
                         pseudobulk=False,
-                        reference_targets=args.reference_targets,
+                        reference_targets=args.guide_annotation_key,
                         FDR_method=args.FDR_method,
                         n_jobs=-1,
                         inplace=False

In the fake-test code path, the relabeled NT subset has target ∈ {'non-targeting', 'targeting'} (see line 149), so reference_targets=['non-targeting'] is the right choice. Using args.guide_annotation_key matches the real-test fallback and respects user override.
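To illustrate why ['non-targeting'] is the right reference group, here is a hypothetical mock-up of the relabeling step (cf. line 149). The guide names and the 6/6 split are illustrative only:

```python
# Hypothetical illustration of the fake-test relabeling: a random subset
# of NT guides is relabeled 'targeting'; the rest stay 'non-targeting'.
# reference_targets=['non-targeting'] then designates the control group
# for the U-test.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
nt_guides = pd.DataFrame({"guide": [f"NT_{i}" for i in range(12)]})
nt_guides["target"] = "non-targeting"

# Pick 6 fake-targeting guides, as in the log output above.
fake_idx = rng.choice(nt_guides.index, size=6, replace=False)
nt_guides.loc[fake_idx, "target"] = "targeting"
```

After this step the only labels present are 'non-targeting' and 'targeting', so any reference_targets value other than ['non-targeting'] would silently match nothing.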

Latent issues in the same function (out of scope for this PR)

While debugging, two adjacent issues surfaced. Mentioning here so they can be tracked separately:

  1. Line 391 in main(): when --visualizations is set without --compute_real_perturbation_tests, test_stats_real_df is undefined and pd.concat([test_stats_real_df, test_stats_fake_df], ...) raises NameError. Either guard the concat on args.compute_real_perturbation_tests, or auto-load via load_real_perturbation_tests() if real-test results already exist on disk.

  2. Line 199 in load_real_perturbation_tests(): hardcoded for samp in ['D0', 'D4', 'D7']. This function is currently unreachable from main() (so harmless today), but if (1) is fixed by calling load_real_perturbation_tests(), this hardcode would break for any non-D0/D4/D7 dataset.
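A possible shape for the guard in issue (1) is sketched below. The helper name `collect_test_stats` and the `test_type` column are assumptions for illustration; the real main() may structure this differently:

```python
# Hypothetical guard for latent issue (1): only concatenate the frames
# that were actually computed, instead of referencing an undefined
# test_stats_real_df.
import pandas as pd


def collect_test_stats(args, test_stats_real_df=None, test_stats_fake_df=None):
    frames = []
    if getattr(args, "compute_real_perturbation_tests", False) and test_stats_real_df is not None:
        frames.append(test_stats_real_df.assign(test_type="real"))
    if getattr(args, "compute_fake_perturbation_tests", False) and test_stats_fake_df is not None:
        frames.append(test_stats_fake_df.assign(test_type="fake"))
    if not frames:
        # Nothing was computed; fail loudly instead of raising NameError.
        raise ValueError("No test statistics computed; nothing to visualize.")
    return pd.concat(frames, ignore_index=True)
```

The alternative mentioned above (auto-loading via load_real_perturbation_tests()) would slot into the first branch, but only after the hardcoded sample list in issue (2) is addressed.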

Environment

  • PerturbNMF main @ 8f7c9dd (also reproduces on 4bf662a)
  • Python 3.10, pandas, scipy, multipy, etc.
  • Running on Carter HPC (UCSD) — first end-to-end run of the new pipeline on Huangfu HUES8 datasets.
