cNMF_perturbed_gene_analysis.py crashes when mdata['rna'].X contains raw counts #9

@adamklie

Stage3_Interpretation/A_Plotting/Slurm_Version/cNMF_perturbed_gene_analysis.py:82–91:

if 'X_umap' not in mdata[args.prog_key].obsm:
    import scanpy as sc
    adata_tmp = mdata[args.data_key].copy()
    sc.pp.highly_variable_genes(adata_tmp, n_top_genes=2000, subset=True)
    sc.tl.pca(adata_tmp, n_comps=50)
    sc.pp.neighbors(adata_tmp)
    sc.tl.umap(adata_tmp)
    mdata[args.prog_key].obsm['X_pca'] = adata_tmp.obsm['X_pca']
    mdata[args.prog_key].obsm['X_umap'] = adata_tmp.obsm['X_umap']
    del adata_tmp

When mdata[args.data_key].X contains raw counts (which is what falls out of the standard PerturbNMF Stage 1 inference), sc.pp.highly_variable_genes with the default flavor='seurat' raises:

ValueError: cannot specify integer `bins` when input data contains infinity
  File ".../scanpy/preprocessing/_highly_variable_genes.py", line 343, in _highly_variable_genes_single_batch
    df["mean_bin"] = _get_mean_bins(df["means"], flavor, n_bins)
  File ".../pandas/core/reshape/tile.py", line 246, in cut
    bins = _nbins_to_bins(x_idx, bins, right)

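The flavor='seurat' HVG selection expects log-normalized data: it undoes the assumed log1p with expm1 before computing per-gene means. On raw counts, expm1 overflows for large values (in float32, any count above roughly 88), so the log-mean column contains inf and pd.cut rejects it when asked for an integer number of bins.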
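The failure can be reproduced without scanpy. This is a minimal sketch that assumes flavor='seurat' roughly does the following: it applies expm1, averages each gene over cells, and takes log1p of the mean before binning. That is a simplification of scanpy's internals, not its actual code. The toy float32 matrix is made up for illustration. The final pd.cut call uses an integer bin count, as scanpy's default n_bins=20 does.

```python
import numpy as np
import pandas as pd

# Toy raw counts: 2 cells x 3 genes, float32 as is common in .h5ad files.
X = np.array([[0, 12, 150], [1, 8, 90]], dtype=np.float32)

# Mimic flavor='seurat': undo the assumed log1p, average per gene, re-log.
# exp(90) and exp(150) exceed the float32 maximum (~3.4e38), so gene 3 -> inf.
with np.errstate(over="ignore"):
    X_exp = np.expm1(X)
log_means = np.log1p(X_exp.mean(axis=0))
print(log_means)  # last entry is inf

# Binning the means into an integer number of bins then fails.
try:
    pd.cut(pd.Series(log_means), bins=20)
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```

Only the highly expressed gene overflows. The other two means stay finite, and a single inf is enough to make pd.cut raise.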

Suggested fixes (any one)

  1. Normalize and log-transform before the HVG call: run sc.pp.normalize_total(adata_tmp, target_sum=1e4) and then sc.pp.log1p(adata_tmp) ahead of sc.pp.highly_variable_genes(...).
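  2. Use flavor='seurat_v3', which is designed for raw counts. It requires the scikit-misc package, and PCA would still run on unnormalized counts unless normalization is added as well.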
  3. Document that mdata['rna'].X must be log-normalized by the time it reaches this script, and either error out cleanly with that message or filter zero-count genes before HVG.
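Fix 3, and a guarded version of fix 1, both need a way to tell raw counts from log-normalized values. Here is a minimal sketch. It assumes raw counts are the only integer-valued input this script will see. The helpers looks_like_raw_counts and require_log_normalized are hypothetical and are not part of PerturbNMF or scanpy. The 10,000-value sample size is an arbitrary choice.

```python
import numpy as np
import scipy.sparse as sp


def looks_like_raw_counts(X, n_check=10_000):
    """Heuristic: True if the first n_check non-zero entries of X are all
    non-negative integers, as expected for raw UMI counts.

    Hypothetical helper; not part of PerturbNMF or scanpy.
    """
    # For CSR/CSC matrices, .data holds the stored entries.
    data = X.data if sp.issparse(X) else np.asarray(X).ravel()
    values = data[data != 0][:n_check]
    if values.size == 0:
        # An all-zero matrix gives no evidence either way.
        return False
    return bool(np.all(values >= 0) and np.all(np.mod(values, 1) == 0))


def require_log_normalized(X):
    """Fail early with an actionable message instead of the pd.cut error."""
    if looks_like_raw_counts(X):
        raise ValueError(
            "X appears to contain raw counts; run sc.pp.normalize_total and "
            "sc.pp.log1p before sc.pp.highly_variable_genes, or use "
            "flavor='seurat_v3'."
        )
```

In the script, require_log_normalized(adata_tmp.X) could go right after adata_tmp = mdata[args.data_key].copy() (fix 3). For fix 1, the same check could instead gate the sc.pp.normalize_total / sc.pp.log1p calls, so that input which is already normalized is not log-transformed twice. The check is only a heuristic. It inspects a sample of non-zero values, and it would also accept integer-valued data that is not UMI counts.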
