Stage3_Interpretation/A_Plotting/Slurm_Version/cNMF_perturbed_gene_analysis.py:82–91:
if 'X_umap' not in mdata[args.prog_key].obsm:
    import scanpy as sc
    adata_tmp = mdata[args.data_key].copy()
    sc.pp.highly_variable_genes(adata_tmp, n_top_genes=2000, subset=True)
    sc.tl.pca(adata_tmp, n_comps=50)
    sc.pp.neighbors(adata_tmp)
    sc.tl.umap(adata_tmp)
    mdata[args.prog_key].obsm['X_pca'] = adata_tmp.obsm['X_pca']
    mdata[args.prog_key].obsm['X_umap'] = adata_tmp.obsm['X_umap']
    del adata_tmp
When mdata[args.data_key].X contains raw counts (which is what falls out of the standard PerturbNMF Stage 1 inference), sc.pp.highly_variable_genes with the default flavor='seurat' raises:
ValueError: cannot specify integer `bins` when input data contains infinity
  File ".../scanpy/preprocessing/_highly_variable_genes.py", line 343, in _highly_variable_genes_single_batch
    df["mean_bin"] = _get_mean_bins(df["means"], flavor, n_bins)
  File ".../pandas/core/reshape/tile.py", line 246, in cut
    bins = _nbins_to_bins(x_idx, bins, right)
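The flavor='seurat' HVG selection expects log1p-transformed data: before computing per-gene means it undoes the log with expm1. On raw counts, expm1 of large values overflows to inf (float32 overflows for inputs above roughly 88.7), so df["means"] contains inf and pd.cut, which is called with an integer number of bins, rejects the input.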
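A minimal numpy/pandas sketch of the failure, assuming the seurat path boils down to "expm1, per-gene mean, log1p of the mean, then pd.cut into bins". The helpers seurat_style_log_means and bin_means are hypothetical simplifications for illustration, not scanpy's actual code:

```python
import numpy as np
import pandas as pd


def seurat_style_log_means(X):
    """Hypothetical, simplified mimic of the flavor='seurat' mean step:
    undo log1p with expm1, take per-gene means, then log1p the means."""
    with np.errstate(over="ignore"):  # silence the float overflow warning
        undone = np.expm1(X)
    means = undone.mean(axis=0)
    return np.log1p(means)


def bin_means(log_means, n_bins=20):
    """Bin the per-gene means with an integer bin count, as the traceback shows."""
    return pd.cut(pd.Series(log_means), bins=n_bins)


# Toy cells x genes matrix: gene 0 is all zeros, gene 2 has large raw counts.
raw = np.array([[0, 3, 150], [0, 1, 90]], dtype=np.float32)
lognorm = np.log1p(raw)

print(seurat_style_log_means(raw))      # gene 2 is inf: expm1(90) overflows float32
print(bin_means(seurat_style_log_means(lognorm)))  # finite means bin without error
```

Calling bin_means(seurat_style_log_means(raw)) raises the same ValueError as in the report. Note that the all-zero gene stays finite (its mean is 0), so in this sketch the overflow on large counts, not the zero genes, is what introduces the inf.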
Suggested fixes (any one):
- Normalize before the HVG call: run sc.pp.normalize_total(adata_tmp, target_sum=1e4); sc.pp.log1p(adata_tmp) before sc.pp.highly_variable_genes(...).
- Use flavor='seurat_v3', which is designed for raw counts (it requires the scikit-misc package).
- Document that mdata['rna'].X must be log-normalized when it reaches this script, and either error out cleanly with that message or filter zero-count genes before HVG.
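A sketch combining the first and third suggestions: detect raw-looking input and normalize it before the HVG call. The names looks_like_raw_counts and add_umap_if_missing are hypothetical, and the "all nonzero values are non-negative integers" test is a heuristic assumption, not something PerturbNMF defines. mdata is assumed to be the MuData object the script already uses.

```python
import numpy as np
import scipy.sparse as sp


def looks_like_raw_counts(X, n_check=10_000):
    """Heuristic (assumption): treat X as raw counts if a sample of its
    nonzero entries are all non-negative integers."""
    values = X.data if sp.issparse(X) else np.asarray(X).ravel()
    values = values[values != 0][:n_check]
    if values.size == 0:
        return False  # an empty/all-zero matrix says nothing either way
    return bool(np.all(values >= 0) and np.all(np.mod(values, 1) == 0))


def add_umap_if_missing(mdata, data_key, prog_key, n_top_genes=2000, n_comps=50):
    """Hypothetical rewrite of lines 82–91 with normalize + log1p applied
    when the input looks like raw counts."""
    if "X_umap" in mdata[prog_key].obsm:
        return
    import scanpy as sc  # imported lazily, as in the original script

    adata_tmp = mdata[data_key].copy()
    if looks_like_raw_counts(adata_tmp.X):
        # flavor='seurat' expects log1p-transformed data, so normalize first
        sc.pp.normalize_total(adata_tmp, target_sum=1e4)
        sc.pp.log1p(adata_tmp)
    sc.pp.highly_variable_genes(adata_tmp, n_top_genes=n_top_genes, subset=True)
    sc.tl.pca(adata_tmp, n_comps=n_comps)
    sc.pp.neighbors(adata_tmp)
    sc.tl.umap(adata_tmp)
    mdata[prog_key].obsm["X_pca"] = adata_tmp.obsm["X_pca"]
    mdata[prog_key].obsm["X_umap"] = adata_tmp.obsm["X_umap"]
```

In the script this would be called as add_umap_if_missing(mdata, args.data_key, args.prog_key). Normalizing the temporary copy keeps mdata[args.data_key].X untouched; if silent normalization is undesirable, the raw-count check could instead raise an error explaining that log-normalized input is required.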