Version 2.0 | RNA-seq & TempO-Seq Analysis Platform
- What is ARACRA?
- What You'll Need
- First-Time Setup: Linux
- First-Time Setup: Windows (via WSL2)
- Starting the App
- Web Interface Walkthrough
- Your Metadata File
- Understanding the Two-Phase Design
- Bioinformatics Tools Reference
- Output Files & Folders
- Troubleshooting
ARACRA is a Streamlit-based graphical user interface that integrates a complete RNA-seq processing pipeline with downstream differential expression (DEG) and dose-response (DRomics) analysis.
What it does, start to finish:
Raw reads (SRA or local FASTQ)
↓ fastp — quality trimming & adapter removal
↓ STAR / HISAT2 — alignment to the human genome (hg38)
↓ QC suite — flagstat, RSeQC, Picard, Qualimap, MultiQC
↓ featureCounts / Salmon — gene-level quantification
↓ [you review QC, decide which samples to keep]
↓ DESeq2 — differential expression analysis
↓ DRomics / BMD — dose-response modelling & benchmark dose
It also supports TempO-Seq (BioSpyder targeted sequencing) alongside standard whole-transcriptome RNA-seq.
| Resource | Minimum | Recommended |
|---|---|---|
| RAM | 16 GB | 64 GB+ |
| CPU cores | 4 | 16+ |
| Free disk space | 100 GB | 500 GB+ |
| Internet connection | Required for first setup | — |
RAM note: STAR alignment requires ~32 GB RAM to load the human genome index. If your machine has less than 32 GB, the setup script will automatically recommend HISAT2, which only needs ~8 GB RAM. You won't lose any important functionality — HISAT2 is a solid, widely used aligner.
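The RAM rule above can be sketched in a few lines of Python. This is only an illustration, not the logic inside setup.sh: the function names are hypothetical, and the 32 GB cutoff is simply the figure quoted in this guide.

```python
import os

STAR_MIN_RAM_GB = 32  # assumption: the threshold quoted in this guide


def total_ram_gb():
    """Return total physical RAM in GB, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        pages = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. a platform without os.sysconf
    return page_size * pages / 1024**3


def recommend_aligner(ram_gb):
    """Pick STAR when enough RAM is available, otherwise fall back to HISAT2."""
    if ram_gb is not None and ram_gb >= STAR_MIN_RAM_GB:
        return "STAR"
    return "HISAT2"


print(recommend_aligner(total_ram_gb()))
```

A machine sold as "32 GB" usually reports slightly less than that to the operating system, so a real check may need some tolerance around the cutoff.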
- Linux (Ubuntu 20.04+ or any modern distro) — runs natively
- Windows 10/11 — runs through WSL2 (Windows Subsystem for Linux)
- macOS — technically possible but not officially tested
This section assumes you're starting from a fresh Linux machine with no special software installed. Everything else will be installed automatically.
Open a terminal and run:
# If you haven't already, install git
sudo apt update && sudo apt install -y git
# Clone (or download and extract) the ARACRA folder
cd ~/Downloads
# (the ARACRA folder should already be here if you received it directly)
cd ARACRA
The setup downloads about 10–15 GB of reference genome data. Make sure you have room:
df -h ~
If you want to store the databases somewhere other than ~/databases (e.g. a large external drive), note the path; you'll pass it to setup with --db-dir.
bash setup.sh
The setup script will:
- Detect your CPU count, RAM, and available disk
- Automatically install Miniforge3 (a lightweight conda) if conda isn't already present
- Create a conda environment called test_ARACRA with every tool pre-installed
- Download the hg38 human genome, GTF annotation, transcriptome FASTA, and all annotation files
- Build alignment indexes (STAR takes 30–45 min; HISAT2 index is downloaded pre-built)
- Build the Salmon index with genome decoys
- Install all required R packages including DESeq2, DRomics, edgeR, and pathway databases
Expect this to take 1–3 hours the first time, mostly waiting for downloads and index building.
# Skip building alignment indexes (if you already have them elsewhere)
bash setup.sh --skip-index
# Skip downloading FastQ Screen genomes (saves ~14 GB, disables contamination screening)
bash setup.sh --skip-screen
# Store databases on a different drive
bash setup.sh --db-dir=/mnt/bigdrive/databases
# Combine flags
bash setup.sh --skip-screen --db-dir=/mnt/bigdrive/databases
After setup completes, you should see a summary like:
✔ nextflow version 24.x.x
✔ STAR 2.7.x
✔ hisat2 2.2.x
✔ samtools 1.x
✔ fastp 0.23.x
✔ featureCounts 2.x
✔ salmon 1.10.x
✔ multiqc 1.x
✔ Rscript 4.x
✔ streamlit 1.x
If any tool shows ✘ NOT FOUND, see the Troubleshooting section.
Windows doesn't run Linux tools natively, but WSL2 (Windows Subsystem for Linux 2) gives you a full Linux environment right inside Windows — no virtual machine needed. ARACRA runs inside WSL2 exactly like it would on a Linux machine.
Open PowerShell as Administrator (right-click the Start menu → "Windows PowerShell (Admin)") and run:
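wsl --install
This installs WSL2 with Ubuntu. Restart your computer when prompted.
If you already have WSL1, make WSL2 the default for new distributions and convert your existing one (replace Ubuntu with the name shown by wsl -l -v):
wsl --set-default-version 2
wsl --set-version Ubuntu 2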
After restart, search "Ubuntu" in the Start menu and open it. The first time it runs, it will ask you to create a username and password — choose something simple, since you'll type the password occasionally.
# Update the package list
sudo apt update && sudo apt upgrade -y
# Install some basics
sudo apt install -y git curl wget unzip build-essential
By default, WSL2 caps itself at 50% of your system RAM. If you want to use STAR (which needs ~32 GB), you may need to raise this limit.
Create or edit the file C:\Users\YourName\.wslconfig (replace YourName with your Windows username):
[wsl2]
memory=48GB
processors=12
swap=8GB
Then restart WSL from PowerShell:
wsl --shutdown
Re-open Ubuntu. Run free -h to confirm the RAM is now available.
WSL2 has its own filesystem, separate from Windows. Your Windows C:\ drive is mounted at /mnt/c/ inside WSL. For best performance, keep your data inside WSL (e.g. ~/Downloads/ARACRA), not on the Windows drive — filesystem crossover is slow.
# Copy files from Windows into WSL (if needed)
cp -r /mnt/c/Users/YourName/Downloads/ARACRA ~/Downloads/ARACRA
cd ~/Downloads/ARACRA
bash setup.sh
Everything from this point forward is identical to the Linux instructions. The setup script handles everything automatically.
When you launch the app inside WSL2, the Streamlit server starts at http://localhost:8501. Open any Windows browser (Chrome, Edge, Firefox) and go to:
http://localhost:8501
This works because WSL2 forwards localhost ports to Windows automatically.
Tip: If the page doesn't load, try http://127.0.0.1:8501 instead.
Each time you want to use ARACRA (after the one-time setup is done):
cd ~/Downloads/ARACRA
bash run_app.sh
You'll see output like:
You can now view your Streamlit app in your browser.
Local URL: http://localhost:8501
Open that URL in your browser and leave the terminal running. The pipeline runs as a detached background process, so it continues even if you close the browser tab. You can come back hours later, re-open the app, and see your results.
The ARACRA interface has two main areas: the sidebar on the left for configuration, and four tabs in the main area for different stages of your analysis.
The sidebar is your control panel. It stays visible from every tab.
At the very top, a small box shows your machine's live stats:
System
RAM: 62.8 GB (58.2 GB free)
CPUs: 16
Disk: 420 GB free
Aligner: STAR
The Aligner line tells you which aligner the setup script recommended based on your RAM. You can override this in the Full Pipeline tab.
These paths are auto-populated from your setup. You usually don't need to touch them unless you want to use a custom reference.
| Field | What it points to |
|---|---|
| STAR index | Pre-built STAR genome index (~28 GB) |
| HISAT2 index | Pre-built HISAT2 splice-aware index |
| Salmon index | Selective alignment index with decoys |
| GTF annotation | GENCODE v44 gene annotation |
| RefSeq BED | Gene body coordinates for RSeQC |
| Housekeeping BED | Housekeeping gene coordinates for coverage QC |
| refFlat | Picard RNA metrics reference |
| FastQ Screen conf | Contamination screen genome list |
- Output directory — where results, count matrices, QC reports, and analysis outputs are saved. Default: ~/aracra_star_work/results
- Work directory — Nextflow's internal scratch space for intermediate files. Default: ~/aracra_star_work/work
- Profile — standard runs everything locally on your machine. slurm submits jobs to an HPC cluster (requires a configured SLURM environment).
- Library layout — SE for single-end reads, PE for paired-end reads.
- Resume previous run — Keep this checked. It tells Nextflow to skip steps that already completed successfully, so if something fails partway through, you restart without repeating hours of alignment work.
- RNA-seq (whole transcriptome) — standard mode; aligns to the full hg38 genome
- TempO-Seq (targeted) — for BioSpyder TempO-Seq data; aligns to a probe reference instead of the whole genome. When selected, a file uploader appears for your BioSpyder manifest CSV.
Checkboxes to enable/disable the analysis steps that run after quantification:
- DESeq2 — differential expression analysis (treatment vs. control)
- DRomics / BMD — dose-response modelling and benchmark dose calculation
Click ▶ Check Tools to verify that every required tool is properly installed and accessible. A green checkmark means it's working; a red ✘ means something is wrong.
This is the first thing to check if your pipeline fails unexpectedly.
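If you want to run a similar check from a terminal, the sketch below looks for each executable on your PATH. It is not how the Check Tools button is implemented; the tool list is taken from the setup summary earlier in this guide, and it only works as intended with the test_ARACRA environment activated.

```python
import shutil

# Executable names as they appear in the setup summary in this guide.
REQUIRED_TOOLS = [
    "nextflow", "STAR", "hisat2", "samtools", "fastp",
    "featureCounts", "salmon", "multiqc", "Rscript", "streamlit",
]


def check_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to True if an executable is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


for tool, found in check_tools().items():
    print("OK       " if found else "NOT FOUND", tool)
```

Finding an executable on PATH does not prove it runs correctly, so treat a clean result as a first check only.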
This is where most users spend their time. It runs the complete pipeline from raw reads to quantified counts, with an optional QC review gate before statistical analysis.
Choose where your sequencing data comes from:
- Download from SRA — enter SRA accession numbers (e.g. SRR12345678) directly in the app; the pipeline downloads them automatically using prefetch and fasterq-dump
- Local raw FASTQs — you already have FASTQ files; provide the folder path
- Local pre-trimmed FASTQs — your reads are already trimmed with fastp/Trimmomatic; skip the trimming step
Upload your sample metadata as a CSV or Excel file. This tells the pipeline which samples belong to which treatment groups, what doses were used, and which samples are controls.
At minimum, your file needs columns for:
| Column | Purpose |
|---|---|
| Sample_Name | SRR accession or file basename |
| Treatment | Chemical or treatment name |
| Dose | Dose level (numeric, in µM or your units) |
| Type | treatment or control |
The app auto-detects common column names (e.g. sample, chemical, conc, condition) and maps them. A column-mapping interface appears so you can confirm or correct the mapping.
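To make the idea of column auto-detection concrete, here is a minimal sketch of alias matching. The alias table is hypothetical: only sample, chemical, conc, and condition are named in this guide, and which canonical column each one maps to (in particular condition to Type) is an assumption, not the app's actual rule set.

```python
# Hypothetical alias table; the pairing of aliases to columns is assumed.
ALIASES = {
    "Sample_Name": {"sample_name", "sample"},
    "Treatment": {"treatment", "chemical"},
    "Dose": {"dose", "conc"},
    "Type": {"type", "condition"},
}


def map_columns(header):
    """Return {canonical_name: original_column} for recognised columns."""
    mapping = {}
    for column in header:
        key = column.strip().lower()
        for canonical, names in ALIASES.items():
            if key in names and canonical not in mapping:
                mapping[canonical] = column
    return mapping


def missing_columns(mapping):
    """List required canonical columns that were not matched."""
    return [name for name in ALIASES if name not in mapping]


mapping = map_columns(["sample", "Chemical", "conc", "condition", "notes"])
print(mapping)
print(missing_columns(mapping))
```

Anything left in the missing list is what a confirmation screen would ask you to assign by hand.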
Choose between STAR and HISAT2:
- STAR — faster, more accurate for standard RNA-seq; needs ~32 GB RAM
- HISAT2 — uses only ~8 GB RAM; excellent for machines with less memory
Toggle additional QC steps on or off:
| Module | What it does | Cost |
|---|---|---|
| FastQ Screen | Screens for non-human contamination | Moderate time |
| Read distribution | RSeQC read distribution across gene features | Fast |
| Qualimap | Deep BAM QC including RNA-seq metrics | Slow |
| Gene body coverage (housekeeping) | 5'→3' coverage uniformity on stable genes | Fast |
| Gene body coverage (full) | Coverage across all genes | Moderate |
| Picard RNA metrics | Ribosomal fraction, 5'/3' bias | Fast |
| Salmon (alignment-based) | Transcript-level quantification | Moderate |
| Salmon (pseudoalignment) | Ultra-fast pseudo-mapping | Fast |
- featureCounts — gene-level counts from BAM files; enabled by default
- Salmon (aligned) — uses the BAM file for accurate transcript quantification
- Infer strandedness — auto-detects library strandedness using RSeQC; recommended
- ▶ Start Pipeline — launches Phase 1 (preprocessing + QC + quantification)
- ⏹ Stop — gracefully terminates the running pipeline
- Progress bars — show per-process completion (Download → Fastp → STAR → QC → featureCounts → MultiQC)
- Live log — streams the Nextflow log in real time; last 80 lines shown
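If you prefer to read the log outside the app, the "last 80 lines" view is easy to reproduce. The sketch below is an illustration only (the function name is hypothetical); it writes a throwaway log file so that it runs anywhere.

```python
import os
import tempfile
from collections import deque


def tail_lines(path, n=80):
    """Return the last n lines of a text file, reading it line by line."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return list(deque(handle, maxlen=n))


# Demo on a throwaway 200-line file standing in for the pipeline log.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.writelines(f"line {i}\n" for i in range(200))

last = tail_lines(tmp.name)
print(len(last), last[0].strip(), last[-1].strip())
os.remove(tmp.name)
```

Point tail_lines at ~/aracra_star_work/pipeline.log (with the ~ expanded) to see the same window the app shows.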
After Phase 1 completes, the app shows a QC review panel before any statistical analysis runs. This is intentional — you get to see the data quality and decide which samples to include.
The panel shows two layers of outlier detection:
Layer 1 — Alignment QC (Grubbs test + IQR fences):
- Flags samples with abnormally low mapping rates, too few reads, or excessive multi-mapping
- A sample is only flagged if both the Grubbs test and IQR method agree
Layer 2 — Expression PCA:
- Runs PCA on the count matrix and flags samples that cluster far from their treatment group
- A scatter plot shows PC1 vs PC2, with outliers marked
Flagged samples are pre-ticked for exclusion, but you make the final call. Untick a sample to keep it. Tick a sample that wasn't auto-flagged if you want to remove it manually.
Once you're happy with your selection, click ▶ Run Phase 2 (DESeq2 / DRomics) to launch the statistical analysis.
This tab lets you run just the statistical analysis — useful when you already have a count matrix from a previous run (or from another pipeline entirely).
Upload a count matrix CSV/TSV directly. The first column should be gene IDs; remaining columns are samples. No alignment needed.
Before running DESeq2, you can run a standalone PCA QC check on your count matrix. This generates the PCA plot and outlier report so you can review sample quality before committing to the full analysis.
Dropdowns to select which treatment group to compare against which control. The run is named automatically (e.g. BisphenolA_vs_Control).
Click ▶ Run DESeq2 or ▶ Run DRomics (or both). The analysis runs in a few minutes for typical datasets.
Fine-tune the statistical parameters before running your analysis. Sensible defaults are pre-filled; most users won't need to change these.
| Parameter | Default | Meaning |
|---|---|---|
| FDR (strict) | 0.01 | Adjusted p-value cutoff for the "stringent" DEG list |
| FDR (relaxed) | 0.05 | Adjusted p-value cutoff for the main DEG list |
| log₂ fold-change threshold | 1.0 | Minimum fold-change magnitude (log₂ scale) for a gene to be called DE |
| Read count threshold | 1,000,000 | Samples below this read count are flagged as low-quality |
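The FDR and fold-change thresholds in the table above combine as a simple two-condition filter. The sketch below applies them to a tiny made-up table; the column names padj and log2FoldChange are DESeq2's standard output names, but the exact layout of All_Results.csv and whether the fold-change cutoff is inclusive are assumptions.

```python
import csv
import io

# Made-up example rows; gene IDs are placeholders.
EXAMPLE = """gene,log2FoldChange,padj
GENE_A,2.3,0.001
GENE_B,-1.4,0.03
GENE_C,0.4,0.0001
GENE_D,1.8,0.2
GENE_E,-3.0,NA
"""


def filter_degs(rows, fdr=0.05, lfc=1.0):
    """Split rows into up- and down-regulated genes passing both cutoffs."""
    up, down = [], []
    for row in rows:
        try:
            padj = float(row["padj"])
            log2fc = float(row["log2FoldChange"])
        except ValueError:
            continue  # DESeq2 reports NA for genes it could not test
        if padj < fdr and abs(log2fc) >= lfc:
            (up if log2fc > 0 else down).append(row["gene"])
    return up, down


up, down = filter_degs(csv.DictReader(io.StringIO(EXAMPLE)))
print(up, down)
```

Passing fdr=0.01 gives the stricter list described by the FDR (strict) parameter.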
| Parameter | Default | Meaning |
|---|---|---|
| FDR (gene selection) | 0.05 | Significance threshold for selecting dose-responsive genes |
| Model selection criterion | AICc | Information criterion for choosing between dose-response models |
| Transformation method | auto | Data transformation before fitting (auto, log, sqrt, none) |
| BMD z-factor | 1.0 | Number of standard deviations from baseline for BMD definition |
| BMD x-factor | 10.0 | Percentage change from baseline for BMD definition (alternative) |
| Bootstrap iterations | 1000 | Iterations for BMD confidence interval estimation |
| Use MSigDB pathways | Yes | Include MSigDB hallmark/KEGG/GO pathway sets for tPOD calculation |
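The z-factor and x-factor each define a benchmark response (BMR) level, and the BMD is the dose at which the fitted curve reaches that level. The sketch below follows the definitions in the DRomics documentation (BMR = y0 ± z·SD and BMR = y0 ± (x/100)·y0) and solves for the BMD on a straight-line model only; the function names and numbers are illustrative, and real fits use the model families DRomics selects.

```python
def bmr_levels(y0, sd, z=1.0, x=10.0, increasing=True):
    """Return the (z-SD, x-fold) benchmark response levels.

    y0 is the modelled signal at dose 0 and sd the residual standard deviation.
    """
    sign = 1.0 if increasing else -1.0
    bmr_zsd = y0 + sign * z * sd
    bmr_xfold = y0 + sign * (x / 100.0) * y0
    return bmr_zsd, bmr_xfold


def bmd_linear(y0, slope, bmr):
    """Dose at which the line y = y0 + slope * dose reaches bmr."""
    return (bmr - y0) / slope


bmr_zsd, bmr_xfold = bmr_levels(y0=8.0, sd=0.5)
print(bmd_linear(8.0, slope=0.25, bmr=bmr_zsd))    # dose reaching y0 + 1 SD
print(bmd_linear(8.0, slope=0.25, bmr=bmr_xfold))  # dose reaching y0 + 10 %
```

With these numbers the z-SD definition gives the lower BMD, which shows why the two definitions can rank genes differently.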
These filters clean up BMD estimates before calculating the tPOD:
| Filter | Default | Purpose |
|---|---|---|
| Remove BMDs > max dose | On | Removes BMDs that exceed the highest tested dose (extrapolation) |
| Extrapolation factor | 10× | Flags BMDs more than 10× below the lowest tested dose |
| BMDU/BMDL ratio | 40 | Removes genes with very wide confidence intervals (uncertain BMDs) |
| Min fold-change | 0.0 | Optional: remove genes with small effect sizes |
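The first three filters in the table can be read as the following sketch. The field names (bmd, bmdl, bmdu) and the function are hypothetical, and the optional fold-change filter is left out because it is off by default; treat this as one plausible reading of the defaults rather than the pipeline's code.

```python
def apply_bmd_filters(genes, doses, max_ratio=40.0, extrap_factor=10.0):
    """Apply the default BMD filters; return (kept_ids, flagged_low_ids)."""
    tested = [d for d in doses if d > 0]  # ignore the control dose
    lowest, highest = min(tested), max(tested)
    kept, flagged_low = [], []
    for gene in genes:
        if gene["bmd"] > highest:
            continue  # beyond the tested range: extrapolation
        if gene["bmdl"] <= 0 or gene["bmdu"] / gene["bmdl"] > max_ratio:
            continue  # confidence interval missing or too wide
        kept.append(gene["id"])
        if gene["bmd"] < lowest / extrap_factor:
            flagged_low.append(gene["id"])  # far below the lowest tested dose
    return kept, flagged_low


genes = [
    {"id": "g1", "bmd": 0.5, "bmdl": 0.2, "bmdu": 1.2},
    {"id": "g2", "bmd": 25.0, "bmdl": 10.0, "bmdu": 60.0},
    {"id": "g3", "bmd": 2.0, "bmdl": 0.1, "bmdu": 8.0},
    {"id": "g4", "bmd": 0.005, "bmdl": 0.002, "bmdu": 0.02},
]
kept, flagged_low = apply_bmd_filters(genes, doses=[0, 0.1, 1, 10])
print(kept, flagged_low)
```

Here g2 is dropped for exceeding the top dose, g3 for a BMDU/BMDL ratio of 80, and g4 is kept but flagged as a low-dose extrapolation.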
Once a run completes, the Results tab shows a history of all your analysis runs and their outputs.
- Metric cards — samples analyzed, total genes tested, DEGs at FDR < 0.05, upregulated count, downregulated count
- PCA plot — batch-corrected VST-normalized PCA showing sample clustering
- Volcano plot — fold-change vs. significance; colored by up/down/non-significant
- DEG table — top differentially expressed genes with fold-change, p-value, and adjusted p-value
- Download buttons — export All Results, Filtered DEGs, Upregulated, or Downregulated gene lists as CSV
- Metric cards — genes analyzed, dose-responsive genes, models fitted, median gene BMD, tPOD
- Model quality panel — median pseudo-R², signal-to-noise ratio, percentage of genes with R² > 0.5
- tPOD highlight box — the transcriptomic point of departure with the driving pathway
- All tPOD methods table — compares tPOD estimates from multiple calculation methods (NTP 2018, median, Q25, etc.)
- tPOD by gene set collection — breakdown by KEGG, GO Biological Process, GO Molecular Function, Hallmarks
- Top 5 most sensitive pathways — the pathways with the lowest median BMD
- Dose-response curves — fitted curves for top responding genes
- BMD distribution — histogram of all gene-level BMD estimates
- Pathway sensitivity plot — ranked pathway tPOD values
- Download buttons — BMD results, top sensitive genes, dose-response models, pathway summary, model quality metrics
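The "most sensitive pathways" ranking is, in essence, a median over the BMDs of each pathway's member genes. The sketch below shows that calculation on made-up data; the names are hypothetical, and a real analysis would typically also require a minimum number of genes per pathway, which is omitted here.

```python
from statistics import median


def rank_pathways(gene_bmd, pathways, top_n=5):
    """Rank pathways by the median BMD of their member genes, lowest first."""
    ranked = []
    for name, members in pathways.items():
        values = [gene_bmd[g] for g in members if g in gene_bmd]
        if values:  # skip pathways with no dose-responsive genes
            ranked.append((name, median(values)))
    ranked.sort(key=lambda item: item[1])
    return ranked[:top_n]


gene_bmd = {"g1": 0.5, "g2": 1.5, "g3": 4.0, "g4": 0.25}
pathways = {
    "pathway_A": ["g1", "g2", "g3"],
    "pathway_B": ["g4", "g1"],
    "pathway_C": ["g9"],
}
print(rank_pathways(gene_bmd, pathways))
```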
The metadata file is the key input that connects your samples to their experimental context. Get this right and everything else follows.
A CSV or Excel file with one row per sample:
Sample_Name,Treatment,Dose,Type,Batch
SRR12345678,Bisphenol A,0.1,treatment,1
SRR12345679,Bisphenol A,1.0,treatment,1
SRR12345680,Bisphenol A,10.0,treatment,1
SRR12345681,Control,0,control,1
SRR12345682,Control,0,control,1
SRR12345683,Control,0,control,1
- Sample_Name — must match your SRR accession exactly (for SRA downloads), or the FASTQ filename without the .fastq.gz extension (for local files)
- Treatment — the chemical or treatment name. Identical strings = same group. Spaces are fine.
- Dose — numeric. Use the same units throughout (µM is conventional). Controls should have dose = 0.
- Type — treatment for treated samples, control for vehicle/untreated controls
- Batch (optional) — integer batch number if samples were processed in multiple batches. DESeq2 will include batch as a covariate in the design.
- Column names are flexible — the app will auto-detect sample, chemical, conc, group, etc. and map them. You'll see a confirmation screen before the pipeline starts.
- If you have multiple treatment groups, include all of them in one file. You'll select the specific treatment vs. control comparison when running DESeq2.
- Dose values for DRomics must be numeric and monotonically increasing. Categorical dose labels won't work.
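Before uploading, you can sanity-check a metadata file against the rules above. The sketch below is a hypothetical pre-flight check, not part of ARACRA; it assumes the canonical column names from this guide and covers only the rules stated here (required columns, numeric dose, allowed Type values, controls at dose 0).

```python
import csv
import io

REQUIRED = ("Sample_Name", "Treatment", "Dose", "Type")


def validate_metadata(text):
    """Return a list of problems found in metadata CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        kind = (row["Type"] or "").strip().lower()
        if kind not in ("treatment", "control"):
            problems.append(f"line {line_no}: Type must be treatment or control")
        try:
            dose = float(row["Dose"] or "")
        except ValueError:
            problems.append(f"line {line_no}: Dose is not numeric")
            continue
        if kind == "control" and dose != 0:
            problems.append(f"line {line_no}: control samples should have Dose 0")
    return problems


GOOD = """Sample_Name,Treatment,Dose,Type,Batch
SRR12345678,Bisphenol A,0.1,treatment,1
SRR12345681,Control,0,control,1
"""
BAD = """Sample_Name,Treatment,Dose,Type
SRR12345678,Bisphenol A,high,treatment
SRR12345681,Control,5,control
"""
print(validate_metadata(GOOD))
print(validate_metadata(BAD))
```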
ARACRA deliberately splits the pipeline into two phases with a human review step in between.
Runs automatically without intervention:
- Download raw reads from SRA (if needed)
- Trim adapters and low-quality bases with fastp
- Screen for contamination with FastQ Screen
- Align to hg38 with STAR or HISAT2
- Collect alignment statistics (flagstat, idxstats)
- Detect library strandedness
- Measure gene body coverage
- Run Picard and/or Qualimap RNA-seq QC
- Quantify gene expression (featureCounts, Salmon)
- Compile everything into a MultiQC report
After Phase 1, the pipeline pauses and shows you the quality metrics. Two automated algorithms scan for outliers:
- Grubbs test — detects statistical outliers in the mapping rate distribution
- IQR method — flags samples outside 1.5× IQR from Q1/Q3
A sample is only auto-flagged if both methods agree, reducing false positives. You see the PCA plot of all samples and a per-sample table. Pre-ticked samples are the auto-detected outliers.
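The "both methods must agree" rule can be illustrated as follows. This is a simplified sketch, not ARACRA's implementation: it tests only the single most extreme value, the function names are hypothetical, and the Grubbs critical value is passed in by the caller because it depends on sample size and significance level (the 1.89 used here is illustrative; look up the value for your own n and alpha).

```python
from statistics import mean, quantiles, stdev


def iqr_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return {i for i, v in enumerate(values) if v < low or v > high}


def grubbs_extreme(values):
    """Return (index, G) for the value farthest from the mean."""
    centre, spread = mean(values), stdev(values)
    index = max(range(len(values)), key=lambda i: abs(values[i] - centre))
    return index, abs(values[index] - centre) / spread


def flag_outliers(values, g_crit):
    """Flag the most extreme sample only if Grubbs and IQR both agree."""
    index, g = grubbs_extreme(values)
    if g > g_crit and index in iqr_outliers(values):
        return {index}
    return set()


mapping_rates = [92.1, 91.8, 93.0, 92.5, 91.9, 60.0]  # made-up percentages
print(flag_outliers(mapping_rates, g_crit=1.89))
```

Requiring agreement means a sample that is merely at the edge of a tight distribution is not flagged unless it is also statistically extreme.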
Review carefully:
- Are the flagged samples genuinely bad, or just naturally variable?
- Are there samples you want to remove that weren't auto-flagged?
- Do the PCA clusters make biological sense?
After you confirm your sample selection, Phase 2 runs DESeq2 and DRomics on the approved samples.
This two-phase approach prevents a common problem: running statistical analysis on bad data and not realizing it until much later. Every decision is visible and recorded.
All tools are installed automatically by setup.sh. Here's what each one does:
| Tool | Purpose |
|---|---|
| SRA Tools (prefetch + fasterq-dump) | Downloads reads from NCBI's Sequence Read Archive |
| fastp | Adapter trimming, quality filtering, per-sample QC reports |
| FastQ Screen | Checks reads against 13 reference genomes to detect contamination |
| Tool | Purpose |
|---|---|
| STAR | Fast splice-aware aligner; needs 32 GB RAM; gold standard for RNA-seq |
| HISAT2 | Splice-aware aligner; needs only 8 GB RAM; excellent alternative |
| Bowtie2 | Used internally by FastQ Screen for contamination screening |
| Tool | Purpose |
|---|---|
| samtools | Sorts, indexes BAM files; generates flagstat and idxstats reports |
| RSeQC | Gene body coverage, read distribution, strandedness inference |
| Picard | RNA-seq metrics: ribosomal fraction, 5'/3' bias, median CV coverage |
| Qualimap | Comprehensive BAM QC and RNA-seq specific statistics |
| MultiQC | Aggregates all QC reports into one interactive HTML report |
| Tool | Purpose |
|---|---|
| featureCounts (Subread) | Counts reads per gene from BAM files using GTF annotation |
| Salmon | Transcript quantification; supports both alignment-based and pseudo-alignment modes |
| Package | Purpose |
|---|---|
| DESeq2 | Negative binomial model for differential expression; size factor normalization |
| edgeR | Alternative DE method; used for low-count filtering |
| SVA | Surrogate variable analysis for batch effect correction |
| DRomics | Dose-response modelling; fits sigmoidal, exponential, linear, and other models |
| org.Hs.eg.db | Human gene annotation database (Entrez IDs, symbols, GO terms) |
| GO.db | Gene Ontology database for pathway enrichment |
| KEGGREST | KEGG pathway database access |
| AnnotationDbi | Interface layer for Bioconductor annotation packages |
| ggplot2 + ggrepel | Publication-quality plot generation |
| Tool | Purpose |
|---|---|
| Nextflow | Workflow manager; handles job scheduling, parallelism, and resumability |
| Conda / Mamba | Package and environment manager; keeps all tool versions isolated |
| Streamlit | Web framework that powers the browser interface |
After a run, your output directory (~/aracra_star_work/results by default) contains:
results/
├── fastp/ ← Per-sample trimming stats and HTML reports
├── fastq_screen/ ← Contamination screening results
├── star/ or hisat2/ ← Aligned BAM files
├── samtools/ ← flagstat, idxstats per sample
├── rseqc/ ← Gene body coverage, strandedness, read distribution
├── picard/ ← RNA metrics
├── qualimap/ ← BAM and RNA-seq QC
├── featurecounts/ ← Raw count matrix (all samples)
├── salmon/ ← Transcript-level quantification
├── qc/
│ └── outlier/
│ ├── outlier_flag.json ← Machine-readable QC flags
│ └── pca_qc_plot.png ← PCA scatter for QC review
├── multiqc_report.html ← Interactive QC summary (open in browser)
├── counts_matrix.csv ← Gene × sample count matrix
└── runs/
└── TreatmentName_vs_Control/
├── deg_results/
│ ├── All_Results.csv ← All genes with stats
│ ├── Custom_Filtered_DEGs.csv ← Significant DEGs
│ ├── Upregulated_Genes.csv
│ ├── Downregulated_Genes.csv
│ ├── Final_PCA_Plot.png
│ ├── Volcano_Plot.png
│ └── analysis_summary.json
└── dromics_results/
├── bmd_results.csv ← Gene-level BMD estimates
├── top50_sensitive_genes.csv
├── dose_response_curves.png
├── bmd_distribution.png
├── pathway_sensitivity.png
├── pathway_bmd_summary.csv ← Pathway-level tPOD
├── model_quality_metrics.csv
└── dromics_summary.json
The MultiQC report (multiqc_report.html) is worth opening directly in your browser: it is an interactive dashboard showing all QC metrics for all samples side by side. Use it to spot outliers, check alignment rates, and verify library quality before you write up your results.
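To check quickly which runs have produced their main outputs, you can walk the runs/ folder. The sketch below is a hypothetical helper, not part of ARACRA; the file names come from the tree above, and the results path is the default mentioned in this guide.

```python
from pathlib import Path

# A few key outputs from the tree above, relative to each run folder.
EXPECTED = [
    "deg_results/All_Results.csv",
    "deg_results/Custom_Filtered_DEGs.csv",
    "dromics_results/bmd_results.csv",
    "dromics_results/pathway_bmd_summary.csv",
]


def summarise_runs(results_dir):
    """Map each run folder to the expected output files it is missing."""
    runs_dir = Path(results_dir).expanduser() / "runs"
    if not runs_dir.is_dir():
        return {}
    return {
        run.name: [f for f in EXPECTED if not (run / f).is_file()]
        for run in sorted(runs_dir.iterdir())
        if run.is_dir()
    }


for run, missing in summarise_runs("~/aracra_star_work/results").items():
    print(run, "complete" if not missing else f"missing: {missing}")
```

A run that only used DESeq2 will legitimately list the dromics_results files as missing.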
The sidebar shows this warning when setup hasn't been run or didn't complete. Run bash setup.sh from the ARACRA directory and wait for it to finish fully.
The setup script uses set -euo pipefail, meaning it stops on any error. Common causes:
No internet access:
curl -I https://ftp.ebi.ac.uk
# Should return an HTTP 200 status. If it times out, check your network/proxy.
Not enough disk space:
df -h ~/databases
# Need at least 80 GB free for full setup
conda conflicts: If you have an existing conda installation that's causing issues:
conda clean --all -y
bash setup.sh
Check the live log in the app, or read it directly:
tail -n 100 ~/aracra_star_work/pipeline.log
Common causes:
SRA download failure:
ERROR ~ process DOWNLOAD_SRA failed
→ Check your SRR accession numbers are correct and you have internet access. SRA can also be temporarily slow; try enabling "Resume" and re-running — it will pick up where it left off.
STAR out of memory:
STAR: Out of memory
→ Your machine has less than 32 GB RAM, or other processes are consuming RAM. Either free up memory, or switch to HISAT2 in the sidebar before running.
Nextflow not found:
conda activate ~/miniforge3/envs/test_ARACRA
nextflow -version
If this fails, the conda environment isn't activating. Check that run_app.sh references the correct conda path.
The PCA plot is generated during Step 1 of the DESeq2/DRomics tab (QC Check), not during Phase 1 preprocessing. Click ▶ Step 1: QC Check in the DESeq2/DRomics tab to generate it.
Your read count threshold is too high, or the samples have very low sequencing depth. In the Parameters tab:
- Lower the Read count threshold from 1,000,000 to something like 500,000 or 100,000
- Check the fastp reports (or MultiQC) to see actual read counts per sample
DRomics requires at least 3–4 dose levels (not counting the control) to fit dose-response curves. If you have fewer dose groups, DRomics won't have enough data points to fit a model.
Also check:
- FDR threshold in Parameters — try relaxing from 0.05 to 0.1
- Transformation method — try log or sqrt instead of auto if your data has a skewed distribution
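The dose-level requirement above is easy to check from your metadata before running DRomics. A minimal sketch, assuming the lower bound of three non-control dose levels quoted in this section; the function names are hypothetical.

```python
def treated_dose_levels(doses):
    """Distinct non-zero dose levels in ascending order."""
    return sorted({float(d) for d in doses if float(d) > 0})


def enough_dose_levels(doses, minimum=3):
    """True if there are at least `minimum` non-control dose levels."""
    return len(treated_dose_levels(doses)) >= minimum


doses = [0, 0, 0, 0.1, 0.1, 1.0, 1.0, 10.0, 10.0]
print(treated_dose_levels(doses), enough_dose_levels(doses))
```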
The R packages aren't loading from the conda environment. This usually means the conda environment isn't activated when the app launches.
# Check which R is being used
which Rscript
# Should show: ~/miniforge3/envs/test_ARACRA/bin/Rscript
# If not, activate manually and try
conda activate ~/miniforge3/envs/test_ARACRA
Rscript -e 'library(DESeq2); cat("OK\n")'
If DESeq2 still isn't found, reinstall it:
conda activate ~/miniforge3/envs/test_ARACRA
Rscript -e 'BiocManager::install("DESeq2", ask=FALSE)'
- Make sure the terminal running run_app.sh is still open and hasn't errored
- Try http://127.0.0.1:8501 instead of localhost:8501
- Check Windows Firewall isn't blocking WSL2's loopback:
  - Open Windows Security → Firewall → Allow an app → check that your browser is allowed on private networks
- Restart WSL: in PowerShell run wsl --shutdown, then reopen Ubuntu and relaunch the app
This can happen if the Nextflow work directory was deleted or if the .nextflow/ cache was removed. If you want a truly clean run:
rm -rf ~/aracra_star_work/work/.nextflow
Then uncheck "Resume" in the sidebar for the next run (it will redo everything from scratch).
Mamba occasionally hits unsolvable conflicts when it tries to satisfy all package constraints simultaneously — especially on machines with an existing conda base that has pinned versions. You'll see something like:
LibMambaUnsatisfiableError: Encountered problems while solving:
- package star-2.7.11a requires libstdc++ >=12, but none of the providers can be installed
or a slower variant where conda just spins for many minutes and never resolves.
Step 1 — Try the libmamba solver explicitly:
conda install -n base conda-libmamba-solver -y
conda config --set solver libmamba
bash setup.sh
Step 2 — If still stuck, rebuild the environment from scratch:
# Remove the broken environment
conda env remove -p ~/miniforge3/envs/test_ARACRA -y
# Re-run setup — it will rebuild cleanly
bash setup.sh
Step 3 — Conflict is between two specific packages:
Identify the conflicting packages from the error message, then install them in two passes:
conda activate ~/miniforge3/envs/test_ARACRA
# Install the core aligner stack first (no R)
mamba install -c conda-forge -c bioconda \
openjdk=17 nextflow star hisat2 samtools fastp salmon subread -y
# Then install R stack separately
mamba install -c conda-forge -c bioconda \
r-base bioconductor-deseq2 bioconductor-edger -y
Step 4 — Nuclear option (completely fresh conda):
If your existing conda base is deeply conflicted (old pinned packages, stale channels):
# Back up any other environments you care about first!
# Then remove miniforge/miniconda entirely
rm -rf ~/miniforge3 # or ~/miniconda3 / ~/anaconda3
# Re-run setup — it will install a fresh Miniforge3 automatically
bash setup.sh
Step 5 — Salmon GLIBC incompatibility (separate issue):
On some older Linux systems or certain WSL2 configurations, Salmon installs but fails to run:
salmon: /lib/x86_64-linux-gnu/libc.so.6: version GLIBC_2.34 not found
Fix:
conda activate ~/miniforge3/envs/test_ARACRA
mamba install -c bioconda 'salmon>=1.10' --force-reinstall -y
If that doesn't resolve it, run setup with --skip-index to at least get the rest of the pipeline working, then use featureCounts for quantification instead of Salmon.
- Check the live log first (~/aracra_star_work/pipeline.log); it usually contains the exact error
- Use the ▶ Check Tools button in the sidebar to confirm all tools are installed
- For Nextflow-specific issues: the Nextflow log is at .nextflow.log in the ARACRA directory
- For any issue, open a ticket in the Issues section of the project's GitHub repository.
ARACRA Pipeline · v2.0 · For research use