Home
Welcome to the radEmu wiki! This is a living document where we will be collecting great questions from users.
Q: If I ran radEmu on a dataset of shotgun sequencing "counts" with 1 covariate (control vs treatment), and for a specific bin/otu I get an estimate value of say 6, how can I interpret that?
A: If
- counts = "depth of coverage", and
- you didn't alter the argument `constraint_fn` to `emuFit()`, and
- the predictor that you put in your model is called `group` (`formula = ~ group`) and has values `Treatment` and `Control`, and
- you have no other predictors in your model,

then if the estimated coefficient on `groupTreatment` is 6, you would interpret that as follows:
"We estimate that the average cell concentration of [Your Bin] is $e^6 \approx 403$ times higher in samples in the Treatment group than in samples in the Control group, relative to the average fold-change across all bins."
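To make the arithmetic in that interpretation concrete, here is a minimal R sketch. It assumes the coefficient is reported on the log scale (as the e^6 above implies); `fold_difference` is a hypothetical helper name used for illustration, not a radEmu function.

```r
# Convert a log fold-difference estimate into a multiplicative fold-difference.
# `estimate` stands in for the groupTreatment coefficient reported by the fit;
# fold_difference() is a hypothetical helper, not part of radEmu.
fold_difference <- function(estimate) exp(estimate)

fold_difference(6)    # about 403: roughly 403 times higher in Treatment
fold_difference(0)    # exactly 1: same as the reference fold-change
fold_difference(-6)   # about 0.0025: roughly 403 times lower in Treatment
```

An estimate of 0 corresponds to a fold-difference of 1, meaning the bin changes by the same factor as the average fold-change across all bins; negative estimates correspond to fold-differences below 1.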
Q: Would it be appropriate to perform any sort of abundance filtering prior to running radEmu?
A: Practical answer -- we recommend against abundance filtering... but we definitely recommend filtering out chimeras and contaminants according to best practices.
One of the reasons why we recommend against abundance filtering is that while restricting to a prespecified subcomposition (e.g., "I only want to look at fold changes in Bifidobacteria") is absolutely fine, restricting to a data-driven subcomposition (via abundance filtering) is a fairly complex conditioning process that radEmu's p-values don't account for. Hence our recommendation against this practice 😊
Fun fact -- at the time of writing, we don't know of any differential abundance methods that account for the impact of abundance filtering on the distribution of test statistics either... but we know of many methods that perform abundance filtering as part of DA, even though they don't account for its impact! 🐈‍⬛ Just one more reason to consider radEmu...
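To illustrate the difference between a prespecified and a data-driven subcomposition, here is a small base R sketch. The count matrix, sample names, and taxon names are made up for illustration, and matching on taxon names is just one assumed way of fixing a subcomposition in advance.

```r
# Toy count matrix: rows are samples, columns are taxa (illustrative values only).
counts <- matrix(
  c(12, 0, 3, 40,
     7, 1, 0, 22,
     0, 0, 5, 31),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("s1", "s2", "s3"),
    c("Bifidobacterium_longum", "Bifidobacterium_breve",
      "Escherichia_coli", "Bacteroides_fragilis")
  )
)

# Prespecified subcomposition: the rule depends only on taxon names chosen
# in advance, never on the observed counts.
keep <- grepl("^Bifidobacterium", colnames(counts))
bifido_counts <- counts[, keep, drop = FALSE]

# Data-driven filtering (what the answer above recommends against):
# the retained set depends on the counts themselves.
# abundant <- counts[, colSums(counts) >= 10, drop = FALSE]
```

The key point is that `keep` would be identical no matter what counts were observed, whereas the commented-out filter changes with the data.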
Q: How does radEmu differ from [my current favorite method]?
New tools are always being developed, but as of April 2026, I do not know of any methods that are comparable in their performance to radEmu (exception: fastEmu). At some point the below may become out of date, but here are some methods that I'm thinking about as I make these comparisons: ANCOM-BC2 (and other members of the ANCOM family); corncob; DESeq2; ALDEx2; LinDA; MaAsLin3 (and other members of the biobakery family); and any kind of Wilcoxon or t-test.
One of the strongest advantages of radEmu over other methods is that it makes explicit what we are assuming about absolute abundances, HTS data, and the connection between them. There is no way to interpret other methods' estimates as statements about absolute abundances. This follows from our approach of starting with what we would like to know (absolute abundances), and connecting it to what we have (HTS data) via explicit assumptions (e.g., differential detection). Some methods rely on vague buzzwords (like "compositionality" and "structural zeroes"), but don't make clear what is actually being assumed about the data. The transparency of our approach is, in my view, its biggest strength -- because then we can have a rational conversation about what's reasonable and what's not.
The second biggest advantage of radEmu over other methods is that we have type 1 error rate (T1E) control, i.e., valid p-values; most others do not. An exception to this is ALDEx2 (which generally does control T1E), but it's underpowered, unfortunately. Among methods that maintain error rates at nominal levels, radEmu generally has the highest power.
Why does T1E control matter? p-values that are too small are (unfortunately) good for getting papers published, but bad for science in general because the results are not replicable. So, I would strongly pitch for valid p-values over too-small p-values.
That said, while I believe that radEmu has a stronger intellectual and practical foundation than other methods, I want to be transparent about two things:
- Estimated effect sizes and p-values are generally correlated between methods. In some analyses we have done, radEmu and the other method differ in their results, and in others they are similar.
- Most other methods run faster. (If speed is an issue, I recommend fastEmu as an alternative -- a fast and good approximation to radEmu.)
We've gotten positive feedback from the community on the method, and I hope you'll consider radEmu for your next analysis! I (Amy) strongly believe in this method.