
ARG gene symbols are not standardized across tools in hAMRonization summary, making cross-tool comparison difficult #525

@OriolGEM


Hi funcscan team,

I am using the ARG workflow in nf-core/funcscan and noticed that the hAMRonization summary keeps tool/database-specific gene_symbol values, which makes it difficult to compare the same ARG across tools.

For example, the same gene may appear as:

- erm(X) in AMRFinderPlus / ABRicate
- ERMX in DeepARG
- ErmX in RGI

Because of this, simple downstream comparisons by gene_symbol treat these as different entries even when they likely refer to the same ARG.

I know that funcscan already uses argNorm to normalize the outputs of some tools to ARO terms, but the current summary still lacks a gene-level field that is directly comparable across tools, which remains a practical problem for users.

Why this matters
For metagenomic or multi-tool benchmarking analyses, users often want to:

- compare concordance between tools
- collapse redundant calls
- summarize ARG abundance at the gene level

This is hard to do robustly when the summary output contains only the raw tool-specific naming conventions.

Current workaround
As a temporary workaround, I created a cleaned field in R:

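```r
library(dplyr)  # provides the %>% pipe used below

# `hamronization`: the hAMRonization summary, already loaded as a data frame
hamronization_cleaned <- hamronization %>%
  dplyr::mutate(
    # lowercase, then drop all non-alphanumerics: "erm(X)", "ERMX", "ErmX" -> "ermx"
    gene_symbol_clean = gene_symbol %>%
      stringr::str_to_lower() %>%
      stringr::str_replace_all("[^a-z0-9]", "")
  )
```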

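This works for simple cases where symbols differ only in case or punctuation, but it is only a heuristic: it cannot reconcile genuinely different names for the same gene, and it may merge distinct genes whose names differ only in punctuation (for example, prime marks). It is probably not the best long-term solution.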
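For illustration, here is a minimal sketch of how such a cleaned key could be used for the cross-tool concordance mentioned earlier. It assumes the summary has `input_file_name` and `analysis_software_name` columns alongside `gene_symbol`; the toy rows and tool labels below are hypothetical and only mirror the erm(X) example.

```r
library(dplyr)
library(stringr)

# Hypothetical toy rows standing in for the real hAMRonization summary
hamronization <- data.frame(
  input_file_name = rep("sample1", 4),
  analysis_software_name = c("amrfinderplus", "abricate", "deeparg", "rgi"),
  gene_symbol = c("erm(X)", "erm(X)", "ERMX", "ErmX"),
  stringsAsFactors = FALSE
)

concordance <- hamronization %>%
  # same heuristic key as the workaround: lowercase, keep only [a-z0-9]
  mutate(gene_symbol_clean = gene_symbol %>%
           str_to_lower() %>%
           str_replace_all("[^a-z0-9]", "")) %>%
  # one row per sample and cleaned gene, counting the tools that report it
  group_by(input_file_name, gene_symbol_clean) %>%
  summarise(
    n_tools = n_distinct(analysis_software_name),
    tools = paste(sort(unique(analysis_software_name)), collapse = ","),
    .groups = "drop"
  )

print(concordance)
```

Because this grouping relies on the same heuristic key, it inherits that key's limitations; an ontology-backed identifier would be a more robust grouping column.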

Suggested improvement
Would it be possible to add a standardized identifier/label to the ARG summary, for example:

- gene_symbol_standardized, and/or
- an ontology-backed field such as ARO / ARO_name

while still preserving the original raw gene_symbol?

It would also be helpful to document clearly:

- which tools are normalized
- which tools are not
- whether the standardized values come from argNorm / ARO mapping or another approach

If this is considered upstream behavior in hAMRonization or argNorm, I would be happy to open an upstream issue and cross-link it here.

Thanks!

Metadata


Assignees: No one assigned
Labels: enhancement (Improvement for existing functionality)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
