ARG gene symbols are not standardized across tools in hAMRonization summary, making cross-tool comparison difficult #525

Hi funcscan team,

I am using the ARG workflow in nf-core/funcscan and noticed that the hAMRonization summary keeps tool/database-specific gene_symbol values, which makes it difficult to compare the same ARG across tools.

For example, the same gene may appear as:

- erm(X) in AMRFinderPlus / ABRicate
- ERMX in DeepARG
- ErmX in RGI

Because of this, simple downstream comparisons by gene_symbol treat these as different entries even when they likely refer to the same ARG.

I know that funcscan already uses argNorm to normalize outputs from some tools to ARO terms, but in the current summary this still leaves a practical problem for users who want a directly comparable gene-level field across tools.

*Why this matters*
For metagenomic or multi-tool benchmarking analyses, users often want to:

- compare concordance between tools
- collapse redundant calls
- summarize ARG abundance at the gene level

This is hard to do robustly when the summary output contains only the raw tool-specific naming conventions.

*Current workaround*
As a temporary workaround, I created a cleaned field in R:

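library(dplyr)  # attaches the %>% pipe used below

# Lowercase each symbol and strip everything except letters and digits,
# so that "erm(X)", "ERMX" and "ErmX" all become "ermx".
hamronization_cleaned <- hamronization %>%
  dplyr::mutate(
    gene_symbol_clean = gene_symbol %>%
      stringr::str_to_lower() %>%
      stringr::str_replace_all("[^a-z0-9]", "")
  )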

This helps for simple cases, but it is only a heuristic: it cannot reconcile names that differ by more than case and punctuation, so it is probably not a good long-term solution.
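
To illustrate how the cleaned field can be used for a quick concordance check, here is a minimal sketch that counts how many tools report each cleaned symbol. The toy data frame is made up for illustration, and I am assuming the summary exposes the tool name in a column called `analysis_software_name`; adjust the column name if your summary differs.

```r
library(dplyr)
library(stringr)

# Toy rows mimicking a hAMRonization summary; the values are illustrative only.
# The column name analysis_software_name is an assumption about the summary layout.
hamronization <- data.frame(
  analysis_software_name = c("amrfinderplus", "abricate", "deeparg", "rgi", "rgi"),
  gene_symbol = c("erm(X)", "erm(X)", "ERMX", "ErmX", "tet(W)"),
  stringsAsFactors = FALSE
)

concordance <- hamronization %>%
  # Same heuristic as above: lowercase, then keep only letters and digits.
  mutate(
    gene_symbol_clean = gene_symbol %>%
      str_to_lower() %>%
      str_replace_all("[^a-z0-9]", "")
  ) %>%
  # One row per cleaned symbol, with the number of distinct tools reporting it
  # and the raw spellings that were collapsed into it.
  group_by(gene_symbol_clean) %>%
  summarise(
    n_tools = n_distinct(analysis_software_name),
    raw_symbols = paste(sort(unique(gene_symbol)), collapse = "; "),
    .groups = "drop"
  ) %>%
  arrange(desc(n_tools))

print(concordance)
```

On the toy data, the four erm(X) calls collapse into a single `ermx` row reported by four tools, while `tetw` is reported by one. Names that differ by more than case or punctuation would still end up in separate rows, which is why an ontology-backed field would be more robust.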

*Suggested improvement*
Would it be possible to add a standardized identifier/label in the ARG summary, for example:

- gene_symbol_standardized
- and/or an ontology-backed field such as ARO / ARO_name

while still preserving the original raw gene_symbol?

It would also be helpful to document clearly:

- which tools are normalized
- which tools are not
- whether standardized values come from argNorm / ARO mapping or another approach

If this is considered upstream behavior from hAMRonization or argNorm, I would also be happy to open/cross-link an upstream issue.

Thanks!
