Hi funcscan team,
I am using the ARG workflow in nf-core/funcscan and noticed that the hAMRonization summary keeps tool/database-specific gene_symbol values, which makes it difficult to compare the same ARG across tools.
For example, the same gene may appear as:
erm(X) in AMRFinderPlus / ABRicate
ERMX in DeepARG
ErmX in RGI
Because of this, simple downstream comparisons by gene_symbol treat these as different entries even when they likely refer to the same ARG.
I know that funcscan already uses argNorm to map the output of some tools to ARO terms, but the current summary still does not give users a single gene-level field that can be compared directly across tools.
Why this matters
For metagenomic or multi-tool benchmarking analyses, users often want to:
compare concordance between tools
collapse redundant calls
summarize ARG abundance at the gene level
This is hard to do robustly when the summary output contains only the raw tool-specific naming conventions.
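As a rough illustration of the concordance use case, the base-R sketch below builds a toy table from the example calls above and cross-tabulates gene against tool, first on the raw gene_symbol and then on a lowercase, alphanumeric-only key. The data frame and its `tool` column are hypothetical stand-ins, not the exact funcscan summary schema. On the raw symbol the same ARG is split across three rows; on the cleaned key it collapses into a single row detected by all four tools.

```r
# Toy data: one call per tool, using the example symbols from this issue.
# Column names are illustrative stand-ins, not the exact funcscan schema.
calls <- data.frame(
  tool = c("AMRFinderPlus", "ABRicate", "DeepARG", "RGI"),
  gene_symbol = c("erm(X)", "erm(X)", "ERMX", "ErmX"),
  stringsAsFactors = FALSE
)

# Heuristic key: lowercase, then strip every character that is not a-z or 0-9
clean_symbol <- function(x) gsub("[^a-z0-9]", "", tolower(x))
calls$gene_symbol_clean <- clean_symbol(calls$gene_symbol)

# Gene-by-tool count tables on the raw symbol and on the cleaned key
raw_tab <- table(calls$gene_symbol, calls$tool)
clean_tab <- table(calls$gene_symbol_clean, calls$tool)

print(raw_tab)    # three separate rows, one per spelling
print(clean_tab)  # a single "ermx" row

# Number of tools that report each (cleaned) gene
n_tools <- rowSums(clean_tab > 0)
print(n_tools)
```

The same cross-tabulation on a real summary would only be as reliable as the key it is built on, which is the motivation for an ontology-backed field.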
Current workaround
As a temporary workaround, I created a cleaned field in R:
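library(dplyr)  # attaches the %>% pipe; mutate() is still called as dplyr::mutate()

# Normalized key: lowercase, then drop every character that is not a-z or 0-9,
# so erm(X), ERMX and ErmX all become "ermx". The raw gene_symbol is kept.
hamronization_cleaned <- hamronization %>%
  dplyr::mutate(
    gene_symbol_clean = gene_symbol %>%
      stringr::str_to_lower() %>%
      stringr::str_replace_all("[^a-z0-9]", "")
  )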
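This helps for simple cases, but it is only a string heuristic: it cannot tell whether two symbols that clean to the same key really denote the same gene, so it is probably not the best long-term solution.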
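Because the cleaned key is only a heuristic, one way to use it more safely is to keep the raw symbol alongside it and list which raw spellings each key absorbed, so every merge can be reviewed by eye before it is trusted. A minimal base-R sketch on toy data built from the examples in this issue (the names `symbols_per_key` and `n_spellings` are hypothetical):

```r
# Toy raw symbols taken from the examples in this issue
gene_symbol <- c("erm(X)", "erm(X)", "ERMX", "ErmX")

# Same cleaning rule as the workaround: lowercase, keep only a-z and 0-9
gene_symbol_clean <- gsub("[^a-z0-9]", "", tolower(gene_symbol))

# For each cleaned key, collect the distinct raw spellings it merged
symbols_per_key <- tapply(gene_symbol, gene_symbol_clean,
                          function(x) paste(sort(unique(x)), collapse = " | "))

# Count distinct raw spellings per key; keys with more than one are worth reviewing
n_spellings <- tapply(gene_symbol, gene_symbol_clean,
                      function(x) length(unique(x)))

print(symbols_per_key[n_spellings > 1])
```

On a real summary, this review list is where accidental merges of distinct genes would show up.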
Suggested improvement
Would it be possible to add a standardized identifier/label in the ARG summary, for example:
gene_symbol_standardized
and/or an ontology-backed field such as ARO / ARO_name
while still preserving the original raw gene_symbol?
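To make the request concrete, here is a minimal base-R sketch of how a `gene_symbol_standardized` column could be filled from a curated lookup table instead of string cleaning, while leaving the raw gene_symbol untouched. The lookup table is hand-written and hypothetical: it only encodes this issue's assumption that the three spellings denote the same gene, and the choice of `erm(X)` as the label is arbitrary. In funcscan the mapping would presumably come from argNorm/ARO instead. `UNMAPPED_EXAMPLE` is a placeholder for any symbol the table does not cover; such symbols stay NA rather than being guessed.

```r
# Toy summary rows; UNMAPPED_EXAMPLE is a hypothetical placeholder symbol
summary_df <- data.frame(
  tool = c("AMRFinderPlus", "ABRicate", "DeepARG", "RGI", "DeepARG"),
  gene_symbol = c("erm(X)", "erm(X)", "ERMX", "ErmX", "UNMAPPED_EXAMPLE"),
  stringsAsFactors = FALSE
)

# Hypothetical curated mapping: raw tool-specific symbol -> standardized label
lookup <- data.frame(
  raw = c("erm(X)", "ERMX", "ErmX"),
  standardized = c("erm(X)", "erm(X)", "erm(X)"),
  stringsAsFactors = FALSE
)

# match() preserves the original row order; unmapped symbols become NA
summary_df$gene_symbol_standardized <-
  lookup$standardized[match(summary_df$gene_symbol, lookup$raw)]

print(summary_df)

# Per tool: how many rows were (FALSE) or were not (TRUE) standardized
print(table(summary_df$tool, is.na(summary_df$gene_symbol_standardized)))
```

Using `match()` rather than `merge()` keeps the summary's row order intact, and the final table is a simple way to report which tools are covered by the mapping.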
It would also be helpful to document clearly:
which tools are normalized
which tools are not
whether standardized values come from argNorm / ARO mapping or another approach
If this behavior belongs upstream in hAMRonization or argNorm, I would also be happy to open an upstream issue and cross-link it here.
Thanks!