feat(id-conversion): Support HGNC names as user input #67

tonywu1999 · 2026-01-15T20:50:02Z

Summary by CodeRabbit

New Features
- Support for "Hgnc_Name" as a protein identifier type, enabling annotation from HGNC gene names via the Gilda mapping service.
Documentation
- Updated docs to include "Hgnc_Name" as an accepted proteinIdType option.
Tests
- Added tests covering HGNC name-based resolution and expected output columns/statuses.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2026-01-15T20:50:12Z

📝 Walkthrough

Walkthrough

Adds support for a new Hgnc_Name proteinIdType: introduces an internal Gilda API caller to map HGNC names, propagates proteinIdType through HGNC/Uniprot population logic, sets UniprotId to NA for Hgnc_Name, updates docs, and adds tests.

Changes

Cohort / File(s)	Summary
Core logic `R/annotateProteinInfoFromIndra.R`, `R/utils_annotateProteinInfoFromIndra.R`	Add `Hgnc_Name` handling. New internal `.callGetHgncIdsFromGildaApi(hgncNames)` posts to Gilda (`grounding.indra.bio/ground_multi`) to derive HGNC IDs from GlobalProtein names. `populateHgncIdsInDataFrame(df, proteinIdType)` now branches on `proteinIdType`; `UniprotId` set to `NA` for `Hgnc_Name`. Note: function implemented twice (duplicate definition) in `R/utils_annotateProteinInfoFromIndra.R`.
Documentation `man/annotateProteinInfoFromIndra.Rd`, `man/dot-populateHgncIdsInDataFrame.Rd`	Documented new allowed value `"Hgnc_Name"` for `proteinIdType` and updated `.populateHgncIdsInDataFrame` docs to include `proteinIdType` parameter.
Tests `tests/testthat/test-annotateProteinInfoFromIndra.R`, `tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R`	Add tests for `Hgnc_Name` workflow and for `.callGetHgncIdsFromGildaApi()` verifying parsed mappings (e.g., `EGFR -> "3236"`, `CHEK1 -> "1925"`).

Sequence Diagram

sequenceDiagram
    participant Main as annotateProteinInfoFromIndra()
    participant Pop as populateHgncIdsInDataFrame()
    participant Gilda as Gilda API
    participant DF as DataFrame

    Main->>Pop: call(df, proteinIdType="Hgnc_Name")
    Pop->>Pop: branch on proteinIdType
    alt Hgnc_Name
        Pop->>Gilda: POST GlobalProtein names to grounding.indra.bio/ground_multi
        Gilda-->>Pop: return grounding entries (filter term$db == "HGNC")
        Pop->>DF: set HgncId from mappings
        Pop->>DF: set UniprotId = NA
    else Uniprot-based
        Pop->>DF: map HgncId from existing UniprotId
    end
    Pop-->>Main: return annotated df

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Feature ptm analysis #56: Modifies the same protein ID annotation logic (changes to annotateProteinInfoFromIndra and Uniprot/HGNC population paths).

Suggested labels

Review effort 3/5

Poem

🐰 I nibble names and ask the sky,
Gilda whispers HGNC by and by,
Uniprot sleeps, IDs set to NA,
Hgnc_Name hops in—hip hip hooray! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The pull request description is entirely empty—no motivation, context, changes list, testing details, or checklist completion is provided beyond the template structure.	Add a comprehensive description including motivation for supporting HGNC names, detailed list of changes made to each file, description of tests added, and complete the pre-review checklist.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: adding support for HGNC names as a new protein ID type option for users.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 50811ef and 094a752.

📒 Files selected for processing (1)

R/utils_annotateProteinInfoFromIndra.R

🚧 Files skipped from review as they are similar to previous changes (1)

R/utils_annotateProteinInfoFromIndra.R

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-01-15T20:51:07Z

Failed to generate code suggestions for PR

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

man/dot-populateHgncIdsInDataFrame.Rd (1)
18-19: Outdated function description.

The description states the function "populates the HGNC IDs in the data frame based on the Uniprot IDs," but this is no longer accurate when proteinIdType is "Hgnc_Name" — in that case, HGNC IDs are derived from GlobalProtein names via the Gilda API.

Consider updating the roxygen comment in R/annotateProteinInfoFromIndra.R to reflect both pathways:
💡 Suggested fix
-#' This function populates the HGNC IDs in the data frame based on the Uniprot IDs.
+#' This function populates the HGNC IDs in the data frame based on the protein ID type.
+#' For Uniprot/Uniprot_Mnemonic, it uses Uniprot IDs; for Hgnc_Name, it uses the Gilda API.
R/annotateProteinInfoFromIndra.R (1)

90-96: Documentation mismatch in function comment.

Line 92 states "populates the HGNC IDs in the data frame based on the Uniprot IDs" but the function now also handles Hgnc_Name which uses the Gilda API instead of Uniprot IDs. Update the roxygen comment to reflect both pathways.

tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R (1)

1-87: Rename file: remove duplicate .R extension.

The file test-utils_annotateProteinInfoFromIndra.R.R has a double .R extension and should be renamed to test-utils_annotateProteinInfoFromIndra.R.

🤖 Fix all issues with AI agents

In `@R/utils_annotateProteinInfoFromIndra.R`:
- Around line 257-263: The `@return` roxygen tag is incorrect: change it to
describe the actual return type (a named character vector mapping input HGNC
names to HGNC IDs) rather than "list of HGNC IDs"; update the `@return` text in
the documentation block for the function in utils_annotateProteinInfoFromIndra.R
to explicitly state it returns a named character vector (names are the provided
HGNC names, values are the corresponding HGNC IDs) so the docs match the
function behavior.
- Around line 300-317: The function currently builds and returns an empty
character vector when res is NULL, causing inconsistent failure signaling versus
other helpers like .callGetHgncIdsFromUniprotIdsApi which return NULL; update
the logic at the start of the block that builds hgnc_mapping to explicitly check
if res is NULL (or length zero representing API failure) and return NULL
immediately, otherwise proceed to construct hgnc_mapping from res and return the
mapping; reference the local variable res and the mapping variable hgnc_mapping
to locate where to add the early NULL-return check.

🧹 Nitpick comments (6)

tests/testthat/test-annotateProteinInfoFromIndra.R (1)

85-110: Good test coverage for the new Hgnc_Name pathway.

The test correctly validates the expected behavior where UniprotId is NA when using Hgnc_Name type, which aligns with the implementation.

Two observations:

Inconsistent boolean field assertions: Other tests in this file (lines 24-26, 58-60) assert specific FALSE values for IsTranscriptionFactor, IsKinase, and IsPhosphatase, but this test only checks expect_type(..., "logical"). Consider asserting the expected values for consistency, or document why type checking is sufficient here.

Live API dependency: Unlike the utility tests which use local_mocked_bindings, this test makes actual API calls, making it potentially flaky in CI environments. Consider whether mocking the Gilda API call would improve test reliability.
tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R (1)
79-86: Inconsistent test pattern: real API call vs mocked calls.

Unlike other tests in this file that use local_mocked_bindings to mock the function under test (lines 4, 17, 30, etc.), this test makes a real API call to the Gilda service. This has tradeoffs:

Pro: Actually validates the real implementation and API contract

Con: Test becomes flaky if the external API is unavailable or returns different data

If this is intentional as an integration test, consider adding a skip condition for CI environments:
💡 Suggested improvement
 test_that(".callGetHgncIdsFromGildaApi works correctly", {
+    skip_if_offline()
+    skip_on_cran()
     hgnc_names <- list("EGFR", "CHEK1")
     result <- .callGetHgncIdsFromGildaApi(hgnc_names)
R/annotateProteinInfoFromIndra.R (2)
83-86: Redundant assignment: UniprotId is already NA.

Line 67 already initializes df$UniprotId <- NA, and neither the "Uniprot" nor "Uniprot_Mnemonic" branches modify it when proteinIdType == "Hgnc_Name". This block at lines 84-86 is therefore redundant.
♻️ Suggested removal
-        
-        if (proteinIdType == "Hgnc_Name") {
-            df$UniprotId <- NA
-        }
         return(df)
100-121: Consider explicit type check instead of else fallback.

The else branch at line 111 implicitly handles "Hgnc_Name", but it would also silently accept any invalid proteinIdType value. This could mask bugs where a typo like "HGNC_Name" or "hgnc_name" is passed.

Consider either:

Adding input validation at the start of annotateProteinInfoFromIndra to reject invalid types, or

Using explicit matching:
💡 Suggested explicit matching
-        } else {
+        } else if (proteinIdType == "Hgnc_Name") {
             hgncNames <- unique(df$GlobalProtein)
             ...
+        } else {
+            stop("Invalid proteinIdType: ", proteinIdType, 
+                 ". Must be 'Uniprot', 'Uniprot_Mnemonic', or 'Hgnc_Name'.")
         }
R/utils_annotateProteinInfoFromIndra.R (2)
278-278: Consider extracting the Gilda API URL to a constant.

Other functions in this file use the INDRA_API_URL constant defined at line 1. For consistency and maintainability, consider defining a similar constant for the Gilda API:
 INDRA_API_URL = "https://discovery.indra.bio"
+GILDA_API_URL = "https://grounding.indra.bio"
Then update this line:
-    apiUrl <- file.path("https://grounding.indra.bio/", "ground_multi")
+    apiUrl <- file.path(GILDA_API_URL, "ground_multi")
287-298: Consider adding HTTP status code validation.

The API call doesn't verify the HTTP response was successful before processing content. If the server returns an error status (4xx/5xx), the response body might contain an error message that would be silently ignored during processing.
♻️ Suggested improvement
     res <- tryCatch({
         response <- POST(
             apiUrl,
             body = requestBody,
             add_headers("Content-Type" = "application/json"),
             encode = "raw"
         )
+        if (httr::http_error(response)) {
+            message("API request failed with status: ", httr::status_code(response))
+            return(NULL)
+        }
         content(response)
     }, error = function(e) {
         message("Error in API call: ", e)
         NULL
     })
Note: This pattern is consistent with other functions in this file, so this improvement could be applied across all API functions as a follow-up.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5afd016 and 50811ef.

📒 Files selected for processing (6)

R/annotateProteinInfoFromIndra.R
R/utils_annotateProteinInfoFromIndra.R
man/annotateProteinInfoFromIndra.Rd
man/dot-populateHgncIdsInDataFrame.Rd
tests/testthat/test-annotateProteinInfoFromIndra.R
tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: pkgdown
GitHub Check: test-coverage
GitHub Check: build

🔇 Additional comments (3)

man/annotateProteinInfoFromIndra.Rd (1)

15-16: LGTM!

Documentation correctly updated to reflect the new "Hgnc_Name" option for proteinIdType.

man/dot-populateHgncIdsInDataFrame.Rd (1)

6-13: LGTM!

The function signature and parameter documentation correctly reflect the new proteinIdType parameter.

R/annotateProteinInfoFromIndra.R (1)

26-35: LGTM on the integration flow.

The main function correctly propagates proteinIdType through the pipeline, enabling the new HGNC name resolution pathway.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

R/utils_annotateProteinInfoFromIndra.R

codecov-commenter · 2026-01-15T20:56:45Z

Codecov Report

❌ Patch coverage is 88.67925% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.87%. Comparing base (5afd016) to head (094a752).

Files with missing lines	Patch %	Lines
R/utils_annotateProteinInfoFromIndra.R	84.21%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##            devel      #67      +/-   ##
==========================================
+ Coverage   57.89%   58.87%   +0.98%     
==========================================
  Files           7        7              
  Lines        1356     1403      +47     
==========================================
+ Hits          785      826      +41     
- Misses        571      577       +6

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

tonywu1999 added 2 commits January 15, 2026 15:39

feat(id-conversion): Support HGNC names for identifiers

69b7b74

add unit tests for hgnc names

50811ef

coderabbitai bot reviewed Jan 15, 2026

View reviewed changes

R/utils_annotateProteinInfoFromIndra.R Show resolved Hide resolved

R/utils_annotateProteinInfoFromIndra.R Show resolved Hide resolved

update coderabbit comments

094a752

tonywu1999 changed the title ~~Feature hgnc~~ feat(id-conversion): Support HGNC names as user input Jan 15, 2026

tonywu1999 merged commit ce2af12 into devel Jan 15, 2026
4 checks passed

tonywu1999 deleted the feature-hgnc branch January 15, 2026 21:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(id-conversion): Support HGNC names as user input #67

feat(id-conversion): Support HGNC names as user input #67

Uh oh!

tonywu1999 commented Jan 15, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jan 15, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested labels

Poem

Uh oh!

github-actions bot commented Jan 15, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

codecov-commenter commented Jan 15, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

feat(id-conversion): Support HGNC names as user input #67

feat(id-conversion): Support HGNC names as user input #67

Uh oh!

Conversation

tonywu1999 commented Jan 15, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested labels

Poem

Uh oh!

github-actions bot commented Jan 15, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

codecov-commenter commented Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tonywu1999 commented Jan 15, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 15, 2026 •

edited

Loading

codecov-commenter commented Jan 15, 2026 •

edited

Loading