Skip to content

Conversation

@tonywu1999
Copy link
Contributor

@tonywu1999 tonywu1999 commented Jan 15, 2026

Summary by CodeRabbit

  • New Features

    • Support for "Hgnc_Name" as a protein identifier type, enabling annotation from HGNC gene names via the Gilda mapping service.
  • Documentation

    • Updated docs to include "Hgnc_Name" as an accepted proteinIdType option.
  • Tests

    • Added tests covering HGNC name-based resolution and expected output columns/statuses.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 15, 2026

📝 Walkthrough

Walkthrough

Adds support for a new Hgnc_Name proteinIdType: introduces an internal Gilda API caller to map HGNC names, propagates proteinIdType through HGNC/Uniprot population logic, sets UniprotId to NA for Hgnc_Name, updates docs, and adds tests.

Changes

Cohort / File(s) Summary
Core logic
R/annotateProteinInfoFromIndra.R, R/utils_annotateProteinInfoFromIndra.R
Add Hgnc_Name handling. New internal .callGetHgncIdsFromGildaApi(hgncNames) posts to Gilda (grounding.indra.bio/ground_multi) to derive HGNC IDs from GlobalProtein names. populateHgncIdsInDataFrame(df, proteinIdType) now branches on proteinIdType; UniprotId set to NA for Hgnc_Name. Note: function implemented twice (duplicate definition) in R/utils_annotateProteinInfoFromIndra.R.
Documentation
man/annotateProteinInfoFromIndra.Rd, man/dot-populateHgncIdsInDataFrame.Rd
Documented new allowed value "Hgnc_Name" for proteinIdType and updated .populateHgncIdsInDataFrame docs to include proteinIdType parameter.
Tests
tests/testthat/test-annotateProteinInfoFromIndra.R, tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R
Add tests for Hgnc_Name workflow and for .callGetHgncIdsFromGildaApi() verifying parsed mappings (e.g., EGFR -> "3236", CHEK1 -> "1925").

Sequence Diagram

sequenceDiagram
    participant Main as annotateProteinInfoFromIndra()
    participant Pop as populateHgncIdsInDataFrame()
    participant Gilda as Gilda API
    participant DF as DataFrame

    Main->>Pop: call(df, proteinIdType="Hgnc_Name")
    Pop->>Pop: branch on proteinIdType
    alt Hgnc_Name
        Pop->>Gilda: POST GlobalProtein names to grounding.indra.bio/ground_multi
        Gilda-->>Pop: return grounding entries (filter term$db == "HGNC")
        Pop->>DF: set HgncId from mappings
        Pop->>DF: set UniprotId = NA
    else Uniprot-based
        Pop->>DF: map HgncId from existing UniprotId
    end
    Pop-->>Main: return annotated df
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Feature ptm analysis #56: Modifies the same protein ID annotation logic (changes to annotateProteinInfoFromIndra and Uniprot/HGNC population paths).

Suggested labels

Review effort 3/5

Poem

🐰 I nibble names and ask the sky,
Gilda whispers HGNC by and by,
Uniprot sleeps, IDs set to NA,
Hgnc_Name hops in—hip hip hooray! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The pull request description is entirely empty—no motivation, context, changes list, testing details, or checklist completion is provided beyond the template structure. Add a comprehensive description including motivation for supporting HGNC names, detailed list of changes made to each file, description of tests added, and complete the pre-review checklist.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding support for HGNC names as a new protein ID type option for users.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 50811ef and 094a752.

📒 Files selected for processing (1)
  • R/utils_annotateProteinInfoFromIndra.R
🚧 Files skipped from review as they are similar to previous changes (1)
  • R/utils_annotateProteinInfoFromIndra.R
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

Failed to generate code suggestions for PR

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
man/dot-populateHgncIdsInDataFrame.Rd (1)

18-19: Outdated function description.

The description states the function "populates the HGNC IDs in the data frame based on the Uniprot IDs," but this is no longer accurate when proteinIdType is "Hgnc_Name" — in that case, HGNC IDs are derived from GlobalProtein names via the Gilda API.

Consider updating the roxygen comment in R/annotateProteinInfoFromIndra.R to reflect both pathways:

💡 Suggested fix
-#' This function populates the HGNC IDs in the data frame based on the Uniprot IDs.
+#' This function populates the HGNC IDs in the data frame based on the protein ID type.
+#' For Uniprot/Uniprot_Mnemonic, it uses Uniprot IDs; for Hgnc_Name, it uses the Gilda API.
R/annotateProteinInfoFromIndra.R (1)

90-96: Documentation mismatch in function comment.

Line 92 states "populates the HGNC IDs in the data frame based on the Uniprot IDs" but the function now also handles Hgnc_Name which uses the Gilda API instead of Uniprot IDs. Update the roxygen comment to reflect both pathways.

tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R (1)

1-87: Rename file: remove duplicate .R extension.

The file test-utils_annotateProteinInfoFromIndra.R.R has a double .R extension and should be renamed to test-utils_annotateProteinInfoFromIndra.R.

🤖 Fix all issues with AI agents
In `@R/utils_annotateProteinInfoFromIndra.R`:
- Around line 257-263: The `@return` roxygen tag is incorrect: change it to
describe the actual return type (a named character vector mapping input HGNC
names to HGNC IDs) rather than "list of HGNC IDs"; update the `@return` text in
the documentation block for the function in utils_annotateProteinInfoFromIndra.R
to explicitly state it returns a named character vector (names are the provided
HGNC names, values are the corresponding HGNC IDs) so the docs match the
function behavior.
- Around line 300-317: The function currently builds and returns an empty
character vector when res is NULL, causing inconsistent failure signaling versus
other helpers like .callGetHgncIdsFromUniprotIdsApi which return NULL; update
the logic at the start of the block that builds hgnc_mapping to explicitly check
if res is NULL (or length zero representing API failure) and return NULL
immediately, otherwise proceed to construct hgnc_mapping from res and return the
mapping; reference the local variable res and the mapping variable hgnc_mapping
to locate where to add the early NULL-return check.
🧹 Nitpick comments (6)
tests/testthat/test-annotateProteinInfoFromIndra.R (1)

85-110: Good test coverage for the new Hgnc_Name pathway.

The test correctly validates the expected behavior where UniprotId is NA when using Hgnc_Name type, which aligns with the implementation.

Two observations:

  1. Inconsistent boolean field assertions: Other tests in this file (lines 24-26, 58-60) assert specific FALSE values for IsTranscriptionFactor, IsKinase, and IsPhosphatase, but this test only checks expect_type(..., "logical"). Consider asserting the expected values for consistency, or document why type checking is sufficient here.

  2. Live API dependency: Unlike the utility tests which use local_mocked_bindings, this test makes actual API calls, making it potentially flaky in CI environments. Consider whether mocking the Gilda API call would improve test reliability.

tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R (1)

79-86: Inconsistent test pattern: real API call vs mocked calls.

Unlike other tests in this file that use local_mocked_bindings to mock the function under test (lines 4, 17, 30, etc.), this test makes a real API call to the Gilda service. This has tradeoffs:

  • Pro: Actually validates the real implementation and API contract
  • Con: Test becomes flaky if the external API is unavailable or returns different data

If this is intentional as an integration test, consider adding a skip condition for CI environments:

💡 Suggested improvement
 test_that(".callGetHgncIdsFromGildaApi works correctly", {
+    skip_if_offline()
+    skip_on_cran()
     hgnc_names <- list("EGFR", "CHEK1")
     result <- .callGetHgncIdsFromGildaApi(hgnc_names)
R/annotateProteinInfoFromIndra.R (2)

83-86: Redundant assignment: UniprotId is already NA.

Line 67 already initializes df$UniprotId <- NA, and neither the "Uniprot" nor "Uniprot_Mnemonic" branches modify it when proteinIdType == "Hgnc_Name". This block at lines 84-86 is therefore redundant.

♻️ Suggested removal
-        
-        if (proteinIdType == "Hgnc_Name") {
-            df$UniprotId <- NA
-        }
         return(df)

100-121: Consider explicit type check instead of else fallback.

The else branch at line 111 implicitly handles "Hgnc_Name", but it would also silently accept any invalid proteinIdType value. This could mask bugs where a typo like "HGNC_Name" or "hgnc_name" is passed.

Consider either:

  1. Adding input validation at the start of annotateProteinInfoFromIndra to reject invalid types, or
  2. Using explicit matching:
💡 Suggested explicit matching
-        } else {
+        } else if (proteinIdType == "Hgnc_Name") {
             hgncNames <- unique(df$GlobalProtein)
             ...
+        } else {
+            stop("Invalid proteinIdType: ", proteinIdType, 
+                 ". Must be 'Uniprot', 'Uniprot_Mnemonic', or 'Hgnc_Name'.")
         }
R/utils_annotateProteinInfoFromIndra.R (2)

278-278: Consider extracting the Gilda API URL to a constant.

Other functions in this file use the INDRA_API_URL constant defined at line 1. For consistency and maintainability, consider defining a similar constant for the Gilda API:

 INDRA_API_URL = "https://discovery.indra.bio"
+GILDA_API_URL = "https://grounding.indra.bio"

Then update this line:

-    apiUrl <- file.path("https://grounding.indra.bio/", "ground_multi")
+    apiUrl <- file.path(GILDA_API_URL, "ground_multi")

287-298: Consider adding HTTP status code validation.

The API call doesn't verify the HTTP response was successful before processing content. If the server returns an error status (4xx/5xx), the response body might contain an error message that would be silently ignored during processing.

♻️ Suggested improvement
     res <- tryCatch({
         response <- POST(
             apiUrl,
             body = requestBody,
             add_headers("Content-Type" = "application/json"),
             encode = "raw"
         )
+        if (httr::http_error(response)) {
+            message("API request failed with status: ", httr::status_code(response))
+            return(NULL)
+        }
         content(response)
     }, error = function(e) {
         message("Error in API call: ", e)
         NULL
     })

Note: This pattern is consistent with other functions in this file, so this improvement could be applied across all API functions as a follow-up.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5afd016 and 50811ef.

📒 Files selected for processing (6)
  • R/annotateProteinInfoFromIndra.R
  • R/utils_annotateProteinInfoFromIndra.R
  • man/annotateProteinInfoFromIndra.Rd
  • man/dot-populateHgncIdsInDataFrame.Rd
  • tests/testthat/test-annotateProteinInfoFromIndra.R
  • tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pkgdown
  • GitHub Check: test-coverage
  • GitHub Check: build
🔇 Additional comments (3)
man/annotateProteinInfoFromIndra.Rd (1)

15-16: LGTM!

Documentation correctly updated to reflect the new "Hgnc_Name" option for proteinIdType.

man/dot-populateHgncIdsInDataFrame.Rd (1)

6-13: LGTM!

The function signature and parameter documentation correctly reflect the new proteinIdType parameter.

R/annotateProteinInfoFromIndra.R (1)

26-35: LGTM on the integration flow.

The main function correctly propagates proteinIdType through the pipeline, enabling the new HGNC name resolution pathway.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@codecov-commenter
Copy link

codecov-commenter commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 88.67925% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.87%. Comparing base (5afd016) to head (094a752).

Files with missing lines Patch % Lines
R/utils_annotateProteinInfoFromIndra.R 84.21% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel      #67      +/-   ##
==========================================
+ Coverage   57.89%   58.87%   +0.98%     
==========================================
  Files           7        7              
  Lines        1356     1403      +47     
==========================================
+ Hits          785      826      +41     
- Misses        571      577       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tonywu1999 tonywu1999 changed the title Feature hgnc feat(id-conversion): Support HGNC names as user input Jan 15, 2026
@tonywu1999 tonywu1999 merged commit ce2af12 into devel Jan 15, 2026
4 checks passed
@tonywu1999 tonywu1999 deleted the feature-hgnc branch January 15, 2026 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants