-
Notifications
You must be signed in to change notification settings - Fork 0
feat(id-conversion): Support HGNC names as user input #67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughAdds support for a new Changes
Sequence DiagramsequenceDiagram
participant Main as annotateProteinInfoFromIndra()
participant Pop as populateHgncIdsInDataFrame()
participant Gilda as Gilda API
participant DF as DataFrame
Main->>Pop: call(df, proteinIdType="Hgnc_Name")
Pop->>Pop: branch on proteinIdType
alt Hgnc_Name
Pop->>Gilda: POST GlobalProtein names to grounding.indra.bio/ground_multi
Gilda-->>Pop: return grounding entries (filter term$db == "HGNC")
Pop->>DF: set HgncId from mappings
Pop->>DF: set UniprotId = NA
else Uniprot-based
Pop->>DF: map HgncId from existing UniprotId
end
Pop-->>Main: return annotated df
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
✏️ Tip: You can disable this entire section by setting Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Failed to generate code suggestions for PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
man/dot-populateHgncIdsInDataFrame.Rd (1)
18-19: Outdated function description.The description states the function "populates the HGNC IDs in the data frame based on the Uniprot IDs," but this is no longer accurate when
proteinIdTypeis"Hgnc_Name"— in that case, HGNC IDs are derived from GlobalProtein names via the Gilda API.Consider updating the roxygen comment in
R/annotateProteinInfoFromIndra.Rto reflect both pathways:💡 Suggested fix
-#' This function populates the HGNC IDs in the data frame based on the Uniprot IDs. +#' This function populates the HGNC IDs in the data frame based on the protein ID type. +#' For Uniprot/Uniprot_Mnemonic, it uses Uniprot IDs; for Hgnc_Name, it uses the Gilda API.R/annotateProteinInfoFromIndra.R (1)
90-96: Documentation mismatch in function comment.Line 92 states "populates the HGNC IDs in the data frame based on the Uniprot IDs" but the function now also handles
Hgnc_Namewhich uses the Gilda API instead of Uniprot IDs. Update the roxygen comment to reflect both pathways.tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R (1)
1-87: Rename file: remove duplicate.Rextension.The file
test-utils_annotateProteinInfoFromIndra.R.Rhas a double.Rextension and should be renamed totest-utils_annotateProteinInfoFromIndra.R.
🤖 Fix all issues with AI agents
In `@R/utils_annotateProteinInfoFromIndra.R`:
- Around line 257-263: The `@return` roxygen tag is incorrect: change it to
describe the actual return type (a named character vector mapping input HGNC
names to HGNC IDs) rather than "list of HGNC IDs"; update the `@return` text in
the documentation block for the function in utils_annotateProteinInfoFromIndra.R
to explicitly state it returns a named character vector (names are the provided
HGNC names, values are the corresponding HGNC IDs) so the docs match the
function behavior.
- Around line 300-317: The function currently builds and returns an empty
character vector when res is NULL, causing inconsistent failure signaling versus
other helpers like .callGetHgncIdsFromUniprotIdsApi which return NULL; update
the logic at the start of the block that builds hgnc_mapping to explicitly check
if res is NULL (or length zero representing API failure) and return NULL
immediately, otherwise proceed to construct hgnc_mapping from res and return the
mapping; reference the local variable res and the mapping variable hgnc_mapping
to locate where to add the early NULL-return check.
🧹 Nitpick comments (6)
tests/testthat/test-annotateProteinInfoFromIndra.R (1)
85-110: Good test coverage for the newHgnc_Namepathway.The test correctly validates the expected behavior where
UniprotIdisNAwhen usingHgnc_Nametype, which aligns with the implementation.Two observations:
Inconsistent boolean field assertions: Other tests in this file (lines 24-26, 58-60) assert specific
FALSEvalues forIsTranscriptionFactor,IsKinase, andIsPhosphatase, but this test only checksexpect_type(..., "logical"). Consider asserting the expected values for consistency, or document why type checking is sufficient here.Live API dependency: Unlike the utility tests which use
local_mocked_bindings, this test makes actual API calls, making it potentially flaky in CI environments. Consider whether mocking the Gilda API call would improve test reliability.tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R (1)
79-86: Inconsistent test pattern: real API call vs mocked calls.Unlike other tests in this file that use
local_mocked_bindingsto mock the function under test (lines 4, 17, 30, etc.), this test makes a real API call to the Gilda service. This has tradeoffs:
- Pro: Actually validates the real implementation and API contract
- Con: Test becomes flaky if the external API is unavailable or returns different data
If this is intentional as an integration test, consider adding a skip condition for CI environments:
💡 Suggested improvement
test_that(".callGetHgncIdsFromGildaApi works correctly", { + skip_if_offline() + skip_on_cran() hgnc_names <- list("EGFR", "CHEK1") result <- .callGetHgncIdsFromGildaApi(hgnc_names)R/annotateProteinInfoFromIndra.R (2)
83-86: Redundant assignment:UniprotIdis alreadyNA.Line 67 already initializes
df$UniprotId <- NA, and neither the"Uniprot"nor"Uniprot_Mnemonic"branches modify it whenproteinIdType == "Hgnc_Name". This block at lines 84-86 is therefore redundant.♻️ Suggested removal
- - if (proteinIdType == "Hgnc_Name") { - df$UniprotId <- NA - } return(df)
100-121: Consider explicit type check instead ofelsefallback.The
elsebranch at line 111 implicitly handles"Hgnc_Name", but it would also silently accept any invalidproteinIdTypevalue. This could mask bugs where a typo like"HGNC_Name"or"hgnc_name"is passed.Consider either:
- Adding input validation at the start of
annotateProteinInfoFromIndrato reject invalid types, or- Using explicit matching:
💡 Suggested explicit matching
- } else { + } else if (proteinIdType == "Hgnc_Name") { hgncNames <- unique(df$GlobalProtein) ... + } else { + stop("Invalid proteinIdType: ", proteinIdType, + ". Must be 'Uniprot', 'Uniprot_Mnemonic', or 'Hgnc_Name'.") }R/utils_annotateProteinInfoFromIndra.R (2)
278-278: Consider extracting the Gilda API URL to a constant.Other functions in this file use the
INDRA_API_URLconstant defined at line 1. For consistency and maintainability, consider defining a similar constant for the Gilda API:INDRA_API_URL = "https://discovery.indra.bio" +GILDA_API_URL = "https://grounding.indra.bio"Then update this line:
- apiUrl <- file.path("https://grounding.indra.bio/", "ground_multi") + apiUrl <- file.path(GILDA_API_URL, "ground_multi")
287-298: Consider adding HTTP status code validation.The API call doesn't verify the HTTP response was successful before processing content. If the server returns an error status (4xx/5xx), the response body might contain an error message that would be silently ignored during processing.
♻️ Suggested improvement
res <- tryCatch({ response <- POST( apiUrl, body = requestBody, add_headers("Content-Type" = "application/json"), encode = "raw" ) + if (httr::http_error(response)) { + message("API request failed with status: ", httr::status_code(response)) + return(NULL) + } content(response) }, error = function(e) { message("Error in API call: ", e) NULL })Note: This pattern is consistent with other functions in this file, so this improvement could be applied across all API functions as a follow-up.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
R/annotateProteinInfoFromIndra.RR/utils_annotateProteinInfoFromIndra.Rman/annotateProteinInfoFromIndra.Rdman/dot-populateHgncIdsInDataFrame.Rdtests/testthat/test-annotateProteinInfoFromIndra.Rtests/testthat/test-utils_annotateProteinInfoFromIndra.R.R
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pkgdown
- GitHub Check: test-coverage
- GitHub Check: build
🔇 Additional comments (3)
man/annotateProteinInfoFromIndra.Rd (1)
15-16: LGTM!Documentation correctly updated to reflect the new
"Hgnc_Name"option forproteinIdType.man/dot-populateHgncIdsInDataFrame.Rd (1)
6-13: LGTM!The function signature and parameter documentation correctly reflect the new
proteinIdTypeparameter.R/annotateProteinInfoFromIndra.R (1)
26-35: LGTM on the integration flow.The main function correctly propagates
proteinIdTypethrough the pipeline, enabling the new HGNC name resolution pathway.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## devel #67 +/- ##
==========================================
+ Coverage 57.89% 58.87% +0.98%
==========================================
Files 7 7
Lines 1356 1403 +47
==========================================
+ Hits 785 826 +41
- Misses 571 577 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Summary by CodeRabbit
New Features
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.