Bacdive media fix by realmarcin · Pull Request #174 · Knowledge-Graph-Hub/kg-microbe

realmarcin · 2024-06-04T19:31:54Z

No description provided.

Copilot

Pull Request Overview

This PR fixes issues in the BacDive media processing pipeline by refactoring data accumulation and edge creation logic. The changes address data structure inconsistencies and improve the handling of metabolite utilization and enzyme activity data.

Removes unused NCBI_TO_ENZYME_EDGE import and refactors enzyme activity handling
Changes data structures from dictionaries to tuples for consistent processing
Adds accumulation logic to collect data per NCBITaxon before writing edges


Copilot · 2025-08-13T16:53:13Z

kg_microbe/transform_utils/bacdive/bacdive.py

+                )
+            for assay_id in info["assays"]:
+                # Unpacking the assay information stored as tuples
+                assay_curie, assay_value, utilization_type = assay_id


The tuple unpacking assumes all assay items have exactly 3 elements, but enzyme activities are stored as 2-element tuples (lines 749, 758) while metabolite utilizations are stored as 3-element tuples. This will cause a ValueError when processing enzyme data.

Suggested change

-                assay_curie, assay_value, utilization_type = assay_id
+                # Unpacking the assay information stored as tuples (handle both 2- and 3-element tuples)
+                if len(assay_id) == 3:
+                    assay_curie, assay_value, utilization_type = assay_id
+                elif len(assay_id) == 2:
+                    assay_curie, assay_value = assay_id
+                    utilization_type = None
+                else:
+                    raise ValueError(f"Unexpected assay tuple length: {len(assay_id)} for {assay_id}")
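An alternative to branching on tuple length at the unpack site is to normalize every entry to one fixed shape when it is stored. The following is only a minimal sketch of that idea: `AssayRecord` and `normalize_assay` are hypothetical names that do not appear in the PR, and the CURIEs are placeholder values.

```python
from typing import NamedTuple, Optional, Sequence


class AssayRecord(NamedTuple):
    """Hypothetical fixed-shape record; not part of the PR."""

    curie: str
    value: str
    utilization_type: Optional[str] = None


def normalize_assay(item: Sequence[str]) -> AssayRecord:
    """Accept a 2- or 3-element tuple and return a uniform 3-field record."""
    if len(item) not in (2, 3):
        raise ValueError(f"Unexpected assay tuple length: {len(item)} for {item!r}")
    # A 2-element tuple leaves utilization_type at its default of None.
    return AssayRecord(*item)


# Placeholder values for illustration only.
enzyme = normalize_assay(("EC:0.0.0.0", "example enzyme"))
metabolite = normalize_assay(("CHEBI:0", "example metabolite", "+"))
```

With records normalized this way, the loop over `info["assays"]` could unpack three fields unconditionally, and the length check would live in one place.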

Copilot · 2025-08-13T16:53:13Z

kg_microbe/transform_utils/bacdive/bacdive.py

-                                        BACDIVE_PREFIX + key,
-                                    ]
-                                    edge_writer.writerow(meta_util_edges_to_write)
+                        for k, _, _ in positive_chebi_activity:


This line is incorrectly indented and creates a syntax error. It should be aligned with the previous if statement or properly nested within it.

Copilot · 2025-08-13T16:53:14Z

kg_microbe/transform_utils/bacdive/bacdive.py

                        writer_2.writerow(phys_and_meta_data)

+                    if ncbitaxon_id:
+                        if ncbitaxon_id not in self.ncbitaxon_info:


This data accumulation logic is duplicated later in the code (lines 604-656 and 767-814). The duplicate code should be consolidated into a single location or extracted into a helper method to improve maintainability.
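One way the consolidation could look is a single helper that every call site uses. This is a sketch under assumptions, not the PR's code: `new_taxon_store` and `accumulate_assays` are hypothetical names, and storing assays in a set (to drop repeats) is an assumption about the container behind `info["assays"]`.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Optional, Set, Tuple

TaxonStore = DefaultDict[str, Dict[str, Set[Tuple[str, ...]]]]


def new_taxon_store() -> TaxonStore:
    """Create a per-taxon store; each entry starts with an empty 'assays' set."""
    return defaultdict(lambda: {"assays": set()})


def accumulate_assays(
    store: TaxonStore,
    ncbitaxon_id: Optional[str],
    assays: Iterable[Tuple[str, ...]],
) -> None:
    """Add assay tuples under one taxon; do nothing when the ID is missing."""
    if not ncbitaxon_id:
        return
    store[ncbitaxon_id]["assays"].update(assays)


# Two separate call sites feeding the same store (placeholder values).
store = new_taxon_store()
accumulate_assays(store, "NCBITaxon:0", [("CHEBI:0", "example metabolite", "+")])
accumulate_assays(store, "NCBITaxon:0", [("CHEBI:0", "example metabolite", "+")])
accumulate_assays(store, None, [("EC:0.0.0.0", "example enzyme")])
```

In the transform itself, the store would presumably be `self.ncbitaxon_info`, and each of the duplicated blocks would shrink to one call to the helper.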

Copilot · 2025-08-13T16:53:14Z

kg_microbe/transform_utils/bacdive/bacdive.py

+                edge_writer.writerow(
+                    [
+                        ncbitaxon_id,
+                        NCBI_TO_METABOLITE_UTILIZATION_EDGE,


All assay data is being written with NCBI_TO_METABOLITE_UTILIZATION_EDGE edge type, but enzyme activities should use a different edge type since they represent different biological relationships.

Suggested change

-                        NCBI_TO_METABOLITE_UTILIZATION_EDGE,
+                # Select edge type based on utilization_type
+                if utilization_type == "enzyme_activity":
+                    edge_type = ENZYME_TO_ASSAY_EDGE
+                else:
+                    edge_type = NCBI_TO_METABOLITE_UTILIZATION_EDGE
+                edge_writer.writerow(
+                    [
+                        ncbitaxon_id,
+                        edge_type,
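Read together with the earlier comment, enzyme activities appear to be stored as 2-element tuples with no `utilization_type`, so a comparison against `"enzyme_activity"` would seemingly only match if that tag is stored explicitly. The sketch below assumes each assay carries such a tag. `EDGE_TYPE_BY_KIND` and `write_assay_edges` are hypothetical names, the constant values are placeholders rather than the predicates kg-microbe uses, and the three-column row is a simplification of the real edge row.

```python
import csv
import io

# Constant names come from the review thread; the string values are
# placeholders, not the real predicates.
NCBI_TO_METABOLITE_UTILIZATION_EDGE = "placeholder:metabolite_utilization"
ENZYME_TO_ASSAY_EDGE = "placeholder:enzyme_activity"

# Hypothetical lookup keyed by an explicit "kind" tag stored with each assay.
EDGE_TYPE_BY_KIND = {
    "metabolite_utilization": NCBI_TO_METABOLITE_UTILIZATION_EDGE,
    "enzyme_activity": ENZYME_TO_ASSAY_EDGE,
}


def write_assay_edges(edge_writer, ncbitaxon_id, assays):
    """Write one row per (kind, assay_curie) pair, choosing the edge type by kind."""
    for kind, assay_curie in assays:
        edge_type = EDGE_TYPE_BY_KIND[kind]  # raises KeyError on an unknown kind
        edge_writer.writerow([ncbitaxon_id, edge_type, assay_curie])


# Write two example rows to an in-memory TSV (placeholder identifiers).
buffer = io.StringIO()
write_assay_edges(
    csv.writer(buffer, delimiter="\t"),
    "NCBITaxon:0",
    [("metabolite_utilization", "CHEBI:0"), ("enzyme_activity", "EC:0.0.0.0")],
)
rows = buffer.getvalue().splitlines()
```

A lookup table makes an unknown kind fail loudly instead of falling through to the metabolite edge type by default.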

realmarcin added 9 commits June 3, 2024 23:14

strain to taxon, strain to medium. also isolation source

7141959

error fix

8dbc5b5

error fix

308a243

error fix

88c5e31

error fix

4290d2c

error fix

1d1cb72

error fix

9412f78

error fix

b8562bd

error fix

cb65501

turbomam requested a review from Copilot August 13, 2025 16:51

Copilot AI reviewed Aug 13, 2025


realmarcin assigned sujaypatil96 Sep 5, 2025

realmarcin wants to merge 9 commits into master from bacdive_media_fix


2 participants

