Draft
42 commits
51e25e1
Variable renaming and new CLI cleanup
nicoloesch May 8, 2026
4a0c685
Initial support for omop-emb 0.5.0
nicoloesch May 11, 2026
43be1d3
Update omop-emb version, move .csv to config, update docker-compose
nicoloesch May 13, 2026
ea39029
Updated Docs
nicoloesch May 13, 2026
e038889
Fix positional column dependency for EdgeView
nicoloesch May 13, 2026
5a58aab
Clearer description of synonym and regular table, added attribute
nicoloesch May 13, 2026
a478e63
Fix LabelMatchGroupView sorting in from_matches
nicoloesch May 13, 2026
3dc8c54
Fix traverse to include only visited nodes
nicoloesch May 13, 2026
9abbf27
Fix deduplication issue in standard_paths
nicoloesch May 13, 2026
b303a72
Adapt docstring and signature of find_standard_paths to accurately re…
nicoloesch May 13, 2026
ced97eb
Fix for empty path steps to not crash
nicoloesch May 13, 2026
ba5e952
Adapt docstring of previous commit
nicoloesch May 13, 2026
0598f5a
Fix scoring and grounding description and variable names
nicoloesch May 13, 2026
7952358
Do not rely on standard=False for path reconstruction
nicoloesch May 13, 2026
47bd7aa
Rename the row elements of edges query to fall in line with EdgeView.…
nicoloesch May 13, 2026
2d09ca5
Include immediate parent/child in ancestor/descendant
nicoloesch May 13, 2026
ec20e88
Dont include self in count of ancestors and descendants
nicoloesch May 13, 2026
de8f8a4
Include class_id and sublcass_id in query
nicoloesch May 13, 2026
0778344
Correctly reference column name that was renamed for synonym
nicoloesch May 13, 2026
1509e7f
Correctly clear all LRU caches
nicoloesch May 13, 2026
399d595
Correctly have invert of relationships
nicoloesch May 13, 2026
6b6c6cf
Rectify the relationship cache warning in __init__
nicoloesch May 13, 2026
6ac3930
Correct CLI log message for relationship-classification
nicoloesch May 13, 2026
24ea78e
Yield from within session scope of entailed incoming relationships
nicoloesch May 13, 2026
4e753b8
Include oaklib as dependency
nicoloesch May 13, 2026
b5550ab
Correct CLI docs for env variables
nicoloesch May 13, 2026
3f507d2
Remove logging from tracked files
nicoloesch May 13, 2026
9fe1f39
Pin orm-loader version, remove redundant omop-cdm
nicoloesch May 19, 2026
f3f303f
Adapt logging message depending on exception in for optional embeddin…
nicoloesch May 19, 2026
b4b5174
Cleanup of grounding and nearest concepts
nicoloesch May 19, 2026
9176056
Change text to query and text_embedding to query_embedding
nicoloesch May 19, 2026
0a40981
Cleanup for omop-graph and pylance
nicoloesch May 19, 2026
d647187
Adapt docstring to satisfy strict mkdocs
nicoloesch May 19, 2026
d66c2d7
Bring back the option to add test data
nicoloesch May 19, 2026
0cce679
Add docs and polish CLI further
nicoloesch May 19, 2026
ad44355
Remove LRU cache
nicoloesch May 19, 2026
cd22033
Fix bugs and remove stale/unused code
nicoloesch May 19, 2026
972aa5b
Remove round-trips to DB and utilise cache for validation
nicoloesch May 19, 2026
b93124e
Instance level RelationshipCache to support multiple KG instances
nicoloesch May 20, 2026
ed06aa8
Updated docs and pyproject
nicoloesch May 20, 2026
9d9be6e
Fix renderers
nicoloesch May 20, 2026
4d87c5c
Rename ClassIDEnum to PredicateKind for clearer description
nicoloesch May 20, 2026
5 changes: 4 additions & 1 deletion .gitignore
@@ -13,4 +13,7 @@ wheels/
docs/backup/
docs/omop_relationships.csv
.vscode/
.env
.env
resources/
*.DS_Store
logging/
117 changes: 49 additions & 68 deletions README.md
@@ -1,21 +1,12 @@
# Architecture

This library provides a lightweight, query-time knowledge-graph layer over an OMOP vocabulary database, with explicit separation between:

* graph access (nodes, edges, predicates),
* graph algorithms (traversal, pathfinding),
* path scoring and explanation, and
* presentation / inspection utilities.

# omop-graph

**omop-graph** is a lightweight, opinionated knowledge-graph traversal and path-analysis library built on top of the OMOP vocabulary model.

It provides:
- a stable **KnowledgeGraph façade** over OMOP concepts and relationships
- flexible **graph traversal** (forward, backward, bidirectional)
- **path discovery and ranking** with transparent scoring
- **traceable explanations** of why one path is preferred over another
- **path discovery** with transparent scoring
- **traceable explanations** of traversal decisions
- multiple **rendering backends** (text, HTML, Mermaid)

The library is designed for:
@@ -31,105 +22,95 @@ The library is designed for:
pip install omop-graph
```

With embedding support (sqlite-vec backend, zero config):

```bash
pip install "omop-graph[emb]"
```

For larger deployments use `[pgvector]` or `[faiss-cpu]` instead (or in addition).
Full setup is covered in the [omop-emb documentation](https://australiancancerdatanetwork.github.io/omop-emb/).

---

## Core Concepts

### KnowledgeGraph

KnowledgeGraph is the main entry point. It wraps an existing SQLAlchemy session connected to an OMOP vocabulary schema. kg-core assumes OMOP semantics and tables.
`KnowledgeGraph` is the main entry point. It wraps a SQLAlchemy `Engine` connected to an OMOP vocabulary schema and provides a high-level Pythonic API over the relational tables.

```python
from sqlalchemy import create_engine
from omop_graph.graph.kg import KnowledgeGraph
```

### Nodes and Edges
engine = create_engine("postgresql://user:pass@localhost/omop")
kg = KnowledgeGraph(engine)

Nodes are OMOP Concepts; Edges are OMOP Concept_Relationships
# Lookup a concept by label
match_group = kg.label_lookup("Atrial Fibrillation", fuzzy=False)
concept = match_group.best_match
print(f"ID: {concept.concept_id}, Name: {concept.matched_label}")

Relationships are classified into semantic kinds:
# Traverse the hierarchy
parents = kg.parents(concept.concept_id)
```

### Nodes and Edges

* ONTOLOGICAL
* MAPPING
* ATTRIBUTE
* VERSIONING
* METADATA
Nodes are OMOP Concepts; Edges are OMOP Concept_Relationships.

This classification drives traversal and scoring.
Relationships are pre-classified into semantic kinds (`PredicateKind`):

### Traversal, Paths and Scoring
- `HIERARCHY` — parent/child ontological relationships
- `IDENTITY` — mapping to standard concepts
- `COMPOSITION` — part-of relationships
- `ASSOCIATION` — lateral clinical associations
- `ATTRIBUTE` — concept attribute relationships

You can:
This classification drives traversal filtering and scoring.

* expand neighbourhoods
* extract subgraphs
* trace traversal decisions
* control which relationship kinds are followed
* discover multiple candidate paths between concepts and rank them
* render simple HTML cards for easy interactive exploration
### Traversal and Paths

```python
from omop_graph.graph.paths import find_shortest_paths
from omop_graph.extensions.omop_alchemy import ClassIDEnum

ingredient = kg.concept_id_by_code("RxNorm", "6809") # Metformin
drug = kg.concept_id_by_code("RxNorm", "860975") # Metformin 500 MG Oral Tablet
from omop_graph.extensions.omop_alchemy import PredicateKind

kg.concept_view(drug) # ConceptView(id=40163924, RxNorm:860975, name='24 HR metformin hydrochloride 500 MG Extended Release Oral Tablet')
kg.concept_view(ingredient) # ConceptView(id=1503297, RxNorm:6809, name='metformin')
ingredient = kg.concept_id_by_code("RxNorm", "6809") # Metformin
drug = kg.concept_id_by_code("RxNorm", "860975") # Metformin 500 MG Oral Tablet

paths, trace = find_shortest_paths(
kg,
source=drug,
target=ingredient,
predicate_kinds={
ClassIDEnum.HIERARCHICAL,
ClassIDEnum.IDENTITY,
},
predicate_kinds=frozenset({PredicateKind.HIERARCHY, PredicateKind.IDENTITY}),
max_depth=6,
traced=True,
)

ranked = rank_paths(kg, paths)

```
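The `find_shortest_paths` call above limits the search to a set of predicate kinds and caps it at `max_depth`. The following is a minimal sketch of that idea on a toy in-memory graph, not omop-graph's implementation: `Kind`, `EDGES`, and `shortest_paths` are hypothetical names introduced for illustration, and the node IDs are made up.

```python
from collections import deque
from enum import Enum


class Kind(Enum):
    """Hypothetical stand-in for PredicateKind (illustration only)."""
    HIERARCHY = "hierarchy"
    IDENTITY = "identity"
    ATTRIBUTE = "attribute"


# Toy edge list: (source, kind, target). IDs are invented for this sketch.
EDGES = [
    (1, Kind.HIERARCHY, 2),
    (2, Kind.HIERARCHY, 4),
    (1, Kind.ATTRIBUTE, 4),
    (1, Kind.IDENTITY, 3),
    (3, Kind.HIERARCHY, 4),
]


def shortest_paths(edges, source, target, kinds, max_depth):
    """Return every shortest path (as a node list) that uses only edges in `kinds`."""
    # Build an adjacency map from the edges whose kind is allowed.
    adjacency = {}
    for src, kind, dst in edges:
        if kind in kinds:
            adjacency.setdefault(src, []).append(dst)

    found = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        # Breadth-first order: once a hit exists, longer paths cannot be shortest.
        if found and len(path) > len(found[0]):
            break
        node = path[-1]
        if node == target:
            found.append(path)
            continue
        if len(path) - 1 >= max_depth:
            continue  # depth cap reached, do not expand further
        for nxt in adjacency.get(node, []):
            if nxt not in path:  # avoid cycles within a single path
                queue.append(path + [nxt])
    return found


allowed = frozenset({Kind.HIERARCHY, Kind.IDENTITY})
paths = shortest_paths(EDGES, source=1, target=4, kinds=allowed, max_depth=6)
```

Excluding `Kind.ATTRIBUTE` hides the direct edge from 1 to 4, so the search returns the two-step paths through nodes 2 and 3 instead. This is the sense in which the predicate classification drives traversal.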

###

```python
paths = kg.find_shortest_paths(
source=a,
target=b,
max_depth=6,
)
ranked = kg.rank_paths(paths)
```

### Rendering

Outputs can be rendered as:
Outputs can be rendered as plain text, HTML (Jupyter), or Mermaid diagrams. Rendering auto-detects the environment.

* plain text (CLI / logs)
* HTML (Jupyter)
* Mermaid diagrams

Rendering auto-detects the environment.

```python
```python
from IPython.display import HTML, display
from omop_graph.render import render_trace

display(HTML(render_trace(kg, trace)))
```

---

## Project Structure
```graphql

```
omop_graph/
├── graph/ # graph logic, traversal, paths, scoring
├── render/ # HTML / text / Mermaid renderers
├── reasoning/ # Ontology traversal methods for specific reasoner tasks
├────── resolvers/ # Resolve labels for exact / fuzzy / synonym matches - TODO: embedding matches
├────── phenotypes/ # Set operations to build efficient hierarchical groupings for reasoning
├── reasoning/ # ontology traversal methods for specific reasoner tasks
│ ├── resolvers/ # resolve labels via exact / fuzzy / full-text / synonym search
│ └── phenotypes/ # set operations for hierarchical groupings
├── oaklib_interface/ # OAK-compliant adapter
├── api.py # stable public API surface
└── db/ # session helpers

```
```
File renamed without changes.
File renamed without changes.
28 changes: 28 additions & 0 deletions docker-compose.yaml
@@ -0,0 +1,28 @@
services:
omop-cdm-db:
image: postgres:16-alpine
restart: always
env_file: .env
environment:
- POSTGRES_USER=${OMOP_CDM_DB_USER:-omop}
- POSTGRES_PASSWORD=${OMOP_CDM_DB_PASSWORD:-omop}
- POSTGRES_DB=${OMOP_CDM_DB_NAME:-omop}
- PGDATA=/var/lib/postgresql/data/pgdata
volumes:
- db_data:/var/lib/postgresql/data
networks:
- omop-net
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ${OMOP_CDM_DB_USER:-omop} -d ${OMOP_CDM_DB_NAME:-omop}"]
interval: 5s
timeout: 5s
retries: 5
ports:
- "5432:5432"

networks:
omop-net:
name: omop-net

volumes:
db_data:
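
Note on the compose file above: it publishes PostgreSQL on host port 5432 and falls back to `omop` for the user, password, and database name. A minimal sketch of building a matching connection URL for `create_engine` follows. The helper name `omop_db_url` and the `localhost` host are assumptions for illustration and are not part of this PR.

```python
import os
from urllib.parse import quote


def omop_db_url(env=None, host="localhost", port=5432):
    """Build a PostgreSQL URL from the variables used in docker-compose.yaml.

    Hypothetical helper. Mirrors the `${VAR:-omop}` defaults, which apply when
    a variable is unset or empty.
    """
    env = os.environ if env is None else env
    user = env.get("OMOP_CDM_DB_USER") or "omop"
    password = env.get("OMOP_CDM_DB_PASSWORD") or "omop"
    name = env.get("OMOP_CDM_DB_NAME") or "omop"
    # Percent-encode credentials so characters such as '@' do not break the URL.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{name}"
    )


url = omop_db_url({})  # pass nothing to read from os.environ instead
```

The resulting string can be handed to `sqlalchemy.create_engine` in the same way as the hard-coded URL in the README example.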
6 changes: 3 additions & 3 deletions docs/graph/edges.md
@@ -16,16 +16,16 @@ To allow reproduction and evaluation of this approach, we provide clear guidelin

??? "Expand to see the grouping classification of predicates"

{{ to_grouped_table('docs/predicate_classification.csv', [0, 1], [0, 1, 2, 3, 4], [0, 1],) }}
{{ to_grouped_table('config/predicate_classification.csv', [0, 1], [0, 1, 2, 3, 4], [0, 1],) }}

## Predicate Mappings
Following the predicate classification guidelines of the previous seciton, we calssified the following predicates into their respective classification groups.
Following the predicate classification guidelines of the previous section, we classified the following predicates into their respective classification groups.

!!! warning

This classification is currently still under development and most likely may change with increased feedback from clinicians. The respective interface to store these classifications in the OMOP CDM has been prepared and we are in talks to potentially include this classification eventually in the official OMOP CDM.

??? "Expand to see the classification of all edge connections"

{{ to_grouped_table('docs/predicate_mapping.csv', [0, 1], [0, 1, 2, 3], [0, 1], {"r_id": "relationship_id", "r_name": "relationship_name"}) }}
{{ to_grouped_table('config/predicate_mapping.csv', [0, 1], [0, 1, 2, 3], [0, 1], {"r_id": "relationship_id", "r_name": "relationship_name"}) }}

67 changes: 37 additions & 30 deletions docs/graph/kg.md
@@ -27,29 +24,24 @@ While the OMOP CDM is stored in a Relational Database Management System (RDBMS),

### Basic Usage

The `KnowledgeGraph` can be used standalone after connecting to the OMOP CDM database on disk.
The `KnowledgeGraph` can be used standalone after connecting to the OMOP CDM database.

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from omop_graph.graph.kg import KnowledgeGraph
from omop_graph.graph.nodes import LabelMatchKind

# Setup your SQLAlchemy session
engine = create_engine("postgresql://user:pass@localhost/omop")
SessionLocal = sessionmaker(bind=engine)

# Initialize the Virtual Knowledge Graph
kg = KnowledgeGraph(SessionLocal)
kg = KnowledgeGraph(engine)

# Lookup a concept by its label
match_group = kg.label_lookup("Atrial Fibrillation", fuzzy=False)
concept = match_group.best_match

print(f"ID: {concept.concept_id}, Name: {concept.matched_label}")
matches = kg.concept_lookup("Atrial Fibrillation", match_kind=LabelMatchKind.EXACT)
if matches:
print(f"ID: {matches[0].matched_concept_id}, Name: {matches[0].matched_concept_label}")

# Traverse the hierarchy
parents = kg.parents(concept.concept_id)
print(f"Parent IDs: {parents}")
# Traverse the hierarchy
parents = kg.parents(matches[0].matched_concept_id)
print(f"Parent IDs: {parents}")
```

---
@@ -59,41 +54,53 @@ print(f"Parent IDs: {parents}")
To enable semantic similarity and RAG-based retrieval, pass a `KnowledgeGraphEmbeddingConfiguration` when initialising the graph.
This requires the optional `omop-emb` package — see the [installation guide](../usage/installation.md#embedding-rag).

!!! info "omop-emb documentation"
`omop-emb` manages all embedding storage, backends, and retrieval. Full documentation — including backend setup, CLI reference, FAISS sidecar, and configuration — is available at [australiancancerdatanetwork.github.io/omop-emb](https://australiancancerdatanetwork.github.io/omop-emb/).

#### Read-only (pre-computed embeddings already in the DB)

Use this when embeddings have already been indexed and you only need retrieval:

```python
from sqlalchemy import create_engine
from omop_graph.graph.kg import KnowledgeGraph, KnowledgeGraphEmbeddingConfiguration
from omop_emb import BackendType, ProviderType
from omop_emb.config import BackendType, MetricType, ProviderType

engine = create_engine("postgresql://user:pass@localhost/omop")

emb_config = KnowledgeGraphEmbeddingConfiguration(
backend_type=BackendType.FAISS,
backend_type=BackendType.PGVECTOR, # or BackendType.SQLITEVEC
provider_type=ProviderType.OLLAMA,
canonical_model_name="text-embedding-3-small:0.6b",
base_storage_dir="/data/embeddings",
model_name="nomic-embed-text:v1.5", # must match the name used at ingestion time
metric_type=MetricType.COSINE,
)
kg = KnowledgeGraph(SessionLocal, emb_config=emb_config)
kg = KnowledgeGraph(engine, emb_config=emb_config)
```

The backend is resolved from `backend_type` or, as a fallback, from the `OMOP_EMB_BACKEND` environment variable.
See the [omop-emb configuration reference](https://australiancancerdatanetwork.github.io/omop-emb/usage/configuration/) for all connection variables.
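
The resolution order described above (explicit `backend_type` first, then the `OMOP_EMB_BACKEND` environment variable) can be sketched as follows. This is an illustration of the precedence only, not omop-emb's code: `resolve_backend` is a hypothetical helper, and the accepted string values and lower-casing are assumptions.

```python
import os


def resolve_backend(backend_type=None, env=None):
    """Pick the backend name: explicit argument wins, the env var is the fallback.

    Hypothetical helper for illustration; omop-emb's real logic may differ.
    """
    env = os.environ if env is None else env
    value = backend_type or env.get("OMOP_EMB_BACKEND")
    if not value:
        raise ValueError(
            "No backend configured: pass backend_type or set OMOP_EMB_BACKEND"
        )
    return str(value).lower()


# An explicit argument overrides whatever the environment says.
chosen = resolve_backend("pgvector", env={"OMOP_EMB_BACKEND": "faiss"})
```

Failing loudly when neither source is set is a design choice of this sketch; it keeps a misconfigured deployment from silently picking a default store.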

#### Write-capable (generate and store embeddings at runtime)

Provide an `EmbeddingClient` to enable both reading and writing embeddings:
Provide an `EmbeddingClient` to enable both reading and writing embeddings. The `provider_type` and `model_name`
are derived automatically from the client:

```python
from omop_emb import EmbeddingClient
from omop_emb import BackendType, ProviderType
from omop_emb.config import BackendType, MetricType

client = EmbeddingClient(...) # configured for your provider
client = EmbeddingClient(
model="nomic-embed-text:v1.5",
api_base="http://ollama:11434/v1",
)

emb_config = KnowledgeGraphEmbeddingConfiguration(
backend_type=BackendType.FAISS,
base_storage_dir="/data/embeddings",
backend_type=BackendType.PGVECTOR,
metric_type=MetricType.COSINE,
client=client,
)
kg = KnowledgeGraph(SessionLocal, emb_config=emb_config)
kg = KnowledgeGraph(engine, emb_config=emb_config)
```
The `provider_type` will be automatically determined from the `client`.

#### Fallback embedding calculation

@@ -107,12 +114,12 @@ for any missing concepts on-the-fly during a similarity call.

```python
emb_config = KnowledgeGraphEmbeddingConfiguration(
backend_type="faiss",
base_storage_dir="/data/embeddings",
backend_type=BackendType.PGVECTOR,
metric_type=MetricType.COSINE,
client=client,
compute_missing_embeddings=True, # compute embeddings for concepts not yet in the store
compute_missing_embeddings=True,
)
kg = KnowledgeGraph(SessionLocal, emb_config=emb_config)
kg = KnowledgeGraph(engine, emb_config=emb_config)
```

| `compute_missing_embeddings` | `client` present | Behaviour when concepts are missing |