1 change: 1 addition & 0 deletions README.md
@@ -13,6 +13,7 @@ Agent Skills are portable instruction sets that enhance AI coding agents with sp
| [memgraph-python-query-modules](memgraph-python-query-modules/) | Develop custom query modules in Python for Memgraph using the mgp API |
| [memgraph-cpp-query-modules](memgraph-cpp-query-modules/) | Develop custom query modules in C++ for Memgraph using the mgp.hpp API |
| [memgraph-rust-query-modules](memgraph-rust-query-modules/) | Develop custom query modules in Rust for Memgraph using rsmgp-sys |
| [memgraph-graph-rag](memgraph-graph-rag/) | Language-agnostic blueprint for GraphRAG with Memgraph and agent tooling |

## Usage

197 changes: 197 additions & 0 deletions memgraph-graph-rag/SKILL.md
@@ -0,0 +1,197 @@
---
name: memgraph-graph-rag
description: Language-agnostic blueprint for building GraphRAG systems with Memgraph and agent tooling. Covers end-to-end architecture, schema design, ingestion, hybrid retrieval, tool contracts, and evaluation. Use when designing GraphRAG platforms that must work across multiple programming languages.
compatibility: Any language with a Bolt-compatible driver. Memgraph instance required.
metadata:
version: "0.0.1"
author: memgraph
---

# Memgraph GraphRAG for Agent Systems (Language-Agnostic)

Build Graph Retrieval-Augmented Generation (GraphRAG) systems that combine vector similarity search with graph traversal, and expose retrieval as tools for agents in any programming language.

## When to Use

Use this skill when:

- Designing a GraphRAG platform (not a single app)
- Supporting multiple client languages (Python, JS, Java, Go, etc.)
- Building agent tools for ingestion, retrieval, and diagnostics
- Combining document chunk retrieval with entity/relationship context

Do NOT use this skill for:

- Simple vector-only RAG or standalone chatbots
- Pure ETL without retrieval or agent workflows
- Use cases that require strict SQL semantics

## Outcomes

By following this skill, you will deliver:

- A graph schema that supports hybrid retrieval
- A repeatable ingestion pipeline
- A retrieval tool contract usable by agents
- An evaluation plan for recall, latency, and groundedness

## Architecture Overview

```
Sources (files, URLs, APIs)
            │
            ▼
┌──────────────────────┐
│  Ingestion Pipeline  │  Parse → chunk → entity extract
└──────────────────────┘
            │
            ▼
┌──────────────────────┐
│       Memgraph       │  Graph + embeddings + indexes
└──────────────────────┘
            │
            ▼
┌──────────────────────┐
│   Hybrid Retriever   │  Vector search + graph expansion
└──────────────────────┘
            │
            ▼
┌──────────────────────┐
│     Agent Tools      │  run_query / retrieve_context
└──────────────────────┘
            │
            ▼
┌──────────────────────┐
│     LLM Response     │  Answer with citations
└──────────────────────┘
```

## Step 1: Define GraphRAG Requirements

Document the following before implementation:

- **Use cases**: Q&A, research, report generation, troubleshooting
- **Source types**: PDFs, HTML, Markdown, APIs, databases
- **Entity types**: People, systems, components, products, errors
- **Relationship types**: MENTIONS, CONNECTS_TO, DEPENDS_ON, NEXT
- **Latency targets**: P95 retrieval time, max hops for expansion

## Step 2: Choose Ingestion Strategy

Pick one of these:

- **Toolkit-based**: `unstructured2graph` + `lightrag-memgraph`
- **Custom ETL**: your parser + entity extractor + Cypher loader
- **Batch migration**: sql2graph for relational sources

Ingestion must always produce:

- Chunk nodes with text + source metadata
- Entity nodes with normalized names
- Relationships that connect entities to chunks and each other
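
Whichever strategy you pick, the chunking step can be sketched as a small pure function. This is a minimal sketch with assumed window size and overlap; real pipelines typically chunk by tokens or document structure rather than raw characters.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[dict]:
    """Split text into overlapping character windows with positional metadata.

    The size/overlap defaults are illustrative assumptions, not requirements.
    """
    chunks = []
    step = size - overlap
    for i, start in enumerate(range(0, max(len(text), 1), step)):
        piece = text[start:start + size]
        if not piece:
            break
        chunks.append({"id": f"chunk-{i}", "text": piece, "start": start})
        if start + size >= len(text):
            break  # last window already covers the tail of the text
    return chunks
```

Each resulting dict maps directly onto a `Chunk` node: `id` and `text` become properties, and consecutive chunks can be linked with `NEXT` relationships.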

## Step 3: Standard Graph Schema (Recommended)

Use a stable schema so any client can query consistently:

**Nodes**
- `Document` {id, title, source, created_at}
- `Chunk` {id, text, source, embedding}
- `Entity` {id, name, type, description}
- `Concept` {id, name}

**Relationships**
- `(Document)-[:HAS_CHUNK]->(Chunk)`
- `(Entity)-[:MENTIONED_IN]->(Chunk)`
- `(Entity)-[:RELATES_TO]->(Entity)`
- `(Chunk)-[:NEXT]->(Chunk)`
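
One way to keep every loader consistent with this schema is to generate parameterized Cypher from one helper. The function below is a sketch of that idea; the helper name and the chunk dict shape are assumptions, not part of any Memgraph API.

```python
def chunk_merge_query(doc_id: str, chunk: dict) -> tuple[str, dict]:
    """Build a parameterized MERGE linking a Chunk to its Document.

    Returns (cypher, params) so any Bolt driver can execute it.
    """
    cypher = (
        "MERGE (d:Document {id: $doc_id}) "
        "MERGE (c:Chunk {id: $chunk_id}) "
        "SET c.text = $text, c.source = $source "
        "MERGE (d)-[:HAS_CHUNK]->(c)"
    )
    params = {
        "doc_id": doc_id,
        "chunk_id": chunk["id"],
        "text": chunk["text"],
        "source": chunk.get("source", ""),
    }
    return cypher, params
```

Using `MERGE` keyed on `id` makes ingestion idempotent, so re-running a pipeline does not duplicate nodes.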

## Step 4: Embeddings + Vector Index

- Store embeddings on `Chunk.embedding`
- Create a vector index for `Chunk(embedding)`
- Use Memgraph’s vector search in retrieval

Keep embeddings consistent across ingestion and queries.
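
"Consistent" means the same model and the same dimension at ingestion time and query time. A cheap guard is to validate dimensions before writing or searching; the sketch below assumes a 384-dimensional model, matching the example index config in the reference, and includes a plain cosine similarity for local sanity checks (Memgraph computes similarity server-side).

```python
import math

EXPECTED_DIM = 384  # assumption: must match the vector index "dimension" config

def validate_embedding(vec: list[float], dim: int = EXPECTED_DIM) -> list[float]:
    """Reject vectors whose dimension does not match the index config."""
    if len(vec) != dim:
        raise ValueError(f"expected {dim}-dim embedding, got {len(vec)}")
    return vec

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Reference similarity metric for offline checks and tests."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0
```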

## Step 5: Hybrid Retrieval Strategy

Default retrieval flow:

1. **Vector search** to get top-k chunks
2. **Graph expansion** to include related chunks/entities
3. **Ranking** by degree, recency, or path length
4. **Dedup** and trim to the model context budget

Recommended query pattern (pseudo-Cypher):

- Vector search on `Chunk`
- BFS traversal limited to 1–3 hops
- Return chunk text + related entities
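
Steps 3–4 of the flow (ranking, dedup, trim) are driver-independent and can live as a pure function. This sketch assumes chunks arrive already sorted by rank and uses a character budget as a stand-in for the model's token budget.

```python
def dedup_and_trim(chunks: list[dict], budget_chars: int) -> list[dict]:
    """Drop duplicate chunk ids, keep ranked order, stop at the context budget."""
    seen, kept, used = set(), [], 0
    for chunk in chunks:  # assumed pre-sorted by similarity/degree/recency
        if chunk["id"] in seen:
            continue
        if used + len(chunk["text"]) > budget_chars:
            break  # next chunk would overflow the context budget
        seen.add(chunk["id"])
        kept.append(chunk)
        used += len(chunk["text"])
    return kept
```

Because graph expansion often rediscovers the chunks the vector search already returned, deduplicating by id before trimming keeps the budget for genuinely new context.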

## Step 6: Agent Tool Contracts

Expose retrieval and diagnostics as tools. Minimal contract:

**Tool: `run_query`**
- Input: `{ cypher: string, params?: object }`
- Output: `{ rows: array }`

**Tool: `retrieve_context`**
- Input: `{ question: string, vector_k?: number, hop_limit?: number }`
- Output: `{ chunks: [{text, source, entities[]}], graph_stats }`

**Tool: `ingest_sources`**
- Input: `{ sources: string[], mode?: "append" | "replace" }`
- Output: `{ documents, chunks, entities, duration_ms }`
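
The contracts above can be pinned down as plain data types plus an input validator, so every language binding enforces the same rules. The sketch below mirrors `retrieve_context`; the default values and the 1–3 hop bound are assumptions drawn from the guardrails in this skill.

```python
from dataclasses import dataclass, field

@dataclass
class RetrieveContextInput:
    question: str
    vector_k: int = 5     # defaults are assumptions; tune per deployment
    hop_limit: int = 2

@dataclass
class RetrievedChunk:
    text: str
    source: str
    entities: list[str] = field(default_factory=list)

def validate_input(req: RetrieveContextInput) -> RetrieveContextInput:
    """Enforce guardrails before any query runs."""
    if not req.question.strip():
        raise ValueError("question must be non-empty")
    if not (1 <= req.hop_limit <= 3):
        raise ValueError("hop_limit must stay within 1-3 hops")
    return req
```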

Agents should:

- Call `get_schema` once per session
- Use parameterized queries
- Log retrieval traces for debugging

## Step 7: Language-Agnostic Integration

Memgraph supports the Bolt protocol. Use a Bolt-compatible driver in your language of choice.

Integration responsibilities in each language:

- Create a connection pool
- Execute parameterized Cypher
- Map results to your tool contract
- Handle retries and timeouts
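
The retry responsibility can be kept driver-agnostic by wrapping whatever call your Bolt client exposes. This is a minimal sketch: it treats `ConnectionError` as the retryable class and uses exponential backoff, both of which are assumptions you should adapt to your driver's actual exception types.

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.05):
    """Run fn(), retrying transient failures with exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError as exc:  # assumed transient-error class
            last_error = exc
            time.sleep(base_delay * (2 ** attempt))
    raise last_error
```

Usage would look like `with_retries(lambda: session.run(cypher, params))`, with `session` coming from your language's Bolt driver.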

## Step 8: Evaluation & Observability

Minimum evaluation suite:

- **Recall@k**: relevant chunks retrieved
- **Groundedness**: answer supported by chunks
- **Latency**: P95 retrieval under target
- **Coverage**: percent of sources indexed
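
Of these, Recall@k is the easiest to automate once you have questions with known relevant chunk ids. A reference implementation might look like this sketch:

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the known-relevant chunks found in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)
```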

Log for each request:

- Query text
- Top vector hits
- Expansion hops used
- Final context length

## Guardrails

- Always answer from retrieved context
- If context is insufficient, say so
- Avoid unbounded graph traversal
- Enforce per-request timeouts

## Optional: Use Memgraph AI Toolkit

Use the Memgraph AI Toolkit for faster setup:

- `memgraph-toolbox` for core utilities
- `unstructured2graph` for parsing and ingestion
- `lightrag-memgraph` for entity extraction
- `langchain-memgraph` for agent tooling

Treat these as implementation options, not requirements.
145 changes: 145 additions & 0 deletions memgraph-graph-rag/references/REFERENCE.md
@@ -0,0 +1,145 @@
# Reference: GraphRAG with Memgraph (Agent-Oriented)

This reference supplements the skill with concrete patterns, tool contracts, and queries that are language-agnostic.

## Tool Contract Examples

### Tool: `retrieve_context`

**Input**
```json
{
"question": "string",
"vector_k": 5,
"hop_limit": 2,
"max_chunks": 10
}
```

**Output**
```json
{
"chunks": [
{
"id": "chunk-uuid",
"text": "...",
"source": "url|path",
"entities": ["EntityA", "EntityB"]
}
],
"graph_stats": {
"vector_hits": 5,
"expanded_chunks": 12,
"hops": 2
}
}
```

### Tool: `run_query`

**Input**
```json
{
"cypher": "MATCH (n) RETURN count(n) AS cnt",
"params": {}
}
```

**Output**
```json
{
"rows": [{"cnt": 1234}]
}
```

## Schema Template

**Nodes**
- `Document` {id, title, source, created_at}
- `Chunk` {id, text, source, embedding}
- `Entity` {id, name, type, description}
- `Concept` {id, name}

**Relationships**
- `(Document)-[:HAS_CHUNK]->(Chunk)`
- `(Entity)-[:MENTIONED_IN]->(Chunk)`
- `(Entity)-[:RELATES_TO]->(Entity)`
- `(Chunk)-[:NEXT]->(Chunk)`

## Core Indexes

```cypher
CREATE INDEX ON :Document(id);
CREATE INDEX ON :Chunk(id);
CREATE INDEX ON :Entity(name);
CREATE INDEX ON :Chunk;
```

Vector index (dimension must match your embedding model):

```cypher
CREATE VECTOR INDEX vs_chunks
ON :Chunk(embedding)
WITH CONFIG {"dimension": 384, "capacity": 100000};
```

## Retrieval Patterns

### Pattern A: Vector → Chunk → Entity Context

```cypher
CALL embeddings.text([$question]) YIELD embeddings
CALL vector_search.search('vs_chunks', $vector_k, embeddings[0])
YIELD node, similarity
OPTIONAL MATCH (e:Entity)-[:MENTIONED_IN]->(node)
RETURN node.text AS text, similarity, collect(e.name) AS entities
ORDER BY similarity DESC
LIMIT $max_chunks;
```

### Pattern B: Vector + BFS Expansion (bounded)

```cypher
CALL embeddings.text([$question]) YIELD embeddings
CALL vector_search.search('vs_chunks', $vector_k, embeddings[0]) YIELD node
MATCH (node)-[*bfs..$hop_limit]-(expanded:Chunk)
WITH DISTINCT expanded, degree(expanded) AS importance
ORDER BY importance DESC
RETURN expanded.text AS text
LIMIT $max_chunks;
```

### Pattern C: Sequential Context Around Hits

```cypher
CALL embeddings.text([$question]) YIELD embeddings
CALL vector_search.search('vs_chunks', $vector_k, embeddings[0]) YIELD node
OPTIONAL MATCH (prev:Chunk)-[:NEXT]->(node)
OPTIONAL MATCH (node)-[:NEXT]->(next:Chunk)
RETURN prev.text AS previous, node.text AS matched, next.text AS next;
```

## Safe Parameterization

Always pass user inputs via parameters instead of string interpolation.

```cypher
CALL embeddings.text([$question]) YIELD embeddings
CALL vector_search.search('vs_chunks', $vector_k, embeddings[0]) YIELD node
RETURN node.text AS text
LIMIT $max_chunks;
```

## Evaluation Checklist

- **Recall@k**: sample questions with known answers
- **Groundedness**: answer must quote or cite retrieved chunks
- **Latency**: P95 retrieval duration
- **Coverage**: % of sources ingested and indexed

## Guardrails

- Enforce max hops and max chunks
- Reject queries without a vector index
- Log query traces (vector hits + expansions)
- If context is insufficient, say so instead of guessing
4 changes: 2 additions & 2 deletions memgraph-python-query-modules/SKILL.md
@@ -1,7 +1,7 @@
---
name: memgraph-python-query-modules
description: Develop custom query modules in Python for Memgraph graph database. Use when creating custom graph algorithms, procedures (@mgp.read_proc, @mgp.write_proc), or functions (@mgp.function) for Memgraph. Covers the mgp Python API, graph traversal, data transformations, and module deployment.
compatibility: Requires Memgraph instance (Docker recommended), Python 3.5+
compatibility: Requires Memgraph instance (Docker recommended), Python 3.7+
metadata:
version: "0.0.1"
author: memgraph
@@ -23,7 +23,7 @@ Develop custom query modules in Python for Memgraph graph database.
## Prerequisites

- Memgraph instance running
- Python 3.5.0+
- Python 3.7.0+
- Install mgp locally for development: `pip install mgp`

## Quick Reference