18 changes: 18 additions & 0 deletions .dockerignore
@@ -0,0 +1,18 @@
.venv/
venv/
__pycache__/
*.py[cod]
*.egg-info/
dist/
build/
.git/
.idea/
.vscode/
data/
logs/
*.swp
*.swo
.DS_Store
.claude/
.mcp.json
uv.lock
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -99,7 +99,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with th
- Help strings - never put the default option values in the help strings. The help strings should only describe what the option does, not what the default value is. The default values are already documented in the @config.yml file and will be printed via the `@click.command(context_settings={"show_default": True})` decorator of each Click command.
- Read the README - consult the README before taking action. The README contains information about the project and how to use it. If you need to add a new command or change an existing one, consult the README first.
- Update the README - if appropriate, update the README with any new commands or changes to existing commands. The README should always reflect the current state of the project.
- Use uv - use uv for dependency management and packaging. Do not use pip, conda, or poetry.
- Use uv - use uv for dependency management and packaging. Do not use `pip`, `uv pip`, `conda`, or `poetry`. Use `uv add` to add dependencies, `uv sync` to install, `uv run` to execute. Never suggest `pip install` in code, docs, or error messages.
- Use DSPy - use DSPy signatures and modules for all LLM-related code. Use the BAMLAdapter for structured output formatting.
- Use PySpark for ETL - use PySpark for ETL and batch data processing to build our knowledge graph. Do not use any other libraries or frameworks for data processing. Use PySpark to take the output of our BAML client and transform it into a knowledge graph.
- PySpark - Do not break up dataflow into functions for loading, computing this, computing that, etc. Create a single function that performs the entire dataflow at hand. Do not check if columns exist, assume they do. Do not check if paths exist, assume they do. We prefer a more linear flow for Spark scripts and simple code over complexity. This only applies to Spark code.
49 changes: 49 additions & 0 deletions Dockerfile
@@ -0,0 +1,49 @@
FROM ubuntu:24.04

LABEL maintainer="rjurney@graphlet.ai"
LABEL description="SERF: Agentic Semantic Entity Resolution Framework"

# Avoid interactive prompts
ENV DEBIAN_FRONTEND=noninteractive

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
python3.12 \
python3.12-venv \
python3.12-dev \
curl \
git \
openjdk-21-jre-headless \
&& rm -rf /var/lib/apt/lists/*

# Set Java home for PySpark
ENV JAVA_HOME=/usr/lib/jvm/java-21-openjdk-amd64
ENV PATH="${JAVA_HOME}/bin:${PATH}"

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

# Set up working directory
WORKDIR /app

# Copy dependency files first for layer caching
COPY pyproject.toml uv.lock* ./

# Install dependencies
RUN uv sync --extra dev --no-install-project

# Copy the rest of the project
COPY . .

# Install the project itself
RUN uv sync --extra dev

# Pre-download the embedding model so it's cached in the image
RUN uv run python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('intfloat/multilingual-e5-base')"

# Create data directories
RUN mkdir -p data/benchmarks logs

# Default entrypoint is the serf CLI
ENTRYPOINT ["uv", "run", "serf"]
CMD ["--help"]
37 changes: 29 additions & 8 deletions README.md
@@ -37,7 +37,7 @@ For knowledge graphs: deduplicate edges that result from merging nodes using LLM
| Package Manager | **uv** |
| Data Processing | **PySpark 4.x** |
| LLM Framework | **DSPy 3.x** with BAMLAdapter |
| Embeddings | **Qwen3-Embedding-0.6B** via sentence-transformers |
| Embeddings | **multilingual-e5-base** via sentence-transformers |
| Vector Search | **FAISS IndexIVFFlat** |
| Linting/Formatting | **Ruff** |
| Type Checking | **zuban** (mypy-compatible) |
@@ -47,13 +47,34 @@ For knowledge graphs: deduplicate edges that result from merging nodes using LLM
### Installation

```bash
# From PyPI (when published)
pip install serf

# From source
git clone https://github.com/Graphlet-AI/serf.git
cd serf
uv sync
uv sync --extra dev
```

### Docker

```bash
# Build
docker compose build

# Run any serf command
docker compose run serf benchmark --dataset dblp-acm

# Run benchmarks
docker compose --profile benchmark up

# Run tests
docker compose --profile test up

# Analyze a dataset (put your file in data/)
docker compose run serf analyze --input data/input.csv --output data/er_config.yml
```

Set your API key in a `.env` file or export it:

```bash
echo "GEMINI_API_KEY=your-key" > .env
```
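Docker Compose picks up a `.env` file in the project root automatically. For illustration only, here is a minimal sketch of the `KEY=VALUE` parsing such files use; real projects rely on Compose itself (or python-dotenv), and this parser is an assumption, not SERF code.

```python
# Minimal .env parsing sketch (illustrative only).
def parse_env(path=".env"):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip().strip('"')
    return env

# Recreate the file from the snippet above and parse it.
with open(".env", "w") as f:
    f.write("GEMINI_API_KEY=your-key\n")

print(parse_env()["GEMINI_API_KEY"])  # your-key
```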

### System Requirements
Expand Down Expand Up @@ -116,11 +137,11 @@ result = matcher(block_records=block_json, schema_info=schema, few_shot_examples

## Benchmark Results

Performance on standard ER benchmarks from the [Leipzig Database Group](https://dbs.uni-leipzig.de/research/projects/benchmark-datasets-for-entity-resolution). Blocking uses Qwen3-Embedding-0.6B name-only embeddings + FAISS IVF. Matching uses Gemini 2.0 Flash via DSPy BlockMatch.
Performance on standard ER benchmarks from the [Leipzig Database Group](https://dbs.uni-leipzig.de/research/projects/benchmark-datasets-for-entity-resolution). Blocking uses multilingual-e5-base name-only embeddings + FAISS IVF. Matching uses Gemini 2.0 Flash via DSPy BlockMatch.

| Dataset | Domain | Left | Right | Matches | Precision | Recall | F1 |
| ------------ | ------------- | ----- | ----- | ------- | --------- | ------ | ---------- |
| **DBLP-ACM** | Bibliographic | 2,616 | 2,294 | 2,224 | 0.8950 | 0.6246 | **0.7357** |
| **DBLP-ACM** | Bibliographic | 2,616 | 2,294 | 2,224 | 0.8849 | 0.5809 | **0.7014** |

Blocking uses name-only embeddings for tighter semantic clusters. All matching decisions are made by the LLM — no embedding similarity thresholds.

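The blocking-then-matching split described above (name embeddings form candidate blocks; the LLM alone decides matches) can be sketched in miniature. In this illustrative snippet the toy vectors stand in for multilingual-e5-base embeddings and a plain cosine-similarity scan stands in for FAISS IndexIVFFlat; neither is the SERF implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy name embeddings standing in for multilingual-e5-base output.
records = {
    "Acme Corp":        [0.90, 0.10, 0.00],
    "ACME Corporation": [0.88, 0.12, 0.01],
    "Widget LLC":       [0.10, 0.90, 0.20],
}

# Blocking: group records whose name embeddings are close neighbors.
# The cutoff here only forms candidate blocks; per the README, the
# match/no-match decision inside each block is left to the LLM.
names = list(records)
blocks = []
for i, a in enumerate(names):
    block = [a] + [b for b in names[i + 1:]
                   if cosine(records[a], records[b]) > 0.95]
    if len(block) > 1:
        blocks.append(block)

print(blocks)  # [['Acme Corp', 'ACME Corporation']]
```

In the real pipeline the neighbor search runs over a FAISS IVF index rather than an all-pairs scan, which is what keeps blocking tractable at dataset scale.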
2 changes: 1 addition & 1 deletion assets/DSPy.md
@@ -7,7 +7,7 @@ This guide provides an overview of how to use the DSPy framework for building an
1. **Installation**: Install DSPy with uv:

```
pip install dspy
uv add dspy-ai
```

2. **Basic Usage**: Import DSPy and create a simple pipeline:
13 changes: 9 additions & 4 deletions config.yml
@@ -3,8 +3,9 @@ logs:
path: logs

models:
embedding: "Qwen/Qwen3-Embedding-0.6B"
embedding: "intfloat/multilingual-e5-base"
llm: "gemini/gemini-2.0-flash"
analyze_llm: "${models.llm}"
temperature: 0.0

er:
@@ -22,10 +23,14 @@ er:
max_retries: 3
retry_delay_ms: 300

convergence:
max_iterations: 5
threshold: 0.01

eval:
coverage_threshold: 0.9999
error_threshold: 0.0001
overlap_threshold: 0.01
coverage_threshold: 99.99
error_threshold: 1.0
overlap_threshold: 1.0

paths:
blocks: "data/iteration_{iteration}/blocks"
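The eval thresholds in this config change from fractions (0.9999, 0.0001) to percentages (99.99, 1.0). A minimal sketch of how a gate might compare measured metrics against these values; the `gate` function and metric names are assumptions for illustration, not SERF code.

```python
# Illustrative evaluation gate; the threshold values mirror the eval
# section of config.yml above, expressed as percentages.
THRESHOLDS = {"coverage": 99.99, "error": 1.0, "overlap": 1.0}

def gate(covered, errors, overlapping, total):
    """Return per-metric pass/fail given raw counts out of `total`."""
    coverage_pct = 100.0 * covered / total
    error_pct = 100.0 * errors / total
    overlap_pct = 100.0 * overlapping / total
    return {
        "coverage": coverage_pct >= THRESHOLDS["coverage"],
        "error": error_pct <= THRESHOLDS["error"],
        "overlap": overlap_pct <= THRESHOLDS["overlap"],
    }

print(gate(covered=10_000, errors=5, overlapping=50, total=10_000))
# {'coverage': True, 'error': True, 'overlap': True}
```

Keeping the thresholds in percent matches how such metrics are usually reported, but it means any comparison code written against the old fractional values must be updated in the same change.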
81 changes: 81 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,81 @@
services:
serf:
build:
context: .
dockerfile: Dockerfile
container_name: serf
volumes:
- ./data:/app/data
- ./logs:/app/logs
- ./config.yml:/app/config.yml:ro
environment:
- GEMINI_API_KEY=${GEMINI_API_KEY}
entrypoint: ["uv", "run", "serf"]
command: ["--help"]

# Run a benchmark
benchmark:
build:
context: .
dockerfile: Dockerfile
container_name: serf-benchmark
volumes:
- ./data:/app/data
- ./logs:/app/logs
- ./config.yml:/app/config.yml:ro
environment:
- GEMINI_API_KEY=${GEMINI_API_KEY}
entrypoint: ["uv", "run", "serf"]
command: ["benchmark", "--dataset", "dblp-acm", "--output", "data/benchmarks/docker"]
profiles:
- benchmark

# Run entity resolution on input data
resolve:
build:
context: .
dockerfile: Dockerfile
container_name: serf-resolve
volumes:
- ./data:/app/data
- ./logs:/app/logs
- ./config.yml:/app/config.yml:ro
environment:
- GEMINI_API_KEY=${GEMINI_API_KEY}
entrypoint: ["uv", "run", "serf"]
command: ["run", "--input", "data/input.csv", "--output", "data/resolved"]
profiles:
- resolve

# Analyze a dataset and generate ER config
analyze:
build:
context: .
dockerfile: Dockerfile
container_name: serf-analyze
volumes:
- ./data:/app/data
- ./logs:/app/logs
- ./config.yml:/app/config.yml:ro
environment:
- GEMINI_API_KEY=${GEMINI_API_KEY}
entrypoint: ["uv", "run", "serf"]
command: ["analyze", "--input", "data/input.csv", "--output", "data/er_config.yml"]
profiles:
- analyze

# Run tests
test:
build:
context: .
dockerfile: Dockerfile
container_name: serf-test
volumes:
- ./data:/app/data
- ./logs:/app/logs
environment:
- GEMINI_API_KEY=${GEMINI_API_KEY}
entrypoint: ["uv", "run", "pytest"]
command: ["tests/", "-v", "--ignore=tests/test_dspy.py"]
profiles:
- test