NeuralForge

Your experts. Your GPU. Your data never leaves.

NeuralForge gives every GPU owner a team of domain experts on their desk — aligned to their research, tracking their field, always available, never forgetting.

NIM, TensorRT-LLM, Triton, NeMo Guardrails, RAPIDS cuGraph, CUDA

Python 3.12


What Makes This Different

Most knowledge platforms choose between cloud convenience and data control. NeuralForge refuses that trade-off.

Capability LangChain + OpenAI Cloud RAG (Pinecone) NeuralForge
Data stays on your hardware Yes
GPU-accelerated graph algorithms cuGraph (Louvain, PageRank)
Expert contradiction detection Automatic
Hallucination guardrails Partial NeMo Guardrails
Temporal knowledge tracking Valid-from/to on every edge
Multi-source ingestion (blogs, YouTube, arXiv, docs) Plugin Plugin Built-in
Single-command deployment docker compose up
PII scrubbing before storage Built-in

Architecture

                         NeuralForge Stack
    ┌──────────────────────────────────────────────────────┐
    │                                                      │
    │   ┌──────────────┐     ┌──────────────────────────┐  │
    │   │  NeuralForge  │────▶│  NIM (TensorRT-LLM)      │  │
    │   │  API :8090    │     │  Llama 3.1 8B    :8000   │  │
    │   │               │     └──────────────────────────┘  │
    │   │  FastAPI       │                                  │
    │   │  cuGraph       │     ┌──────────────────────────┐  │
    │   │  NeMo Rails    │────▶│  Triton Inference Server  │  │
    │   │  Layered CTX   │     │  Embed + Rerank  :8001   │  │
    │   │               │     └──────────────────────────┘  │
    │   │               │                                  │
    │   │               │     ┌──────────────────────────┐  │
    │   │               │────▶│  Qdrant                   │  │
    │   │               │     │  Vector DB       :6333   │  │
    │   └──────────────┘     └──────────────────────────┘  │
    │                                                      │
    │                     ┌────────────────┐                │
    │                     │  NVIDIA GPU(s)  │                │
    │                     │  CUDA 12.5      │                │
    │                     └────────────────┘                │
    └──────────────────────────────────────────────────────┘

Four containers, one GPU, zero cloud dependencies.

| Container | Role | NVIDIA Tech | Port |
|---|---|---|---|
| neuralforge-api | Knowledge platform API | RAPIDS cuGraph, NeMo Guardrails | 8090 |
| neuralforge-nim | LLM inference (TensorRT-LLM optimized) | NIM, TensorRT-LLM | 8000 |
| neuralforge-triton | Embedding + reranking | Triton Inference Server | 8001 |
| neuralforge-qdrant | Vector storage | -- | 6333 |

Deploy in One Command

# 1. Clone
git clone https://github.com/NathanMaine/neuralforge.git
cd neuralforge

# 2. Configure
cp .env.example .env
# Edit .env and add your NGC_API_KEY

# 3. Preflight check
bash check_env.sh

# 4. Launch
docker compose up -d

That's it. Four containers, health-checked and ready.
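
To confirm the stack came up, a minimal Python sketch can probe the four ports listed in the container table. It only checks that each port accepts a TCP connection, so it says nothing about application-level health. It assumes the services are published on localhost, and the helper names are illustrative rather than part of NeuralForge.

```python
import socket

# Ports taken from the container table; "localhost" is an assumption.
SERVICES = {
    "neuralforge-api": 8090,
    "neuralforge-nim": 8000,
    "neuralforge-triton": 8001,
    "neuralforge-qdrant": 6333,
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each container name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name:20s} {'up' if ok else 'DOWN'}")
```

NIM can take a while to load its model on first start, so a port that is down right after launch may simply not be ready yet.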


Ingest Your First Expert

import httpx

# Add Tim Dettmers' blog as a knowledge source
resp = httpx.post("http://localhost:8090/api/v1/ingest", json={
    "source_type": "blog",
    "source_url": "https://timdettmers.com",
    "expert_name": "Tim Dettmers",
})
resp.raise_for_status()  # fail loudly if the API rejected the request
print(resp.json()["job_id"])

Or use the CLI examples:

# YouTube channel
python examples/ingest_youtube.py --channel "3Blue1Brown" --name "Grant Sanderson"

# arXiv papers
python examples/ingest_arxiv.py --author "Yann LeCun" --max-papers 20

# Local documents
python examples/ingest_documents.py --folder ./papers --expert "Research Team"

Ask Your Knowledge Base

curl -X POST http://localhost:8090/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is LoRA and why does it work?", "include_graph_context": true}'

Every response flows through 4 context layers:

| Layer | Source | Purpose |
|---|---|---|
| 0 | System identity | Expert persona + attribution rules |
| 1 | cuGraph | PageRank rankings + contradiction warnings |
| 2 | Qdrant + Triton | AAAK-compressed vector search chunks |
| 3 | Deep search | Full uncompressed high-relevance passages |
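
One way to picture the four layers is as ordered sections of a single prompt. The sketch below is an illustration only: the `ContextLayer` type, the `assemble_context` function, and the section formatting are assumptions, not NeuralForge's actual layering engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextLayer:
    level: int    # 0-3, matching the layer numbers in the table
    source: str   # e.g. "cuGraph"
    content: str  # text contributed by this layer


def assemble_context(layers: list[ContextLayer]) -> str:
    """Join non-empty layers into one prompt, lowest level first."""
    ordered = sorted(layers, key=lambda layer: layer.level)
    sections = [
        f"[Layer {layer.level}: {layer.source}]\n{layer.content.strip()}"
        for layer in ordered
        if layer.content.strip()
    ]
    return "\n\n".join(sections)


# Placeholder content, for illustration only.
layers = [
    ContextLayer(2, "Qdrant + Triton", "Compressed chunk about LoRA."),
    ContextLayer(0, "System identity", "Answer as the expert; cite sources."),
    ContextLayer(3, "Deep search", ""),  # empty layers are skipped
    ContextLayer(1, "cuGraph", "Expert A ranks highest on this topic."),
]
print(assemble_context(layers))
```

Sorting by level keeps the persona and attribution rules ahead of retrieved material no matter what order the layers were produced in.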

Discover Relationships

NeuralForge doesn't just store knowledge -- it discovers structure.

Contradiction Detection

# Where do your experts disagree?
python examples/find_contradictions.py --topic "quantization"

When Expert A says "4-bit quantization preserves 99% of performance" and Expert B says "anything below 8-bit significantly degrades reasoning" -- NeuralForge surfaces that disagreement, with citations to both sources.

Expert Communities

# Who clusters together?
python examples/expert_communities.py

Louvain community detection (GPU-accelerated via cuGraph) finds natural clusters: researchers who cite each other, authors covering the same topics, institutions with shared methodologies.

Influence Ranking

# Who carries the most weight on a topic?
python examples/influence_ranking.py --topic "transformer architectures"

PageRank computed across the knowledge graph, weighted by topic-specific edge density. Not popularity -- structural authority.
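
For readers unfamiliar with PageRank, the following is a minimal pure-Python power-iteration sketch over a weighted directed graph. It is not the cuGraph implementation. The edge weights stand in for the topic-specific weighting mentioned above, and the function name, damping factor, and sample graph are assumptions for illustration.

```python
def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; edges is a list of (src, dst, weight).

    Weights are assumed to be strictly positive.
    """
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    out_weight = {n: 0.0 for n in nodes}
    for src, _, w in edges:
        out_weight[src] += w

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for src, dst, w in edges:
            # Each node passes rank along its edges in proportion to weight.
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical edges: (citing expert, cited expert, topic weight).
edges = [("A", "C", 1.0), ("B", "C", 2.0), ("C", "A", 1.0), ("D", "C", 1.0)]
ranks = weighted_pagerank(edges)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 3))
```

A node scores highly because well-ranked nodes point to it, not because it has many edges of its own, which is the sense of "structural authority" here.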


GPU Graph Algorithms

NeuralForge runs graph algorithms directly on the GPU via RAPIDS cuGraph, with automatic networkx fallback for CPU-only environments.
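
A fallback like this can be sketched with a small backend probe. The function below is an assumption about how such selection might look, not NeuralForge's code; it uses only the standard library to check whether each package is importable, and the `is_available` parameter exists purely to make the logic testable.

```python
import importlib.util


def pick_graph_backend(is_available=None) -> str:
    """Return "cugraph" when it is importable, otherwise "networkx"."""
    if is_available is None:
        def is_available(name: str) -> bool:
            # find_spec returns None when a top-level package is missing.
            return importlib.util.find_spec(name) is not None

    if is_available("cugraph"):
        return "cugraph"
    if is_available("networkx"):
        return "networkx"
    raise RuntimeError("neither cugraph nor networkx is installed")


# Simulate a CPU-only machine where only networkx is installed.
print(pick_graph_backend(lambda name: name == "networkx"))  # networkx
```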

| Algorithm | What It Does | Use Case |
|---|---|---|
| PageRank | Scores node importance by link structure | Expert authority ranking |
| Louvain | Detects communities in the graph | Expert clustering by domain |
| BFS Traversal | Walks the graph from any node | "Show me everything connected to LoRA" |
| Shortest Path | Finds the connection between two nodes | "How is Expert A related to Concept B?" |
| Temporal Filtering | Valid-from/to on every edge | "What was the consensus in 2023?" |
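
Temporal filtering can be illustrated with a few lines of plain Python. This is a sketch under assumptions: the `Edge` fields, the inclusive window, and the convention that a missing end date means "still valid" are illustrative choices, not NeuralForge's storage schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still valid"


def edges_valid_on(edges: list[Edge], day: date) -> list[Edge]:
    """Keep edges whose validity window contains the given day (inclusive)."""
    return [
        e for e in edges
        if e.valid_from <= day and (e.valid_to is None or day <= e.valid_to)
    ]


# Hypothetical edges, for illustration only.
edges = [
    Edge("Expert A", "claim-1", date(2022, 1, 1), date(2023, 6, 30)),
    Edge("Expert B", "claim-2", date(2023, 9, 1)),
    Edge("Expert C", "claim-3", date(2024, 2, 1)),
]
for edge in edges_valid_on(edges, date(2023, 12, 31)):
    print(edge.src, "->", edge.dst)  # Expert B -> claim-2
```

Running a graph algorithm on the filtered edge list answers the question as of that date rather than as of today.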

Safety and Governance

Every query passes through NeMo Guardrails -- three rail types protecting the pipeline:

Input Rails

  • PII Scrubbing -- personal information is stripped before it enters the system
  • Jailbreak Detection -- prompt injection attempts are caught and blocked
  • Topic Relevance -- off-topic queries are redirected

Output Rails

  • Hallucination Check -- expert citations are verified against the knowledge graph
  • Attribution Verification -- every named expert must exist as a graph node
  • Provenance Tracking -- source metadata is appended to every response

Retrieval Rails

  • Relevance Filtering -- retrieved chunks are validated for query relevance before reaching the LLM
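
The attribution rule ("every named expert must exist as a graph node") reduces to a set-membership check. The sketch below is a hypothetical helper, not the NeMo Guardrails action that NeuralForge ships; case-insensitive matching is an assumption, and extracting the expert names from a response is out of scope.

```python
def unverified_experts(named_experts, graph_nodes):
    """Return named experts that have no matching node, in first-seen order."""
    known = {node.casefold() for node in graph_nodes}
    missing = []
    for name in named_experts:
        if name.casefold() not in known and name not in missing:
            missing.append(name)
    return missing


# Placeholder data, for illustration only.
graph_nodes = {"Tim Dettmers", "Yann LeCun"}
cited = ["Tim Dettmers", "Jane Placeholder", "yann lecun"]
print(unverified_experts(cited, graph_nodes))  # ['Jane Placeholder']
```

An output rail could block or rewrite a response whenever this list is non-empty.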

Benchmarks

Each benchmark targets the hardware listed below. Ingestion = a blog with 50 articles. Search = a single query with graph context. Graph = PageRank over a 10K-node graph.

| Hardware | Ingestion | Search (p95) | Graph PageRank | VRAM Used |
|---|---|---|---|---|
| DGX Spark (GB10, 128GB) | -- | -- | -- | -- |
| H100 (80GB) | -- | -- | -- | -- |
| RTX 4090 (24GB) | -- | -- | -- | -- |
| RTX 3090 (24GB) | -- | -- | -- | -- |

Benchmarks in progress. Hardware sponsors welcome.


Project Structure

neuralforge/
  forge/
    api/              # FastAPI application + middleware
    core/             # NIM client, Triton client, Qdrant client, embeddings
    graph/            # cuGraph engine, Parquet store, auto-discovery
    guardrails/       # NeMo Guardrails config + custom actions
    ingest/           # Blog scraper, document loader, chunker, PII scrubber
    layers/           # Layered context engine + AAAK compression
    workers/          # Background jobs (discovery, scraping)
    mcp/              # Model Context Protocol integration
  triton/models/      # Triton model configs (embedding, reranker)
  examples/           # 8 standalone quickstart scripts
  tests/              # Full test suite
  benchmarks/         # Performance measurement harness
  docs/               # Architecture, guides, NVIDIA stack docs

Documentation

| Guide | Description |
|---|---|
| Architecture | Deep dive on system design and data flow |
| Quickstart | 5-minute deployment guide |
| Ingestion Guide | How to feed your data into NeuralForge |
| NVIDIA Stack | Why each NVIDIA technology was chosen |
| Graph Algorithms | cuGraph operations explained |
| Guardrails Guide | Configuring NeMo Guardrails |

Requirements

  • NVIDIA GPU with 16GB+ VRAM (24GB+ recommended)
  • NVIDIA Driver 535+, CUDA 12.5+
  • Docker with NVIDIA Container Toolkit
  • NGC API key (available from NVIDIA NGC)
  • 50GB+ free disk space (NIM model cache)
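
The project's own preflight lives in `check_env.sh`, whose contents are not shown here. As a rough illustration of two of the requirements above, the Python sketch below checks VRAM and free disk space. The function names and the slightly lenient VRAM threshold are assumptions, and the VRAM figures are expected as text in the shape that `nvidia-smi` prints for a `memory.total` query.

```python
import shutil

# Thresholds from the requirements list. GPUs sold as "16GB" often report
# slightly under 16384 MiB, so the VRAM floor here is a lenient assumption.
MIN_VRAM_MIB = 16_000
MIN_FREE_DISK_GB = 50


def parse_vram_mib(smi_output: str) -> list[int]:
    """Parse one total-memory value in MiB per non-empty line."""
    return [int(line) for line in smi_output.splitlines() if line.strip()]


def free_disk_gb(path: str = ".") -> float:
    """Free space on the filesystem holding path, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def meets_requirements(vram_mib: list[int], disk_gb: float) -> bool:
    """True if at least one GPU and the disk both clear the thresholds."""
    has_gpu = any(mib >= MIN_VRAM_MIB for mib in vram_mib)
    return has_gpu and disk_gb >= MIN_FREE_DISK_GB


# Sample text in the shape printed by:
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
sample = "24564\n"
print(meets_requirements(parse_vram_mib(sample), free_disk_gb()))
```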

Built by Nathan Maine

NVIDIA Inception Member. Building tools that put GPU-accelerated AI in the hands of individual researchers and small teams.


License

Apache 2.0. See LICENSE.

About

GPU-native knowledge intelligence platform built on 6 NVIDIA technologies. Your experts. Your GPU. Your data never leaves.
