Your experts. Your GPU. Your data never leaves.
NeuralForge gives every GPU owner a team of domain experts on their desk — aligned to their research, tracking their field, always available, never forgetting.
Most knowledge platforms choose between cloud convenience and data control. NeuralForge refuses that trade-off.
| Capability | LangChain + OpenAI | Cloud RAG (Pinecone) | NeuralForge |
|---|---|---|---|
| Data stays on your hardware | | | Yes |
| GPU-accelerated graph algorithms | | | cuGraph (Louvain, PageRank) |
| Expert contradiction detection | | | Automatic |
| Hallucination guardrails | Partial | | NeMo Guardrails |
| Temporal knowledge tracking | | | Valid-from/to on every edge |
| Multi-source ingestion (blogs, YouTube, arXiv, docs) | Plugin | Plugin | Built-in |
| Single-command deployment | | | docker compose up |
| PII scrubbing before storage | | | Built-in |
NeuralForge Stack

```
┌──────────────────────────────────────────────────────┐
│                                                      │
│  ┌──────────────┐      ┌──────────────────────────┐  │
│  │ NeuralForge  │────▶│ NIM (TensorRT-LLM)        │  │
│  │ API :8090    │      │ Llama 3.1 8B :8000       │  │
│  │              │      └──────────────────────────┘  │
│  │ FastAPI      │                                    │
│  │ cuGraph      │      ┌──────────────────────────┐  │
│  │ NeMo Rails   │────▶│ Triton Inference Server   │  │
│  │ Layered CTX  │      │ Embed + Rerank :8001     │  │
│  │              │      └──────────────────────────┘  │
│  │              │                                    │
│  │              │      ┌──────────────────────────┐  │
│  │              │────▶│ Qdrant                    │  │
│  │              │      │ Vector DB :6333          │  │
│  └──────────────┘      └──────────────────────────┘  │
│                                                      │
│              ┌────────────────┐                      │
│              │ NVIDIA GPU(s)  │                      │
│              │ CUDA 12.5      │                      │
│              └────────────────┘                      │
└──────────────────────────────────────────────────────┘
```
Four containers, one GPU, zero cloud dependencies.
| Container | Role | NVIDIA Tech | Port |
|---|---|---|---|
| neuralforge-api | Knowledge platform API | RAPIDS cuGraph, NeMo Guardrails | 8090 |
| neuralforge-nim | LLM inference (TensorRT-LLM optimized) | NIM, TensorRT-LLM | 8000 |
| neuralforge-triton | Embedding + reranking | Triton Inference Server | 8001 |
| neuralforge-qdrant | Vector storage | -- | 6333 |
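For orientation, the four services roughly map onto a Compose file shaped like the sketch below. Everything here is illustrative -- image tags, build context, and GPU reservation details are placeholders, and the repository's own docker-compose.yml is authoritative:

```yaml
# Illustrative shape only -- see the repository's docker-compose.yml.
services:
  neuralforge-api:
    build: .                       # placeholder: the platform API container
    ports: ["8090:8090"]
    depends_on: [neuralforge-nim, neuralforge-triton, neuralforge-qdrant]
  neuralforge-nim:
    image: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest   # placeholder tag
    ports: ["8000:8000"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  neuralforge-triton:
    image: nvcr.io/nvidia/tritonserver:24.05-py3           # placeholder tag
    ports: ["8001:8001"]
  neuralforge-qdrant:
    image: qdrant/qdrant:latest
    ports: ["6333:6333"]
```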
```bash
# 1. Clone
git clone https://github.com/NathanMaine/neuralforge.git
cd neuralforge

# 2. Configure
cp .env.example .env
# Edit .env and add your NGC_API_KEY

# 3. Preflight check
bash check_env.sh

# 4. Launch
docker compose up -d
```

That's it. Four GPU-accelerated containers, health-checked and ready.
```python
import httpx

# Add Tim Dettmers' blog as a knowledge source
resp = httpx.post("http://localhost:8090/api/v1/ingest", json={
    "source_type": "blog",
    "source_url": "https://timdettmers.com",
    "expert_name": "Tim Dettmers",
})
print(resp.json()["job_id"])
```

Or use the CLI examples:
```bash
# YouTube channel
python examples/ingest_youtube.py --channel "3Blue1Brown" --name "Grant Sanderson"

# arXiv papers
python examples/ingest_arxiv.py --author "Yann LeCun" --max-papers 20

# Local documents
python examples/ingest_documents.py --folder ./papers --expert "Research Team"
```

```bash
curl -X POST http://localhost:8090/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is LoRA and why does it work?", "include_graph_context": true}'
```

Every response flows through four context layers:
| Layer | Source | Purpose |
|---|---|---|
| 0 | System identity | Expert persona + attribution rules |
| 1 | cuGraph | PageRank rankings + contradiction warnings |
| 2 | Qdrant + Triton | AAAK-compressed vector search chunks |
| 3 | Deep search | Full uncompressed high-relevance passages |
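A rough sketch of what layered assembly could look like. The function name, layer keys, and prompt layout below are illustrative assumptions, not NeuralForge's actual API; the point is only that lower-numbered layers anchor the prompt and empty layers are skipped:

```python
def build_prompt(query: str, layers: dict) -> str:
    """Assemble context layers 0-3 into a single prompt, lowest layer first."""
    order = ["identity", "graph", "vector", "deep"]
    sections = [f"[Layer {i}: {name}]\n{layers[name]}"
                for i, name in enumerate(order) if layers.get(name)]
    sections.append(f"[Query]\n{query}")
    return "\n\n".join(sections)

prompt = build_prompt(
    "What is LoRA?",
    {
        "identity": "You are a panel of ML experts. Attribute every claim.",
        "graph": "PageRank top expert: Tim Dettmers. No contradictions found.",
        "vector": "LoRA freezes base weights and trains low-rank adapters...",
        "deep": "",  # deep search skipped for this query
    },
)
```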
NeuralForge doesn't just store knowledge -- it discovers structure.
```bash
# Where do your experts disagree?
python examples/find_contradictions.py --topic "quantization"
```

When Expert A says "4-bit quantization preserves 99% of performance" and Expert B says "anything below 8-bit significantly degrades reasoning", NeuralForge surfaces that disagreement, with citations to both sources.
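The surfacing step can be illustrated with a pure-Python sketch. Real detection happens against the knowledge graph; the stance labels and claim schema below are purely illustrative:

```python
claims = [
    {"expert": "Expert A", "topic": "quantization", "stance": "supports",
     "text": "4-bit quantization preserves 99% of performance"},
    {"expert": "Expert B", "topic": "quantization", "stance": "disputes",
     "text": "anything below 8-bit significantly degrades reasoning"},
    {"expert": "Expert C", "topic": "attention", "stance": "supports",
     "text": "FlashAttention is exact, not an approximation"},
]

def find_contradictions(claims, topic):
    """Pair up claims on one topic whose stance labels disagree."""
    on_topic = [c for c in claims if c["topic"] == topic]
    return [(a, b) for i, a in enumerate(on_topic)
            for b in on_topic[i + 1:] if a["stance"] != b["stance"]]

pairs = find_contradictions(claims, "quantization")
```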
```bash
# Who clusters together?
python examples/expert_communities.py
```

Louvain community detection (GPU-accelerated via cuGraph) finds natural clusters: researchers who cite each other, authors covering the same topics, institutions with shared methodologies.
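As a much simpler illustration of the idea (not Louvain itself, which optimizes modularity), connected components over a toy co-citation graph already show how clusters fall out of link structure:

```python
def communities(adj):
    """Connected components via DFS -- a crude stand-in for Louvain's clusters."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = [], [start]
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in adj.get(v, ()):
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        groups.append(sorted(comp))
    return groups

# Toy undirected co-citation graph; names are examples only.
coauthors = {
    "Tim Dettmers": ["Luke Zettlemoyer"],
    "Luke Zettlemoyer": ["Tim Dettmers"],
    "Grant Sanderson": [],
}
groups = communities(coauthors)
```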
```bash
# Who carries the most weight on a topic?
python examples/influence_ranking.py --topic "transformer architectures"
```

PageRank computed across the knowledge graph, weighted by topic-specific edge density. Not popularity, but structural authority.
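For intuition, here is a toy pure-Python PageRank over a tiny citation graph. NeuralForge itself runs cuGraph's implementation; this sketch only shows why heavily cited nodes accumulate structural authority regardless of raw popularity:

```python
def pagerank(graph, damping=0.85, iters=50):
    """Toy PageRank over an adjacency dict {node: [cited_nodes]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in graph.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:
                # Dangling node: spread its rank uniformly.
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

# Three experts all cite one paper; node names are examples only.
citations = {
    "lora_paper": [],
    "expert_a": ["lora_paper"],
    "expert_b": ["lora_paper"],
    "expert_c": ["lora_paper", "expert_a"],
}
scores = pagerank(citations)
top = max(scores, key=scores.get)
```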
NeuralForge runs graph algorithms directly on the GPU via RAPIDS cuGraph, with automatic networkx fallback for CPU-only environments.
| Algorithm | What It Does | Use Case |
|---|---|---|
| PageRank | Scores node importance by link structure | Expert authority ranking |
| Louvain | Detects communities in the graph | Expert clustering by domain |
| BFS Traversal | Walks the graph from any node | "Show me everything connected to LoRA" |
| Shortest Path | Finds the connection between two nodes | "How is Expert A related to Concept B?" |
| Temporal Filtering | Valid-from/to on every edge | "What was the consensus in 2023?" |
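The valid-from/to mechanism in the last row can be sketched in a few lines of Python. The edge schema here is an illustrative assumption, not NeuralForge's actual storage format:

```python
from datetime import date

# Two edges from the same expert about the same concept, at different times.
edges = [
    {"src": "expert_a", "rel": "supports", "dst": "4bit_quantization",
     "valid_from": date(2022, 6, 1), "valid_to": None},
    {"src": "expert_a", "rel": "questions", "dst": "4bit_quantization",
     "valid_from": date(2024, 2, 1), "valid_to": None},
]

def as_of(edges, when):
    """Keep only edges whose validity window contains `when`."""
    return [e for e in edges
            if e["valid_from"] <= when
            and (e["valid_to"] is None or when <= e["valid_to"])]

# "What was the consensus in 2023?"
snapshot = as_of(edges, date(2023, 6, 1))
```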
Every query passes through NeMo Guardrails -- three rail types protecting the pipeline:

**Input rails**
- PII Scrubbing -- personal information is stripped before it enters the system
- Jailbreak Detection -- prompt injection attempts are caught and blocked
- Topic Relevance -- off-topic queries are redirected

**Output rails**
- Hallucination Check -- expert citations are verified against the knowledge graph
- Attribution Verification -- every named expert must exist as a graph node
- Provenance Tracking -- source metadata is appended to every response

**Retrieval rails**
- Relevance Filtering -- retrieved chunks are validated for query relevance before reaching the LLM
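The first rail, PII scrubbing, can be sketched with a couple of regexes. The patterns below are illustrative only; a production scrubber covers many more entity types and formats:

```python
import re

# Illustrative patterns only -- far from exhaustive.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def scrub(text):
    """Replace each detected PII span with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

clean = scrub("Reach me at jane.doe@example.com or 555-867-5309.")
```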
Performance will be reported on representative hardware. Benchmark scenarios: Ingestion = a blog with 50 articles. Search = a single query with graph context. Graph = PageRank over a 10K-node graph.
| Hardware | Ingestion | Search (p95) | Graph PageRank | VRAM Used |
|---|---|---|---|---|
| DGX Spark (GB10, 128GB) | -- | -- | -- | -- |
| H100 (80GB) | -- | -- | -- | -- |
| RTX 4090 (24GB) | -- | -- | -- | -- |
| RTX 3090 (24GB) | -- | -- | -- | -- |
Benchmarks in progress. Hardware sponsors welcome.
```
neuralforge/
  forge/
    api/          # FastAPI application + middleware
    core/         # NIM client, Triton client, Qdrant client, embeddings
    graph/        # cuGraph engine, Parquet store, auto-discovery
    guardrails/   # NeMo Guardrails config + custom actions
    ingest/       # Blog scraper, document loader, chunker, PII scrubber
    layers/       # Layered context engine + AAAK compression
    workers/      # Background jobs (discovery, scraping)
    mcp/          # Model Context Protocol integration
  triton/models/  # Triton model configs (embedding, reranker)
  examples/       # 8 standalone quickstart scripts
  tests/          # Full test suite
  benchmarks/     # Performance measurement harness
  docs/           # Architecture, guides, NVIDIA stack docs
```
| Guide | Description |
|---|---|
| Architecture | Deep dive on system design and data flow |
| Quickstart | 5-minute deployment guide |
| Ingestion Guide | How to feed your data into NeuralForge |
| NVIDIA Stack | Why each NVIDIA technology was chosen |
| Graph Algorithms | cuGraph operations explained |
| Guardrails Guide | Configuring NeMo Guardrails |
- NVIDIA GPU with 16GB+ VRAM (24GB+ recommended)
- NVIDIA Driver 535+, CUDA 12.5+
- Docker with NVIDIA Container Toolkit
- NGC API key (get one here)
- 50GB+ free disk space (NIM model cache)
NVIDIA Inception Member. Building tools that put GPU-accelerated AI in the hands of individual researchers and small teams.
- GitHub: NathanMaine
- HuggingFace: Nathan-Maine
- LinkedIn: Nathan Maine
Apache 2.0. See LICENSE.
