A stateful, graph-orchestrated hybrid RAG platform that unifies retrieval, reasoning, and tool execution under a single agentic workflow.
Traditional RAG systems often struggle with multi-hop reasoning, tool-dependent queries, and dynamic knowledge integration. Vector similarity search alone is insufficient for complex analytical tasks that require structured graph traversal, conditional branching, and computational execution.
LangGraph-Agentic-Graph RAG introduces a stateful graph-orchestrated architecture built on LangGraph and powered by SGLang.
The system unifies knowledge retrieval and tool execution within a single checkpointed workflow, enabling intelligent routing, quality-gated backtracking, and Graph-of-Thought expansion for adaptive multi-path reasoning.
LangGraph-Agentic-Graph RAG is an intelligent hybrid RAG platform powered by **LangGraph + SGLang** that seamlessly integrates knowledge retrieval with tool execution.
The system features:
- Dual-mode query processing: Automatically routes between knowledge-based RAG retrieval and tool execution based on LLM-powered intent classification
- Advanced document ingestion: Converts raw documents (PDF/images/audio) into Markdown chunks and structured graph metadata via LangGraph state machines with checkpoint persistence
- Intelligent retrieval routing: Hop-based router with quality-gate backtracking dynamically selects among three retrieval paths (Vector, Weaviate Cross-Reference GraphRAG, or Neo4j Deep Graph Traversal) based on query complexity
- Tool calling framework: MCP (Model Context Protocol) server integration with local fallback for calculator, API calls, code execution, and database queries
- Graph-of-Thought reasoning: Multi-branch exploration with snapshot-based backtracking for complex analytical queries
- LangGraph state machines: All workflows (ingestion, query reasoning, tool execution, summarization, mindmap generation) run on LangGraph `StateGraph` with `MemorySaver` checkpointing for full state persistence and recovery.
- Intelligent query routing: LLM-powered intent classification automatically determines whether a query requires knowledge retrieval or computational tool execution:
  - Knowledge queries → RAG pipeline with 3-way retrieval routing
  - Calculation queries → Calculator tool with AST-based safe evaluation supporting advanced math (sqrt, log, trig, sigma)
  - Database queries → SQL executor (planned)
  - API calls → HTTP API caller with configurable endpoints
  - Code execution → Python sandbox with restricted built-ins
- Checkpoint & intelligent backtracking: Every node transition is checkpointed; the quality gate evaluates retrieval results and triggers intelligent path selection when quality is insufficient:
  - Quality evaluation: Observer LLM scores each path result (0.0–1.0) against `QUALITY_GATE_THRESHOLD`
  - Smart path selection: `PathSelector` analyzes remaining untried paths based on query keywords, hop count, and path characteristics to select the most suitable alternative
  - Backtrack limits: Configurable via `MAX_BACKTRACK_COUNT` to prevent infinite loops
  - State tracking: `tried_paths` field prevents re-attempting failed strategies
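The quality-gated backtracking loop above can be sketched in a few lines. This is a minimal illustration, not the project's actual API: the constants mirror the configuration names, but the function signatures are assumptions.

```python
# Hedged sketch of quality-gated backtracking; names are illustrative.
QUALITY_GATE_THRESHOLD = 0.6
MAX_BACKTRACK_COUNT = 2

def retrieve_with_backtracking(query, paths, run_path, score_path):
    """Try retrieval paths until one passes the quality gate.

    `run_path(path, query)` returns a retrieval result; `score_path(result)`
    returns a 0.0-1.0 quality score (the observer LLM's role).
    """
    tried_paths = []
    backtracks = 0
    best = None  # (score, path, result) seen so far
    for path in paths:
        if backtracks > MAX_BACKTRACK_COUNT:
            break
        result = run_path(path, query)
        score = score_path(result)
        tried_paths.append(path)  # never re-attempt a failed strategy
        if best is None or score > best[0]:
            best = (score, path, result)
        if score >= QUALITY_GATE_THRESHOLD:
            return result, tried_paths
        backtracks += 1
    # Backtrack limit hit or all paths exhausted: return the best effort.
    return best[2], tried_paths
```

The key properties are that failed paths are recorded in `tried_paths` and that the loop is bounded by `MAX_BACKTRACK_COUNT`, so it can never cycle forever.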
- 3-way retrieval routing: Query complexity (hop count) determines the optimal retrieval strategy:
  - Path 1 (Vector RAG, ≤ 2 hops): Fast semantic similarity search on the late-chunked TextDocument corpus. Ideal for direct factual questions.
  - Path 2 (Weaviate Cross-Reference GraphRAG, 3–5 hops): BM25 seed entity search followed by multi-hop cross-reference traversal (source/target/event refs) within Weaviate. Surfaces query-adjacent entities and events through relationship walking.
  - Path 3 (Neo4j Deep Graph Traversal, ≥ 6 hops): Cypher-based deep graph exploration for schema-intensive relationship reasoning. Handles complex multi-entity queries requiring extensive graph traversal.
  - Hop classification: Hybrid LLM + heuristic approach estimates query complexity, with the LLM as the primary classifier and a keyword-based fallback
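The hop thresholds above amount to a simple routing function. The sketch below shows the threshold logic only; the function names and the conservative `min(...)` fusion of the two estimators are assumptions based on the description, not the project's exact code.

```python
# Illustrative hop-count routing; names are assumptions.
GRAPH_MAX_HOPS = 6

def clamp_hops(llm_estimate: int, heuristic_estimate: int) -> int:
    # Conservative fusion: take the smallest of both estimates and the cap.
    return min(llm_estimate, heuristic_estimate, GRAPH_MAX_HOPS)

def select_retrieval_path(hop_count: int) -> str:
    if hop_count <= 2:
        return "vector_retriever"    # Path 1: semantic similarity
    if hop_count <= 5:
        return "crossref_retriever"  # Path 2: Weaviate cross-references
    return "graphdb_retriever"       # Path 3: Neo4j Cypher traversal
```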
- Graph-of-Thought expansion: Multi-branch reasoning with snapshot-based backtracking for complex analytical queries:
  - Branch exploration: Each step fans out `GOT_BRANCH_FACTOR` candidate queries in parallel
  - Quality scoring: Observer LLM evaluates each branch (0.0–1.0) for relevance, coverage, and novelty
  - Intelligent merging: Branches above `GOT_THOUGHT_SCORE_THRESHOLD` are merged via a configurable strategy (top_k/weighted_union/vote)
  - Edge pruning: Low-quality connections removed by keyword-overlap scoring (`GOT_EDGE_PRUNE_THRESHOLD`)
  - Failure recovery: Consecutive all-branch failures trigger snapshot rollback to the last successful merge point
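To make the merge step concrete, here is a hedged sketch of the `top_k` strategy: branches below the score threshold are dropped, the survivors are ranked, and the top k are kept. Threshold values, the `GOT_MERGE_TOP_K` name, and the return convention for the all-failed case are assumptions.

```python
# Minimal sketch of top_k branch merging; constants are illustrative.
GOT_THOUGHT_SCORE_THRESHOLD = 0.5
GOT_MERGE_TOP_K = 2

def merge_branches(scored_branches, strategy="top_k"):
    """scored_branches: list of (score, branch_context) pairs.

    Returns the merged branch contexts, or None when every branch fails
    the threshold (the caller would then trigger snapshot rollback).
    """
    passing = [(s, b) for s, b in scored_branches
               if s >= GOT_THOUGHT_SCORE_THRESHOLD]
    if not passing:
        return None  # all-branch failure -> snapshot rollback
    if strategy == "top_k":
        passing.sort(key=lambda sb: sb[0], reverse=True)
        return [b for _, b in passing[:GOT_MERGE_TOP_K]]
    # weighted_union / vote strategies would combine contexts differently.
    return [b for _, b in passing]
```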
- SGLang inference ecosystem: All LLM operations (generation, embedding, reranking, hop classification, quality evaluation) run on SGLang servers with intelligent lifecycle management:
  - Lazy-loading architecture: Servers auto-start on the first request, eliminating cold-start overhead during initialization
  - Idle timeout: GPU memory automatically released after `SGLANG_IDLE_TIMEOUT` (default 60s) of inactivity
  - GPU allocation: Configurable device assignment and memory fractions per server (generator, embedding, reranker, refiner)
  - Keepalive mechanism: Background thread maintains server health during long-running operations
  - Chunk retry logic: LLM metadata extraction auto-retries failed chunks with a server restart (configurable via `GRAPH_EXTRACTOR_RETRY_ON_FAILURE`)
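The lazy-start plus idle-timeout pattern can be illustrated with a small class. This is not the project's `SGLangServerManager`; it only demonstrates the lifecycle logic (start on first request, release after a configurable idle window), and all names are assumptions.

```python
import threading
import time

class IdleTimeoutServer:
    """Sketch of lazy-start + idle-shutdown server lifecycle."""

    def __init__(self, idle_timeout: float = 60.0):
        self.idle_timeout = idle_timeout
        self.running = False
        self._last_used = 0.0
        self._lock = threading.Lock()

    def request(self, payload):
        with self._lock:
            if not self.running:
                self.running = True      # lazy start on first request
            self._last_used = time.monotonic()
        return f"served:{payload}"

    def reap_if_idle(self) -> bool:
        """Called periodically by a background reaper thread."""
        with self._lock:
            idle = time.monotonic() - self._last_used
            if self.running and idle >= self.idle_timeout:
                self.running = False     # release GPU memory
            return self.running
```

A separate keepalive thread would simply call `request(...)`-like touches during long-running jobs so the reaper never fires mid-operation.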
- Tool calling framework: MCP (Model Context Protocol) server integration with local fallback:
  - MCP-first architecture: Primary execution via the MCP server when available (`MCP_SERVER_ENABLED=true`)
  - Local fallback: Automatic fallback to local implementations when the MCP server is unavailable
  - Calculator: AST-based safe expression evaluation supporting advanced math functions (sqrt, cbrt, log, exp, sin, cos, tan, sigma)
  - Natural language parsing: Converts Korean/English math expressions (e.g., "the square root of 144", "root 144", in either language) to executable code
  - API caller: HTTP request execution with configurable endpoints and methods
  - Code runner: Python sandbox with restricted built-ins and token filtering for security
  - SQL executor: Placeholder for future database query support
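AST-based safe evaluation works by walking the parsed expression tree and rejecting any node that is not on a whitelist, so arbitrary Python can never execute. The sketch below shows the core idea with a reduced operator and function set; the real calculator's surface (cbrt, sigma, natural-language parsing) is broader.

```python
import ast
import math
import operator

# Whitelists: only these operators and math functions are evaluable.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt, "log": math.log, "exp": math.exp,
          "sin": math.sin, "cos": math.cos, "tan": math.tan}

def safe_eval(expr: str) -> float:
    """Evaluate a math expression without exec/eval of arbitrary code."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS):
            return _FUNCS[node.func.id](*[ev(a) for a in node.args])
        raise ValueError("disallowed expression")
    return ev(ast.parse(expr, mode="eval"))
```

Anything outside the whitelist, including attribute access and `__import__`, raises `ValueError` instead of executing.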
- Automatic graph construction: End-to-end pipeline from raw documents to a queryable knowledge graph:
  - OCR processing: SGLang-powered OCRFlux converts documents to Markdown
  - LLM extraction: Entity/event/relation extraction from Markdown chunks with configurable chunk size and timeout
  - Dual storage: Simultaneous upsert to Weaviate (cross-reference graph) and Neo4j (deep graph) with deterministic UUIDs
  - Schema management: Automatic Weaviate collection creation with cross-reference definitions
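Deterministic UUIDs are what keep the two stores in sync: the same entity must hash to the same ID in both Weaviate and Neo4j so upserts are idempotent. A common way to do this is name-based `uuid5`; the namespace string and key format below are assumptions for illustration, not the project's actual scheme.

```python
import uuid

# Fixed namespace so IDs are stable across runs (namespace is illustrative).
GRAPH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "langgraph-agentic-graph-rag")

def entity_uuid(doc_id: str, entity_name: str) -> str:
    """Same (doc, entity) pair always yields the same UUID,
    so repeated ingestion MERGEs instead of duplicating nodes."""
    return str(uuid.uuid5(GRAPH_NAMESPACE, f"{doc_id}:{entity_name.lower()}"))
```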
- Async job monitoring: Full task lifecycle tracking through REST APIs:
  - Upload progress: Real-time status for file upload and processing stages
  - OCR progress: Per-page OCR completion tracking
  - Embedding progress: Chunk-level indexing status
  - Task cancellation: Graceful termination of long-running operations
```
+---------------+     +------------------------------+
|  Input Layer  | --> |  LangGraph Upload Pipeline   | --> Markdown + *.graph.json
|  (PDF/IMG/..) |     |  (MemorySaver checkpoint)    |
+---------------+     +------------------------------+

+---------------+     +------------------------------+
|  User Query   | --> |  LangGraph RAG Workflow      |
+---------------+     |  (MemorySaver checkpoint)    |
                      +--------------+---------------+
                                     |
                      +--------------+---------------+
                      |  Tool Router (LLM intent)    |
                      |  (ToolExecutor.classify)     |
                      +--------------+---------------+
                                     |
                  +------------------+------------------+
                  |                                     |
          Knowledge Query                      Computational Task
                  |                                     |
   +--------------+-------------+            +----------+---------+
   |  RAG Router (reasoner)     |            |   Tool Executor    |
   |  HopClassifier+PathSelect  |            |    (MCP/Local)     |
   +--------------+-------------+            +--------------------+
                  |
     +------------+-------------+
     |            |             |
   Path 1       Path 2        Path 3
 VectorRetriever CrossRefRetriever GraphDBRetriever
  BM25 Search   Weaviate Ref   Neo4j Cypher
  (<= 2 hop)    (3-5 hop)      (>= 6 hop)
                  |
                  v
 +----------------------------------------+
 |   Quality Gate (QualityEvaluator)      |
 | Observer LLM (QUALITY_GATE_THRESHOLD)  |
 +-------------------+--------------------+
                     |
        +------------+------------+
        |  GoT Thought Expander   |
        |  Branch merge + pruning |
        +------------+------------+
                     |
              +------+------+
              |  LLM Answer |
              +-------------+
```
Reasoner Module (backend/notebooklm/reasoner/):
- `HopClassifier`: Query complexity estimation (LLM + heuristic fallback)
- `PathSelector`: Optimal retrieval path selection for backtracking
- `QualityEvaluator`: Observer LLM-based result quality assessment
- `VectorRetriever`, `CrossRefRetriever`, `GraphDBRetriever`: Modular retrieval implementations
```
planner -> tool_router -+-> rag_router -+-> vector_retriever  ---+
                        |               +-> crossref_retriever --+-> quality_gate -+-> thought_expander -> aggregator -> END
                        |               +-> graphdb_retriever ---+                 +-> rag_router (backtrack)
                        |
                        +-> tool_executor ---------------------------------------> aggregator -> END
```
Tool Router: LLM-powered intent classification determines the query routing strategy:
- Classification endpoint: Configurable via `TOOL_INTENT_CLASSIFIER_ENDPOINT` (defaults to the SGLang generator)
- Intent categories: `knowledge`, `calculation`, `database`, `api_call`, `code_exec`
- Routing logic:
  - `knowledge` → RAG pipeline (3-way retrieval routing)
  - Other intents → Tool executor (MCP/local fallback)
- Fallback mechanism: Heuristic keyword matching when LLM classification fails
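Once an intent label comes back from the classifier, routing reduces to a dispatch table with `knowledge` as the default. The sketch below is illustrative only; the handler names stand in for the real pipeline entry points.

```python
# Hedged sketch of intent dispatch; handler names are placeholders.
def route(intent: str, query: str) -> str:
    handlers = {
        "knowledge":   lambda q: f"rag_pipeline({q!r})",
        "calculation": lambda q: f"calculator({q!r})",
        "database":    lambda q: f"sql_executor({q!r})",
        "api_call":    lambda q: f"api_caller({q!r})",
        "code_exec":   lambda q: f"code_runner({q!r})",
    }
    handler = handlers.get(intent)
    if handler is None:
        # Unknown or failed classification falls back to the knowledge path.
        return handlers["knowledge"](query)
    return handler(query)
```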
Tool Executor (backend/notebooklm/tools/tool_executor.py):
- MCP server integration with local fallback
- Supports: Calculator, API Caller, Code Runner, SQL Executor (planned)
- Details in Query Processing Flow section below
Handled by LangGraphUploadPipeline (langgraph_upload_pipeline.py) with MemorySaver checkpointing across all nodes:
- Conversion & Layout: `run_file_processor.py` handles PDF/Office/image/audio inputs → `Results/1.Converted_images` + `Results/2.LayoutDetection`.
- OCR & Markdown: `run_ocr_processing()` with SGLang-powered OCRFlux produces per-page Markdown → `Results/4.OCR_results`.
- LLM Metadata Extraction: `LLMMetadataExtractor` extracts entities/events/relations from Markdown chunks → `Results/8.graph_metadata/*.graph.json`.
  - Chunk size: configurable via `GRAPH_EXTRACTOR_CHUNK_SIZE`
  - Timeout: configurable via `GRAPH_EXTRACTOR_API_TIMEOUT`
  - Retry logic: On timeout, the SGLang generator server restarts and retries the same chunk once
  - Keepalive: Background thread touches the server every `SGLANG_KEEPALIVE_INTERVAL` seconds during processing
- Graph Upsert: `GraphSchemaManager` ensures the Weaviate GraphEntity/GraphEvent/GraphRelation collections exist with cross-references (source/target/event); `LegacyGraphIngestor`/`Neo4jManager` MERGEs nodes/relationships into Neo4j with deterministic UUIDs.
- Late Chunking & Embedding: `embedding_text.py` splits Markdown into chunks and uploads them into the Weaviate TextDocument collection via `SharedEmbeddingModel` (model configurable via `EMBEDDING_MODEL`).
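For intuition, chunk boundaries in this step look roughly like the fixed-size-with-overlap split below. Note this is only the boundary arithmetic: actual late chunking embeds the full document first and then pools token embeddings per chunk, and the default size of 800 comes from `GRAPH_EXTRACTOR_CHUNK_SIZE`-style configuration, not from this sketch.

```python
def chunk_markdown(text: str, chunk_size: int = 800, overlap: int = 100):
    """Illustrative fixed-size chunking with overlap between neighbors."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, max(len(text), 1), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # last window already covers the tail
    return chunks
```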
```
backend/
├── main.py                      # FastAPI server entry point
├── config.py                    # Server-level configuration
├── logging_config.py            # Logging configuration
│
├── api/                         # API layer
│   ├── routes.py                # Main upload/file/session routes
│   ├── chat.py                  # POST /v1/chat endpoint
│   ├── ocr_routes.py            # OCR processing endpoints
│   └── pause_api.py             # Task pause/resume API
│
├── notebooklm/                  # RAG core modules
│   ├── config.py                # Model/path/graph configuration
│   ├── rag_pipeline.py          # LangGraph RAG workflow orchestrator
│   ├── graph_reasoner.py        # LangGraph workflow orchestration
│   ├── graph_schema.py          # Weaviate Entity/Event/Relation schema
│   ├── hop_classifier.py        # Query complexity estimator
│   ├── reasoner/                # Refactored GraphReasoner modules
│   │   ├── state.py             # GraphReasonerState definition
│   │   ├── routing.py           # PathSelector, HopClassifier
│   │   ├── quality.py           # QualityEvaluator
│   │   ├── retrievers.py        # VectorRetriever, CrossRefRetriever, GraphDBRetriever
│   │   └── __init__.py
│   ├── legacy_graph_client.py   # Neo4j Cypher traversal client
│   ├── legacy_graph_ingestor.py # Neo4j upsert helper
│   ├── embedding_text.py        # Late chunking + Weaviate text indexing
│   ├── embedding_image.py       # Image embedding + Weaviate image indexing
│   ├── image_processor.py       # Image processing utilities
│   ├── shared_embedding.py      # SGLang embedding/reranker client (singleton)
│   ├── sglang_server_manager.py # SGLang server lifecycle manager
│   ├── generator.py             # LLM answer generation
│   ├── refiner.py               # Answer refinement
│   ├── evaluator.py             # Answer quality evaluation
│   ├── router.py                # Query type routing
│   ├── query_rewriter.py        # Query rewriting
│   ├── parallel_search.py       # Parallel text+image search
│   ├── weaviate_utils.py        # Weaviate client utilities
│   ├── clean_weaviate.py        # Weaviate + Neo4j data cleanup script
│   ├── tools/                   # Tool calling & MCP integration
│   │   ├── mcp_client.py        # MCP server REST client
│   │   ├── tool_executor.py     # Tool execution (MCP/local fallback)
│   │   └── __init__.py
│   ├── rag_text/                # Text search + reranker
│   └── rag_image/               # Image search + reranker
│
├── data_pipeline/               # Data processing pipeline
│   └── pipe/
│       ├── langgraph_upload_pipeline.py # LangGraph upload workflow (checkpointed)
│       ├── llm_metadata_extractor.py    # Entity/event/relation extraction
│       ├── neo4j_manager.py             # Neo4j upsert manager
│       ├── run_file_processor.py        # Convert/layout/OCR orchestrator
│       ├── pipeline_image.py            # Image pipeline
│       ├── pipeline_sound.py            # Audio pipeline
│       ├── main_pipe/
│       ├── ocr_pipe/                    # SGLang-based OCRFlux engine
│       ├── udp_pdftopng_300dpi.py       # PDF -> PNG conversion
│       └── udp_layoutdetection.py       # Layout detection
│
├── services/                    # Business logic services
│   ├── model_manager.py         # LazyModelManager (GPU lifecycle)
│   ├── ocr_vision_manager.py    # OCR engine management
│   └── rag_service.py           # RAG service orchestration
│
└── utils/
    ├── task_queue.py            # GPU task queue (async job management)
    ├── helpers.py               # Shared utility functions
    ├── path_helpers.py          # Path calculation helpers
    └── file_utils.py            # File operation utilities
```
- File upload (`POST /upload/files`): `api/routes.py` stores files in per-session folders and enqueues `run_processing_pipeline` via `task_queue.py`.
- GPU task queue: `task_queue.py` manages sequential GPU-bound tasks (convert → layout → OCR) with progress tracking.
- Text indexing (`run_text_indexing` / `run_text_indexing_v2`): Initializes `SharedEmbeddingModel` → runs `process_markdown_files` for Weaviate late-chunking indexing.
- Graph extraction & ingestion: `LLMMetadataExtractor` produces `*.graph.json` → `GraphSchemaManager` upserts to Weaviate → `Neo4jManager`/`LegacyGraphIngestor` upserts to Neo4j.
- Storage state:
  - Weaviate: TextDocument + GraphEntity/Event/Relation collections.
  - Neo4j: Entity/Event nodes + relation edges with deterministic UUIDs and auto-created constraints.
- Request initiation: `POST /v1/chat` → `RAGPipeline.process_query()` invokes `GraphReasoner.retrieve()`
- LangGraph workflow execution (`graph_reasoner.py`):

  Step 1: Planner (`planner` node)
  - Analyzes the user query and extracts key concepts
  - Records a high-level search plan and reasoning steps
  - Initializes query history for multi-turn context
  - Output: `plan`, `query_analysis`

  Step 2: Tool Router (`tool_router` node)
  - LLM classifies query intent via `ToolExecutor.classify_intent()`:
    - `knowledge` → routes to rag_router (knowledge retrieval path)
    - `calculation` → routes to tool_executor (calculator)
    - `database` → routes to tool_executor (SQL executor)
    - `api_call` → routes to tool_executor (API caller)
    - `code_exec` → routes to tool_executor (code runner)
  - Output: `intent`, routing decision
  Branch A: Knowledge Query Path

  Step 3a: RAG Router (`rag_router` node, for knowledge queries)
  - Performs hop classification (LLM + heuristic hybrid)
  - Sets `max_hops = min(llm_estimate, heuristic_estimate, GRAPH_MAX_HOPS)`
  - Selects the initial retrieval path:
    - hop ≤ 2 → `vector_retriever` (Path 1)
    - hop 3–5 → `crossref_retriever` (Path 2)
    - hop ≥ 6 → `graphdb_retriever` (Path 3)
  - Output: `max_hops`, `retrieval_path`, `tried_paths`

  Step 4a: Retrieval Execution (Path 1/2/3)
  - Path 1: semantic search + reranker on TextDocument
  - Path 2: BM25 seed search + Weaviate cross-reference multi-hop traversal
  - Path 3: Neo4j Cypher deep graph traversal
  - Output: `context_snippets`, `entities`, `events`, `relations`

  Step 5a: Quality Gate (`quality_gate` node)
  - Observer LLM scores the retrieval result (0.0–1.0)
  - If quality ≥ `QUALITY_GATE_THRESHOLD`: proceeds to the aggregator or thought expander (if GoT enabled)
  - If quality < threshold, triggers intelligent backtracking:
    - `PathSelector.select_best_path()` analyzes remaining paths
    - Scores based on query keywords, hop count, and path characteristics
    - Selects the most suitable alternative (not random)
    - Returns to Step 4a with the new path
    - Termination: after `MAX_BACKTRACK_COUNT` retries or all paths exhausted
  - Output: `retrieval_quality`, `backtrack_count`, `tried_paths`

  Step 6a: Thought Expander (`thought_expander` node, if `GOT_MODE_ENABLED=true`)
  - Fans out `GOT_BRANCH_FACTOR` candidate queries in parallel
  - Observer LLM scores each branch (0.0–1.0)
  - Merges branches above `GOT_THOUGHT_SCORE_THRESHOLD` via the configured strategy
  - Prunes low-quality edges (< `GOT_EDGE_PRUNE_THRESHOLD`)
  - Snapshot-based backtracking on consecutive failures
  - Output: `thought_steps`, expanded context
  Branch B: Tool Execution Path

  Step 3b: Tool Executor (`tool_executor` node, for computational tasks)
  - Maps intent to a specific tool:
    - `calculation` → Calculator (AST-based safe evaluation)
    - `api_call` → API Caller (HTTP requests)
    - `code_exec` → Code Runner (Python sandbox)
    - `database` → SQL Executor (placeholder)
  - Prepares tool inputs:
    - Calculator: extracts the math expression and converts natural language ("the square root of 144", in Korean or English, → `sqrt(144)`)
    - API Caller: extracts the URL from the prompt
    - Code Runner: extracts the code snippet
  - Executes the tool:
    - MCP-first: attempts execution via the MCP server REST API
    - Local fallback: falls back to the local implementation if MCP is unavailable
  - Output: `tool_result` with status (ok/error), result value, metadata
  - Note: Tool failures return error messages; the flow does NOT fall back to RAG
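The MCP-first execution with local fallback amounts to a try/except around the server call. This sketch assumes a client interface (an `MCPUnavailable` error and a `local_tools` registry); the project's real `tool_executor.py` may be shaped differently.

```python
# Hedged sketch of MCP-first execution with local fallback.
class MCPUnavailable(Exception):
    """Raised when the MCP server cannot be reached."""

def execute_tool(tool: str, args: dict, mcp_call, local_tools):
    """Try the MCP server first; fall back to a local implementation.

    `mcp_call(tool, args)` raises MCPUnavailable when the server is down;
    `local_tools` maps tool names to local callables.
    """
    try:
        return {"status": "ok", "via": "mcp", "result": mcp_call(tool, args)}
    except MCPUnavailable:
        local = local_tools.get(tool)
        if local is None:
            # Per the note above: tool failures surface an error,
            # they do not fall back to RAG.
            return {"status": "error", "via": "local",
                    "result": f"no local fallback for {tool!r}"}
        return {"status": "ok", "via": "local", "result": local(**args)}
```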
  Step 7: Aggregator (`aggregator` node)
  - For RAG queries: builds context snippets from entities/events/relations/thoughts
  - For tool queries: formats the tool result into natural language:
    - Calculator: `"expression = result"` (e.g., `"sqrt(144) = 12.0"`)
    - API/Code: result value or success message
    - Errors: user-friendly error messages
  - Collects metadata: `context_snippets`, `thought_steps`, `backtrack_count`, `tried_paths`, `tool_result`
  - Output: aggregated context or formatted tool result
- Answer generation:
  - For RAG queries: `generator.py` synthesizes the answer from the original query + context snippets
  - For tool queries: returns the formatted tool result directly (no LLM generation needed)
  - Optional post-processing: `refiner.py` polishes the answer, `evaluator.py` logs quality notes
- Response construction (`RAGPipeline._build_response()` or `_build_tool_only_response()`):
  - For RAG queries: full response with answer, context, snippets, search results
  - For tool queries: streamlined response with the tool result as the answer
  - Metadata included: `plan`, `max_hops`, `retrieval_quality`, `backtrack_count`, `tried_paths`, `thought_steps`, `tool_result` (if applicable)
  - Debugging support: all workflow state exposed for traceability
- Return to client: JSON response with answer, metadata, and debugging information
- Graph RAG toggles: `GRAPH_RAG_ENABLED`, `LANGGRAPH_ENABLED`, `GOT_MODE_ENABLED`, `GRAPH_MAX_HOPS`.
- GoT tuning: `GOT_BRANCH_FACTOR`, `GOT_MERGE_STRATEGY` (top_k/weighted_union/vote), `GOT_MERGE_TOP_K`, `GOT_THOUGHT_SCORE_THRESHOLD`, `GOT_EDGE_PRUNE_THRESHOLD`, `GOT_MAX_STEPS`, `GOT_MAX_CONSECUTIVE_FAILURES`, `GOT_OBSERVER_ENDPOINT`/`GOT_OBSERVER_MODEL`.
- Neo4j: `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD`, `GRAPH_MAX_HOPS`.
- Weaviate: `WEAVIATE_HOST`/`PORT`, `WEAVIATE_TEXT_CLASS` (TextDocument), `WEAVIATE_VECTORIZER` (text2vec-model2vec).
- SGLang models: `LLM_MODEL`, `EMBEDDING_MODEL`, `RERANKER_MODEL`, `REFINER_MODEL`, `QUERY_REWRITER_MODEL`.
- SGLang servers: `SGLANG_GENERATOR_ENDPOINT`, `SGLANG_EMBEDDING_ENDPOINT`, `SGLANG_RERANKER_ENDPOINT`, `SGLANG_REFINER_ENDPOINT`, `SGLANG_QUERY_REWRITER_ENDPOINT`.
- SGLang lifecycle: `SGLANG_IDLE_TIMEOUT` (60s), `SGLANG_KEEPALIVE_INTERVAL` (20s).
- Graph extractor: `GRAPH_EXTRACTOR_API_TIMEOUT` (60s), `GRAPH_EXTRACTOR_CHUNK_SIZE` (800), `GRAPH_EXTRACTOR_RETRY_ON_FAILURE` (true).
- Session directories: `DATA_ROOT/Results`, `sessions/<id>` layout.
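All of these settings are environment variables, so the typical consumption pattern is small typed helpers with defaults. The helper names below are illustrative, not the project's `config.py` API.

```python
import os

def env_flag(name: str, default: bool) -> bool:
    """Read a boolean toggle like GOT_MODE_ENABLED from the environment."""
    return os.environ.get(name, str(default)).strip().lower() in ("1", "true", "yes")

def env_float(name: str, default: float) -> float:
    """Read a numeric setting like SGLANG_IDLE_TIMEOUT, keeping the default
    when the variable is unset or malformed."""
    try:
        return float(os.environ.get(name, default))
    except ValueError:
        return default
```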
| Endpoint | Description |
|---|---|
| `POST /upload/files` | Triggers the LangGraph upload pipeline |
| `POST /v1/chat` | Runs 3-way RAG with checkpoint/backtracking |
| `GET /api/v1/tasks/{task_id}` | Monitors queued upload/OCR tasks |
| `GET /files` | Lists session artifacts |
| `POST /pause` | Pauses/resumes background tasks |
- `sglang_embedding_server.log`, `sglang_reranker_server.log` → SGLang model server health.
- `Results/8.graph_metadata/*.graph.json` → archive of LLM extraction results.
- `LegacyGraphIngestor` auto-creates constraints on first run; no manual setup required.
- `SGLangServerManager` releases GPU memory after 60 seconds of idling (configurable via `SGLANG_IDLE_TIMEOUT`).
- All LangGraph workflows log checkpoint IDs and backtrack counts for traceability.
- SGLang cold start: Lazy loading means the first `/v1/chat` (or hop-classifier) request must warm each SGLang server, which can take 20–60s of VRAM load time; issue a warm-up request or a keep-alive cron job to avoid client timeouts.
- Chunk retry mechanism: If LLM metadata extraction times out (default 60s), the generator server is automatically restarted and the same chunk is retried once. This prevents hanging on problematic chunks while maintaining extraction quality.
- Volatile checkpoints: `MemorySaver` stores graph snapshots in-process, so any FastAPI restart drops in-flight state until the planned migration to `SqliteSaver`/`PostgresSaver` lands.
- Weaviate v4 API: Uses `weaviate.connect_to_custom()` with gRPC support (port 50051).
GoT (Graph of Thought): `thought_expander` now performs graph-shaped exploration: each step fans out `GOT_BRANCH_FACTOR` branches, an observer LLM scores each branch, and the best results are merged via `GOT_MERGE_STRATEGY`. Low-quality edges are pruned, and consecutive failures trigger snapshot-based backtracking.
- Advanced hop classifier
  - Augment with query metadata (token length, entity counts) for a hybrid router.
- Multi-graph retrieval optimization
  - Improve context filtering/dedup for 3–5 hop Weaviate traversals and add Cypher templates for ≥ 6 hop Neo4j exploration.
- LangGraph workflow observability
  - Emit per-node latency/error metrics and integrate retry policies inside `GraphReasoner` and `LegacyGraphClient`.
- Persistent checkpointer
  - Migrate from `MemorySaver` to `SqliteSaver`/`PostgresSaver` for cross-session state recovery.
Issues and PRs are welcome. For questions or concerns, please open an issue on GitHub or email us at jeongnext@hnextits.com.
This project is dual-licensed under:
- MIT License - see the LICENSE file for details
- Apache License 2.0 - see the LICENSE-APACHE file for details
You may choose either license to govern your use of this software.
If you use this project in your research, please cite the following:
```bibtex
@misc{zheng2023sglang,
  title={SGLang: Efficient Execution of Structured Language Model Programs},
  author={Lianmin Zheng and Liangsheng Yin and Zhiqiang Xie and Jeff Huang and Chuyue Sun and Cody Hao Yu and Shiyi Cao and Christos Kozyrakis and Ion Stoica and Joseph E. Gonzalez and Clark Barrett and Ying Sheng},
  year={2023},
  url={https://github.com/sgl-project/sglang}
}

@software{langgraph2024,
  title={LangGraph: A Framework for Building Stateful Multi-Actor Applications},
  author={LangChain AI},
  year={2024},
  url={https://github.com/langchain-ai/langgraph}
}

@software{weaviate2024,
  title={Weaviate: An Open-Source Vector Database},
  author={Weaviate B.V.},
  year={2024},
  url={https://github.com/weaviate/weaviate}
}

@software{neo4j2024,
  title={Neo4j: The Graph Database Platform},
  author={Neo4j, Inc.},
  year={2024},
  url={https://github.com/neo4j/neo4j}
}

@software{ocrflux2024,
  title={OCRFlux: Vision-Language Model for OCR},
  author={ChatDOC},
  year={2024},
  url={https://huggingface.co/ChatDOC/OCRFlux-3B}
}
```