Add KB Arena (knowledge graph + hybrid retrieval benchmark)#28

Open
xmpuspus wants to merge 1 commit into
DEEP-PolyU:mainfrom
xmpuspus:add-kb-arena

@xmpuspus
Adds KB Arena to the Open-source Project section.

KB Arena is an open-source benchmark that runs nine architecturally distinct retrieval strategies head-to-head on user-supplied corpora. The two GraphRAG-relevant strategies:

  • knowledge_graph: an extraction-driven Neo4j graph built on a universal schema of five node types (Topic, Component, Process, Config, Constraint) and seven relationship types (DEPENDS_ON, CONTAINS, CONNECTS_TO, TRIGGERS, CONFIGURES, ALTERNATIVE_TO, EXTENDS). Source provenance is stamped on every entity end-to-end, so chunk-level retrieval can be matched against section-level ground truth.
  • hybrid: RRF-fused vector + graph retrieval with three-stage intent routing (keyword scan → Haiku LLM → regex fallback). Domain-agnostic.
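
For readers unfamiliar with RRF (reciprocal rank fusion), here is a minimal sketch of how two ranked lists can be fused. It is not KB Arena's implementation: the function name `rrf_fuse`, the section IDs, and the constant `k=60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per item, with rank starting at 1.
    k=60 is a commonly used default and is an assumption here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical retrieval results from a vector index and a graph traversal.
vector_hits = ["sec-3", "sec-1", "sec-7"]
graph_hits = ["sec-1", "sec-9", "sec-3"]

fused = rrf_fuse([vector_hits, graph_hits])
print(fused)  # ['sec-1', 'sec-3', 'sec-9', 'sec-7']
```

Items ranked well by both retrievers rise to the top, and because RRF only uses rank positions, the vector and graph scores do not need to be on a comparable scale.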

Why it matters for the GraphRAG community: the field still lacks a clean apples-to-apples way to say "graph beats vector on this corpus by X with p<Y." KB Arena ships paired-bootstrap 95% CIs and Wilcoxon paired two-sided p-values on per-question IR metrics (Recall@k, NDCG, MAP, R-Precision, bpref), so GraphRAG vs vector vs hybrid comparisons report effect size with statistical confidence rather than mean-only deltas.
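
As a rough illustration of the paired-bootstrap idea, the sketch below resamples per-question score differences between two strategies and reports a percentile 95% CI on the mean difference. It assumes a plain percentile bootstrap using only the standard library; the function name `paired_bootstrap_ci`, its parameters, and the sample scores are hypothetical, and KB Arena's own procedure may differ. The Wilcoxon test is not shown.

```python
import random
from statistics import mean


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-question difference (a - b)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have the same length")
    # Pairing: each question contributes one difference, so resampling
    # questions keeps the two strategies' scores aligned.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    n = len(diffs)
    resampled_means = sorted(
        mean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    lower = resampled_means[int((alpha / 2) * n_resamples)]
    upper = resampled_means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper


# Hypothetical per-question Recall@k for two strategies on the same questions.
graph_recall = [0.8, 0.6, 1.0, 0.4, 0.8, 0.6]
vector_recall = [0.6, 0.6, 0.8, 0.4, 0.6, 0.4]

delta, lower, upper = paired_bootstrap_ci(graph_recall, vector_recall)
print(f"mean delta = {delta:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

An interval that excludes zero suggests the difference is unlikely to be resampling noise, which is the kind of claim a mean-only delta cannot support.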

  • Active: v0.8.1 released 2026-05-21, 617 tests, MIT, on PyPI as kb-arena
  • Zenodo archive (concept DOI): 10.5281/zenodo.20319678
  • Ships an aws-compute benchmark corpus (75 questions, 5 difficulty tiers, 35 with chunk-level ground truth) as a pedagogical baseline

Entry appended to the existing Open-source Project list following the badge-prefix format used by the other 14 entries. No other sections modified.
