LLM-as-a-Judge evaluation platform for ecommerce search. Scores relevance, computes IR metrics, and flags quality issues across multiple retail verticals.
Updated Mar 15, 2026 - Python
Search intent classification dataset + rater calibration examples for AI search evaluation
Portfolio of search evaluation, AI response grading, and dataset labeling artifacts
AI-powered search quality auditor using Playwright browser automation and LLM-as-a-judge scoring