rubric-based-evaluation

Here are 7 public repositories matching this topic...

yuvhaim-gif / LLM_InSight

This is my personal home rig for serious LLM experimentation. I built it to test models head-to-head, create custom evaluation rubrics, automatically improve prompts based on the previous run’s results, and generate high-quality synthetic training data. Everything runs locally first (Ollama by default) and is logged locally, with optional cloud support.

frontend-web ab-testing evaluation-metrics human-in-the-loop evaluation-framework grading-system local-first synthetic-data-generation dataset-curation ollama llm-evaluation prompt-optimization preference-optimization rubric-based-evaluation

Updated Jun 15, 2026
Python

abhinavag-svg / ai-coding-sessionprompt-analyzer

Analyze Claude Code session logs and generate efficiency reports, cost diagnostics, and actionable recommendations. This project reads local JSONL session logs, computes deterministic efficiency signals, and can optionally add local LLM recommendations using Ollama.

python3 analyzer efficiency-analysis ai-code-review ollama claude-code rubric-based-evaluation composite-scoring

Updated Mar 12, 2026
Python

hawkelement333-glitch / career-signal-lab

AI-trainer portfolio project for evaluating model responses, scoring career-readiness signals, and documenting rubric-based quality reviews.

frontend java-script front-end-development ai-evaluation llm-evaluation prompt-evaluation rubric-based-evaluation rubric-scoring

Updated Jun 13, 2026
JavaScript

PabloCabaleiro / pondera

Pondera is a lightweight, YAML-first framework to evaluate AI models and agents with pluggable runners and an LLM-as-a-judge.

python ai agents model-agnostic ai-evaluation llms llm-evaluation llm-evaluation-framework llm-judge agent-evaluation ai-evaluation-framework rubric-based-evaluation yaml-first

Updated Oct 23, 2025
Python

VisualPeerReview / visual-nudges-study

Research repository for the Visual Nudges study, examining how lightweight interface interventions structure analytic judgment in visualization-based peer review.

visualization visual-analytics rubric-based-evaluation visual-nudges analytic-judgment evaluative-behavior

Updated Jun 7, 2026
Python

renataennes / llm-annotation-testset

Bilingual LLM annotation dataset — EN/PT quality evaluation

python nlp annotation portuguese data-annotation bilingual cohen-kappa llm-evaluation rubric-based-evaluation

Updated Apr 13, 2026
Jupyter Notebook

anjaliy11 / Hybrid_Search_RAG

A multi-turn agentic dialogue system built on hybrid dense-sparse retrieval, hierarchical agent coordination, and rubric-based evaluation. Designed for real-world deployment with FastAPI serving, streaming responses, and full observability.

python langchain pineconedb googlegemini rag-pipeline agentic-ai hybridsearch rubric-based-evaluation

Updated Jun 17, 2026
Python
