This project addresses the lack of systematic, controlled studies of how reliably human-written and LLM-generated text can be told apart, using paired question–answer datasets. Rather than proposing a novel detection architecture, it focuses on analyzing detection robustness, failure modes, and the impact of adversarial humanization strategies.
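As an illustrative sketch of the paired evaluation loop (not the project's actual pipeline), the example below scores a detector on whether it ranks the LLM answer above the human answer to the same question, then re-measures after a humanization attack. `detect_llm`, `humanize`, and the toy data are all hypothetical stand-ins.

```python
# Sketch of a paired human-vs-LLM detection evaluation with a toy
# adversarial humanization pass. All functions and data are illustrative.

def detect_llm(text: str) -> float:
    # Placeholder detector: fraction of "formal" long words (> 6 chars).
    # A real study would plug in an actual detector here.
    words = text.split()
    return sum(len(w) > 6 for w in words) / max(len(words), 1)

def humanize(text: str) -> str:
    # Toy humanization attack: swap formal phrasing for casual equivalents.
    swaps = {
        "It is advisable to": "you should just",
        "application": "app",
        "cannot": "can't",
        "in this context": "here",
    }
    for formal, casual in swaps.items():
        text = text.replace(formal, casual)
    return text

def paired_accuracy(pairs: list[tuple[str, str]], attack: bool = False) -> float:
    # Each pair holds (human_answer, llm_answer) to the same question.
    # The detector is correct if it scores the LLM answer strictly higher.
    correct = 0
    for human_ans, llm_ans in pairs:
        if attack:
            llm_ans = humanize(llm_ans)
        correct += detect_llm(llm_ans) > detect_llm(human_ans)
    return correct / len(pairs)

pairs = [
    ("yeah it's fine, just restart it",
     "It is advisable to restart the application."),
    ("nope, that won't work here",
     "That approach cannot be applied in this context."),
]
print("clean accuracy:   ", paired_accuracy(pairs))               # 1.0
print("after humanizing: ", paired_accuracy(pairs, attack=True))  # 0.5
```

The drop in accuracy under the attack is the kind of robustness gap the study aims to quantify.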
Stock Bench is an LLM benchmarking system in which LLMs compete in a prediction market, placing bets on how well they will perform on tasks. This makes it possible to measure each model's task performance as well as how accurate and self-aware each model is about its own performance.
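A minimal sketch of how such betting-based scoring could work is shown below, assuming each model reports a confidence bet before attempting a task. The Brier score used here is one standard calibration measure, chosen for illustration; it is not necessarily the metric Stock Bench itself uses.

```python
# Sketch: score a model on raw task accuracy plus calibration of its bets.
# The Attempt structure and Brier-score choice are assumptions for this demo.

from dataclasses import dataclass

@dataclass
class Attempt:
    bet: float      # model's pre-task confidence in [0, 1]
    solved: bool    # whether the model actually solved the task

def accuracy(attempts: list[Attempt]) -> float:
    # Raw task performance: fraction of tasks solved.
    return sum(a.solved for a in attempts) / len(attempts)

def brier_score(attempts: list[Attempt]) -> float:
    # Self-awareness proxy: mean squared gap between bet and outcome
    # (0 = perfectly calibrated, 1 = maximally miscalibrated).
    return sum((a.bet - a.solved) ** 2 for a in attempts) / len(attempts)

# Hypothetical results for one model across four tasks.
attempts = [
    Attempt(bet=0.9, solved=True),
    Attempt(bet=0.8, solved=True),
    Attempt(bet=0.7, solved=False),
    Attempt(bet=0.2, solved=False),
]
print(f"task accuracy: {accuracy(attempts):.2f}")    # 0.50
print(f"brier score:   {brier_score(attempts):.3f}")  # 0.145
```

Separating the two numbers lets a leaderboard rank models on what they can do and, independently, on how well they know what they can do.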