refute-inspect

Inspect AI adapter for REFUTE — judge-free tasks for scientific critique and epistemic calibration on recent science paper summaries.

Tasks

| Task | Description |
| --- | --- |
| `refute_forced_choice` | Pick the more flawed of twin summaries (contamination-proof, chance 50%) |
| `refute_soundness` | Binary sound/flawed classification |
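Because both tasks are judge-free, a response can be scored by comparing a parsed label with the gold label, with no grader model involved. The sketch below illustrates that idea for the forced-choice task and shows where the 50% chance baseline comes from. It assumes answers are the letters "A" and "B"; that label format and the function name `forced_choice_accuracy` are hypothetical, and the adapter's own scorer may differ.

```python
import random


def forced_choice_accuracy(predictions, targets):
    """Fraction of items where the predicted letter matches the gold letter."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("need at least one item")
    correct = sum(
        p.strip().upper() == t.strip().upper()
        for p, t in zip(predictions, targets)
    )
    return correct / len(targets)


# Hypothetical gold labels: which twin ("A" or "B") is the more flawed one.
targets = ["A", "B", "B", "A"]
predictions = ["A", "B", "A", "A"]
accuracy = forced_choice_accuracy(predictions, targets)  # 3 of 4 correct

# A guesser that ignores the summaries lands near the 50% chance baseline.
rng = random.Random(0)
gold = [rng.choice("AB") for _ in range(10_000)]
guesses = [rng.choice("AB") for _ in range(10_000)]
chance_accuracy = forced_choice_accuracy(guesses, gold)
```

With two options per item, a model has to beat 0.5 by a clear margin before its score says anything about critique ability.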

Install

pip install inspect-ai datasets
git clone https://github.com/connerlambden/refute-inspect.git
cd refute-inspect && pip install -e .

Run

inspect eval src/refute_inspect/refute_inspect.py@refute_forced_choice --model openai/gpt-4o
inspect eval src/refute_inspect/refute_inspect.py@refute_soundness --model openai/gpt-4o
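The same runs can probably be launched from Python through Inspect's `eval()` function instead of the CLI. The following is a minimal sketch, assuming it is executed from the repository root with `inspect-ai` installed and an OpenAI API key configured; the helper names `task_specs` and `run_refute` are illustrative and not part of this adapter.

```python
TASK_FILE = "src/refute_inspect/refute_inspect.py"
TASK_NAMES = ("refute_forced_choice", "refute_soundness")


def task_specs():
    """Build the same file@task specifiers that the CLI commands use."""
    return [f"{TASK_FILE}@{name}" for name in TASK_NAMES]


def run_refute(model="openai/gpt-4o", limit=None):
    """Run both REFUTE tasks and return Inspect's list of eval logs.

    `limit` caps the number of samples per task, which is handy for a
    quick smoke test before a full run.
    """
    # Imported lazily so this module can be imported without inspect-ai.
    from inspect_ai import eval as inspect_eval

    return inspect_eval(task_specs(), model=model, limit=limit)
```

Calling `run_refute(limit=5)` would evaluate five samples per task; each returned log exposes fields such as `log.status` and `log.eval.task` for a quick check of the outcome.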

The dataset is loaded from the Hugging Face repository BGPT-OFFICIAL/refute (config refute_soundness; the revision is pinned at runtime).
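To inspect the data outside of Inspect, the same repository and config can be loaded with the `datasets` library. This is a sketch under assumptions: the helper names are hypothetical, the split names are not stated here (so none is requested), and the revision the adapter pins is not listed in this README, so it is left as a parameter.

```python
def refute_load_kwargs(revision=None):
    """Keyword arguments for datasets.load_dataset, mirroring the README."""
    kwargs = {"path": "BGPT-OFFICIAL/refute", "name": "refute_soundness"}
    if revision is not None:
        # A commit hash or tag; pinning keeps results reproducible.
        kwargs["revision"] = revision
    return kwargs


def load_refute(revision=None):
    """Download the dataset; returns a DatasetDict keyed by split name."""
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(**refute_load_kwargs(revision))
```

Passing an explicit `revision` mirrors what the adapter does at runtime and avoids silently picking up later changes to the dataset.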

Links

Dataset: https://huggingface.co/datasets/BGPT-OFFICIAL/refute
Technical report: https://huggingface.co/datasets/BGPT-OFFICIAL/refute/blob/main/TECHNICAL_REPORT.md
Leaderboard: https://huggingface.co/spaces/BGPT-OFFICIAL/refute-leaderboard

Hub integrator index

See also the dataset INTEGRATORS.md for all registration links.
