over-refusal

Here are 4 public repositories matching this topic...

ant-research / Awesome-Refusal-Suppression

A bilingual awesome list for refusal suppression research: benchmarks, papers, tools, models, and ecosystem updates.

awesome-list activation-steering over-refusal refusal-suppression safety-neurons safety-degradation evading-safety-alignment

Updated Jun 12, 2026

TheoMarkopoulos / blindspot

Open-source evaluation suite that stress-tests LLM over-refusal on legitimate requests involving defeated, unjust, or absurd rules. 5 defeat families × 19 authority types = 95 scenario types.

python benchmark nextjs evaluation click ai-safety tailwindcss github-actions pydantic instructor supabase llm litellm over-refusal

Updated Apr 9, 2026
Python

uninhibited-scholar / defensive-refusal-bench-zh

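Chinese-language benchmark for over-refusal on defensive cybersecurity questions: it tests whether safety-aligned models wrongly refuse legitimate defense and security-education questions, includes a control group of genuinely harmful prompts, and can be scored automatically. It completes the other half of agent-safety-bench-zh.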

benchmark cybersecurity chinese ai-safety defensive-security llm-safety dual-use over-refusal

Updated Jun 20, 2026
Python

jang1563 / bio-overrefusal-v0.1

Domain-expert-authored benchmark for LLM over-refusal on legitimate biology research queries.

benchmark biology dataset ai-safety false-positive biosecurity huggingface-datasets anthropic llm-evaluation llm-safety over-refusal

Updated Jun 5, 2026
Python
