feat(daily): 2026-05-07 digest #32

Open
yayajjiang wants to merge 1 commit into main from digest/daily-2026-05-07
@yayajjiang
Owner

Papers added (2026-05-07)

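  • ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning (2605.00380): uses SVD-based negative-sample projection to decouple the semantics shared by positive and negative samples. Across 12 benchmarks, it improves mathematical-reasoning Avg@16 by 9.4% and Pass@128 by 7.0%, surpassing the NSR baseline. ⭐ pick
  • MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization (2604.06798): the first binarization framework aimed at MoE LLMs. It combines SVD decomposition with gradient-aware Hessian quantization, reducing memory use while keeping expert routing stable.
  • Exploring Pass-Rate Reward in Reinforcement Learning for Code Generation (2605.02944): rigorously controlled experiments show that, in critic-free RL for code, a pass-rate reward does not consistently outperform a binary reward; a denser reward signal does not reliably steer gradients toward solutions that pass every test.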
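
For readers unfamiliar with the two reward schemes compared in the last entry (2605.02944), the difference can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names `binary_reward` and `pass_rate_reward` and the list-of-booleans input are assumptions made for this example.

```python
from typing import Sequence


def binary_reward(test_results: Sequence[bool]) -> float:
    """Sparse reward (hypothetical helper): 1.0 only if every test passes."""
    # An empty test list is treated as a failure here; this is an assumption.
    return 1.0 if test_results and all(test_results) else 0.0


def pass_rate_reward(test_results: Sequence[bool]) -> float:
    """Denser reward (hypothetical helper): fraction of tests that pass."""
    if not test_results:
        return 0.0
    return sum(test_results) / len(test_results)


# A candidate program that passes 3 of 4 unit tests.
results = [True, True, False, True]
print(binary_reward(results))     # 0.0
print(pass_rate_reward(results))  # 0.75
```

Under the pass-rate scheme a partially correct program still earns credit, which is the "denser" signal the digest entry refers to; the binary scheme rewards only fully passing solutions.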

https://claude.ai/code/session_011xfrMRQbAqLkAB11AJMr5U


Generated by Claude Code
