feat(daily): weekly digest 2026-W19 #35

Open
yayajjiang wants to merge 1 commit into main from digest/weekly-2026-05-10

Conversation

@yayajjiang
Owner

Weekly Digest — 2026-W19 (May 4–10)

6 new papers added to src/lib/daily.ts (3 editor's picks ⭐):

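  • Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning (2605.06241) - RL does not teach new reasoning skills; it only makes sparse probability adjustments at the 1-3% of high-entropy decision points, which fundamentally reframes how RL post-training is understood.
  • Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key (2605.06638) - ScaleLogic separates reasoning depth from logical expressiveness; the exponent is determined by expressiveness (1.04→2.60), revealing what drives generalization in long-horizon reasoning.
  • RVPO: Risk-Sensitive Alignment via Variance Regularization (2605.05750) - RVPO penalizes cross-reward variance in advantage aggregation, fixing the weakness of mean aggregation in multi-objective RLHF; validated with 17 concurrent reward signals.
  • A Unified Pair-GRPO Family (2605.06375) - Soft/Hard Pair-GRPO unifies implicit and explicit preference constraints, addressing the training instability of GRPO-style methods.
  • Federation of Experts: Communication Efficient Distributed Inference (2605.06206) - FoE reorganizes MoE modules into expert clusters, removing the cross-node token-embedding communication bottleneck.
  • Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding (2605.00342) - EVICT performs training-free, expert-aware draft-tree pruning and is 2.35× faster than AR (autoregressive) decoding and 1.21× faster than EAGLE-3.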
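
For reviewers who want to sanity-check the list above, here is a minimal TypeScript sketch. It assumes a hypothetical `DigestPaper` shape; the actual type and field names in src/lib/daily.ts are not shown in this PR and may differ. The helpers `arxivUrl` and `findDuplicateIds` are illustrative names only. The titles and arXiv identifiers are copied from the list above.

```typescript
// Hypothetical entry shape; the real type in src/lib/daily.ts may differ.
interface DigestPaper {
  title: string;
  arxivId: string; // new-style arXiv identifier, e.g. "2605.06241"
}

// New-style arXiv identifiers look like YYMM.NNNN or YYMM.NNNNN.
const ARXIV_ID = /^\d{4}\.\d{4,5}$/;

// Build the abstract-page URL for an identifier, rejecting malformed ones.
function arxivUrl(id: string): string {
  if (!ARXIV_ID.test(id)) {
    throw new Error(`not a new-style arXiv id: ${id}`);
  }
  return `https://arxiv.org/abs/${id}`;
}

// Return every identifier that appears more than once in the list.
function findDuplicateIds(papers: DigestPaper[]): string[] {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  for (const p of papers) {
    if (seen.has(p.arxivId)) {
      dupes.add(p.arxivId);
    }
    seen.add(p.arxivId);
  }
  return Array.from(dupes);
}

// The six entries from this digest (titles and ids as listed in the PR description).
const week19: DigestPaper[] = [
  {
    title:
      "Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning",
    arxivId: "2605.06241",
  },
  {
    title: "Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key",
    arxivId: "2605.06638",
  },
  {
    title: "RVPO: Risk-Sensitive Alignment via Variance Regularization",
    arxivId: "2605.05750",
  },
  { title: "A Unified Pair-GRPO Family", arxivId: "2605.06375" },
  {
    title: "Federation of Experts: Communication Efficient Distributed Inference",
    arxivId: "2605.06206",
  },
  {
    title:
      "Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding",
    arxivId: "2605.00342",
  },
];
```

Calling `week19.map((p) => arxivUrl(p.arxivId))` yields one link per entry and throws on a mistyped identifier. `findDuplicateIds(week19)` flags any paper that was added twice.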

https://claude.ai/code/session_015ootf94u65YKT9eXtiMwVz


Generated by Claude Code

2 participants