
feat(daily): weekly digest 2026-W20#39

Open
yayajjiang wants to merge 1 commit into main from digest/weekly-2026-05-17


@yayajjiang
Owner

Weekly Digest 2026-W20 (May 11–17)

Week theme: Diffusion LMs enter the RL post-training era — plus new theory on what RL actually does to LLMs, and Apple's information-theoretic limits for alignment.

Papers added (6 entries, 3 picks)

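  • Beyond Reasoning: RL Unlocks Parametric Knowledge in LLMs (2605.07153) ⭐: RL raises factual recall by about 27% on zero-shot QA without CoT. The mechanism is a redistribution of probability mass over knowledge the model already holds rather than the acquisition of new knowledge, which strongly challenges the "RL = reasoning" narrative.
  • Theoretical Limits of Language Model Alignment (2605.07105) ⭐: Apple derives an information-theoretic KL-reward Pareto frontier for alignment. Best-of-N comes close to the theoretical optimum, while PPO/GRPO still fall well below the limit, giving alignment research a theoretical anchor.
  • Block-R1: Block Size in Multi-domain RL for Diffusion LLMs (2605.11726) ⭐: Block-size conflict is a key bottleneck in multi-domain RL post-training of dLLMs. Block-R1-41K assigns each sample its best block size, which eases the cross-domain tension in GRPO training.
  • Break the Block: Dynamic-size Reasoning Blocks for Diffusion LLMs (2605.02263): Fixed block sizes break reasoning coherence in diffusion LLMs. An RL framework based on monotonic entropy descent learns block boundaries adaptively and markedly improves generation quality across several kinds of reasoning tasks.
  • Continuous Latent Diffusion Language Model (Cola DLM) (2605.06548): A hierarchical Text VAE and a block-causal DiT model text in a continuous latent space, bypassing the discrete-token bottleneck entirely and opening a new paradigm for diffusion language models.
  • LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG (2605.06285): Carries out the reasoning and retrieval steps of agentic RAG in a continuous latent space, fully replacing token-by-token generation. It matches explicit-reasoning accuracy on 7 benchmarks while cutting latency by about 90%.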

https://claude.ai/code/session_01DUXCCYevfJ8ALPNqk7sdoZ


Generated by Claude Code

2 participants