[CI] Re-enable deep_ep and deep_gemm unit tests #8027

Merged
EmmonsCurse merged 1 commit into PaddlePaddle:develop from EmmonsCurse:reuse_paddle_deep_ep
Jun 9, 2026

@EmmonsCurse EmmonsCurse commented Jun 9, 2026

Collaborator

Motivation

deep_ep and deep_gemm unit tests were previously disabled due to compatibility issues introduced by upstream Paddle changes.

Since the related changes have been reverted in PaddlePaddle/Paddle#79249, the original incompatibility no longer exists and the affected test cases can be restored.

Re-enabling these tests helps recover validation coverage and ensures continued regression protection for deep_ep and deep_gemm functionality.

Modifications

  • Re-enabled deep_ep related unit tests.
  • Re-enabled deep_gemm related unit tests.
  • Restored CI coverage for affected functionalities.
  • Removed the temporary test exclusion introduced as a workaround for the upstream issue.

Usage or Command

N/A

Accuracy Tests

N/A

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If none are added, state the reason in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@EmmonsCurse

Collaborator Author

/skip-ci ci_iluvatar
/skip-ci ci_hpu
/skip-ci build_xpu
/skip-ci pre_ce_test
/skip-ci stable_test
/skip-ci base_test
/skip-ci logprob_test
/skip-ci gpu_4cards_test

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-06-09 13:38:37

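📋 Review Summary

PR overview: Restores collection and execution of the DeepEP/DeepGEMM-related unit tests in the coverage pytest run.
Change scope: tests/cov_pytest.ini, the ignore list for coverage tests.
Impact tag: [CI]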

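Issues

| Level | File | Summary |
|---|---|---|
| 🔴 Bug | tests/cov_pytest.ini:17 | Once test_fusedmoe_ep_entry.py is restored, the pytest collection/import phase launches the distributed test directly, so the coverage job runs it more than once and bypasses the scheduling and log wrapping in coverage_run.sh |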

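📝 PR Convention Check

Compliant.

Overall Assessment

The change moves in the right direction by restoring coverage, but at least this distributed entry test does not yet meet the conditions for being collected in full. Fix the import side effect in the test entry point first, then remove the file from the ignore list.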

Comment thread tests/cov_pytest.ini
--ignore=tests/graph_optimization/test_cuda_graph_dynamic_subgraph.py
--ignore=tests/e2e/test_ernie_03b_pd_decode_unified_attention.py
--ignore=tests/e2e/test_ernie_03b_pd_router_v1_ipc.py
--ignore=tests/distributed/test_fusedmoe_ep_entry.py

🔴 Bug: After this ignore entry is removed, the collection phase of coverage_run.sh imports and actually executes tests/distributed/test_fusedmoe_ep_entry.py.

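coverage_run.sh first runs python -m pytest --collect-only -q -c ${PYTEST_INI} tests to collect the test files. The restored test file ends with a call to test_fused_moe_launch() that is not guarded by if __name__ == "__main__", so merely importing the file launches paddle.distributed.launch --gpus 0,1. When run_test_with_logging later runs the file, the pytest import executes that call once more, and then the test function itself runs again. As a result, the coverage job launches the distributed subprocesses repeatedly, and the first launch happens during collection, where it bypasses classification, retries, and log-directory isolation.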
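The repeated execution described in this comment can be reproduced with a small stdlib-only sketch. It is an illustration, not the real test file: the module source is a stand-in in which appending to a list replaces the actual distributed launch, and `fresh_import` is a hypothetical helper that mimics one process importing the file.

```python
import importlib.util
import tempfile
from pathlib import Path

# Stand-in for a test file whose last line calls the test at module level.
SOURCE = """
launches = []

def test_fused_moe_launch():
    launches.append("launch")  # stand-in for starting the distributed job

test_fused_moe_launch()  # unguarded module-level call
"""


def fresh_import(source: str, name: str = "entry_under_test"):
    """Write the source to a temporary file and import it as a new module."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{name}.py"
        path.write_text(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs all module-level code
        return module


collect_phase = fresh_import(SOURCE)   # what a collect-only pass does: import
run_phase = fresh_import(SOURCE)       # the later run imports the file again
run_phase.test_fused_moe_launch()      # and then calls the test function

total = len(collect_phase.launches) + len(run_phase.launches)
print(total)  # 3
```

One launch comes from the collection import and two from the later run, which matches the three executions the comment counts.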

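Suggested fix:
Keep this ignore entry for now, or, before restoring the test, move the module-level call in tests/distributed/test_fusedmoe_ep_entry.py under if __name__ == "__main__": and add an explicit multi-GPU availability skip/gate to the pytest entry point, so that collect-only only collects and never runs distributed jobs.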
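A minimal sketch of that layout follows. The real contents of test_fusedmoe_ep_entry.py are not shown in this PR, so the worker script name, the helper names, and the GPU threshold are all assumptions; only the `paddle.distributed.launch --gpus 0,1` invocation comes from the review comment.

```python
import subprocess
import sys

REQUIRED_GPUS = 2  # assumption, derived from `--gpus 0,1` in the review
WORKER_SCRIPT = "fused_moe_ep_worker.py"  # hypothetical worker script name


def available_gpu_count() -> int:
    """Return the GPU count reported by Paddle, or 0 if Paddle is not importable."""
    try:
        import paddle
    except ImportError:
        return 0
    return paddle.device.cuda.device_count()


def build_launch_command(worker_script: str = WORKER_SCRIPT) -> list[str]:
    """Build the launch command; building it starts nothing."""
    return [sys.executable, "-m", "paddle.distributed.launch",
            "--gpus", "0,1", worker_script]


def run_fused_moe_launch() -> None:
    """Start the distributed job and fail if the launcher exits non-zero."""
    subprocess.run(build_launch_command(), check=True)


def test_fused_moe_launch() -> None:
    """Pytest entry point: runs only when pytest calls it, never at import."""
    import pytest  # lazy import keeps plain imports of this module cheap

    if available_gpu_count() < REQUIRED_GPUS:
        pytest.skip(f"needs at least {REQUIRED_GPUS} GPUs")
    run_fused_moe_launch()


if __name__ == "__main__":
    # Direct execution still works; a collect-only pass never reaches this branch.
    if available_gpu_count() >= REQUIRED_GPUS:
        run_fused_moe_launch()
    else:
        print(f"skipping: needs at least {REQUIRED_GPUS} GPUs")
```

With this shape, importing the module only defines functions, so a collect-only pass can list the test without starting any subprocess, and the GPU check decides at call time whether the launch happens.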

@PaddlePaddle-bot

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-09 14:43:50 UTC+08:00

CI report generated from the following code (updated every 30 minutes):
PR commit: 370dbc1 | Merge base: 03d837a (branch: develop)


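1 Required jobs: 2/10 passed

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
|---|---|---|---|---|---|---|
| 40 (0) | 40 | 28 | 0 | 1 | 0 | 11 |

No required jobs have failed so far and CI is still running: 0 failed, 1 running, 0 pending.

| Job | Error type | Confidence | Log |
|---|---|---|---|

2 Failure details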

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@03d837a). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #8027   +/-   ##
==========================================
  Coverage           ?   78.32%           
==========================================
  Files              ?      404           
  Lines              ?    57430           
  Branches           ?     9032           
==========================================
  Hits               ?    44984           
  Misses             ?     9572           
  Partials           ?     2874           
Flag Coverage Δ
GPU 78.32% <ø> (?)

@EmmonsCurse EmmonsCurse merged commit d7d7373 into PaddlePaddle:develop Jun 9, 2026
42 checks passed
@EmmonsCurse EmmonsCurse deleted the reuse_paddle_deep_ep branch June 9, 2026 07:02