feat: add single-operator subgraph dataset generation script by ywh555hhh · Pull Request #635 · PaddlePaddle/GraphNet

ywh555hhh · 2026-02-05T07:52:15Z

PR Category

Feature Enhancement

Summary

This PR adds a new shell script generate_single_op_dataset.sh to support the generation of single-operator subgraphs. This is a critical step for building the single-op dataset used in kernel benchmarking.

Key Changes

New Script: generate_single_op_dataset.sh
Workflow:
1. Generation: Uses multiprocessing to extract single-op subgraphs from models defined in the input list.
2. Renaming: Standardizes graph variable names to ensure consistent hashing.
3. Deduplication: Removes structurally identical subgraphs.
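
A minimal sketch of how the deduplication step could work once renaming has made structurally identical graphs textually identical. It is not the script's actual implementation: the function name `dedup_by_hash` is hypothetical, and the sketch assumes each sample is a directory holding a `model.py`, that GNU `sha256sum` is available, and that paths contain no tab characters.

```shell
#!/usr/bin/env bash
# Sketch only: content-hash deduplication of subgraph samples.
# Assumption: after renaming, identical graphs have byte-identical model.py files.

# Print the directory of the first sample seen for each distinct model.py hash.
dedup_by_hash() {
    find "$1" -type f -name "model.py" | sort | while IFS= read -r file; do
        # Emit "<hash><TAB><sample dir>" for every sample found.
        printf '%s\t%s\n' "$(sha256sum "$file" | cut -d' ' -f1)" "$(dirname "$file")"
    done | awk -F '\t' '!seen[$1]++ { print $2 }'   # keep the first dir per hash
}

# Example (hypothetical variable name):
#   dedup_by_hash "$RAW_SUBGRAPH_DIR" > unique_samples.txt
```

Sorting the file list first makes the surviving sample for each hash deterministic across runs.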

Test Plan

Ran the script on 10 sample models (small scale).

Command: bash generate_single_op_dataset.sh
Results:
- Input: ~2088 raw subgraphs.
- Output: 206 unique subgraphs.
- Directory structure validated.
- Log files confirm correct handling of individual model failures without breaking the pipeline.

Checklist

Script executable permissions set.
Warning block added for hardcoded paths.
Verified deduplication logic.

This commit introduces `generate_single_op_dataset.sh` to automate the workflow for generating single-operator subgraph datasets.

paddle-bot · 2026-02-05T07:52:22Z

Thanks for your contribution!

Xreki · 2026-02-05T11:09:08Z

graph_net/test/generate_single_op_dataset.sh

+# Virtual Environment Python Executable Path
+PYTHON_EXEC="/workspace/venv_graphnet/bin/python3"
+# Project Root Directory
+GRAPH_NET_ROOT="/workspace/GraphNet"


Take a look at generate_subgraph_dataset.sh for reference; none of these should be hardcoded.
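
One common way to avoid the hardcoded paths flagged above is to derive them from the script's own location and from the environment. The sketch below is an illustration, not what generate_subgraph_dataset.sh necessarily does. The function names are hypothetical, and it assumes the script sits two directories below the repository root (as both graph_net/test and graph_net/tools do).

```shell
#!/usr/bin/env bash
# Sketch only: resolve paths dynamically instead of hardcoding them.

# Resolve the repository root from a script path.
# Assumption: the script lives two directories below the root, so root is "../..".
resolve_graph_net_root() {
    ( cd "$(dirname "$1")/../.." && pwd )
}

# Pick the Python interpreter: honour an existing PYTHON_EXEC, else search PATH.
resolve_python_exec() {
    if [ -n "${PYTHON_EXEC:-}" ]; then
        printf '%s\n' "$PYTHON_EXEC"
    else
        command -v python3
    fi
}

# Typical use near the top of a bash script:
#   GRAPH_NET_ROOT="$(resolve_graph_net_root "${BASH_SOURCE[0]}")"
#   PYTHON_EXEC="$(resolve_python_exec)"
```

Letting an exported `PYTHON_EXEC` take precedence keeps a virtual-environment interpreter usable without editing the script.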

Xreki · 2026-02-05T11:11:41Z

graph_net/test/generate_single_op_dataset.sh

+            "model_path_prefix": PROJECT_ROOT, 
+            "output_dir": workspace
+        })
+        run_stage_cmd(env, PROJECT_ROOT, [


Just run the shell commands directly; there is no need to call Python.

Refactor script for dynamic path detection and improved error handling. Added logging and workspace setup enhancements.

Xreki

Put the script under the graph_net/tools directory.

Xreki · 2026-02-09T07:11:34Z

graph_net/test/generate_single_op_dataset.sh

+LOG_DIR="${WORKSPACE}/logs"  # New: Dedicated log directory
+
+export PYTHONPATH="${GRAPH_NET_ROOT}:${PYTHONPATH}"
+export GRAPH_NET_ROOT PYTHON_EXEC WORKSPACE OP_NAMES_DIR RANGES_DIR RAW_SUBGRAPH_DIR RESUME LOG_DIR


This is redundant. If L21 succeeds, L54 is not needed. Also, what is L55 for?

Xreki · 2026-02-09T07:18:30Z

graph_net/test/generate_single_op_dataset.sh

+# ==============================================================================
+# Core Logic: Single Model Processing (V3: Strict Error Checking)
+# ==============================================================================
+process_single_model() {


Every step processes the models in batch; it is not single_model.

Xreki · 2026-02-09T07:18:39Z

graph_net/test/generate_single_op_dataset.sh

+EOF
+)"
+
+    run_step "OpNames" "$cmd_s1" || { rm -f "${tmp_list}"; return 1; }


Don't wrap things in a run_step function; it makes the whole script more complex. Just execute the commands directly.

Xreki · 2026-02-09T07:25:08Z

graph_net/test/generate_single_op_dataset.sh

+    split -l ${lines_per_gpu} -d "${WORKSPACE}/clean_list.txt" "${WORKSPACE}/gpu_chunk_"
+
+    # 3. Parallel Execution
+    for (( i=0; i<NUM_GPUS; i++ )); do


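The script does not need parallel execution. If parallelism is to be supported, it should be added uniformly at the apply_sample_pass and model_path_handler level.
Even if it is added in the script, the two nested loops at L179 and L188 are unnecessary. Every processing step accepts a model_path_list, so split the full model_list into NUM_GPUS parts and let each GPU process one list.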
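
The fallback described in this comment (one chunk of the model list per GPU, one loop) might look roughly like the sketch below. It illustrates the suggestion only; the merged script was later refactored to serial execution. `split_model_list` and `process_model_list` are hypothetical names, and the round-robin split via `awk` is an assumption chosen in place of the `split -l` call in the diff.

```shell
#!/usr/bin/env bash
# Sketch only: split one model list into per-GPU chunks, then use a single loop.

# Split a model list into num_parts files "<prefix>0", "<prefix>1", ...
# Lines are dealt round-robin, so chunk sizes differ by at most one.
split_model_list() {
    local list="$1" num_parts="$2" prefix="$3" i=0
    while [ "$i" -lt "$num_parts" ]; do
        : > "${prefix}${i}"   # create every chunk, even if it stays empty
        i=$((i + 1))
    done
    awk -v n="$num_parts" -v p="$prefix" '{ print > (p ((NR - 1) % n)) }' "$list"
}

# One loop over GPUs; each GPU runs all steps on its own chunk.
# process_model_list is a hypothetical stand-in for the real per-list steps.
#   split_model_list "$MODEL_LIST" "$NUM_GPUS" "$WORKSPACE/gpu_chunk_"
#   for (( i = 0; i < NUM_GPUS; i++ )); do
#       CUDA_VISIBLE_DEVICES="$i" process_model_list "$WORKSPACE/gpu_chunk_$i" &
#   done
#   wait
```

Because every step already accepts a model path list, the per-GPU loop needs no inner loop over individual models.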

Xreki · 2026-02-09T09:33:13Z

graph_net/test/generate_single_op_dataset.sh

+    exit 1
+fi
+
+grep -v "^#" "${MODEL_LIST}" | grep -v "^$" > "${WORKSPACE}/clean_list.txt"


Is there something wrong with the original model_list?

Xreki · 2026-02-09T09:41:25Z

graph_net/test/generate_single_op_dataset.sh

+find ${RAW_SUBGRAPH_DIR} -name "model.py" \
+    | xargs dirname \
+    | xargs realpath --relative-to=${RAW_SUBGRAPH_DIR} \
+    > "${WORKSPACE}/raw_list.txt"


The file name should be more meaningful.
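
As a side note on the snippet under review, the unquoted variables and the `xargs dirname` stage would mis-handle paths containing whitespace. A possible hardening is sketched below. The function name `list_sample_dirs` and the output file name in the usage comment are hypothetical; it assumes, as the diff does, that a sample is any directory containing a `model.py`.

```shell
#!/usr/bin/env bash
# Sketch only: list sample directories relative to a root, whitespace-safe.

# Print every directory under $1 that contains a model.py, relative to $1, sorted.
list_sample_dirs() {
    ( cd "$1" && find . -type f -name "model.py" ) \
        | sed -e 's|/model\.py$||' -e 's|^\./||' \
        | sort
}

# Example (hypothetical output file name):
#   list_sample_dirs "$RAW_SUBGRAPH_DIR" > "$WORKSPACE/sample_dirs.txt"
```

Running `find` from inside the root yields relative paths directly, so no `realpath --relative-to` call is needed.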

feat: add single-operator subgraph dataset generation script

9ec7bdc

This commit introduces `generate_single_op_dataset.sh` to automate the workflow for generating single-operator subgraph datasets.

Xreki reviewed Feb 5, 2026

Refactor generate_single_op_dataset.sh for dynamic paths

ca2699b

Refactor script for dynamic path detection and improved error handling. Added logging and workspace setup enhancements.

Xreki reviewed Feb 9, 2026

ywh555hhh added 3 commits February 9, 2026 15:52

Merge branch 'PaddlePaddle:develop' into develop

e0c58b6

Refactor dataset generation script to strict serial execution mode

3e250b0

extract raw_list.txt generation as separate step and copy to final ou…

f4f8c95

…tput

Xreki reviewed Feb 9, 2026

ywh555hhh added 2 commits February 9, 2026 18:37

remove redundant clean_list.txt and rename raw_list.txt to generated_…

570d7d1

…subgraphs_list.txt

move generate_single_op_dataset.sh to tools directory

304dfe6

lixinqi approved these changes Feb 9, 2026

lixinqi merged commit c0fd47d into PaddlePaddle:develop Feb 9, 2026
3 checks passed

lixinqi merged 7 commits into PaddlePaddle:develop from ywh555hhh:develop

3 participants
