Conversation
There was a problem hiding this comment.
Pull Request Overview
Adds support for an NPU-compatible SortedNMS operator by extending both the Python-level API and underlying C++ implementations to accept sorted scores and input indices.
- Python wrapper (
nms_op) now branches for NPU to gather and passsorted_scoresandscore_inds. - Core op registration and functor logic updated to include optional
scoresandinput_indices, with new fused operator for NPU. - Data type inference and API signature modified for NPU support in
nms_op.cppandfunctional_api.yaml.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| python/oneflow/nn/modules/nms.py | Branch NPU path to gather sorted scores and pass indices |
| oneflow/user/ops/nms_op.cpp | InferNmsDataType now sets different dtype for NPU |
| oneflow/ir/include/OneFlow/OneFlowUserOps.td | Added optional scores and input_indices inputs |
| oneflow/core/functional/impl/nn_functor.cpp | Built fused_op_ and dispatch based on device and inputs |
| oneflow/core/functional/impl/array_functor.cpp | Route argwhere through CPU for NPU |
| oneflow/core/functional/functional_api.yaml | Updated nms signature with new optional tensor args |
Comments suppressed due to low confidence (1)
oneflow/user/ops/nms_op.cpp:34
- The CPU path sets output dtype to Int8, but NMS usually returns index tensors; Int32 or Int64 may be more appropriate.
ctx->SetOutputDType("out", 0, DataType::kInt8);
| return OpInterpUtil::Dispatch<Tensor>(*op_, {x}, attrs); | ||
| DeviceType device_type = JUST(x->device())->enum_type(); | ||
| if (device_type == DeviceType::kNPU) { | ||
| if (scores) { |
There was a problem hiding this comment.
The fused NMS path checks only scores; it should verify both scores and input_indices are provided to avoid passing a null optional downstream.
| if (scores) { | |
| if (scores && input_indices) { |
|
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally. |
|
|
||
| - name: "nms" | ||
| signature: "Tensor (Tensor x, Float iou_threshold, Int32 keep_n=-1) => Nms" | ||
| signature: "Tensor (Tensor x, Tensor scores=None, Tensor input_indices=None, Float iou_threshold, Int32 keep_n=-1) => Nms" |
There was a problem hiding this comment.
看看是否需要/方便,为npu的nms导出独立的api/functor?
This PR is mainly to support the NPU-compatible SortedNMS operator, so sorted_score and input_indices have been added.