feat(ascend): op-simple group — Add, Mul, Cast, Cat, Matmul, Gemm, Linear #65
Merged
Conversation
Collaborator
Author
merge test:
Ziminli
requested changes
Apr 20, 2026
78a0628 to 7eeec7a
…near

Seven foundational Ascend operators:

| op | impl |
|---|---|
| Add | aclnnAdd |
| Mul | aclnnMul |
| Cast | aclnnCast |
| Cat | aclnnCat |
| Matmul | aclnnMatmul |
| Gemm | aclnnMm (also carries the cached-executor / workspace-pool rework) |
| Linear | aclnnMatmul + optional bias |

Also ships:
- `src/base/<op>.h` for the 5 new ops (cast/cat/linear/matmul/mul); `add.h` and `gemm.h` existed on master and are updated in-place
- `src/cpu/<op>/<op>.h` reference impls for cast/cat/linear/mul (add/gemm/matmul had CPU refs on master already)
- `tests/test_<op>.py` for each operator (add and gemm have MODIFY diffs; others are new)
7eeec7a to 7649042
Ziminli
approved these changes
Apr 21, 2026
added 4 commits
April 21, 2026 16:15
…caches

- `add/kernel.h`: swap destroy() → release() on in_cache_/oth_cache_/out_cache_ and drop aclDestroyAclOpExecutor (both are referenced by the Repeatable executor; destroying them causes double-free at shutdown per the pattern documented in common.h and commit 64c367c).
- `cat/kernel.h`: release all in_caches_[i] in the destructor; without it, ~AclTensorCache() on vector teardown double-frees descriptors held by tensor_list_ / executor_.
- Also group the alpha_* storage members with blank lines to match file convention.
…entation_indices`

Replaces hardcoded `(0, 1)` / `(0, 1, 2)` tuples in test_add, test_gemm, test_rms_norm,
test_swiglu with a union over the locally-available devices' active implementation indices.
New helper `tests.utils.all_active_implementation_indices(op_cls)` only iterates
`get_available_devices()` to avoid `DispatchFunc::std::abort` on device types outside the
build's `ActiveDevices` set.

Effect on Ascend CI: skipped-test count drops from 3246 to 1686 — impl=1 (`cuBLASLt`) is no
longer parametrized when no CUDA device is visible, and RmsNorm/Swiglu's custom-kernel slot
drops out of the matrix on op-simple, where the framework layer hasn't merged the AscendC
impl yet.
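A minimal sketch of what that helper could look like, based only on the description above; where `get_available_devices()` lives and the exact return type are assumptions of the sketch, not taken from the PR:

```python
# tests/utils.py (sketch) -- not the PR's exact code.
# get_available_devices() is assumed to already exist in this module and to
# return the device strings visible to the local build, e.g. ["cpu", "ascend"].

def all_active_implementation_indices(op_cls):
    """Union of op_cls.active_implementation_indices(dev) over available devices.

    Iterating only get_available_devices() keeps dispatch from ever being
    queried for device types outside the build's ActiveDevices set, which
    would otherwise abort.
    """
    indices = set()
    for dev in get_available_devices():
        indices.update(op_cls.active_implementation_indices(dev))
    return sorted(indices)
```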
Replaces the per-test `@pytest.mark.parametrize("implementation_index", ...)`
+ runtime `if impl not in active_indices: skip` pattern with a single hook in
`conftest.pytest_generate_tests` that emits only the (device, impl) pairs
actually active on each device.
Rationale: kernel dispatch is per-device, so cross-device union (previous
`all_active_implementation_indices` helper) polluted the matrix with impls
that the selected device can't run — runtime-skipped noise. Joint generation
keeps the matrix to its semantic cell: "this device has this impl, so run it".
- `tests/conftest.py`: when both `device` and `implementation_index` are in
fixturenames, emit pairs via `op_cls.active_implementation_indices(dev)`;
fall back to a skipped placeholder (`id="skip"`) when no device has an
active impl, avoiding `[NOTSET-...]` test IDs.
- `tests/{test_add,test_gemm,test_rms_norm,test_swiglu}.py`: drop the hardcoded
`implementation_index` parametrize decorator and the runtime `active_indices`
guard — conftest now handles both.
- `tests/utils.py`: remove the `all_active_implementation_indices` helper
(superseded by per-device generation in conftest).
Same test outcome on Ascend CI (1935 passed / 1686 skipped), but the remaining
skips are now semantically mandatory (uint dtypes unsupported by `torch_npu`,
Gemm impl=2 SFINAE-only workaround, ops missing an ascend impl on op-simple
pending PR #66) rather than mechanism artifacts.
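A minimal sketch of a conftest hook matching the description above; how the op class is recovered from the test module name (here an inline snake_case → PascalCase lookup on `infini.ops`) and the import path of `get_available_devices()` are assumptions of the sketch, not the PR's exact code:

```python
# tests/conftest.py (sketch)
import pytest
import infini.ops as ops
from tests.utils import get_available_devices  # import path assumed


def pytest_generate_tests(metafunc):
    if not {"device", "implementation_index"} <= set(metafunc.fixturenames):
        return

    # Derive the op class from the test module name, e.g. test_rms_norm -> RmsNorm.
    snake = metafunc.module.__name__.rsplit(".", 1)[-1].removeprefix("test_")
    op_cls = getattr(ops, "".join(part.capitalize() for part in snake.split("_")))

    # Emit only the (device, impl) pairs actually active on each local device.
    pairs = [
        (dev, impl)
        for dev in get_available_devices()
        for impl in op_cls.active_implementation_indices(dev)
    ]
    if pairs:
        metafunc.parametrize(("device", "implementation_index"), pairs)
    else:
        # No device has an active impl: a single skipped placeholder keeps the
        # test ID readable ("[skip]") instead of "[NOTSET-...]".
        metafunc.parametrize(
            ("device", "implementation_index"),
            [pytest.param(None, None, id="skip",
                          marks=pytest.mark.skip(reason="no active implementation"))],
        )
```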
…undant fixture

Post-review cleanup of the joint-parametrize refactor (1dd288f):
- Extract `_op_class_from_module` as a shared helper; the `skip_op_without_platform_impl` fixture now calls it instead of re-deriving the snake→pascal class name inline.
- Short-circuit the fixture when `implementation_index` is already in callspec — `pytest_generate_tests` has already pruned empty-impl pairs, so per-case `active_implementation_indices` calls are wasted work.
- Drop `try/except ImportError` inside the helper — collection has already imported `infini.ops` via the test modules; masking a real import failure only turns it into a cryptic NOTSET fixture.
- Drop the `devices[0] if devices else "cpu"` fallback — `get_available_devices()` always includes `"cpu"`, making the `else` arm unreachable.
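A minimal sketch of the post-cleanup shape described above; the fixture's signature (including the `device` argument) and body are assumptions beyond what the commit message states:

```python
# tests/conftest.py (sketch of the cleanup)
import pytest
import infini.ops as ops  # already imported during collection via the test modules


def _op_class_from_module(module_name):
    # Shared snake_case -> PascalCase derivation, e.g. "test_rms_norm" -> RmsNorm.
    # No try/except ImportError: a real import failure should surface loudly
    # rather than degrade into a cryptic NOTSET fixture.
    snake = module_name.rsplit(".", 1)[-1].removeprefix("test_")
    return getattr(ops, "".join(part.capitalize() for part in snake.split("_")))


@pytest.fixture(autouse=True)
def skip_op_without_platform_impl(request, device):
    callspec = getattr(request.node, "callspec", None)
    if callspec is not None and "implementation_index" in callspec.params:
        # pytest_generate_tests already pruned (device, impl) pairs with no
        # active implementation, so re-checking here is wasted work.
        return
    op_cls = _op_class_from_module(request.module.__name__)
    if not op_cls.active_implementation_indices(device):
        pytest.skip(f"{op_cls.__name__} has no implementation on {device}")
```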
Collaborator
Author
ascend | nvidia | iluvatar | moore | cambricon | metax
Ziminli
requested changes
Apr 22, 2026
added 2 commits
April 22, 2026 13:54
…ables in Linear

Per PR #65 review:
- `src/cpu/cast/cast.h`: replace nested `DispatchFunc(in_dtype, ...)` inside `DispatchFunc(out_dtype, ...)` with a single multi-dispatch call `DispatchFunc<kCpu, AllTypes, AllTypes>({in, out}, [](in_tag, out_tag) {...})` per the multi-dispatch idiom documented in `CONTRIBUTING.md`.
- `src/cpu/linear/linear.h`: rename PascalCase locals to snake_case: `A/B/Out/Bias` → `a_ptr/b_ptr/out_ptr/bias_ptr`, `A_batch/B_batch/Out_batch` → `a_batch/b_batch/out_batch`, `M/N/K` → `m/n/k` (matching master's `src/cpu/gemm/gemm.h`, which already uses lowercase dim names `m_/n_/k_`).
- `if (bias_ptr && bias)` → `if (bias_ptr)` (line 75). `bias_ptr` is `nullptr` iff `!bias` by construction at line 38, so `&& bias` is dead.
- Remove the `// Determine m, n, k from shapes and transpose flags.` comment — the three lines below it literally do exactly that; self-describing now that names are snake_case.
Ziminli
approved these changes
Apr 22, 2026
Summary
Seven foundational Ascend operators — Add, Mul, Cast, Cat, Matmul, Gemm,
Linear — implemented via the ACLNN API set.
This is part 2 of 4 in the Ascend operator split (part 1 = feat/ascend-framework-pr).
Each category PR ships its operators as an atomic unit:
src/base/<op>.h declaration + src/ascend/<op>/*.h Ascend impl +
src/cpu/<op>/<op>.h CPU reference + tests/test_<op>.py.
Depends on: feat/ascend-framework-pr must merge first (shared framework headers, generator fixes, CI fixes, and test infra).
Operators

| op | impl |
|---|---|
| Add | aclnnAdd |
| Mul | aclnnMul |
| Cast | aclnnCast |
| Cat | aclnnCat |
| Matmul | aclnnMatmul (src/base/mat_mul.h — class renamed MatMul → Matmul) |
| Gemm | aclnnMm |
| Linear | aclnnMatmul + optional bias |

Each operator is declared in src/base/<op>.h. Kernels are header-only under
src/ascend/<op>/kernel.h; the build picks them up automatically through the
Ascend glob in src/CMakeLists.txt.

CPU reference implementations
src/cpu/{cast,cat,linear,mul}/ added as reference implementations for the new ops.
add, gemm, and matmul already had CPU references on master (mat_mul.h → matmul.h rename handled in this PR).
Removed
src/base/mat_mul.h — the old MatMul class had no implementation on any backend.
Replaced by the new Matmul class in src/base/matmul.h.

Verification
python3 .ci/run.py --local --gpu-id <N> (Ascend 910B + CANN 8.5.1): 3435 passed / 1746 skipped / 0 failed.
Operators without a platform implementation skip cleanly via the framework PR's
skip_op_without_platform_impl autouse fixture.
Test plan
- python3 .ci/run.py --local
- clang-format passes locally