Enable CUDA CI #1473

Draft
XuehaoSun wants to merge 49 commits into main from xuehao/cuda-ci

XuehaoSun (Contributor) commented Feb 27, 2026

Description

Enable CUDA CI

TODO

  • Skip gptqmodel and auto-gptq tests
  • Fix absolute path (like /models/xxx)
  • Skip tests that require a lot of hardware resources
  • Separate the vLLM-related unit tests @xin3he
  • Triton issue
  • CUDA compatibility
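
Two of the TODO items above (replacing absolute paths such as /models/xxx and skipping resource-heavy tests) could be handled with small test helpers. The following is a minimal sketch under stated assumptions, not the approach taken in this PR: the environment variables `MODEL_ROOT` and `RUN_HEAVY_TESTS` and both helper names are hypothetical.

```python
import os
import unittest


def resolve_model_path(model_name, env=None):
    """Return a path under MODEL_ROOT if it is set, else the bare model name.

    MODEL_ROOT is a hypothetical variable used for illustration only.
    """
    env = os.environ if env is None else env
    root = env.get("MODEL_ROOT")
    if root:
        return os.path.join(root, model_name)
    # No local model root configured: fall back to the bare name.
    return model_name


def requires_heavy_hardware(test_item):
    """Skip a test unless RUN_HEAVY_TESTS=1 (hypothetical opt-in flag).

    The flag is read once, at decoration time.
    """
    enabled = os.environ.get("RUN_HEAVY_TESTS") == "1"
    return unittest.skipUnless(enabled, "needs large GPU resources")(test_item)
```

With this shape, a test would call `resolve_model_path("xxx")` instead of hard-coding `/models/xxx`, and a lightweight CI run would leave `RUN_HEAVY_TESTS` unset so decorated tests are reported as skipped rather than failed.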

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Copilot AI review requested due to automatic review settings February 27, 2026 02:55
@XuehaoSun XuehaoSun marked this pull request as draft February 27, 2026 02:55
Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces CUDA CI infrastructure using RunPod for GPU-based testing in Azure Pipelines. The implementation creates a three-stage pipeline that dynamically provisions GPU instances, runs CUDA unit tests, and ensures proper cleanup of cloud resources.
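
The cleanup guarantee described above can be illustrated with a small sketch. It is written in Python for brevity and is not the pipeline's actual implementation, which lives in Azure Pipelines stages. The three callables (`provision`, `run_tests`, `cleanup`) are hypothetical stand-ins for the three stages.

```python
def run_pipeline(provision, run_tests, cleanup):
    """Provision a pod, run the tests on it, and always clean up.

    All three callables are hypothetical stand-ins for the pipeline stages.
    Returns whatever run_tests returns (for example, an exit code).
    """
    pod_id = provision()
    try:
        return run_tests(pod_id)
    finally:
        # Runs whether the tests pass, fail, or raise, so the GPU pod
        # is not left running (and billing) after the job ends.
        cleanup(pod_id)
```

In Azure Pipelines, a comparable effect is usually obtained by giving the cleanup stage a condition such as `always()`; whether this PR does so is not stated in the overview.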

Changes:

  • Added Azure pipeline configuration for CUDA tests with RunPod integration
  • Created Python scripts to manage RunPod instance lifecycle and Azure DevOps agent registration
  • Implemented bash script to execute CUDA unit tests with multiple test suites (standard, LLMC, SGLang)
  • Commented out auto-gptq requirement in test dependencies

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 16 comments.

| File | Description |
| --- | --- |
| test/test_cuda/requirements.txt | Commented out auto-gptq dependency per TODO item |
| .azure-pipelines/unit-test-cuda.yml | New pipeline with 3 stages: pod provisioning, GPU testing, and cleanup |
| .azure-pipelines/scripts/cuda_unit_test/runpod_manager.py | Manages RunPod GPU instance creation, monitoring, and termination |
| .azure-pipelines/scripts/cuda_unit_test/run_cuda_ut.sh | Executes CUDA unit tests with separate functions for standard, LLMC, and SGLang tests |
| .azure-pipelines/scripts/cuda_unit_test/azure_agent.py | Manages Azure DevOps agent registration and deregistration |
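
The "monitoring" step attributed to runpod_manager.py is not shown in this conversation. As a rough illustration of what waiting for a pod to become ready might look like, here is a generic polling loop with a timeout. The `get_status` callable, the `"RUNNING"` status string, and the default timings are all assumptions, and no real RunPod API call is used.

```python
import time


def wait_until_running(get_status, timeout_s=600.0, interval_s=10.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "RUNNING" or the timeout expires.

    get_status is a hypothetical callable that returns a status string.
    sleep and clock are injectable so the loop can be tested without waiting.
    Returns True if the pod reached "RUNNING", False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "RUNNING":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Returning False on timeout, rather than raising, lets the caller decide whether to terminate the pod and fail the stage.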

root and others added 2 commits March 4, 2026 08:56
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
XuehaoSun and others added 9 commits March 6, 2026 11:28
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

Development

Successfully merging this pull request may close these issues.

  • [Feature]: Add parts of CUDA UT into CI
  • Refactor CUDA Unit Tests: Segregate into Weekly Full Suite and CI Lightweight Suite

3 participants