Skip to content

Update GPU support for V100,T4,A5500.#6639

Merged
DaanHoogland merged 5 commits intoapache:mainfrom
pdion891:main-gpu-update
Nov 2, 2023
Merged

Update GPU support for V100,T4,A5500.#6639
DaanHoogland merged 5 commits intoapache:mainfrom
pdion891:main-gpu-update

Conversation

@pdion891
Copy link
Copy Markdown
Contributor

@pdion891 pdion891 commented Aug 15, 2022

Description

Update GPU and vGPU support for Nvidia V100, T4 and A5500 cards.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

How Has This Been Tested?

@acs-robot
Copy link
Copy Markdown

Found UI changes, kicking a new UI QA build
@blueorangutan ui

@blueorangutan
Copy link
Copy Markdown

@acs-robot a Jenkins job has been kicked to build UI QA env. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

UI build: ✔️
Live QA URL: http://qa.cloudstack.cloud:8080/client/pr/6639 (SL-JID-2149)

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@sonarqubecloud
Copy link
Copy Markdown

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

0.0% 0.0% Coverage
0.0% 0.0% Duplication

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 3990

@DaanHoogland
Copy link
Copy Markdown
Contributor

@pdion891 could you advice on how to test this, or report on your own testing?

@codecov
Copy link
Copy Markdown

codecov Bot commented Aug 15, 2022

Codecov Report

Merging #6639 (454aa4c) into main (235e4fe) will decrease coverage by 0.02%.
Report is 3 commits behind head on main.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##               main    #6639      +/-   ##
============================================
- Coverage     29.15%   29.13%   -0.02%     
+ Complexity    30909    30884      -25     
============================================
  Files          5165     5165              
  Lines        364111   364202      +91     
  Branches      53306    53306              
============================================
- Hits         106153   106119      -34     
- Misses       243397   243519     +122     
- Partials      14561    14564       +3     
Flag Coverage Δ
simulator-marvin-tests 25.10% <0.00%> (-0.03%) ⬇️
uitests 4.53% <ø> (ø)
unit-tests 14.74% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
ui/src/views/offering/AddComputeOffering.vue 0.00% <ø> (ø)
api/src/main/java/com/cloud/gpu/GPU.java 2.77% <0.00%> (-14.87%) ⬇️

... and 30 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Copy Markdown
Contributor

@shwstppr shwstppr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM.

In UI, however, with new options for GPU interface looks not so good. Radio button could be converted to select item,

-          <a-radio-group
+          <a-select
             v-model:value="form.pcidevice"
-            buttonStyle="solid"
-            @change="selected => { handleGpuChange(selected.target.value) }">
-            <a-radio-button v-for="(opt, optIndex) in gpuTypes" :key="optIndex" :value="opt.value">
+            showSearch
+            optionFilterProp="label"
+            :filterOption="(input, option) => {
+              return option.children[0].children.toLowerCase().indexOf(input.toLowerCase()) >= 0
+            }"
+            :placeholder="$t('label.gpu')"
+            @change="handleGpuChange">
+            <a-select-option v-for="(opt, optIndex) in gpuTypes" :key="optIndex" :value="opt.value">
               {{ opt.title }}
-            </a-radio-button>
-          </a-radio-group>
+            </a-select-option>
+          </a-select>

Screenshot from 2022-08-16 13-59-41

With select,
Screenshot from 2022-08-16 14-00-27

@pdion891
Copy link
Copy Markdown
Contributor Author

@DaanHoogland , I didn't test this yet, this is just a backport from previous version / UI.
I'll comment once tested and remove the PR from draft mode.

@shwstppr shwstppr marked this pull request as ready for review August 17, 2022 04:15
@shwstppr shwstppr marked this pull request as draft August 18, 2022 08:35
@shwstppr
Copy link
Copy Markdown
Contributor

Accidentally marked this ready for review yesterday, moved it back to draft

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Apr 19, 2023

ping @pdion891 @shwstppr is this still relevant?

@pdion891
Copy link
Copy Markdown
Contributor Author

@rohityadavcloud yes, but at this point it would make sense to update it with recent GPU model.
I will rebase this PR and add some more GPU to it.

@DaanHoogland DaanHoogland added this to the 4.19.0.0 milestone Jun 22, 2023
@shwstppr
Copy link
Copy Markdown
Contributor

shwstppr commented Oct 3, 2023

@pdion891 any update on this?

@shwstppr
Copy link
Copy Markdown
Contributor

@blueorangutan ui

@blueorangutan
Copy link
Copy Markdown

@shwstppr a Jenkins job has been kicked to build UI QA env. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

UI build: ✔️
Live QA URL: https://qa.cloudstack.cloud/simulator/pr/6639 (QA-JID-213)

@DaanHoogland
Copy link
Copy Markdown
Contributor

@pdion891 I updated the branch. Can you comment on the state of this PR?

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7574

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@pdion891
Copy link
Copy Markdown
Contributor Author

pdion891 commented Oct 31, 2023

@shwstppr I've added your snippet suggestion, and thank you, it work great!

Here are some highlight about this PR: it remove vGPU support from legacy grid card not supported by recent NVIDIA driver.
because of the passthrough generic, any GPU passthrough is working but vGPU need to be defined.

This PR include known vGPU offering for A2, A10, A40, A5500, V100, T4 GPU card.

Some tests where done on XenServer 8.2cu1 with nvidia RTX driver 16.1. Not all vGPU definition have been tested, only made sure vGPU definition exist on XenServers.

Screenshot 2023-10-30 at 15 54 06 Screenshot 2023-10-30 at 15 54 15

@pdion891 pdion891 marked this pull request as ready for review October 31, 2023 14:24
Copy link
Copy Markdown
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clgtm, just wondering if encoding vGPUs in the UI is a good thing?

@pdion891
Copy link
Copy Markdown
Contributor Author

pdion891 commented Nov 1, 2023

clgtm, just wondering if encoding vGPUs in the UI is a good thing?

it's not, it prevent us to easily add new GPU support. Currently, any GPU offering in passthrough mode can be created thru the API, but cannot thru the UI, they must be define like in this PR. ideally we would be able to define them a bit like GuestOS type works.

@shwstppr
Copy link
Copy Markdown
Contributor

shwstppr commented Nov 1, 2023

@blueorangutan ui

@blueorangutan
Copy link
Copy Markdown

@shwstppr a Jenkins job has been kicked to build UI QA env. I'll keep you posted as I make progress.

@DaanHoogland
Copy link
Copy Markdown
Contributor

@shwstppr this has backend changes as well. Do you want to just test the UI?

@blueorangutan
Copy link
Copy Markdown

UI build: ✔️
Live QA URL: https://qa.cloudstack.cloud/simulator/pr/6639 (QA-JID-216)

@shwstppr
Copy link
Copy Markdown
Contributor

shwstppr commented Nov 1, 2023

@DaanHoogland missed those

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7599

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan test keepEnv

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

[SF] Trillian test result (tid-8205)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 48274 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr6639-t8205-kvm-centos7.zip
Smoke tests completed. 115 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_02_upgrade_kubernetes_cluster Failure 579.13 test_kubernetes_clusters.py

Copy link
Copy Markdown
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

image

and then

image

I think the way that vgpu is decided on needs an overhaul but would pull that out of scope of this PR

Copy link
Copy Markdown
Contributor

@shwstppr shwstppr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DaanHoogland DaanHoogland merged commit a3565fc into apache:main Nov 2, 2023
@pdion891 pdion891 deleted the main-gpu-update branch November 2, 2023 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants