@peterjunpark
Contributor

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

(cherry picked from commit 1512eb0)

clean up formatting

fmt

(cherry picked from commit b4e26d0)

rm redundant line

(cherry picked from commit 309ce05)

add note about N/A

(cherry picked from commit ae78279)
@peterjunpark peterjunpark requested review from a team and saadrahim as code owners January 28, 2026 22:06
@peterjunpark
Contributor Author

Follow-up to #36

@peterjunpark peterjunpark merged commit 6c9cc02 into ROCm:develop Jan 28, 2026
3 checks passed
@peterjunpark peterjunpark deleted the partition-output branch January 28, 2026 22:26

@gabrpham gabrpham left a comment

Now that I've looked at it more closely, there are a few other updates that should be made. I've noted them in the comments below. Thanks!

Comment on lines +236 to +237
information. This is to be expected for security reasons and will be
addressed in a later feature update to ``amd-smi``.

If we can also leave it at 'This is to be expected for security reasons.' and chop the rest off, I think that would be completely correct. Thanks!

Comment on lines +98 to +99
Upon a successful set, AMD SMI will then initiate an action to restart AMD GPU driver.
This action will change all GPU's in the hive to the requested memory (NPS) partition mode.

We've changed this too. Users must now run `sudo modprobe -r amdgpu` themselves to unload the driver, and then `sudo modprobe amdgpu` to reload it. We removed the automatic reset because it was interfering with users' already running workloads. Users should initiate the driver reset as described above when they are ready to do so.
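
For reference, the manual reload described above could be sketched roughly as follows. This is only an illustration, not text proposed for the docs: the `run` and `reload_amdgpu` helper names and the `DRY_RUN` variable are hypothetical, and only the two `modprobe` commands come from this comment. By default the sketch prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: reload the amdgpu driver after a memory partition change.
# DRY_RUN, run, and reload_amdgpu are hypothetical names used for illustration.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reload_amdgpu() {
    # Unload the driver first; only load it again if the unload succeeded.
    run sudo modprobe -r amdgpu && run sudo modprobe amdgpu
}

reload_amdgpu
```

Setting `DRY_RUN=0` would run the commands for real, which should only be done after all GPU workloads have been stopped.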

GPU: 5
MEMORY_PARTITION: Successfully set memory partition to NPS4
Trying again - Updating memory partition for gpu 0: [██████████████..........................] 50/140 secs remain

You won't see this particular progress bar anymore, since the driver reset is now user-initiated. Here's the current output for setting memory partitions:

$ sudo amd-smi set -M NPS4

        ******WARNING******

        After changing memory (NPS) partition modes, users MUST restart
        (reload) the AMD GPU driver. This command NO LONGER AUTOMATICALLY
        reloads the driver, see `amd-smi reset -h` and
        `sudo amd-smi reset -r` for more information.

        This change is intended to allow users the ability to control when is
        the best time to restart the AMD GPU driver, as it may not be desired
        to restart the AMD GPU driver immediately after changing the
        memory (NPS) partition mode.

        Please use `sudo amd-smi reset -r` AFTER successfully
        changing the memory (NPS) partition mode. A successful driver reload
        is REQUIRED in order to complete updating ALL GPUs in the hive to
        the requested partition mode.

        ******REMINDER******
        In order to reload the AMD GPU driver, users MUST quit all GPU
        workloads across all devices.

Do you accept these terms? [Y/N] y

GPU: 0
MEMORY_PARTITION: Successfully set memory partition to NPS4, reload driver when ready

GPU: 1
MEMORY_PARTITION: Successfully set memory partition to NPS4, reload driver when ready

GPU: 2
MEMORY_PARTITION: Successfully set memory partition to NPS4, reload driver when ready

GPU: 3
MEMORY_PARTITION: Successfully set memory partition to NPS4, reload driver when ready

GPU: 4
MEMORY_PARTITION: Successfully set memory partition to NPS4, reload driver when ready

GPU: 5
MEMORY_PARTITION: Successfully set memory partition to NPS4, reload driver when ready

GPU: 6
MEMORY_PARTITION: Successfully set memory partition to NPS4, reload driver when ready

GPU: 7
MEMORY_PARTITION: Successfully set memory partition to NPS4, reload driver when ready
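
If it helps when updating the docs, a script could check how many GPUs are still waiting for a driver reload by counting the new status lines. The following is a minimal sketch that assumes the message wording stays exactly as in the output above; the `sample_output` variable and `count_pending_reload` function are hypothetical names, and the sample is a two-GPU excerpt rather than live `amd-smi` output.

```shell
#!/bin/sh
# Excerpt of the output shown above (first two GPUs only), used as sample input.
sample_output='GPU: 0
MEMORY_PARTITION: Successfully set memory partition to NPS4, reload driver when ready

GPU: 1
MEMORY_PARTITION: Successfully set memory partition to NPS4, reload driver when ready'

# Hypothetical helper: counts status lines that report a successful set
# which still needs a driver reload. Reads the output on stdin.
count_pending_reload() {
    grep -c 'Successfully set memory partition to .*, reload driver when ready'
}

pending=$(printf '%s\n' "$sample_output" | count_pending_reload)
echo "GPUs waiting for driver reload: $pending"
```

Because the match depends on the exact message text, any later change to the wording would require updating the pattern.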

peterjunpark added a commit that referenced this pull request Jan 28, 2026
peterjunpark added a commit that referenced this pull request Jan 28, 2026
peterjunpark added a commit that referenced this pull request Jan 28, 2026
peterjunpark added a commit that referenced this pull request Jan 28, 2026