Merged
3 changes: 2 additions & 1 deletion README.md
@@ -51,7 +51,8 @@ The cluster nodes will follow the requirements described by Slurm or Kubernetes.
 - CentOS 7, 8
 - Red Hat Enterprise Linux / Rocky Linux 8 and 9 for the DGX software stack through the `nvidia-dgx` role
 
-You may also install a supported operating system on all servers via a 3rd-party solution (i.e. [MAAS](https://maas.io/), [Foreman](https://www.theforeman.org/)) or utilize the provided [OS install container](docs/pxe/minimal-pxe-container.md).
+You may also install a supported operating system on all servers via a 3rd-party solution such as [MAAS](https://maas.io/) or [Foreman](https://www.theforeman.org/), or via an existing site-standard automated installer.
+For new Ubuntu 24.04 or DGX OS 7 deployments, prefer Ubuntu autoinstall/cloud-init or MAAS and then apply DeepOps roles after the OS is present.
 For DGX platform software installation on top of vanilla Ubuntu or Red Hat family operating systems, see the [DGX software stack role guide](docs/deepops/dgx-software-stack.md).
 
 ### Kubernetes
4 changes: 3 additions & 1 deletion config.example/group_vars/all.yml
@@ -253,7 +253,9 @@ maas_adminusers:
 maas_dns_domain: 'deepops.local'
 maas_region_controller: '192.168.1.1'
 maas_region_controller_url: 'http://{{ maas_region_controller }}:5240/MAAS'
-maas_repo: 'ppa:maas/3.5'
+# MAAS 3.7 is the current Ubuntu 24.04 line. Keep the 3.5 PPA when the MAAS
+# controller itself still runs Ubuntu 22.04.
+maas_repo: "{{ 'ppa:maas/3.7' if ansible_distribution_version is version('24.04', '>=') else 'ppa:maas/3.5' }}"
 
 # Defines if maas user should generate ssh keys
 # Usable for remote KVM/libvirt power actions
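The `maas_repo` default picks the MAAS PPA from the Ubuntu release of the host the MAAS role runs on. The following is a minimal Python sketch of that same decision, assuming `VERSION_ID`-style release strings such as `22.04` and `24.04`. The function names `version_tuple` and `select_maas_repo` are hypothetical, and the plain numeric comparison only approximates Ansible's `version` test rather than reproducing it.

```python
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Split a dotted release string such as '22.04' into integers for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def select_maas_repo(distribution_version: str) -> str:
    """Mirror the maas_repo default: 3.7 PPA on Ubuntu 24.04 or newer, otherwise 3.5."""
    if version_tuple(distribution_version) >= version_tuple("24.04"):
        return "ppa:maas/3.7"
    return "ppa:maas/3.5"


if __name__ == "__main__":
    for release in ("22.04", "24.04"):
        print(release, "->", select_maas_repo(release))
```

Comparing integer tuples rather than raw strings keeps releases ordered correctly even when their components have different digit counts. A string comparison, for example, would rank `9.10` as newer than `24.04`.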
62 changes: 0 additions & 62 deletions config.example/helm/dgxie.yml

This file was deleted.

9 changes: 0 additions & 9 deletions config.example/pxe/dnsmasq.extra.conf

This file was deleted.

49 changes: 0 additions & 49 deletions config.example/pxe/env

This file was deleted.

4 changes: 0 additions & 4 deletions config.example/pxe/ipmi.conf

This file was deleted.

4 changes: 0 additions & 4 deletions config.example/pxe/ipmi_host_list

This file was deleted.

50 changes: 0 additions & 50 deletions config.example/pxe/machines/machines.json

This file was deleted.

2 changes: 1 addition & 1 deletion docs/airgap/mirror-apt-repos.md
@@ -212,7 +212,7 @@ deb http://archive.ubuntu.com/ubuntu noble-security main restricted universe mul
 deb http://archive.ubuntu.com/ubuntu noble-updates main restricted universe multiverse
 deb http://archive.ubuntu.com/ubuntu noble-proposed main restricted universe multiverse
 deb http://archive.ubuntu.com/ubuntu noble-backports main restricted universe multiverse
-deb http://ppa.launchpad.net/maas/3.5/ubuntu noble main
+deb http://ppa.launchpad.net/maas/3.7/ubuntu noble main
 deb http://archive.canonical.com/ubuntu noble partner
 
 deb-src http://archive.ubuntu.com/ubuntu noble main restricted universe multiverse
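In an air-gapped mirror, the MAAS PPA entry should match the series the controller will actually install from. The `noble` entry now points at 3.7, consistent with the `maas_repo` default. The following is a small Python sketch that renders the `deb` line per Ubuntu codename. It assumes a hypothetical mapping (`jammy` uses 3.5, `noble` uses 3.7) inferred from the comment in `config.example/group_vars/all.yml`, and `maas_ppa_source_line` is an illustrative name, not part of DeepOps.

```python
# Hypothetical mapping inferred from the comment in config.example/group_vars/all.yml:
# Ubuntu 22.04 (jammy) controllers stay on MAAS 3.5, Ubuntu 24.04 (noble) uses 3.7.
MAAS_SERIES_BY_CODENAME = {
    "jammy": "3.5",
    "noble": "3.7",
}


def maas_ppa_source_line(codename: str) -> str:
    """Render the sources.list entry for the MAAS PPA on a given Ubuntu codename."""
    try:
        series = MAAS_SERIES_BY_CODENAME[codename]
    except KeyError:
        # Fail loudly instead of silently mirroring a PPA the controller will not use.
        raise ValueError(f"no MAAS PPA series configured for {codename!r}") from None
    return f"deb http://ppa.launchpad.net/maas/{series}/ubuntu {codename} main"


if __name__ == "__main__":
    for name in ("jammy", "noble"):
        print(maas_ppa_source_line(name))
```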
3 changes: 2 additions & 1 deletion docs/k8s-cluster/README.md
@@ -35,7 +35,8 @@ Instructions for deploying a GPU cluster with Kubernetes
 
 1. Install a supported operating system on all nodes.
 
-Install a supported operating system on all servers via a 3rd-party solution (i.e. [MAAS](https://maas.io/), [Foreman](https://www.theforeman.org/)) or utilize the provided [OS install container](../pxe).
+Install a supported operating system on all servers via a 3rd-party solution such as [MAAS](https://maas.io/) or [Foreman](https://www.theforeman.org/), or via an existing site-standard automated installer.
+For new Ubuntu 24.04 or DGX OS 7 deployments, prefer Ubuntu autoinstall/cloud-init or MAAS and then apply DeepOps after the OS is present.
 
 2. Set up your provisioning machine.
 
3 changes: 2 additions & 1 deletion docs/k8s-cluster/roce-perf-k8s.md
@@ -71,7 +71,8 @@ add switch PFC, ECN configuration
 
 2. Install a supported operating system on all nodes.
 
-Install a supported operating system on all servers utilizing the [DGXie](/docs/pxe/dgxie-container.md) provisioning container, via a 3rd-party solution (i.e. [MAAS](https://maas.io/), [Foreman](https://www.theforeman.org/)), or server BMC/console.
+Install a supported operating system on all servers via a 3rd-party solution such as [MAAS](https://maas.io/) or [Foreman](https://www.theforeman.org/), via an existing site-standard automated installer, or through server BMC/console.
+For new Ubuntu 24.04 or DGX OS 7 deployments, prefer Ubuntu autoinstall/cloud-init or MAAS.
 
 > NOTE: During OS installation, it is ideal if the identical user/password is configured. Otherwise, follow step 4 below to create an identical user across all nodes in the cluster.
 
10 changes: 5 additions & 5 deletions docs/pxe/README.md
@@ -14,10 +14,10 @@ Most of the playbooks in DeepOps are agnostic to the OS install tooling, assumin
 For example, DeepOps can be used to deploy a [Slurm cluster](../slurm-cluster/) or a [Kubernetes cluster](../k8s-cluster) regardless of how the OS was installed.
 This makes it relatively easy to integrate with an existing datacenter environment.
 
-However, DeepOps does provide tooling for several PXE installation mechanisms which can be used if an existing tool isn't already deployed.
-These include:
+DeepOps does not try to replace a site provisioning system.
+For environments without an existing bare-metal provisioning workflow, DeepOps provides MAAS setup guidance:
 
 - [MAAS](./maas.md), an open-source bare-metal provisioning tool developed by [Canonical](https://canonical.com/)
-- [DGXIE](./dgxie-container.md), a containerized deployment tool developed specifically to deploy NVIDIA DGX OS
-- [DGXIE on Kubernetes](./dgxie-on-k8s.md)
-- A minimal [PXE container](./minimal-pxe-container.md) which wraps [Pixiecore](https://github.com/danderson/netboot/tree/master/pixiecore), an open source tool for network booting
+
+For new Ubuntu 24.04 or DGX OS 7 cluster deployments, prefer MAAS, an existing site provisioning system, or Ubuntu autoinstall/cloud-init.
+NVIDIA DGX OS 7 supports installing the DGX Software Stack on regular Ubuntu 24.04 for cluster deployments, which is a better fit for current automated installation tooling than the retired legacy DGX OS installer workflows.
62 changes: 0 additions & 62 deletions docs/pxe/dgxie-container.md

This file was deleted.
