diff --git a/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/_index.md b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/_index.md
new file mode 100644
index 0000000000..0f792de28f
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/_index.md
@@ -0,0 +1,66 @@
+---
+title: Train and Benchmark AI Workloads with DeepSpeed on Google Cloud C4A Axion VM
+
+draft: true
+cascade:
+    draft: true
+    
+description: Set up PyTorch and DeepSpeed on Google Cloud C4A Axion Arm VMs running SUSE Linux to train neural network models, benchmark AI workloads, and validate scalable CPU-based AI execution on Arm64 processors.
+
+minutes_to_complete: 30
+
+who_is_this_for: This is an introductory topic for DevOps engineers, ML engineers, and software developers who want to run AI training and benchmarking workloads using PyTorch and DeepSpeed on SUSE Linux Enterprise Server (SLES) Arm64, validate CPU-based neural network execution, and benchmark AI performance on Arm processors.
+
+learning_objectives:
+    - Install and configure PyTorch and DeepSpeed on Google Cloud C4A Axion processors for Arm64
+    - Create and execute neural network training workloads using PyTorch
+    - Benchmark CPU-based AI workloads on Arm64 processors
+    - Validate scalable AI execution and workload performance on GCP Axion Arm VMs
+
+prerequisites:
+  - A [Google Cloud Platform (GCP)](https://cloud.google.com/free) account with billing enabled
+  - Basic familiarity with Python and machine learning concepts
+
+author: Pareena Verma
+
+##### Tags
+skilllevels: Introductory
+subjects: ML
+cloud_service_providers:
+  - Google Cloud
+
+armips:
+  - Neoverse
+
+tools_software_languages:
+  - DeepSpeed
+  - PyTorch
+  - Python
+
+operatingsystems:
+  - Linux
+
+# ================================================================================
+#       FIXED, DO NOT MODIFY
+# ================================================================================
+
+further_reading:
+  - resource:
+      title: DeepSpeed official documentation
+      link: https://www.deepspeed.ai/
+      type: documentation
+
+  - resource:
+      title: DeepSpeed GitHub repository
+      link: https://github.com/microsoft/DeepSpeed
+      type: documentation
+
+  - resource:
+      title: PyTorch documentation
+      link: https://pytorch.org/docs/stable/index.html
+      type: documentation
+
+weight: 1
+layout: "learningpathall"
+learning_path_main_page: yes
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/_next-steps.md b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/_next-steps.md
new file mode 100644
index 0000000000..c3db0de5a2
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/_next-steps.md
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+#       FIXED, DO NOT MODIFY THIS FILE
+# ================================================================================
+weight: 21                  # Set to always be larger than the content in this path to be at the end of the navigation.
+title: "Next Steps"         # Always the same, html page title.
+layout: "learningpathall"   # All files under learning paths have this same wrapper for Hugo processing.
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/background.md b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/background.md
new file mode 100644
index 0000000000..94ea23b629
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/background.md
@@ -0,0 +1,43 @@
+---
+title: Learn about DeepSpeed and Google Axion C4A for AI training
+weight: 2
+
+layout: "learningpathall"
+---
+
+## Google Axion C4A Arm instances for AI and machine learning
+
+Google Axion C4A is a family of Arm-based virtual machines built on Google’s custom Axion CPU, which is based on Arm Neoverse V2 cores. Designed for high-performance and energy-efficient computing, these virtual machines offer strong performance for AI, machine learning, data analytics, and modern cloud-native workloads.
+
+The C4A series provides a cost-effective alternative to x86 virtual machines while leveraging the scalability and efficiency advantages of the Arm architecture in Google Cloud.
+
+For AI and machine learning workloads, Axion processors provide high multi-core CPU throughput, efficient tensor computation performance, improved performance-per-watt, and scalable CPU execution for training and inference workloads. These capabilities make Axion Arm-based systems suitable for neural network training, benchmarking, experiment validation, and scalable AI development pipelines.
+
+To learn more, see the Google blog [Introducing Google Axion Processors, our new Arm-based CPUs](https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu).
+
+## DeepSpeed for scalable AI training on Arm
+
+DeepSpeed is an open-source deep learning optimization framework developed by Microsoft to enable efficient and scalable training of large AI models. It is widely used for distributed deep learning, memory optimization, large language model (LLM) training, efficient inference execution, and high-performance AI workloads.
+
+DeepSpeed provides a unified optimization platform with capabilities such as:
+
+* ZeRO (Zero Redundancy Optimizer) memory optimization  
+* Distributed training acceleration  
+* Mixed precision training  
+* Pipeline and tensor parallelism  
+* Optimized inference execution  
+* Scalable AI workload management  
+
+Running DeepSpeed on Google Axion C4A Arm-based infrastructure supports CPU-based AI training and benchmarking workflows on multi-core Arm processors. Depending on the workload, this can improve performance-per-watt and reduce infrastructure costs for AI experimentation and model training.
+
+On SUSE Linux Enterprise Server Arm64 environments, some DeepSpeed native CPU communication extensions require newer GCC toolchains for compilation. For this reason, this Learning Path uses DeepSpeed compatibility-mode installation together with PyTorch CPU execution to provide stable AI workload validation and benchmarking on GCP Axion Arm64 processors.
+
+Common use cases include neural network training, AI benchmarking, scalable experimentation pipelines, distributed AI research environments, and CPU-based inference validation workflows.
+
+To learn more, see the [DeepSpeed documentation](https://www.deepspeed.ai/) and the [DeepSpeed GitHub repository](https://github.com/microsoft/DeepSpeed).
+
+## What you've learned and what's next
+
+You've now learned about Google Axion C4A Arm-based virtual machines and their performance advantages for AI and machine learning workloads. You were also introduced to core DeepSpeed capabilities including distributed training optimization, ZeRO memory optimization, scalable AI execution, and CPU-based AI benchmarking workflows.
+
+Next, you'll set up PyTorch and DeepSpeed on a GCP Axion Arm64 virtual machine, configure a Python AI/ML environment, and begin running AI training and benchmarking workloads on Arm processors.
diff --git a/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/images/gcp-pubip-ssh.png b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/images/gcp-pubip-ssh.png
new file mode 100644
index 0000000000..558745de3e
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/images/gcp-pubip-ssh.png differ
diff --git a/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/images/gcp-shell.png b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/images/gcp-shell.png
new file mode 100644
index 0000000000..7e2fc3d1b5
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/images/gcp-shell.png differ
diff --git a/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/images/gcp-vm.png b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/images/gcp-vm.png
new file mode 100644
index 0000000000..0d1072e20d
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/images/gcp-vm.png differ
diff --git a/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/install-deepspeed-arm.md b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/install-deepspeed-arm.md
new file mode 100644
index 0000000000..fa8836016a
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/install-deepspeed-arm.md
@@ -0,0 +1,307 @@
+---
+title: Set up PyTorch and DeepSpeed on GCP Axion (Arm)
+weight: 4
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Set up PyTorch and DeepSpeed on GCP Axion (Arm)
+
+This section guides you through setting up a Python AI/ML environment on a Google Cloud Axion Arm64 VM using SUSE Linux Enterprise Server.
+
+The setup validates:
+
+- PyTorch execution on Arm64
+- DeepSpeed installation in compatibility mode
+- CPU-only AI/ML runtime configuration
+- Arm64 AI development environment preparation
+
+## Learning Objectives
+
+- Verify Arm64 environment
+- Configure Python 3.11
+- Create Python virtual environment
+- Install PyTorch on Arm
+- Install DeepSpeed in compatibility mode
+- Validate AI/ML environment setup
+- Understand DeepSpeed limitations on SUSE Arm64
+
+
+## Verify Arm64 architecture
+
+Verify that the VM is running on Arm64 architecture.
+
+```bash
+uname -m
+```
+
+Expected output:
+
+```text
+aarch64
+```
+
+Check CPU details:
+
+```bash
+lscpu
+```
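+
+The same check can be scripted, which is useful if you want later scripts to fail early on the wrong architecture. The following is a minimal sketch that uses only the Python standard library. The helper name `is_arm64` and the set of accepted machine names are assumptions made for this example.
+
+```python
+import os
+import platform
+
+# Machine names commonly reported for 64-bit Arm (an assumption of this sketch).
+ARM64_NAMES = {"aarch64", "arm64"}
+
+
+def is_arm64(machine: str) -> bool:
+    """Return True if the machine string identifies a 64-bit Arm CPU."""
+    return machine.lower() in ARM64_NAMES
+
+
+if __name__ == "__main__":
+    machine = platform.machine()
+    print(f"Architecture: {machine}")
+    print(f"Arm64: {is_arm64(machine)}")
+    print(f"Logical CPUs: {os.cpu_count()}")
+```
+
+On a `c4a-standard-4` VM you would expect the architecture to be reported as `aarch64`, matching the `uname -m` output above.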
+
+
+## Install Python
+
+Deep learning frameworks such as PyTorch and DeepSpeed work more reliably with modern Python versions. Install Python 3.11 with `zypper`:
+
+```bash
+sudo zypper install -y python311 python311-pip python311-devel
+```
+
+## Why Python 3.11 is used
+
+Python 3.11 provides:
+
+- Better runtime performance
+- Improved package compatibility
+- Stable PyTorch support
+- Better support for AI/ML frameworks
+
+Using Python 3.11 avoids compatibility issues commonly seen with older Python releases.
+
+## Create Python virtual environment
+Create an isolated Python environment to prevent dependency conflicts with system packages.
+
+```bash
+python3.11 -m venv ~/deepspeed-env
+```
+
+Activate the environment:
+
+```bash
+source ~/deepspeed-env/bin/activate
+```
+
+Verify:
+
+```bash
+python --version
+```
+
+
+## Upgrade Python tools
+Upgrade Python package management tools.
+
+```bash
+pip install --upgrade pip setuptools wheel
+```
+
+## Why this step is important
+
+Updated packaging tools help:
+
+- Avoid installation failures
+- Improve wheel compatibility
+- Reduce dependency resolution issues
+- Improve Arm64 package installation reliability
+
+## Install Ninja
+
+Install Ninja using pip instead of zypper.
+
+```bash
+pip install ninja
+```
+
+Verify:
+
+```bash
+ninja --version
+```
+
+The output is similar to:
+```output
+1.13.0.git.kitware.jobserver-pipe-1
+```
+
+Ninja is a lightweight build system used by:
+
+- PyTorch
+- DeepSpeed
+- native extension compilation workflows
+
+Using pip avoids SUSE repository dependency issues sometimes observed on cloud Arm64 images.
+
+
+## Install CPU-only PyTorch
+Install CPU-only PyTorch packages:
+
+```bash
+pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+```
+
+## Why CPU-only PyTorch is used
+
+GCP Axion VMs are CPU-only systems and do not contain NVIDIA GPUs.
+
+The CPU-only build:
+
+- Reduces package size
+- Avoids unnecessary CUDA dependencies
+- Improves installation stability
+- Matches the Axion hardware architecture
+
+## Verify PyTorch installation
+
+```bash
+python -c "import torch; print(torch.__version__)"
+```
+
+The output is similar to:
+```output
+2.11.0+cpu
+```
+
+Check CUDA availability:
+
+```bash
+python -c "import torch; print(torch.cuda.is_available())"
+```
+
+The output is similar to:
+```output
+False
+```
+
+This is expected because GCP Axion VMs are CPU-only systems.
+
+
+## DeepSpeed limitation on SUSE Arm64
+
+DeepSpeed's native CPU communication extensions require GCC 9 or later, but default SUSE Arm64 images typically include GCC 7.x.
+
+During launcher initialization, DeepSpeed attempts to compile the `deepspeed_shm_comm` extension, which requires the newer compiler.
+
+Because of this limitation, install DeepSpeed in compatibility mode without native extension compilation.
+
+## Install DeepSpeed
+
+Before installing, export the following environment variables to disable native extension compilation:
+
+```bash
+export DS_BUILD_OPS=0
+export DS_BUILD_SHM_COMM=0
+export DS_BUILD_CPU_ADAM=0
+export DS_BUILD_AIO=0
+```
+
+## What these variables do
+
+| Variable | Purpose |
+|---|---|
+| DS_BUILD_OPS=0 | Disables native op compilation |
+| DS_BUILD_SHM_COMM=0 | Disables shared memory communication extension |
+| DS_BUILD_CPU_ADAM=0 | Disables CPU Adam optimizer compilation |
+| DS_BUILD_AIO=0 | Disables async I/O extensions |
+
+This prevents DeepSpeed from compiling unsupported native CPU extensions on SUSE Arm64.
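+
+If you drive the installation from a Python script instead of an interactive shell, the same settings can be passed to the `pip` subprocess. The following is a sketch of one way to do that. The helper name `deepspeed_build_env` is hypothetical, and the variable names are the ones listed in the table above.
+
+```python
+import os
+
+# Build flags from the table above; "0" disables the corresponding native build.
+DS_BUILD_FLAGS = {
+    "DS_BUILD_OPS": "0",
+    "DS_BUILD_SHM_COMM": "0",
+    "DS_BUILD_CPU_ADAM": "0",
+    "DS_BUILD_AIO": "0",
+}
+
+
+def deepspeed_build_env(base=None):
+    """Return a copy of the environment with the DeepSpeed build flags set."""
+    env = dict(os.environ if base is None else base)
+    env.update(DS_BUILD_FLAGS)
+    return env
+
+
+if __name__ == "__main__":
+    env = deepspeed_build_env()
+    for name in DS_BUILD_FLAGS:
+        print(f"{name}={env[name]}")
+    # To install with these settings (not executed in this sketch):
+    # subprocess.run([sys.executable, "-m", "pip", "install", "deepspeed"],
+    #                env=env, check=True)
+```
+
+Copying the environment instead of modifying `os.environ` keeps the flags scoped to the installation step.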
+
+## Run the DeepSpeed installation
+
+```bash
+DS_BUILD_OPS=0 pip install deepspeed
+```
+
+
+## Verify DeepSpeed installation
+
+```bash
+ds_report
+```
+
+The output is similar to:
+
+```output
+[NO] ....... [OKAY]
+```
+
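+`[NO]` in the installed column means that the native ops were not prebuilt. This is expected on this CPU-only Arm64 environment because native extension compilation was disabled.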
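+
+To record which package versions ended up in the virtual environment, you can read the installed package metadata. The following sketch uses `importlib.metadata` from the Python standard library, so it does not import DeepSpeed itself. The helper name `installed_version` is an assumption made for this example.
+
+```python
+from importlib import metadata
+
+
+def installed_version(package):
+    """Return the installed version string, or None if the package is absent."""
+    try:
+        return metadata.version(package)
+    except metadata.PackageNotFoundError:
+        return None
+
+
+if __name__ == "__main__":
+    for name in ("torch", "deepspeed", "ninja"):
+        version = installed_version(name)
+        print(f"{name}: {version if version else 'not installed'}")
+```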
+
+
+## Create project directory
+
+```bash
+mkdir -p ~/deepspeed-demo
+
+cd ~/deepspeed-demo
+```
+
+## Important note
+
+Do NOT run:
+
+```bash
+deepspeed train.py
+```
+
+on this VM because DeepSpeed attempts to compile native CPU communication extensions which require GCC 9 or later.
+
+
+## Troubleshooting
+
+### SUSE repository refresh issue
+
+You may encounter:
+
+```text
+Receive: script died unexpectedly
+```
+
+If this occurs:
+
+- Continue if Python 3.11 is already installed
+- Install Python packages using `pip`
+- Avoid dependency on SUSE development repositories
+
+
+## What you've learned
+
+You have learned how to:
+
+- Verify Arm64 environment
+- Configure Python 3.11
+- Create isolated AI/ML environments
+- Install PyTorch on Arm64
+- Install DeepSpeed in compatibility mode
+- Handle GCC limitations on SUSE Arm64
+- Prepare AI training environments on GCP Axion
+
+
+## Next
+
+You will:
+
+- Build AI training workloads
+- Run neural network training
+- Benchmark Arm64 AI workloads
+- Validate CPU training performance
diff --git a/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/instance.md b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/instance.md
new file mode 100644
index 0000000000..4133e8f6da
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/instance.md
@@ -0,0 +1,43 @@
+---
+title: Create a Google Axion C4A virtual machine for DeepSpeed
+weight: 3
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Set up the virtual machine
+
+In this section, you'll create a Google Axion C4A Arm-based virtual machine (VM) on Google Cloud Platform (GCP). You'll use the `c4a-standard-4` machine type, which provides 4 vCPUs and 16 GB of memory. This VM will host your PyTorch and DeepSpeed training and benchmarking environment.
+
+{{% notice Note %}}For help with GCP setup, see the Learning Path [Getting started with Google Cloud Platform](/learning-paths/servers-and-cloud-computing/csp/google/).{{% /notice %}}
+
+## Configure the C4A virtual machine in Google Cloud Console
+
+To create a virtual machine based on the C4A instance type in the console:
+
+1. Navigate to the [Google Cloud Console](https://console.cloud.google.com/).
+2. Go to **Compute Engine** > **VM Instances** and select **Create Instance**.
+3. Under **Machine configuration**, populate fields such as **Instance name**, **Region**, and **Zone**.
+4. Set **Series** to `C4A`, then select `c4a-standard-4` for **Machine type**.
+
+![Screenshot of the Google Cloud Console showing the Machine configuration section. The Series dropdown is set to C4A and the machine type c4a-standard-4 is selected#center](images/gcp-vm.png "Configuring machine type to C4A in Google Cloud Console")
+
+5. Under **OS and storage**, select **Change** and then choose an Arm64-based operating system image. For this Learning Path, select **SUSE Linux Enterprise Server**. 
+6. For the license type, choose **Pay as you go**. 
+7. Increase **Size (GB)** from **10** to **100** to allocate sufficient disk space, and then select **Choose**.
+8. Select **Create** to launch the virtual machine.
+
+After the instance starts, select **SSH** next to the VM in the instance list to open a browser-based terminal session.
+
+![Google Cloud Console VM instances page displaying running instance with green checkmark and SSH button in the Connect column#center](images/gcp-pubip-ssh.png "Connecting to a running C4A VM using SSH")
+
+A new browser window opens with a terminal connected to your VM.
+
+![Browser-based SSH terminal connected to the Google Axion C4A VM. The shell prompt confirms that the instance is running and ready for the next step, where you'll install DeepSpeed and its dependencies.#center](images/gcp-shell.png "Terminal session connected to the VM")
+
+## What you've accomplished and what's next
+
+You've now provisioned a Google Axion C4A Arm VM and connected to it using SSH.
+
+Next, you'll install DeepSpeed and the required dependencies on your VM.
diff --git a/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/train-benchmark-deepspeed-arm.md b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/train-benchmark-deepspeed-arm.md
new file mode 100644
index 0000000000..26461a9aba
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/train-benchmark-deepspeed-arm.md
@@ -0,0 +1,411 @@
+---
+title: Train and Benchmark AI Workloads on GCP Axion (Arm)
+weight: 5
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Train and Benchmark AI Workloads on GCP Axion (Arm)
+
+This section demonstrates neural network training and benchmarking on GCP Axion Arm64 processors using PyTorch.
+
+## Learning Objectives
+
+- Create AI training workloads
+- Train neural network models
+- Benchmark CPU workloads
+- Measure Arm64 AI performance
+- Validate large model execution
+
+
+## Activate environment
+Activate the Python virtual environment that you created in the previous section.
+
+```bash
+source ~/deepspeed-env/bin/activate
+```
+
+Go to the project directory:
+
+```bash
+cd ~/deepspeed-demo
+```
+
+## Baseline AI Training Workload
+
+This section creates and executes a lightweight neural network training workload to validate the AI/ML environment on GCP Axion Arm64 processors.
+
+### Create baseline training script
+
+Create the baseline training script:
+
+```bash
+cat > train.py << 'EOF'
+import torch
+import torch.nn as nn
+import torch.optim as optim
+import time
+
+class SimpleModel(nn.Module):
+    def __init__(self):
+        super().__init__()
+
+        self.net = nn.Sequential(
+            nn.Linear(128, 256),
+            nn.ReLU(),
+            nn.Linear(256, 64),
+            nn.ReLU(),
+            nn.Linear(64, 1)
+        )
+
+    def forward(self, x):
+        return self.net(x)
+
+model = SimpleModel()
+
+optimizer = optim.Adam(model.parameters(), lr=0.001)
+
+data = torch.randn(5000, 128)
+target = torch.randn(5000, 1)
+
+start = time.time()
+
+for epoch in range(5):
+
+    total_loss = 0
+
+    for i in range(0, len(data), 32):
+
+        x = data[i:i+32]
+        y = target[i:i+32]
+
+        output = model(x)
+
+        loss = ((output - y) ** 2).mean()
+
+        optimizer.zero_grad()
+
+        loss.backward()
+
+        optimizer.step()
+
+        total_loss += loss.item()
+
+    print(f"Epoch {epoch+1}, Loss: {total_loss}")
+
+end = time.time()
+
+print("Total Training Time:", end - start)
+EOF
+```
+
+### What this script does
+
+The script performs the following tasks:
+
+- Creates a multi-layer neural network using PyTorch
+- Generates synthetic training data
+- Executes forward and backward propagation
+- Optimizes the model using the Adam optimizer
+- Measures total training execution time
+
+The model architecture contains:
+
+- Input layer: 128 features
+- Hidden layers: 256 and 64 neurons
+- Output layer: 1 neuron
+
+
+### Execute baseline training
+
+```bash
+python train.py
+```
+
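+The output is similar to the following. Exact loss values and timings vary between runs because the data and the initial weights are random: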
+
+```output
+Epoch 1, Loss: 155.41862654685974
+Epoch 2, Loss: 146.19861325621605
+Epoch 3, Loss: 130.47488084435463
+Epoch 4, Loss: 100.75305489450693
+Epoch 5, Loss: 65.7514722738415
+Total Training Time: 0.7545099258422852
+```
+
+### Analyze baseline results
+
+Observe the following:
+
+- The summed loss decreases from epoch to epoch
+- Training completes in less than one second
+
+Because the targets are random noise, the decreasing loss shows that the optimizer is fitting the synthetic data rather than learning a meaningful pattern. That is sufficient for this validation, because it confirms that:
+
+- Gradient updates are working correctly
+- The CPU computation pipeline is stable
+- The PyTorch runtime is functioning properly on Arm64
+
+### Benchmark baseline workload
+Measure real execution time:
+
+```bash
+time python train.py | tee pytorch_baseline_result.txt
+```
+
+Example output:
+
+```output
+Epoch 1, Loss: 160.0170536339283
+Epoch 2, Loss: 151.6725998222828
+Epoch 3, Loss: 136.18832343816757
+Epoch 4, Loss: 108.03106728196144
+Epoch 5, Loss: 73.08194716647267
+Total Training Time: 0.7314252853393555
+
+real    0m2.172s
+user    0m3.700s
+sys     0m0.137s
+```
+
+The benchmark output provides:
+
+| Metric | Description |
+|---|---|
+| real | Total wall-clock execution time |
+| user | CPU execution time spent in user space |
+| sys | CPU time spent in kernel operations |
+
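+In this example:
+
+- `real` (about 2.2 seconds) is longer than the reported training time (about 0.7 seconds) because it also covers process startup, including the PyTorch import
+- `user` is greater than `real`, which indicates that PyTorch ran work on more than one CPU thread
+- `sys` is small, so little time was spent in the kernel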
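+
+A quick way to summarize these three numbers is the ratio `(user + sys) / real`, which approximates how many CPUs were busy on average. The following sketch parses values in the format printed by the bash `time` keyword. The helper names are hypothetical, and the input values are copied from the example output above.
+
+```python
+import re
+
+
+def parse_time_value(text):
+    """Convert a bash `time` value such as '0m2.172s' to seconds."""
+    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
+    if match is None:
+        raise ValueError(f"unrecognized time value: {text!r}")
+    minutes, seconds = match.groups()
+    return int(minutes) * 60 + float(seconds)
+
+
+def average_busy_cpus(real, user, sys_time):
+    """Return (user + sys) / real, the average number of busy CPUs."""
+    cpu_seconds = parse_time_value(user) + parse_time_value(sys_time)
+    return cpu_seconds / parse_time_value(real)
+
+
+if __name__ == "__main__":
+    # Values copied from the example output above.
+    busy = average_busy_cpus("0m2.172s", "0m3.700s", "0m0.137s")
+    print(f"Average busy CPUs: {busy:.2f}")
+```
+
+For the example values, the ratio is about 1.77. A short run like this one cannot keep all four vCPUs busy, because a large share of the wall-clock time is single-threaded startup.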
+
+## Large Scale AI Benchmark
+
+This section increases:
+
+- dataset size
+- model complexity
+- CPU workload intensity
+
+This helps evaluate scalable AI training performance on Axion Arm processors.
+
+
+### Create large benchmark workload
+
+```bash
+cat > train_large.py << 'EOF'
+import torch
+import torch.nn as nn
+import torch.optim as optim
+import time
+import os
+
+torch.set_num_threads(os.cpu_count())
+
+class LargeModel(nn.Module):
+    def __init__(self):
+        super().__init__()
+
+        self.net = nn.Sequential(
+            nn.Linear(512, 1024),
+            nn.ReLU(),
+            nn.Linear(1024, 512),
+            nn.ReLU(),
+            nn.Linear(512, 128),
+            nn.ReLU(),
+            nn.Linear(128, 1)
+        )
+
+    def forward(self, x):
+        return self.net(x)
+
+model = LargeModel()
+
+optimizer = optim.Adam(model.parameters(), lr=0.001)
+
+data = torch.randn(20000, 512)
+target = torch.randn(20000, 1)
+
+start = time.time()
+
+for epoch in range(5):
+
+    total_loss = 0
+
+    for i in range(0, len(data), 64):
+
+        x = data[i:i+64]
+        y = target[i:i+64]
+
+        output = model(x)
+
+        loss = ((output - y) ** 2).mean()
+
+        optimizer.zero_grad()
+
+        loss.backward()
+
+        optimizer.step()
+
+        total_loss += loss.item()
+
+    print(f"Epoch {epoch+1}, Loss: {total_loss}")
+
+end = time.time()
+
+print("Total Training Time:", end - start)
+EOF
+```
+
+### What this workload changes
+
+Compared to the baseline workload:
+
+| Component | Baseline | Large Benchmark |
+|---|---|---|
+| Features | 128 | 512 |
+| Dataset Size | 5,000 | 20,000 |
+| Batch Size | 32 | 64 |
+| Model Complexity | Smaller | Larger |
+
+The benchmark stresses:
+
+- CPU compute capability
+- Memory bandwidth
+- Tensor operation throughput
+- Multi-threaded execution
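+
+To put "Smaller" and "Larger" into numbers, you can count the trainable parameters of each model from its layer sizes. The following sketch uses plain Python and does not require PyTorch. The layer sizes, dataset sizes, and batch sizes are taken from `train.py` and `train_large.py`, and the helper names are assumptions made for this example.
+
+```python
+import math
+
+
+def linear_params(sizes):
+    """Count weights and biases for a stack of fully connected layers."""
+    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))
+
+
+def batches_per_epoch(samples, batch_size):
+    """Number of mini-batches needed to cover the dataset once."""
+    return math.ceil(samples / batch_size)
+
+
+if __name__ == "__main__":
+    # (layer sizes, dataset size, batch size) for each workload.
+    workloads = {
+        "baseline": ([128, 256, 64, 1], 5000, 32),
+        "large": ([512, 1024, 512, 128, 1], 20000, 64),
+    }
+    for name, (sizes, samples, batch) in workloads.items():
+        print(f"{name}: {linear_params(sizes)} parameters, "
+              f"{batches_per_epoch(samples, batch)} batches per epoch")
+```
+
+With these layer sizes, the baseline model has 49,537 parameters and the large model has 1,115,905, which is about 22 times more. The large workload also runs 313 batches per epoch compared with 157 for the baseline.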
+
+
+### Run large benchmark
+
+```bash
+time python train_large.py | tee pytorch_large_result.txt
+```
+
+The output is similar to:
+
+```text
+Epoch 1, Loss: 319.07712411880493
+Epoch 2, Loss: 308.4675619006157
+Epoch 3, Loss: 273.5877128839493
+Epoch 4, Loss: 227.81050024926662
+Epoch 5, Loss: 194.74351280927658
+Total Training Time: 4.878139972686768
+
+real    0m6.346s
+user    0m19.630s
+sys     0m0.251s
+```
+
+### Analyze large workload results
+
+The large benchmark shows:
+
+- Stable execution under higher CPU load
+- Longer training time than the baseline because of the larger model and dataset
+- A decreasing loss in every epoch
+
+Key observations:
+
+- `user` time (about 19.6 seconds) is roughly three times `real` time (about 6.3 seconds), so on average about three of the four vCPUs were busy
+- `sys` time stays low, so most of the time is spent in PyTorch computation rather than in the kernel
+
+## Monitor CPU utilization
+
+Open another terminal.
+
+Run:
+
+```bash
+top
+```
+
+In the first terminal:
+
+```bash
+python train_large.py
+```
+
+Observe:
+
+- CPU usage
+- Memory utilization
+- Python process behavior
+
+
+## Verify generated files
+
+```bash
+ls -1
+```
+
+The output is similar to:
+
+```output
+pytorch_baseline_result.txt
+pytorch_large_result.txt
+train.py
+train_large.py
+```
+
+These files contain:
+
+| File | Purpose |
+|---|---|
+| train.py | Baseline training workload |
+| train_large.py | Large benchmark workload |
+| pytorch_baseline_result.txt | Baseline benchmark results |
+| pytorch_large_result.txt | Large benchmark results |
+
+## Benchmark observations
+
+| Workload | Approximate training time |
+|---|---|
+| Baseline Model | ~0.8 seconds |
+| Large Model | ~5.4 seconds |
+
+## Benchmark Summary
+
+| Workload | Training Time | Observation |
+|---|---|---|
+| Baseline Model | ~0.7–0.8 seconds | Fast lightweight execution |
+| Large Benchmark | ~4.8–5.4 seconds | Higher CPU utilization and larger workload handling |
+
+
+## Result Analysis
+
+The benchmark validates that:
+
+- GCP Axion Arm64 processors can efficiently execute AI workloads
+- PyTorch runs successfully on Arm64 architecture
+- CPU-only AI training is stable on SUSE Arm64
+- Larger workloads scale predictably with increased compute demand
+
+The benchmark also demonstrates:
+
+- Multi-layer neural network execution
+- Tensor computation stability
+- Efficient CPU utilization on Arm64 processors
+
+
+## What you've learned
+
+You have learned how to:
+
+- Create AI training workloads
+- Train neural network models on Arm64
+- Benchmark CPU-based AI workloads
+- Measure training execution performance
+- Validate scalable AI execution on GCP Axion
+- Analyze workload scaling behavior