diff --git a/source/_static/images/examples/cuml-ray-hpo/active-nvdashboard.png b/source/_static/images/examples/cuml-ray-hpo/active-nvdashboard.png new file mode 100644 index 00000000..44587001 Binary files /dev/null and b/source/_static/images/examples/cuml-ray-hpo/active-nvdashboard.png differ diff --git a/source/_static/images/examples/cuml-ray-hpo/final_trial_status.png b/source/_static/images/examples/cuml-ray-hpo/final_trial_status.png new file mode 100644 index 00000000..faec6a06 Binary files /dev/null and b/source/_static/images/examples/cuml-ray-hpo/final_trial_status.png differ diff --git a/source/_static/images/examples/cuml-ray-hpo/nvdashboard.png b/source/_static/images/examples/cuml-ray-hpo/nvdashboard.png new file mode 100644 index 00000000..1dfaecfc Binary files /dev/null and b/source/_static/images/examples/cuml-ray-hpo/nvdashboard.png differ diff --git a/source/_static/images/examples/cuml-ray-hpo/ray-dashboard.png b/source/_static/images/examples/cuml-ray-hpo/ray-dashboard.png new file mode 100644 index 00000000..ecff87df Binary files /dev/null and b/source/_static/images/examples/cuml-ray-hpo/ray-dashboard.png differ diff --git a/source/examples/cuml-ray-hpo/notebook.ipynb b/source/examples/cuml-ray-hpo/notebook.ipynb new file mode 100644 index 00000000..147aae5e --- /dev/null +++ b/source/examples/cuml-ray-hpo/notebook.ipynb @@ -0,0 +1,560 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "6d73b12e-ebaf-42a1-8019-7e9c8f948f7a", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + "library/cuml", + "library/ray", + "workflow/hpo", + "workflow/randomforest" + ] + }, + "source": [ + "# HPO for Random Forest with Ray Tune and cuML" + ] + }, + { + "cell_type": "markdown", + "id": "a930ecf7", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + "cuml", + "ray", + "hpo" + ] + }, + "source": [ + "This notebook demonstrates how to perform hyperparameter optimization 
(HPO) for a Random Forest classifier using Ray Tune and cuML. We'll use Ray Tune to efficiently search through hyperparameter combinations while leveraging cuML's GPU-accelerated Random Forest implementation for faster training.\n", + "\n", + "## Problem Overview\n", + "\n", + "We're solving a binary classification problem using the airline dataset, where we predict flight delays. The goal is to find the optimal hyperparameters (number of estimators, max depth, and max features) that maximize the model's accuracy. Ray Tune will orchestrate multiple training trials in parallel, each testing different hyperparameter combinations, while cuML provides GPU acceleration for each individual model training." + ] + }, + { + "cell_type": "markdown", + "id": "bee88f22", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "### Setup Instructions\n", + "\n", + "#### Brev\n", + "\n", + "```{docref} /cloud/nvidia/brev\n", + "For the purpose of this example, follow Option 1 (Setting up your Brev GPU Environment) in the Brev Instance Setup section:\n", + "- Create a GPU environment with 4 L4 GPUs\n", + "- Make sure to include Jupyter in your setup\n", + "- Wait until the \"Open Notebook\" button is flashing\n", + "- Open the Notebook and navigate to a Jupyter terminal\n", + "```\n", + "\n", + "#### Environment Setup\n", + "\n", + "`````{tab-set}\n", + "\n", + "````{tab-item} uv\n", + ":sync: uv\n", + "\n", + "1. Check Your CUDA Version in the Jupyter terminal\n", + "\n", + "Before installing dependencies, verify your CUDA version (shown in the top right corner of the output):\n", + "\n", + "```bash\n", + "nvidia-smi\n", + "```\n", + "\n", + "2. 
Create a file named `pyproject.toml` and copy the content below\n", + "\n", + "Based on the CUDA version you have, modify the `cuML` package:\n", + "\n", + "- **CUDA 12.x**: Use `cuml-cu12==26.2.*`\n", + "- **CUDA 13.x**: Change to `cuml-cu13==26.2.*`\n", + "\n", + "\n", + "The `pyproject.toml` file should look like this:\n", + "\n", + "```toml\n", + "[project]\n", + "name = \"ray-cuml\"\n", + "version = \"0.1.0\"\n", + "requires-python = \"==3.13.*\"\n", + "dependencies = [\n", + " \"ray[default]==2.53.0\",\n", + " \"ray[data]==2.53.0\",\n", + " \"ray[train]==2.53.0\",\n", + " \"ray[tune]==2.53.0\",\n", + " \"cuml-cu12==26.2.*\", # Change cu12 to cu13 if you have CUDA 13.x\n", + " \"jupyterlab-nvdashboard\",\n", + " \"ipykernel\",\n", + " \"ipywidgets\",\n", + "]\n", + "```\n", + "\n", + "3. Install Dependencies\n", + "\n", + "```bash\n", + "uv sync\n", + "```\n", + "\n", + "#### Enable Jupyter nvdashboard\n", + "\n", + "We can use the `jupyterlab-nvdashboard` extension to monitor GPU usage in Jupyter.\n", + "\n", + "To enable the `nvdashboard` Jupyter extension, installed as part of the setup:\n", + "\n", + "1. Restart Jupyter: `sudo systemctl restart jupyter.service`\n", + "2. Exit and reopen the notebook or refresh your browser\n", + "\n", + "````\n", + "\n", + "\n", + "````{tab-item} conda\n", + ":sync: conda\n", + "\n", + "When installing libraries with conda, each individual CUDA library can be installed as a conda package, so we don't need to ensure any of the CUDA libraries already exist in `/usr/local/cuda`.\n", + "\n", + "1. Install JupyterLab nvdashboard Extension\n", + "\n", + "**Important**: Even though you're using conda for this setup, the JupyterLab nvdashboard extension must be installed using `uv` (which is already available in the system). This is because JupyterLab extensions need to be installed where the JupyterLab server runs, not where individual kernels run. 
In the current setup, the JupyterLab server runs from `/home/ubuntu/.venv/` (system uv environment), so we need to install the extension using `uv`:\n", + "\n", + "```bash\n", + "uv pip install jupyterlab_nvdashboard\n", + "\n", + "sudo systemctl restart jupyter.service\n", + "```\n", + "\n", + "Exit and reopen the notebook, and go back to a Jupyter terminal. \n", + "\n", + "2. Install Miniforge\n", + "\n", + "If you prefer to use `conda`, you need to install it first:\n", + "\n", + "```bash\n", + "curl -L -O \"https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh\"\n", + "\n", + "bash Miniforge3-$(uname)-$(uname -m).sh # Follow the prompts and choose yes to update your shell profile to automatically initialize conda\n", + "```\n", + "\n", + "```{note}\n", + "You'll need to source your `.bashrc` to make `conda` available in your current shell:\n", + "```\n", + "\n", + "```bash\n", + "source ~/.bashrc\n", + "```\n", + "\n", + "3. Check Your CUDA Version\n", + "\n", + "Check the CUDA version available on your system:\n", + "\n", + "```bash\n", + "nvidia-smi\n", + "```\n", + "\n", + "4. Create Environment File\n", + "\n", + "Create a file named `env.yaml` and copy the content below. Modify the `cuda-version` to match your CUDA version (e.g., `12.8` or `13.0`):\n", + "\n", + "```yaml\n", + "name: ray-cuml\n", + "channels:\n", + " - rapidsai\n", + " - conda-forge\n", + "dependencies:\n", + " - python=3.13\n", + " - \"ray-default=2.53.0\"\n", + " - \"ray-data=2.53.0\"\n", + " - \"ray-train=2.53.0\"\n", + " - \"ray-tune=2.53.0\"\n", + " - cuml=26.02\n", + " - \"cuda-version=12.8\" # Change to match your CUDA version (e.g., 12.8 or 13.0)\n", + " - ipykernel\n", + " - ipywidgets\n", + "```\n", + "\n", + "5. 
Create and Activate Conda Environment\n", + "\n", + "Create a new conda environment using the `env.yaml` file:\n", + "\n", + "```bash\n", + "conda env create -f env.yaml\n", + "\n", + "conda activate ray-cuml\n", + "```\n", + "\n", + "6. Install Jupyter Kernel\n", + "\n", + "Install the Jupyter kernel for this environment:\n", + "\n", + "```bash\n", + "python -m ipykernel install --user --name ray-cuml --display-name \"Python (ray-cuml)\" --env PATH \"$CONDA_PREFIX/bin:$PATH\"\n", + "```\n", + "\n", + "After running this, refresh your browser, open a new notebook and select the \"Python (ray-cuml)\" kernel.\n", + "\n", + "````\n", + "\n", + "`````" ] }, { "cell_type": "markdown", "id": "0bee8292", "metadata": {}, "source": [ "## Getting Started\n", + "\n", + "Download this notebook and the `get_data.py` script from the side panel and upload them to Jupyter, then run through the notebook. \n", + "\n", + "You should now see a button on the left panel that looks like a GPU, which will give you several dashboards to choose from. For the sake of this example, we will look at GPU memory and GPU Utilization.\n", + "\n", + "![GPU Dashboard Button](../../_static/images/examples/cuml-ray-hpo/nvdashboard.png)\n" ] }, { "cell_type": "markdown", "id": "28930e22", "metadata": {}, "source": [ "### Data Preparation\n", + "\n", + "Make sure the `get_data.py` script is in the same directory as your current Jupyter working directory. We will use this script to get the airline dataset.\n", + "\n", + "The script supports both a small dataset (for quick testing) and a full dataset (20M rows). By default, it downloads the small dataset. Use the `--full-dataset` flag for the complete dataset. " ] }, { "cell_type": "code", "execution_count": null, "id": "497cff32", "metadata": {}, "outputs": [], "source": [ "! 
python get_data.py --full-dataset ## for a smaller dataset remove --full-dataset" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "04ef550b", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import ray\n", + "from cuml.ensemble import RandomForestClassifier\n", + "from cuml.metrics import accuracy_score\n", + "from ray import tune\n", + "from ray.tune import RunConfig, TuneConfig\n", + "from sklearn.model_selection import train_test_split" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fa20b00a", + "metadata": {}, + "outputs": [], + "source": [ + "def train_rf(config, data_dict):\n", + " \"\"\"\n", + " Training function for Ray Tune.\n", + "\n", + " Args:\n", + " config: Dictionary of hyperparameters from Ray Tune\n", + " data_dict: Dictionary containing training and test data (NumPy arrays)\n", + " \"\"\"\n", + " # Extract data\n", + " X_train = data_dict[\"X_train\"]\n", + " X_test = data_dict[\"X_test\"]\n", + " y_train = data_dict[\"y_train\"]\n", + " y_test = data_dict[\"y_test\"]\n", + "\n", + " # Initialize cuML Random Forest with hyperparameters from config\n", + " rf = RandomForestClassifier(\n", + " n_estimators=config[\"n_estimators\"],\n", + " max_depth=config[\"max_depth\"],\n", + " max_features=config[\"max_features\"],\n", + " random_state=42,\n", + " )\n", + "\n", + " # Train the model\n", + " rf.fit(X_train, y_train)\n", + "\n", + " # Evaluate on test set\n", + " predictions = rf.predict(X_test)\n", + "\n", + " # Calculate accuracy using cuML's metric function\n", + " score = accuracy_score(y_test, predictions)\n", + "\n", + " # Report metrics back to Ray Tune\n", + " return {\"accuracy\": score}" + ] + }, + { + "cell_type": "markdown", + "id": "b7fce0d3", + "metadata": {}, + "source": [ + "## Ray Tune Hyperparameter Search\n", + "\n", + "Now we'll set up Ray Tune to search for optimal hyperparameters. 
Ray Tune will run multiple trials in parallel, each testing different combinations of hyperparameters. Each trial will train a cuML Random Forest model on a GPU and evaluate its performance.\n", + "\n", + "**Important**: Modify the following according to your setup:\n", + "- `ray.init()` parameters: Adjust `num_cpus` and `num_gpus` based on your available resources if you are not using the Brev instance indicated. \n", + "- `storage_path` in `RunConfig`: Set a valid local path to save Ray Tune results\n", + "- `resources` in `tune.with_resources()`: Configure CPU and GPU allocation per trial\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2a53ea8d", + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize Ray with resource constraints\n", + "# Note: If you see a FutureWarning about RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO, that's okay -\n", + "# it's just informing you about future Ray behavior changes and doesn't affect functionality.\n", + "ray.init(num_cpus=8, num_gpus=4)\n", + "\n", + "# use airlines_small.parquet if you downloaded the small dataset\n", + "df = pd.read_parquet(\"data/airlines.parquet\")\n", + "\n", + "# Define the target label\n", + "label = \"ArrDelayBinary\"\n", + "\n", + "# Prepare features and target\n", + "X = df.drop(columns=[label]) # All columns except the target\n", + "y = df[label] # Just the target column\n", + "\n", + "\n", + "# Split into train and test sets\n", + "X_train, X_test, y_train, y_test = train_test_split(\n", + " X, y, test_size=0.2, random_state=42\n", + ")\n", + "\n", + "\n", + "# Store data in a dictionary to pass to training function\n", + "data_dict = {\"X_train\": X_train, \"X_test\": X_test, \"y_train\": y_train, \"y_test\": y_test}" + ] + }, + { + "cell_type": "markdown", + "id": "2c3cf300", + "metadata": {}, + "source": [ + "**Access Ray Dashboard**: The dashboard is available at `http://127.0.0.1:8265` on the Brev instance. 
To access it from your local machine, run the port-forward command below in your local terminal:\n", + " \n", + "If you haven't already, make sure to run `brev login` before executing it.\n", + " \n", + "```bash\n", + "brev port-forward -p 8265:8265\n", + "```" ] }, { "cell_type": "markdown", "id": "9d1cf88d", "metadata": {}, "source": [ "```{note}\n", + "Before running the code below, make sure to modify the `storage_path` in the `RunConfig` to your desired location where Ray Tune results will be saved.\n", + "```" ] }, { "cell_type": "code", "execution_count": null, "id": "b656bfd8", "metadata": {}, "outputs": [], "source": [ "import os\n", + "\n", + "# Define hyperparameter search space\n", + "search_space = {\n", + " \"n_estimators\": tune.grid_search([50, 100]),\n", + " \"max_depth\": tune.grid_search([20, 40]),\n", + " \"max_features\": tune.grid_search([0.5, 1.0]),\n", + "}\n", + "\n", + "# Use the default search algorithm; grid_search exhaustively tries every combination\n", + "tune_config = TuneConfig(\n", + " metric=\"accuracy\",\n", + " mode=\"max\",\n", + ")\n", + "\n", + "run_config = RunConfig(\n", + " name=\"rf_hyperparameter_tuning_real_data\",\n", + " storage_path=os.path.abspath(\"output/ray_results\"),\n", + ")\n", + "\n", + "# Create a trainable with resources\n", + "trainable = tune.with_resources(\n", + " tune.with_parameters(train_rf, data_dict=data_dict),\n", + " resources={\"cpu\": 2, \"gpu\": 1}, # Each trial uses 1 GPU and 2 CPUs\n", + ")\n", + "\n", + "# Run the hyperparameter tuning\n", + "tuner = tune.Tuner(\n", + " trainable,\n", + " param_space=search_space,\n", + " tune_config=tune_config,\n", + " run_config=run_config,\n", + ")\n", + "\n", + "results = tuner.fit()\n", + "\n", + "# Get the best result\n", + "best_result = results.get_best_result(metric=\"accuracy\", mode=\"max\")" ] }, { "cell_type": "markdown", "id": "ae712729", "metadata": {}, "source": [ "#### Dashboard action \n", + "\n", + "While the 
hyperparameter tuning is running, you should see activity on the nvdashboard in the notebook:\n", + "\n", + "![Active nvdashboard](../../_static/images/examples/cuml-ray-hpo/active-nvdashboard.png)\n", + "\n", + "and if you check the Ray dashboard, on the cluster tab you'll see:\n", + "\n", + "![Ray Dashboard](../../_static/images/examples/cuml-ray-hpo/ray-dashboard.png)\n" ] }, { "cell_type": "markdown", "id": "a46c4db4", "metadata": {}, "source": [ "When it completes, you will notice that all trial statuses are marked as `TERMINATED`; for the example above, the whole HPO took ~13 minutes.\n", + "\n", + "![Final Trial Status](../../_static/images/examples/cuml-ray-hpo/final_trial_status.png)\n" ] }, { "cell_type": "markdown", "id": "7a071939", "metadata": {}, "source": [ "````{note}\n", + "When running this notebook with a Conda environment, you may see messages like the following appear in your output while Ray hyperparameter trials are running:\n", + " \n", + "```\n", + "(raylet) I0000 00:00:1770938640.198717 34590 chttp2_transport.cc:1182] ipv4:10.128.0.35:33125: Got goaway [2]\n", + "err=UNAVAILABLE:GOAWAY received; Error code: 2; Debug Text: Cancelling all calls {grpc_status:14, http2_error:2,\n", + "created_time:\"2026-02-12T23:24:00.198711281+00:00\"}\n", + "```\n", + " \n", + "These types of messages can safely be ignored—they do not affect the end result of the notebook or the hyperparameter tuning process.\n", + "````" ] }, { "cell_type": "code", "execution_count": null, "id": "7137255d", "metadata": {}, "outputs": [], "source": [ "# Display results\n", + "\n", + "print(\"Best hyperparameters found:\")\n", + "print(f\" n_estimators: {best_result.config['n_estimators']}\")\n", + "print(f\" max_depth: {best_result.config['max_depth']}\")\n", + "print(f\" max_features: {best_result.config['max_features']}\")\n", + "print(f\"Best test accuracy: {best_result.metrics['accuracy']:.4f}\")" ] }, { 
"cell_type": "markdown", + "id": "9447538c", + "metadata": {}, + "source": [ + "```text\n", + "Best hyperparameters found:\n", + " n_estimators: 100\n", + " max_depth: 40\n", + " max_features: 0.5\n", + "Best test accuracy: 0.8855\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "ba4c4997", + "metadata": {}, + "source": [ + "### Clean up Ray results directory" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12da1849", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import shutil\n", + "\n", + "ray_results_path = \"output/ray_results\"\n", + "if os.path.exists(ray_results_path):\n", + " print(f\"Cleaning Ray results directory: {ray_results_path}\")\n", + " shutil.rmtree(ray_results_path)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "507ad5e2", + "metadata": {}, + "outputs": [], + "source": [ + "# Shutdown the Ray cluster\n", + "ray.shutdown()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/source/examples/cuml-ray-hpo/setup/env.yaml b/source/examples/cuml-ray-hpo/setup/env.yaml new file mode 100644 index 00000000..c5fc04ea --- /dev/null +++ b/source/examples/cuml-ray-hpo/setup/env.yaml @@ -0,0 +1,14 @@ +name: ray-cuml +channels: + - rapidsai + - conda-forge +dependencies: + - python=3.13 + - "ray-default=2.53.0" + - "ray-data=2.53.0" + - "ray-train=2.53.0" + - "ray-tune=2.53.0" + - cuml=26.02 + - "cuda-version=13.0" + - ipykernel + - ipywidgets diff --git a/source/examples/cuml-ray-hpo/setup/get_data.py b/source/examples/cuml-ray-hpo/setup/get_data.py new file mode 100644 index 
00000000..bbab50e1 --- /dev/null +++ b/source/examples/cuml-ray-hpo/setup/get_data.py @@ -0,0 +1,71 @@ +import argparse +import os +from urllib.request import urlretrieve + +# If script is in setup/, use parent directory; otherwise use script directory or cwd +_script_dir = os.path.dirname(os.path.abspath(__file__)) +if os.path.basename(_script_dir) == "setup": + # Script is in setup/ directory, use parent directory + _data_dir = os.path.join(os.path.dirname(_script_dir), "data") +else: + # Script is not in expected location, use current working directory + _data_dir = os.path.join(os.getcwd(), "data") + + +def prepare_dataset(use_full_dataset=False): + """ + Download the airline dataset. + + Parameters + ---------- + use_full_dataset : bool, default=False + If True, downloads the full dataset (20M rows). + If False, downloads the small dataset. + """ + data_dir = _data_dir + + # Set filename based on dataset size + if use_full_dataset: + file_name = "airlines.parquet" + url = "https://data.rapids.ai/cloud-ml/airline_20000000.parquet" + else: + file_name = "airlines_small.parquet" + url = "https://data.rapids.ai/cloud-ml/airline_small.parquet" + + parquet_name = os.path.join(data_dir, file_name) + + if os.path.isfile(parquet_name): + print(f" > File already exists. 
Ready to load at {parquet_name}") + else: + # Ensure folder exists + os.makedirs(data_dir, exist_ok=True) + + def data_progress_hook(block_number, read_size, total_filesize): + if (block_number % 1000) == 0: + print( + f" > percent complete: { 100 * ( block_number * read_size ) / total_filesize:.2f}\r", + end="", + ) + return + + urlretrieve( + url=url, + filename=parquet_name, + reporthook=data_progress_hook, + ) + + print(f" > Download complete {file_name}") + + +if __name__ == "__main__": + parser = argparse.ArgumentParser( + description="Download airline dataset for cuML Ray HPO example" + ) + parser.add_argument( + "--full-dataset", + action="store_true", + help="Download the full dataset (20M rows) instead of the small dataset", + ) + args = parser.parse_args() + + prepare_dataset(use_full_dataset=args.full_dataset) diff --git a/source/examples/cuml-ray-hpo/setup/pyproject.toml b/source/examples/cuml-ray-hpo/setup/pyproject.toml new file mode 100644 index 00000000..275c80f7 --- /dev/null +++ b/source/examples/cuml-ray-hpo/setup/pyproject.toml @@ -0,0 +1,14 @@ +[project] +name = "ray-cuml" +version = "0.1.0" +requires-python = "==3.13.*" +dependencies = [ + "ray[default]==2.53.0", + "ray[data]==2.53.0", + "ray[train]==2.53.0", + "ray[tune]==2.53.0", + "cuml-cu12==26.2.*", + "jupyterlab-nvdashboard", + "ipykernel", + "ipywidgets" +] diff --git a/source/examples/index.md b/source/examples/index.md index 737b338c..2dcc2c27 100644 --- a/source/examples/index.md +++ b/source/examples/index.md @@ -25,4 +25,5 @@ cuml-snowflake-nb/notebook rapids-coiled-cudf/notebook rapids-morpheus-pipeline/notebook lulc-classification-gpu/notebook +cuml-ray-hpo/notebook ```