diff --git a/mnist/README.md b/mnist/README.md index f8e6148e..e4e38854 100644 --- a/mnist/README.md +++ b/mnist/README.md @@ -1,108 +1,170 @@ # MNIST DIGITS CLASSIFICATION EXAMPLE -## Create code repo -- Name: dkube-examples -- Project source: Git -- Git URL: https://github.com/oneconvergence/dkube-examples.git -- Branch: tensorflow - -## Create dataset repo -- Name: mnist -- Dataset source: Other -- URL: https://s3.amazonaws.com/img-datasets/mnist.pkl.gz - - -## Create a model -- Name: mnist -- Keep default for others - - -## Launch Notebook -- Create Jupyterlab IDE with tensorflow framework. -- Select the Code dkube-examples. -- Repos->Inputs->Datasets: select mnist and enter mountpath as /mnist. -- Run workspace/dkube-examples/mnist/train.ipynb -- You can experient in the notebook and develop your code. Once you are ready for a formal run, export your code into python script(s) - -## Run training job - - Runs->+Training Run. - - Code: dkube-examples - - Framework: Tensorflow - - Version: 2.0.0 - - Start-up script: python mnist/train.py - - Repos->Inputs->Datasets: select mnist and enter mountpath as /mnist - - Repos->Outputs->Model: select mnist and enter mountpath as /model - - Submit - -## Katib based Hyperparameter Tuning -1. Create a Run same as explained above, except that now a tuning file also needs to be uploaded in the configuration tab. - - For hyperparameter tuning upload the https://github.com/oneconvergence/dkube-examples/blob/tensorflow/mnist/tuning.yaml under upload tuning definition. - - Submit the run. - -## Tuning.yaml file Details: -1. **objective**: The metric that you want to optimize. -2. **goal** parameter is mandatory in tuning.yaml file. -3. **objectiveMetricName:** Katib uses the objectiveMetricName and additionalMetricNames to monitor how the hyperparameters work with the model. Katib records the value of the best objectiveMetricName metric. -4. 
**parameters** : The range of the hyperparameters or other parameters that you want to tune for your machine learning (ML) model. -5. **parallelTrialCount**: The maximum number of hyperparameter sets that Katib should train in parallel. The default value is 3. -6. **maxTrialCount**: The maximum number of trials to run. -7. **maxFailedTrialCount**: The maximum number of failed trials before Katib should stop the experiment. -8. **algorithm**: Search algorithm to find the best hyper parameters. Value must be one of following: - + This example uses the `mnist` model to identify a digit from an image. It steps through a simple DKube workflow. + + > **Note** This example runs on DKube V3.x and above + +## 1. Setup Resources + + Before using DKube to experiment, train, and deploy, the resources must be set up. + +### Create Code Repo + + - From the `Code` menu on the left, select `+ Add Code` with the following fields: + - **Name:** `mnist` **(Or choose ``)** + - **Code Source:** `Git` + - **URL:** `https://github.com/oneconvergence/dkube-examples.git` + - **Branch:** `tensorflow` + - Leave the other fields in their current selection and `Add Code` + +### Create Dataset Repo + + - From the `Datasets` menu, select `+ Add Dataset` with the following fields: + - **Name:** `mnist` **(Or choose `` **(Created during the Code Repo step)** + - **Framework:** `tensorflow` + - **Framework Version:** `2.0.0` + - **Image:** `ocdr/dkube-datascience-tf-cpu-multiuser:v2.0.0-17` + > **Note** The default Tensorflow Image should fill in automatically, but ensure that it is correct

+ - `Repos` tab + - **Inputs** > **Datasets**: `` **(Created during the Dataset Repo step)** + - **Mount Path:** `/mnist` + - Leave the other fields in their current selection and `Submit`

+ - Once the IDE is running and the JupyterLab icon on the right is active, select it to launch a JupyterLab window + - Navigate to workspace/**\**/mnist + - Open `train.ipynb` + - `Run All Cells` from the menu at the top + - Change the `EPOCHS` variable in the 2nd cell to "5" and rerun all cells + - You can view the difference in output at the bottom of the script + > **Note** You would normally develop your code in JupyterLab and, once satisfied, create a Python file from the `ipynb` file. In this example, a Python file is already available for execution. + +## 3. Work with Training Runs + + Batch training runs can be used to create trained models. + +### Run Training Job + +- From the `Runs` menu, select `+ Run` > `Training` with the following fields: + - `Basic` tab + - **Name:** `` **(Created during the Code Repo step)** + - **Framework:** `tensorflow` + - **Framework Version:** `2.0.0` + - **Image:** `ocdr/dkube-datascience-tf-cpu-multiuser:v2.0.0-17` + > **Note** The default Tensorflow Image should fill in automatically, but ensure that it is correct

+ - **Start-up Command:** `python mnist/train.py` + - `Repos` tab + - **Inputs** > **Datasets**: `` **(Created during the Dataset Repo step)** + - **Mount Path:** `/mnist`

+ - **Outputs** > **Models**: `` **(Created during the Model Repo step)** + - **Mount Path:** `/model` + > **Note** Ensure that you add the Model into the `Outputs` section, and not the `Inputs` section + - Leave the other fields in their current selection and `Submit` + - Your Run will show up from the `Runs` menu screen

+ - Clone the Run by selecting the checkbox and choosing `Clone` from the top buttons + - Leave the `Basic` and `Repos` tabs the same + - On the `Configuration` tab + - Select the `+` button next to `Environment Variables` + - **Key:** `EPOCHS` **(Must be in upper case)** + - **Value:** `5` + - `Submit` + +### Compare Runs + + - Wait for both Runs to `complete` + - From the `Runs` menu, select both Run checkboxes, then select the `Compare` button + - Scroll down and choose **Y-Axis:** `train_accuracy` + +### Run Katib-Based Hyperparameter Tuning + + - Go to https://github.com/oneconvergence/dkube-examples/tree/tensorflow/mnist/tuning.yaml + - Select `Raw` + - Right-click and `Save as...` "tuning.yaml"

+ - From the `Runs` menu, select the first Run checkbox, then select `Clone` + - Leave the `Basic` and `Repos` tabs the same + - On the `Configuration` tab + - Select `Upload Tuning Definition` + - Choose the `tuning.yaml` file that you saved + - `Submit`

+ - Wait for the Run to complete + - View the results by selecting the Katib icon on the right of the Run line + +#### Tuning.yaml File Details + + - **objective**: The metric that you want to optimize + - **goal**: This parameter is mandatory in the tuning.yaml file + - **objectiveMetricName:** Katib uses the objectiveMetricName and additionalMetricNames to monitor how the hyperparameters work with the model. Katib records the value of the best objectiveMetricName metric. + - **parameters**: The range of the hyperparameters or other parameters that you want to tune for your machine learning (ML) model + - **parallelTrialCount**: The maximum number of hyperparameter sets that Katib should train in parallel. The default value is 3. + - **maxTrialCount**: The maximum number of trials to run + - **maxFailedTrialCount**: The maximum number of failed trials before Katib should stop the experiment + - **algorithm**: The search algorithm used to find the best hyperparameters. The value must be one of the following: - random - bayesianoptimization - hyperband - cmaes - enas -## Deploy Model (DKube version 2.1.x.x) -- Repos->Models->mnist: select a model version -- Deploy -- Name: mnist -- Type: Test -- Transformer: True -- Transformer script: mnist/transformer.py -- Submit - -## Publish and Deploy Model (Dkube version 2.2.x.x) -- Repos->Models->mnist: select a model version -- Click on Publish model icon under ACTIONS column. -- Name: mnist -- Transformer: True -- Transformer script: mnist/transformer.py -- Submit -### Deploy model -- Click on Model catalog and select the published model. -- Click on the deploy model icon under ACTIONS column. -- Enter the deploy model name and select CPU and click Submit. -- Check in Model Serving and wait for the deployed model to change to running state. 
- -## Publish and Deploy Model (Dkube version 3.0.x.x) -- Models->mnist: select a model version -- Click on Publish model icon under ACTIONS column -- Transformer: True -- Transformer script: mnist/transformer.py -- Submit -### Deploy model -- Click on Models in the navigation pane -- Click on the drop down next to 'Owned by me' and select 'Published' -- Click on the published model 'mnist' -- Select the published version and click on the deploy model icon under ACTIONS column -- Enter the deploy model name, select Deployment / Test and select Deploy using / CPU. Click Submit -- Check in Deployments and wait for the deployed model to change to running state - -## Test inference -- Go to - - Deployments in 2.1.x.x version - - Model Serving in 2.2.x.x version - - Deployments in 3.0.x.x version -- Copy the prediction Endpoint for the model -- Create a browser tab and go to https:///inference -- Paste the Endpoint URL -- Copy Auth token from Developer settings in Dkube page and Paste in inference page -- Choose mnist -- Upload 3.png from repo -- Click predict - -## Automate using pipelines -Run this [pipeline](https://github.com/oneconvergence/dkube-examples/blob/tensorflow/mnist/pipeline.ipynb) to automate training and serving using kubeflow pipelines. +## 4. Deploy Model + + After the best model is identified, it can be deployed for inference serving. + +- From `Models` menu, select `` **(Created during Model Repo step)** +- Choose the highest version of the Model +- Select the `Lineage` tab + - This provides information on the inputs and outputs of the Model

+- Select the `Metrics` tab + - This provides the metrics associated with the Model

+- Go back to the top-level `Models` menu and reselect the Model +- Select the `Deploy` icon on the right of the newest Model + - **Name:** `` **(Your choice)** + - **Deployment:** `Production` + - **Deploy Using:** `CPU` + - **Transformer:** `Check Box` + - **Transformer Script:** `mnist/transformer.py` + - Leave the other fields in their current selection and `Submit`

+ - The deployed Model will appear in the `Deployments` menu screen + +## 5. Train & Deploy with Kubeflow Pipelines + + The training and deployment steps can be automated using Kubeflow Pipelines. + + - Open the JupyterLab window + - Navigate to workspace/**\**/mnist + - Open `pipeline.ipynb`

+ - If you chose the default value (`mnist`) for all of your repos, then `Run All Cells`

+ - If you chose different repo names + - In the 2nd cell, labeled `User Variables`, modify the repo names with your chosen names + - `Run All Cells` from the menu at the top

+ - From the `Pipelines` menu on the left + - Select `Runs` tab + - Your new pipeline will be executing + - Select the pipeline name to see its progress + +## 6. Test inference + + - Create a browser tab and go to https:///inference + - Paste the Endpoint URL from `Deployments` + - Copy Auth token from `Developer settings` in DKube page and paste in + - Choose `mnist` for model type + - Download `3.png` from repo + - Click `Predict` + > **Note** The prediction may time out waiting for the pod to start - select `wait` if prompted diff --git a/mnist/pipeline.ipynb b/mnist/pipeline.ipynb index 8f411395..eb50cf32 100644 --- a/mnist/pipeline.ipynb +++ b/mnist/pipeline.ipynb @@ -8,7 +8,35 @@ "source": [ "import os,sys\n", "import kfp\n", - "import json" + "import json\n", + "import random, string" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### User Variables" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "## Fill in this cell with the repo names that you have chosen\n", + "\n", + "code_repo = \"mnist\"\n", + "dataset_repo = \"mnist\"\n", + "model_repo = \"mnist\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get/Set Environment Variables" ] }, { @@ -23,8 +51,15 @@ "token = os.getenv(\"DKUBE_USER_ACCESS_TOKEN\")\n", "project_id = os.environ.get(\"DKUBE_PROJECT_ID\")\n", "project_name = os.environ.get(\"DKUBE_PROJECT_NAME\",\"mnist\")\n", - "client = kfp.Client(host=os.getenv(\"KF_PIPELINES_ENDPOINT\"), existing_token=token, namespace=os.getenv(\"USERNAME\"))\n", - "run_id = 0" + "user_name = os.environ.get(\"DKUBE_USER_LOGIN_NAME\")\n", + "client = kfp.Client(host=os.getenv(\"KF_PIPELINES_ENDPOINT\"), existing_token=token, namespace=os.getenv(\"USERNAME\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Setup Pipeline Definitions" ] }, { @@ -37,11 +72,11 @@ " name='dkube-mnist-pl',\n", " description='sample 
mnist pipeline with dkube components'\n", ")\n", - "def mnist_pipeline(program='dkube-examples', dataset='mnist', model='mnist', project_id='abc123'):\n", + "def mnist_pipeline(program=code_repo, dataset=dataset_repo, model=model_repo, project_id=project_id):\n", "\n", - " train = dkube_training_op(container='{\"image\":\"ocdr/dkube-datascience-tf-cpu:v2.0.0\"}',\n", + " train = dkube_training_op(container='{\"image\":\"ocdr/dkube-datascience-tf-cpu:v2.0.0-17\"}',\n", " framework=\"tensorflow\", version=\"2.0.0\",\n", - " tags=json.dumps([f\"project:{project_id}\"]),\n", + " tags=tags,\n", " program=str(program), run_script=\"python mnist/train.py\",\n", " datasets=json.dumps([str(dataset)]), outputs=json.dumps([str(model)]),\n", " input_dataset_mounts='[\"/mnist\"]',\n", @@ -50,11 +85,19 @@ " auth_token=token)\n", "\n", " serving = dkube_serving_op(model=train.outputs['artifact'], device='cpu', \n", + " name=deployment_name,\n", " serving_image='{\"image\":\"ocdr/tensorflowserver:2.0.0\"}',\n", - " transformer_image='{\"image\":\"ocdr/dkube-datascience-tf-cpu:v2.0.0\"}',\n", + " transformer_image='{\"image\":\"ocdr/dkube-datascience-tf-cpu:v2.0.0-17\"}',\n", " transformer_project=str(program),\n", - " transformer_code='mnist/transformer.py', auth_token=token).after(train)\n", - " " + " transformer_code='mnist/transformer.py', auth_token=token).after(train)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Pipeline, Deployment & Run names using Base and Random digits" ] }, { @@ -63,10 +106,30 @@ "metadata": {}, "outputs": [], "source": [ - "run_name = f\"[{project_name}] Run{run_id}\"\n", - "experiment = f\"[{project_name}] experiment\"\n", - "client.create_run_from_pipeline_func(mnist_pipeline, run_name=run_name, experiment_name=experiment, arguments={\"project_id\":project_id})\n", - "run_id += 1" + "# Create a random set of digits for the names\n", + "res = ''.join(random.choices(string.ascii_lowercase + 
string.digits, k=4))\n", + "\n", + "# Create the deployment, experiment, & run names based on project & user\n", + "if project_id:\n", + " tags = json.dumps([f\"project:{project_id}\"])\n", + " run_name = f\"[{project_name}] {user_name}:mnist-pl-%s\"%res\n", + " experiment = f\"[{project_name}] mnist\"\n", + " pipeline_name = f\"[{project_name}] mnist-Pipeline.zip\"\n", + " deployment_name = f\"{user_name}-mnist-pl-%s\"%res\n", + "else:\n", + " tags = []\n", + " run_name = f\"{user_name}:mnist-pl-%s\"%res\n", + " experiment = \"default\"\n", + " pipeline_name = \"mnist-Pipeline.zip\"\n", + " deployment_name = f\"{user_name}-mnist-pl-%s\"%res" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create the Pipeline" ] }, { @@ -75,10 +138,34 @@ "metadata": {}, "outputs": [], "source": [ - "#generate & upload pipeline (Optional)\n", + "# Create the pipeline\n", + "client.create_run_from_pipeline_func(mnist_pipeline, run_name=run_name, experiment_name=experiment, arguments={})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit Pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Generate & upload pipeline\n", "import kfp.compiler as compiler\n", - "compiler.Compiler().compile(mnist_pipeline, f\"[{project_name}]-pipeline.zip\")\n", - "client.upload_pipeline(f\"[{project_name}]-pipeline.zip\")" + "compiler.Compiler().compile(mnist_pipeline, pipeline_name)\n", + "\n", + "# Upload Pipeline to DKube if it does not exist\n", + "pipeline_id = client.get_pipeline_id(pipeline_name)\n", + "\n", + "if pipeline_id:\n", + " print(\"Pipeline already exists within DKube, will use existing version\")\n", + "else:\n", + " client.upload_pipeline(pipeline_name)" ] } ],
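
Review note: the naming cell added in `pipeline.ipynb` can be checked without a DKube cluster. The following is a minimal sketch of that logic as a standalone function, not the notebook's actual code. The function name `build_names` and the `suffix` argument are hypothetical and exist only to make the result reproducible. Unlike the notebook, the sketch returns `tags` as a JSON string in both branches; that is an assumption about what the training component expects, not something the diff confirms.

```python
import json
import random
import string


def build_names(project_id, project_name, user_name, suffix=None):
    """Build tags and run/experiment/pipeline/deployment names.

    Mirrors the naming scheme in the notebook diff. `build_names` and
    `suffix` are hypothetical helpers introduced for illustration.
    """
    # Four random lowercase letters or digits, as in the notebook cell.
    if suffix is None:
        suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=4))

    deployment_name = f"{user_name}-mnist-pl-{suffix}"

    if project_id:
        # Project-scoped names carry the project name as a prefix.
        return {
            "tags": json.dumps([f"project:{project_id}"]),
            "run_name": f"[{project_name}] {user_name}:mnist-pl-{suffix}",
            "experiment": f"[{project_name}] mnist",
            "pipeline_name": f"[{project_name}] mnist-Pipeline.zip",
            "deployment_name": deployment_name,
        }

    # Assumption: an empty JSON list keeps the type consistent with the
    # project branch; the notebook itself uses a plain Python list here.
    return {
        "tags": json.dumps([]),
        "run_name": f"{user_name}:mnist-pl-{suffix}",
        "experiment": "default",
        "pipeline_name": "mnist-Pipeline.zip",
        "deployment_name": deployment_name,
    }


names = build_names("abc123", "mnist", "alice", suffix="x7k2")
print(names["run_name"])  # [mnist] alice:mnist-pl-x7k2
```

Using a single f-string per name, rather than mixing f-strings with `%` formatting as the notebook does, avoids a formatting error if a project or user name ever contains a `%` character.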