Dear team,
I have a pipeline with a sweep component that stopped working. The pipeline reports an overall failure because the sweep step never starts, which leaves me without any error message or logs.
The command I run (with the ml CLI extension):
az ml job create --file ./pipelines/pipeline_demandmodel_hp.yml
az version:
{
  "azure-cli": "2.42.0",
  "azure-cli-core": "2.42.0",
  "azure-cli-telemetry": "1.0.8",
  "extensions": {
    "ml": "2.12.1"
  }
}
Within the environment I have azure-ai-ml==1.1.0.
I expect the pipeline to produce child runs and trials for the parameters, as it did a month ago. Instead, the sweep step never starts, and after a while the pipeline is marked as 'failed'. I tried both a registered dataset as input and the data passed on from a previous step (which completes with a green tick); both show the same issue.

This is the pipeline containing the sweep step:
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
display_name: Demand Hyperparameter tuning Pipeline
description: Pipeline prepares data and finds best set of parameters
experiment_name: demand_model_demo
type: pipeline
settings:
  default_compute: azureml:train
  default_datastore: azureml:spot_train
inputs:
  model_input:
    type: uri_folder
    path: azureml:test_input_hp@latest
    mode: ro_mount
jobs:
  sweep_step:
    type: sweep
    inputs:
      data: ${{parent.inputs.model_input}}
      start_new_run: True
      register_model: False
      gamma: 0
      sample_weights: True
      reg_alpha: 0
      reg_lambda: 1
    outputs:
      data_out:
        mode: rw_mount
    sampling_algorithm: bayesian
    trial: ../components/component_train_extraparam.yaml
    search_space:
      learning_rate:
        type: choice
        values: [0.05, 0.1, 0.15]
      max_depth:
        type: choice
        values: [5, 7, 10, 15, 20]
      n_estimators:
        type: choice
        values: [70, 100, 120, 150]
      max_delta_step:
        type: uniform
        min_value: 0.0
        max_value: 3.0
    objective:
      goal: minimize
      primary_metric: probability_difference
    limits:
      max_total_trials: 50
      max_concurrent_trials: 4
      timeout: 14400
#######
Component:
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: train_demandmodel
display_name: Training Demand Model
type: command
inputs:
  data:
    type: uri_folder
  start_new_run:
    type: string
    default: True
  register_model:
    type: string
    default: False
  learning_rate:
    type: number
    default: 0.1
  n_estimators:
    type: integer
    default: 130
  max_depth:
    type: integer
    default: 10
  max_delta_step:
    type: number
    default: 0
  sample_weights:
    type: string
    default: True
  gamma:
    type: number
    default: 0
  reg_alpha:
    type: number
    default: 0
  reg_lambda:
    type: number
    default: 1
outputs:
  data_out:
    type: uri_folder
code: ../
environment: azureml:optimiser@latest
is_deterministic: false
command: >-
  python aml_train.py
  --data ${{inputs.data}}
  --data_out ${{outputs.data_out}}
  --start_new_run ${{inputs.start_new_run}}
  --register_model ${{inputs.register_model}}
  --learning_rate ${{inputs.learning_rate}}
  --n_estimators ${{inputs.n_estimators}}
  --max_depth ${{inputs.max_depth}}
  --max_delta_step ${{inputs.max_delta_step}}
  --sample_weights ${{inputs.sample_weights}}
  --gamma ${{inputs.gamma}}
  --reg_alpha ${{inputs.reg_alpha}}
  --reg_lambda ${{inputs.reg_lambda}}
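
A local consistency check of the two definitions above can be sketched as follows. This is only an illustration, not part of the pipeline and not a diagnosis of the failure: the dictionaries are hand-copied from the YAML in this post rather than loaded from the files, and the helper names check_sweep_wiring and non_string_defaults are hypothetical. It verifies that every component input is bound exactly once (through the sweep inputs, the search space, or a default) and lists inputs declared as string whose default is not a string, which is how an unquoted True or False is typically read by a YAML parser.

```python
# Component inputs, hand-copied from the component YAML above (assumption:
# unquoted True/False are represented as Python booleans, as a YAML 1.1
# parser would typically load them).
component_inputs = {
    "data": {"type": "uri_folder"},
    "start_new_run": {"type": "string", "default": True},
    "register_model": {"type": "string", "default": False},
    "learning_rate": {"type": "number", "default": 0.1},
    "n_estimators": {"type": "integer", "default": 130},
    "max_depth": {"type": "integer", "default": 10},
    "max_delta_step": {"type": "number", "default": 0},
    "sample_weights": {"type": "string", "default": True},
    "gamma": {"type": "number", "default": 0},
    "reg_alpha": {"type": "number", "default": 0},
    "reg_lambda": {"type": "number", "default": 1},
}

# Names bound under jobs.sweep_step.inputs and jobs.sweep_step.search_space.
sweep_inputs = {
    "data", "start_new_run", "register_model", "gamma",
    "sample_weights", "reg_alpha", "reg_lambda",
}
search_space = {"learning_rate", "max_depth", "n_estimators", "max_delta_step"}


def check_sweep_wiring(component_inputs, sweep_inputs, search_space):
    """Report names that are unknown, bound twice, or left unbound."""
    declared = set(component_inputs)
    bound = sweep_inputs | search_space
    return {
        # Bound in the sweep step but not declared by the component.
        "unknown": sorted(bound - declared),
        # Bound both as a fixed input and as a search-space parameter.
        "duplicated": sorted(sweep_inputs & search_space),
        # Declared without a default and not bound anywhere.
        "unbound": sorted(
            name for name in declared - bound
            if "default" not in component_inputs[name]
        ),
    }


def non_string_defaults(component_inputs):
    """List inputs typed as string whose default value is not a str."""
    return sorted(
        name for name, spec in component_inputs.items()
        if spec["type"] == "string"
        and "default" in spec
        and not isinstance(spec["default"], str)
    )


if __name__ == "__main__":
    print(check_sweep_wiring(component_inputs, sweep_inputs, search_space))
    print(non_string_defaults(component_inputs))
```

For the definitions in this post the wiring check reports nothing missing or duplicated, while start_new_run, register_model and sample_weights show up as string inputs with boolean defaults. Whether the service accepts that combination is not something this sketch can tell.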