Conversation
| "# learn more about options to configure the output, run 'help(OutputFileDatasetConfig)'\n", | ||
| "output = OutputFileDatasetConfig(destination=(def_blob_store, 'may_sample/outputdataset'))" | ||
| "output = OutputFileDatasetConfig(destination=(def_blob_store, 'may_sample/outputdataset'))\n", | ||
| "# Why not call this class Dataset.Output.File.OutputFileConfig ? so that in the modules hierarchy it makes more sense..." |
We are trying to avoid making it feel like a new output dataset concept; it is just a config for how to output data.
@rongduan-zhu thoughts on the module hierarchy comments Olivier has here?
Fair enough, there are already quite a few options with data in general...
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# Can we create actual datasets out of these output datasets? Or would that have to be\n", |
Yes! Just call register_on_complete() on the OutputFileDatasetConfig.
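A minimal sketch of that flow, assuming azureml-core (the destination path matches the notebook above, but the registered dataset name `prepared_output` and the helper name are hypothetical):

```python
def make_registered_output(datastore):
    """Build a step output that is registered as a dataset on run completion.

    Sketch only: assumes the azureml-core SDK; the dataset name
    'prepared_output' is made up for illustration.
    """
    from azureml.data import OutputFileDatasetConfig

    output = OutputFileDatasetConfig(
        destination=(datastore, 'may_sample/outputdataset')
    )
    # register_on_complete() promotes the step's output folder to a
    # registered dataset once the run finishes.
    return output.register_on_complete(name='prepared_output')
```

Calling this with the workspace's default datastore would make the prepared output show up as a named, versioned dataset after the pipeline run completes.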
| " compute_target=compute_target)\n", | ||
| "\n", | ||
| "# For ease of understanding, and compatibility with existing code, I would convert the train.py to instead retrieve\n", | ||
| "# the argument using argparse versus Run.get_context().input_datasets['...']\n", |
Arguments can only pass strings into the script. In this case, if I use argparse, I will get the dataset ID, and then I need to call Dataset.get_by_id to get the TabularDataset object. On the other hand, if I use Run.get_context().input_datasets[...], I get the TabularDataset object back directly. Do you still feel we should recommend argparse, considering the one extra step to get the dataset object?
I guess the tabular case is different from file datasets, where I'd imagine the script retrieves a path to mounted files. For tabular it's OK: the script has to be modified slightly anyway to account for tabular datasets.
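A runnable sketch of the tradeoff: script arguments always arrive as strings, so with argparse the script only receives an ID and still needs a lookup. The argument name `--input-data` and the ID value are hypothetical, and the `Dataset.get_by_id` call is left as a comment since it needs a live workspace:

```python
import argparse

# Simulate what the pipeline would pass on the command line.
parser = argparse.ArgumentParser()
parser.add_argument('--input-data', type=str,
                    help='dataset ID passed in by the pipeline')
args = parser.parse_args(['--input-data', 'dummy-dataset-id'])

# argparse hands back a plain string, not a TabularDataset object:
dataset_id = args.input_data

# With the SDK you would then resolve it yourself, e.g.:
#   dataset = Dataset.get_by_id(workspace, id=dataset_id)
# whereas Run.get_context().input_datasets['...'] returns the object directly.
```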
| "source": [ | ||
| "# get input datasets\n", | ||
| "prep_step = run.find_step_run('prepare step')[0]\n", | ||
| "# Why not a method to directly get input datasets? in addition to the get_details()... something like step.get_input_datasets(), or get_output_datasets()\n", |
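In the meantime, a hedged sketch of the helper being asked for, built on `find_step_run` and `get_details()` from the notebook above. The `'inputDatasets'` key is an assumption about the run-details payload and may differ by SDK version:

```python
def get_step_input_datasets(pipeline_run, step_name):
    """Approximate the missing step.get_input_datasets() convenience method.

    Assumes get_details() exposes an 'inputDatasets' entry; falls back to
    an empty list if the key is absent.
    """
    step = pipeline_run.find_step_run(step_name)[0]
    return step.get_details().get('inputDatasets', [])
```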
Don't merge; just read the comments or load the notebook in a notebook viewer. Lots of the changes are metadata or not useful.