
Added comments... don't merge! #1

Open

omartin2010 wants to merge 5 commits into MayMSFT:master from omartin2010:master

Conversation

@omartin2010

Don't merge; just read the comments, or load the notebook in a notebook viewer. Many of the changes are metadata or not useful.

# learn more about options to configure the output, run 'help(OutputFileDatasetConfig)'
output = OutputFileDatasetConfig(destination=(def_blob_store, 'may_sample/outputdataset'))
# Why not call this class Dataset.Output.File.OutputFileConfig ? so that in the modules hierarchy it makes more sense...
Owner

We are trying to avoid making it feel like a new output dataset concept; it is just a config about how to output data.
@rongduan-zhu thoughts on the module hierarchy comments Olivier has here?

Author

Fair enough; there are already quite a few options around data in general...

# Can we create actual datasets out of these output datasets? Or would that have to be
Owner

Yes! Just call register_on_complete() on the OutputFileDatasetConfig.
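A minimal sketch (not part of the PR) of what that suggestion looks like; the datastore handle and the dataset name 'prepared_output' are placeholders, and the azureml-core import is deferred inside the helper so it can be defined without the SDK installed:

```python
def build_registered_output(datastore, path='may_sample/outputdataset',
                            dataset_name='prepared_output'):
    """Build an OutputFileDatasetConfig that registers itself as a dataset
    when the run completes (dataset_name is a placeholder, not from the PR)."""
    from azureml.data import OutputFileDatasetConfig  # requires azureml-core

    # register_on_complete() returns the config, so the result can be passed
    # straight into a pipeline step as an output.
    return OutputFileDatasetConfig(
        destination=(datastore, path)
    ).register_on_complete(name=dataset_name)
```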

    compute_target=compute_target)

# For ease of understanding, and compatibility with existing code, I would convert the train.py to instead retrieve
# the argument using argparse versus Run.get_context().input_datasets['...']
Owner

Arguments can only carry strings into the script. In this case, if I use argparse, I will get the dataset id, and then I need to call Dataset.get_by_id to get the TabularDataset object. On the other hand, if I use Run.get_context().input_datasets[...], I get the TabularDataset object back directly. Do you still feel we should recommend argparse, considering the one extra step to get the dataset object?

Author

I guess in the case of tabular it's different from file datasets, where I'd imagine the script retrieves a path for mounted files. But for tabular it's OK; the script has to be modified slightly anyway to account for tabular datasets.
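For comparison, a hedged sketch (not from the PR) of the two retrieval styles being discussed; the argument name '--input-data' and the input name 'prepared_data' are illustrative placeholders:

```python
import argparse

def parse_dataset_id(argv):
    """Style 1: the step passes the dataset id as a plain string argument."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--input-data', dest='input_data')
    return parser.parse_args(argv).input_data

# With style 1 the script needs one extra step to resolve the object:
#   from azureml.core import Dataset, Run
#   run = Run.get_context()
#   ds = Dataset.get_by_id(run.experiment.workspace, id=dataset_id)
#
# Style 2 fetches the TabularDataset object directly by input name:
#   ds = Run.get_context().input_datasets['prepared_data']
```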

# get input datasets
prep_step = run.find_step_run('prepare step')[0]
# Why not a method to directly get input datasets? in addition to the get_details()... something like step.get_input_datasets(), or get_output_datasets()
Owner

Agree. @rongduan-zhu, thoughts?
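Until such a method exists, a hypothetical helper (not from the PR) could approximate it on top of get_details(); the 'inputDatasets' key name is an assumption and should be checked against your SDK version:

```python
def get_input_datasets(step_run):
    """Hypothetical stand-in for the proposed step.get_input_datasets();
    digs the inputs out of the run details dict."""
    details = step_run.get_details()            # standard azureml Run API
    return details.get('inputDatasets', [])     # key name assumed; verify per SDK version
```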
