Conversation
| "# learn more about options to configure the output, run 'help(OutputFileDatasetConfig)'\n", | ||
| "output = OutputFileDatasetConfig(destination=(def_blob_store, 'may_sample/outputdataset'))" | ||
| "output = OutputFileDatasetConfig(destination=(def_blob_store, 'may_sample/outputdataset'))\n", | ||
| "# Why not call this class Dataset.Output.File.OutputFileConfig ? so that in the modules hierarchy it makes more sense..." |
We are trying to avoid making it feel like a new output dataset concept; it is just a config for how to output data.
@rongduan-zhu thoughts on the module hierarchy comments Olivier has here?
Fair enough, there are already quite a few options with data in general...
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# Can we create actual datasets out of these output datasets? Or would that have to be\n", |
Yes! Just call register_on_complete() on the OutputFileDatasetConfig.
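A minimal sketch of that flow, assuming azureml-core (the destination path matches the notebook above, but the registered dataset name `prepared_output` and the helper name are hypothetical):

```python
def make_registered_output(datastore):
    """Build a step output that is registered as a dataset on run completion.

    Sketch only: assumes the azureml-core SDK; the dataset name
    'prepared_output' is made up for illustration.
    """
    from azureml.data import OutputFileDatasetConfig

    output = OutputFileDatasetConfig(
        destination=(datastore, 'may_sample/outputdataset')
    )
    # register_on_complete() promotes the step's output folder to a
    # registered dataset once the run finishes.
    return output.register_on_complete(name='prepared_output')
```

Calling this with the workspace's default datastore would make the prepared output show up as a named, versioned dataset after the pipeline run completes.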
| " compute_target=compute_target)\n", | ||
| "\n", | ||
| "# For ease of understanding, and compatibility with existing code, I would convert the train.py to instead retrieve\n", | ||
| "# the argument using argparse versus Run.get_context().input_datasets['...']\n", |
Arguments can only pass strings into the script. In this case, if I use argparse, I will get the dataset ID, and then I need to call Dataset.get_by_id to get the TabularDataset object. On the other hand, if I use Run.get_context().input_datasets[...], I get the TabularDataset object back directly. Do you still feel we should recommend argparse, considering the one extra step to get the dataset object?
I guess the tabular case is different from file datasets, where I'd imagine the script retrieves a path to mounted files. For tabular it's OK: the script has to be modified slightly anyway to account for tabular datasets.
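A runnable sketch of the tradeoff: script arguments always arrive as strings, so with argparse the script only receives an ID and still needs a lookup. The argument name `--input-data` and the ID value are hypothetical, and the `Dataset.get_by_id` call is left as a comment since it needs a live workspace:

```python
import argparse

# Simulate what the pipeline would pass on the command line.
parser = argparse.ArgumentParser()
parser.add_argument('--input-data', type=str,
                    help='dataset ID passed in by the pipeline')
args = parser.parse_args(['--input-data', 'dummy-dataset-id'])

# argparse hands back a plain string, not a TabularDataset object:
dataset_id = args.input_data

# With the SDK you would then resolve it yourself, e.g.:
#   dataset = Dataset.get_by_id(workspace, id=dataset_id)
# whereas Run.get_context().input_datasets['...'] returns the object directly.
```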
| "source": [ | ||
| "# get input datasets\n", | ||
| "prep_step = run.find_step_run('prepare step')[0]\n", | ||
| "# Why not a method to directly get input datasets? in addition to the get_details()... something like step.get_input_datasets(), or get_output_datasets()\n", |
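In the meantime, a hedged sketch of the helper being asked for, built on `find_step_run` and `get_details()` from the notebook above. The `'inputDatasets'` key is an assumption about the run-details payload and may differ by SDK version:

```python
def get_step_input_datasets(pipeline_run, step_name):
    """Approximate the missing step.get_input_datasets() convenience method.

    Assumes get_details() exposes an 'inputDatasets' entry; falls back to
    an empty list if the key is absent.
    """
    step = pipeline_run.find_step_run(step_name)[0]
    return step.get_details().get('inputDatasets', [])
```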
Don't merge; just read the comments or load the notebook in a notebook viewer. Lots of the changes are metadata or not useful.