Multi-Tensor Input in Servo-Beam #10
Conversation
…mn option and tests
Integrate Arrow as internal processing container
tfx_bsl/beam/run_inference.py
Outdated
```python
self._api_client = discovery.build('ml', 'v1')


def _extract_from_recordBatch(self, elements: pa.RecordBatch):
  serialized_examples = bsl_util.ExtractSerializedExampleFromRecordBatch(elements)
```
This seems to be the same in the Batch and Remote DoFns. Maybe extract it out to the base class, and only get model_input in _extract_from_recordBatch?
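A minimal sketch of the suggested refactor: the shared extraction lives in a base class, and subclasses only build their model input (class and method names here are illustrative, not the PR's actual code):

```python
import apache_beam as beam
import pyarrow as pa

from tfx_bsl.beam import bsl_util


class _BaseDoFn(beam.DoFn):
  """Hypothetical base class holding the extraction shared by both DoFns."""

  def _extract_from_record_batch(self, elements: pa.RecordBatch):
    # Identical in the batch and remote paths, so it lives here.
    return bsl_util.ExtractSerializedExampleFromRecordBatch(elements)


class _RemoteDoFn(_BaseDoFn):

  def process(self, elements: pa.RecordBatch):
    serialized_examples = self._extract_from_record_batch(elements)
    # Only the subclass-specific step remains: turning the serialized
    # examples into this DoFn's model_input (helper is illustrative).
    model_input = self._build_model_input(serialized_examples)
    yield model_input
```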
tfx_bsl/beam/run_inference.py
Outdated
```python
) -> Mapping[Text, np.ndarray]:
  self._check_elements(elements)
  outputs = self._run_tf_operations(elements)
    self, tensors: Mapping[Any, Any]) -> Mapping[Text, np.ndarray]:
```
Add a comment on what's expected in `tensors`. And is the Mapping key a `Text`?
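The requested documentation might look something like this (a sketch; the exact contract is for the author to confirm):

```python
from typing import Mapping, Text

import numpy as np


def _run_tf_operations(
    self, tensors: Mapping[Text, np.ndarray]) -> Mapping[Text, np.ndarray]:
  """Runs the model over a batch of already-extracted input tensors.

  Args:
    tensors: A mapping from input tensor name (assumed here to be Text)
      to the batched values for that tensor, as an np.ndarray.

  Returns:
    A mapping from output tensor alias to the np.ndarray produced for it.
  """
```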
```python
    self, elements: Mapping[Any, Any],
    outputs: Mapping[Text, np.ndarray]
) -> Iterable[Tuple[Union[str, bytes], classification_pb2.Classifications]]:
  serialized_examples, = elements.values()
```
This won't give the right answer in the multi-tensor case.
```python
    self, elements: Mapping[Any, Any],
    outputs: Mapping[Text, np.ndarray]
) -> Iterable[Tuple[Union[str, bytes], classification_pb2.Classifications]]:
  serialized_examples, = elements.values()
```
Is `elements.values()` the serialized examples?
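One way to make the intent explicit instead of relying on single-value unpacking (a sketch; it reuses the `_io_tensor_spec` attribute quoted elsewhere in this PR):

```python
def _get_serialized_examples(self, elements):
  # `serialized_examples, = elements.values()` raises when `elements`
  # holds more than one entry, and otherwise grabs whichever single
  # value happens to be present. Keying on the known tensor name is
  # unambiguous:
  input_name = self._io_tensor_spec.input_tensor_names[0]
  return elements[input_name]
```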
```python
raise ValueError('Expected to have one name and one alias per tensor')

include_request = True
if len(input_tensor_names) == 1:
```
Can we make the determination of a single input string tensor an internal utility function inside BaseDoFn?
The input tensor names are not in the base DoFn.
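If the tensor names were threaded through to the base class, the helper could be as small as this (a sketch under that assumption):

```python
def _has_single_string_input(self) -> bool:
  """Hypothetical BaseDoFn helper: True when the model takes exactly one
  input tensor, i.e. serialized tf.Examples can be fed to it directly."""
  return len(self._io_tensor_spec.input_tensor_names) == 1
```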
```python
include_request = True
if len(input_tensor_names) == 1:
  serialized_examples, = elements.values()
```
Shall we also check that the type of `elements.values()` is string/bytes?
It's checked in `_extract_from_recordBatch`.
tfx_bsl/beam/run_inference.py
Outdated
```python
else:
  input_tensor_proto.tensor_shape.dim.add().size = len(elements[tensor_name][0])
```
Why is the dim size `len(elements[tensor_name][0])` instead of:

```python
for s in elements[tensor_name][0].shape:
  input_tensor_proto.tensor_shape.dim.add().size = s
```
We have an np.ndarray; I don't think we will have a shape parameter.
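For reference, NumPy arrays do expose a `.shape` attribute, so the loop from the review comment would run on an ndarray; a standalone illustration (not the PR's code):

```python
import numpy as np
from tensorflow.core.framework import tensor_pb2

arr = np.zeros((4, 3), dtype=np.float32)  # stand-in for one input value
tensor_proto = tensor_pb2.TensorProto()
for s in arr.shape:  # ndarray.shape is always defined; here (4, 3)
  tensor_proto.tensor_shape.dim.add().size = s
# tensor_shape now records dims [4, 3], whereas len(arr) would flatten
# this to the single dim 4.
```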
tfx_bsl/beam/run_inference.py
Outdated
```python
for alias, tensor_name in zip(input_tensor_alias, input_tensor_names):
  input_tensor_proto = predict_log_tmpl.request.inputs[alias]
  input_tensor_proto.dtype = tf.as_dtype(input_tensor_types[alias]).as_datatype_enum
  if len(input_tensor_alias) == 1:
```
Could the single input case be handled separately?
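That is, hoisting the length check out of the per-tensor loop, roughly (a sketch assembled from the quoted lines):

```python
if len(input_tensor_alias) == 1:
  # Single-input case handled once, outside the loop.
  alias = input_tensor_alias[0]
  input_tensor_proto = predict_log_tmpl.request.inputs[alias]
  input_tensor_proto.dtype = tf.as_dtype(
      input_tensor_types[alias]).as_datatype_enum
else:
  for alias, tensor_name in zip(input_tensor_alias, input_tensor_names):
    input_tensor_proto = predict_log_tmpl.request.inputs[alias]
    input_tensor_proto.dtype = tf.as_dtype(
        input_tensor_types[alias]).as_datatype_enum
```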
tfx_bsl/beam/run_inference.py
Outdated
```python
alias = input_tensor_alias[0]
predict_log.request.inputs[alias].string_val.append(process_elements[i])
else:
  for alias, tensor_name in zip(input_tensor_alias, input_tensor_names):
```
Is this correct, given it's already inside the loop over `alias, tensor_name`?
```python
) -> Iterable[Tuple[tf.train.Example, inference_pb2.MultiInferenceResponse]]:
    self, elements: Mapping[Any, Any],
    outputs: Mapping[Text, np.ndarray]
) -> Iterable[Tuple[Union[str, bytes], inference_pb2.MultiInferenceResponse]]:
```
Can this just be `bytes` instead of `Union[str, bytes]`?
str is the same as 'bytes' in py2.
Just wanted to make sure it's compatible with py2
```python
model_input = None
if (len(self._io_tensor_spec.input_tensor_names) == 1):
  model_input = {self._io_tensor_spec.input_tensor_names[0]: serialized_examples}
```
Can we just leave this in _BaseBatchSavedModelDoFn and move the rest to _BatchPredictDoFn?
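A sketch of the proposed split (method names are illustrative; the attribute follows the snippets quoted above):

```python
class _BaseBatchSavedModelDoFn(_BaseDoFn):

  def _build_model_input(self, serialized_examples):
    # Kept in the base class: the common single-input-tensor case.
    if len(self._io_tensor_spec.input_tensor_names) == 1:
      return {self._io_tensor_spec.input_tensor_names[0]:
                  serialized_examples}
    return None


class _BatchPredictDoFn(_BaseBatchSavedModelDoFn):
  # "The rest" (the multi-tensor handling) would move here, since only
  # the Predict signature accepts arbitrary input tensors.
  ...
```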
tfx_bsl/public/beam/run_inference.py
Outdated
```python
Args:
  examples: A PCollection containing examples.
  inference_spec_type: Model inference endpoint.
  Schema [optional]: required for models that requires
```
Mention this is only available for the Predict method.
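The docstring entry might then read (a wording sketch only):

```
Args:
  examples: A PCollection containing examples.
  inference_spec_type: Model inference endpoint.
  schema: (Optional) A schema; required for models that take multi-tensor
    input. Only available for the Predict method.
```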
tfx_bsl/beam/bsl_util.py
Outdated
```python
_KERAS_INPUT_SUFFIX = '_input'


def ExtractSerializedExampleFromRecordBatch(elements: pa.RecordBatch) -> List[Text]:
```
ExtractSerializedExamplesFromRecordBatch
```python
def ExtractSerializedExampleFromRecordBatch(elements: pa.RecordBatch) -> List[Text]:
  serialized_examples = None
  for column_name, column_array in zip(elements.schema.names, elements.columns):
    if column_name == _RECORDBATCH_COLUMN:
```
Should _RECORDBATCH_COLUMN be passed as an argument to the API?
If we use a constant here, it would mean users would have to use this same constant when creating the TFXIO.
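That is, a default-valued parameter so callers can match whatever column name their TFXIO produced (a sketch; the `column_name` parameter is illustrative, and the constant's value here is a stand-in):

```python
from typing import List, Text

import pyarrow as pa

_RECORDBATCH_COLUMN = '__RAW_RECORD__'  # stand-in for the PR's constant


def ExtractSerializedExamplesFromRecordBatch(
    elements: pa.RecordBatch,
    column_name: Text = _RECORDBATCH_COLUMN) -> List[Text]:
  """Extracts serialized examples from the named RecordBatch column."""
  for name, column_array in zip(elements.schema.names, elements.columns):
    if name == column_name:
      return column_array.to_pylist()
  raise ValueError('Column %r not found in the RecordBatch.' % column_name)
```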
```python
    tf.train.SequenceExample])
@beam.typehints.with_input_types(tf.train.Example)
@beam.typehints.with_output_types(prediction_log_pb2.PredictionLog)
def RunInference(  # pylint: disable=invalid-name
```
Is the long-term plan to deprecate the tf.Example API and only have a RecordBatch API?
If so, mention it in a comment.
```python
if prepare_instances_serialized:
  return [{'b64': base64.b64encode(value).decode()} for value in df[_RECORDBATCH_COLUMN]]
else:
  as_binary = df.columns.str.endswith("_bytes")
```
Why does the name end with "_bytes"?
User-specified byte columns; it's consistent with the original implementation.
This is required by Cloud AI Platform, which indicates a bytes feature with the '_bytes' suffix.
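For context, Cloud AI Platform's online prediction JSON wraps binary values as {"b64": ...} and relies on the "_bytes" alias suffix to know they need decoding; a standalone illustration of the convention:

```python
import base64

raw_image = b'\x89PNG\r\n'  # stand-in for a real binary payload

# The "_bytes" suffix on the input alias tells the service the value is
# binary; the value itself must be sent as {"b64": <base64 string>}.
instance = {
    'image_bytes': {'b64': base64.b64encode(raw_image).decode()},
    'label': 3,
}
```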
```python
@beam.typehints.with_input_types(tf.train.Example)
@beam.typehints.with_output_types(prediction_log_pb2.PredictionLog)
def RunInferenceImpl(  # pylint: disable=invalid-name
def RunInferenceOnExamples(  # pylint: disable=invalid-name
```
Let's use the first option for the public API here, to have a polymorphic RunInference and RunInferenceImpl.
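A rough sketch of that option, with one public RunInference dispatching to the type-specific implementations (the dispatch helper is hypothetical; RunInferenceImpl and RunInferenceOnExamples are assumed to be ptransform_fns as in the diff):

```python
def RunInference(  # pylint: disable=invalid-name
    inputs, inference_spec_type, schema=None):
  """Polymorphic entry point accepting tf.Example or RecordBatch input."""
  if _is_record_batch(inputs):  # hypothetical type check
    return inputs | 'RunInferenceImpl' >> RunInferenceImpl(
        inference_spec_type, schema)
  return inputs | 'RunInferenceOnExamples' >> RunInferenceOnExamples(
      inference_spec_type)
```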
Internally uses Arrow RecordBatch for processing and supports multi-tensor input.