[node] Add SavedModel execution #2336
Conversation
Great work, thanks! A couple of comments.
Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov, @kangyizhang, @nsthorat, and @pyu10055)
tfjs-node/binding/tfjs_backend.cc, line 1055 at r1 (raw file):
auto savedmodel_entry = tf_savedmodel_map_.find(savedmodel_id);
if (savedmodel_entry == tf_savedmodel_map_.end()) {
  NAPI_THROW_ERROR(env, "SavedModel ID not referenced (savedmodel_id: %d)",
not found? There are a couple of other similar instances.
tfjs-node/binding/tfjs_backend.cc, line 1076 at r1 (raw file):
num_input_ids
Please verify that the input ids length is the same as input_op_names.
tfjs-node/binding/tfjs_backend.cc, line 1118 at r1 (raw file):
return nullptr;
}
TF_Output in = {input_op, 0};
For input_op and output_op nodes, the name can contain an index in the format node_name:index, but if the index is 0, it can be omitted.
The code here should handle both cases (with or without an index), not always set it to 0.
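A minimal sketch of the parsing the reviewer describes (the helper name is hypothetical, not the actual PR code):

```typescript
// Parse "node_name:index" where the ":index" suffix may be omitted;
// an omitted index means index 0.
function parseNodeName(name: string): {opName: string; index: number} {
  const colon = name.lastIndexOf(':');
  if (colon === -1) {
    // No index suffix: default to output/input index 0.
    return {opName: name, index: 0};
  }
  return {opName: name.slice(0, colon), index: Number(name.slice(colon + 1))};
}
```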
tfjs-node/binding/tfjs_backend.cc, line 1134 at r1 (raw file):
}
std::vector<TF_Tensor *> output_values(num_input_ids, nullptr);
num_output_ids?
tfjs-node/binding/tfjs_backend.cc, line 1147 at r1 (raw file):
output_tensor_infos
If the output tensor info construction is the same as eager op execution, maybe we should refactor it into a common method.
tfjs-node/binding/tfjs_binding.cc, line 221 at r1 (raw file):
ENSURE_VALUE_IS_NUMBER_RETVAL(env, args[0], nullptr);
ENSURE_VALUE_IS_ARRAY_RETVAL(env, args[1], nullptr);
ENSURE_VALUE_IS_STRING_RETVAL(env, args[2], nullptr);
Should the input and output node names be an array instead of a string? That way, you can do some validity checks on the parameters in the js.
tfjs-node/src/saved_model.ts, line 213 at r1 (raw file):
this.inputOpNames = new Set();
inputNodeNames.map(inputName => {
  this.inputOpNames.add(inputName.split(':')[0]);
Same here: the input/output names can contain an index, and it should be handled.
tfjs-node/src/saved_model.ts, line 297 at r1 (raw file):
} else {
  throw new Error(
      'TFSavedModel predict() does not support NamedTensorMap yet.');
NamedTensorMap should be allowed, since you can convert it into arrays for the backend.
Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov, @kangyizhang, @nsthorat, and @pyu10055)
tfjs-node/binding/tfjs_backend.cc, line 1055 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
not found? There are a couple of other similar instances.
Done.
tfjs-node/binding/tfjs_backend.cc, line 1076 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
num_input_ids
Please verify that the input ids length is the same as input_op_names.
Done.
Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov, @nsthorat, and @pyu10055)
tfjs-node/binding/tfjs_backend.cc, line 1118 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
For input_op and output_op nodes, the name can contain an index in the format node_name:index, but if the index is 0, it can be omitted.
The code here should handle both cases (with or without an index), not always set it to 0.
Done.
tfjs-node/binding/tfjs_backend.cc, line 1134 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
num_output_ids?
Done.
tfjs-node/binding/tfjs_backend.cc, line 1147 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
output_tensor_infos
If the output tensor info construction is the same as eager op execution, maybe we should refactor it into a common method.
Done.
tfjs-node/binding/tfjs_binding.cc, line 221 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
Should the input and output node names be an array instead of a string? That way, you can do some validity checks on the parameters in the js.
NAPI_VALUE does not support an array of strings, so I am using stringArray.join() on the js side and splitting the string on ',' on the C++ side.
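As a small illustration of this workaround (a sketch, assuming op names never contain commas; the C++ side uses splitStringByComma, shown here as its TS equivalent):

```typescript
// JS side: join the op names into one comma-separated string for the binding.
const inputOpNames = ['serving_default_x', 'serving_default_y'];
const joined = inputOpNames.join(',');

// C++ side splits on ',' to recover the list; equivalent TS round trip:
const roundTripped = joined.split(',');
```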
tfjs-node/src/saved_model.ts, line 213 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
Same here: the input/output names can contain an index, and it should be handled.
Done.
tfjs-node/src/saved_model.ts, line 297 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
NamedTensorMap should be allowed, since you can convert it into arrays for the backend.
Done.
Reviewed 21 of 25 files at r1, 5 of 5 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov, @kangyizhang, @nsthorat, and @pyu10055)
tfjs-node/binding/tfjs_backend.cc, line 1075 at r2 (raw file):
// Get input/output op names as vector
std::vector<char *> input_op_name_array = splitStringByComma(input_op_names);
const here and below
tfjs-node/src/saved_model_test.ts, line 331 at r2 (raw file):
it('execute model float times three', async () => {
  const signature1 = await tf.node.loadSavedModel(
Call this 'model'? Here and elsewhere.
tfjs-node/src/saved_model_test.ts, line 339 at r2 (raw file):
expect(output.dtype).toBe(input.dtype);
expect(output.dtype).toBe('float32');
test_util.expectArraysClose(await output.data(), await input.mul(3).data());
Don't use input.mul(3).data(); hard-code the array here and below.
tfjs-node/test_objects/saved_model/times_two_int/saved_model.pb, line 0 at r2 (raw file):
do you think it would make sense to put these on GCP since these will now be part of the history and always have to be downloaded by anyone cloning the repo?
Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov, @kangyizhang, @nsthorat, and @pyu10055)
tfjs-node/src/saved_model.ts, line 198 at r2 (raw file):
/**
 * A `tf.TFSavedModel` is a signature loaded from a SavedModel
 * metagraph, and allows inference exeuction.
execution
Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov, @nsthorat, and @pyu10055)
tfjs-node/binding/tfjs_backend.cc, line 1075 at r2 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
const here and below
Done.
tfjs-node/src/saved_model.ts, line 198 at r2 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
execution
Done.
tfjs-node/src/saved_model_test.ts, line 331 at r2 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
call this model? here and elsewhere
Done.
tfjs-node/src/saved_model_test.ts, line 339 at r2 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
Don't use input.mul(3).data(); hard-code the array here and below.
Done.
tfjs-node/test_objects/saved_model/times_two_int/saved_model.pb, line at r2 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
do you think it would make sense to put these on GCP since these will now be part of the history and always have to be downloaded by anyone cloning the repo?
I would prefer to keep the test_objects in this folder, for two reasons:
- The total size of all these test_objects is 77.7 kb; it's not too big and there is no plan to add large objects in the future. Compared with the scripts folder, whose size is 34 kb, the size of these test objects should be ok.
- GCP does not support downloading a folder through code, which means that to host these objects on GCP, every single file would need to be downloaded and managed separately.
Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov, @kangyizhang, @nsthorat, and @pyu10055)
tfjs-node/binding/tfjs_backend.cc, line 1147 at r1 (raw file):
Previously, kangyizhang (Kangyi Zhang) wrote…
Done.
I am wondering if the whole output tensor list generation can be refactored into one method?
tfjs-node/binding/tfjs_backend.cc, line 1092 at r3 (raw file):
if (input_op_name_array.size() != num_input_ids) {
  NAPI_THROW_ERROR(env, "Input op names and input tensors do not match.");
Please clarify that it is the length that does not match.
tfjs-node/binding/tfjs_backend.cc, line 1127 at r3 (raw file):
serving_default_x
Do you make sure the op name always has an index at the end in javascript? Otherwise, a name without an index is allowed, and it refers to index 0.
tfjs-node/src/saved_model.ts, line 297 at r1 (raw file):
Previously, kangyizhang (Kangyi Zhang) wrote…
Done.
Please take a look at GraphModel; there is code there that verifies the input and output node names match the model's (in your case, the signature). Please add similar verification here.
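A sketch of what such a verification could look like (a hypothetical helper, not the actual GraphModel code):

```typescript
// Verify that every provided input name exists in the signatureDef inputs,
// throwing a descriptive error otherwise.
function checkInputNames(provided: string[], signatureInputs: string[]): void {
  const expected = new Set(signatureInputs);
  for (const name of provided) {
    if (!expected.has(name)) {
      throw new Error(
          `The input tensor name '${name}' does not exist in the ` +
          `signatureDef inputs: ${signatureInputs.join(', ')}.`);
    }
  }
}
```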
Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov, @nsthorat, and @pyu10055)
tfjs-node/binding/tfjs_backend.cc, line 1147 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
I am wondering if the whole output tensor list generation can be refactored into one method?
No, the helper function cannot process the output tensor list. Op execution returns a list of TFE_TensorHandle, while SavedModel execution (TF_SessionRun) returns a list of TF_Tensor; when looping over that list, the code converts each TF_Tensor into a TFE_TensorHandle and generates the tensor info from the TFE_TensorHandle.
tfjs-node/binding/tfjs_backend.cc, line 1092 at r3 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
Please clarify that it is the length that does not match.
Done.
tfjs-node/binding/tfjs_backend.cc, line 1127 at r3 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
serving_default_x
Do you make sure the op name always has an index at the end in javascript? Otherwise, a name without an index is allowed, and it refers to index 0.
Done. The index will be 0 if the op name does not have index information (:0) at the end.
tfjs-node/src/saved_model.ts, line 297 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
Please take a look at GraphModel, there are code that verify the input and output node name matches the models, in your case the signature. Please add similar verification here.
Done. It's using a local helper function to check that the input node names match the signatureDef. The check of outputs will be added when implementing the model.execute() function.
Nice work. A few comments, mostly related to testing mem leaks and auto-cleaning with finalization. It might be better to do the mem leaks and finalization in a separate PR (LGTM for this PR).
Reviewed 20 of 25 files at r1, 1 of 5 files at r2, 3 of 3 files at r4.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @dsmilkov, @kangyizhang, @nsthorat, and @pyu10055)
tfjs-node/binding/tfjs_backend.h, line 93 at r4 (raw file):
TFE_Context *tfe_context_;
std::map<int32_t, TFE_TensorHandle *> tfe_handle_map_;
std::map<int32_t, std::pair<TF_Session *, TF_Graph *>> tf_savedmodel_map_;
Tip I recently learned: use unordered_map, which is a hashmap and faster than map, which is a binary search tree. Use map only when you need to iterate the keys in order. Here and below.
tfjs-node/binding/tfjs_backend.cc, line 1092 at r3 (raw file):
Previously, kangyizhang (Kangyi Zhang) wrote…
Done.
Also, it would be good to provide the length of the names and the tensor ids in the error message. Something like:
"Length of input op names (3) does not match the length of tensor ids (4)."
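For illustration, a length check producing that kind of message could look like this (a sketch with hypothetical names):

```typescript
// Validate that each input op name has a corresponding tensor id, and report
// both lengths in the error message to make mismatches easy to debug.
function checkLengths(inputOpNames: string[], tensorIds: number[]): void {
  if (inputOpNames.length !== tensorIds.length) {
    throw new Error(
        `Length of input op names (${inputOpNames.length}) does not match ` +
        `the length of tensor ids (${tensorIds.length}).`);
  }
}
```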
tfjs-node/binding/tfjs_backend.cc, line 752 at r4 (raw file):
TF_AutoStatus tf_status;
TF_DeleteSession(kv.second.first, tf_status.status);
TF_DeleteGraph(kv.second.second);
It would be great to have a unit test in cc that asserts against these mem leaks by checking the value of a counter that counts how many graphs and sessions are in memory, e.g. by checking the size of the map. For example, see xnn_operator_count in this WASM unit test: /~https://github.com/tensorflow/tfjs/blob/master/tfjs-backend-wasm/src/cc/kernels/Prelu_test.cc#L20 You can also assert this in a js test instead of a cc test, by exposing the size of the map via napi.
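One way such a js-side leak check could be structured (a sketch with a toy stand-in for the binding; getNumSavedModels is a hypothetical counter exposing the size of tf_savedmodel_map_ via napi):

```typescript
// Toy stand-in for the native binding, tracking live session/graph pairs.
class FakeBinding {
  private models = new Map<number, object>();
  private nextId = 0;
  loadSavedModel(): number {
    const id = this.nextId++;
    this.models.set(id, {});
    return id;
  }
  deleteSavedModel(id: number): void {
    this.models.delete(id);
  }
  // Hypothetical counter: size of the session/graph map.
  getNumSavedModels(): number {
    return this.models.size;
  }
}

const binding = new FakeBinding();
const baseline = binding.getNumSavedModels();
const id = binding.loadSavedModel();
binding.deleteSavedModel(id);
// A leak test would assert the counter returns to its baseline here.
```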
tfjs-node/binding/tfjs_backend.cc, line 1120 at r4 (raw file):
if (TF_GetCode(tf_status.status) != TF_OK) {
  NAPI_THROW_ERROR(env, "Faile to get input tensor (tensor_id: %d) for session.",
typo: failed
tfjs-node/src/nodejs_kernel_backend.ts, line 1904 at r4 (raw file):
runSavedModel(
    id: number, inputs: Tensor[], inputOpNames: string,
Make inputOpNames and outputOpNames be string[] in backend.runSavedModel(). The caller of backend.runSavedModel() shouldn't know the internal details and limitations of the binding. You can keep it as string in binding.runSavedModel().
tfjs-node/src/saved_model_test.ts, line 339 at r4 (raw file):
expect(output.dtype).toBe('float32');
test_util.expectArraysClose(await output.data(), await input.mul(3).data());
model.dispose();
Since node supports finalization, can we make sure that model.dispose() is called automatically when the variable goes out of scope?
tfjs-node/src/saved_model_test.ts, line 372 at r4 (raw file):
it('execute model with wrong tensor name', async done => {
  try {
Tip: it's a bit cleaner to write tests like this using expect(() => ...).toThrowError(/error message/).
tfjs-node/test_objects/saved_model/times_two_int/saved_model.pb, line at r2 (raw file):
Previously, kangyizhang (Kangyi Zhang) wrote…
I would prefer to keep the test_objects in this folder, for two reasons:
- The total size of all these test_objects is 77.7 kb; it's not too big and there is no plan to add large objects in the future. Compared with the scripts folder, whose size is 34 kb, the size of these test objects should be ok.
- GCP does not support downloading a folder through code, which means that to host these objects on GCP, every single file would need to be downloaded and managed separately.
Given the very small size and simplicity, I think this is ok.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @dsmilkov, @kangyizhang, @nsthorat, and @pyu10055)
tfjs-node/binding/tfjs_backend.h, line 93 at r4 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
Tip I recently learned: use unordered_map, which is a hashmap and faster than map, which is a binary search tree. Use map only when you need to iterate the keys in order. Here and below.
Done. Thanks for the info. tfe_handle_map_ has been changed to unordered_map as well.
tfjs-node/binding/tfjs_backend.cc, line 1092 at r3 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
Also would be good to provide the length of the names and the tensor ids in the error message. Something like:
"Length of input op names (3) does not match the length of tensor ids (4)."
Done.
tfjs-node/binding/tfjs_backend.cc, line 752 at r4 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
It would be great to have a unit test in cc that asserts against these mem leaks by checking the value of a counter that counts how many graphs and sessions are in memory, e.g. by checking the size of the map. For example, see xnn_operator_count in this WASM unit test: /~https://github.com/tensorflow/tfjs/blob/master/tfjs-backend-wasm/src/cc/kernels/Prelu_test.cc#L20 You can also assert this in a js test instead of a cc test, by exposing the size of the map via napi.
Got it, thanks for the advice. I'll add the unit test in another PR so it doesn't bloat this PR and code review stays easier.
tfjs-node/binding/tfjs_backend.cc, line 1120 at r4 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
typo: failed
Done.
tfjs-node/src/nodejs_kernel_backend.ts, line 1904 at r4 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
Make inputOpNames and outputOpNames be string[] in backend.runSavedModel(). The caller of backend.runSavedModel() shouldn't know the internal details and limitations of the binding. You can keep it as string in binding.runSavedModel().
Done. This makes sense.
tfjs-node/src/saved_model_test.ts, line 339 at r4 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
Since node supports finalization, can we make sure that model.dispose() is called automatically when the variable goes out of scope?
Can you elaborate a little bit here to help me understand? I think what you mean is that when a TFSavedModel instance is out of scope (no variable is referencing the model instance), model.dispose() should be called automatically, similar to a destructor. Am I understanding correctly? I tried to figure out how to achieve this by searching for "node js finalize", "javascript finalize", and "node js garbage collection no reference/out of scope", but didn't have any luck.
tfjs-node/src/saved_model_test.ts, line 372 at r4 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
Tip: it's a bit cleaner to write tests like this using expect(() => ...).toThrowError(/error message/).
Thank you for the advice. Based on my research, expect().toThrowError() does not work for a code block with await, unless I wrap all the testing code into a new async function, which makes it less clean.
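For reference, a framework-free way to assert that an async call rejects with a given message (a sketch; newer jasmine versions also offer expectAsync(...).toBeRejectedWithError for this):

```typescript
// Assert that an async function rejects with a message matching `pattern`.
async function expectRejection(
    fn: () => Promise<unknown>, pattern: RegExp): Promise<void> {
  try {
    await fn();
  } catch (err) {
    if (pattern.test((err as Error).message)) return;  // rejected as expected
    throw new Error(
        `Rejected with unexpected message: ${(err as Error).message}`);
  }
  throw new Error('Expected the promise to reject, but it resolved.');
}

// Hypothetical stand-in for a predict() call with a wrong tensor name.
const failingPredict = async () => {
  throw new Error('The input tensor name xyz does not exist.');
};
```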
Reviewable status: complete! 2 of 1 approvals obtained (waiting on @dsmilkov, @kangyizhang, @nsthorat, and @pyu10055)
tfjs-node/src/nodejs_kernel_backend.ts, line 1906 at r5 (raw file):
outputOpNames: string[]): Tensor[] {
  const outputMetadata = this.binding.runSavedModel(
      id, this.getInputTensorIds(inputs), inputOpNames.join(),
Even though the default value for join is ',', it would be clearer to specify that here: join(',').
Reviewable status: complete! 2 of 1 approvals obtained (waiting on @dsmilkov, @kangyizhang, @nsthorat, and @pyu10055)
tfjs-node/src/nodejs_kernel_backend.ts, line 1906 at r5 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
Even though the default value for join is ',', it would be clearer to specify that here: join(',').
Done.
Reviewable status: complete! 2 of 1 approvals obtained (waiting on @kangyizhang, @nsthorat, and @pyu10055)
tfjs-node/src/saved_model_test.ts, line 339 at r4 (raw file):
Previously, kangyizhang (Kangyi Zhang) wrote…
Can you elaborate a little bit here to help me understand? I think what you mean is that when a TFSavedModel instance is out of scope (no variable is referencing the model instance), model.dispose() should be called automatically, similar to a destructor. Am I understanding correctly? I tried to figure out how to achieve this by searching for "node js finalize", "javascript finalize", and "node js garbage collection no reference/out of scope", but didn't have any luck.
Yes, when no js variable points to the model instance, model.dispose() should be called automatically. A few pointers:
- https://v8docs.nodesource.com/node-0.10/d2/d78/classv8_1_1_persistent.html#a5610d667bc793ba0af838bb134941bec
- https://stackoverflow.com/questions/4573446/in-v8-how-would-i-remove-wrapped-c-objects-after-their-javascript-counterpart
- There is also an npm library, but you can get away without this dependency: https://www.npmjs.com/package/weak
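In more recent Node versions (>= 14.6), the same idea can be sketched with the built-in FinalizationRegistry; this is an assumption-laden illustration, not the PR's code, and since GC timing is unpredictable, explicit dispose() should remain the primary mechanism:

```typescript
// FinalizationRegistry calls the callback some time after a registered
// object becomes unreachable; accessed via globalThis to avoid requiring
// ES2021 typings in older TS configs.
const FinReg = (globalThis as any).FinalizationRegistry;

const registry = new FinReg((sessionId: number) => {
  // Hypothetical cleanup: in the real binding this would free the
  // TF_Session/TF_Graph pair, e.g. binding.deleteSavedModel(sessionId).
  console.log(`auto-disposing session ${sessionId}`);
});

class TFSavedModelSketch {
  constructor(readonly sessionId: number) {
    // When `this` is garbage-collected, the callback receives sessionId.
    registry.register(this, sessionId);
  }
}
```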
tfjs-node/src/saved_model_test.ts, line 372 at r4 (raw file):
Previously, kangyizhang (Kangyi Zhang) wrote…
Thank you for the advice. Based on my research, expect().toThrowError() does not work for a code block with await, unless I wrap all the testing code into a new async function, which makes it less clean.
Got it. I missed the await part.
Reviewable status: complete! 2 of 1 approvals obtained (waiting on @dsmilkov, @kangyizhang, @nsthorat, and @pyu10055)
tfjs-node/src/saved_model_test.ts, line 339 at r4 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
Yes, when no js variable points to the model instance, model.dispose() should be called automatically. A few pointers:
- https://v8docs.nodesource.com/node-0.10/d2/d78/classv8_1_1_persistent.html#a5610d667bc793ba0af838bb134941bec
- https://stackoverflow.com/questions/4573446/in-v8-how-would-i-remove-wrapped-c-objects-after-their-javascript-counterpart
- There is also an npm library, but you can get away without this dependency: https://www.npmjs.com/package/weak
Thank you for the resources. It makes a lot of sense. I plan to send another PR for this, so
- it doesn't flood this PR
- other model types (GraphModel, LayersModel) need to add this functionality as well
- I'm wondering if this functionality should be added into tf.tidy()
inputTensors.push(inputs);
return this.backend.runSavedModel(
    this.sessionId, inputTensors, Object.values(this.inputNodeNames),
    Object.values(this.outputNodeNames))[0];
Hi @kangyizhang, is there a reason why when a single input is provided it returns only the first output, even if there are many? I'm currently experimenting with SavedModel support and because of that I have to pass the inputs as an array to be sure to get all the outputs, even if the model has only one input.
Good catch, this should be fixed. If the output has only one item, it will return one tensor; if the output has multiple nodes, it will return an array. I'll send a PR to fix this and it should be included in the next release.
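The proposed behavior can be sketched with a hypothetical helper (not the actual fix in the follow-up PR):

```typescript
// Return a single tensor when the signature has exactly one output,
// and the full array otherwise.
function wrapOutputs<T>(outputs: T[]): T | T[] {
  return outputs.length === 1 ? outputs[0] : outputs;
}
```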
This is the first implementation of SavedModel execution through the TF C API.
Added more SavedModels for testing. These SavedModels are generated from python code.