Skip to content
This repository has been archived by the owner on Aug 15, 2019. It is now read-only.

Add serialization infrastructure to core #997

Merged
merged 9 commits into from
May 3, 2018
Merged

Conversation

ericdnielsen
Copy link
Contributor

@ericdnielsen ericdnielsen commented Apr 25, 2018

This is half of a split PR across the two repos.


This change is Reviewable

@ericdnielsen
Copy link
Contributor Author

I know we've discussed exposing this as tf.registerSerializableClass from union. But I think that's a third repo PR that can be decoupled from this pair.

@caisq
Copy link
Collaborator

caisq commented Apr 25, 2018

Please take a look at the linter error:
https://travis-ci.org/tensorflow/tfjs-core/builds/371228940?utm_source=github_status&utm_medium=notification


Review status: 0 of 2 files reviewed at latest revision, all discussions resolved, some commit checks failed.


src/index.ts, line 46 at r1 (raw file):

export * from './serialization';

Should you put them under a namespace like tf.serialization.* for cleanliness?


Comments from Reviewable

@ericdnielsen
Copy link
Contributor Author

Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


src/index.ts, line 46 at r1 (raw file):

Previously, caisq (Shanqing Cai) wrote…
export * from './serialization';

Should you put them under a namespace like tf.serialization.* for cleanliness?

I thought that was handled at the union package level?

In any event, yes some, but I'm not sure all, should be namespaced. Looking for some thoughts on that. Just about every class in the Layers repo will need to import ConfigDict and SerializableMap. The other ones are less common. (Though I think once we proxy tf.registerSerializableClass -> SerializableMap.register(), we're only left with ConfigDict as a near universal import.

So I can maybe? see leaving the triplet of ConfigDict types in the root names space and moving the others?


Comments from Reviewable

@ericdnielsen
Copy link
Contributor Author

Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


a discussion (no related file):
Just added two files, suspect they'll be some breakages, but Daniel asked to see what it would look like to update a sample core Optimizer to be Serializable. Since Optimizers share a common base class, we'll have to update them all at once. I'd prefer to use the Optimizer purely for review purposes, then revert the changes to Optimizer and SGDOptimizer. Migrate all of them in a single later PR. Once we're happy with how it would potentially look.


Comments from Reviewable

@ericdnielsen
Copy link
Contributor Author

Review status: 0 of 4 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


a discussion (no related file):

Previously, ericdnielsen wrote…

Just added two files, suspect they'll be some breakages, but Daniel asked to see what it would look like to update a sample core Optimizer to be Serializable. Since Optimizers share a common base class, we'll have to update them all at once. I'd prefer to use the Optimizer purely for review purposes, then revert the changes to Optimizer and SGDOptimizer. Migrate all of them in a single later PR. Once we're happy with how it would potentially look.

It worth calling out that core Optimizers will have to each implement the static fromConfig method, that most -Layers code can use the base version because of the convention of all -layers objects being passed a Config object as their sole argument.

While working through this I'm surprised by the JsonDict being present and I'll need to revisit that. I think that's an error and we can remove it from this PR (FromConfig should be ConfigDicts, not JsonDicts).


Comments from Reviewable

@nsthorat
Copy link
Contributor

:lgtm_strong:

I think generally this makes sense.


Review status: 0 of 10 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


a discussion (no related file):

Previously, ericdnielsen wrote…

It worth calling out that core Optimizers will have to each implement the static fromConfig method, that most -Layers code can use the base version because of the convention of all -layers objects being passed a Config object as their sole argument.

While working through this I'm surprised by the JsonDict being present and I'll need to revisit that. I think that's an error and we can remove it from this PR (FromConfig should be ConfigDicts, not JsonDicts).

Overall this looks great. This barely adds anything to the bundle size and is a tiny change to the serializable classes.

Nice work.


src/index.ts, line 46 at r1 (raw file):

Previously, ericdnielsen wrote…

I thought that was handled at the union package level?

In any event, yes some, but I'm not sure all, should be namespaced. Looking for some thoughts on that. Just about every class in the Layers repo will need to import ConfigDict and SerializableMap. The other ones are less common. (Though I think once we proxy tf.registerSerializableClass -> SerializableMap.register(), we're only left with ConfigDict as a near universal import.

So I can maybe? see leaving the triplet of ConfigDict types in the root names space and moving the others?

The union package just merges everything into a single package, it doesn't actually move symbols around. You should follow what "Second level exports" does below.


src/serialization.ts, line 46 at r3 (raw file):

 * Source for this idea: https://stackoverflow.com/a/43607255
 */
export type Constructor<T extends Serializable> = {

maybe this should be called SerializableConstructor


src/serialization.ts, line 61 at r3 (raw file):

export abstract class Serializable {
  /**
   * Return the class name for this class ot use in serialization contexts.

s/ot/to


src/serialization.ts, line 96 at r3 (raw file):

export class SerializationMap {
  private static instance: SerializationMap;
  pythonClassNameMap: {

maybe we can just call this classNameMap since we may use this for non-pythonic serialization. let me know what you think.


src/serialization.ts, line 105 at r3 (raw file):

  }

  static getMap() {

return type


src/optimizers/adadelta_optimizer.ts, line 106 at r3 (raw file):

  getConfig(): ConfigDict {
    return {
      learningRate: -1 * this.c.dataSync().values().next().value,

Instead of doing it this way, can we make all of the number fields separated from its Tensor version (admittedly my fault for making this inconsistency)? Eventually I'm going to add support for uploading numbers in place of Tensors so there won't be both.

Same in the other optimizers.


src/optimizers/adamax_optimizer.ts, line 134 at r3 (raw file):

    }
  }
  getConfig(): ConfigDict {

overall this LGTM. can you add unit tests that check round trip serialization?


Comments from Reviewable

@ericdnielsen
Copy link
Contributor Author

Review status: 0 of 10 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


a discussion (no related file):

Previously, nsthorat (Nikhil Thorat) wrote…

Overall this looks great. This barely adds anything to the bundle size and is a tiny change to the serializable classes.

Nice work.

Added a few responses while I work through the rest, just to make sure I'm interpreting them right.


src/serialization.ts, line 96 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

maybe we can just call this classNameMap since we may use this for non-pythonic serialization. let me know what you think.

I think that rename works, once I add some jsDoc to outline the key of the map is expected to match the python name rather than the js name, if they differ.


src/optimizers/adadelta_optimizer.ts, line 106 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

Instead of doing it this way, can we make all of the number fields separated from its Tensor version (admittedly my fault for making this inconsistency)? Eventually I'm going to add support for uploading numbers in place of Tensors so there won't be both.

Same in the other optimizers.

That sounds good to me. I'll adjust. It looks like I should probably just add protected/private to the constructor args? Don't think anything else would need to change (aside from simplifying my getConfigs)


Comments from Reviewable

@dsmilkov
Copy link
Contributor

dsmilkov commented May 1, 2018

:lgtm_strong: Looks good, worth adding a few unit tests.


Reviewed 1 of 2 files at r1, 9 of 9 files at r3.
Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


src/index.ts, line 46 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

The union package just merges everything into a single package, it doesn't actually move symbols around. You should follow what "Second level exports" does below.

+1 to moving this to a second-level export. See below:

// Second level exports.
export {environment, test_util, util, serialization};

src/serialization.ts, line 30 at r3 (raw file):

 * the pythonic version as that's the portable format.  If you need to
 * python-ify a non-model level toConfig output, you'll need to use a
 * to-be-written-helper doing the inverse of models::convertPythonicToTs.

what is the "::" symbol in this case, since we are not in C++?


src/optimizers/adadelta_optimizer.ts, line 106 at r3 (raw file):

  getConfig(): ConfigDict {
    return {
      learningRate: -1 * this.c.dataSync().values().next().value,

s/this.c.dataSync().values().next().value/this.c.get()/ here and elsewhere


src/optimizers/adam_optimizer.ts, line 132 at r3 (raw file):

    return {
      learningRate: this.learningRate,
      beta1: this.beta1.dataSync().values().next().value,

same here: this.beta1.get()


Comments from Reviewable

@caisq
Copy link
Collaborator

caisq commented May 1, 2018

Review status: all files reviewed at latest revision, 11 unresolved discussions, some commit checks failed.


src/optimizers/adadelta_optimizer.ts, line 22 at r3 (raw file):

 SerializationMap

You plan to add the public register method in a later PR, right?


src/optimizers/adamax_optimizer.ts, line 134 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

overall this LGTM. can you add unit tests that check round trip serialization?

+1


Comments from Reviewable

@ericdnielsen
Copy link
Contributor Author

Review status: 2 of 17 files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.


src/index.ts, line 46 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

+1 to moving this to a second-level export. See below:

// Second level exports.
export {environment, test_util, util, serialization};

when I try that it gets unhappy. Can I grab one of you later to look at this in person?


src/serialization.ts, line 30 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

what is the "::" symbol in this case, since we are not in C++?

updated the docs


src/serialization.ts, line 46 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

maybe this should be called SerializableConstructor

Done.


src/serialization.ts, line 61 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

s/ot/to

Done.


src/serialization.ts, line 105 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

return type

Done.


src/optimizers/adadelta_optimizer.ts, line 22 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
 SerializationMap

You plan to add the public register method in a later PR, right?

I can add that now, but not sure where/how to add it.


src/optimizers/adadelta_optimizer.ts, line 106 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

s/this.c.dataSync().values().next().value/this.c.get()/ here and elsewhere

So which do we want? the c.get() that Daniel is listing, or introducing non-Tensor members?


src/optimizers/adamax_optimizer.ts, line 134 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…

+1

Done.


Comments from Reviewable

@nsthorat
Copy link
Contributor

nsthorat commented May 1, 2018

Review status: 2 of 17 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


src/index.ts, line 46 at r1 (raw file):

Previously, ericdnielsen wrote…

when I try that it gets unhappy. Can I grab one of you later to look at this in person?

You have to import it first:

import * as serialization from './serialization';
...
export {environment, test_util, util, serialization};

src/optimizers/adadelta_optimizer.ts, line 106 at r3 (raw file):

Previously, ericdnielsen wrote…

That sounds good to me. I'll adjust. It looks like I should probably just add protected/private to the constructor args? Don't think anything else would need to change (aside from simplifying my getConfigs)

Yeah you just add those, however you will need to rename the tensor wrappers as well.


src/optimizers/adadelta_optimizer.ts, line 106 at r3 (raw file):

Previously, ericdnielsen wrote…

So which do we want? the c.get() that Daniel is listing, or introducing non-Tensor members?

we chatted and prefer non-Tensor members (if for some reason you serialize every tick of training, this will cause serious perf issues. we also don't want to set the precedent that some tensors can be downloaded and serialized as part of the model).


Comments from Reviewable

@ericdnielsen
Copy link
Contributor Author

Review status: 1 of 17 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


src/index.ts, line 46 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

You have to import it first:

import * as serialization from './serialization';
...
export {environment, test_util, util, serialization};

Done, but still need to confirm it doesn't make -layers use too awkward.


src/optimizers/adadelta_optimizer.ts, line 106 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

Yeah you just add those, however you will need to rename the tensor wrappers as well.

Done.


src/optimizers/adadelta_optimizer.ts, line 106 at r3 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

we chatted and prefer non-Tensor members (if for some reason you serialize every tick of training, this will cause serious perf issues. we also don't want to set the precedent that some tensors can be downloaded and serialized as part of the model).

Done.


src/optimizers/adam_optimizer.ts, line 132 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

same here: this.beta1.get()

Done.


Comments from Reviewable

@dsmilkov
Copy link
Contributor

dsmilkov commented May 3, 2018

:lgtm_strong: Good to submit.


Reviewed 16 of 16 files at r4.
Review status: all files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


Comments from Reviewable

@ericdnielsen ericdnielsen merged commit bd82d33 into master May 3, 2018
@dsmilkov dsmilkov deleted the serialization-to-core branch May 4, 2018 14:42
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants