Design Doc: Session #3993

Merged
merged 5 commits into from
Oct 4, 2017
@helinwang helinwang commented Sep 10, 2017

Here is better for review.

Fixes: #4552


## Abstract

This design doc proposes to have an object called *Session* which

Contributor

Did this design come out of #3811 (comment)?

If so, you probably need to add descriptions like "session is able to distinguish running a graph locally or remotely, using CPU or GPU, using one device or more"

Contributor Author

@helinwang helinwang Sep 25, 2017

Sorry for the late reply. Great point! Added "The Session is able to distinguish running a graph locally or remotely, using CPU only or using one or more GPUs."

## Background

A computation graph is executed in an environment which contains the
[scope](./scope.md) and other states. PaddlePaddle used to only have

Contributor

Sorry, what do you mean by "other states"?

Member

scope, device, context etc.

Contributor

I have the same question as @typhoonzero. @jacquesqiao, do you mean the Session contains runtime resources,
e.g., already allocated memory in the scope, occupied devices, etc.?

Contributor Author

Sorry for the late reply. @dzhwinter The environment contains runtime resources. The session is the "owner" of these runtime resources.

a = paddle.constant(1.0)
b = paddle.constant(2.0)
c = a + b
sess = paddle.session()

Contributor

Some thoughts of mine:

  • An NN training job must contain: a graph (containing sub-nets such as init_net, forward_net, backward_net, opt_net); a scope containing tensors as parameters; hyperparameters (learning_rate, batch_size, etc.); and settings (cluster or not, devices, quotas, node IP addresses, etc.). So here we may want something like:
    sess = paddle.session(graph, scope_list, settings)
    # or
    sess = paddle.remote_session(graph, scope_list, cluster_settings)

  • All states can be stored in some "scope": create scopes for storing the forward and backward tensors, and for storing the changing states of hyperparameters.

Contributor Author

Agree with

sess = paddle.session(graph, scope_list, settings)
# or
sess = paddle.remote_session(graph, scope_list, cluster_settings)

All states can be stored in some "scope": create scopes for storing the forward and backward tensors, and for storing the changing states of hyperparameters.

Yes, the variable states are stored in the "scope". One session means one scope (just added to the doc).

## Background

A computation graph is executed in an environment which contains the
[scope](./scope.md) and other states. PaddlePaddle used to only have

Contributor

Do we need to specify that one session contains one scope?

Contributor Author

Sorry for the late reply. Added "This indicates different sessions have different scopes.".

Collaborator

@wangkuiyi wangkuiyi left a comment

Thanks for this PR! But after reading it, I still couldn't tell what must be included in a Session. It seems that this information should appear in the first paragraph of this document.


## Background

A computation graph is executed in an environment which contains the

Collaborator

Try not to use passive voice.

A computation graph runs in an environment

Contributor Author

Sorry for the late reply! Done.

## Background

A computation graph is executed in an environment which contains the
[scope](./scope.md) and other states. PaddlePaddle used to only have

Collaborator

"used to only have" => "used only to have" or "used to have only". Indeed, here we are describing the current status, so it should be

The current design has an implicit session ...

Contributor Author

Thanks! Done.


## Session

Session is an object that owns all runtime states such as scope,

Collaborator

Session is an object ==> A session is an object

Contributor Author

Thanks! Done.

[scope](./scope.md) and other states. PaddlePaddle used to only have
an implicit global session on which `paddle.eval()` is executed.

This has the limitation that the user can not create two independent

Collaborator

The logic here is broken. The text claims, but does not explain, why users cannot have two environments. The second sentence then claims that it is necessary to have two environments.

From the text above, readers would conclude that what we need is two Scope instances, not a new class Session.

Contributor Author

Thanks! Explained why user cannot have two environments, and also changed wording so that reader will know we need a new class Session.

label = reader.column(1)
fc1 = paddle.op.fc(image, size=256, act="sigmoid")
fc2 = paddle.op.fc(fc1, size=10, act="softmax")
cost = paddle.op.cross_entropy(fc2)

Contributor

@putcn putcn Sep 11, 2017

Should this line be cost = paddle.op.cross_entropy(fc2, label)?

Contributor Author

Thanks! Done.

@QiJune
Member

QiJune commented Sep 11, 2017

It's better to give a C++ or Python class definition of Session.


## Abstract

This design doc proposes to have an object called *Session* which

Contributor

Maybe the Session tries to solve these problems:

  1. Unify runtime resource management between a local machine and a distributed environment.
  2. Replace the global scope with a named concept, which holds resources explicitly.

Is that correct?

Contributor Author

Yes.
Updated, please review:)

@helinwang helinwang force-pushed the session branch 2 times, most recently from 5960fd2 to 526c9eb Compare September 25, 2017 23:09

## Session

A session is an object that owns all runtime states such as scope,

Contributor

Should define what a session owns exactly, like:

  • The "graph" to run locally or remotely.
  • Exactly one scope, containing all tensor variables for all the subnets.
  • Settings and hyper-parameters.

Contributor Author

@helinwang helinwang Sep 28, 2017

Thanks! Updated (the session owns the scope), please review.

The session owns a single global scope, but a scope can have sub-scopes, so I did not specify "exactly one scope".
The session does not own the graph; the graph is what gets evaluated with the session.
The session does not own settings and hyper-parameters; a session can be created from settings and hyper-parameters.
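
To illustrate the ownership rules described in this reply, here is a minimal Python sketch. The `Scope` and `Session` classes, their methods, and the `settings` dictionary are hypothetical stand-ins invented for this example; they are not the interface added to the design doc.

```python
class Scope:
    """Hypothetical stand-in for a scope: a name-to-value store with a parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.vars = {}

    def new_sub_scope(self):
        # A sub-scope can see its parent's variables, but not the reverse.
        return Scope(parent=self)

    def find(self, name):
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        raise KeyError(name)


class Session:
    """Toy session: created from settings, owns one global scope, owns no graph."""

    def __init__(self, settings=None):
        # Settings are only read at construction time (assumed key: "devices").
        self.devices = list((settings or {}).get("devices", ["cpu"]))
        self.scope = Scope()  # the single global scope owned by this session

    def close(self):
        # In this sketch, closing the session simply releases its scope.
        self.scope = None


sess_a = Session({"devices": ["cpu"]})
sess_b = Session({"devices": ["cpu"]})
sess_a.scope.vars["w"] = 1.0
sess_b.scope.vars["w"] = 2.0  # independent of sess_a's "w"

sub = sess_a.scope.new_sub_scope()
print(sub.find("w"))  # 1.0, resolved through the parent (global) scope
```

Two sessions never share a scope in this sketch, while sub-scopes stay reachable only through the session that owns their root.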

@helinwang
Contributor Author

@QiJune

It's better to give a C++ or Python class definition of Session.

Added Python interface, please review.


Evaluates the target Operations or Variables in `targets`.

- *targets*: the evaluation targets. Can be a single Operation or

Contributor

Should targets be an instance of "Block"(graph)?

Contributor Author

@helinwang helinwang Sep 28, 2017

Targets will be an OP or the output of an OP (a Var). The "Block" (ProgramDesc, to be exact) will be inferred by eval.
To make the relationship clearer, I have updated the PR; please take a look.


The computation graph is implicitly inferred from the targets.

- *feed_dict*: a dictionary that contains the tensors which overrides

Contributor

Is this the input data?

Contributor Author

@helinwang helinwang Sep 28, 2017

Yes, but not only the input data: feed_dict can override any edge as well. E.g.:

a = pd.constant(1.0, name="a")
b = pd.constant(2.0)
c = pd.mul(a,b)
sess.eval(targets=c, feed_dict={"a":3.0}) # returns 6.0

I have added the above example into the design doc.
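
The override behavior can also be sketched as a self-contained toy evaluator. Everything here (`Var`, `variable`, `mul`, `eval_target`) is a hypothetical stand-in written for illustration; it only mimics how a `feed_dict` entry could replace the value of any named node, and it is not PaddlePaddle code.

```python
class Var:
    """Hypothetical graph node: either a stored value or the output of an op."""

    def __init__(self, name=None, value=None, op=None, inputs=()):
        self.name = name
        self.value = value
        self.op = op
        self.inputs = inputs


def variable(value, name=None):
    return Var(name=name, value=value)


def mul(x, y):
    return Var(op=lambda a, b: a * b, inputs=(x, y))


def eval_target(target, feed_dict=None):
    """Evaluate `target`; a named node present in feed_dict is overridden."""
    feed_dict = feed_dict or {}
    if target.name is not None and target.name in feed_dict:
        return feed_dict[target.name]
    if target.op is None:
        return target.value
    args = [eval_target(node, feed_dict) for node in target.inputs]
    return target.op(*args)


a = variable(1.0, name="a")
b = variable(2.0)
c = mul(a, b)

print(eval_target(c))                         # 2.0
print(eval_target(c, feed_dict={"a": 3.0}))   # 6.0
```

Because the lookup happens at every node, the same mechanism would override an intermediate result if that node had a name.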

close()
```

Closes the session. Calling this method releases the scope.

Contributor

Need save() and load() also to do checkpointing.

Contributor Author

@helinwang helinwang Sep 28, 2017

save and load will be OPs, so the user will need to run something like sess.eval(targets=save) or sess.eval(targets=load).

Similar to TF (which treats save and load as OPs), we can add syntactic sugar that wraps saving and loading the model into something like:

tf.reset_default_graph()
# Create some variables.
v1 = tf.get_variable("v1", [3], initializer = tf.zeros_initializer)
v2 = tf.get_variable("v2", [5], initializer = tf.zeros_initializer)

# Add ops to save and restore only `v2` using the name "v2"
saver = tf.train.Saver({"v2": v2})

# Use the saver object normally after that.
with tf.Session() as sess:
  # Initialize v1 since the saver will not.
  v1.initializer.run()
  saver.restore(sess, "/tmp/model.ckpt")

  print("v1 : %s" % v1.eval())
  print("v2 : %s" % v2.eval())

I think the syntax sugar will not be in the scope of this design doc (maybe more suited for Python API design doc).
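
As a rough illustration of "save and load are OPs run through eval", here is a small Python sketch. The `Session`, `save_op`, and `load_op` names and the JSON checkpoint format are assumptions made for this example only; the design doc does not define them.

```python
import json
import os
import tempfile


class Session:
    """Toy session whose scope is a plain dict (hypothetical)."""

    def __init__(self):
        self.scope = {}

    def eval(self, targets):
        # In this sketch a target is a callable op that receives the scope.
        return targets(self.scope)


def save_op(path):
    """Build an op that writes the scope to `path` when evaluated."""
    def run(scope):
        with open(path, "w") as f:
            json.dump(scope, f)
        return path
    return run


def load_op(path):
    """Build an op that restores variables from `path` into the scope."""
    def run(scope):
        with open(path) as f:
            scope.update(json.load(f))
        return sorted(scope)
    return run


path = os.path.join(tempfile.mkdtemp(), "model.ckpt")

train = Session()
train.scope["w"] = [0.5, 1.5]
train.eval(targets=save_op(path))

restored = Session()
restored.eval(targets=load_op(path))
print(restored.scope)  # {'w': [0.5, 1.5]}
```

Checkpointing then needs no extra methods on the session; a convenience wrapper could be layered on top later.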

Creates a new session. One session owns one scope, so creating
multiple sessions will create different scopes.

- *gpu_ids*: a single `int` or a list of `int` of the GPU IDs to be

Contributor

Setting up devices can take advantage of the paddle v1 design, which by default sets gpu_ids to all of the available GPUs.

Can we use a string to specify devices? There may be devices other than GPUs, such as FPGAs. What TF does is /job:worker/task:1/gpu:0, which could also be /job:worker/task:1/fpga:0.

Contributor Author

@helinwang helinwang Sep 28, 2017

Great idea! Changed gpu_ids to devices.
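
For illustration, a string-based device name in the TF style quoted above could be split into fields as follows. The `parse_device` helper is hypothetical, and the exact device-name format that the design doc adopts is not shown in this thread, so treat the format as an assumption.

```python
def parse_device(spec):
    """Split a TF-style device string such as "/job:worker/task:1/gpu:0"."""
    fields = {}
    for part in spec.strip("/").split("/"):
        key, _, value = part.partition(":")
        # Numeric fields (task index, device index) become ints.
        fields[key] = int(value) if value.isdigit() else value
    return fields


print(parse_device("/job:worker/task:1/gpu:0"))
# {'job': 'worker', 'task': 1, 'gpu': 0}
print(parse_device("/job:worker/task:1/fpga:0"))
# {'job': 'worker', 'task': 1, 'fpga': 0}
```

A string of this kind leaves room for device types beyond GPUs without changing the session constructor's signature.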

Member

I am wondering what the difference is between Session and Executor. The Executor has a Run interface to execute a ProgramDesc created at compile time. The DeviceContext (CPUDeviceContext/CUDADeviceContext) is created and managed by the Executor.
Maybe the Executor is a data member of Session. The Session gets a target and some device IDs. The target is parsed to get a ProgramDesc. Then the ProgramDesc and device IDs are passed to the Executor. The Executor creates a DeviceContextManager according to the device IDs. Finally, the ProgramDesc is executed on the specified hardware.

Contributor Author

@QiJune Great! That logic is clear. We could add one more step, the ProgramOptimizer (currently called Converter), between Session and Executor. Please see this graph for more detail: https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/refactor/distributed_architecture.md#local-training-architecture (In the graph, "PaddlePaddle runtime" means the Executor. Btw, there are many names for the same thing; we need to decide on the naming).

Contributor

@dzhwinter dzhwinter Sep 28, 2017

After the offline discussion with @helinwang, I believe Session is

  • A manager of the resources, which is a higher-level concept than DeviceContextManager. It owns the resources.
  • The job handler.
    • It will construct a new graph according to the targets the user gives.
    • It interacts with the remote cluster or local machine, fetches/feeds tensors, starts/stops running a graph, and so on.
    • The close interface allows the user to release the resources it owns.

The Executor is a runtime concept, which has nothing to do with Session.

Contributor Author

It will construct a new graph according to the targets the user gives.

The session will not "construct a new graph"; it will send the graph to the converter, and the converter will construct a new graph.
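
The division of labor discussed in this thread (the session hands the graph to a converter, and an executor runs the converted graph) can be sketched roughly as below. All classes are hypothetical toys: the graph is a plain dict, the "conversion" is just pruning unused nodes, and the graph is passed to `eval` explicitly instead of being inferred from the targets.

```python
class Converter:
    """Hypothetical: builds the new graph that will actually be run."""

    def convert(self, graph, targets):
        # Toy conversion: keep only the nodes the targets depend on.
        needed, stack = set(), list(targets)
        while stack:
            name = stack.pop()
            if name not in needed:
                needed.add(name)
                stack.extend(graph[name]["inputs"])
        return {name: node for name, node in graph.items() if name in needed}


class Executor:
    """Hypothetical runtime: evaluates nodes of an already converted graph."""

    def run(self, graph, targets):
        cache = {}

        def value(name):
            if name not in cache:
                node = graph[name]
                cache[name] = node["fn"](*[value(i) for i in node["inputs"]])
            return cache[name]

        return [value(t) for t in targets]


class Session:
    """Toy session: it does not build the new graph itself, it delegates."""

    def __init__(self, converter, executor):
        self.converter = converter
        self.executor = executor

    def eval(self, graph, targets):
        converted = self.converter.convert(graph, targets)
        return self.executor.run(converted, targets)


graph = {
    "a": {"fn": lambda: 1.0, "inputs": []},
    "b": {"fn": lambda: 2.0, "inputs": []},
    "c": {"fn": lambda x, y: x + y, "inputs": ["a", "b"]},
    "unused": {"fn": lambda: 0.0, "inputs": []},
}

sess = Session(Converter(), Executor())
print(sess.eval(graph, ["c"]))  # [3.0]
```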


@tonyyang-svail tonyyang-svail mentioned this pull request Oct 2, 2017
Contributor

@dzhwinter dzhwinter left a comment

Do we need to implement this module before 0.11.0?

a = pd.constant(1.0, name="a")
b = pd.constant(2.0)
c = pd.mul(a,b)
sess.eval(targets=c, feed_dict={"a":3.0}) # returns 6.0

Contributor

A constant whose value can be changed looks weird. Maybe it would be better to name them variables?

Contributor Author

Thanks! Done.

)
```

Creates a new session. One session owns one scope, so creating

Contributor

This statement is confusing. One session owns at least one scope, namely, global scope in one single session, right?
Or you mean that one session will have exactly one scope?

Contributor Author

Yes, the global scope. Done.

Creates a new session. One session owns one scope, so creating
multiple sessions will create different scopes.

- *devices*: a single `string` or a list of `string` of device names,

Contributor

Maybe we really need a SessionConfig here? I mean, if one session cannot fully utilize the GPU resource, then another session may also own the same GPU.

In my view, we need to submit the config for each session run call. If it is a local run call, we provide the local config; otherwise, we provide the cluster config. Moreover, if it is an inference run call, we may provide yet another config, which is totally different from the ones above.
We can leave these complexities to be solved in the future, but we need to make the concept clear.

Contributor Author

if one session cannot fully utilize the GPU resource, then another session may also own the same GPU.

Devices only means the devices that the session uses; multiple sessions can use the same device. I have added "Multiple sessions can use the same device." to clear up this point.

If it is a local run call, we provide the local config, vice versa, we provide the cluster config

Local session is created by paddle.session, remote session is created by paddle.remote_session.

if it is an inference run call, we may provide another config

In my view, inference is just the user specifying an inference target, which is no different from training (specifying a training target). They should use the same kind of session. The layers below the session should do the optimization (e.g., based on batch size) transparently, regardless of which session is used.

Contributor

Ok, I see.

@dzhwinter
Contributor

We have discussed this module for a long time and have reached an agreement, so it's better to merge it. For anyone who has questions or a different view, we can have an offline meeting. :)

Contributor

@dzhwinter dzhwinter left a comment

LGTM!

@helinwang helinwang merged commit 15b35f9 into PaddlePaddle:develop Oct 4, 2017
@helinwang helinwang deleted the session branch October 4, 2017 21:34

8 participants