-
Notifications
You must be signed in to change notification settings - Fork 6.8k
[MXNET-1294] Priority-based parameter propagation for improved data parallel training throughput #15124
Conversation
Thanks for your contribution @anandj91. Can you look into the CI failures ? |
07f9818
to
d85967f
Compare
Is there a way to rerun the test cases without doing a git push? |
You could push an empty commit. That should trigger the CI on your PR again. |
@pinaraws @anirudhacharya for review |
7ef001a
to
c64be49
Compare
Modified the code to address the review comments. Sorry for the delay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anandj91 thanks for the update and sorry for the late reply.
- Could you make it a runtime option with environment variable instead of compilation time flag? That makes it easier to distribute the feature (usually people just install mxnet with
pip install
). - Could you also update the env var page: /~https://github.com/apache/incubator-mxnet/blob/master/docs/faq/env_var.md
- Did you do any perf testing after changing the implementation with
NewVar()
s? - For testing, you can add test cases in /~https://github.com/apache/incubator-mxnet/tree/master/tests/nightly and update on CI here: /~https://github.com/apache/incubator-mxnet/blob/7fe478afc4f8bed58b9816ddce57a3c75997c9f2/ci/docker/runtime_functions.sh#L1148
Thanks!
@anandj91 gentle ping, any update on this PR? |
@roywei The current implementation uses multiple Instead I'm planning to introduce a new API for pushpull which combines push and pull of one slice. I had an offline discussion with @eric-haibin-lin and he is fine with this approach. |
This PR is waiting on #15559 for the PushPull API in KVStore. |
I'm facing some design level challenges to properly implement Priority based update (P3) on top of PushPull API. MXNet does a simple load balancing before pushing or pulling key-values by splitting NDArrays equally to the parameter servers. P3 requires a round-robin style parameter distribution which means slicing a large NDArray into thousands of smaller ones. Much more granular than current default distribution strategy and each PS would get more than one slice. With the way mxnet and ps-lite designed right now, ps-lite assumes a single ZPush/ZPull/ZPushPull belongs to a single layer/NDArray. It also assumes that one slice only belong to one PS. These assumption need to be broken for implementing P3. What I have done right now is to add round-robin (RR) distribution strategy along with the default one and use a boolean flag to switch between these two. When user chooses to use RR, KVStore consider each slice as separate key-value pair. Otherwise fallback to the default mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments on readability.
What would be the user experience for an end-user if he is interested in P3?
@@ -67,6 +68,7 @@ def __init__(self, handle): | |||
self._updater = None | |||
self._updater_func = None | |||
self._str_updater_func = None | |||
self._is_p3 = (os.getenv('DMLC_PS_VAN_TYPE', '') == 'p3') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will u also change launch.py?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. I just wanted to confirm if we are planning to only add --p3 to launch.py now and not --horovod or --byteps because of the reasons I mentioned in the previous comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify a bit more on this? According to this RFC (#16795), p3 will be treated as another kvstore just like byteps/horovod. Why would it be different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anandj91 and I had a recent discussion. Having separate python kvstore class for p3 leads to lots of duplicated code because it also supports both device and local mode. Its source code lives inside mxnet, and sharing the same python KVStore class is reasonable in terms of complexity and maintenance efforts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apeforest We do have a separate class for P3 in the backend though.
python/mxnet/kvstore/kvstore.py
Outdated
@@ -349,6 +363,7 @@ def pushpull(self, key, value, out=None, priority=0): | |||
|
|||
out: NDArray or list of NDArray or list of list of NDArray | |||
Values corresponding to the keys. | |||
`out` should have length as `value`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there such a restriction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is grouping of keys and corresponding ndarrays happening at the backend based on their positions in the list before performing kv operations. Since we are using same key list for both value and out, their layouts should be identical. length check is just for a sanity.
this is the reason why I had both vkeys and okeys separate in the previous implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. However, if there's one key, having value = a list of 4 arrays, and out = a list of one array is also valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in that case we need to bring back separate key list for value and out.
0358357
to
876744b
Compare
P3StoreDist extends KVStoreDist class to perform Push, Pull and PushPull based on the priority of the packets * Use --p3 flag in launch.py script to use P3 * DMLC_PS_VAN_TYPE - to turn on P3 with distributed kvstore * DMLC_PS_WATER_MARK - to set the maximum in-flight packets in the network queue Note: Currently P3 does not support: 1. operations on sparse ndarrays 2. gradient compression 3. update on kvstore server
@anandj91 great work! |
P3StoreDist extends KVStoreDist class to perform Push, Pull and PushPull based on the priority of the packets * Use --p3 flag in launch.py script to use P3 * DMLC_PS_VAN_TYPE - to turn on P3 with distributed kvstore * DMLC_PS_WATER_MARK - to set the maximum in-flight packets in the network queue Note: Currently P3 does not support: 1. operations on sparse ndarrays 2. gradient compression 3. update on kvstore server
Description
https://issues.apache.org/jira/browse/MXNET-1294
During data parallel training , the parameters of each layer is synchronized through parameter server after slicing them into smaller slices and synchronizing them independent to each other based on their priority. Currently priority of a layer is set as its negative index.
For more details, refer https://anandj.in/wp-content/uploads/sysml.pdf.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments