Conversation
@mxnet-label-bot add [pr-awaiting-review]
@ptrendx can you share a benchmark on SGD performance when
This PR is part of upstreaming improvements to MXNet that are available in NVIDIA's NGC 18.11 MXNet container. I will use results from that container to show the impact once all the other improvements are in place. The benchmark shown is ResNet v1.5 training on single V100 32GB in DGX1-V, batch size 32.
Thanks for the contribution @ptrendx!
@@ -98,6 +99,9 @@ def dict_equ(a, b):

@with_seed()
Is it not tested with test_trainer?
Are you asking why I am not changing test_trainer as well, since it should fail with the MXNET_UPDATE_ON_KVSTORE=0 option set? Since you made a PR to fix that test, I did not change it. The MXNET_UPDATE_ON_KVSTORE=0 option is not set in CI (although the logic for the aggregated SGD itself is tested by the SGD test).
@ptrendx Can you please rebase this PR?
In the PR description you said that test_sgd covers the new code paths. But in _update_impl there is an if statement on aggregate:

```python
if aggregate:
    ...
else:
    ...
```

Can you explain how the else block is covered by the current test_sgd code?
for weight, grad in zip(weights, grads):
    assert(isinstance(weight, NDArray))
    assert(isinstance(grad, NDArray))
    aggregate = (aggregate and
@anirudhacharya As you can see, aggregate is set to True at the beginning and changes to False when encountering a non-default storage type, so testing with both dense and sparse data tests both branches of the code.
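For anyone reading along, here is a minimal sketch of the flag logic being discussed (my own illustration, not the actual diff; the helper name is made up), which is why one dense case plus one sparse case in test_sgd touches both branches:

```python
from mxnet.ndarray import NDArray

def uses_aggregate_path(weights, grads):
    # Mirrors the loop above: `aggregate` starts out True and is cleared
    # as soon as any weight or gradient has a non-default storage type
    # (e.g. 'row_sparse'), which routes the update to the per-tensor path.
    aggregate = True
    for weight, grad in zip(weights, grads):
        assert isinstance(weight, NDArray)
        assert isinstance(grad, NDArray)
        aggregate = (aggregate and
                     weight.stype == 'default' and
                     grad.stype == 'default')
    return aggregate
```

Dense-only inputs keep the aggregated branch; any sparse input exercises the else branch.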
thanks!
template<typename DType, typename MPDType>
struct MultiSGDKernelParam {
  static const int N = 60;
@anirudhacharya This is the reason for 60: I pass this struct as a kernel parameter, which has a limit of 4 kB.
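A rough back-of-the-envelope check of why N = 60 stays under that limit (the field sizes here are assumptions for illustration, not the exact struct layout):

```python
# Assumed per-entry footprint: five 8-byte device pointers (weight, grad,
# momentum, fp32 master weight, output) plus three 4-byte scalars
# (size, lr, wd). The real MultiSGDKernelParam layout may differ slightly.
per_entry_bytes = 5 * 8 + 3 * 4
print(60 * per_entry_bytes)  # 3120 bytes, comfortably below the 4096-byte limit
```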
Is there anything else needed for this PR?
I think the code LGTM; some minor doc fixes are needed.
@@ -105,6 +110,7 @@ def __init__(self, rescale_grad=1., param_idx2name=None, wd=0.,
        self._index_update_count = {}
        self.clip_gradient = clip_gradient
        self.multi_precision = multi_precision
        self.aggregate_num = 0
Please add this to the parameter list in the class doc.
It is not really a parameter though - it is up to the optimizer (not the user) to override this value if they support aggregated execution.
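For illustration only (a hypothetical subclass, not code from this PR), this is the kind of opt-in an optimizer author would do:

```python
import os
from mxnet.optimizer import SGD

class MyAggregatingSGD(SGD):
    """Hypothetical optimizer that opts in to aggregated updates."""
    def __init__(self, **kwargs):
        super(MyAggregatingSGD, self).__init__(**kwargs)
        # The optimizer itself (not the user) overrides aggregate_num,
        # mirroring how SGD reads MXNET_OPTIMIZER_AGGREGATION_SIZE below.
        self.aggregate_num = int(os.getenv('MXNET_OPTIMIZER_AGGREGATION_SIZE', '4'))
```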
@@ -502,6 +545,7 @@ def __init__(self, momentum=0.0, lazy_update=True, **kwargs):
        super(SGD, self).__init__(**kwargs)
        self.momentum = momentum
        self.lazy_update = lazy_update
        self.aggregate_num = int(os.getenv('MXNET_OPTIMIZER_AGGREGATION_SIZE', "4"))
In line 510 can you add a section on aggregate updates, and in line 524 can you also point to these two methods - multi_sgd_mom_update and multi_mp_sgd_update - as optimizer update rules.
Will do.
I wrote the section on aggregate updates, but I'm not sure about pointing to the new methods in line 524 - they use the same algorithm as the sgd_update and sgd_mom_update functions, so pointing to those functions for the details of the algorithm seems sufficient.
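For reference, both the single-tensor and the 'multi' variants apply the same momentum SGD rule; a sketch of the standard form (see the sgd_mom_update operator docs for the authoritative notation), with learning rate η, momentum γ, and weight decay wd:

```latex
\begin{aligned}
g_t &= \text{rescale\_grad} \cdot \nabla J(W_{t-1}) + wd \cdot W_{t-1} \\
v_t &= \gamma \, v_{t-1} - \eta \, g_t \\
W_t &= W_{t-1} + v_t
\end{aligned}
```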
I still think the 'multi' update methods should show up in the SGD doc description. But I am okay with the code owner/merger making a call on this.
I don't think it's necessary to point to these two methods, since the algorithm is the same.
LGTM. @eric-haibin-lin, open question brought up by @anirudh2290: in your opinion, should the 'multi' update methods show up in the SGD doc description?
Looks good, pending some suggestions for documentation. Awesome work!
* Aggregate SGD
* Make OpWrapperGenerator understand Tuple<float>
* Trigger
* Add NNVM Tuple to cpp-package op.h
* Trigger
* Fix pylint aggregate SGD
* Update info about new ENV vars and modifying 2 tests that require update_on_kvstore to be true
* Fix
* Aggregate SGD support for Gluon trainer
* Added text to doc about aggregate update in SGD optimizer
* Docs changes from review
This reverts commit 0a45e1a.
This reverts commit fabc318.
Missing type information for some parameters, e.g. from https://mxnet.incubator.apache.org/api/python/symbol/symbol.html#mxnet.symbol.multi_mp_sgd_mom_update
This should be And
Description
Currently MXNet optimizers are invoked one weight at a time. This leads to a lot of synchronization overhead, as updates (especially for convolutions and batchnorm) tend to be small, but each one needs to be synchronized on.
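A usage sketch (my own illustration, not part of the diff) of how the aggregated path is enabled end to end, assuming the two environment variables described in the Changes below:

```python
import os

# Both variables are read when the optimizer / trainer are created,
# so set them before importing and building anything.
os.environ['MXNET_UPDATE_ON_KVSTORE'] = '0'           # do updates outside the kvstore
os.environ['MXNET_OPTIMIZER_AGGREGATION_SIZE'] = '4'  # max weights bundled per kernel launch

import mxnet as mx
from mxnet import gluon

net = gluon.nn.Dense(10)
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1, 'momentum': 0.9})
# With updates off the kvstore, trainer.step() would let the SGD optimizer
# bundle several weight updates into a single aggregated kernel launch.
```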
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
* Ability to set the update_on_kvstore value via the environment variable MXNET_UPDATE_ON_KVSTORE (default is 1, which is consistent with the current behavior).
* When update_on_kvstore is False, in the case of the SGD optimizer it attempts to bundle updates of multiple weights together and launches a single kernel to perform them all, reducing the number of kernel calls and synchronizations.
Comments