This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Dependency Update] Upgrade cuDNN & NCCL #14988

Merged
szha merged 1 commit into apache:master from bump_up_cudnn_to_7_5_1 on May 20, 2019

Conversation

stu1130
Contributor

@stu1130 stu1130 commented May 17, 2019

Description

Since the CI has been upgraded to cuDNN 7.5.1 (#14950), we can upgrade the CUDA 9.0/9.2/10.0 builds to the latest cuDNN 7.5.1 & NCCL 2.4.2.
@perdasilva, please take a look.

Checklist

Ran three models: ResNet50 on ImageNet, LSTM on PTB, and MLP on MNIST
Performance results are shown below
Environment: P3.16xlarge, Deep Learning Base AMI
Codebase: commit 1540a84
I also applied the change from PR #14837
Throughput is measured in samples per second
Each throughput value is the average of 5 runs

ResNet

model: ResNet50
dataset: ImageNet
number of GPUs: 8
epochs: 3 (only to test throughput)
preprocess command: sudo pip install gluoncv==0.2.0b20180625
command: python mxnet_benchmark/train_imagenet.py --use-rec --batch-size 128 --dtype float32 --num-data-workers 40 --num-epochs 3 --gpus 0,1,2,3,4,5,6,7 --lr 0.05 --last-gamma --mode symbolic --model resnet50_v1b --rec-train /home/ubuntu/data/train-passthrough.rec --rec-train-idx /home/ubuntu/data/train-passthrough.idx --rec-val /home/ubuntu/data/val-passthrough.rec --rec-val-idx /home/ubuntu/data/val-passthrough.idx
github repo: https://github.com/rahul003/deep-learning-benchmark-mirror.git

| Throughput (samples/s) | cuDNN 7.5.1 / NCCL 2.4.2 | cuDNN 7.3.1 / NCCL 2.3.4 | Performance difference |
| --- | --- | --- | --- |
| CUDA 10 | 2831.54405 | 2821.9832 | 0.339% |
| CUDA 9.2 | 2832.36803 | 2843.28968 | -0.384% |
| CUDA 9.0 | 2815.83939 | 2851.92915 | -1.265% |
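
The difference column in the table above appears to be the relative change of the new stack against the old one, i.e. (new - old) / old. A minimal sketch to reproduce it, assuming that formula; the names `RESNET_THROUGHPUT` and `percent_difference` are illustrative and not part of the benchmark scripts:

```python
# Throughput in samples/second, copied from the ResNet table above.
# Tuple order: (cuDNN 7.5.1 / NCCL 2.4.2, cuDNN 7.3.1 / NCCL 2.3.4)
RESNET_THROUGHPUT = {
    "CUDA 10": (2831.54405, 2821.9832),
    "CUDA 9.2": (2832.36803, 2843.28968),
    "CUDA 9.0": (2815.83939, 2851.92915),
}


def percent_difference(new, old):
    """Relative change of `new` versus `old`, in percent (assumed formula)."""
    return (new - old) / old * 100.0


for cuda, (new, old) in RESNET_THROUGHPUT.items():
    # A negative value means the upgraded stack is slower.
    print(f"{cuda}: {percent_difference(new, old):+.3f}%")
```

Rounded to three decimals, this matches the percentages reported for the three ResNet rows.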

Note: there is another performance regression with --batch-size 256 --dtype float16 --mode hybrid; see #14838 for details.

LSTM

model: LSTM
dataset: PTB (Penn Treebank)
number of GPUs: 1
epochs: 10
command:
python2 benchmark_driver.py --framework mxnet --task-name mkl_lstm_ptb_symbolic --num-gpus 1 --epochs 10 --metrics-suffix test --kvstore local
python word_language_model/lstm_bucketing.py --num-hidden 650 --num-embed 650 --gpus 0 --epochs 10 --kv-store local

| Throughput (samples/s) | cuDNN 7.5.1 / NCCL 2.4.2 | cuDNN 7.3.1 / NCCL 2.3.4 | Performance difference |
| --- | --- | --- | --- |
| CUDA 10 | 847.98222 | 868.28966 | -2.339% |
| CUDA 9.2 | 1005.25185 | 1051.06692 | -4.359% |
| CUDA 9.0 | 1002.59081 | 1028.46962 | -2.516% |

CUDA 10 has a performance regression issue; see #14725 for details.

MLP

model: 3 dense layers with num_hidden=64 and ReLU activation
dataset: MNIST
number of GPUs: 1
epochs: 10
command:
python2 benchmark_runner.py --framework mxnet --metrics-policy mlp --task-name mlp --metrics-suffix test --num-gpus 1 --command-to-execute 'python3 mlp.py' --data-set mnist

| Throughput (samples/s) | cuDNN 7.5.1 / NCCL 2.4.2 | cuDNN 7.3.1 / NCCL 2.3.4 | Performance difference |
| --- | --- | --- | --- |
| CUDA 10 | 4638.73873 | 4500.7834 | 3.065% |
| CUDA 9.2 | 4425.37599 | 4540.29583 | -2.531% |
| CUDA 9.0 | 4421.82611 | 4427.43356 | -0.127% |
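
When reading results like these, it can help to flag only the rows whose slowdown exceeds some tolerance. The sketch below does that for the MLP numbers above. The 2% tolerance is an assumption chosen purely for illustration and is not a threshold used in this PR; `find_regressions` is a hypothetical helper name.

```python
# MLP throughput in samples/second, copied from the table above.
# Tuple order: (cuDNN 7.5.1 / NCCL 2.4.2, cuDNN 7.3.1 / NCCL 2.3.4)
MLP_THROUGHPUT = {
    "CUDA 10": (4638.73873, 4500.7834),
    "CUDA 9.2": (4425.37599, 4540.29583),
    "CUDA 9.0": (4421.82611, 4427.43356),
}

# Assumption: illustrative tolerance only, not taken from the PR.
TOLERANCE_PERCENT = 2.0


def find_regressions(table, tolerance_percent):
    """Return {config: percent change} for rows slower than the tolerance."""
    flagged = {}
    for name, (new, old) in table.items():
        change = (new - old) / old * 100.0
        if change < -tolerance_percent:
            flagged[name] = change
    return flagged


for name, change in find_regressions(MLP_THROUGHPUT, TOLERANCE_PERCENT).items():
    print(f"{name}: {change:+.3f}% exceeds the {TOLERANCE_PERCENT}% tolerance")
```

With the assumed 2% tolerance only the CUDA 9.2 row is flagged; a different tolerance would give a different picture, so the choice should reflect the run-to-run noise of the benchmark.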

Comments

@szha @lanking520 @eric-haibin-lin @perdasilva

@stu1130 stu1130 requested a review from szha as a code owner May 17, 2019 18:14
@stu1130 stu1130 changed the title from "[Dependency Update] Upgrade cuDNN & NCCL" to "[WIP][Dependency Update] Upgrade cuDNN & NCCL" on May 17, 2019
@stu1130 stu1130 changed the title from "[WIP][Dependency Update] Upgrade cuDNN & NCCL" to "[Dependency Update] Upgrade cuDNN & NCCL" on May 17, 2019
@stu1130 stu1130 force-pushed the bump_up_cudnn_to_7_5_1 branch 2 times, most recently from e637967 to d27477f on May 19, 2019 06:22
@stu1130 stu1130 force-pushed the bump_up_cudnn_to_7_5_1 branch from d27477f to bf239ce on May 20, 2019 00:18
Contributor

@perdasilva perdasilva left a comment

LGTM - thank you for all your efforts getting CI to cuda v10.1 and the latest cudnn - very nice indeed!

@pinaraws

@mxnet-label-bot add[CI, pr-awaiting-merge]

@marcoabreu marcoabreu added the CI and pr-awaiting-merge (Review and CI is complete. Ready to Merge) labels on May 20, 2019
@szha szha merged commit ace478f into apache:master May 20, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
Labels
CI, pr-awaiting-merge (Review and CI is complete. Ready to Merge)
5 participants