optimize performance of offload in dygraph sharding stage2 #38064

haohongxiang · 2021-12-12T14:37:04Z

PR types

Performance optimization

PR changes

Others

Describe

optimize performance of offload in dygraph sharding stage2

After performance optimization:
1. Precision (PaddleNLP GPT-3 model):
sharding stage2 + fp16 + gradient accumulation (with offload versus without offload)

2. Performance (PaddleNLP GPT-3 model):

hidden size=1024, layer_num=4, global batch size=16, micro batch size=2, sharding_degree=2
before optimization -- speed: 0.23-0.24 step/s; ips: 3700-3800 tokens/s
after optimization -- speed: 0.58-0.60 step/s; ips: 9500-9800 tokens/s

3. Peak GPU memory (PaddleNLP GPT-3 model):

0.31B parameters -- 3137 MiB (no difference made by performance optimization)
1.02B parameters -- 5369 MiB (no difference made by performance optimization)
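
For a quick read of the performance figures in item 2, the reported ranges can be turned into a speedup range. This is only a sketch of the arithmetic; the helper name `speedup_range` is hypothetical and not part of this PR, and the inputs are the numbers quoted above.

```python
def speedup_range(before, after):
    """Return (min, max) speedup from (low, high) ranges measured before and after."""
    b_lo, b_hi = before
    a_lo, a_hi = after
    # Most pessimistic ratio: slowest "after" over fastest "before".
    # Most optimistic ratio: fastest "after" over slowest "before".
    return a_lo / b_hi, a_hi / b_lo


# Figures quoted in the PR description (step/s and tokens/s).
step_speedup = speedup_range((0.23, 0.24), (0.58, 0.60))
token_speedup = speedup_range((3700, 3800), (9500, 9800))

print("step/s speedup:   %.2fx to %.2fx" % step_speedup)
print("tokens/s speedup: %.2fx to %.2fx" % token_speedup)
```

Both metrics work out to roughly a 2.4x to 2.6x improvement, with peak GPU memory unchanged.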

paddle-bot-old · 2021-12-12T14:37:08Z

Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.

ForFishes

LGTM

ForFishes · 2021-12-14T06:47:47Z

python/paddle/distributed/fleet/meta_optimizers/dygraph_optimizer/sharding_optimizer_stage2.py

 from ...meta_parallel.sharding.sharding_utils import Type, device_guard, ShardingClipGrad

 # CUDA alignment 256 bytes
-alignment = {"gpu": 256, }
+alignment = {"gpu": 256, "cpu": 256}


cpu: 256? This value doesn't look right, does it?
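
For context on what an alignment table like this is typically used for, here is a minimal sketch of padding a parameter's element count so that its byte size is a multiple of the per-device alignment. It is an illustration under stated assumptions, not the code from `sharding_optimizer_stage2.py`: the helper `padded_numel` and the `DTYPE_BYTES` table are hypothetical, and only the `alignment` values are taken from the diff above.

```python
# Alignment in bytes per device, mirroring the dict in the diff above.
ALIGNMENT = {"gpu": 256, "cpu": 256}

# Assumed element sizes in bytes (hypothetical table for this sketch).
DTYPE_BYTES = {"fp16": 2, "fp32": 4}


def padded_numel(numel, dtype, device):
    """Return numel rounded up so the buffer size is a multiple of the alignment."""
    elem_size = DTYPE_BYTES[dtype]
    nbytes = numel * elem_size
    remainder = nbytes % ALIGNMENT[device]
    # Bytes needed to reach the next alignment boundary (0 if already aligned).
    pad_bytes = 0 if remainder == 0 else ALIGNMENT[device] - remainder
    return numel + pad_bytes // elem_size


print(padded_numel(1000, "fp16", "gpu"))  # 2000 bytes padded up to 2048 bytes
print(padded_numel(128, "fp32", "cpu"))   # 512 bytes, already aligned
```

With this kind of scheme, the `"cpu"` entry only matters when buffers are placed on the host, which is the offload case this PR targets; whether 256 is the right value there is the question raised in the review comment.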

haohongxiang added 4 commits December 19, 2021 13:02

update

69c5cc5

fix bugs

205aee6

modify code style

0b610c4

fix bugs of _get_global_group

97de87d

ForFishes approved these changes Dec 21, 2021

ForFishes merged commit f74ebd8 into PaddlePaddle:develop Dec 21, 2021

haohongxiang mentioned this pull request Dec 23, 2021

support offload in sharding stage2 #37904

Merged
