[OP] Accelerate GPU version of LayerNorm(axis=-1) #14935
Conversation
We tested the speed of LayerNorm(axis=-1) with different batch, channel, and dtype combinations. The results are listed below. We use both the nvprof timer and the python timer. We ran the speed test on a p3.2xlarge machine (V100). All experiments are repeated 3 times and the average running time is reported.

Benchmark code: /~https://github.com/sxjscience/benchmark_ops/blob/master/gen_layernorm_benchmark.py

To reproduce, run the following commands:

```
git clone /~https://github.com/sxjscience/benchmark_ops.git
cd benchmark_ops
python gen_layernorm_benchmark.py
```

PyTorch + Apex + FP32 --> time in microseconds (us), apex: /~https://github.com/NVIDIA/apex

Table columns: Forward (nvprof timer), Backward (nvprof timer), Backward Data (nvprof timer), Backward Gamma & Beta (nvprof timer), Forward (python timer), Backward (python timer).
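For context, the PyTorch + Apex baseline presumably times apex's fused layer norm. Below is a minimal sketch of how such a python-timer measurement could look; it is not the author's benchmark script, and the shape and iteration counts are illustrative only.

```python
# Sketch of a python-timer measurement for the PyTorch + Apex baseline.
# Assumes apex is installed and a CUDA device is available.
import time
import torch
from apex.normalization import FusedLayerNorm

batch, channel = 128, 1024                      # hypothetical shape, not the benchmark's grid
x = torch.randn(batch, channel, device='cuda')
ln = FusedLayerNorm(channel).cuda()

for _ in range(10):                             # warm-up
    ln(x)
torch.cuda.synchronize()

start = time.time()
for _ in range(100):
    ln(x)
torch.cuda.synchronize()                        # wait for the GPU before reading the timer
print('forward (python timer): %.1f us' % ((time.time() - start) / 100 * 1e6))
```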
MXNet (new kernel) + FP32

According to nvprof, the performance of the new kernel matches that of nvidia/apex. However, if we check the overall running time of the python script, MXNet is much slower than PyTorch. This is caused by some other overheads and is not related to the CUDA kernel.

Table columns: Forward (nvprof timer), Backward (nvprof timer), Backward Data (nvprof timer), Backward Gamma & Beta (nvprof timer), Forward (python timer), Backward (python timer).
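To illustrate the difference between the two timers: nvprof reports only the kernel time, while the python timer also captures the framework overhead mentioned above. A minimal sketch of a python-timer measurement of mx.nd.LayerNorm with axis=-1 is shown below; this is not gen_layernorm_benchmark.py, and the shape and iteration counts are hypothetical.

```python
# Sketch of a python-timer measurement of LayerNorm(axis=-1) in MXNet.
import time
import mxnet as mx

ctx = mx.gpu(0)
batch, channel = 128, 1024                            # hypothetical shape
data = mx.nd.random.normal(shape=(batch, channel), ctx=ctx)
gamma = mx.nd.ones((channel,), ctx=ctx)
beta = mx.nd.zeros((channel,), ctx=ctx)

data.attach_grad()
for _ in range(10):                                   # warm-up, including backward
    with mx.autograd.record():
        out = mx.nd.LayerNorm(data, gamma, beta, axis=-1)
    out.backward()
mx.nd.waitall()

start = time.time()
for _ in range(100):
    out = mx.nd.LayerNorm(data, gamma, beta, axis=-1)
mx.nd.waitall()                                       # block until the async GPU work is done
print('forward (python timer): %.1f us' % ((time.time() - start) / 100 * 1e6))
```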
Nice work! Can you retrigger CI?
PyTorch + Apex + FP16

Table columns: Forward (nvprof timer), Backward (nvprof timer), Backward Data (nvprof timer), Backward Gamma & Beta (nvprof timer), Forward (python timer), Backward (python timer).
MXNet (new kernel) + FP16

Table columns: Forward (nvprof timer), Backward (nvprof timer), Backward Data (nvprof timer), Backward Gamma & Beta (nvprof timer), Forward (python timer), Backward (python timer).
@mxnet-label-bot add[Operator, pr-awaiting-review]
Description
Accelerate the GPU version of LayerNorm when axis=-1.
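For reference, LayerNorm over the last axis normalizes each row by its own mean and variance and then applies a per-channel scale and shift. The NumPy sketch below (illustrative only; the helper name and eps value are hypothetical) shows the computation the accelerated GPU kernel performs for axis=-1.

```python
import numpy as np

def layer_norm_last_axis(x, gamma, beta, eps=1e-5):
    # Each row is normalized with its own mean/variance over the last axis,
    # then scaled by gamma and shifted by beta (both of shape (channel,)).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta

x = np.random.randn(4, 8).astype(np.float32)
gamma = np.ones(8, dtype=np.float32)
beta = np.zeros(8, dtype=np.float32)
print(layer_norm_last_axis(x, gamma, beta).shape)   # (4, 8)
```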