optimize logsumexp in small data scale #52952

Asthestarsfalll · 2023-04-16T06:28:57Z

PR types

Performance optimization

PR changes

OPs

Description

optimize logsumexp in small data scale

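The idea is that each thread processes ColsPerThread elements. When the input is so small that too few thread groups are launched, each thread additionally processes multiple rows to improve instruction-level parallelism.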
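
As a rough illustration of the partitioning described above, here is a CPU-side reference sketch, not the PR's CUDA kernel. It simulates lanes that each reduce `cols_per_thread` elements of a row and then combines the partial results using the usual max-shifted form `log(sum(exp(x - max))) + max`. The names `RowLogSumExp` and `LogSumExp2D` are hypothetical, and a serial loop cannot reproduce the instruction-level parallelism the PR targets.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference: simulates lanes that each reduce
// `cols_per_thread` consecutive elements of one row.
float RowLogSumExp(const float* row, int cols, int cols_per_thread) {
  assert(cols > 0 && cols_per_thread > 0);
  const int num_lanes = (cols + cols_per_thread - 1) / cols_per_thread;

  // Pass 1: every lane scans its chunk; the chunk maxima combine into the row max.
  float row_max = -std::numeric_limits<float>::infinity();
  for (int lane = 0; lane < num_lanes; ++lane) {
    const int begin = lane * cols_per_thread;
    const int end = std::min(begin + cols_per_thread, cols);
    for (int c = begin; c < end; ++c) row_max = std::max(row_max, row[c]);
  }

  // Pass 2: every lane accumulates exp(x - max) over its chunk; the partial
  // sums are then added (a warp reduction would do this step on a GPU).
  float row_sum = 0.0f;
  for (int lane = 0; lane < num_lanes; ++lane) {
    const int begin = lane * cols_per_thread;
    const int end = std::min(begin + cols_per_thread, cols);
    float partial = 0.0f;
    for (int c = begin; c < end; ++c) partial += std::exp(row[c] - row_max);
    row_sum += partial;
  }
  return std::log(row_sum) + row_max;
}

// Applies the row reduction to a row-major [rows, cols] matrix.
std::vector<float> LogSumExp2D(const std::vector<float>& x, int rows, int cols,
                               int cols_per_thread) {
  assert(x.size() == static_cast<std::size_t>(rows) * cols);
  std::vector<float> out(rows);
  for (int r = 0; r < rows; ++r) {
    out[r] = RowLogSumExp(x.data() + static_cast<std::size_t>(r) * cols, cols,
                          cols_per_thread);
  }
  return out;
}
```

Subtracting the row maximum before exponentiating keeps `exp` from overflowing for large inputs, which is why the result is `log(sum) + max`.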

Current forward performance (averaged over 1000 runs):

| Case No. | Device | input_shape | input_type | New Paddle perf (ms) | Diff vs. original Paddle | Diff vs. PyTorch |
|---|---|---|---|---|---|---|
| 1 | Tesla V100 | [64L, 64L] | float32 | 0.003245 | 1555.2% faster | 840.06% faster |
| 2 | Tesla V100 | [1024L, 512L] | float32 | 0.004887 | 14769.2% faster | 696.6% faster |
| 3 | Tesla V100 | [64L, 64L] | float16 | 0.0032332 | 1517.9% faster | 875.2% faster |
| 4 | Tesla V100 | [1024L, 512L] | float16 | 0.0045824 | 15773.3% faster | 715.7% faster |

Related PR: #52509
How the diff is computed: (old_time - new_time) / new_time
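
To make the percentages easier to read, the formula above can be sketched as a pair of helpers. The function names are hypothetical. Inverting the formula for case 1 (0.003245 ms, 1555.2%) suggests a baseline of roughly 0.0537 ms, assuming the reported figure was computed exactly this way.

```cpp
#include <cassert>

// Relative speedup in percent: (old_time - new_time) / new_time * 100.
double SpeedupPercent(double old_time_ms, double new_time_ms) {
  assert(new_time_ms > 0.0);
  return (old_time_ms - new_time_ms) / new_time_ms * 100.0;
}

// Inverse of the formula: the baseline time implied by a reported percentage.
double ImpliedOldTimeMs(double new_time_ms, double speedup_percent) {
  assert(new_time_ms > 0.0);
  return new_time_ms * (1.0 + speedup_percent / 100.0);
}
```

Under this definition, "100% faster" means the old implementation took twice as long as the new one.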

paddle-bot · 2023-04-16T06:29:02Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

Asthestarsfalll · 2023-04-19T05:00:28Z

@JamesLim-sy Could you please take a first look at this?

JamesLim-sy · 2023-04-20T06:50:31Z

> @JamesLim-sy Could you please take a first look at this?

I have been a bit busy these past two days; I will post my review comments tonight.

JamesLim-sy · 2023-04-25T08:14:27Z

paddle/phi/kernels/gpu/logsumexp_function.cu.h

+  HANDLE_THREAD_GROUP(29)
+  HANDLE_THREAD_GROUP(30)
+  HANDLE_THREAD_GROUP(31)
+  HANDLE_THREAD_GROUP(32)


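This expansion is rather brute-force. Could RowsPerThread be passed as a runtime argument instead of a template parameter, with the local array in the __global__ kernel sized directly to [max value of RowsPerThread and max value of ColsPerThread]? One thing to watch out for, though: whether the double type would lead to excessive local memory usage.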

If the kernel launch fails, should we then fall back to LogsumexpFallbackKernel?
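
The two ideas in this exchange, a runtime RowsPerThread backed by a buffer sized to a compile-time maximum, and a fallback when the fast path cannot be used, can be sketched on the CPU as follows. This is only an illustration under assumptions: it uses a plain row sum rather than logsumexp, `kMaxRowsPerThread` and all function names are hypothetical, and a rejected parameter stands in for a failed kernel launch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compile-time upper bound for the per-thread buffer.
constexpr int kMaxRowsPerThread = 4;

// Fast path: rows_per_thread is a runtime value, but the local buffer is sized
// to the maximum. Returns false if the request does not fit the buffer.
bool SumRowsFixedBuffer(const std::vector<float>& x, int rows, int cols,
                        int rows_per_thread, std::vector<float>* out) {
  if (rows_per_thread < 1 || rows_per_thread > kMaxRowsPerThread) return false;
  assert(x.size() == static_cast<std::size_t>(rows) * cols);
  out->assign(rows, 0.0f);
  float local[kMaxRowsPerThread];
  for (int base = 0; base < rows; base += rows_per_thread) {
    const int n = std::min(rows_per_thread, rows - base);
    std::fill(local, local + n, 0.0f);
    // Accumulate n rows together, one column at a time.
    for (int c = 0; c < cols; ++c) {
      for (int i = 0; i < n; ++i) local[i] += x[(base + i) * cols + c];
    }
    std::copy(local, local + n, out->begin() + base);
  }
  return true;
}

// Generic path, standing in for a fallback kernel.
void SumRowsFallback(const std::vector<float>& x, int rows, int cols,
                     std::vector<float>* out) {
  assert(x.size() == static_cast<std::size_t>(rows) * cols);
  out->assign(rows, 0.0f);
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) (*out)[r] += x[r * cols + c];
  }
}

// Dispatcher: try the fast path first, otherwise use the fallback.
void SumRows(const std::vector<float>& x, int rows, int cols,
             int rows_per_thread, std::vector<float>* out) {
  if (!SumRowsFixedBuffer(x, rows, cols, rows_per_thread, out)) {
    SumRowsFallback(x, rows, cols, out);
  }
}
```

The trade-off raised in the review still applies: a buffer sized for the maximum costs the same storage for every call, and wider element types such as double make it larger.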

paddle/phi/kernels/gpu/logsumexp_kernel.cu

paddle-ci-bot · 2023-04-29T03:28:10Z

Sorry to inform you that d0b8f5d's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

JamesLim-sy · 2023-05-04T05:12:48Z

paddle/phi/kernels/gpu/logsumexp_function.cu.h

+      out[cur_row + row_id] =
+          static_cast<SourceType>(log(warp_sum[row_id]) + warp_max[row_id]);
+    }
+  }


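I noticed that the writes here are non-contiguous. Could this be changed so that the vectorizable part is written out contiguously with vectorized stores, and only the non-vectorizable part uses non-contiguous writes? Alternatively, have threadIdx_0 write data_0, data_32, data_64 and threadIdx_1 write data_1, data_33, data_65, and so on, so that each thread no longer writes with stride = RowsPerThread.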

What do you mean here? I did not quite follow.
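
The two write patterns contrasted in the review comment above can be sketched as index mappings. This is a hedged illustration only; the function names are hypothetical and the code computes indices on the CPU rather than issuing GPU stores.

```cpp
#include <cassert>
#include <vector>

// Strided pattern: thread t owns rows_per_thread consecutive outputs, so at any
// given step neighbouring threads write addresses rows_per_thread apart.
int StridedIndex(int thread_id, int step, int rows_per_thread) {
  assert(step >= 0 && step < rows_per_thread);
  return thread_id * rows_per_thread + step;
}

// Interleaved pattern: at any given step neighbouring threads write adjacent
// addresses (thread 0 -> 0, 32, 64; thread 1 -> 1, 33, 65 with 32 threads).
int InterleavedIndex(int thread_id, int step, int num_threads) {
  assert(thread_id >= 0 && thread_id < num_threads);
  return step * num_threads + thread_id;
}

// Output indices written by all threads during one step of the write loop.
std::vector<int> IndicesAtStep(bool interleaved, int step, int num_threads,
                               int rows_per_thread) {
  std::vector<int> indices;
  for (int t = 0; t < num_threads; ++t) {
    indices.push_back(interleaved ? InterleavedIndex(t, step, num_threads)
                                  : StridedIndex(t, step, rows_per_thread));
  }
  return indices;
}
```

With 4 threads and 3 rows per thread, step 0 touches indices 0, 3, 6, 9 in the strided pattern and 0, 1, 2, 3 in the interleaved one. Adjacent addresses across neighbouring threads are what allow a GPU to coalesce the stores.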

Asthestarsfalll · 2023-05-18T10:22:23Z

@JamesLim-sy CI has passed; could you please review?

luotao1 · 2023-05-24T03:20:13Z

The ROCM pipeline build failed.

luotao1

#51835 (comment)
That PR also fixed a ROCM build issue; see whether anything there can serve as a reference.

Asthestarsfalll · 2023-05-24T08:50:12Z

@luotao1 @JamesLim-sy CI has passed

Asthestarsfalll · 2023-05-31T10:49:46Z

emmmm

paddle-ci-bot · 2023-06-01T03:16:34Z

Sorry to inform you that e518f38's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

luotao1 · 2023-06-01T04:19:10Z

@Asthestarsfalll Please merge develop once more and re-run CI.

Asthestarsfalll · 2023-06-01T11:24:28Z

@JamesLim-sy @luotao1 CI has passed

Asthestarsfalll · 2023-06-02T03:56:09Z

@JamesLim-sy PTAL

JamesLim-sy

LGTM, Great job!

optimize logsumexp in small data scale

7480a5f

paddle-bot bot added contributor External developers status: proposed labels Apr 16, 2023

fix

9b575bc

luotao1 assigned luotao1, Ligoml and JamesLim-sy Apr 17, 2023

Asthestarsfalll and others added 3 commits April 18, 2023 15:48

fix

62b5ca7

add #pragma once

b45c8b6

Merge branch 'PaddlePaddle:develop' into optimize_logsumexp

d0b8f5d

JamesLim-sy reviewed Apr 25, 2023

Asthestarsfalll commented Apr 28, 2023

JamesLim-sy reviewed May 4, 2023

Asthestarsfalll and others added 11 commits May 8, 2023 10:46

swith to use aligned_vector and support arbitrarily shape

80e5593

fix store

53bc8de

fix store

71afd5f

refine for special cases

9693df8

try

dbf55ba

Merge branch 'PaddlePaddle:develop' into optimize_logsumexp

6934f8d

fix

4e039c2

update

56de176

fix

b4b82fc

Merge branch 'PaddlePaddle:develop' into optimize_logsumexp

7f0ffb7

fix all_reduce

5946214

luotao1 added the PaddlePaddle Hackathon label May 17, 2023

paddle-bot bot removed the status: proposed label May 17, 2023

try

3cbc6af

Asthestarsfalll added 6 commits May 21, 2023 18:11

fix rocm bug

2876bd2

fix rocm bug

a35f0c4

fix rocm bug

36eb1e5

fix rocm bug

f1d1bda

fix rocm bug

08a1de1

fix rocm bug

8946009

luotao1 reviewed May 24, 2023

Asthestarsfalll and others added 3 commits May 24, 2023 13:00

fix rocm bug

51779dd

Merge branch 'PaddlePaddle:develop' into optimize_logsumexp

ea7df32

fix rocm bug

e518f38

Merge branch 'PaddlePaddle:develop' into optimize_logsumexp

1c72f33

JamesLim-sy approved these changes Jun 5, 2023

JamesLim-sy merged commit 93e1bb9 into PaddlePaddle:develop Jun 5, 2023

Ligoml mentioned this pull request Jun 5, 2023

[PaddlePaddle Hackathon, 4th Edition] Task Overview #51281

Closed

Asthestarsfalll deleted the optimize_logsumexp branch August 19, 2023 06:38
