Auto tune for cutlass #50809

Merged · 13 commits merged into PaddlePaddle:develop · Mar 15, 2023

Conversation

umiswing (Member) commented Feb 23, 2023

PR types

New features

PR changes

OPs

Describe

  1. Added auto-tuning of the cutlass gather-gemm-scatter fusion to auto tune. Tuning is enabled by default.
  2. The sparse conv3d implementation involves GEMMs of shape (m, n, k). m depends on the number of features and varies widely. To avoid a large number of repeated searches, the shape (m, n, k) is mapped to (m / features_num_range, n, k); features_num_range is currently set to 1e4 and may be adjusted later based on inference and training behavior (see the sketch after this list).
  3. Removed the hand-written rules.
  4. Because cutlass's gemm-scatter implementation is broken on sm 70, the sm 70 part was removed from the generation rules. This PR supports sm 80; sm 75 support will be added later.
  5. Data types supported by auto-tuning: fp16, fp32.
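
A minimal sketch of the shape bucketing described in item 2 (GenKey here is a stand-in for autotune::GenKey; everything else is illustrative):

```cpp
#include <cstddef>
#include <functional>

// Toy sketch: m is divided by features_num_range so that GEMMs whose m only
// differs within one bucket share a single cached tuning result.
constexpr size_t features_num_range = 10000;  // 1e4, as described above

size_t GenKey(size_t m_bucket, size_t n, size_t k) {
  size_t seed = std::hash<size_t>{}(m_bucket);
  seed ^= std::hash<size_t>{}(n) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  seed ^= std::hash<size_t>{}(k) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  return seed;
}

size_t MakeGatherGemmScatterKey(size_t m, size_t n, size_t k) {
  return GenKey(m / features_num_range, n, k);  // e.g. m = 12345 -> bucket 1
}
```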

paddle-bot (bot) commented Feb 23, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

umiswing requested review from JamesLim-sy and Xreki on Mar 7, 2023, 08:32
GatherGemmScatterGetCache() {
return autotune::AutoTuneCache::Instance().Get(
AlgorithmType::kGatherGemmScatterFP16NN);
}
Contributor:

It would be more appropriate to move the GatherGemmScatterGetCache function into cache_base.h.

umiswing (Member, Author):

I renamed GatherGemmScatterGetCache to GetGatherGemmScatter, moved it into cache.h, and made it a member function of AutoTuneCache.
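
A minimal, self-contained sketch of that change, assuming simplified types (these are not Paddle's actual classes; the real AutoTuneCache in cache.h stores kernel caches per AlgorithmType):

```cpp
#include <cstddef>
#include <unordered_map>

// Toy sketch: the free GatherGemmScatterGetCache() helper becomes a member
// of the AutoTuneCache singleton, looked up by AlgorithmType.
enum class AlgorithmType { kGatherGemmScatterFP16NN, kGatherGemmScatterFP32NN };

class AutoTuneCache {
 public:
  static AutoTuneCache& Instance() {
    static AutoTuneCache cache;
    return cache;
  }
  // Member replacing the former GatherGemmScatterGetCache() free function.
  std::unordered_map<size_t, int>& GetGatherGemmScatter(AlgorithmType type) {
    return caches_[static_cast<int>(type)];
  }

 private:
  std::unordered_map<int, std::unordered_map<size_t, int>> caches_;
};
```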

const int32_t* b_indices,
const int32_t* c_d_indices,
T alpha,
T beta) {
Contributor:

Since the class template parameters already include the variadic pack typename... Args, the Run function can be simplified following class MatmulAutoTuner, and the function body can be simplified in the same way.

umiswing (Member, Author):

Simplified.
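
A minimal sketch of the kind of simplification this enables, using a toy tuner (SimpleAutoTuner is illustrative, not MatmulAutoTuner itself):

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Toy sketch: the variadic pack Args... lets Run forward every kernel
// argument without spelling out each pointer/scalar parameter separately.
template <typename... Args>
class SimpleAutoTuner {
 public:
  void AddCallBack(std::function<void(Args...)> kernel) {
    kernels_.push_back(std::move(kernel));
  }

  void Run(size_t key, Args... args) {
    auto it = best_.find(key);
    size_t idx = (it != best_.end()) ? it->second : Tune(key, args...);
    kernels_[idx](args...);
  }

 private:
  size_t Tune(size_t key, Args... args) {
    // The real tuner times every candidate; this sketch just records index 0.
    best_[key] = 0;
    return 0;
  }

  std::vector<std::function<void(Args...)>> kernels_;
  std::unordered_map<size_t, size_t> best_;
};
```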

static_cast<const int32_t*>(c_d_indices),
static_cast<cutlass::half_t>(1),
static_cast<cutlass::half_t>(1));
GatherGemmScatter(dev_ctx,
Contributor:

Since static void dispatchKernel(const GPUContext& dev_ctx, ...) now carries the template parameter template <typename T>, the const phi::DataType type argument should no longer need to be passed in, and with the template parameter, branches such as if (type == phi::DataType::FLOAT16) { can also be replaced.

umiswing (Member, Author):

Simplified. dispatchKernel has been removed; instead, GatherGemmScatterDriver (the former GatherGemmScatter) is written with partial template specialization.
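
A minimal sketch of that pattern, with assumed names (GatherGemmScatterDriverSketch and Half are stand-ins, not the real Paddle or CUTLASS types):

```cpp
#include <cstdio>

// Toy sketch: the runtime branch `if (type == phi::DataType::FLOAT16)` is
// replaced by selecting a (partial) template specialization per element type.
struct Half {};  // stand-in for cutlass::half_t / phi::dtype::float16

template <typename T, bool TransposeA = false>
struct GatherGemmScatterDriverSketch {
  static void Run() { std::printf("fp32 candidate kernels\n"); }
};

// Partial specialization: half precision picks its own kernel list.
template <bool TransposeA>
struct GatherGemmScatterDriverSketch<Half, TransposeA> {
  static void Run() { std::printf("fp16 candidate kernels\n"); }
};

int main() {
  GatherGemmScatterDriverSketch<float>::Run();
  GatherGemmScatterDriverSketch<Half>::Run();
  return 0;
}
```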

nullptr,
static_cast<const int32_t*>(c_d_indices),
static_cast<float>(alpha),
static_cast<float>(beta));
Contributor:

With the FLOAT64 branch removed, will anything be added to cover that case?

umiswing (Member, Author) on Mar 7, 2023:

The approach in this PR is that fp64 does not take the fused path, because models do not use fp64.

for (auto i = 1; i < fp16_kernels.size(); i++)
tuner->AddCallBack(fp16_kernels[i]);

size_t key = autotune::GenKey(m / features_num_range, n, k);
Contributor:

I suggest converting the template parameter T to phi::DataType and passing it into GenKey as well, because this call path appears to serve both fp16 and fp32, and for the same input M, N, K the best implementation may differ between the two types.

umiswing (Member, Author) on Mar 7, 2023:

The fp16 and fp32 kernels provided by cutlass are not interchangeable, so phi::DataType is not passed in here. Instead, partial template specialization is used, and fp16 and fp32 each search within their own candidate kernel lists.
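
A minimal sketch of why the key does not need to encode the dtype, using assumed names (Candidates, RegisterCandidate are illustrative):

```cpp
#include <vector>

// Toy sketch: each element type T keeps its own static list of candidate
// kernels, so an fp16 search never sees fp32 kernels and vice versa, and the
// (m / features_num_range, n, k) key can be reused per type without collision.
using KernelFn = void (*)();

template <typename T>
std::vector<KernelFn>& Candidates() {
  static std::vector<KernelFn> kernels;  // one list per element type
  return kernels;
}

template <typename T>
void RegisterCandidate(KernelFn fn) {
  Candidates<T>().push_back(fn);
}
```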

#include "paddle/phi/backends/gpu/gpu_context.h"
#include "paddle/phi/common/data_type.h"
#include "paddle/phi/kernels/autotune/auto_tune_base.h"
#include "paddle/phi/kernels/sparse/gpu/cutlass_generator/build/generated/gemm/all_gemm_operations.h"
Contributor:

Was the header file all_gemm_operations.h forgotten in this PR?

umiswing (Member, Author) on Mar 7, 2023:

This header file is generated at build time; see PR #50364.

umiswing changed the title from "[WIP] Auto tune for cutlass" to "Auto tune for cutlass" on Mar 7, 2023
void Run(const phi::GPUContext& ctx,
const size_t key,
T const alpha,
T const beta,
Contributor:

It looks like the alpha and beta parameters could also be carried in Args... args.

umiswing (Member, Author):

@JamesLim-sy
GatherGemmScatter computes matrix_c = alpha * matrix_a * matrix_b + beta * matrix_c. To avoid changing the value of matrix_c during PickBestKernel, we need alpha = 0 and beta = 1, so alpha and beta are unpacked here rather than kept in the pack.
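
A minimal sketch of that reasoning, with assumed names (PickBestKernelSketch is illustrative, not the real tuner API):

```cpp
#include <cstddef>

// Toy sketch: the epilogue computes C = alpha * (A * B) + beta * C, so
// benchmarking with alpha = 0 and beta = 1 leaves the output C unchanged
// while candidate kernels are timed; the caller then replays the winner
// with the user-supplied alpha/beta.
template <typename T, typename GemmFn>
void PickBestKernelSketch(GemmFn gemm, size_t key) {
  const T tuning_alpha = static_cast<T>(0);  // zero out the A*B contribution
  const T tuning_beta = static_cast<T>(1);   // write C back onto itself
  gemm(tuning_alpha, tuning_beta);           // timed run, output preserved
  (void)key;  // the real code records the timing under this cache key
}
```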

JamesLim-sy (Contributor) left a comment:

LGTM, good work!

zkh2016 merged commit 12d43da into PaddlePaddle:develop on Mar 15, 2023