
[Phi] Support cudnn kernel moving & move softmax kernels #39547

Merged

Conversation

Contributor

@chenwhql chenwhql commented Feb 14, 2022

PR types

New features

PR changes

Others

Describe

[Phi] Support cudnn kernel moving

Support registering GPUDNN kernels in phi, and migrate the softmax op kernel as verification.

Since there is no separate GPUDNNContext, this introduces some problems into the current system. When we declare using CUDNNContext = GPUContext, gpu and cudnn kernel functions that share the same name cause instantiation conflicts. For example:

SoftmaxKernel<float, GPUContext> and SoftmaxKernel<float, GPUDNNContext> are effectively the same instantiation, even though their intended implementations differ. At compile time the GPUDNN kernel instantiation is therefore overridden: what gets registered as the GPUDNN kernel is actually the GPU kernel, which produces wrong computation results at runtime.
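
A minimal, hypothetical sketch of the collision with stand-in types (GPUContext here is an empty placeholder and the kernel signature is simplified; this is not the real phi code):

#include <iostream>
#include <type_traits>

struct GPUContext {};
using GPUDNNContext = GPUContext;  // alias only, no separate GPUDNN context type

// The "GPU" kernel. A cudnn variant with the same name, taking GPUDNNContext,
// would be this exact instantiation, so only one definition can exist.
template <typename T, typename Context>
void SoftmaxKernel(const Context&) {
  std::cout << "plain GPU softmax\n";
}

int main() {
  static_assert(std::is_same<GPUContext, GPUDNNContext>::value,
                "the two context types are the same type");
  GPUDNNContext ctx;
  SoftmaxKernel<float>(ctx);  // intended cudnn path, but runs the GPU kernel
}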

Since the design of GPUDNNContext is not yet settled and it is essentially identical to the current GPUContext, the conflict is resolved for now by renaming the function: the GPUDNN kernel is named SoftmaxGPUDNNKernel.

The downside is that SoftmaxGPUDNNKernel can no longer be reached through SoftmaxKernel<T, Context>, so the gpudnn kernels form their own small subsystem. The impact is limited, since there are very few cudnn kernels (single digits). We still need to confirm whether the original cuda kernel behind SoftmaxKernel needs to be kept; if not, it would be better to delete it and move the cudnn kernel under gpu. A small sketch of the reachability issue follows.
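
A short follow-up sketch, again with stand-in types and a hypothetical caller (CallSoftmax is illustrative only, not a phi function): generic code written against the common name cannot dispatch to the renamed cudnn kernel.

struct GPUContext {};

template <typename T, typename Context>
void SoftmaxKernel(const Context&) { /* plain GPU implementation */ }

template <typename T, typename Context>
void SoftmaxGPUDNNKernel(const Context&) { /* cudnn-based implementation */ }

// Code templated on Context only knows the common entry point, so it always
// lands on the GPU kernel even when a cudnn implementation exists.
template <typename T, typename Context>
void CallSoftmax(const Context& ctx) {
  SoftmaxKernel<T>(ctx);  // cannot reach SoftmaxGPUDNNKernel from here
}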

TODO:

  1. During migration we found that math/softmax.* and the gpu_dnn related components also need to be moved from fluid to phi; this will be done in follow-up PRs.
  2. For now, a gpudnn suffix has been added to the kernel files under the gpudnn directory. Without the suffix, the inference CI unit tests hit missing-symbol errors caused by duplicate file names; this has not been reproduced locally yet. To avoid blocking the subsequent conv cudnn migration, the suffix will be analyzed further and removed in a later PR.
  3. Remove paddle/fluid/operators/amp/fp16_type_traits.h from fluid.

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

@chenwhql chenwhql changed the title [PTen] Support cudnn kernel moving [PTen] Support cudnn kernel moving & move softmax kernels Feb 17, 2022
@chenwhql chenwhql changed the title [PTen] Support cudnn kernel moving & move softmax kernels [Phi] Support cudnn kernel moving & move softmax kernels Feb 23, 2022
zyfncg previously approved these changes Feb 24, 2022
ZzSean previously approved these changes Feb 24, 2022
Contributor

@ZzSean ZzSean left a comment


LGTM for op benchmark

public:
using Type = float;
};

Contributor


This duplicates paddle/fluid/operators/amp/fp16_type_traits.h; we can consider merging them later.
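
For reference, a hedged sketch of the kind of trait involved (modeled on paddle/fluid/operators/amp/fp16_type_traits.h, with float16 as a stand-in for the real half-precision type): it maps a storage type to the type used for intermediate math, so fp16 kernels can accumulate in fp32.

struct float16 {};  // stand-in for the real 16-bit float type

template <typename T>
class MPTypeTrait {
 public:
  using Type = T;  // by default, compute in the storage type itself
};

template <>
class MPTypeTrait<float16> {
 public:
  using Type = float;  // fp16 storage, fp32 math
};

// Usage: MPTypeTrait<float16>::Type is float; MPTypeTrait<double>::Type is double.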

Contributor Author

@chenwhql chenwhql Feb 24, 2022


Yes, this was moved over from there. Going forward, using paddle/fluid/operators/amp/fp16_type_traits.h under fluid is not recommended, and that file can be deleted.

@chenwhql chenwhql dismissed stale reviews from ZzSean and zyfncg via b99eadd February 24, 2022 13:56
Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment


LGTM

Contributor

@MingMingShangTian MingMingShangTian left a comment


LGTM

Contributor

@ZzSean ZzSean left a comment


LGTM for op benchmark

@chenwhql chenwhql merged commit 8895379 into PaddlePaddle:develop Feb 25, 2022