This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

No tensor cores for fp32 interleaved attention, remove div by 8 restriction (#17994) #18085

Merged 1 commit into apache:v1.x on Apr 16, 2020

Conversation

@blchu (Contributor) commented Apr 16, 2020

(cherry picked from commit afae030)

Description

Fixes an issue where the interleaved multihead attention operators used tensor cores for fp32 inputs, resulting in lower-precision calculations and a potential reduction in accuracy.

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • The interleaved multihead attention GEMM no longer uses tensor cores by default; tensor cores are used only when the input data type is fp16
  • No longer checks that tensor input shapes are divisible by 8
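
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `DType`, `GemmAlgo`, and `SelectGemmAlgo` are hypothetical names invented for the example and do not correspond to MXNet or cuBLAS identifiers.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only (assumed names, not MXNet/cuBLAS).
enum class DType { kFloat32, kFloat16 };
enum class GemmAlgo { kDefault, kDefaultTensorOp };

// Pick the GEMM algorithm for an interleaved attention call.
// The default is the non-tensor-core path; tensor cores are opted into
// only when the input data type is fp16. Nothing here inspects the tensor
// shape, so there is no "divisible by 8" requirement.
GemmAlgo SelectGemmAlgo(DType input_dtype) {
  GemmAlgo algo = GemmAlgo::kDefault;
  if (input_dtype == DType::kFloat16) {
    algo = GemmAlgo::kDefaultTensorOp;
  }
  return algo;
}
```

Under this sketch, `SelectGemmAlgo(DType::kFloat32)` yields `GemmAlgo::kDefault`, so fp32 inputs stay on the full-precision path, while `SelectGemmAlgo(DType::kFloat16)` yields `GemmAlgo::kDefaultTensorOp`.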

Comments

@mxnet-bot

Hey @blchu, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [sanity, website, centos-gpu, edge, miscellaneous, unix-gpu, windows-gpu, clang, centos-cpu, unix-cpu, windows-cpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@leezu merged commit 8cfc64a into apache:v1.x on Apr 16, 2020