Adding initial support for gfx950 #1710

yoichiyoshida · 2025-02-27T01:01:55Z

Add new MFMA instructions for gfx950
Enable F8 OCP for gfx950
Enable sparse gfx950 SMFMA tiles
Enable f8/b8 spmm
Added V_PRNG instruction to CodeGen
Add ds_read_b64_tr_b16 for gfx950
add gfx950 sparse fp16, bf16 and int8
Added Scaled AB GEMMs
added liblogic of gfx950 F8/BF8/F8BF8/BF8F8 input B/H/S output types
Add f8 macro guards for post gsu kernels
Incorporate BF16 CVT into hipBLASLt
Add support of F8/F16/BF16 datatypes for gfx950
Added Hascvtf16_fp8 and Hascvtfp8_f16 asmCaps
Using F8->F16 and F16->F8 CVT instructions instead of round trip via F32 for gfx950
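
The last two items can be sketched as a capability-gated choice between a direct conversion and the older round trip via F32. This is only an illustrative sketch, not the code in this PR: the cap names `Hascvtf16_fp8` and `Hascvtfp8_f16` come from the list above, while the function names, the plain-dict representation of asmCaps, and the step labels (descriptive strings, not real instruction mnemonics) are assumptions made for the example.

```python
# Hypothetical sketch: choose a conversion path from assembler capabilities.
# Only the two cap names are taken from the PR description; everything else
# (function names, dict layout, step labels) is assumed for illustration.

def f8_to_f16_steps(asm_caps):
    """Return the ordered conversion steps for F8 -> F16."""
    if asm_caps.get("Hascvtf16_fp8", False):
        # Direct conversion is available (e.g. on gfx950 per the PR text).
        return ["F8->F16"]
    # Fallback: round trip through F32.
    return ["F8->F32", "F32->F16"]


def f16_to_f8_steps(asm_caps):
    """Return the ordered conversion steps for F16 -> F8."""
    if asm_caps.get("Hascvtfp8_f16", False):
        return ["F16->F8"]
    return ["F16->F32", "F32->F8"]


if __name__ == "__main__":
    caps_with_direct_cvt = {"Hascvtf16_fp8": True, "Hascvtfp8_f16": True}
    caps_without = {}
    print(f8_to_f16_steps(caps_with_direct_cvt))
    print(f16_to_f8_steps(caps_without))
```

Treating a missing cap as `False` keeps the fallback path as the default, so targets that do not report the new caps keep the F32 round trip.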

Co-authored-by: Majed Sujon <mdmsujon@amd.com>
Co-authored-by: Aditya Joshi <Aditya.Joshi@amd.com>
Co-authored-by: Zhongze Li <ZHONGZE.LI@amd.com>
Co-authored-by: Vin Huang <vin.huang@amd.com>
Co-authored-by: Feroz KamalMustafa <Feroz.KamalMustafa@amd.com>
Co-authored-by: Babak Poursartip <Babak.Poursartip@amd.com>
Co-authored-by: Brian Shi <Brian.Shi@amd.com>
Co-authored-by: YangWen Huang <YANGWEN.HUANG@AMD.COM>
Co-authored-by: Ethan Lin <Yi-Chen.Lin@amd.com>
Co-authored-by: Meng-Zhe Cai <meng-zhe.cai@amd.com>

Undo reduced target list

docs/api-reference.rst

tensilelite/Tensile/SolutionStructs.py

* use state over globalparams for current isa
* fix spelling mistake in docs

yoichiyoshida requested review from jichangjichang, KKyang, vin-huang, imcarsonliao, hcman2, Serge45, Jinp800125, TonyYHsieh, solaslin and a team as code owners February 27, 2025 01:01

Update CMakeLists.txt

bcdaa4d

Undo reduced target list

amd-jnovotny reviewed Feb 27, 2025

docs/api-reference.rst (outdated)

vin-huang reviewed Feb 27, 2025

tensilelite/Tensile/SolutionStructs.py (outdated)

vin-huang reviewed Feb 27, 2025

tensilelite/Tensile/SolutionStructs.py (outdated)

yoichiyoshida added 4 commits February 27, 2025 16:36

* removing duplicate test files

ee27933

* use state over globalparams for current isa
* fix spelling mistake in docs

fixing gfx950 test files

4ea0949

fixing earlier state change

1940cd2

add missing test skips for incompatible tests

68ee78a
