Fix: Invoke f16f32 in WGMMA
ashvardanian committed Feb 10, 2025
1 parent ea4a3e0 commit 4423421
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion less_slow.cu
@@ -603,7 +603,7 @@ __global__ void tops_f16f32_sm90wgmma_64x256x16_loop128_cuda_kernel() {
     std::uint64_t b_descriptor = wgmma_descriptor((std::uint64_t)b_shared, 128 * 256 / 8, 128, 0, 0);
     wgmma_fence();
     for (int i = 0; i != 128; ++i) {
-        wgmma_bf16f32_64x256x16(c_registers, a_descriptor, b_descriptor);
+        wgmma_f16f32_64x256x16(c_registers, a_descriptor, b_descriptor);
         wgmma_commit_group();
     }
     wgmma_sync_group();
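
The swap is easy to miss because the two wrappers take the same arguments, yet FP16 and BF16 split their 16 bits differently, so the same bit pattern means different numbers to each instruction. The host-side sketch below illustrates that difference. It is not part of less_slow.cu, and `decode_f16` and `decode_bf16` are hypothetical helper names introduced only for this example. It assumes the usual IEEE 754 binary16 layout (1 sign, 5 exponent, 10 mantissa bits) and the usual bfloat16 layout (the upper 16 bits of a binary32 value).

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: decode IEEE 754 binary16 bits into a float.
// Layout: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits.
inline float decode_f16(std::uint16_t bits) {
    int const sign = bits >> 15;
    int const exponent = (bits >> 10) & 0x1F;
    int const mantissa = bits & 0x3FF;
    float magnitude;
    if (exponent == 0) {
        // Zero or subnormal: mantissa * 2^-24
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 31) {
        // All-ones exponent: infinity or NaN
        magnitude = mantissa ? std::numeric_limits<float>::quiet_NaN()
                             : std::numeric_limits<float>::infinity();
    } else {
        // Normal: (1024 + mantissa) * 2^(exponent - 25)
        magnitude = std::ldexp(static_cast<float>(1024 + mantissa), exponent - 25);
    }
    return sign ? -magnitude : magnitude;
}

// Hypothetical helper: decode bfloat16 bits into a float.
// bfloat16 is the upper half of a binary32 value, so widening is a shift.
inline float decode_bf16(std::uint16_t bits) {
    std::uint32_t const widened = static_cast<std::uint32_t>(bits) << 16;
    float value;
    std::memcpy(&value, &widened, sizeof(value));
    return value;
}
```

Under these assumptions, `decode_f16(0x3C00)` yields 1.0 while `decode_bf16(0x3C00)` yields 0.0078125, so a kernel labelled `f16f32` that issues the `bf16f32` instruction would be both interpreting its inputs differently and benchmarking a different operation than its name claims.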