Enable preshuffled mixed dtype Cutlass Gemm #3722
Conversation
This pull request was exported from Phabricator. Differential Revision: D69955197
Summary: WIP to enable the new optimized preshuffled fp8xint4 gemm. Differential Revision: D69955197
Summary: WIP to enable the new optimized preshuffled fp8xint4 gemm. While the example compiles and runs, it runs into a variety of problems: the outputs are either completely incorrect or contain NaNs, or the kernel hits an illegal memory access. I'm not yet sure why. Differential Revision: D69955197
@IwakuraRein Despite this compiling and running, I'm getting incorrect outputs and very poor performance (even slower than the legacy f8i4 without packing or shuffling). Can you take a look and see if I'm doing something obviously wrong? Ignore files besides f8i4_shuffled.cu and mixed_dtype_utils.cu, as the others just fix cutlass v3.8 compatibility.
@jwfromm Are there negative values in the scale factors? This might be the reason for the accuracy drop after enabling the lookup table, and it can be easily fixed by applying this change to
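For reference, a quick way to rule this out is to inspect the scale tensor directly before dispatching the kernel. The sketch below is just that check, assuming the scales live in a torch tensor; the function and variable names (`check_scales`, `w_scale`) and the toy shape are placeholders, not part of the PR's API.

```python
import torch

def check_scales(w_scale: torch.Tensor) -> None:
    # Hypothetical group-wise scales for an int4-quantized weight; the actual
    # layout produced in quantize_ops.py may differ.
    n_negative = int((w_scale < 0).sum().item())
    if n_negative > 0:
        # Negative scales are the suspected cause of the accuracy drop with the
        # lookup-table dequant path discussed above.
        print(f"found {n_negative} negative scales; min = {w_scale.min().item():.6f}")
    else:
        print("all scale values are non-negative")

check_scales(torch.rand(4096, 32))  # toy positive scales -> "all scale values are non-negative"
```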
@IwakuraRein The scales are all positive and I'm running with the latest cutlass head commit (as of yesterday). The link you posted doesn't seem to include any changes to sm90_mma_tma_gmma_rs_warpspecialized_mixed_input.hpp; did you mean to paste a different one?
@jwfromm Sorry, I meant the changes in
fbgemm_gpu/experimental/gen_ai/bench/quantize_ops.py:1145:

```diff
- scales = scales.view(x.shape[0], -1)
+ scales = scales.view(x.shape[0], -1).t().contiguous()
```

fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu:59:

```diff
- StrideB stride_B;
+ StrideB stride_B = cutlass::make_cute_packed_stride(StrideB{}, shape_B);
```

These should fix the bugs.
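To illustrate the Python half of this fix, the sketch below shows (with made-up toy shapes) how the added `.t().contiguous()` changes the memory layout of the group-wise scales; which layout the shuffled kernel actually expects is an assumption here, not something stated in the thread.

```python
import torch

num_rows, num_groups = 8, 4  # toy sizes; real shapes come from x and the quantization group size

# Flat scales as produced by the quantization step (illustrative values).
scales = torch.arange(num_rows * num_groups, dtype=torch.float32)

before = scales.view(num_rows, -1)                  # [num_rows, num_groups], groups are the inner dim
after = scales.view(num_rows, -1).t().contiguous()  # [num_groups, num_rows], groups become the outer dim

# Same values, different memory order; a kernel that indexes scales group-first
# would read the wrong elements from `before` but the intended ones from `after`.
print(before.shape, before.stride())  # torch.Size([8, 4]) (4, 1)
print(after.shape, after.stride())    # torch.Size([4, 8]) (8, 1)
```

The other half of the diff is the C++ side: stride_B was previously default-constructed and never filled in, and cutlass::make_cute_packed_stride(StrideB{}, shape_B) derives a packed stride from shape_B, which presumably accounts for the illegal memory accesses reported earlier.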