better handling of tensorwise float8 recipe in configuration #901

vkuzo · 2025-02-27T21:06:24Z

Bug description

We need a follow-up on #808. If --float8.recipe_name tensorwise is specified, we should properly handle the related arguments (FSDP float8 all-gather, scale precompute, etc.) instead of asserting that they aren't supported.
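
A rough sketch of the validation this could turn into. The names below (`Float8Config`, its fields, `validate_float8_config`) are hypothetical stand-ins for illustration, not the actual config code. It also assumes that the non-tensorwise recipes should keep rejecting these options:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Float8Config:
    # Hypothetical field names, loosely modeled on the options named in this issue.
    recipe_name: Optional[str] = None
    enable_fsdp_float8_all_gather: bool = False
    precompute_float8_dynamic_scale_for_fsdp: bool = False


# Assumption: these options are only meaningful with tensorwise scaling.
TENSORWISE_ONLY_OPTIONS = (
    "enable_fsdp_float8_all_gather",
    "precompute_float8_dynamic_scale_for_fsdp",
)


def validate_float8_config(cfg: Float8Config) -> Float8Config:
    """Allow tensorwise-only options for tensorwise; reject them for other recipes."""
    if cfg.recipe_name is None or cfg.recipe_name == "tensorwise":
        # No recipe, or an explicit tensorwise recipe: pass the options through.
        return cfg
    enabled = [name for name in TENSORWISE_ONLY_OPTIONS if getattr(cfg, name)]
    if enabled:
        raise ValueError(
            f"{', '.join(enabled)} not supported with recipe_name={cfg.recipe_name!r}"
        )
    return cfg
```

With this shape, passing recipe_name="tensorwise" together with the all-gather option goes through unchanged, while the same option combined with another recipe name raises a ValueError that names the offending options.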

Versions

main branch


tianyu-l · 2025-02-27T21:08:29Z

Thanks for filing this issue.
That sounds cleaner to me! And we can have recipe_name default to “tensorwise”.

vkuzo self-assigned this Feb 27, 2025
