
[dtensor][fix] fix _scaled_dot_product_flash_attention sharding #148125

Closed
wants to merge 2 commits

Conversation

@XilunWu XilunWu (Contributor) commented Feb 27, 2025

Stack from ghstack (oldest at bottom):

Summary

#146372 changed the op signature of `_scaled_dot_product_flash_attention`; as a consequence, DTensor needs to update the sharding strategy defined in `scaled_dot_product_flash_attention_strategy` (torch/distributed/tensor/_ops/_matrix_ops.py).
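For context, that strategy is what DTensor consults when the ATen op runs on sharded inputs. Below is a minimal sketch of that dispatch path using the public DTensor API; the 2-GPU mesh, tensor shapes, and head-dim sharding are illustrative assumptions, and the script must be launched as a 2-rank distributed run (e.g. `torchrun --nproc-per-node 2`):

```python
# Sketch: sharded SDPA inputs route through DTensor's registered sharding
# strategy for _scaled_dot_product_flash_attention. Mesh size, shapes, and
# placements are illustrative; launch under torchrun with 2 ranks.
import torch
import torch.nn.functional as F
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor

mesh = init_device_mesh("cuda", (2,))

def sharded(shape):
    # bf16 on CUDA keeps the flash-attention backend eligible; shard the
    # head dimension (dim 1) across the mesh.
    t = torch.randn(shape, dtype=torch.bfloat16, device="cuda")
    return distribute_tensor(t, mesh, [Shard(1)])

# (batch, num_heads, seq_len, head_dim)
q, k, v = (sharded((8, 16, 128, 64)) for _ in range(3))

# DTensor intercepts the ATen op and consults the registered strategy to
# propagate input/output placements; this PR realigns that strategy with
# the op's updated signature.
out = F.scaled_dot_product_attention(q, k, v)
```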

Test

`pytest test/distributed/tensor/test_attention.py`

Follow-up

It's still unclear why the CP unit tests were not run on the original PR, which is BC-breaking.
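For readers unfamiliar with CP: Context Parallel shards the sequence dimension of attention inputs across ranks, which is exactly the path those unit tests exercise. A minimal sketch, assuming the experimental `torch.distributed.tensor.experimental.context_parallel` context manager (the API is experimental and may change; mesh size and shapes are illustrative):

```python
# Sketch: Context Parallel shards q/k/v in place along the sequence dim
# (dim 2) and dispatches SDPA to its context-parallel variant. Experimental
# API; launch under torchrun with 2 ranks.
import torch
import torch.nn.functional as F
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.experimental import context_parallel

mesh = init_device_mesh("cuda", (2,))
# (batch, num_heads, seq_len, head_dim); the sequence dim is 2.
q, k, v = (
    torch.randn(8, 16, 128, 64, dtype=torch.bfloat16, device="cuda")
    for _ in range(3)
)

# Inside the context, the listed buffers are sharded along their sequence
# dims and restored on exit.
with context_parallel(mesh, buffers=[q, k, v], buffer_seq_dims=[2, 2, 2]):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```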

cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @tianyu-l

pytorch-bot bot commented Feb 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148125

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5a8391c with merge base 2978771:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the ciflow/inductor and oncall: distributed labels Feb 27, 2025
XilunWu added a commit that referenced this pull request Feb 27, 2025
ghstack-source-id: 47c426b771a7515a3057e4b1d100dac640933265
Pull Request resolved: #148125
@XilunWu XilunWu marked this pull request as draft February 28, 2025 00:00
XilunWu added a commit that referenced this pull request Feb 28, 2025
ghstack-source-id: 408ec85127af2c09bde0248956fb6bc2456e858b
Pull Request resolved: #148125
@XilunWu XilunWu added the better-engineering, module: dtensor, and module: context parallel labels Feb 28, 2025
@XilunWu XilunWu changed the title [dtensor] fix scaled dot product flash attention sharding [dtensor][fix] fix _scaled_dot_product_flash_attention sharding Feb 28, 2025
@XilunWu XilunWu marked this pull request as ready for review February 28, 2025 00:46
@tianyu-l tianyu-l (Contributor) left a comment


lgtm

@XilunWu XilunWu added the ciflow/trunk label Feb 28, 2025
@XilunWu XilunWu (Contributor, Author) commented Feb 28, 2025

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@fegin fegin (Contributor) left a comment


Thanks for the fix!

fegin pushed a commit to pytorch/torchtitan that referenced this pull request Mar 3, 2025
…as been fixed (#912)

Stack from [ghstack](/~https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #912

### Summary
This PR undoes #898 and re-enables CP tests in CI, as pytorch/pytorch#148125 fixed the DTensor SDPA flash attention op.

### Test
CI
fegin added a commit to pytorch/torchtitan that referenced this pull request Mar 3, 2025
#921)

…as been fixed (#912)

Stack from [ghstack](/~https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #912

### Summary
This PR undoes #898 and re-enables CP tests in CI, as pytorch/pytorch#148125 fixed the DTensor SDPA flash attention op.

### Test
CI

Co-authored-by: Xilun Wu <12968408+XilunWu@users.noreply.github.com>
@XilunWu XilunWu mentioned this pull request Mar 3, 2025
majing921201 pushed a commit to majing921201/pytorch that referenced this pull request Mar 4, 2025
…rch#148125)

### Summary
pytorch#146372 changed the op signature of `_scaled_dot_product_flash_attention`; as a consequence, DTensor needs to update the sharding strategy defined at /~https://github.com/pytorch/pytorch/blob/40ad5e01dff05c7d64e070fb01683820e678f788/torch/distributed/tensor/_ops/_matrix_ops.py#L232

### Test
`pytest test/distributed/tensor/test_attention.py`

### Follow-up
It's still unclear why the CP unit tests were not run on the original PR, which is BC-breaking.

Pull Request resolved: pytorch#148125
Approved by: /~https://github.com/tianyu-l, /~https://github.com/fegin
Labels
better-engineering, ciflow/inductor, ciflow/trunk, Merged, module: context parallel, module: dtensor, oncall: distributed, topic: not user facing