
[Hexagon] Enable Hexagon User DMA bypass mode #13381

Merged — 5 commits merged into apache:main on Nov 16, 2022

Conversation

@adstraw (Contributor) commented Nov 14, 2022

Enables Hexagon User DMA bypass mode based on user-specified dma_bypass_cache option for DMA copies between DDR and VTCM.

The upside of this change is increased DMA bandwidth (up to 40 GB/s observed using test_vtcm_bandwidth.py) and increased compute throughput using a 3-stage pipeline of cache read, compute, and cache write (up to 38 Gops using test_parallel_hvx_load_vtcm.py).

The downside of this change is the potential for data coherency issues, since the cache must be managed in software when using DMA bypass; hence the user-facing dma_bypass_cache option to enable or disable bypass mode.

The strategy to manage the cache in software centers around the requirement for Hexagon to operate on HexagonBuffer objects regardless of scope (DDR or VTCM). When copying to or from a HexagonBuffer we aggressively invalidate the cache for both the source and destination, both before and after the copy. Note that the copy is now implemented with memcpy instead of DMA. With the cache clean after a copy to or from a HexagonBuffer, we can now use DMA bypass mode. However, this software cache management strategy is NOT infallible: if a HexagonBuffer becomes dirty in the cache prior to a DMA with bypass mode enabled, we may see data coherency issues.

Also simplifies Hexagon DMA flows by removing the unused mem_copy intrinsic and its lowering, as well as the hexagon_user_dma_1d_sync helper function, which is replaced by calls to HexagonUserDMA::Copy and HexagonUserDMA::Wait.

@tvm-bot (Collaborator) commented Nov 14, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@csullivan (Contributor) left a comment


🎉 I know this one looks like a small change but getting it right was a huge feat 💪. Can't wait to see the performance numbers in-situ. Many many thanks @adstraw! Just a couple small CRs.

// VTCM -> VTCM
// NOTE: Cache bypass is disabled for VTCM -> VTCM transfers
ret = user_dma->Copy(queue_id, vtcm2hb.GetPointer(), vtcm1hb.GetPointer(), length, DISABLE_BYPASS);
Collaborator:

Since this is the enabled test, should it be ENABLE_BYPASS?

Edit - Ah, I see the comment. But you should be able to say ENABLED and have it still work; it just won't actually enable anything. Or, should it throw if you attempt to set bypass when it is VTCM to VTCM?

@adstraw (Contributor, Author) commented Nov 15, 2022

User can "enable" or "disable" bypass and it will still work in all cases. Here is how the code operates:

  1. Sets bypass ON for VTCM -> DDR or DDR -> VTCM when the user enables bypass
  2. Sets bypass OFF for VTCM -> VTCM or DDR -> DDR regardless of whether the user enables bypass

Not sure if we should (a) assert in case (2) when the user enables bypass, or (b) just silently turn bypass OFF. Currently I opt for (b), but I am open to feedback.

@adstraw adstraw force-pushed the straw-hex-dma-bypass branch from bd75db2 to 21063ff Compare November 15, 2022 22:27
@csullivan (Contributor) left a comment


LGTM! Thank you @adstraw!

@csullivan csullivan merged commit 14342a3 into apache:main Nov 16, 2022
@adstraw adstraw deleted the straw-hex-dma-bypass branch November 17, 2022 00:33
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
* restore vtcm tests; add TODO for ION buffer; check IsVtcm pointers