-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Hexagon] Enable Hexagon User DMA bypass mode #13381
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 I know this one looks like a small change but getting it right was a huge feat 💪. Can't wait to see the performance numbers in-situ. Many many thanks @adstraw! Just a couple small CRs.
// VTCM -> VTCM | ||
// NOTE: Cache bypass is disabled for VTCM -> VTCM transfers | ||
ret = | ||
user_dma->Copy(queue_id, vtcm2hb.GetPointer(), vtcm1hb.GetPointer(), length, DISABLE_BYPASS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is the enabled test, should it be ENABLE_BYPASS?
Edit - Ah, I see the comment. But you should be able to say ENABLED, and have it still work. It just won't actually enable anything. Or, should it through if you attempt to set bypass if it is vtcm to vtcm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User can "enable" or "disable" bypass and it will still work in all cases. Here is how the code operates:
- Sets bypass ON for VTCM -> DDR or DDR -> VTCM when the user enables bypass
- Sets bypass OFF for VTCM -> VTCM or DDR -> DDR regardless of whether the user enables bypass
Not sure if we should a) assert in case (2) when the user enables bypass or b) just silently turn bypass OFF. Currently I opt for (b) but I am open to feedback.
1b95909
to
bd75db2
Compare
bd75db2
to
21063ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you @adstraw!
Enables Hexagon User DMA bypass mode based on user-specified dma_bypass_cache option for DMA copies between DDR and VTCM. The upside of this change is increased DMA bandwidth (up to 40 GBps observed using test_vtcm_bandwidth.py) and compute throughput using a 3-stage pipeline --- cache read, compute, cache write (up to 38 Gops using test_parallel_hvx_load_vtcm.py). The downside of this change is the potential for data coherency issues resulting from the need to manage the cache in software when using DMA bypass hence the user dma_bypass_cache option to enable or disable bypass mode. The strategy to manage the cache in software centers around the requirement for Hexagon to operate on HexagonBuffer objects regardless of scope --- DDR or VTCM. When copying to / from a HexagonBuffer we aggressively invalidate the cache for both the source and destination, both before and after the copy. Also note that the copy is now implemented with memcpy instead of DMA. With the cache clean after copy to / from a HexagonBuffer we can now use DMA bypass mode. However, this software cache management strategy is NOT infallible --- if a HexagonBuffer becomes dirty in the cache prior to a DMA with bypass mode enabled we may see data coherency issues. Also simplifies Hexagon DMA flows by removing the unused mem_copy instrinsic and lowering as well as the hexagon_user_dma_1d_sync helper function which is replaced by calls to HexagonUserDMA::Copy and HexagonUserDMA::Wait. * restore vtcm tests; add TODO for ION buffer; check IsVtcm pointers
Enables Hexagon User DMA bypass mode based on user-specified
dma_bypass_cache
option for DMA copies between DDR and VTCM.The upside of this change is increased DMA bandwidth (up to 40 GBps observed using
test_vtcm_bandwidth.py
) and compute throughput using a 3-stage pipeline --- cache read, compute, cache write (up to 38 Gops usingtest_parallel_hvx_load_vtcm.py
).The downside of this change is the potential for data coherency issues resulting from the need to manage the cache in software when using DMA bypass hence the user
dma_bypass_cache
option to enable or disable bypass mode.The strategy to manage the cache in software centers around the requirement for Hexagon to operate on
HexagonBuffer
objects regardless of scope --- DDR or VTCM. When copying to / from aHexagonBuffer
we aggressively invalidate the cache for both the source and destination, both before and after the copy. Also note that the copy is now implemented withmemcpy
instead of DMA. With the cache clean after copy to / from aHexagonBuffer
we can now use DMA bypass mode. However, this software cache management strategy is NOT infallible --- if aHexagonBuffer
becomes dirty in the cache prior to a DMA with bypass mode enabled we may see data coherency issues.Also simplifies Hexagon DMA flows by removing the unused
mem_copy
instrinsic and lowering as well as thehexagon_user_dma_1d_sync
helper function which is replaced by calls toHexagonUserDMA::Copy
andHexagonUserDMA::Wait
.