[Hexagon] Add scripts for e2e MetaSchedule tuning demonstration #13135
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot
Thank you for these intrins and tests @masahi! It is invaluable to have these early demonstrations of the impact MetaSchedule has on Hexagon.
```python
print(
    "max and mean abs difference with the reference:",
    np.max(np.abs(ref_result - hexagon_output)),
    np.mean(np.abs(ref_result - hexagon_output)),
)
```
Are there accuracy issues with tuning the fp16 variant that prevent an assert_allclose with the reference?
There is a non-trivial accuracy difference between x86 and HVX fp16. I added a sample output and an assert check with a fairly loose bound.
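For illustration, this kind of loose-bound check can be sketched with NumPy alone. This is a hypothetical stand-in: the real test compares a Hexagon fp16 ResNet50 output against an x86 reference, while here the fp16 error is simulated by a round-trip cast, and the tolerance values are illustrative.

```python
import numpy as np

# Hypothetical reference output (stand-in for the x86 fp32 reference).
rng = np.random.default_rng(0)
ref_result = rng.standard_normal(1000).astype("float32")

# Simulate fp16 rounding error of the kind accumulated on the device side.
hexagon_output = ref_result.astype("float16").astype("float32")

print(
    "max and mean abs difference with the reference:",
    np.max(np.abs(ref_result - hexagon_output)),
    np.mean(np.abs(ref_result - hexagon_output)),
)

# A fairly loose bound, since fp16 accumulation differs between x86 and HVX.
np.testing.assert_allclose(ref_result, hexagon_output, rtol=1e-2, atol=1e-2)
```

If the two outputs diverge beyond the bound, `assert_allclose` raises with a summary of the mismatched elements, which is more informative in CI logs than a bare max/mean printout.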
Thanks @masahi!
[Hexagon] Add scripts for e2e MetaSchedule tuning demonstration (#13135)

* change dtype in tvmscript roundtrip test to avoid int printing error
* allow printing non int8 array
* Revert "change dtype in tvmscript roundtrip test to avoid int printing error"
* add loose assert check on fp16 result
I've worked on a series of PRs to enable e2e MS tuning on Hexagon (mostly for supporting `link-params = True` in MS). Now that all the pieces have been upstreamed, I'm adding demo tuning scripts under `test_hexagon/metaschedule_e2e`. They are not run on CI, and running them locally requires PyTorch to generate the fp16 and int8 ResNet50 models.

The scripts use a small number of tuning trials and the `replay-trace` search strategy instead of the evolutionary search, to finish tuning quickly. Those interested in MS tuning can tweak these settings for better performance at the cost of more tuning time.

@csullivan @kparzysz-quic @farshidsp
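As a rough pseudocode sketch, the trial count and search strategy described above correspond to meta_schedule tuning parameters along these lines. This is not runnable as-is: the meta_schedule entry points and parameter names have varied between TVM releases and must be checked against your version, the Hexagon target and RPC runner setup is omitted, and the `work_dir` value and trial count here are illustrative.

```python
# Pseudocode sketch of the tuning knobs described above (assumed API shape;
# verify names against your TVM version). A real Hexagon run also needs a
# Hexagon target and an RPC runner, omitted here.
from tvm import meta_schedule as ms

def tune(mod, params, target):
    return ms.relay_integration.tune_relay(
        mod=mod,
        params=params,
        target=target,
        work_dir="work",           # illustrative directory name
        max_trials_global=64,      # small trial count, as in the demo scripts
        strategy="replay-trace",   # instead of the default evolutionary search
    )
```

Raising the trial budget and switching back to the evolutionary strategy is the main lever for trading longer tuning time for better schedules.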