
Comparison with Qualcomm AI Hub model #8194

Answered by cccclai
DongGeun123 asked this question in Q&A

Yeah, we found that the model definition in llama_transformer.py isn't ideal for running the Llama model on the QNN backend. We've started a new model definition in /~https://github.com/pytorch/executorch/tree/e66cdaf514e15242692073db1271aae4657f2033/examples/qualcomm/oss_scripts/llama3_2, which has better latency numbers.

It's still WIP, so please expect some rough edges if you try it out, or wait until it's more settled.
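
For context, the scripts under examples/qualcomm/oss_scripts go through ExecuTorch's usual export-and-delegate flow. Below is a minimal sketch of that generic flow using a toy module in place of the actual Llama definition; `TinyBlock` and the file name are made up for illustration, and the QNN delegation step is shown only as a commented assumption, since `QnnPartitioner`'s compile specs are device-specific and vary across ExecuTorch revisions.

```python
# Minimal sketch of ExecuTorch's export-and-lower flow. TinyBlock is a
# stand-in for a real transformer block, not the llama3_2 model definition.
import torch
from executorch.exir import to_edge


class TinyBlock(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.proj = torch.nn.Linear(64, 64)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.silu(self.proj(x))


model = TinyBlock().eval()
example_inputs = (torch.randn(1, 64),)

# 1. Capture the model graph with torch.export.
exported = torch.export.export(model, example_inputs)

# 2. Convert the exported program to the edge dialect.
edge = to_edge(exported)

# 3. (Assumption) delegate supported subgraphs to the QNN backend,
#    roughly along the lines of:
#        from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
#        edge = edge.to_backend(QnnPartitioner(compiler_specs))
#    where compiler_specs carry the target SoC and HTP options; check the
#    oss_scripts for the exact, version-specific setup.

# 4. Serialize to a .pte file for the ExecuTorch runtime.
et_program = edge.to_executorch()
with open("tiny_block.pte", "wb") as f:
    f.write(et_program.buffer)
```

The linked llama3_2 scripts follow the same overall pattern, with the model definition and compile specs tuned for QNN; treat the sketch above as orientation, not as the scripts' exact code.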

Replies: 4 comments

Answer selected by Jack-Khuu
Category
Q&A
Labels
partner: qualcomm (for backend delegation, kernels, demos, etc. from the third-party partner, Qualcomm)
triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
module: qnn (issues related to Qualcomm's QNN delegate and code under backends/qualcomm/)
5 participants
Converted from issue

This discussion was converted from issue #7411 on February 04, 2025 20:28.