quantized BERT model with int8 is 2x slower than float32.
Steps to reproduce:

Download the trained SST params from: https://dist-bert.s3.amazonaws.com/demo/finetune/sst.params
Clone and install gluon-nlp v0.9.
I'm using a c5.12xlarge instance. I tried setting OMP_NUM_THREADS=8, but int8 is still slower than float32.
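For anyone trying to reproduce this, here is a minimal setup sketch based on the steps above. The checkpoint URL is from the report; the `v0.9.0` tag name and the thread-count value are assumptions, not details confirmed in the original post:

```sh
# Fetch the fine-tuned SST checkpoint referenced above.
wget https://dist-bert.s3.amazonaws.com/demo/finetune/sst.params

# Clone and install gluon-nlp v0.9 (v0.9.0 is an assumed release tag).
git clone https://github.com/dmlc/gluon-nlp
cd gluon-nlp
git checkout v0.9.0
pip install -e .

# Pin the OpenMP thread count before benchmarking, as described above.
# Note the trailing "S": OMP_NUM_THREAD (singular) is silently ignored.
export OMP_NUM_THREADS=8
```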