
mkldnn quantized FC is slow #17705

Closed
eric-haibin-lin opened this issue Feb 27, 2020 · 4 comments

Comments

@eric-haibin-lin
Member

The int8-quantized BERT model is 2x slower than the float32 model.

Download trained SST params from: https://dist-bert.s3.amazonaws.com/demo/finetune/sst.params

Clone and install gluon-nlp v0.9.
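
For reference, a minimal setup sketch (assuming the v0.9.x branch of dmlc/gluon-nlp and that finetune_classifier.py lives under scripts/bert):

# clone gluon-nlp at v0.9 and install it in editable mode
git clone https://github.com/dmlc/gluon-nlp -b v0.9.x
cd gluon-nlp
pip install -e .
# the BERT finetuning/calibration script used below
cd scripts/bert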

# calibration
KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0 OMP_NUM_THREADS=1 numactl --physcpubind=0 --membind=0 python3 finetune_classifier.py --task_name SST --only_calibration --model_parameters  sst.params
# fp32 inference
python3 finetune_classifier.py --task_name SST --epoch 1 --only_inference --model_parameters sst.params --round_to 128 --dev_batch_size 1
# int8 inference
python3 finetune_classifier.py --task_name SST --epoch 1 --only_inference --model_prefix ./output_dir/model_bert_SST_quantized_customize --deploy --round_to 128 --dev_batch_size 1

I'm running on a c5.12xlarge instance. I also tried setting OMP_NUM_THREADS=8, but int8 is still slower than float32.
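
One diagnostic that might help narrow down where the time goes is MKL-DNN's verbose mode, which prints one line per executed primitive with its data type and execution time. This is just a sketch; it assumes the MXNet build bundles MKL-DNN with verbose logging compiled in (the log prefix is mkldnn_verbose or dnnl_verbose depending on the bundled version):

# print each executed MKL-DNN primitive (kernel kind, int8/fp32, shapes, time in ms) during the int8 run
MKLDNN_VERBOSE=1 python3 finetune_classifier.py --task_name SST --epoch 1 --only_inference --model_prefix ./output_dir/model_bert_SST_quantized_customize --deploy --round_to 128 --dev_batch_size 1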

@pengzhao-intel
Contributor

@wuxun-zhang @ciyongch

@ciyongch
Contributor

Thanks for reporting this, I will take a look.

@ciyongch
Contributor

@eric-haibin-lin I just created PR #17707 to address this issue; please take a look.

@pengzhao-intel
Contributor

Feel free to reopen if the issue is not resolved.
