Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Re-enable static cached_op optimization #14931

Merged
merged 1 commit into from
May 14, 2019

Conversation

ZhennanQin
Copy link
Contributor

Description

This PR firstly filed at #14785, but reverted because of dmlc/gluon-nlp#690. Now it's fixed.

The fixed output is,

$ python run_pretraining.py --gpus 0 --batch_size 32 --lr 2e-5 --data 'out/*.npz' --warmup_ratio 0.5 --num_steps 20 --pretrained --log_interval=2 --data_eval 'out/*.npz' --batch_size_eval 8 --ckpt_dir ckpt
INFO:root:Namespace(accumulate=1, batch_size=32, batch_size_eval=8, ckpt_dir='ckpt', ckpt_interval=250000, data='out/*.npz', data_eval='out/*.npz', dataset_name='book_corpus_wiki_en_uncased', dtype='float32', dummy_data_len=None, gpus='0', kvstore='device', log_interval=2, lr=2e-05, model='bert_12_768_12', num_buckets=1, num_steps=20, pretrained=True, profile=None, seed=0, start_step=0, use_avg_len=False, verbose=False, warmup_ratio=0.5)
[12:41:47] src/storage/storage.cc:108: Using GPUPooledRoundedStorageManager.
INFO:root:Using training data at out/*.npz
INFO:root:[step 1]      mlm_loss=1.57931        mlm_acc=46.99793        nsp_loss=0.32225        nsp_acc=84.375  throughput=2.4K tks/s   lr=0.0000020 time=1.34, latency=668.3 ms/batch
INFO:root:[step 3]      mlm_loss=3.27771        mlm_acc=48.94737        nsp_loss=0.47513        nsp_acc=88.333  throughput=6.8K tks/s   lr=0.0000060 time=0.93, latency=466.5 ms/batch
INFO:root:[step 5]      mlm_loss=2.79173        mlm_acc=50.60241        nsp_loss=0.12542        nsp_acc=92.857  throughput=6.2K tks/s   lr=0.0000100 time=0.91, latency=453.5 ms/batch
INFO:root:[step 7]      mlm_loss=2.81245        mlm_acc=52.21421        nsp_loss=0.25357        nsp_acc=92.188  throughput=6.5K tks/s   lr=0.0000140 time=1.00, latency=499.9 ms/batch
INFO:root:[step 9]      mlm_loss=2.09288        mlm_acc=62.17143        nsp_loss=0.01765        nsp_acc=100.000 throughput=6.3K tks/s   lr=0.0000180 time=0.93, latency=464.5 ms/batch
INFO:root:[step 11]     mlm_loss=1.53132        mlm_acc=71.63121        nsp_loss=0.00595        nsp_acc=100.000 throughput=6.8K tks/s   lr=0.0000090 time=0.98, latency=490.6 ms/batch
INFO:root:[step 13]     mlm_loss=1.04932        mlm_acc=79.85866        nsp_loss=0.00190        nsp_acc=100.000 throughput=5.9K tks/s   lr=0.0000070 time=0.97, latency=484.6 ms/batch
INFO:root:[step 15]     mlm_loss=0.93987        mlm_acc=81.75027        nsp_loss=0.00284        nsp_acc=100.000 throughput=6.3K tks/s   lr=0.0000050 time=1.00, latency=500.4 ms/batch
INFO:root:[step 17]     mlm_loss=0.68529        mlm_acc=86.50794        nsp_loss=0.00250        nsp_acc=100.000 throughput=6.6K tks/s   lr=0.0000030 time=1.03, latency=516.9 ms/batch
INFO:root:[step 19]     mlm_loss=0.50951        mlm_acc=89.44900        nsp_loss=0.00218        nsp_acc=100.000 throughput=6.2K tks/s   lr=0.0000010 time=0.92, latency=460.7 ms/batch
INFO:root:[step 20] Saving checkpoints to ckpt/0000020.params, ckpt/0000020.states.
INFO:root:Train cost=26.3s
INFO:root:Using evaluation data at out/*.npz
INFO:root:[step 1]      mlm_loss=0.20752        mlm_acc=93.26923        nsp_loss=0.00008        nsp_acc=100.000 throughput=3.0K tks/s   lr=0.0000000 time=0.23, latency=115.1 ms/batch
INFO:root:[step 3]      mlm_loss=0.43117        mlm_acc=91.76955        nsp_loss=0.00139        nsp_acc=100.000 throughput=10.5K tks/s  lr=0.0000000 time=0.15, latency=77.5 ms/batch
INFO:root:[step 5]      mlm_loss=0.39373        mlm_acc=92.27941        nsp_loss=0.00016        nsp_acc=100.000 throughput=12.0K tks/s  lr=0.0000000 time=0.15, latency=76.4 ms/batch
INFO:root:[step 7]      mlm_loss=0.37862        mlm_acc=91.18943        nsp_loss=0.00093        nsp_acc=100.000 throughput=10.4K tks/s  lr=0.0000000 time=0.15, latency=73.4 ms/batch
INFO:root:mlm_loss=0.406        mlm_acc=92.0    nsp_loss=0.001  nsp_acc=100.0
INFO:root:Eval cost=0.7s

@pengzhao-intel @eric-haibin-lin

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@marcoabreu
Copy link
Contributor

Was there a test added in the process of resolving the linked issue?

Copy link
Member

@eric-haibin-lin eric-haibin-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind elaborating what caused the issue and how you fixed it?

@pengzhao-intel
Copy link
Contributor

pengzhao-intel commented May 12, 2019

Was there a test added in the process of resolving the linked issue?

@marcoabreu This is a performance issue in the gluonNLP so any suggestions for adding performance test on CI?

@@ -705,8 +705,10 @@ void CachedOp::StaticRunOps(
arg_shapes.emplace_back(ndinput->shape());
arg_dtypes.emplace_back(ndinput->dtype());
}
state.op_states[i] = createop[node.source->op()](
node.source->attrs, default_ctx, arg_shapes, arg_dtypes);
if (!config_.static_shape) {
Copy link
Contributor Author

@ZhennanQin ZhennanQin May 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eric-haibin-lin This is the only changed line to fix this PR. We should re-create op state if static_shape is false.

@vandanavk
Copy link
Contributor

@mxnet-label-bot add [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label May 12, 2019
Copy link
Contributor

@pengzhao-intel pengzhao-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pengzhao-intel
Copy link
Contributor

@eric-haibin-lin @marcoabreu please help take a review :)

@pengzhao-intel
Copy link
Contributor

Will merge this PR at the end of today if there is no further comment.

@pengzhao-intel pengzhao-intel merged commit 1eba37a into apache:master May 14, 2019
@ZhennanQin ZhennanQin deleted the fix_static_cached_op branch May 31, 2019 02:06
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants