This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Performance decrease with pypi's mxnet==1.4.0 (Mac) #14563

Closed
fhieber opened this issue Mar 29, 2019 · 6 comments

Comments

@fhieber
Contributor

fhieber commented Mar 29, 2019

I am seeing a significant slowdown with the latest pypi release of mxnet==1.4.0 on macOS when running transformer training with Sockeye:

mxnet==1.3.1
[INFO:sockeye.training] Checkpoint [1]	Updates=1000 Epoch=1 Samples=16000 Time-cost=13.735 Updates/sec=72.805

mxnet==1.4.0
[INFO:sockeye.training] Checkpoint [1]	Updates=1000 Epoch=1 Samples=16000 Time-cost=20.084 Updates/sec=49.791

mxnet-mkl==1.3.1
[INFO:sockeye.training] Checkpoint [1]	Updates=1000 Epoch=1 Samples=16000 Time-cost=15.200 Updates/sec=65.791

mxnet-mkl==1.4.0
[INFO:sockeye.training] Checkpoint [1]	Updates=1000 Epoch=1 Samples=16000 Time-cost=23.078 Updates/sec=43.331
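
To put a number on the slowdown, here is a minimal sketch that parses the four checkpoint lines above and prints the relative change in Updates/sec between 1.3.1 and 1.4.0. The helper names (`updates_per_sec`, `relative_change`) are made up for illustration and are not part of Sockeye or MXNet. With the figures above it comes out to a drop of a bit over 30% for both the plain and the MKL build.

```python
import re

# Checkpoint lines copied from the logs above (macOS, pypi packages).
LOGS = {
    "mxnet==1.3.1": "[INFO:sockeye.training] Checkpoint [1] Updates=1000 Epoch=1 Samples=16000 Time-cost=13.735 Updates/sec=72.805",
    "mxnet==1.4.0": "[INFO:sockeye.training] Checkpoint [1] Updates=1000 Epoch=1 Samples=16000 Time-cost=20.084 Updates/sec=49.791",
    "mxnet-mkl==1.3.1": "[INFO:sockeye.training] Checkpoint [1] Updates=1000 Epoch=1 Samples=16000 Time-cost=15.200 Updates/sec=65.791",
    "mxnet-mkl==1.4.0": "[INFO:sockeye.training] Checkpoint [1] Updates=1000 Epoch=1 Samples=16000 Time-cost=23.078 Updates/sec=43.331",
}


def updates_per_sec(line):
    """Extract the Updates/sec value from a Sockeye checkpoint log line."""
    match = re.search(r"Updates/sec=([0-9.]+)", line)
    if match is None:
        raise ValueError("no Updates/sec field in line: " + line)
    return float(match.group(1))


def relative_change(old, new):
    """Percentage change from old to new (negative means slower)."""
    return (new - old) / old * 100.0


rates = {pkg: updates_per_sec(line) for pkg, line in LOGS.items()}
plain = relative_change(rates["mxnet==1.3.1"], rates["mxnet==1.4.0"])
mkl = relative_change(rates["mxnet-mkl==1.3.1"], rates["mxnet-mkl==1.4.0"])

print(f"mxnet:     {plain:+.1f}% updates/sec")
print(f"mxnet-mkl: {mkl:+.1f}% updates/sec")
```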

Interestingly, this difference in performance does not exist on a Linux machine (with or without GPU). That is, on Linux I don't see any difference in speed between mxnet-cu92==1.3.1 and mxnet-cu92==1.4.0.post0, or between mxnet==1.3.1 and mxnet==1.4.0.post0.

Note that the Linux packages all install a 1.4.0.post0 version that doesn't seem to exist for macOS. Is there a reason for that?

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Performance

@apeforest
Contributor

@fhieber Can you post a script to reproduce this? I am investigating a performance issue introduced in 1.4.0 (#14496). The two could share the same root cause. Thanks!

@vdantu
Contributor

vdantu commented Mar 30, 2019

@mxnet-label-bot add [performance, question, pending requester info]

@fhieber
Contributor Author

fhieber commented Mar 31, 2019

@apeforest thanks for investigating! I cannot provide a small script that reproduces the issue at the moment. I only observed the speed difference while running various system tests with Sockeye. If you want to run them yourself, you can do the following:

git clone https://github.com/awslabs/sockeye.git
cd sockeye
pip install -r requirements/requirements.dev.txt
pip install -r requirements/requirements.txt
pytest test/system -k "Sort:transformer:transformer"

The speed difference will be visible directly from the first log messages of the form Speed: 1004.51 samples/sec 10366.54 tokens/sec 62.78 updates/sec.

The difference in performance between 1.3.1 and 1.4.0 is visible for system tests with the different architectures implemented in Sockeye (RNN/LSTM, transformer, CNN).
All of these implementations use a transpose() operator somewhere, if that helps.
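
In case it helps narrow things down, below is a rough sketch of a transpose micro-benchmark that could be run under each MXNet version. It is only a guess at a starting point: nothing in this thread confirms that transpose() is the cause, and the tensor shape, axes, and helper names (`time_calls`, `bench_mxnet_transpose`) are arbitrary choices for illustration rather than values taken from Sockeye.

```python
import time


def time_calls(fn, sync, repeats=100, warmup=10):
    """Return mean seconds per call of fn().

    sync() must block until all queued work has finished; this matters for
    MXNet, whose operators are dispatched asynchronously.
    """
    for _ in range(warmup):
        fn()
    sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    sync()
    return (time.perf_counter() - start) / repeats


def bench_mxnet_transpose(shape=(16, 64, 512), axes=(1, 0, 2), repeats=100):
    """Time mx.nd.transpose on a random CPU tensor (shape/axes are assumptions)."""
    # Imported lazily so time_calls stays usable without MXNet installed.
    import mxnet as mx

    x = mx.nd.random.uniform(shape=shape)
    return time_calls(
        lambda: mx.nd.transpose(x, axes=axes),
        mx.nd.waitall,
        repeats=repeats,
    )


# Usage, run once per installed version (e.g. mxnet==1.3.1, then mxnet==1.4.0):
#   import mxnet as mx
#   print(mx.__version__, bench_mxnet_transpose())
```

If the per-call time is the same across versions, transpose is probably not the culprit and other operators used by the system tests would need the same treatment.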

@piyushghai
Contributor

Removing the Question label from this. This is an actual performance regression.

@mxnet-label-bot Update [Pending Requester Info, Performance]

@vdantu
Contributor

vdantu commented Jun 30, 2019

@fhieber: I see that this issue persists in 1.4.0 and 1.4.1, but 1.5.0 (pre-release) seems to have performance similar to 1.3.1.

1.5.0 (pre-release)

[INFO:sockeye.training] Epoch[0] Batch [600]	Speed: 1349.35 samples/sec 13709.44 tokens/sec 84.33 updates/sec	perplexity=1.420409
[INFO:sockeye.training] Epoch[1] Batch [650]	Speed: 1306.41 samples/sec 13534.45 tokens/sec 81.65 updates/sec	perplexity=1.390872
[INFO:sockeye.training] Epoch[1] Batch [700]	Speed: 1287.37 samples/sec 12976.67 tokens/sec 80.46 updates/sec	perplexity=1.368012

1.3.1

...
[INFO:sockeye.training] Epoch[1] Batch [1050]	Speed: 1396.45 samples/sec 14355.49 tokens/sec 87.28 updates/sec	perplexity=1.031028
[INFO:sockeye.training] Epoch[1] Batch [1100]	Speed: 1576.58 samples/sec 16207.22 tokens/sec 98.54 updates/sec	perplexity=1.022231
[INFO:sockeye.training] Epoch[1] Batch [1150]	Speed: 1556.92 samples/sec 15880.62 tokens/sec 97.31 updates/sec	perplexity=1.026734
[INFO:sockeye.training] Epoch[1] Batch [1200]	Speed: 1563.64 samples/sec 15824.06 tokens/sec 97.73 updates/sec	perplexity=1.038476
...

1.4.1

...
[INFO:sockeye.training] Epoch[0] Batch [300]	Speed: 959.48 samples/sec 9863.49 tokens/sec 59.97 updates/sec	perplexity=1.883570
[INFO:sockeye.training] Epoch[0] Batch [350]	Speed: 950.11 samples/sec 9691.12 tokens/sec 59.38 updates/sec	perplexity=1.747596
[INFO:sockeye.training] Epoch[0] Batch [400]	Speed: 937.03 samples/sec 9670.12 tokens/sec 58.56 updates/sec	perplexity=1.647115
[INFO:sockeye.training] Epoch[0] Batch [450]	Speed: 958.23 samples/sec 9735.66 tokens/sec 59.89 updates/sec	perplexity=1.575914
[INFO:sockeye.training] Epoch[0] Batch [500]	Speed: 918.91 samples/sec 9446.44 tokens/sec 57.43 updates/sec	perplexity=1.512286
...
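
For a quick side-by-side, the excerpts above can be reduced to a mean updates/sec per version. This is only a rough sketch: the helper name `mean_updates_per_sec` is made up, the excerpts cover different batches and epochs for each version, and only the updates/sec field is parsed.

```python
import re
from statistics import mean

# "Speed:" fragments copied from the log excerpts above, grouped by version.
EXCERPTS = {
    "1.5.0 (pre-release)": """
        Speed: 1349.35 samples/sec 13709.44 tokens/sec 84.33 updates/sec
        Speed: 1306.41 samples/sec 13534.45 tokens/sec 81.65 updates/sec
        Speed: 1287.37 samples/sec 12976.67 tokens/sec 80.46 updates/sec
    """,
    "1.3.1": """
        Speed: 1396.45 samples/sec 14355.49 tokens/sec 87.28 updates/sec
        Speed: 1576.58 samples/sec 16207.22 tokens/sec 98.54 updates/sec
        Speed: 1556.92 samples/sec 15880.62 tokens/sec 97.31 updates/sec
        Speed: 1563.64 samples/sec 15824.06 tokens/sec 97.73 updates/sec
    """,
    "1.4.1": """
        Speed: 959.48 samples/sec 9863.49 tokens/sec 59.97 updates/sec
        Speed: 950.11 samples/sec 9691.12 tokens/sec 59.38 updates/sec
        Speed: 937.03 samples/sec 9670.12 tokens/sec 58.56 updates/sec
        Speed: 958.23 samples/sec 9735.66 tokens/sec 59.89 updates/sec
        Speed: 918.91 samples/sec 9446.44 tokens/sec 57.43 updates/sec
    """,
}

UPDATES_RE = re.compile(r"([0-9.]+) updates/sec")


def mean_updates_per_sec(log_text):
    """Average every 'N updates/sec' value found in a block of log text."""
    values = [float(v) for v in UPDATES_RE.findall(log_text)]
    if not values:
        raise ValueError("no updates/sec values found")
    return mean(values)


means = {version: mean_updates_per_sec(text) for version, text in EXCERPTS.items()}
for version, value in means.items():
    print(f"{version:<20} {value:6.2f} updates/sec")
```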

We can probably close this issue after the release of 1.5.0.

@fhieber fhieber closed this as completed Nov 1, 2019