Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Update MKL-DNN to v0.18 release (was: fix the Dense layer issue) #13668

Merged
merged 23 commits into from
Mar 17, 2019

Conversation

TaoLv
Copy link
Member

@TaoLv TaoLv commented Dec 18, 2018

Description

Issue: #13451

Since the case provided in the issue needs huge memory, I'm not sure if it's appropriate to put it into unit test.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@Roshrini
Copy link
Member

@TaoLv Can you please check the failing test? @mseth10 Can you help review this PR?
@mxnet-label-bot Add [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Dec 19, 2018
@TaoLv TaoLv requested a review from anirudh2290 as a code owner December 20, 2018 13:29
@TaoLv
Copy link
Member Author

TaoLv commented Jan 4, 2019

@zheng-da Please try this update to see if it can fix your issue. Thank you.

@stu1130
Copy link
Contributor

stu1130 commented Jan 16, 2019

@zheng-da ping for the update! Thanks

@sandeep-krishnamurthy
Copy link
Contributor

@azai91 - Can you please help review this PR?

@pengzhao-intel
Copy link
Contributor

We will update the latest MKLDNN CI before the merge.

@pengzhao-intel
Copy link
Contributor

@zheng-da please help verify the fix.

@ankkhedia
Copy link
Contributor

@zheng-da Ping for review !

@TaoLv
Copy link
Member Author

TaoLv commented Feb 17, 2019

MKL-DNN 0.18 is coming soon. We plan to upgrade MKL-DNN dependency to 0.18 and full validation is WIP. Please hold on to the code review and merge. Thank you.

@TaoLv TaoLv changed the title Update MKL-DNN to fix the Dense layer issue [WIP] Update MKL-DNN to fix the Dense layer issue Feb 17, 2019
@pengzhao-intel
Copy link
Contributor

@TaoLv Is there the test case to catch this issue?

@TaoLv
Copy link
Member Author

TaoLv commented Feb 17, 2019

There is one in the original issue: #13451. But since it needs a big NDArray (~20GB memory) so I'm not sure if we can put it into CI unit test. I'm considering to add it into nightly: /~https://github.com/apache/incubator-mxnet/blob/master/tests/nightly/test_large_array.py

@pengzhao-intel
Copy link
Contributor

yes, it's a big concern for CI. @zheng-da
@apeforest do you have any plan to test this kind of functionality in your proposal?

@TaoLv TaoLv requested a review from szha as a code owner March 2, 2019 04:02
@TaoLv TaoLv changed the title [WIP] Update MKL-DNN to fix the Dense layer issue [WIP] Update MKL-DNN to v0.18 release (was: fix the Dense layer issue) Mar 2, 2019
cmake/DownloadMKLML.cmake Outdated Show resolved Hide resolved
@pengzhao-intel
Copy link
Contributor

#14052

@TaoLv the 0.18 will fix this issue, please double check :)

@TaoLv TaoLv changed the title [WIP] Update MKL-DNN to v0.18 release (was: fix the Dense layer issue) Update MKL-DNN to v0.18 release (was: fix the Dense layer issue) Mar 13, 2019
@TaoLv
Copy link
Member Author

TaoLv commented Mar 13, 2019

CI is finally passed. This PR is ready for review now. @pengzhao-intel @szha

@pengzhao-intel
Copy link
Contributor

@TaoLv
Copy link
Member Author

TaoLv commented Mar 13, 2019

incubator-mxnet/tests/python/unittest/test_operator.py

Line 1634 in 6aa8c27

@unittest.skip("Flaky test #14052")
Please add this test back, suppose this can be fixed.

I tried the test case and it passed on my local machine with this PR. But since it's flaky, I will submit another PR to re-enable it after this PR is merged. It deserves several times of CI run to make sure it's really fixed.

@@ -255,6 +255,7 @@ if(USE_MKLDNN)
add_subdirectory(3rdparty/mkldnn)

include_directories(3rdparty/mkldnn/include)
include_directories(${PROJECT_BINARY_DIR}/3rdparty/mkldnn/include)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this folder, there is a new header file of MKL-DNN which is automatically generated during compilation. I doubt it would be used by Horovod and should be distributed with pip package. @szha @yuxihu

@TaoLv
Copy link
Member Author

TaoLv commented Mar 15, 2019

Ping for review, @szha , @zheng-da , @yuxihu , @pengzhao-intel .

Copy link
Contributor

@pengzhao-intel pengzhao-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please switch to release version before merging.

Copy link
Member

@wkcn wkcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! LGTM

mkldnn.mk Show resolved Hide resolved
@pengzhao-intel
Copy link
Contributor

It's great that all CI passed.
@wkcn @szha please help confirm all concerns are resolved.

We will merge the PR in 24 hours in case there's no further concern.

Copy link
Member

@wkcn wkcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for Your contribution! LGTM

@pengzhao-intel
Copy link
Contributor

Merging now :) Thanks all of the efforts from Tao and reviewers.

@pengzhao-intel pengzhao-intel merged commit c2f939f into apache:master Mar 17, 2019
vdantu pushed a commit to vdantu/incubator-mxnet that referenced this pull request Mar 31, 2019
…che#13668)

* update MKL-DNN to fix Dense layer issue(apache#13451)

* fix format numbers

* update mkldnn

* update mkldnn to v0.18 release

* update mklml version & fix mkldnn build

* fix typo

* fix cmake

* fix cmake

* skip test_quantized_conv

* fix quantized_conv and cmake

* Revert "fix cmake"

This reverts commit a9b3886.

* update mkldnn submodule

* update mkldnn head to 0.18.1 patch release

* empty commit to retrigger ci

* retrigger ci
nswamy pushed a commit that referenced this pull request Apr 5, 2019
)

* update MKL-DNN to fix Dense layer issue(#13451)

* fix format numbers

* update mkldnn

* update mkldnn to v0.18 release

* update mklml version & fix mkldnn build

* fix typo

* fix cmake

* fix cmake

* skip test_quantized_conv

* fix quantized_conv and cmake

* Revert "fix cmake"

This reverts commit a9b3886.

* update mkldnn submodule

* update mkldnn head to 0.18.1 patch release

* empty commit to retrigger ci

* retrigger ci
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
…che#13668)

* update MKL-DNN to fix Dense layer issue(apache#13451)

* fix format numbers

* update mkldnn

* update mkldnn to v0.18 release

* update mklml version & fix mkldnn build

* fix typo

* fix cmake

* fix cmake

* skip test_quantized_conv

* fix quantized_conv and cmake

* Revert "fix cmake"

This reverts commit a9b3886.

* update mkldnn submodule

* update mkldnn head to 0.18.1 patch release

* empty commit to retrigger ci

* retrigger ci
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
MKLDNN Operator pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants