Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Remove conflicting llvm OpenMP from cmake builds #12160

Closed
wants to merge 2 commits into from
Closed

Conversation

lebeg
Copy link
Contributor

@lebeg lebeg commented Aug 14, 2018

Description

Using a different OpenMP library from the one supplied by the compiler can lead to undefined behaviour. Multiple problems were reported by users of MXNet.

If there is need to use OpenMP library provided by llvm, one should compile MXNet with latest Clang, which is now supported as a test stage by the CI.

Checklist

Essentials

  • This is a minor change
  • Changes are complete

Changes

  • Removed OpenMP overwrite from CMakeLists.txt
  • Removed otherwise not used OpenMP submodule

Comments

Fixes:

@szha
Copy link
Member

szha commented Aug 14, 2018

you may want to check this on mac, where the "special" clang doesn't provide -fopenmp option

@lebeg
Copy link
Contributor Author

lebeg commented Aug 15, 2018

@szha yes, but for AppleClang we are failing on this line already, so no difference made in this case.

@lebeg lebeg force-pushed the omp branch 2 times, most recently from df646c3 to cc068d6 Compare August 17, 2018 16:12
@lebeg lebeg requested a review from marcoabreu as a code owner August 17, 2018 16:12
@vandanavk
Copy link
Contributor

Is this PR good to go? #10951 is waiting on this

@KellenSunderland
Copy link
Contributor

KellenSunderland commented Aug 22, 2018

LGTM, might not work on older versions of MacOS as mentioned, but I think this is partially ok given it's broken already and that soon the current release, and current release -1 should both support openmp from clang.

Orthogonal to this PR, but @szha @lebeg: Wouldn't we expect the CMake behaviour would detect if you have OpenMP (or don't have it) and take appropriate actions without an error? Why do we even need to specify OPEN_MP=0/1? If we were setting this automatically we would not run into different versions of MacOS building/failing right? What do you think?

@lebeg
Copy link
Contributor Author

lebeg commented Aug 22, 2018

@KellenSunderland well the option USE_OPENMP i think is good to control explicit need for OpenMP (REQUIRED in cmake's find_package) or disable it if needed for testing for example. We could potentially have a string option with values like Auto, Require, Disable, but that would be a separate slightly bigger change.

Copy link
Contributor

@larroy larroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always compile with openmp=off in mac anyway.

@KellenSunderland
Copy link
Contributor

@lebeg Yes different PR for sure. In the future it'd be great if we had 'auto', 'on', and 'off, and the default was auto.

@lebeg
Copy link
Contributor Author

lebeg commented Aug 22, 2018

@larroy Actually, if you install clang via brew (brew install llvm) then OpenMP would work.

@lebeg
Copy link
Contributor Author

lebeg commented Aug 22, 2018

And it would use the OpenMP lib that we had in the submodule 😁

@KellenSunderland
Copy link
Contributor

FYI here's what the deps look like before / after this patch is applied. Clearly moving from linking solely to the 3rdparty llvm openmp to the gomp version. https://gist.github.com/KellenSunderland/3e2852d929c444f6ba36c3c9b46194b8

Copy link
Contributor

@KellenSunderland KellenSunderland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be able to check if this resolves #10856 Anton?

@lebeg
Copy link
Contributor Author

lebeg commented Aug 22, 2018

Yes, this certainly fixes #10856 since it's kmp API (thread affinity) failure available only in llvm omp.

@@ -370,32 +370,12 @@ endif()
# ---[ OpenMP
if(USE_OPENMP)
find_package(OpenMP REQUIRED)
# This should build on Windows, but there's some problem and I don't have a Windows box, so
# could a Windows user please fix?
if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/3rdparty/openmp/CMakeLists.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you high? Why would you remove this?

@cjolivier01
Copy link
Member

cjolivier01 commented Aug 22, 2018

"Using a different OpenMP library from the one supplied by the compiler has side effects." <-- that doesn't really give any useful information

Using the llvm OpenMP gives a significant performance improvement over gcc's libgomp. It's loaded anyway for many builds such as mkl. It should not be removed unless there is a VERY good reason to, and no other reasonable solution around the problem exists.

@lebeg
Copy link
Contributor Author

lebeg commented Aug 22, 2018

Thanks for your review @cjolivier01 :)

Actually for MKL builds another OpenMP library is provided (from Intel), specifically for MKLML. And it is stated in MKLDNN README that:

Using GNU compiler with -fopenmp and -liomp5 options will link the application with both Intel and GNU OpenMP runtime libraries. This will lead to undefined behavior of the application.

In MXNet exactly this happens but with llvm openmp.

@lebeg
Copy link
Contributor Author

lebeg commented Aug 22, 2018

I consider improving stability and lowering complexity of the build enough good reasons to do this change.

@cjolivier01
Copy link
Member

The build replaces libomp5 with this libomp when building with MKL, which is legal and essentially the same thing. I know of no stability changes caused by using IntelOMP (you'd have top call ALL MKL builds unstable in this case)

@cjolivier01
Copy link
Member

btw, I did a lot of work with mxnet and cython ( /~https://github.com/apache/incubator-mxnet/tree/cython ), and never had issues with this openmp library (always build with cmake). More likely the problems you see are symptoms of something else that can be adjusted. This OpenMP library has 66% less overhead than libgomp when running parallel OMP regions, which adds up to a lot of time quickly.

@larroy
Copy link
Contributor

larroy commented Aug 22, 2018

@cjolivier01 are these numbers available or is there some script / instructions on how to reproduce the benchmarks / how are they conducted?

@vandanavk
Copy link
Contributor

Will changes be made to this PR based on review comments?

@lebeg
Copy link
Contributor Author

lebeg commented Aug 27, 2018

I am about to get some performance numbers on how MXNet performs without the included as submodule OpenMP. I do not think there is any change I can make based on the reviewing comments.

@lupesko
Copy link
Contributor

lupesko commented Aug 28, 2018

+1 to @cjolivier01 ask to update the issue description to be more informative, and would be great to see perf data.
Thanks for the contribution!

@lebeg
Copy link
Contributor Author

lebeg commented Aug 28, 2018

I have updated the description.

@lebeg
Copy link
Contributor Author

lebeg commented Nov 22, 2018

I cross post the things I already sent to the @dev list:

The issue was closed, but I am strong in my opinion that it's the right thing to do.

Background

If you want to use OpenMP pragmas in your code for parallelization you would supply a special flag to the compiler:

Each of the compilers would enable the #pragma omp directive during C/C++ compilation and arrange for automatic linking of the OpenMP runtime library supplied by each compiler separately.

Thus, to use the advantages of an OpenMP implementation one has to compile the code with the corresponding compiler.

Currently, in MXNet CMake build scripts a bundled version of llvm OpenMP is used (here and here) to replace the OpenMP library supplied by the compiler.

I will quote here the README from the MKL-DNN (Intel(R) Math Kernel Library for Deep Neural Networks):

"Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime library to work. As different OpenMP runtimes may not be binary compatible it's important to ensure that only one OpenMP runtime is used throughout the application. Having more than one OpenMP runtime initialized may lead to undefined behavior resulting in incorrect results or crashes." link

And:

"Using GNU compiler with -fopenmp and -liomp5 options will link the application with both Intel and GNU OpenMP runtime libraries. This will lead to undefined behavior of the application." link

As can be seen from ldd for MXNet:

$ ldd build/tests/mxnet_unit_tests | grep omp
    libomp.so => /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so (0x00007f697bc55000)
    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f69660cd000)

Performance

The only performance data related to OpenMP in MXNet I was able to find is here:
#9744 (comment)

Which in my understanding is testing impact of different environment variables for the same setup (using same bundled OpenMP library).

The libraries may differ in implementation and the Thread Affinity Interface may have significant impact on performance.

All compilers support it:

I see no reason why this submodule should be kept.

@pengzhao-intel
Copy link
Contributor

Thanks @lebeg it's a long thread and I need to go through the change and issue first.
Will back to you soon.

@pengzhao-intel
Copy link
Contributor

Also @xinyu-intel @ZhennanQin they have the experience of different platform, MacOS, Windows and compiler-level knowledge.

@lebeg
Copy link
Contributor Author

lebeg commented Nov 27, 2018

Link to the dev list discussion.

@larroy
Copy link
Contributor

larroy commented May 20, 2019

Please reopen this PR given the data collected in the wiki:

https://cwiki.apache.org/confluence/display/MXNET/Benchmarking+MXNet+with+different+OpenMP+implementations

@ZhennanQin
Copy link
Contributor

@larroy Thanks for continuously watching this issue. My previous compiler developer experience tells me, build project with certain version of certain compiler component is not a good idea. Because compiler may have assumption about the version of omp be used. We should let toolchain to make the decision, this can avoid many unexpected behavior.
So, reopen this PR +1.

@stsukrov
Copy link

Yup.
Let's do it properly.

@larroy
Copy link
Contributor

larroy commented May 22, 2019

@stsukrov can you ping @lebeg ?

@lebeg
Copy link
Contributor Author

lebeg commented May 22, 2019

Guys, I can not reopen this PR - it was closed by a committer and I assume it takes one to open it again. The branch is still there, I can make any necessary changes if needed after it will be reopened.

@cjolivier01
Copy link
Member

Is all this still related to an assertion? Why does it assert?

You should also turn off operators tuning before running performance tests and include mnist.

Copy link
Member

@cjolivier01 cjolivier01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does assertion hit?
also above comment

@larroy
Copy link
Contributor

larroy commented May 22, 2019

This is not (only) about the assertion. Unit tests hang when compiling with OpenMP in Linux. I always compile without openmp for this reason.

There are several related issues that we hope that this PR will make go away:

#14999

@cjolivier01 is suggesting to run the tests with MXNET_USE_OPERATOR_TUNING=0 so OMP is always used for operators, but is this a useful scenario? If with the provided openmp and tuning we get similar perfomance why does it matter if some operators are tuned or not? Please elaborate on why are you requesting additional work and benchmarks and not just ask for open ended work with might discourage this contribution which was opened more than half a year ago. I suggest to provide a path to conclusion when asking for additional work, explain the reasons and limit the scope.

I think running a couple of end to end models like mnist or resnet is a valid request. There's also the internal benchmarks which could give us data on possible regressions so the PR could be reverted if we observe regressions after a couple of days. This is not a one way door type of decision.

@larroy
Copy link
Contributor

larroy commented May 22, 2019

I spent some time debugging this assertion, it was long ago, but if I remember correctly is related to the problematic use of pthread_atfork in /~https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L76
Which causes openmp to be reinitialized and the assertion to be triggered.
I'm not sure what you want to do with this information or what you suggest to do with that problem and that problematic piece of code which caused hard bugs in the past, but I would say it's out of the scope of this PR.

Copy link

@stsukrov stsukrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have multiple issues caused by tweaking with OpenMP.
For real-life payloads any potential overhead was proven to be insignificant.

@piyushghai
Copy link
Contributor

@lebeg Can you rebase with the latest master branch ?

@larroy
Copy link
Contributor

larroy commented Jun 14, 2019

Mnist results:

This PR:

(py3_venv) piotr@ec2 cpu:0: ~/m/e/module [omp]> /usr/bin/time python mnist_mlp.py
[23:07:54] ../src/io/iter_mnist.cc:110: MNISTIter: load 60000 images, shuffle=1, shape=(100,784)
[23:07:54] ../src/io/iter_mnist.cc:110: MNISTIter: load 10000 images, shuffle=1, shape=(100,784)
epoch 000: accuracy=0.123883
epoch 001: accuracy=0.679983
INFO:root:Epoch[0] Train-accuracy=0.133283
INFO:root:Epoch[0] Time cost=1.047
INFO:root:Epoch[0] Validation-accuracy=0.252500
INFO:root:Epoch[1] Train-accuracy=0.725100
INFO:root:Epoch[1] Time cost=1.017
INFO:root:Epoch[1] Validation-accuracy=0.866900
batch 000 acc: 0.910
batch 020 acc: 0.890
batch 040 acc: 0.870
batch 060 acc: 0.910
batch 080 acc: 0.840
validation Accuracy: 0.867
accuracy=0.866900
174.10user 8.20system 0:07.17elapsed 2541%CPU (0avgtext+0avgdata 501636maxresident)

####
100 epoch:


4733.37user 13.77system 2:14.40elapsed 3531%CPU (0avgtext+0avgdata 501476maxresident)k
0inputs+0outputs (0major+649505minor)pagefaults 0swaps



Master 3b663ef

(py3_venv) piotr@ec2 cpu:0: ~/m/e/module [master]> /usr/bin/time python mnist_mlp.py
[23:08:09] ../src/io/iter_mnist.cc:110: MNISTIter: load 60000 images, shuffle=1, shape=(100,784)
[23:08:09] ../src/io/iter_mnist.cc:110: MNISTIter: load 10000 images, shuffle=1, shape=(100,784)
epoch 000: accuracy=0.123883
epoch 001: accuracy=0.679983
INFO:root:Epoch[0] Train-accuracy=0.133283
INFO:root:Epoch[0] Time cost=0.781
INFO:root:Epoch[0] Validation-accuracy=0.252500
INFO:root:Epoch[1] Train-accuracy=0.725100
INFO:root:Epoch[1] Time cost=0.935
INFO:root:Epoch[1] Validation-accuracy=0.866900
batch 000 acc: 0.910        
batch 020 acc: 0.890        
batch 040 acc: 0.870        
batch 060 acc: 0.910        
batch 080 acc: 0.840        
validation Accuracy: 0.867  
accuracy=0.866900           
105.58user 83.53system 0:06.92elapsed 2733%CPU (0avgtext+0avgdata 502812maxresident)k
0inputs+0outputs (0major+266146minor)pagefaults 0swaps

#####
With 100 epochs:

2654.15user 2779.46system 2:33.42elapsed 3541%CPU (0avgtext+0avgdata 503584maxresident)k
0inputs+0outputs (0major+697918minor)pagefaults 0swaps



@larroy
Copy link
Contributor

larroy commented Jun 14, 2019

Gluon unit tests seem to run faster with the system omp:

test_gluon.test_np_shape_parameters ... (2, 2016)
(2, 16)
ok
test_gluon.test_gluon_param_load ... ok

----------------------------------------------------------------------
Ran 107 tests in 70.958s

OK (SKIP=18)
206.25user 490.32system 1:12.04elapsed 966%CPU (0avgtext+0avgdata 13262532maxresident)k
0inputs+976152outputs (0major+33768421minor)pagefaults 0swaps

Master:

(2, 16)
ok
test_gluon.test_gluon_param_load ... ok

----------------------------------------------------------------------
Ran 107 tests in 259.936s

OK (SKIP=18)
949.25user 1507.26system 4:21.16elapsed 940%CPU (0avgtext+0avgdata 13279492maxresident)k
0inputs+976152outputs (0major+34278780minor)pagefaults 0swaps

My previous runs were made on a 72 core machine (c5d.18xlarge)

@larroy
Copy link
Contributor

larroy commented Jun 14, 2019

#14999

@cjolivier01
Copy link
Member

As you can see, standard MKL builds also use libomp.

ldd /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so
linux-vdso.so.1 (0x00007ffc989cf000)
libmklml_intel.so => /usr/local/lib/python3.6/dist-packages/mxnet/libmklml_intel.so (0x00007f0afb7c1000)
libiomp5.so => /usr/local/lib/python3.6/dist-packages/mxnet/libiomp5.so (0x00007f0afb3e5000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f0afb1dd000)
libmkldnn.so.0 => /usr/local/lib/python3.6/dist-packages/mxnet/libmkldnn.so.0 (0x00007f0afa7ba000)
libgfortran.so.3 => /usr/local/lib/python3.6/dist-packages/mxnet/libgfortran.so.3 (0x00007f0afa493000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f0afa28f000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f0af9f06000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0af9b68000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0af9950000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0af9731000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0af9340000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0b073f4000)
libquadmath.so.0 => /usr/local/lib/python3.6/dist-packages/mxnet/libquadmath.so.0 (0x00007f0af9100000)

@larroy
Copy link
Contributor

larroy commented Jun 17, 2019

Why do you mean by your latest comment and what additional info are you requesting?

What is your comment regarding the speed improvements and solving the issues with the bundled OpenMP please elaborate on your veto and how to move forward.

Please provide a way forward, there's a significant work providing data having made by project contributors that you are dismissing by closing this PR without justification.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-response PR is reviewed and waiting for contributor to respond
Projects
None yet
Development

Successfully merging this pull request may close these issues.