This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Error while running the mxnet spark examples and test cases #13853

Open
thomelane opened this issue Jan 11, 2019 · 13 comments

@thomelane
Contributor

Opening issue on behalf of @adwivedi on the discussion forum (https://discuss.mxnet.io/t/error-while-running-the-mxnet-spark-examples-and-test-cases/2720).

Quotes from the thread...

I am trying to run MXNet in distributed mode using Spark, as implemented here: https://github.com/apache/incubator-mxnet/tree/master/scala-package/spark

However, I am not able to run the examples or the tests.

The commented-out tests in https://github.com/apache/incubator-mxnet/blob/master/scala-package/spark/src/test/scala/org/apache/mxnet/spark/MXNetGeneralSuite.scala keep running into the following error. I get the same error when I try to run the examples in the repo.

Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Seq
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)
	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: scala.collection.Seq
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more
Exception in thread "Thread-21" java.lang.IllegalArgumentException: requirement failed: Failed to start ps scheduler process with exit code 1
	at scala.Predef$.require(Predef.scala:224)
	at org.apache.mxnet.spark.MXNet.org$apache$mxnet$spark$MXNet$$startPSSchedulerInner$1(MXNet.scala:159)
	at org.apache.mxnet.spark.MXNet$$anonfun$startPSScheduler$1.apply(MXNet.scala:162)
	at org.apache.mxnet.spark.MXNet$$anonfun$startPSScheduler$1.apply(MXNet.scala:162)
	at org.apache.mxnet.spark.MXNet$MXNetControllingThread.run(MXNet.scala:38)
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Seq
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)
	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: scala.collection.Seq
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more

I have generally seen this error when there is a mismatch between Scala versions, but in this case I am using the pom file that is in the project and not including any external libraries.
I have also looked at the pom file and have not found any library that might be mismatched; all the libraries in the pom are 2.11 versions.
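
One way to test the version-mismatch theory is to first check whether the classpath handed to the failing JVM contains the Scala standard library at all, since scala.collection.Seq ships in the scala-library jar. The following is a minimal shell sketch under that assumption; the function name and the example jar paths are hypothetical and not part of the MXNet build.

```shell
# Return 0 if the colon-separated classpath in $1 has an entry whose
# file name looks like scala-library*.jar; return non-zero otherwise.
# (Hypothetical helper, not part of the MXNet scala-package.)
classpath_has_scala_library() {
  # Put each classpath entry on its own line, then look for a file
  # name that starts with "scala-library" and ends with ".jar".
  printf '%s\n' "$1" | tr ':' '\n' | grep -Eq '(^|/)scala-library[^/]*\.jar$'
}
```

It could be called as `classpath_has_scala_library "$CLASSPATH" || echo "scala-library jar missing"`. Note that the classpath that matters here is the one passed to the child JVM started for the ps scheduler, which may differ from the one the test itself runs with.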

I am building it from source and running it with this VM parameter:

-Djava.library.path=/path_to_mxnet_source/incubator-mxnet/scala-package/native/osx-x86_64-cpu/target

to let it find the native library.
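
Before pointing java.library.path at a directory, it can help to confirm that the build actually placed shared libraries there. This is a rough sketch only: the function name is made up, and the list of file extensions is an assumption about what the native build emits on Linux and macOS.

```shell
# Print shared-library files found directly inside directory $1.
# Returns non-zero when the directory is missing or holds none.
# (Hypothetical helper; the extensions listed are an assumption.)
list_native_libs() {
  [ -d "$1" ] || return 1
  # grep . passes the names through and fails if find printed nothing.
  find "$1" -maxdepth 1 -type f \
    \( -name '*.so' -o -name '*.dylib' -o -name '*.jnilib' \) | grep .
}
```

For the setup described above, the argument would be the scala-package/native/osx-x86_64-cpu/target directory inside the source tree.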

@thomelane
Contributor Author

@lanking520

@thomelane
Contributor Author

@mxnet-label-bot add [Scala]

@lanking520 lanking520 self-assigned this Jan 11, 2019
@lanking520
Member

Thanks, @thomelane, for raising this here. Adding a few Scala gurus:
@piyushghai @zachgk @andrewfayres @CodingCat
Please take a look at this issue. I also think now is a good time to fix the flaky disabled tests and allow Spark to run in the CI.

@piyushghai
Contributor

piyushghai commented Jan 17, 2019

@adwivedi, your issue with running the examples should be resolved by PRs #13849 and #13891. These fixes are now merged into master.
There were recent changes to the Maven POM files that temporarily broke the examples. They should be back to normal now. Let me know if you still face issues running your examples.

And yes, the java.library.path that you are setting is the correct one.

@ashutosh-dwivedi-e3502
Contributor

ashutosh-dwivedi-e3502 commented Jan 25, 2019

@piyushghai The error still persists. You can reproduce it by un-commenting the test "run spark with MLP", along with its related methods, in the file org/apache/mxnet/spark/MXNetGeneralSuite.scala

Also, the latest version in master doesn't respect -Djava.library.path or the LD_LIBRARY_PATH variables and only picks up the native files from the jar (which doesn't seem to contain them). To test this, I am running a hacked version in which I hard-code the path (incubator-mxnet/scala-package/native/osx-x86_64-cpu/target) in NativeLibraryLoader.scala so that the libraries get picked up from there.
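
To confirm whether a given jar really lacks the native files, its listing can be filtered for shared-library entries. This is only a sketch: the function name is hypothetical, and it assumes the native libraries inside the jar would carry a .so, .dylib, or .jnilib extension.

```shell
# Read an archive listing on stdin (for example the output of
# `unzip -l some.jar` or `jar tf some.jar`) and print only the lines
# that end in a shared-library extension. Exits non-zero if none match.
# (Hypothetical helper; the extension list is an assumption.)
filter_native_entries() {
  grep -E '\.(so|dylib|jnilib)$'
}
```

A typical call would be `jar tf path/to/the.jar | filter_native_entries`, where the jar path is whichever artifact ends up on the test classpath.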

@ashutosh-dwivedi-e3502
Contributor

@piyushghai The paths of the jars are still incorrect or incomplete. I've fixed this in pull request #14020.

However, there's still one problem with the latest changes. The mxnet shared library fails to build when USE_DIST_KVSTORE = 1 because of the following error:

checking whether the C compiler works... no
configure: error: in `/path_to_mxnet/incubator-mxnet/3rdparty/ps-lite/protobuf-2.5.0':
configure: error: C compiler cannot create executables
See `config.log' for more details
make[1]: *** [/path_to_mxnet/incubator-mxnet/deps/include/google/protobuf/message.h] Error 77
make: *** [PSLITE] Error 2

@piyushghai
Contributor

@aashudwivedi Thanks for fixing the classpath problem for the Spark issue.

Can you point me to the specific build instructions you followed to build the mxnet shared library?
Also, what's the instance type on which you're building it?

@piyushghai
Contributor

Here's what I did to build libmxnet.so from source:

git clone --recursive https://github.com/apache/incubator-mxnet.git
cd incubator-mxnet
make clean && make -j$(nproc) USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda-9.0 USE_CUDNN=1 USE_DIST_KVSTORE=1

Here's the info for the instance on which I built mxnet:

----------Python Info----------
('Version      :', '2.7.12')
('Compiler     :', 'GCC 5.4.0 20160609')
('Build        :', ('default', 'Nov 12 2018 14:36:49'))
('Arch         :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version      :', '18.1')
('Directory    :', '/usr/local/lib/python2.7/dist-packages/pip')
----------MXNet Info-----------
('Version      :', '1.5.0')
('Directory    :', '/home/ubuntu/.local/lib/python2.7/site-packages/mxnet')
('Commit Hash   :', 'da5242b732de39ad47d8ecee582f261ba5935fa9')
----------System Info----------
('Platform     :', 'Linux-4.4.0-1074-aws-x86_64-with-Ubuntu-16.04-xenial')
('system       :', 'Linux')
('node         :', 'ip-172-31-78-46')
('release      :', '4.4.0-1074-aws')
('version      :', '#84-Ubuntu SMP Thu Dec 6 08:57:58 UTC 2018')
----------Hardware Info----------
('machine      :', 'x86_64')
('processor    :', 'x86_64')
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               1227.804
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              4600.14
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-15,32-47
NUMA node1 CPU(s):     16-31,48-63

@ashutosh-dwivedi-e3502
Contributor

ashutosh-dwivedi-e3502 commented Jan 29, 2019

@piyushghai

I am building it on macOS High Sierra 10.13.6.
Here's what I do to build the libmxnet.so. I have followed the instructions from http://mxnet.incubator.apache.org/versions/master/install/osx_setup.html#build-the-shared-library

    git clone --recursive https://github.com/apache/incubator-mxnet ~/mxnet
    cd ~/mxnet
    cp make/osx.mk ./config.mk
    echo "USE_BLAS = openblas" >> ./config.mk
    echo "ADD_CFLAGS += -I/usr/local/opt/openblas/include" >> ./config.mk
    echo "ADD_LDFLAGS += -L/usr/local/opt/openblas/lib" >> ./config.mk
    echo "ADD_LDFLAGS += -L/usr/local/lib/graphviz/" >> ./config.mk
    echo "USE_DIST_KVSTORE=1" >> ./config.mk
    make -j$(sysctl -n hw.ncpu)

Here's the instance info :

----------Python Info----------
('Version      :', '2.7.15')
('Compiler     :', 'GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.2)')
('Build        :', ('default', 'Oct  2 2018 11:47:18'))
('Arch         :', ('64bit', ''))
------------Pip Info-----------
('Version      :', '18.0')
('Directory    :', '/usr/local/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
('Platform     :', 'Darwin-17.7.0-x86_64-i386-64bit')
('system       :', 'Darwin')
('node         :', 'ashutdwi-mac')
('release      :', '17.7.0')
('version      :', 'Darwin Kernel Version 17.7.0: Wed Oct 10 23:06:14 PDT 2018; root:xnu-4570.71.13~1/RELEASE_X86_64')
----------Hardware Info----------
('machine      :', 'x86_64')
('processor    :', 'i386')
machdep.cpu.brand_string: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 BMI2 INVPCID FPU_CSDS
machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT RDTSCP TSCI

Here are the details of xcode:

Xcode 10.1
Build version 10B61

I've also attached the config.log from the 3rdparty/ps-lite/protobuf-2.5.0 directory here: config.log

@lanking520
Member

lanking520 commented Jan 31, 2019

I cannot reproduce this issue on High Sierra. Here is the list of dependencies I installed:

Xcode Command Line Tools 10

brew install openssl automake pkg-config nasm
Apple LLVM version 10.0.0 (clang-1000.10.44.4)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

@piyushghai
Contributor

@aashudwivedi Any luck with the steps above?

@ashutosh-dwivedi-e3502
Contributor

@piyushghai Unfortunately, I still have the same problem, and I am not sure what other information I should provide so that you can reproduce it.

@lanking520
Member

From the previous message:

checking whether the C compiler works... no
configure: error: in `/path_to_mxnet/incubator-mxnet/3rdparty/ps-lite/protobuf-2.5.0':
configure: error: C compiler cannot create executables

Have you checked your C compiler with clang --version? This appears to be a compiler issue. Try running make clean and then make -j again.
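
Beyond clang --version, the compiler can be exercised directly by compiling and linking a trivial program, which is roughly what the failing configure step checks. A minimal sketch, assuming the default compiler is reachable as cc; the function name and the conftest file names are hypothetical:

```shell
# Try to compile and link a trivial C program with the compiler named
# in $1 (default: cc). Returns 0 on success and 1 on failure.
# (Hypothetical helper; it only approximates the configure check.)
c_compiler_works() {
  local compiler="${1:-cc}" workdir status
  workdir="$(mktemp -d)" || return 1
  printf 'int main(void) { return 0; }\n' > "$workdir/conftest.c"
  if "$compiler" "$workdir/conftest.c" -o "$workdir/conftest" >/dev/null 2>&1; then
    status=0
  else
    status=1
  fi
  rm -rf "$workdir"
  return "$status"
}
```

Running `c_compiler_works clang || echo "clang cannot create executables"` outside the MXNet tree would show whether the problem is in the toolchain itself rather than in the protobuf configure script; the config.log in 3rdparty/ps-lite/protobuf-2.5.0 should contain the exact command and error that configure saw.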
