This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

gluon.utils.clip_global_norm/nd.dot with fp16 throws fatal message #13736

Closed
eric-haibin-lin opened this issue Dec 27, 2018 · 3 comments

@eric-haibin-lin

eric-haibin-lin commented Dec 27, 2018

On GPU with 1-D input:

>>> b = mx.nd.ones((10,), dtype='float16', ctx=mx.gpu())
>>> b

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
<NDArray 10 @gpu(0)>
>>> mx.nd.dot(b,b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/.local/lib/python2.7/site-packages/mxnet/ndarray/ndarray.py", line 189, in __repr__
    return '\n%s\n<%s %s @%s>' % (str(self.asnumpy()),
  File "/home/ubuntu/.local/lib/python2.7/site-packages/mxnet/ndarray/ndarray.py", line 1972, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/ubuntu/.local/lib/python2.7/site-packages/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [19:26:22] /home/ubuntu/mxnet-distro/mxnet-build/3rdparty/mshadow/mshadow/./././dot_engine-inl.h:571: Not implmented!

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x382d4a) [0x7fa1b33ced4a]
[bt] (1) /home/ubuntu/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x383381) [0x7fa1b33cf381]
[bt] (2) /home/ubuntu/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x3d19080) [0x7fa1b6d65080]
[bt] (3) /home/ubuntu/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2b884f8) [0x7fa1b5bd44f8]
[bt] (4) /home/ubuntu/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2ae9039) [0x7fa1b5b35039]
[bt] (5) /home/ubuntu/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2af2a24) [0x7fa1b5b3ea24]
[bt] (6) /home/ubuntu/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2af6cf3) [0x7fa1b5b42cf3]
[bt] (7) /home/ubuntu/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2af6f46) [0x7fa1b5b42f46]
[bt] (8) /home/ubuntu/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2af3134) [0x7fa1b5b3f134]
[bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fa1eb8dcc80]

Note that gluon.utils.clip_global_norm internally calls nd.dot, so it fails with the same error on float16 inputs.
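
For readers unfamiliar with why norm clipping ends up in nd.dot: the global norm is the square root of the summed squared entries of all arrays, and one common way to get each array's contribution is a dot product of the flattened array with itself. The sketch below illustrates that computation in NumPy rather than MXNet, and it is not MXNet's implementation. The function name `clip_global_norm_fp32` and the choice to accumulate in float32 are assumptions made for illustration; casting to float32 before the dot is one possible way to avoid a missing float16 kernel.

```python
import numpy as np


def clip_global_norm_fp32(arrays, max_norm):
    """Rescale `arrays` in place so their joint L2 norm is at most `max_norm`.

    Hypothetical helper, not part of MXNet or NumPy. Returns the norm
    measured before clipping.
    """
    total = 0.0
    for a in arrays:
        # Cast to float32 so the dot product does not depend on a float16 kernel.
        flat = a.astype(np.float32).ravel()
        total += float(np.dot(flat, flat))
    total_norm = float(np.sqrt(total))

    # Small epsilon guards against division by zero for all-zero inputs.
    scale = max_norm / (total_norm + 1e-8)
    if scale < 1.0:
        for a in arrays:
            # Multiply in place, keeping each array's original dtype.
            a *= a.dtype.type(scale)
    return total_norm


grads = [np.ones((10,), dtype=np.float16), np.ones((6,), dtype=np.float16)]
norm = clip_global_norm_fp32(grads, max_norm=1.0)
```

Accumulating in float32 also reduces the risk of overflow when squaring large float16 values, at the cost of a temporary copy per array.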

@mxnet-label-bot

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Gluon, Feature

@szha szha added the Bug label Dec 28, 2018
@ChaiBapchya

Unable to reproduce this with the latest master:

Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
>>> import mxnet as mx
>>> b = mx.nd.ones((10,), dtype='float16', ctx=mx.gpu())
[18:10:21] ../src/base.cc:84: Upgrade advisory: this mxnet has been built against cuDNN lib version 7401, which is older than the oldest version tested by CI (7600).  Set MXNET_CUDNN_LIB_CHECKING=0 to quiet this warning.
>>> mx.nd.dot(b,b)

[10.]
<NDArray 1 @gpu(0)>

@ChaiBapchya

Fixed by #14102
