[install MXNET] wrong: src/operator/contrib/roi_align_v2.cc #76

Closed
heiyuxiaokai opened this issue Apr 23, 2019 · 10 comments
Labels
bug Something isn't working

Comments

@heiyuxiaokai

src/operator/contrib/roi_align_v2.cc:210:2: error: no matching function for call to ‘nnvm::Op::set_attr(const char [12], mxnet::op::<lambda(const nnvm::NodeAttrs&, std::vector<mxnet::TShape>*, std::vector<mxnet::TShape>*)>)’
})
^
In file included from include/mxnet/base.h:35:0,
from src/operator/contrib/./../mshadow_op.h:29,
from src/operator/contrib/./roi_align_v2-inl.h:12,
from src/operator/contrib/roi_align_v2.cc:7:
/home/fw/Softwares/simpledet/mxnet/3rdparty/tvm/nnvm/include/nnvm/op.h:432:12: note: candidate: template<class ValueType> nnvm::Op& nnvm::Op::set_attr(const string&, const ValueType&, int)
inline Op& Op::set_attr( // NOLINT(*)
^
/home/fw/Softwares/simpledet/mxnet/3rdparty/tvm/nnvm/include/nnvm/op.h:432:12: note: template argument deduction/substitution failed:
src/operator/contrib/roi_align_v2.cc:210:2: note: cannot convert ‘mxnet::op::<lambda(const nnvm::NodeAttrs&, std::vector<mxnet::TShape>*, std::vector<mxnet::TShape>*)>{}’ (type ‘mxnet::op::<lambda(const nnvm::NodeAttrs&, std::vector<mxnet::TShape>*, std::vector<mxnet::TShape>*)>’) to type ‘const std::function<bool(const nnvm::NodeAttrs&, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >*, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >*)>&’
})
^
src/operator/contrib/roi_align_v2.cc:211:27: error: expected primary-expression before ‘>’ token
.set_attr<nnvm::FInferType>("FInferType", [](const nnvm::NodeAttrs& attrs,
^
src/operator/contrib/roi_align_v2.cc:223:1: warning: left operand of comma operator has no effect [-Wunused-value]
})
^
src/operator/contrib/roi_align_v2.cc:224:2: error: ‘struct mxnet::op::<lambda(const struct nnvm::NodeAttrs&, class std::vector<int, std::allocator<int> >*, class std::vector<int, std::allocator<int> >*)>’ has no member named ‘set_attr’
.set_attr<FCompute>("FCompute<cpu>", ROIAlignForward_v2<cpu>)
^
src/operator/contrib/roi_align_v2.cc:224:19: error: expected primary-expression before ‘>’ token
.set_attr<FCompute>("FCompute<cpu>", ROIAlignForward_v2<cpu>)
^
src/operator/contrib/roi_align_v2.cc:224:38: warning: left operand of comma operator has no effect [-Wunused-value]
.set_attr<FCompute>("FCompute<cpu>", ROIAlignForward_v2<cpu>)
^
src/operator/contrib/roi_align_v2.cc:224:38: error: no context to resolve type of ‘ROIAlignForward_v2<mxnet::cpu>’
src/operator/contrib/roi_align_v2.cc:225:26: error: expected primary-expression before ‘>’ token
.set_attr<nnvm::FGradient>("FGradient", ROIAlignGrad_v2{"_backward_ROIAlign_v2"})
^
src/operator/contrib/roi_align_v2.cc:225:80: warning: left operand of comma operator has no effect [-Wunused-value]
.set_attr<nnvm::FGradient>("FGradient", ROIAlignGrad_v2{"_backward_ROIAlign_v2"})
^
src/operator/contrib/roi_align_v2.cc:226:2: error: ‘struct mxnet::op::ROIAlignGrad_v2’ has no member named ‘add_argument’
.add_argument("data", "NDArray-or-Symbol", "Input data to the pooling operator, a 4D Feature maps")
^
g++ -std=c++11 -c -DMSHADOW_FORCE_STREAM -Wall -Wsign-compare -O3 -DNDEBUG=1 -I/home/fw/Softwares/simpledet/mxnet/3rdparty/mshadow/ -I/home/fw/Softwares/simpledet/mxnet/3rdparty/dmlc-core/include -fPIC -I/home/fw/Softwares/simpledet/mxnet/3rdparty/tvm/nnvm/include -I/home/fw/Softwares/simpledet/mxnet/3rdparty/dlpack/include -I/home/fw/Softwares/simpledet/mxnet/3rdparty/tvm/include -Iinclude -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -mf16c -I/usr/local/cuda/include -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -I/home/fw/Softwares/simpledet/mxnet/3rdparty/mkldnn/build/install/include -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -DMSHADOW_USE_PASCAL=0 -DMXNET_USE_MKLDNN=1 -DUSE_MKL=1 -I/home/fw/Softwares/simpledet/mxnet/src/operator/nn/mkldnn/ -I/home/fw/Softwares/simpledet/mxnet/3rdparty/mkldnn/build/install/include -DMXNET_USE_OPENCV=0 -DMSHADOW_USE_CUDNN=1 -DMXNET_USE_DIST_KVSTORE -I/home/fw/Softwares/simpledet/mxnet/3rdparty/ps-lite/include -I/home/fw/Softwares/simpledet/mxnet/deps/include -I/home/fw/Softwares/simpledet/mxnet/3rdparty/nvidia_cub -I/include -DMXNET_USE_NCCL=1 -DMXNET_USE_LIBJPEG_TURBO=0 -MMD -c src/operator/contrib/sync_batch_norm.cc -o build/src/operator/contrib/sync_batch_norm.o
Makefile:508: recipe for target 'build/src/operator/contrib/roi_align_v2.o' failed
make: *** [build/src/operator/contrib/roi_align_v2.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from src/operator/contrib/sync_batch_norm.cc:26:0:
src/operator/contrib/sync_batch_norm-inl.h: In member function ‘virtual bool mxnet::op::SyncBatchNormProp::InferType(std::vector<int, std::allocator<int> >*, std::vector<int, std::allocator<int> >*, std::vector<int, std::allocator<int> >*) const’:
src/operator/contrib/sync_batch_norm-inl.h:587:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (index_t i = 1; i < in_type->size(); ++i) {
^
src/operator/contrib/sync_batch_norm-inl.h:594:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (index_t i = 0; i < aux_type->size(); ++i) {
^

@chenxia-han
Contributor

This is caused by the NNVM API change in apache/mxnet#14270. We will fix it soon; in the meantime, you can switch to an earlier version of MXNet such as 1.4.0.

@chenxia-han chenxia-han added the bug Something isn't working label Apr 24, 2019
@heiyuxiaokai
Author

Thanks for your reply, @xchani!
I will try an earlier version. However, when I tried to train with the Docker image, I got a CUDA error:

04-24 16:02:04 total iter 868428
04-24 16:02:04 lr 0.00125, lr_iters [2880000, 3840000]
04-24 16:02:04 warmup lr 0.0, warmup step 48000
04-24 16:02:07 Initialized bbox_cls_logit_bias as bias: 0.0
04-24 16:02:07 Initialized bbox_cls_logit_weight as ["normal", {"sigma": 0.01}]: 0.009981153
04-24 16:02:07 Initialized bbox_reg_delta_bias as bias: 0.0
04-24 16:02:07 Initialized bbox_reg_delta_weight as ["normal", {"sigma": 0.001}]: 0.0009942584
04-24 16:02:07 Initialized rpn_bbox_delta_bias as bias: 0.0
04-24 16:02:07 Initialized rpn_bbox_delta_weight as ["normal", {"sigma": 0.01}]: 0.009982816
04-24 16:02:07 Initialized rpn_cls_logit_bias as bias: 0.0
04-24 16:02:07 Initialized rpn_cls_logit_weight as ["normal", {"sigma": 0.01}]: 0.010042911
04-24 16:02:07 Initialized rpn_conv_3x3_bias as bias: 0.0
04-24 16:02:07 Initialized rpn_conv_3x3_weight as ["normal", {"sigma": 0.01}]: 0.009972133
04-24 16:02:08 Initialized stage3_unit21_conv2_offset_bias as bias: 0.0
04-24 16:02:08 Initialized stage3_unit21_conv2_offset_weight as weight: 0.029432593
04-24 16:02:08 Initialized stage3_unit22_conv2_offset_bias as bias: 0.0
04-24 16:02:08 Initialized stage3_unit22_conv2_offset_weight as weight: 0.029415503
04-24 16:02:08 Initialized stage3_unit23_conv2_offset_bias as bias: 0.0
04-24 16:02:08 Initialized stage3_unit23_conv2_offset_weight as weight: 0.029460358
Traceback (most recent call last):
File "detection_train.py", line 231, in <module>
train_net(parse_args())
File "detection_train.py", line 215, in train_net
num_epoch=end_epoch
File "/home/core/detection_module.py", line 995, in fit
self.update_metric(eval_metric, data_batch.label)
File "/home/core/detection_module.py", line 783, in update_metric
self.exec_group.update_metric(eval_metric, labels, pre_sliced)
File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/module/executor_group.py", line 639, in update_metric
eval_metric.update_dict(labels_, preds)
File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/metric.py", line 304, in update_dict
metric.update_dict(labels, preds)
File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/metric.py", line 132, in update_dict
self.update(label, pred)
File "/home/core/detection_metric.py", line 41, in update
pred_label = mx.ndarray.argmax_channel(pred).astype('int32').asnumpy().reshape(-1)
File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/ndarray/ndarray.py", line 1972, in asnumpy
ctypes.c_size_t(data.size)))
File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/base.py", line 251, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [08:02:14] /mnt/ournas/yuntao.chen/mxnet-1.3.1-cuda9.0/3rdparty/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:110: Check failed: err == cudaSuccess (48 vs. 0) Name: MapPlanKernel ErrStr:no kernel image is available for execution on the device

Stack trace returned 10 entries:
[bt] (0) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7f49ad84743b]
[bt] (1) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f49ad847fa8]
[bt] (2) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(void mshadow::cuda::MapPlan<mshadow::sv::saveto, mshadow::Tensor<mshadow::gpu, 2, float>, mshadow::expr::ScalarExp<float>, float>(mshadow::expr::Plan<mshadow::Tensor<mshadow::gpu, 2, float>, float>, mshadow::expr::Plan<mshadow::expr::ScalarExp<float>, float> const&, mshadow::Shape<2>, CUstream_st*)+0x1d0) [0x7f49b2563b30]
[bt] (3) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(void mxnet::ndarray::Eval<mshadow::gpu>(float const&, mxnet::TBlob*, mxnet::RunContext)+0x16a) [0x7f49b275ce2a]
[bt] (4) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(+0x3890d19) [0x7f49b0342d19]
[bt] (5) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(+0x3df061b) [0x7f49b08a261b]
[bt] (6) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x8e5) [0x7f49b089bf15]
[bt] (7) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0xeb) [0x7f49b08b28ab]
[bt] (8) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x4e) [0x7f49b08b2b1e]
[bt] (9) /root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/mxnet-1.3.1-py3.6.egg/mxnet/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x4a) [0x7f49b089b51a]

@RogerChern
Collaborator

Could you please tell us your GPU model? This Docker image will not run on RTX GPUs.

@heiyuxiaokai
Author

2X GTX TITAN X @RogerChern

@RogerChern
Collaborator

@heiyuxiaokai Is it the Maxwell or the Pascal TITAN X?

@heiyuxiaokai
Author

Maxwell, @RogerChern

@chenxia-han
Contributor

@heiyuxiaokai We will provide a Docker image for this GPU architecture later.
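
For readers hitting the same "no kernel image is available for execution on the device" error: it generally means the library contains no GPU code that the device can run. The following is a minimal sketch of the CUDA compatibility rule, assuming the standard compute capabilities (5.2 for the Maxwell GTX TITAN X, 6.1 for the Pascal TITAN X). The function name and the architecture list passed to it are hypothetical; the actual build flags of the Docker image are not stated in this thread.

```python
# Compute capability (major, minor) of the two TITAN X variants discussed above.
MAXWELL_TITAN_X = (5, 2)
PASCAL_TITAN_X = (6, 1)


def can_run(device_cc, sass_archs, ptx_archs=()):
    """Rough check of whether a CUDA binary has usable code for a device.

    sass_archs: capabilities the binary ships native code for. Native code
        built for X.y runs only on devices X.z with z >= y.
    ptx_archs: capabilities the binary ships PTX for. PTX can be JIT-compiled
        for any device of equal or higher capability.
    """
    native_ok = any(
        major == device_cc[0] and minor <= device_cc[1]
        for major, minor in sass_archs
    )
    jit_ok = any(arch <= device_cc for arch in ptx_archs)
    return native_ok or jit_ok


# Hypothetical build that targets only Pascal and Volta natively, with no PTX.
assumed_sass_archs = {(6, 1), (7, 0)}
print(can_run(MAXWELL_TITAN_X, assumed_sass_archs))  # False
print(can_run(PASCAL_TITAN_X, assumed_sass_archs))   # True
```

Under this assumption, a Maxwell card fails while a Pascal card works, which would be consistent with the error above. Rebuilding with the Maxwell architecture included is the usual remedy.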

@heiyuxiaokai
Author

@xchani Thanks!

@heiyuxiaokai
Author

mxnet==1.3x works.
However, when I import mxnet, libcudart.so.8.0 can't be found, even though my Ubuntu system uses CUDA 9.
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import mxnet
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/fw/Softwares/simpledet/mxnet/python/mxnet/__init__.py", line 24, in <module>
from .context import Context, current_context, cpu, gpu, cpu_pinned
File "/home/fw/Softwares/simpledet/mxnet/python/mxnet/context.py", line 24, in <module>
from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
File "/home/fw/Softwares/simpledet/mxnet/python/mxnet/base.py", line 213, in <module>
_LIB = _load_lib()
File "/home/fw/Softwares/simpledet/mxnet/python/mxnet/base.py", line 204, in _load_lib
lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
File "/usr/lib/python3.5/ctypes/__init__.py", line 347, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcudart.so.8.0: cannot open shared object file: No such file or directory

Here is the ldd output for libmxnet.so:
fw@whu:~/Softwares/simpledet/mxnet/lib$ ldd libmxnet.so
linux-vdso.so.1 => (0x00007fff1376e000)
libcudart.so.9.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.9.0 (0x00007fb16dbdc000)
libcublas.so.9.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.9.0 (0x00007fb16a7a6000)
libcurand.so.9.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.9.0 (0x00007fb166842000)
libcusolver.so.9.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so.9.0 (0x00007fb161c47000)
libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00007fb15fbb3000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb15f9ab000)
libcudnn.so.7 => /usr/local/cuda/targets/x86_64-linux/lib/libcudnn.so.7 (0x00007fb14e514000)
libcufft.so.9.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.9.0 (0x00007fb146473000)
libnccl.so.1 => /usr/local/lib/libnccl.so.1 (0x00007fb143e10000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb143a8e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb143785000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fb143563000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb14334d000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb143130000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb142d66000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb18057a000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb142b62000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007fb142837000)
libcudart.so.8.0 => not found
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007fb1425f8000)

The CUDA 9 libraries are found, but libcudart.so.8.0 is not, and CUDA 8 is not installed on this machine. Is CUDA 8 required?
@xchani @RogerChern
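
A small sketch for narrowing this down, assuming the ldd output has been saved as text: it lists every dependency reported as "not found". The function name is hypothetical. Since ldd also lists transitive dependencies, the 8.0 entry may come from one of the other listed libraries rather than from libmxnet.so itself; running ldd on each of them would show which one.

```python
def missing_libraries(ldd_output):
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Resolved entries look like "name => /path (0x...)";
        # unresolved ones look like "name => not found".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


# Three lines taken from the ldd output above.
sample = """\
libcudart.so.9.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.9.0 (0x00007fb16dbdc000)
libcudart.so.8.0 => not found
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007fb1425f8000)
"""
print(missing_libraries(sample))  # ['libcudart.so.8.0']
```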

@RogerChern
Collaborator

We have updated the CUDA 9 image to support Maxwell GPUs. Please follow the setup instructions.
