This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

MXNet crashes while trying load ONNX model #13138

Open
movchan74 opened this issue Nov 6, 2018 · 6 comments · Fixed by #13604

Comments

@movchan74

movchan74 commented Nov 6, 2018

Description

MXNet crashes when I try to load an ONNX model. The model is SE-ResNet50, converted from PyTorch.

Environment info (Required)

----------Python Info----------
('Version      :', '2.7.12')
('Compiler     :', 'GCC 5.4.0 20160609')
('Build        :', ('default', 'Dec  4 2017 14:50:18'))
('Arch         :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version      :', '10.0.1')
('Directory    :', '/home/aleksandr/pytorch/local/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
('Platform     :', 'Linux-4.4.0-135-generic-x86_64-with-Ubuntu-16.04-xenial')
('system       :', 'Linux')
('node         :', 'hal9000')
('release      :', '4.4.0-135-generic')
('version      :', '#161-Ubuntu SMP Mon Aug 27 10:45:01 UTC 2018')
----------Hardware Info----------
('machine      :', 'x86_64')
('processor    :', 'x86_64')
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
Stepping:              4
CPU MHz:               1288.945
CPU max MHz:           3900,0000
CPU min MHz:           1200,0000
BogoMIPS:              6804.03
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0-11
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts flush_l1d
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0264 sec, LOAD: 0.5575 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0248 sec, LOAD: 0.7411 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0350 sec, LOAD: 1.2637 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0395 sec, LOAD: 0.4630 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0611 sec, LOAD: 0.2351 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0495 sec, LOAD: 0.8250 sec.

Package used (Python/R/Scala/Julia):
I'm using mxnet==1.3.0.post0, pytorch==0.4.0, onnx==1.3.0

Error Message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/contrib/onnx/onnx2mx/import_model.py", line 53, in import_model
    sym, arg_params, aux_params = graph.from_onnx(model_proto.graph)
  File "/home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/contrib/onnx/onnx2mx/import_onnx.py", line 115, in from_onnx
    mxnet_sym = self._convert_operator(node_name, op_name, onnx_attr, inputs)
  File "/home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/contrib/onnx/onnx2mx/import_onnx.py", line 61, in _convert_operator
    op_name, new_attrs, inputs = convert_map[op_name](attrs, inputs, self)
  File "/home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/contrib/onnx/onnx2mx/_op_translations.py", line 69, in multiply
    broadcast_axis, proto_obj)
  File "/home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/contrib/onnx/onnx2mx/_translation_utils.py", line 160, in _fix_broadcast
    input0_shape = get_input_shape(inputs[0], proto_obj)
  File "/home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/contrib/onnx/onnx2mx/_translation_utils.py", line 232, in get_input_shape
    mod.bind(for_training=False, data_shapes=data_shapes, label_shapes=None)
  File "/home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/module/module.py", line 429, in bind
    state_names=self._state_names)
  File "/home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/module/executor_group.py", line 279, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/module/executor_group.py", line 375, in bind_exec
    shared_group))
  File "/home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/module/executor_group.py", line 662, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/symbol/symbol.py", line 1528, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
0: (1, 3L, 224L, 224L)
Error in operator broadcast_mul1: [18:21:01] src/operator/tensor/./elemwise_binary_broadcast_op.h:68: Check failed: l == 1 || r == 1 operands could not be broadcast together with shapes [256,256,56,56] [65536,1,1,1]

Stack trace returned 10 entries:
[bt] (0) /home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x1d86a2) [0x7f5f5390e6a2]
[bt] (1) /home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x1d8cb8) [0x7f5f5390ecb8]
[bt] (2) /home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0xd00257) [0x7f5f54436257]
[bt] (3) /home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2bfc3bf) [0x7f5f563323bf]
[bt] (4) /home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2bfef20) [0x7f5f56334f20]
[bt] (5) /home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2beb8af) [0x7f5f563218af]
[bt] (6) /home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2bec394) [0x7f5f56322394]
[bt] (7) /home/aleksandr/pytorch/local/lib/python2.7/site-packages/mxnet/libmxnet.so(MXExecutorSimpleBind+0x2260) [0x7f5f562810a0]
[bt] (8) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f5f7860be40]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7f5f7860b8ab]
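
The failed check in the message above (`l == 1 || r == 1`) matches the usual elementwise broadcasting rule: on each axis the two sizes must be equal, or one of them must be 1. Below is a minimal pure-Python sketch of that rule. It is not MXNet's implementation, and `broadcast_shape` and `try_broadcast` are hypothetical helper names. It rejects the two reported shapes on axis 0 (256 vs. 65536). As an observation only, it also shows that shapes (1, 256, 56, 56) and (256, 1, 1, 1) would broadcast to (256, 256, 56, 56), the left-hand shape in the error. Whether that is what happened inside the importer is not established in this thread.

```python
def broadcast_shape(lhs, rhs):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    # Left-pad the shorter shape with 1s so both shapes have the same rank.
    ndim = max(len(lhs), len(rhs))
    lhs = (1,) * (ndim - len(lhs)) + tuple(lhs)
    rhs = (1,) * (ndim - len(rhs)) + tuple(rhs)
    out = []
    for axis, (l, r) in enumerate(zip(lhs, rhs)):
        # Same condition as the failed check: sizes must match or one must be 1.
        if l != r and l != 1 and r != 1:
            raise ValueError(
                "axis %d: sizes %d and %d cannot be broadcast" % (axis, l, r))
        out.append(r if l == 1 else l)
    return tuple(out)


def try_broadcast(lhs, rhs):
    """Return the result shape, or the error text if the shapes are incompatible."""
    try:
        return broadcast_shape(lhs, rhs)
    except ValueError as err:
        return str(err)


# The two shapes from the error message: rejected on axis 0.
print(try_broadcast((256, 256, 56, 56), (65536, 1, 1, 1)))
# Illustrative shapes (an assumption, not taken from the thread): accepted.
print(try_broadcast((1, 256, 56, 56), (256, 1, 1, 1)))
```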

Minimum reproducible example

The model as ONNX file: https://drive.google.com/file/d/1M8i8n8hWs6wP8eCERKcc3rORUGHq4ghw/view?usp=sharing

The code to reproduce the error:

import mxnet as mx
sym, arg_params, aux_params = mx.contrib.onnx.import_model('imagenet_se_resnet50.onnx')

To reproduce the model:
Model definition: https://github.com/movchan74/pretrained-models.pytorch
Code to convert to ONNX:

import pretrainedmodels
import torch
import torch.onnx

model = pretrainedmodels.se_resnet50(num_classes=1000, pretrained='imagenet')

dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(model, dummy_input, 'imagenet_se_resnet50.onnx')

@leleamol
Contributor

leleamol commented Nov 6, 2018

Hi @movchan74 ,
Thank you for submitting the issue! I'm labeling it so MXNet community members can help resolve it.
@sandeep-krishnamurthy @ankkhedia @Roshrini

@mxnet-label-bot [ONNX, Bug]

@movchan74
Author

It is worth mentioning that the model works with mxnet==1.2.1.post1, but only for batch size 1. For batch sizes greater than one, the error is the same.
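
The batch-size dependence described above is what one would see if a per-channel scale were flattened to shape (N*C, 1, 1, 1) before being multiplied with an (N, C, H, W) feature map. This is a hypothesis for illustration only, not a diagnosis confirmed in this thread. `can_broadcast` is a hypothetical helper, and all shapes below are assumptions.

```python
def can_broadcast(lhs, rhs):
    """True if two equal-rank shapes are broadcast-compatible on every axis."""
    return all(l == r or l == 1 or r == 1 for l, r in zip(lhs, rhs))


channels, height, width = 256, 56, 56
for batch in (1, 2, 4):
    feature_map = (batch, channels, height, width)
    flattened_scale = (batch * channels, 1, 1, 1)  # hypothetical bad reshape
    per_sample_scale = (batch, channels, 1, 1)     # shape a channel-wise scale would normally have
    print(batch,
          can_broadcast(feature_map, flattened_scale),
          can_broadcast(feature_map, per_sample_scale))
```

Under this assumption, batch size 1 still passes the check (although the product would have shape (256, 256, 56, 56) rather than (1, 256, 56, 56)), while any larger batch is rejected on axis 0.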

@ankkhedia
Contributor

@Roshrini @vandanavk

@lupesko
Contributor

lupesko commented Nov 24, 2018

Was able to reproduce with latest MXNet (1.4.0b20181123) and ONNX 1.3.0.

Bouncing this: @Roshrini @vandanavk, can you guys check out this error and see if you can help?

@vandanavk
Contributor

@movchan74 @lupesko Working on this. Will have an update soon.

@Roshrini
Member

Roshrini commented Mar 1, 2019

The fix in PR #13604 doesn't fix all the models. Reopening this issue.
