This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

SoftmaxOutput crashes with normalization "valid" #14301

Closed
ashokei opened this issue Mar 2, 2019 · 7 comments · Fixed by #14302
@ashokei
Contributor

ashokei commented Mar 2, 2019

Description

Environment info (Required)

Ubuntu 16.04, default build. Run the script below:

import numpy as np
import mxnet as mx
xpu = mx.cpu()
x = mx.sym.Variable('x')
label = mx.sym.Variable('label')
x_nd = mx.nd.array([[1, 6, 4, 2],[1, 6, 4, 2]], ctx=xpu)
grad_x = mx.nd.zeros((2,4), ctx=xpu)
label_nd = mx.nd.array([1,1], ctx=xpu)

sym = mx.sym.SoftmaxOutput(data=x, label=label, ignore_label=0,
                           use_ignore=True, normalization="valid")
ex = sym.bind(ctx=xpu, args={'x': x_nd, 'label': label_nd},
              args_grad={'x': grad_x})

ex.forward(is_train=True)
softmax_out = ex.outputs[0].asnumpy()
ex.backward(is_train=True)

MXNet commit hash:
fb4f9d5

Build config:
make

Error Message:

terminated by signal SIGSEGV (Address boundary error)
@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug

@wkcn
Member

wkcn commented Mar 2, 2019

Thanks for your report!

I reproduced the bug on MXNet fb4f9d5.

It is strange that the address of ctx.requested[softmaxout_enum::kTempSpace] is 0 in src/operator/softmax_output-inl.h: ctx.requested.size() is 0 in Backward.

The bug is that BackwardResource is not called when ex.backward(is_train=True) is invoked.
I do not know why SoftmaxOutput is not a legacy operator.

@wkcn wkcn added the Bug label Mar 2, 2019
@DickJC123
Contributor

I too recently saw an issue with Softmax that generated a segfault. This behavior began with the Softmax operator changes introduced by #13699 and occurs when the framework is compiled with USE_MKLDNN=0. The failing test is with sockeye:

test/integration/test_constraints_int.py::test_constraints[--encoder rnn --decoder rnn --num-layers 1 --rnn-cell-type lstm --rnn-num-hidden 8 --num-embed 4  --rnn-attention-type mlp --rnn-attention-num-hidden 8 --loss cross-entropy --optimized-metric perplexity --max-updates 2 --checkpoint-frequency 2 --optimizer adam --initial-learning-rate 0.01 --batch-type sentence  --decode-and-evaluate 0-2-10] ./test.sh: line 3:    62 Segmentation fault      python setup.py test
++ RV=139

Perhaps you could verify that your fix corrects this behavior?

@wkcn
Member

wkcn commented Mar 4, 2019

@DickJC123
Hi! I tested Sockeye on my laptop against MXNet master, built with make -j 5 USE_OPENCV=1 USE_BLAS=openblas USE_MKLDNN=0 USE_CPP_PACKAGE=1.

All tests pass except test/unit/test_inference.py::test_topk_func:

test/unit/test_inference.py::test_topk_func[1-5-200] FAILED              [ 60%]
test/unit/test_inference.py::test_topk_func[5-5-200] FAILED              [ 60%]
test/unit/test_inference.py::test_topk_func[1-1-200] PASSED              [ 60%]
test/unit/test_inference.py::test_topk_func[5-1-200] PASSED              [ 60%]
test/unit/test_inference.py::test_topk_func[10-10-100] FAILED            [ 60%]

There is no SoftmaxOutput with normalization="valid" in Sockeye, so it could not have triggered the bug in this issue.

@anirudhacharya
Member

I can also confirm this issue; it happens only with normalization="valid", and only during the Executor.backward call. For instance, this sample code works fine:

import mxnet as mx
import numpy as np

xpu = mx.cpu()
x_nd = mx.nd.array([[1, 6, 4, 2],[1, 6, 4, 2]], ctx=xpu)
grad_x = mx.nd.zeros((2,4), ctx=xpu)
label_nd = mx.nd.array([1,1], ctx=xpu)

x_nd.attach_grad()

with mx.autograd.record():
    y = mx.nd.SoftmaxOutput(data=x_nd, label=label_nd, ignore_label=0, use_ignore=True) #, normalization="valid")

y.backward()
print(x_nd.grad)

So the bug is in the gradient calculation of SoftmaxOutput when normalization="valid".
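For reference, the gradient that the crashing backward pass is expected to produce can be computed by hand. The sketch below is plain NumPy following the documented semantics (`softmax_output_grad` is a hypothetical helper, not part of MXNet): rows whose label equals ignore_label get a zero gradient, and normalization="valid" divides by the count of non-ignored labels rather than the batch size:

```python
import numpy as np

def softmax_output_grad(x, label, ignore_label=0, normalization="valid"):
    """Reference gradient of SoftmaxOutput w.r.t. x (documented semantics)."""
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)
    # Cross-entropy gradient: softmax minus one-hot label.
    grad = p.copy()
    grad[np.arange(len(label)), label.astype(int)] -= 1.0
    # Rows whose label is ignored contribute nothing.
    valid = label != ignore_label
    grad[~valid] = 0.0
    if normalization == "valid":
        grad /= max(valid.sum(), 1)   # divide by count of non-ignored labels
    elif normalization == "batch":
        grad /= len(label)            # divide by batch size instead
    return grad

x = np.array([[1., 6., 4., 2.], [1., 6., 4., 2.]])
label = np.array([1, 1])
print(softmax_output_grad(x, label))
```

With the labels [1, 1] from the reproduction script, both rows are valid, so the per-sample gradients are divided by 2.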

@fhieber
Contributor

fhieber commented Mar 4, 2019

@wkcn Sockeye can use 'valid' normalization in its use of the SoftmaxOutput operator; see here.
The failure you are observing in test/unit/test_inference.py::test_topk_func is related to #13862, which is still an open problem.

@wkcn
Member

wkcn commented Mar 4, 2019

@fhieber Sorry, I overlooked that.
@anirudhacharya #14302 will address the problem.
