This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Flaky Test] Python3: MKLDNN-GPU test_kvstore_gpu.test_rsp_push_pull #14189

Open
Chancebair opened this issue Feb 18, 2019 · 5 comments · Fixed by #14483
Comments

@Chancebair
Contributor

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/master/347/pipeline

======================================================================

ERROR: test_kvstore_gpu.test_rsp_push_pull

----------------------------------------------------------------------

Traceback (most recent call last):

  File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in runTest

    self.test(*self.arg)

  File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 173, in test_new

    orig_test(*args, **kwargs)

  File "/work/mxnet/tests/python/gpu/test_kvstore_gpu.py", line 106, in test_rsp_push_pull

    check_rsp_push_pull('local', sparse_pull)

  File "/work/mxnet/tests/python/gpu/test_kvstore_gpu.py", line 89, in check_rsp_push_pull

    check_rsp_pull(kv, [mx.gpu(0)], sparse_pull)

  File "/work/mxnet/tests/python/gpu/test_kvstore_gpu.py", line 74, in check_rsp_pull

    retained = val.asnumpy()

  File "/work/mxnet/python/mxnet/ndarray/sparse.py", line 195, in asnumpy

    return self.tostype('default').asnumpy()

  File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 1995, in asnumpy

    ctypes.c_size_t(data.size)))

  File "/work/mxnet/python/mxnet/base.py", line 252, in check_call

    raise MXNetError(py_str(_LIB.MXGetLastError()))

mxnet.base.MXNetError: [02:56:09] src/operator/nn/mkldnn/mkldnn_base.cc:567: Check failed: similar 



Stack trace returned 10 entries:

[bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x60) [0x7ff47647fa60]

[bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7ff476480052]

[bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::OpCheck::Run(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)>, nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x1d61) [0x7ff4765382e1]

[bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x5218c5b) [0x7ff479667c5b]

[bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), void (*)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)>::_M_invoke(std::_Any_data const&, nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x20) [0x7ff4765a9d80]

[bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::imperative::PushFComputeEx(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}::operator()(mxnet::RunContext) const+0x15e) [0x7ff4799d235e]

[bt] (6) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::imperative::PushFComputeEx(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x1c) [0x7ff4799d249c]

[bt] (7) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete), mxnet::engine::ThreadedEngine::BulkFlush()::{lambda(mxnet::RunContext, mxnet::engine::CallbackOnComplete)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&)+0x234) [0x7ff47a3ca744]

[bt] (8) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0xc84) [0x7ff47a3d1684]

[bt] (9) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x161) [0x7ff47a3e7be1]





-------------------- >> begin captured logging << --------------------

common: INFO: Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=1032824746 to reproduce.

common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1155716252 to reproduce.

--------------------- >> end captured logging << ---------------------
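The captured log above names two seeds for reproducing the run. A minimal sketch of how they could be pulled out of the log and turned into a re-run command follows. The helper names (`extract_seeds`, `build_repro`) are hypothetical, and the `file.py:function` test selector and the bare `python3 -m nose` invocation are assumptions about the local setup, not the CI's exact command line.

```python
import os
import re

# Matches the seed hints that appear in the captured log of this issue.
SEED_PATTERN = re.compile(r"(MXNET_(?:MODULE|TEST)_SEED)=(\d+)")

# Assumed nose selector for the failing test, relative to the repo root.
DEFAULT_TEST_ID = "tests/python/gpu/test_kvstore_gpu.py:test_rsp_push_pull"


def extract_seeds(log_text):
    """Return {variable name: seed value as a string} for every hint found."""
    return dict(SEED_PATTERN.findall(log_text))


def build_repro(log_text, test_id=DEFAULT_TEST_ID):
    """Build (env, argv) for re-running one test with the logged seeds."""
    env = dict(os.environ)              # copy, so the caller's environment is untouched
    env.update(extract_seeds(log_text))
    argv = ["python3", "-m", "nose", "--verbose", test_id]
    return env, argv


CAPTURED_LOG = """
common: INFO: Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=1032824746 to reproduce.
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1155716252 to reproduce.
"""

env, argv = build_repro(CAPTURED_LOG)
print(" ".join(argv))
```

From a checkout with a GPU build, passing the pair to `subprocess.run(argv, env=env)` would launch the test with both seeds set. Because the failure is flaky, it may take several runs to trigger even with fixed seeds.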

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Test

@Chancebair Chancebair changed the title [Test Failure] Python3: MKLDNN-GPU test_kvstore_gpu.test_rsp_push_pull [Flaky Test] Python3: MKLDNN-GPU test_kvstore_gpu.test_rsp_push_pull Feb 18, 2019
@Chancebair
Contributor Author

@mxnet-label-bot add [Flaky, Test]

@anirudhacharya
Member

@mxnet-label-bot add [Disabled test]

wkcn pushed a commit that referenced this issue Mar 11, 2019
vdantu pushed a commit to vdantu/incubator-mxnet that referenced this issue Mar 31, 2019
nswamy pushed a commit that referenced this issue Apr 5, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this issue Jun 23, 2019
@leezu leezu reopened this Apr 5, 2020
@leezu
Contributor

leezu commented Apr 5, 2020

@stu1130 @eric-haibin-lin failing again http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-gpu/detail/PR-17969/2/pipeline/

[2020-04-04T22:33:15.136Z] test_kvstore_gpu.test_rsp_push_pull ... /work/runtime_functions.sh: line 1239: 6 Bus error (core dumped) python3.6 -m "nose" $NOSE_COVERAGE_ARGUMENTS $NOSE_TIMER_ARGUMENTS --with-xunit --xunit-file nosetests_gpu.xml --verbose tests/python/gpu

@leezu
Contributor

leezu commented Apr 6, 2020

[2020-04-06T01:12:50.019Z] test_kvstore_gpu.test_rsp_push_pull ... *** Error in `python3.6': double free or corruption (fasttop): 0x00007f59fc00ce90 ***

http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/centos-gpu/branches/PR-17734/runs/11/nodes/74/steps/100/log/?start=0

5 participants