Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-545] Fix broken cython build #10951

Merged
merged 114 commits into from
May 24, 2019
Merged

[MXNET-545] Fix broken cython build #10951

merged 114 commits into from
May 24, 2019

Conversation

asitstands
Copy link
Contributor

@asitstands asitstands commented May 15, 2018

Description

Fix broken cython module for ndarray and also ensure the compatibility with the recent version (0.28) of cython (#10295, #10738). See this comment for the details of the changes.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@asitstands asitstands requested a review from szha as a code owner May 15, 2018 14:16
@szha
Copy link
Member

szha commented May 21, 2018

How is this tested?

@asitstands
Copy link
Contributor Author

@szha I just checked that .cpp files are generated at python/mxnet/cython and .so files at python/build/lib.linux-x86_64-3.6/mxnet/_cy3. All unit tests are passed in my environment. I'm not sure CI has cython build. Are there any other checks do you recommend?

@szha
Copy link
Member

szha commented May 22, 2018

Thanks. @marcoabreu do you have recommendation on how to integrate the tests for cython (e.g. whether to integrate into existing environments or to add a new environment)?

@asitstands
Copy link
Contributor Author

asitstands commented May 22, 2018

This PR does not work as expected. I made a mistake to set MXNET_ENABLE_CYTHON instead of MXNET_ENFORCE_CYTHON when I tested. An import error occurs if a python script using mxnet runs with MXNET_ENFORCE_CYTHON=1.

Traceback (most recent call last):
  File "/pds/pds131/asitdepends/MXNet/usr/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/ndarray/_internal.py", line 30, in <module>
    from .._cy3.ndarray import NDArrayBase, CachedOp
ImportError: /pds/pds131/asitdepends/MXNet/usr/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/_cy3/ndarray.cpython-36m-x86_64-linux-gnu.so: undefined symbol: NNGetLastError

The .so files from the .pyx files are created but the linker does not find libmxnet.so. I guess that this line is the source of the problem. However the cython compilation itself fails without that line. The compile error is

error: each element of 'ext_modules' option must be an Extension instance or 2-tuple

as reported in #10295.

@asitstands asitstands changed the title Fix broken build with cython 0.28 Fix broken cython build May 26, 2018
@asitstands
Copy link
Contributor Author

asitstands commented May 26, 2018

Now, it passed all tests under tests/python in my environment (python2 & python3, mkl, cuda, no mkldnn, no cudnn). There were three problems in the cython module.

  1. setup.py is not compatible with cython 0.28. It is fixed.
  2. The shared library files generated by cython have symbols defined in libmxnet.so, but the dynamic linker failed to find libmxnet.so unless LD_LIBRARY_PATH is set to the installation directory. This caused import error at runtime but the default behavior of mxnet is to try cython first and fallback to ctypes silently if an import error is encountered. So users cannot catch the error unless they set MXNET_ENFORCE_CYTHON env variable. This PR provides default search paths for libmxnet.so via rpath so that the linker can find it without LD_LIBRARY_PATH.
  3. It looks like that the cython module for ndarray has been broken for months. It caused segfault due to the lack of a proper support for sparse arrays. This is also fixed.

@asitstands
Copy link
Contributor Author

@szha @marcoabreu I think that it is necessary to add a cython build to CI. I'm not familiar with the CI system. Would you help?

@asitstands asitstands changed the title Fix broken cython build [MXNET-545] Fix broken cython build Jun 14, 2018
@asitstands
Copy link
Contributor Author

@piiswrong @szha @marcoabreu Would you review this PR?

@marcoabreu
Copy link
Contributor

Sure. How can I help you?

@asitstands
Copy link
Contributor Author

asitstands commented Jun 14, 2018

@marcoabreu Thank you. First, I'm just not sure what is the best way to add a cython build to CI. There are already many environments. Would you recommend some existing (linux) environments to add cython build? Or is it a good idea to add new environments for cython build? Second, I'm not familiar with Jenkins and docker. It looks like that I need to edit Jenkinsfile and files under docker dir. Are there any other files that I need to check?

@marcoabreu
Copy link
Contributor

Depends, I'm not familiar with with cython :) Does it require its own pipeline or is it just a compile-flag we can turn on?

Everything of interest to you would be in Jenkinsfile and ci/docker, right.

@asitstands
Copy link
Contributor Author

asitstands commented Jun 14, 2018

make
make cython

cd python
# `pip install -e .` or 
export PYTHONPATH=my_installation_dir
python setup.py install --with-cython --prefix=my_installation_dir

# run tests ...

I think that this is the typical way to build and run the tests. cmake build would be similar. Setting cxx/nvcc compiler flags is not needed.

@marcoabreu
Copy link
Contributor

marcoabreu commented Jun 14, 2018 via email

@asitstands
Copy link
Contributor Author

asitstands commented Apr 15, 2019

Can we get this PR merged? I'm not sure what prevents the merging. Anyway MXNet does not use cython modules unless the env var MXNET_USE_CYTHON is set in runtime. So this PR does not change any behavior of MXNet unless the env var is set. Merging this PR is relatively safe and we could observe its effect after merging. Or we could completely remove the cython modules. They exist but are broken. Keeping the broken things would not be a good idea.

@sl1pkn07
Copy link
Contributor

broken again if use with master

@larroy
Copy link
Contributor

larroy commented May 1, 2019

Agree we should move this PR forward, or do something about it. Has been open for too long.

@larroy
Copy link
Contributor

larroy commented May 1, 2019

Do we need to rebase it? the patch shows unrelated diffs.

@larroy
Copy link
Contributor

larroy commented May 2, 2019

@asitstands could you rebase so we try to get this merged?

@asitstands
Copy link
Contributor Author

I'm not sure why this PR shows unrelated commits with hundreds of changed files. My own repository shows only 13 files actually changed by this PR. I found that the target branch of this PR has been changed to apache:cython_fix from apache:master. Any relation with this? @szha

@szha szha changed the base branch from cython_fix to master May 3, 2019 18:03
@szha
Copy link
Member

szha commented May 3, 2019

@asitstands as this PR has been around for quite a while I changed it to a feature branch with the hope to merge it there so that others may carry on this work. Since you're working on this now, it's no longer necessary so I changed it back. It should be ready for review after another rebase.

@asitstands
Copy link
Contributor Author

Now this PR has no confliction with the main branch and passes the tests. Please review whether it is OK to be merged.

@szha szha merged commit d0ff3cd into apache:master May 24, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* Fix broken build with cython 0.28

* Fix setup.py to be compatible with cython 0.28

* Fix broken cython ndarray module

* Revised comments

* Replace hard coded library path with one obtained by find_lib_path

* Add documentation for MXNET_ENABLE_CYTHON and MXNET_ENFORCE_CYTHON

* Add cython build to CI

* Fix for cython CI

* Adjust python environment for cython CI

* Add make variables to set python executable

* Fix typo

* Fix nnvm include path

* Does not use ccache for cython

* Fix issues with the wildcards in the library list in Jenkinsfile

* Fix issues with the wildcards in the library list in Jenkinsfile (continued)

* Fix issues with the wildcards in the library list in Jenkinsfile (continued)

* Intentionally introduce a bug to check that the tests actually runwith cython

* Remove the intentionally introduced bug

* Update installation doc

* Retrigger CI

* Run cython CI in ubuntu environment instead of CentOS environment

* Commit a missed file

* Fix a bug in check_cython

* Refine environments for cython CI

* Restore unrelated changes

* Fix a problem occurring when the cython modules for python 2 and 3 are built successively

* Trigger CI

* Pin the cython version in the CI

* Catch up apache#11320

* Remove optional arguments unused after apache#11320

* Trigger CI

* Trigger CI

* Trigger CI

* Remove unnecessary stype argument from the NDArrayBase constructor

* Revise confusing initialization of `_ndarray_cls`

* Add cython build for python3 in CI

* Fix misplaced cython build in CI

* Adjust CI environments for cython

* Fix invalid path for cython generated .so files in cmake build

* Revert invalid fix

* Revise docs

* Revise check_cython

* Temporaily use make instead of cmake for debugging

* Temporal changes for debugging

* Temporal changes for debugging

* Temporaily use ctypes instead of cython modules for debugging

* Temporaily disable ccache for debugging

* Temporaily use make (DEV = 0) instead of cmake for debugging

* Temporaily disable cudnn for debugging

* Restore temporal changes

* Temporarily disable coverage report

* Adapt to Jenkinsfile_utils

* Adapt to Jenkinsfile_utils (cont.)

* Restore unrelated changes

* Restore temporal changes

* Resolving conflict

* Test with the cmake build is removed

* Add MXNET_ENABLE_CYTHON=0 to tensorrt test

* Fix typo

* Trigger CI

* Trigger CI

* Adapt to Jenkinsfile refactoring

* Adapt to Jenkinsfile refactoring (cont.)

* Trigger CI

* Trigger CI

* Stash missing cython modules

* Trigger CI

* CMake build of cython modules without unit tests

* Fix typo

* Trigger CI

* Fix a mistake introduced in merging process

* trigger test

* Update Jenkinsfile_utils.groovy

* Trigger CI

* Trigger CI

* Trigger CI

* Trigger tests

* Trigger tests

* Trigger tests

* Trigger tests

* Trigger tests

* Trigger tests

* Trigger tests

* Trigger tests
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.