Broken osx-arm64 builds #112

Closed
1 task done
chrisburr opened this issue Aug 10, 2022 · 32 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@chrisburr
Member

Solution to issue cannot be found in the documentation.

  • I checked the documentation.

Issue

All builds since rapidfuzz 2 was released segfault on osx-arm64.

$ mamba create --yes --name test rapidfuzz
$ python -c 'import rapidfuzz; print("Success!")'
fish: Job 1, 'python -c 'import rapidfuzz'' terminated by signal SIGSEGV (Address boundary error)

The last functional build was 1.9.1:

$ mamba create --yes --name test rapidfuzz==1.9.1
$ python -c 'import rapidfuzz; print("Success!")'
Success!
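
A minimal sketch for scripting the check above, assuming a POSIX-style shell such as bash, where a process killed by a signal conventionally reports an exit status of 128 plus the signal number (139 for SIGSEGV). The helper name check_import is hypothetical.

```shell
# Hypothetical helper: try to import a module and report whether the
# interpreter exited normally, failed, or was killed by a signal.
check_import() {
    module="$1"
    python -c "import ${module}" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -gt 128 ]; then
        # By convention, death by signal is reported as 128 + signal number
        # (SIGSEGV is signal 11, so a segfault shows up as 139).
        echo "killed by signal $((status - 128))"
    else
        echo "failed with exit status ${status}"
    fi
}

# Usage: check_import rapidfuzz
```

Run inside the affected environment, this makes it easy to loop over several package versions and see which ones crash on import.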

I also see similar segfaults in some of the CI pipelines such as:

https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=548027&view=logs&jobId=81eb4d60-76fc-5ac4-a959-9ebb9871bfee&j=696704cc-6fef-57a3-ea36-f27779b8cd5e&t=ed2e6513-0a06-519f-13f9-1e5619642f2a

See also: conda-forge/grayskull-feedstock#52

Installed packages

# packages in environment at /Users/cburr/mambaforge/envs/test:
#
# Name                    Version                   Build  Channel
bzip2                     1.0.8                h3422bc3_4    conda-forge
ca-certificates           2022.6.15            h4653dfc_0    conda-forge
jarowinkler               1.2.0           py310hb07a4bc_0    conda-forge
libblas                   3.9.0           16_osxarm64_openblas    conda-forge
libcblas                  3.9.0           16_osxarm64_openblas    conda-forge
libcxx                    14.0.6               h04bba0f_0    conda-forge
libffi                    3.4.2                h3422bc3_5    conda-forge
libgfortran               5.0.0.dev0      11_0_1_hf114ba7_23    conda-forge
libgfortran5              11.0.1.dev0         hf114ba7_23    conda-forge
liblapack                 3.9.0           16_osxarm64_openblas    conda-forge
libopenblas               0.3.21          openmp_hcb59c3b_0    conda-forge
libzlib                   1.2.12               ha287fd2_2    conda-forge
llvm-openmp               14.0.4               hd125106_0    conda-forge
ncurses                   6.3                  h07bb92c_1    conda-forge
numpy                     1.23.1          py310h0a343b5_0    conda-forge
openssl                   3.0.5                h7aea29f_1    conda-forge
pip                       22.2.2             pyhd8ed1ab_0    conda-forge
python                    3.10.5          h4eee789_0_cpython    conda-forge
python_abi                3.10                    2_cp310    conda-forge
rapidfuzz                 2.4.3           py310hc6dc59f_0    conda-forge
readline                  8.1.2                h46ed386_0    conda-forge
setuptools                63.4.2          py310hbe9552e_0    conda-forge
sqlite                    3.39.2               h40dfcc0_0    conda-forge
tk                        8.6.12               he1e0b03_0    conda-forge
tzdata                    2022a                h191b570_0    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xz                        5.2.5                h642e427_1    conda-forge
zlib                      1.2.12               ha287fd2_2    conda-forge

Environment info

active environment : test
    active env location : /Users/cburr/mambaforge/envs/test
            shell level : 4
       user config file : /Users/cburr/.condarc
 populated config files : /Users/cburr/mambaforge/.condarc
                          /Users/cburr/.condarc
          conda version : 4.12.0
    conda-build version : 3.21.9
         python version : 3.9.13.final.0
       virtual packages : __osx=12.4=0
                          __unix=0=0
                          __archspec=1=arm64
       base environment : /Users/cburr/mambaforge  (writable)
      conda av data dir : /Users/cburr/mambaforge/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/osx-arm64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /Users/cburr/mambaforge/pkgs
                          /Users/cburr/.conda/pkgs
       envs directories : /Users/cburr/mambaforge/envs
                          /Users/cburr/.conda/envs
               platform : osx-arm64
             user-agent : conda/4.12.0 requests/2.27.1 CPython/3.9.13 Darwin/21.5.0 OSX/12.4
                UID:GID : 501:20
             netrc file : None
           offline mode : False
@chrisburr chrisburr added the bug Something isn't working label Aug 10, 2022
@maxbachmann
Contributor

Do the same issues arise when installing via pip?

@maxbachmann
Contributor

maxbachmann commented Aug 10, 2022

I also see similar segfaults in some of the CI pipelines such as:

I did see this segfault, but could never figure out the cause. Since it already occurs when importing the package, only Cython-generated code is involved, so I would assume that this is either a build issue or a bug in Cython. However, I have no machine with macOS to debug this properly.

It would probably help if you could get me a stack trace for the segfault via gdb (or whatever is common on macOS).
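
On macOS the usual debugger is lldb, which ships with the Xcode command line tools. A minimal sketch of capturing a native backtrace, assuming lldb is installed and the command is run inside the affected conda environment; the helper name backtrace_import is hypothetical.

```shell
# Hypothetical helper: run a Python import under lldb in batch mode and
# print a backtrace of all threads if the process crashes.
backtrace_import() {
    module="$1"
    # Pass an absolute interpreter path so lldb debugs the environment's Python.
    interpreter="$(command -v python)"
    lldb --batch \
        -o "run" \
        -k "thread backtrace all" \
        -- "$interpreter" -c "import ${module}"
}

# Usage: backtrace_import rapidfuzz
```

The -o command runs after the target is loaded and the -k command runs only if the target crashes. After a crash lldb may stay at its interactive prompt, so this is better suited to a local session than to unattended CI.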

@maxbachmann maxbachmann added the help wanted Extra attention is needed label Aug 11, 2022
@ngam

ngam commented Aug 11, 2022

@maxbachmann have you figured out the problems with ppc/aarch or not yet? I remember trying to figure this out with you a long time ago...

either a build issue or a bug in Cython

Likely a build issue.

However I have no machine with osx to properly debug this

Please give us details and we can do it. @BastianZim can also help?

@ngam

ngam commented Aug 11, 2022

The order of business here:

  1. mark all these arm64 builds as broken
  2. correct the build issues

@BastianZim
Member

@BastianZim can also help?

Sure, just let me know what to run – happy to help.

Note, I'm on macOS M1 not ppc/aarch

@ngam

ngam commented Aug 11, 2022

As a piece of advice, @maxbachmann, please avoid merging PRs while tests are failing. If we can't figure out the issues with ppc/aarch, let's just disable them for now, and if someone really needs them, they can help figure that out; skip: true # [ ... ] is the recipe line to use.

Yes, I am mainly interested in fixing the osxarm builds for now as well. This is quite important to fix. The PyPI builds (as proxied by grayskull) work completely fine, so it is a build issue here likely.

@maxbachmann
Contributor

mark all these arm64 builds as broken

How can this be done? The broken PyPy builds are likely an issue with scikit-build and should be marked as broken as well.

Maybe the https://github.com/conda-forge/jarowinkler-feedstock package is an easier starting point for debugging, since it is a smaller package which apparently has the same problems: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=538295&view=logs&jobId=696704cc-6fef-57a3-ea36-f27779b8cd5e&j=696704cc-6fef-57a3-ea36-f27779b8cd5e&t=ed2e6513-0a06-519f-13f9-1e5619642f2a

@ngam

ngam commented Aug 11, 2022

I think it is scikit-build

@BastianZim do you have any experience with that?

First, we likely need to add it to build for cross compile. I also found this note by a quick code search:

# override scikit-build handling for macOS arm platform on conda-forge cross-compiling
if [ "$target_platform" = "osx-arm64" ]; then
  export CMAKE_OSX_ARCHITECTURES="arm64"
  # workaround for https://github.com/scikit-build/scikit-build/issues/589
  rm -rf $PREFIX/lib/libpython$PY_VER.dylib
  ln -sf $PREFIX/lib/libc++.dylib $PREFIX/lib/libpython$PY_VER.dylib
fi

https://github.com/conda-forge/fastscapelib-f2py-feedstock/blob/3062c35eded6f0e3015f5968550107c72ba945c4/recipe/build.sh

@BastianZim
Member

How can this be done.

As outlined in the docs here.

In short, the process is:

  1. create a file called rapidfuzz.txt
  2. add all of the broken builds to the file. A list of builds is here. The format will be osx-64/rapidfuzz-2.4.3-py38hd331d03_0.tar.bz2. See here for example.
  3. Make a PR to the admin-requests feedstock and add the rapidfuzz.txt file to https://github.com/conda-forge/admin-requests/tree/main/broken. This is one example.
  4. Once merged, the bot will mark them as broken.
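
The first two steps can be sketched as a small shell helper. This is an illustration only: the function name broken_lines is hypothetical, the .tar.bz2 suffix follows the format quoted above, and the single build string used in the example is the osx-arm64 one from the environment listing in this issue. The full list has to come from the package's file listing.

```shell
# Hypothetical helper: print one admin-requests line per broken build.
# Arguments: subdir, package name, version, then one or more build strings.
broken_lines() {
    subdir="$1"
    name="$2"
    version="$3"
    shift 3
    for build in "$@"; do
        printf '%s/%s-%s-%s.tar.bz2\n' "$subdir" "$name" "$version" "$build"
    done
}

# Example with the one build string visible in this issue; redirect the
# output to rapidfuzz.txt when preparing the pull request.
broken_lines osx-arm64 rapidfuzz 2.4.3 py310hc6dc59f_0
```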

Sorry, away from my computer today, so I can't do it myself but feel free to ping me if I can help with anything.

@BastianZim
Member

@BastianZim do you have any experience with that?

With scikit-build? No, sorry, this is the first time seeing this error. But its feedstock maintainers are very helpful, maybe we can ask them once we have a clearer understanding.

@ngam

ngam commented Aug 11, 2022

I can handle the admin stuff later this afternoon, no problem

@ngam

ngam commented Aug 11, 2022

working on jarowinkler conda-forge/jarowinkler-feedstock#12

@ngam

ngam commented Aug 11, 2022

fyi: conda-forge/admin-requests#472

@ngam

ngam commented Aug 12, 2022

@maxbachmann could you hold on for ~one week before merging anything? If we can't find a solution, we will just go ahead and disable the bad builds. But let's try not to merge anything with osxarm or the failing tests on the other systems for now.

@maxbachmann
Contributor

Sure I am in no hurry

@maxbachmann
Contributor

Could someone at least check what is leading to the segmentation fault by running it with a debugger? This might already tell us the cause of the failure. I am not sure how this is done on OSX; under Linux this could be done with gdb.

@maxbachmann
Contributor

In addition: Is there any way to print a backtrace when encountering a segmentation fault in the conda forge tests? I do this when running tests in cibuildwheel:

https://github.com/maxbachmann/RapidFuzz/blob/dcf67467677a89a8613c65956a5aadc073c918a2/.github/workflows/releasebuild.yml#L41

However, I could not find a way to do the same for conda-forge. I do not have most platform combinations locally, so when there is a segmentation fault in the CI, like here for Linux CPython 3.7 aarch64, I am unable to reproduce it locally (without way too much work). At this point I simply do not care about the failure enough, especially because the builds for PyPI apparently work.
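
One platform-independent option is Python's built-in faulthandler module, which dumps the Python-level traceback when the interpreter receives SIGSEGV. It does not show C or C++ frames, so it only narrows down which import or call crashed. A minimal sketch, assuming the command could be used as a test command in the recipe; the helper name is hypothetical, and whether this matches what the linked workflow does is not confirmed here.

```shell
# Hypothetical helper: import a module with faulthandler enabled so that a
# crash prints the Python traceback of each thread to stderr.
import_with_faulthandler() {
    python -X faulthandler -c "import $1"
}

# The same switch can be set through the environment instead:
#   PYTHONFAULTHANDLER=1 python -c "import rapidfuzz"
# Usage: import_with_faulthandler rapidfuzz
```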

@maxbachmann
Contributor

@henryiii adding you to the discussion, since these build issues are related to scikit-build (all builds worked fine until v2.0.0, which replaced setuptools with scikit-build). To give you a short overview, I am experiencing the following build issues on all of my projects using scikit-build:

  • when building for PyPy windows the Python headers are not found
  • when building for PyPy Linux aarch64/ppc64le on conda forge the Python headers are not found.
    Note that this works fine when building under emulation using cibuildwheel, so this is likely related to cross-compiling
  • the builds for Linux CPython 3.7 aarch64/ppc64le lead to a segmentation fault on import for conda-forge. Again, this appears to be related to cross-compiling, since I do not experience this issue for the PyPI build. In addition, I found it surprising that the same issue does not occur for other Python versions (it is unclear to me how to get a stack trace of the segmentation fault in the conda-forge tests)
  • any builds for osx arm64 are broken when building for conda-forge. I assume this is related to "python dylib is linked in when cross-compiling" (scikit-build/scikit-build#589), but this does not occur in cibuildwheel. As far as I know, cibuildwheel cross-compiles the osx arm64 binaries as well.

The issues with PyPy and Python 3.7 aarch64/ppc64le are not super important, since they do not have a ton of users. However, the issue for osx arm64 is going to affect a growing number of osx users. It appears the osx issue was fixed in other packages (e.g. https://github.com/conda-forge/slycot-feedstock) by setting:

export CMAKE_OSX_ARCHITECTURES="arm64"

however this did not fix the issue for @ngam (I do not own any osx devices, so I am unable to debug this)

@ngam

ngam commented Aug 25, 2022

I tried debugging but I couldn't get anywhere. This is a difficult problem that's beyond me at this point.

If you want me to set up a PR for you to skip these faulty builds, I can do that so that you can resume pushing and merging your work. Would you like me to do that?

Or you can do it yourself: both here and in jarowinkler, set:

skip: true  # [osx and build_platform != target_platform]
skip: true  # [py37]
skip: true  # [python_impl == 'pypy']

to skip all osx-arm64 builds, all python37 builds, and all pypy builds.

@ngam

ngam commented Aug 25, 2022

And rerender

@henryiii
Contributor

I think fixing this will likely require changes in scikit-build (which I can work on, hopefully today or tomorrow). It might be best to put in the skips then I can work on removing the skips in a PR.

@henryiii
Contributor

Any idea how to trigger the failure in CI? python -c 'import rapidfuzz; print("Success!")' looks like test: imports: [rapidfuzz], but that passes in CI. (I can run this locally, but would like to be able to make the issue show up in CI.)

@maxbachmann
Contributor

It fails for Python 3.7. Looking at the CI log, it appears that tests are simply not run for osx-arm64.

@BastianZim
Member

Yes, arm64 is cross-compiled, so conda-forge cannot test it; all of the cross-compiled builds can only be tested locally.

@maxbachmann
Contributor

I wrote a temporary workaround which packages the pure Python version of rapidfuzz on the broken platforms, so this will not stop poetry 1.2.0 from building for conda: #120.

Obviously it would be better if a fix for this issue can be found.

@maxbachmann
Contributor

As a small update: the source of the issue is that, for whatever reason, scikit-build picks up an x86_64 compiler instead of the arm64 compiler when cross-compiling for macOS arm64, so the result is not going to run on an arm64 target.
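
Because the cross-compiled packages are not tested in CI, one way to catch this kind of mismatch is to inspect the architecture of the built extension modules. A minimal sketch, assuming the standard file utility is available; the helper name check_arch and the example path are hypothetical.

```shell
# Hypothetical helper: succeed only if `file` reports the expected
# architecture (for example arm64) for every path given.
check_arch() {
    expected="$1"
    shift
    for path in "$@"; do
        description="$(file -b "$path")"
        case "$description" in
            *"$expected"*) ;;
            *)
                echo "unexpected architecture: ${path}: ${description}" >&2
                return 1
                ;;
        esac
    done
}

# Usage (the path is a placeholder for a built extension module):
#   check_arch arm64 path/to/extension_module.so
```

On macOS, lipo -archs gives the same information in a more compact form.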

@henryiii
Contributor

And this is only on conda-forge, it happily cross compiles on cibuildwheel, FWIW.

@maxbachmann
Contributor

maxbachmann commented Sep 22, 2022

@henryiii I tried the patches from rapidfuzz/JaroWinkler#11 here in conda-forge as well: conda-forge/jarowinkler-feedstock#22

They appear to fix most issues:

  • fixes the CPython 3.7 segfault while testing
  • macOS arm64 picks up the correct compiler -> should work, but someone should validate

The PyPy build issues still exist, but that might be fixed once there is a new release of scikit-build (I do not know whether I can reference the master version here).

edit: in fact even PyPy Windows was fixed, so only PyPy Linux aarch64/ppc64le remains

@maxbachmann
Contributor

maxbachmann commented Sep 26, 2022

I released working binaries for all broken platforms except PyPy on Linux aarch64/ppc64le, which still fails to build.

@ngam

ngam commented Sep 26, 2022

@maxbachmann Thank you for your work on this. You're a star! 🌟

@maxbachmann
Contributor

Special thanks to @henryiii as he did most of the work on this.

@maxbachmann
Contributor

finally all builds are fixed :)
