This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[v1.8.x][BACKPORT]Stablizing CI and making binaries apache compliant #20015

Merged
merged 7 commits into from
Mar 14, 2021

access2rohit
Contributor

@access2rohit access2rohit commented Mar 12, 2021

Description

Backport PRs #20014 #19930 #19974 #19506 #19522

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Testing

Tested on local CD pipeline identical to the one for v1.8.x: https://jenkins.mxnet-ci.amazon-ml.com/job/restricted-mxnet-cd/job/rohit_v1.8.x/

access2rohit and others added 4 commits March 12, 2021 00:38
apache#19764) (apache#19930)

* Enable CUDA 11.0 on nightly development builds (apache#19295)

Remove CUDA 9.2 and CUDA 10.0

* [PIP] add build variant for cuda 11.2 (apache#19764)

* adding ci docker files for cu111 and cu112

* removing previous CUDA make versions and adding support for cuda11.2

Co-authored-by: waytrue17 <52505574+waytrue17@users.noreply.github.com>
Co-authored-by: Sheng Zha <szha@users.noreply.github.com>
Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>
…eline (apache#19974)

* migrating cd builds to ninja + removing static links to nvidia libs and legacy cuda versions

* installing NCCL manually for cuda11.2 container

* set MSHADOW_USE_CUDNN=1 in CMakelists of mshadow to build properly for CUDNN support

* adding coverage to cd requirements file to fix cu100, cu101 and cu102 tests

* updating cd_test containers to ubuntu 18

* adding cmake config for linux native and adding USE_KV_STORE in linux_cpu

* updating zmq builds to statically link to libmxnet.so

* updating toolchains for r, clang and llvm for ubuntu18. OpenBlas Static link for 'distribution' build type only. Fix caffe build to use openCV 3. Remove legacy Clang 3.9 from CI

* fix versions for pip install in ubuntu_core_sh add new search path for cuDNN

* fixing cudnn link problem for CUDA<=11.0

* adding library paths for libjpegturbo and lapack to fix failing CI on ubuntu 18 images

* removing ASAN integration test from miscellaneous CI as it's not required

* fix lapack path for gpu builds

* correctly installing libjpegturbo for ubuntu 18

* updating docker images of r,jekyll,julia etc test containers+ fix java version to 8

* installing libomp.so

* removing debug test as it's not required. Code clean-up

* adding alternate URL source for MNIST dataset as original website is down

* skipping flaky tests issue tracked apache#20011

Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>
@lanking520 lanking520 added the pr-work-in-progress PR is still work in progress label Mar 12, 2021
@access2rohit
Contributor Author

@samskalicky once this PR merges our binaries will be apache compliant

@access2rohit
Contributor Author

@samskalicky can you review?

@access2rohit access2rohit changed the title [DO NOT MERGE][WIP][v1.8.x][BACKPORT]Stabling CI and making binaries apache compliant [v1.8.x][BACKPORT]Stabling CI and making binaries apache compliant Mar 12, 2021
Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>
@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-work-in-progress PR is still work in progress labels Mar 12, 2021
@access2rohit access2rohit changed the title [v1.8.x][BACKPORT]Stabling CI and making binaries apache compliant [v1.8.x][BACKPORT]Stablizing CI and making binaries apache compliant Mar 12, 2021
@lanking520 lanking520 added pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress labels Mar 12, 2021
@samskalicky
Contributor

@leezu should we upgrade the CI in 1.x to Ubuntu 18 from 16? I thought we were only doing that for master/2.0 and later

@lanking520 lanking520 added pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 12, 2021
@access2rohit
Contributor Author

@samskalicky it's already merged in v1.x

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 12, 2021
@lanking520 lanking520 added the pr-work-in-progress PR is still work in progress label Mar 13, 2021
@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 13, 2021
@access2rohit
Contributor Author

@mxnet-bot run ci [unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-gpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 13, 2021
Contributor

@mseth10 mseth10 left a comment

Left some minor comments.

@@ -42,8 +42,8 @@ pipeline {
// Using string instead of choice parameter to keep the changes to the parameters minimal to avoid
// any disruption caused by different COMMIT_ID values changing the job parameter configuration on
// Jenkins.
string(defaultValue: "mxnet_lib", description: "Pipeline to build", name: "RELEASE_JOB_TYPE")
string(defaultValue: "cpu,native,cu100,cu101,cu102,cu110", description: "Comma separated list of variants", name: "MXNET_VARIANTS")
string(defaultValue: "mxnet_lib/static", description: "Pipeline to build", name: "RELEASE_JOB_TYPE")
Contributor

should be mxnet_lib, we removed "/static" in a previous commit dd4661a#diff-dd43bbf192e508d18e340337cf5a6094e137fba710759718cfbde6cf38e27a54R45
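
For readers following the job parameters discussed here, this is a minimal Python sketch of how a comma-separated `MXNET_VARIANTS` string like the one in the diff could be split and read. The function names, and the assumption that the last digit of a `cuXYZ` name is the CUDA minor version, are illustrative only and are not taken from the MXNet CD pipeline.

```python
import re


def parse_variants(raw):
    """Split a comma-separated variant string into a list, dropping empty entries."""
    return [v.strip() for v in raw.split(",") if v.strip()]


def cuda_version(variant):
    """Return 'major.minor' for names like 'cu102', or None for non-CUDA variants.

    Assumption (hypothetical helper): the last digit is the minor version,
    so 'cu102' reads as 10.2 and 'cu112' as 11.2.
    """
    m = re.fullmatch(r"cu(\d+)(\d)", variant)
    return f"{int(m.group(1))}.{m.group(2)}" if m else None


variants = parse_variants("cpu,native,cu100,cu101,cu102,cu110")
for v in variants:
    print(v, cuda_version(v))
```

Keeping the parameter as a plain string, as the Jenkinsfile comment explains, means any validation of variant names has to happen in a step like this rather than in the Jenkins parameter definition itself.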

@@ -17,7 +17,7 @@

# Artifact Repository - Pushing and Pulling libmxnet

-The artifact repository is an S3 bucket accessible only to restricted Jenkins nodes. It is used to store compiled MXNet artifacts that can be used by downstream CD pipelines to package the compiled libraries for different delivery channels (e.g. DockerHub, PyPI, Maven, etc.). The S3 object keys for the files being posted will be prefixed with the following distinguishing characteristics of the binary: branch, commit id, operating system, variant and dependency linking strategy (static or dynamic). For instance, s3://bucket/73b29fa90d3eac0b1fae403b7583fdd1529942dc/ubuntu16.04/cu92mkl/static/libmxnet.so
+The artifact repository is an S3 bucket accessible only to restricted Jenkins nodes. It is used to store compiled MXNet artifacts that can be used by downstream CD pipelines to package the compiled libraries for different delivery channels (e.g. DockerHub, PyPI, Maven, etc.). The S3 object keys for the files being posted will be prefixed with the following distinguishing characteristics of the binary: branch, commit id, operating system, variant and dependency linking strategy (static or dynamic). For instance, s3://bucket/73b29fa90d3eac0b1fae403b7583fdd1529942dc/ubuntu16.04/cu100/static/libmxnet.so
Contributor

nit: it's ubuntu18.04 now in s3 folders
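
The key layout described in the README text above can be sketched in Python as follows. The helper name `artifact_key` and its argument order are assumptions made for illustration, not code from the repository. The README's example key shows commit id, operating system, variant, and link type, so the sketch follows that layout and leaves out the branch component that the prose also mentions.

```python
def artifact_key(commit_id, os_name, variant, link_type, filename="libmxnet.so"):
    """Build an S3 object key in the layout shown in the README example.

    Hypothetical helper: commit_id/os_name/variant/link_type/filename.
    """
    if link_type not in ("static", "dynamic"):
        raise ValueError(f"unexpected link type: {link_type!r}")
    return "/".join([commit_id, os_name, variant, link_type, filename])


key = artifact_key(
    "73b29fa90d3eac0b1fae403b7583fdd1529942dc", "ubuntu18.04", "cu100", "static"
)
print(f"s3://bucket/{key}")
```

With the operating system passed as `ubuntu18.04`, the resulting key matches the folder naming the reviewer points out.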

RUN /work/deb_ubuntu_ccache.sh

COPY install/ubuntu_python.sh /work/
COPY install/requirements /work/
Contributor

remove this, duplicate of line 25

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-awaiting-review PR is waiting for code review labels Mar 13, 2021
Contributor

@mseth10 mseth10 left a comment

LGTM, thanks for making the release ASF compliant.

@lanking520 lanking520 added pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress labels Mar 13, 2021
@mseth10 mseth10 merged commit a0535dd into apache:v1.8.x Mar 14, 2021
mseth10 added a commit to mseth10/incubator-mxnet that referenced this pull request Mar 15, 2021
…pache#20015)

* [BACKPORT]Enable CUDA 11.0 on nightly + CUDA 11.2 on pip (apache#19295)(apache#19764) (apache#19930)

* Enable CUDA 11.0 on nightly development builds (apache#19295)

Remove CUDA 9.2 and CUDA 10.0

* [PIP] add build variant for cuda 11.2 (apache#19764)

* adding ci docker files for cu111 and cu112

* removing previous CUDA make versions and adding support for cuda11.2

Co-authored-by: waytrue17 <52505574+waytrue17@users.noreply.github.com>
Co-authored-by: Sheng Zha <szha@users.noreply.github.com>
Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>

* [FEATURE]Migrating all CD pipelines to Ninja build + fix cu112 CD pipeline (apache#19974)

* migrating cd builds to ninja + removing static links to nvidia libs and legacy cuda versions

* installing NCCL manually for cuda11.2 container

* set MSHADOW_USE_CUDNN=1 in CMakelists of mshadow to build properly for CUDNN support

* adding coverage to cd requirements file to fix cu100, cu101 and cu102 tests

* updating cd_test containers to ubuntu 18

* adding cmake config for linux native and adding USE_KV_STORE in linux_cpu

* updating zmq builds to statically link to libmxnet.so

* updating toolchains for r, clang and llvm for ubuntu18. OpenBlas Static link for 'distribution' build type only. Fix caffe build to use openCV 3. Remove legacy Clang 3.9 from CI

* fix versions for pip install in ubuntu_core_sh add new search path for cuDNN

* fixing cudnn link problem for CUDA<=11.0

* adding library paths for libjpegturbo and lapack to fix failing CI on ubuntu 18 images

* removing ASAN integration test from miscellaneous CI as it's not required

* fix lapack path for gpu builds

* correctly installing libjpegturbo for ubuntu 18

* updating docker images of r,jekyll,julia etc test containers+ fix java version to 8

* installing libomp.so

* removing debug test as it's not required. Code clean-up

* adding alternate URL source for MNIST dataset as original website is down

* skipping flaky tests issue tracked apache#20011

Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>

* update cudnn from 7 to 8 for cu102 (apache#19506)

* update cudnn from 7 to 8 for cu102 (apache#19522)

* downloading MNIST dataset from alternate URL (apache#20014)

Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>

* fixing CI issue with v1.8.x

* addressing review comments

Co-authored-by: waytrue17 <52505574+waytrue17@users.noreply.github.com>
Co-authored-by: Sheng Zha <szha@users.noreply.github.com>
Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>
Co-authored-by: Manu Seth <22492939+mseth10@users.noreply.github.com>
Labels
pr-awaiting-merge Review and CI is complete. Ready to Merge
5 participants