[CI] Prevent timeouts when rebuilding containers with docker. #13818
Conversation
@mxnet-label-bot add [pr-awaiting-review, Scala]
It looks good to me, could you please point to some PRs that have this issue?
I restarted 4 PRs because of this issue.
Does the website publish job need this increase too?
https://github.com/apache/incubator-mxnet/blob/master/docs/Jenkinsfile#L24
@aaronmarkham yes. Could we move that into the ci/ folder for consistency? It's easy to miss if we have scripts and infrastructure in the docs folder.
Could we hold off on the merge, please? I'm not really sure whether this fixes the problem or works around another regression.
When I first put this together, @marcoabreu and I discussed that, but I can't remember why it was better to have it in docs. Maybe that's changed? Marco, do you remember why? If we need to leave it there, we could add some notes to the CI readme so it doesn't get overlooked.
While we hold off, CI is having timeouts. It took me quite a while to get the PR to pass CI because of the timeouts (I had to manually rebuild the cache). What steps are you taking to understand whether it fixes the problem? What makes you think my fix doesn't address the problem? If CI is failing, we can't merge PRs that fix CI, because master is protected.
Increase timeout from 120 to 180 for pipelines
A small concern there. Otherwise LGTM
See above
Responded. I would appreciate it if this could be merged to prevent CI failures.
would be great to merge
* upstream/master: (109 commits)
  Code modification for testcases of various network models in directory example (apache#12498)
  [CI] Prevent timeouts when rebuilding containers with docker. (apache#13818)
  fix Makefile for rpkg (apache#13590)
  change to compile time (apache#13835)
  Disabled flaky test (apache#13758)
  Improve license_header tool by only traversing files under revision c… (apache#13803)
  Removes unneeded nvidia driver ppa installation (apache#13814)
  Add Local test stage and option to jump directly to menu item from commandline (apache#13809)
  Remove MXNET_STORAGE_FALLBACK_LOG_VERBOSE from test_autograd.py (apache#13830)
  Fix scala doc build break for v1.3.1 (apache#13820)
  [MXNET-1263] Unit Tests for Java Predictor and Object Detector APIs (apache#13794)
  [MXNET-1260] Float64 DType computation support in Scala/Java (apache#13678)
  onnx export ops (apache#13821)
  [MXNET-880] ONNX export: Random uniform, Random normal, MaxRoiPool (apache#13676)
  fix minor indentation (apache#13827)
  Fixing a symlink issue with R install (apache#13708)
  remove useless code (apache#13777)
  ONNX ops: norm exported and lpnormalization imported (apache#13806)
  Add new Maven build for Scala package (apache#13819)
  Dockerfiles for Publish Testing (apache#13707)
  ...
…#13818)
* Prevent timeouts when rebuilding containers with docker. Increase timeout from 120 to 180 for pipelines
* Increase docker cache timeout
* Increase timeout also for docs
* limit parallel builds to 10
Increase timeout from 120 to 180 for pipelines
Increase timeout for docker pull as we get timeout when rebuilding the docker cache:
http://jenkins.mxnet-ci.amazon-ml.com/job/restricted-docker-cache-refresh/job/master/1190/console
Limit parallel builds to 10
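
For readers who want to see the two knobs from these commits side by side, here is a minimal sketch of a bounded worker pool with a per-command timeout. It is not the code from this PR: the function names `run_with_timeout` and `run_all` and both constants are hypothetical, and treating 180 as minutes is an assumption, since the commits do not state the unit.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Values taken from the commit messages; the unit (minutes) is an assumption.
MAX_PARALLEL_BUILDS = 10
PIPELINE_TIMEOUT_MIN = 180


def run_with_timeout(cmd, timeout_min):
    """Run cmd; return True on success, False on non-zero exit or timeout."""
    try:
        # subprocess.run kills the child process if the timeout expires.
        subprocess.run(cmd, check=True, timeout=timeout_min * 60)
        return True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return False


def run_all(commands, max_parallel=MAX_PARALLEL_BUILDS,
            timeout_min=PIPELINE_TIMEOUT_MIN):
    """Run commands with at most max_parallel in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda c: run_with_timeout(c, timeout_min), commands))
```

Each entry in `commands` is an argument list such as `["docker", "pull", image]`. Capping the pool keeps a cache rebuild from starting every container build at once, and the timeout bounds how long a single slow pull can block the run.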
Description
Mitigation for failing CI
fixes #13817