This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

increased docker shared memory for nightly test #14119

Merged
merged 1 commit into from
Feb 11, 2019

Conversation

roywei
Member

@roywei roywei commented Feb 11, 2019

Description

This is a potential fix for #14026. It needs to be tested in the CI Docker environment before merging.

I suggest disabling the test first (#14120).

According to #11872, the Gluon data loader hangs and reports a "connection refused" error if shared memory is too small. Our nightly tutorial tests currently give the Docker container 500m of shared memory; this PR increases it to 1500m.
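
For illustration only, here is a minimal Python sketch of the idea: checking whether the shared memory available inside a container meets a threshold, and building a `docker run` command that raises it. Docker's `--shm-size` option is real, but the helper names (`parse_size`, `shm_is_sufficient`, `docker_run_command`) and the image name are hypothetical and are not taken from this PR's diff. The sketch also assumes shared memory is mounted at `/dev/shm`, as is usual on Linux.

```python
import shutil


def parse_size(text):
    """Convert a Docker-style size string such as '500m' or '1g' to bytes."""
    units = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)  # a bare number is treated as bytes


def shm_is_sufficient(required="1500m", path="/dev/shm"):
    """Return True if the filesystem at `path` is at least `required` in size."""
    total = shutil.disk_usage(path).total
    return total >= parse_size(required)


def docker_run_command(image, shm_size="1500m"):
    """Build (but do not execute) a docker run command with a larger /dev/shm."""
    return ["docker", "run", "--rm", f"--shm-size={shm_size}", image]


if __name__ == "__main__":
    # Hypothetical image name, used only to show the resulting command line.
    print(" ".join(docker_run_command("example/mxnet-ci-image")))
```

A check like `shm_is_sufficient()` could run at the start of a test job so that the job fails fast with a clear message when shared memory is too small, rather than hanging in the data loader.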

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

Contributor

@jlcontreras jlcontreras left a comment

Lgtm

@roywei roywei changed the title from "increased docker shared memory for nightly test" to "[WIP] increased docker shared memory for nightly test" Feb 11, 2019
@vandanavk
Contributor

@mxnet-label-bot add [pr-awaiting-review, Gluon]

@roywei could you add "Fixes #14026" in the PR description? This will automatically close the issue when the PR is merged.

@marcoabreu marcoabreu added the Gluon and pr-awaiting-review (PR is waiting for code review) labels Feb 11, 2019
@vandanavk
Contributor

@mxnet-label-bot update [pr-work-in-progress, Gluon]

@marcoabreu marcoabreu added the pr-work-in-progress (PR is still work in progress) label and removed the pr-awaiting-review (PR is waiting for code review) label Feb 11, 2019
@marcoabreu
Contributor

Is there an open ticket to improve the error message?

@marcoabreu marcoabreu merged commit f906681 into apache:master Feb 11, 2019
@roywei roywei changed the title from "[WIP] increased docker shared memory for nightly test" to "increased docker shared memory for nightly test" Feb 11, 2019
@roywei roywei mentioned this pull request Feb 12, 2019
5 tasks
stephenrawls pushed a commit to stephenrawls/incubator-mxnet that referenced this pull request Feb 16, 2019
jessr92 pushed a commit to jessr92/incubator-mxnet that referenced this pull request Feb 19, 2019
drivanov pushed a commit to drivanov/incubator-mxnet that referenced this pull request Mar 4, 2019
vdantu pushed a commit to vdantu/incubator-mxnet that referenced this pull request Mar 31, 2019
@roywei roywei deleted the fix_nightly branch June 3, 2019 22:08
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019