This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Re-Enabling Large Tensor and Vector Nightly on GPU #16164

Merged (1 commit) on Feb 6, 2020

Conversation

access2rohit
Contributor

@access2rohit access2rohit commented Sep 13, 2019

Description

Reverts PR #15141, now that the fix (#17450) for issue #14981 has been merged.
To be merged only after nightly tests are restored. This test is being re-enabled because recently merged PRs have significantly reduced the memory footprint of ops like topk, argsort, and sort on the Large Tensor tests, from over 400GB to around 220GB.

This PR also adds the large vector nightly test.
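
For context on the memory figures above, here is a rough sizing sketch. It assumes that a "large" tensor is one whose flat element count exceeds what a signed 32-bit index can address. The shape and the helper names (`tensor_footprint`, `needs_int64_indexing`) are hypothetical and are not taken from this PR or from the MXNet test suite.

```python
# Back-of-the-envelope sizing for a "large tensor" test input.
# The shape used below is a hypothetical example, not a value from this PR.

INT32_MAX = 2**31 - 1


def tensor_footprint(shape, bytes_per_element=4):
    """Return (element_count, size_in_GiB) for a dense tensor."""
    count = 1
    for dim in shape:
        count *= dim
    return count, count * bytes_per_element / 2**30


def needs_int64_indexing(shape):
    """True when the flat element count no longer fits in a signed 32-bit index."""
    count, _ = tensor_footprint(shape)
    return count > INT32_MAX


shape = (100_000_000, 50)  # hypothetical: 5 billion float32 elements
count, gib = tensor_footprint(shape)
print(f"{count} elements, {gib:.1f} GiB, "
      f"int64 indexing needed: {needs_int64_indexing(shape)}")
```

A single input of this size already occupies tens of GiB, so an op that allocates several temporaries of the same shape can plausibly reach a footprint in the hundreds of GB.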

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Tests

ci/build.py --docker-registry mxnetci --nvidiadocker --platform ubuntu_nightly_gpu --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh nightly_test_large_tensor

2020-01-28 01:08:03,925 - root - INFO - Started container: 5bceca2b8f26
+ NOSE_COVERAGE_ARGUMENTS='--with-coverage --cover-inclusive --cover-xml --cover-branches --cover-package=mxnet'
+ NOSE_TIMER_ARGUMENTS='--with-timer --timer-ok 1 --timer-warning 15 --timer-filter warning,error'
+ CI_CUDA_COMPUTE_CAPABILITIES='-gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_70,code=sm_70'
+ CI_CMAKE_CUDA_ARCH='5.2 7.0'
+ set +x
+ export PYTHONPATH=./python/
+ PYTHONPATH=./python/
+ export DMLC_LOG_STACK_TRACE_DEPTH=10
+ DMLC_LOG_STACK_TRACE_DEPTH=10
+ nosetests-3.4 tests/nightly/test_large_array.py:test_tensor
S
----------------------------------------------------------------------
Ran 1 test in 125.654s

OK (SKIP=1)
+ nosetests-3.4 tests/nightly/test_large_array.py:test_nn
[01:15:48] src/executor/graph_executor.cc:2062: Subgraph backend MKLDNN is activated.
[01:15:52] src/executor/graph_executor.cc:2062: Subgraph backend MKLDNN is activated.
S
----------------------------------------------------------------------
Ran 1 test in 344.892s

OK (SKIP=1)
+ nosetests-3.4 tests/nightly/test_large_array.py:test_basic
S
----------------------------------------------------------------------
Ran 1 test in 156.411s

OK (SKIP=1)

ci/build.py --docker-registry mxnetci --nvidiadocker --platform ubuntu_nightly_gpu --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh nightly_test_large_vector

+ NOSE_TIMER_ARGUMENTS='--with-timer --timer-ok 1 --timer-warning 15 --timer-filter warning,error'
+ CI_CUDA_COMPUTE_CAPABILITIES='-gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_70,code=sm_70'
+ CI_CMAKE_CUDA_ARCH='5.2 7.0'
+ set +x
+ export PYTHONPATH=./python/
+ PYTHONPATH=./python/
+ export DMLC_LOG_STACK_TRACE_DEPTH=10
+ DMLC_LOG_STACK_TRACE_DEPTH=10
+ nosetests-3.4 tests/nightly/test_large_vector.py:test_tensor
S
----------------------------------------------------------------------
Ran 1 test in 107.439s

OK (SKIP=1)
+ nosetests-3.4 tests/nightly/test_large_vector.py:test_nn
[06:22:40] src/executor/graph_executor.cc:1982: Subgraph backend MKLDNN is activated.
[06:27:34] src/executor/graph_executor.cc:1982: Subgraph backend MKLDNN is activated.
.
----------------------------------------------------------------------
Ran 1 test in 653.536s

OK
+ nosetests-3.4 tests/nightly/test_large_vector.py:test_basic
S
----------------------------------------------------------------------
Ran 1 test in 65.930s

OK (SKIP=1)
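
When reading pasted results like the ones above, it can help to tally how many test entries ran versus how many were skipped. The following is a minimal sketch that handles only the two summary formats visible in this log (`Ran N test(s) in Xs` and `OK (SKIP=N)`). The `summarize` helper is hypothetical and not part of the MXNet CI scripts.

```python
import re

# Sample summary lines copied from the nosetests output pasted above.
LOG = """\
Ran 1 test in 125.654s
OK (SKIP=1)
Ran 1 test in 344.892s
OK (SKIP=1)
Ran 1 test in 653.536s
OK
"""

RAN_RE = re.compile(r"^Ran (\d+) tests? in ([\d.]+)s$")
SKIP_RE = re.compile(r"^OK \(SKIP=(\d+)\)$")


def summarize(log_text):
    """Return (tests_run, tests_skipped, total_seconds) from nose summary lines."""
    ran = skipped = 0
    seconds = 0.0
    for line in log_text.splitlines():
        line = line.strip()
        match = RAN_RE.match(line)
        if match:
            ran += int(match.group(1))
            seconds += float(match.group(2))
            continue
        match = SKIP_RE.match(line)
        if match:
            skipped += int(match.group(1))
    return ran, skipped, seconds


ran, skipped, seconds = summarize(LOG)
print(f"ran={ran} skipped={skipped} executed={ran - skipped} time={seconds:.1f}s")
```

Applied to the full output above, a tally like this makes it easy to see how many of the reported runs were skipped rather than executed.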

@access2rohit
Contributor Author

@mxnet-label-bot add [pr-awaiting-review]

@access2rohit
Contributor Author

@apeforest can you review?

@lanking520 lanking520 added the pr-awaiting-review PR is waiting for code review label Sep 16, 2019
@Vikas-kum Vikas-kum mentioned this pull request Sep 16, 2019
7 tasks
@apeforest
Contributor

Can you run all the tests in the same container as nightly and paste the results here? Thanks!

@access2rohit access2rohit force-pushed the re-enable_large_tensor branch 2 times, most recently from bd2105a to ac6cb59 Compare January 14, 2020 19:28
@access2rohit access2rohit changed the title Re-Enabling Large Tensor Nightly on GPU Re-Enabling Large Tensor and Vector Nightly on GPU Jan 14, 2020
@access2rohit access2rohit force-pushed the re-enable_large_tensor branch from ac6cb59 to f75aad8 Compare January 27, 2020 18:26
@access2rohit access2rohit force-pushed the re-enable_large_tensor branch 3 times, most recently from 4e871db to e004b56 Compare February 5, 2020 19:34
@access2rohit
Contributor Author

@mxnet-label-bot add [pr-awaiting-review]

@access2rohit
Contributor Author

@apeforest can you take a look?

@access2rohit
Contributor Author

@mxnet-label-bot update [pr-awaiting-merge]

@lanking520 lanking520 added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-awaiting-review PR is waiting for code review labels Feb 6, 2020
@apeforest apeforest merged commit f850170 into apache:master Feb 6, 2020
zheyuye pushed a commit to zheyuye/incubator-mxnet that referenced this pull request Feb 19, 2020
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 29, 2020