enabling build stage gpu_int64 to enable large tensor nightly runs #17546

access2rohit · 2020-02-07T20:07:34Z

Description

Fixes the nightly build failure caused by the missing large tensor build artifact, which the nightly large tensor tests require.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

Changes are complete (i.e. I finished coding on this PR)
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

access2rohit · 2020-02-07T20:07:49Z

@mxnet-label-bot add [pr-awaiting-review]

access2rohit · 2020-02-07T20:08:07Z

@apeforest can you review this

apeforest · 2020-02-07T21:46:05Z

tests/nightly/JenkinsfileForBinaries

@@ -49,7 +49,7 @@ core_logic: {
          utils.pack_lib('cpu_int64', mx_cmake_lib)
        }
      }
-    },
+    }*/,


I think this is a little confusing. We are actually testing the CPU context on GPU platforms here. The only reason we don't use a CPU node is that the CPU node type does not have as much memory as the GPU node.

We should either replace the CPU node type with a memory-optimized one such as R5 (this is the ideal solution),

or rename the pipeline stage so that it's clear to people that these tests run on CPUs, and remove the commented-out CPU test here.

I can change “GPU: USE_INT64_TENSOR_SIZE” -> “USE_INT64_TENSOR_SIZE” to enable this first.

Changing the instance type is a temporary fix. Theoretically, everything should be able to run on CPU, so fixing MXNet memory management would be the ideal solution, but that would take a lot of time. Switching to R5 would also bump up our costs a bit.

Let me know if "USE_INT64_TENSOR_SIZE" makes sense as an interim solution to enable LT on nightly first.
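
For context on the memory concern raised above, here is a rough back-of-the-envelope sketch in Python. It assumes that a "large tensor" is one whose element count exceeds what a signed 32-bit index can address, which is what the `USE_INT64_TENSOR_SIZE` flag name suggests. The helper names are hypothetical and not part of MXNet or of this PR.

```python
# Rough estimate of the memory needed to hold ONE "large" tensor.
# Assumption (not confirmed in this PR): "large" means more elements than a
# signed 32-bit index can address, i.e. more than 2**31 - 1 elements.

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer

# Size in bytes of one element for a few common dtypes.
BYTES_PER_ELEMENT = {"float16": 2, "float32": 4, "float64": 8}


def min_large_tensor_bytes(dtype: str = "float32") -> int:
    """Bytes needed for the smallest tensor with more than INT32_MAX elements."""
    num_elements = INT32_MAX + 1  # 2**31 elements
    return num_elements * BYTES_PER_ELEMENT[dtype]


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes."""
    return num_bytes / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_ELEMENT:
        gib = to_gib(min_large_tensor_bytes(dtype))
        print(f"{dtype}: at least {gib:.0f} GiB per tensor")
```

Under this assumption a single float32 tensor already needs 8 GiB, and a test that holds inputs, outputs, and temporaries at the same time needs a multiple of that, which is consistent with the tests needing a node type with more memory.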

access2rohit · 2020-02-07T22:09:32Z

@apeforest updated the PR after incorporating your suggestions as discussed offline.

access2rohit · 2020-02-10T00:12:49Z

@mxnet-label-bot update [pr-awaiting-merge]


lanking520 added the pr-awaiting-review label (PR is waiting for code review) on Feb 7, 2020

access2rohit requested a review from apeforest February 7, 2020 20:17

ChaiBapchya approved these changes Feb 7, 2020


apeforest reviewed Feb 7, 2020


access2rohit force-pushed the re-enable_large_tensor branch from 2972ed5 to 1238839 Compare February 7, 2020 22:05

enabling build stage gpu_int64 to enable large tensor nightly runs

3886b96

access2rohit force-pushed the re-enable_large_tensor branch from 1238839 to 3886b96 Compare February 7, 2020 22:08

access2rohit requested a review from apeforest February 10, 2020 00:12

lanking520 added the pr-awaiting-merge label (Review and CI is complete. Ready to Merge) and removed the pr-awaiting-review label (PR is waiting for code review) on Feb 10, 2020

apeforest approved these changes Feb 10, 2020


apeforest merged commit 4a827f3 into apache:master Feb 10, 2020

apeforest mentioned this pull request Feb 12, 2020

[mxnet 2.0] [item 2.4] Turning on large tensor support by default #17331

Open

zheyuye pushed a commit to zheyuye/incubator-mxnet that referenced this pull request Feb 19, 2020

enabling build stage gpu_int64 to enable large tensor nightly runs (a…

d00b049

…pache#17546)

anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 29, 2020

enabling build stage gpu_int64 to enable large tensor nightly runs (a…

44b4157

…pache#17546)
