Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[v1.x] add large matrix tests for linalg ops: det, inverse, trsm, trmm #18744

Merged
merged 14 commits into from
Jul 27, 2020

Conversation

mseth10
Copy link
Contributor

@mseth10 mseth10 commented Jul 17, 2020

Description

In this PR, we add large matrix (Input Size > 2^32 -1) tests for the following linear algebra ops: det, inverse, trsm, trmm
We test for shape and numerical correctness of forward and backward passes, with normal and batched inputs.

Here's the pytest output:

>>> pytest tests/nightly/test_large_array.py::test_linalg
================================================================================= test session starts =================================================================================
platform linux -- Python 3.7.7, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /home/ubuntu/incubator-mxnet
plugins: remotedata-0.3.2, openfiles-0.4.0, astropy-header-0.1.2, hypothesis-5.8.3, arraydiff-0.3, doctestplus-0.5.0
collected 1 item                                                                                                                                                                      

tests/nightly/test_large_array.py .                                                                                                                                             [100%]

================================================================================== warnings summary ===================================================================================
python/mxnet/contrib/onnx/mx2onnx/_op_translations.py:67
  /home/ubuntu/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/_op_translations.py:67: DeprecationWarning: invalid escape sequence \(
    tuple_re = re.compile('\([0-9L|,| ]+\)')

/home/ubuntu/anaconda3/lib/python3.7/site-packages/nose/importer.py:12
  /home/ubuntu/anaconda3/lib/python3.7/site-packages/nose/importer.py:12: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    from imp import find_module, load_module, acquire_lock, release_lock

-- Docs: https://docs.pytest.org/en/latest/warnings.html
===================================================================== 1 passed, 2 warnings in 8859.56s (2:27:39) ======================================================================

@mxnet-bot
Copy link

Hey @mseth10 , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [windows-gpu, centos-cpu, sanity, unix-cpu, miscellaneous, edge, website, windows-cpu, centos-gpu, unix-gpu, clang]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@leezu
Copy link
Contributor

leezu commented Jul 17, 2020

The nightly large tensor tests are already broken: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/master/752/pipeline/96

Adding more tests may not help. Should they be fixed and enabled as part of the PR runs? As this will become a default feature, I don't think testing nightly is sufficient (especially as the test only runs 30min).

Copy link
Contributor

@ChaiBapchya ChaiBapchya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ChaiBapchya
Copy link
Contributor

1.x edge pipeline will be fixed in this cherrypick #18742
It includes #18713 that solves the cuda issue in nvidia jetson.
Once my PR is merged, pl rebase.

Ofcourse, we have to decide if LT tests are to be included in per CI test alongside nightly.

@mseth10 mseth10 force-pushed the add-lts-nd-tests branch from aaa6120 to fce478f Compare July 23, 2020 10:14
@mseth10
Copy link
Contributor Author

mseth10 commented Jul 23, 2020

@access2rohit please review

@access2rohit
Copy link
Contributor

The nightly large tensor tests are already broken: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/master/752/pipeline/96

Adding more tests may not help. Should they be fixed and enabled as part of the PR runs? As this will become a default feature, I don't think testing nightly is sufficient (especially as the test only runs 30min).

@leezu This is currently for MXnet1.x branch and not master. For master more tests will be added. In the meantime since we are removing NDarray's completely it makes sense to disabled them from nightly.

@mseth10 mseth10 changed the title add linalg large matrix tests [v1.x] add linalg large matrix tests Jul 23, 2020
@mseth10 mseth10 requested a review from szha as a code owner July 23, 2020 22:37
@mseth10 mseth10 force-pushed the add-lts-nd-tests branch from b56fad4 to 7dc2004 Compare July 23, 2020 22:54
@mseth10
Copy link
Contributor Author

mseth10 commented Jul 24, 2020

@access2rohit made the changes, can you please approve

@ChaiBapchya
Copy link
Contributor

ChaiBapchya commented Jul 24, 2020

Also I don't see a out.backward() invocation. How are the gradients been tested? @mseth10

@mseth10 mseth10 force-pushed the add-lts-nd-tests branch from b76a34a to 303e5ab Compare July 24, 2020 21:46
@mseth10 mseth10 force-pushed the add-lts-nd-tests branch from 7ec9749 to 889bfd2 Compare July 25, 2020 00:39
@mseth10
Copy link
Contributor Author

mseth10 commented Jul 25, 2020

All unit tests pass. Here are the results using pytest:

>>> pytest tests/nightly/test_large_array.py::test_linalg
================================================================================= test session starts =================================================================================
platform linux -- Python 3.7.7, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /home/ubuntu/incubator-mxnet
plugins: remotedata-0.3.2, openfiles-0.4.0, astropy-header-0.1.2, hypothesis-5.8.3, arraydiff-0.3, doctestplus-0.5.0
collected 1 item                                                                                                                                                                      

tests/nightly/test_large_array.py .                                                                                                                                             [100%]

================================================================================== warnings summary ===================================================================================
python/mxnet/contrib/onnx/mx2onnx/_op_translations.py:67
  /home/ubuntu/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/_op_translations.py:67: DeprecationWarning: invalid escape sequence \(
    tuple_re = re.compile('\([0-9L|,| ]+\)')

/home/ubuntu/anaconda3/lib/python3.7/site-packages/nose/importer.py:12
  /home/ubuntu/anaconda3/lib/python3.7/site-packages/nose/importer.py:12: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    from imp import find_module, load_module, acquire_lock, release_lock

-- Docs: https://docs.pytest.org/en/latest/warnings.html
===================================================================== 1 passed, 2 warnings in 8859.56s (2:27:39) ======================================================================

@mseth10
Copy link
Contributor Author

mseth10 commented Jul 25, 2020

Also I don't see a out.backward() invocation. How are the gradients been tested? @mseth10

Added backward invocation @ChaiBapchya

@access2rohit
Copy link
Contributor

access2rohit commented Jul 25, 2020

I see missing shape checks, can you add them? ... rest LGTM!

@Zha0q1
Copy link
Contributor

Zha0q1 commented Jul 26, 2020

LGTM thanks Manu for the tweaks

@lanking520 lanking520 added the pr-awaiting-review PR is waiting for code review label Jul 26, 2020
@mseth10 mseth10 changed the title [v1.x] add linalg large matrix tests [v1.x] add large matrix tests for linalg ops: det, inverse, trsm, trmm Jul 26, 2020
@mseth10
Copy link
Contributor Author

mseth10 commented Jul 26, 2020

I see missing shape checks, can you add them? ... rest LGTM!

Added shape checks. Can you please approve? @access2rohit

@access2rohit
Copy link
Contributor

access2rohit commented Jul 26, 2020

@mseth10 Did you run the tests again after making these changes ? Did they all pass ?

@mseth10
Copy link
Contributor Author

mseth10 commented Jul 26, 2020

@mseth10 Did you run the tests again after making these changes ? Did they all pass ?

Yes @access2rohit

@mseth10
Copy link
Contributor Author

mseth10 commented Jul 26, 2020

@mxnet-bot run ci [unix-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-gpu]

Copy link
Contributor

@access2rohit access2rohit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean code and Tests run successfully.... LGTM!

@mseth10
Copy link
Contributor Author

mseth10 commented Jul 26, 2020

@mxnet-bot run ci [unix-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-gpu]

@sandeep-krishnamurthy sandeep-krishnamurthy merged commit 566d9d3 into apache:v1.x Jul 27, 2020
@mseth10 mseth10 deleted the add-lts-nd-tests branch July 27, 2020 17:05
ChaiBapchya pushed a commit to ChaiBapchya/mxnet that referenced this pull request Aug 15, 2020
apache#18744)

* add linalg large matrix tests

* add batch inputs linalg tests

* reducing bsize to 1 to save time

* move matrix generator to utils

* passing mat size as arg

* import util fn

* fix sanity

* add mx

* call backward

* merge fn

* update grad value

* refactor tests

* add mx

* add shape check

Co-authored-by: Ubuntu <ubuntu@ip-172-31-41-26.us-west-2.compute.internal>
@mseth10 mseth10 removed the pr-awaiting-review PR is waiting for code review label Oct 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants