test_eigh_op, test_eigvalsh_op tolerance #39446

Closed
zlsh80826 opened this issue Feb 10, 2022 · 11 comments

@zlsh80826
Collaborator

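  • Title: test_eigh_op, test_eigvalsh_op unit test issue
  • Version and environment information:
       1) PaddlePaddle version: develop
       2) CPU/GPU: GPU, CUDA 11.6
       3) OS: Ubuntu 20.04
       4) Python version: python3.8
  • Reproduction:
ctest -R test_eigh_op
ctest -R test_eigvalsh_op
  • Problem description:
  1. Starting with CUDA 11.6, cuSOLVER improved the performance of Ssyevd, but its results now differ slightly more from NumPy's. The tolerance of test_eigh_op and test_eigvalsh_op needs to be raised above 2e-6.
  2. The complex-valued unit tests require a Hermitian matrix (the imaginary part of every diagonal element must be 0). See the reference documentation, test_eigh_op, and test_eigvalsh_op.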
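The Hermitian requirement for the complex cases can be met by symmetrizing a random complex matrix: (A + A^H) / 2 is Hermitian, and each diagonal entry (a_ii + conj(a_ii)) / 2 has an imaginary part of exactly zero. A minimal NumPy sketch follows; the helper name `random_hermitian`, the uniform [0, 1) entries, and the fixed seed are assumptions for illustration, not the input generation actually used in test_eigh_op.py or test_eigvalsh_op.py.

```python
import numpy as np


def random_hermitian(n, dtype=np.complex64, seed=0):
    """Return a random n x n Hermitian matrix (hypothetical test-input helper)."""
    rng = np.random.default_rng(seed)
    a = (rng.random((n, n)) + 1j * rng.random((n, n))).astype(dtype)
    # (A + A^H) / 2: entry (j, i) is the conjugate of entry (i, j), and each
    # diagonal entry (a_ii + conj(a_ii)) / 2 has an imaginary part of exactly 0.
    return (a + a.conj().T) / 2


h = random_hermitian(5)
print(np.abs(h.diagonal().imag).max())  # 0.0
w = np.linalg.eigvalsh(h)  # real eigenvalues in ascending order
print(w.dtype)  # float32
```

NumPy's `eigh`/`eigvalsh` read only one triangle of the input and, per their documentation, use only the real part of the diagonal, so a non-Hermitian input is effectively replaced by a different Hermitian matrix. Another implementation need not make the same substitution, which is one way a reference and the operator under test can end up disagreeing.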
@paddle-bot-old

Hi! We've received your issue and will arrange for technicians to answer your questions as soon as possible; please be patient. Please make sure you have provided a clear problem description, reproduction code, environment and version information, and error messages. You may also check the official API documentation, FAQ, past GitHub issues, and the AI community for answers. Have a nice day!

@Zjq9409
Contributor

Zjq9409 commented Feb 15, 2022

PR that modifies the eigh operator tests: #39568

zlsh80826 assigned huangjun12 and Zjq9409 and unassigned yaoxuefeng6 and Zjq9409 on Feb 17, 2022
@zlsh80826
Collaborator Author

zlsh80826 commented Feb 17, 2022

Thanks, @Zjq9409. I am running the unit test in our environment to make sure the PR works well.
Hi @huangjun12, test_eigvalsh_op has the same issue as test_eigh_op. Could you help update test_eigvalsh_op.py? Thanks

@zlsh80826
Collaborator Author

@Zjq9409 I have tested #39568, but it seems my earlier conclusion was wrong: the tolerance needs to be even higher. Following is the error message from our environment.
I investigated the cause and found that 2e-6 is sufficient only for the [5, 5] input (the case I investigated before). The [32, 32] input needs a higher tolerance (atol=6e-5, rtol=5e-6).

test_eigh_op failed
 .......F....
======================================================================
FAIL: test_check_output_gpu (test_eigh_op.TestEighGPUCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/paddle/paddle/build/python/paddle/fluid/tests/unittests/test_eigh_op.py", line 75, in test_check_output_gpu
    np.testing.assert_allclose(
  File "/usr/local/lib/python3.8/dist-packages/numpy/testing/_private/utils.py", line 1530, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/usr/local/lib/python3.8/dist-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=2e-06, atol=2e-06

Mismatched elements: 10 / 32 (31.2%)
Max absolute difference: 5.054474e-05
Max relative difference: 4.041603e-06
 x: array([-2.78129 , -2.573316, -2.460475, -2.090341, -1.802978, -1.725366,
       -1.633694, -1.468698, -1.250094, -0.896905, -0.634157, -0.491122,
       -0.47392 , -0.3968  , -0.152721, -0.108726, -0.039511,  0.275858,...
 y: array([-2.781283, -2.573307, -2.460468, -2.090336, -1.802972, -1.72536 ,
       -1.633688, -1.468692, -1.250091, -0.896903, -0.634154, -0.49112 ,
       -0.473918, -0.396798, -0.15272 , -0.108726, -0.039511,  0.275857,...

----------------------------------------------------------------------
Ran 12 tests in 2.927s

FAILED (failures=1)
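The failure follows from the criterion `np.testing.assert_allclose` applies to each element: |actual - desired| <= atol + rtol * |desired|. The sketch below restates that check and evaluates a hypothetical error of 5e-5 on a value of 1.0 (both numbers chosen for illustration, close in size to the maximum absolute difference in the log) against the old pair (atol = rtol = 2e-6) and the pair proposed for the [32, 32] case (atol = 6e-5, rtol = 5e-6). `within_tolerance` is a hypothetical helper, not a NumPy or Paddle API.

```python
import numpy as np


def within_tolerance(actual, desired, rtol, atol):
    """Elementwise form of the check performed by np.testing.assert_allclose.

    An element passes when |actual - desired| <= atol + rtol * |desired|.
    Hypothetical helper for illustration; not part of NumPy or Paddle.
    """
    actual = np.asarray(actual, dtype=np.float64)
    desired = np.asarray(desired, dtype=np.float64)
    return np.abs(actual - desired) <= atol + rtol * np.abs(desired)


desired = np.array([1.0])
actual = desired + 5e-5  # illustrative error, similar in size to the log above

old = within_tolerance(actual, desired, rtol=2e-6, atol=2e-6)  # bound: 4e-6
new = within_tolerance(actual, desired, rtol=5e-6, atol=6e-5)  # bound: 6.5e-5
print(bool(old.all()), bool(new.all()))  # False True

# NumPy's own assertion agrees: this call does not raise.
np.testing.assert_allclose(actual, desired, rtol=5e-6, atol=6e-5)
```

The rule is asymmetric: the relative term scales with |desired| only, so large-magnitude eigenvalues receive more absolute slack, while atol sets the floor for eigenvalues near zero.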

@Zjq9409
Contributor

Zjq9409 commented Feb 21, 2022

The rtol and atol values have been updated.

@huangjun12
Contributor

huangjun12 commented Feb 23, 2022

PR fixing eigvalsh: #39841

@zlsh80826
Collaborator Author

@huangjun12
The atol needs to be 6e-5 and the rtol needs to be 5e-6. The reason is discussed in #39446 (comment).
Reference: #39568
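The size dependence has a standard explanation: a backward-stable symmetric eigensolver in single precision perturbs each eigenvalue by roughly eps * ||A||_2 times a modest factor that can grow with n, and ||A||_2 of a random symmetric matrix also grows with n, so a fixed bound that suffices for [5, 5] can be too tight for [32, 32]. The sketch below observes the effect on the CPU by comparing NumPy's float32 `eigvalsh` with a float64 reference for the same input. It is a stand-in under stated assumptions (uniform [0, 1) entries, a fixed seed, LAPACK instead of cuSOLVER, a float64 reference instead of the test's own), and `eigvalsh_fp32_error` is a hypothetical helper; the printed values depend on the machine and the BLAS/LAPACK build.

```python
import numpy as np


def eigvalsh_fp32_error(n, seed=0):
    """Max |fp32 - fp64| eigenvalue difference for a random symmetric n x n matrix.

    Hypothetical experiment: it uses NumPy's CPU LAPACK, not Paddle or cuSOLVER.
    """
    rng = np.random.default_rng(seed)
    a = rng.random((n, n))
    sym32 = ((a + a.T) / 2).astype(np.float32)  # single-precision symmetric input
    ref = np.linalg.eigvalsh(sym32.astype(np.float64))  # fp64 reference, same input
    got = np.linalg.eigvalsh(sym32)  # fp32 eigenvalues
    err = float(np.max(np.abs(got.astype(np.float64) - ref)))
    return err, ref, got


for n in (5, 32):
    err, ref, got = eigvalsh_fp32_error(n)
    # np.allclose uses the same asymmetric rule as assert_allclose.
    passes = np.allclose(got, ref, rtol=5e-6, atol=6e-5)
    print(f"n={n:2d}  max|eig|={np.max(np.abs(ref)):7.3f}  "
          f"max abs diff={err:.2e}  within new tolerance: {passes}")
```

Comparing against a float64 reference isolates the single-precision error of one solver; when two float32 implementations are compared directly, both sides carry rounding error, so their disagreement can exceed either one's own error.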

@huangjun12
Contributor

Done

zlsh80826 changed the title from "test_eigh_op, test_eigvalsh_op unit test issue" to "test_eigh_op, test_eigvalsh_op unit test tolerance" on Mar 11, 2022
zlsh80826 changed the title from "test_eigh_op, test_eigvalsh_op unit test tolerance" to "test_eigh_op, test_eigvalsh_op tolerance" on Mar 11, 2022
@zlsh80826
Collaborator Author

#40699

@zlsh80826
Collaborator Author

This issue is blocked by #40699

@zlsh80826
Collaborator Author

#40699 has been merged. Closing this issue.
