
Reactivate dashboard CI #952

Merged: 50 commits merged into Epistimio:develop on Nov 23, 2022

Conversation

notoraptor
Collaborator

Description

Re-add CI tests for dashboard src.

Changes

  • Make sure to run the backend in the repo root, so that it can find the database PKL file (a sketch of such a step follows below)
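
A minimal sketch of what such a workflow step could look like (the step name and the backend command are illustrative, not necessarily the exact ones used in this PR):

  - name: Start Orion backend
    working-directory: .   # run in the repository root so the backend finds the database PKL file
    run: orion serve -c .github/workflows/orion/orion_config.yaml &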

Checklist

Tests

  • I added corresponding tests for bug fixes and new features. If possible, the tests fail without the changes
  • All new and existing tests are passing ($ tox -e py38; replace 38 by your Python version if necessary)

Documentation

  • I have updated the relevant documentation related to my changes

Quality

  • I have read the CONTRIBUTING doc
  • My commit messages follow this format
  • My code follows the style guidelines ($ tox -e lint)

@lebrice
Collaborator

lebrice commented Jun 22, 2022

Is there a way for this to only run when a change has been made to the dashboard?

@notoraptor
Collaborator Author

Hi @lebrice, yes, it should be possible!
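
For example, the workflow trigger could be restricted with a paths filter (a sketch, assuming the dashboard sources live under dashboard/):

  on:
    push:
      paths:
        - "dashboard/**"
    pull_request:
      paths:
        - "dashboard/**"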

@notoraptor notoraptor force-pushed the reactivate-dashboard-ci branch from 999ec1b to 0c034be Compare June 28, 2022 15:45
@notoraptor
Collaborator Author

Hi @bouthilx !

Thanks to your suggestion to use cpulimit, I am now able to reproduce timeout issues on my computer.

After further investigation, my current hypothesis is that the timeout in the frontend is (at least partly) caused by the gunicorn timeout configuration on the backend side, which defaults to 30 seconds: https://docs.gunicorn.org/en/stable/settings.html#timeout .

When running with cpulimit (with a 10% CPU usage limit applied to the worker process), I found the following lines in the backend output, which seem to occur just before the timeout error in the frontend:

[2022-06-28 12:06:12 -0400] [172390] [CRITICAL] WORKER TIMEOUT (pid:172408)
[2022-06-28 12:06:12 -0400] [172408] [INFO] Worker exiting (pid: 172408)
[2022-06-28 12:06:12 -0400] [172390] [WARNING] Worker with pid 172408 was terminated due to signal 9
[2022-06-28 12:06:12 -0400] [172579] [INFO] Booting worker with pid: 172579

The backend terminates the worker due to signal 9 and then boots another worker, and this seems to disconnect the API call.

I don't yet know why this happens, but I decided to just increase the timeout in the backend. To do that, I rebased your branch feature/benchmark_webapi into another branch, feature/benchmark_webapi_rebased, with supplementary commits to make sure the timeout config is taken into account: /~https://github.com/Epistimio/orion/compare/develop...notoraptor:orion:feature/benchmark_webapi_rebased?expand=1 (see the changes in files .github/workflows/orion/orion_config.yaml and src/orion/core/cli/serve.py).
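
For reference, here is a rough sketch of the kind of settings involved; the exact config keys and how serve.py forwards them to gunicorn are in the linked branch, so treat these names as illustrative:

  # Illustrative snippet for the backend config (orion_config.yaml);
  # the real keys are defined in the linked branch.
  gunicorn:
    timeout: 300   # raise the default 30-second worker timeout
    workers: 1     # gunicorn defaults, kept explicit
    threads: 1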

What do you think?

@bouthilx
Member

Thank you for the detailed explanation! I think increasing the timeout on the backend is a good solution. I did experience timeouts on the backend too when I was playing with large benchmarks. We'll need to improve the efficiency of the backend/API later, but for now increasing the timeout is good.

@bouthilx
Member

It's a bit late for me to say this, but you could have removed all tests except test-dashboard-build and made it run 10 times. This would have made it easier to automate the 10 runs and would have saved a lot of compute time. Sorry for saying this only after 7 reruns 😅
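
For example, a single job could be repeated with a matrix (a sketch; the job name and test command are illustrative):

  jobs:
    test-dashboard-build:
      runs-on: ubuntu-latest
      strategy:
        fail-fast: false   # let all 10 runs finish even if one fails
        matrix:
          run: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
      steps:
        - uses: actions/checkout@v3
        - name: Run dashboard build test (attempt ${{ matrix.run }})
          run: ./run-test-dashboard-build.sh   # illustrative test command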

@notoraptor
Collaborator Author

(For the record/reminder) I am finally able to get the orion backend output when errors occur, using GitHub artifacts. I fortunately got an error with the last commit, and the artifact for the corresponding error (with node 14.x) is available at the bottom of this page: /~https://github.com/Epistimio/orion/actions/runs/2593818816
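
The backend log is uploaded as an artifact with a step along these lines (a sketch; the actual step and log path in the workflow may differ):

  - name: Upload backend log
    if: always()   # upload even when an earlier step fails
    uses: actions/upload-artifact@v3
    with:
      name: orion-backend-log
      path: orion-backend.log   # hypothetical log file path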

There is indeed an error reported in the log, and then the worker is restarted:

Exception ignored in: <Finalize object, dead>
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/pool.py", line 729, in _terminate_pool
    p.join()
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/popen_fork.py", line 47, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/gunicorn/workers/base.py", line 203, in handle_abort
    sys.exit(1)
SystemExit: 1
[2022-07-01 01:59:41 +0000] [1862] [WARNING] Worker with pid 1873 was terminated due to signal 9
[2022-07-01 01:59:41 +0000] [2692] [INFO] Booting worker with pid: 2692

Some remarks:

  • The exception seems related to gunicorn.
  • I guess the worker being restarted in the backend is what causes the error in the frontend.
  • The error in the frontend occurs because an expected text was not found in the page after 300 seconds, so it seems the worker lasts for at least 300 seconds in the backend, which matches the 300-second timeout set on the backend. So, is the exception in the backend caused by the timeout, or is it raised for another reason?

@notoraptor notoraptor force-pushed the reactivate-dashboard-ci branch 3 times, most recently from 1d6925f to d289e3e Compare July 12, 2022 17:37
@notoraptor notoraptor force-pushed the reactivate-dashboard-ci branch from d289e3e to e38b9b5 Compare September 13, 2022 03:28
@notoraptor notoraptor force-pushed the reactivate-dashboard-ci branch from 046990c to b29ae87 Compare September 25, 2022 20:54
@notoraptor notoraptor changed the title (WIP) Reactivate dashboard ci Reactivate dashboard CI Sep 25, 2022
Move Axios and HTTP implementations in separate files.
Still use Axios implementations.
…ark_webapi_rebased for orion backend

- [dashboard/src] Get back to Axios only
…ion/runs/7097242640?check_suite_focus=true

Trigger CI again, with backend branch updated to use only 1 worker as in default
CI passed. Trigger again bis.

CI passed. Trigger again (x3)
…e updated here.

- add timeout 300
- set workers to 1 and threads to 1 as it is by default: https://docs.gunicorn.org/en/stable/settings.html
CI passed (2). Retry

CI passed (3). Retry.

CI passed (4). Retry.

CI passed (5). Retry.

CI passed (6). Retry.

CI passed (7). Retry.
Try to kill backend process on end, to make sure log files are saved on disk
@notoraptor notoraptor force-pushed the reactivate-dashboard-ci branch from b29ae87 to e395cdc Compare November 9, 2022 18:38
@notoraptor
Collaborator Author

Rebased

@bouthilx bouthilx left a comment (Member)

LGTM, thanks!! :)

@bouthilx bouthilx merged commit ea7ae76 into Epistimio:develop Nov 23, 2022
@bouthilx bouthilx added the ci label Dec 19, 2022
@notoraptor notoraptor deleted the reactivate-dashboard-ci branch January 19, 2023 17:31
@notoraptor notoraptor mentioned this pull request Mar 2, 2023