Add support for 'release-debug' build for ATDM Trilinos builds and use to avoid timeouts #3633
Comments
FYI: I think we should call this new build type …
This will allow many of the ATDM Trilinos 'debug' builds to be switched to 'release-debug' builds and help to avoid a bunch of timeouts that we are dealing with.
…rilinos#3633) I renamed the 'cuda' builds to 'cuda-9.2' builds since that is what they are and that matches the Jenkins drive names. I kept the existing cuda-9.2-debug-Power9-Volta70 build since there are currently no tests timing out in that build and I figured that the CUDA build was most likely the one a developer would want to run with a debugger. But I created a cuda-9.2-release-debug-Power9-Volta70 build so that we can avoid having to disable slow Kokkos, KokkosKernels, and other tests that run super slow with -O0. I changed the gnu-debug-openmp-Power9-Volta70 build to a gnu-release-debug-openmp-Power9-Volta70 build since I don't think it is as important to run this build with a debugger, and the full 'debug' build currently has some tests timing out for Kokkos and KokkosKernels as described in trilinos#3336. If the APP teams tell us they want a full gnu-debug-openmp-Power9-Volta70 build, we will add one back. NOTE: By having both 'debug' and 'release-debug' builds, we are free to disable some slow tests in the 'debug' build and not lose any runtime debug checking since these tests will be running in the 'release-debug' build. So going forward, if a test times out in the 'debug' build but not the 'release-debug' build, then we will just disable it in the 'debug' build and move on.
…rilinos#3633) I kept the existing cuda-9.2-debug-Power9-Volta70 build since there are currently no tests timing out in that build and I figured that the CUDA build was most likely the one a developer would want to run with a debugger. But I created a new cuda-9.2-release-debug-Power9-Volta70 build so that we can avoid having to disable slow Kokkos, KokkosKernels, and other tests that run super slow with -O0. I changed the gnu-debug-openmp-Power9-Volta70 build to a gnu-release-debug-openmp-Power9-Volta70 build since I don't think it is as important to run this build with a debugger, and the full 'debug' build currently has some tests timing out for Kokkos and KokkosKernels as described in trilinos#3336. (The new gnu-release-debug-openmp-Power9-Volta70 build does not have any timeouts.) If the APP teams tell us they want a full gnu-debug-openmp-Power9-Volta70 build, then we will add one back and deal with the timeouts. NOTE: By having both 'debug' and 'release-debug' builds, we are free to disable some slow tests in the full 'debug' build and not lose much runtime debug checking since these tests will be running in the 'release-debug' build (with runtime debug checking enabled). So going forward, if a test times out in the 'debug' build but not the 'release-debug' build, then we will just disable it in the 'debug' build and move on. I also renamed the 'cuda' builds to 'cuda-9.2' builds since that is what they are and that matches the Jenkins drive names.
…terman' cuda-9.2-debug build (trilinos#3336) Now that this test is running and passing in the new build Trilinos-atdm-waterman-cuda-9.2-release-debug (see trilinos#3659 and trilinos#3633), it is fine to disable this test in this full -O0 build.
CC: @fryeguy52, @trilinos/intrepid2, @trilinos/piro, @trilinos/framework
After the merge of #3659 yesterday, the new ... What this shows is that there is some code in Trilinos that behaves differently on some platforms with optimized compiler flags and runtime debug checking turned on compared to builds with non-optimized compiler flags and debug checking disabled. What is interesting is that the GCC 4.8.4 + OpenMPI 1.10.1 + OpenMP build actually uses optimized compiler flags with runtime debug checking enabled, so it is not like we don't have testing for this code path in PR testing. We just don't test all platforms in PR testing (obviously). If I were a Trilinos customer, my default development build would be against a ...
@bartlettroscoe I had a look at the failures and they all seem to be due to tolerances being too tight. I can try to fix this later today or tomorrow.
I was going to switch over more ... Therefore, I am going to close this issue and we will deal with the new failures in other issues.
@mperego, these same failures have been happening in lots of other builds as well, as described in #2474. Let's continue the discussion on the Piro test failure there.
Compute error using l2 norms of arrays instead of element-wise magnitude to avoid issues w/ relative errors for zero entries. This should partially address issues trilinos#3633 and trilinos#2474
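The effect described in this commit message can be illustrated with a minimal Python sketch. This is not the Trilinos test code; the function names and sample values below are hypothetical and chosen only to show why an element-wise relative error is fragile when the reference contains zero entries, while a relative error built from l2 norms of the whole arrays is not.

```python
import math

def elementwise_rel_errors(computed, reference):
    """Relative error per entry: |c - r| / |r| (hypothetical helper)."""
    errs = []
    for c, r in zip(computed, reference):
        diff = abs(c - r)
        if r == 0.0:
            # Any nonzero round-off against a zero reference is "infinitely" wrong.
            errs.append(0.0 if diff == 0.0 else math.inf)
        else:
            errs.append(diff / abs(r))
    return errs

def l2_rel_error(computed, reference):
    """Relative error of the whole array: ||c - r||_2 / ||r||_2.

    Assumes the reference array is not identically zero.
    """
    diff_norm = math.sqrt(sum((c - r) ** 2 for c, r in zip(computed, reference)))
    ref_norm = math.sqrt(sum(r * r for r in reference))
    return diff_norm / ref_norm

# Sample data (made up): the last reference entry is exactly zero and the
# computed value differs from it only by round-off-sized noise.
reference = [1.0, 2.0, 0.0]
computed = [1.0, 2.0, 1e-15]

worst_elementwise = max(elementwise_rel_errors(computed, reference))
whole_array = l2_rel_error(computed, reference)
print(worst_elementwise, whole_array)
```

With a tolerance such as 1e-12, the element-wise check fails on the zero entry even though the arrays agree to round-off, while the l2-based check passes. The trade-off is that a norm-based measure can hide a large relative error in a single small entry.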
…ebug-pt build (trilinos#2464, trilinos#3633) We really need to switch most of these 'debug' builds to 'release-debug' builds (see trilinos#3633). Also, the Trilinos CUDA PR build really needs to be a cuda-9.2-release-debug build since that runs more tests and catches more issues than either a cuda-9.2-opt or cuda-9.2-debug build (see trilinos#3939).
CC: @fryeguy52, @mhoemmen, @rppawlo, @bathmatt, @micahahoward, @trilinos/kokkos, @trilinos/kokkos-kernels
Next Action Status
ATDM Trilinos scripts now support a `release-debug` build type and this has been used in new `release-debug` builds on 'waterman'. Converting `debug` builds to `release-debug` builds on other platforms will be done in follow-on issues ...

Description

Currently, the ATDM Trilinos builds support a `debug` and an `opt` build. The `debug` build uses `CMAKE_BUILD_TYPE=DEBUG` (with `-O0`) and enables runtime debug-mode checking, while the `opt` build uses `CMAKE_BUILD_TYPE=RELEASE` (with `-O3`) and no runtime debug-mode checking. The problem with this approach is that some of the Trilinos tests (especially many of the Kokkos and KokkosKernels tests) run many times slower with `-O0` than with `-O3`. This has caused many tests to time out at 10 minutes in `debug` builds that finish in well under 10 minutes in `opt` builds (e.g. #2964, #2921, #2461).

A solution that we discussed was to change most `debug` builds into `release-debug` builds that will set `CMAKE_BUILD_TYPE=RELEASE` (with `-O3`) but enable runtime debug-mode checking.

Proposed solution

The idea would be to add a new `release-debug` keyword that matches before `opt` or `debug`, which will set `ATDM_CONFIG_BUILD_TYPE=RELEASE_DEBUG`, and then update the file `ATDMDevEnvSettings.cmake` accordingly. That will be easy. The harder part will be updating the tweaks `*.cmake` files and all of the Jenkins jobs to accommodate the name change. NOTE: calling this `release-debug` as opposed to `opt-debug` hopefully might be more clear.

We can still leave some full `debug` builds to help support full GDB debugging by the ATDM APPs teams, but they should be used sparingly (because we are constantly dealing with timeouts with full debug builds).

Tasks:
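As an illustration of the keyword precedence described under "Proposed solution", here is a minimal Python sketch. It is not the ATDM implementation: the `BuildSettings` type, the `parse_build_type` function, and the `DEBUG` and `RELEASE` values in the first column are assumptions made for this example. Only `RELEASE_DEBUG`, the `CMAKE_BUILD_TYPE` values, and the runtime-checking behavior come from the description above.

```python
from typing import NamedTuple

class BuildSettings(NamedTuple):
    atdm_build_type: str        # value for ATDM_CONFIG_BUILD_TYPE (partly assumed)
    cmake_build_type: str       # value for CMAKE_BUILD_TYPE
    runtime_debug_checks: bool  # whether runtime debug-mode checking is enabled

# Order matters: 'release-debug' is tested first because a substring search
# for 'debug' would also match a 'release-debug' build name.
_KEYWORDS = [
    ("release-debug", BuildSettings("RELEASE_DEBUG", "RELEASE", True)),
    ("debug", BuildSettings("DEBUG", "DEBUG", True)),     # first field assumed
    ("opt", BuildSettings("RELEASE", "RELEASE", False)),  # first field assumed
]

def parse_build_type(build_name: str) -> BuildSettings:
    """Return the settings for the first keyword found in the build name."""
    name = build_name.lower()
    for keyword, settings in _KEYWORDS:
        if keyword in name:
            return settings
    raise ValueError(f"no build-type keyword found in {build_name!r}")

for example in (
    "cuda-9.2-release-debug-Power9-Volta70",
    "gnu-debug-openmp-Power9-Volta70",
    "cuda-9.2-opt",
):
    print(example, parse_build_type(example))
```

If `debug` were tested first, every `release-debug` build name would be classified as a full `debug` build, which is why the new keyword has to match before the existing ones.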