libstdc++ assertion failures in debug mode #1143

Closed
lahwaacz opened this issue Oct 15, 2022 · 8 comments · Fixed by #1176

@lahwaacz
Contributor

While working on an AUR package, I got this build error:

[8/555] Building CXX object _deps/googletest-build/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
FAILED: _deps/googletest-build/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
/usr/bin/c++ -DGTEST_CREATE_SHARED_LIBRARY=1 -Dgtest_EXPORTS -I/home/klinkovsky/build/builddir/ginkgo-hpc-git/src/build/_deps/googletest-src/googletest/include -I/home/klinkovsky/build/builddir/ginkgo-hpc-git/src/build/_deps/googletest-src/googletest -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS -O3 -DNDEBUG -fPIC -Wall -Wshadow -Werror -Wno-error=dangling-else -DGTEST_HAS_PTHREAD=0 -fexceptions -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -MD -MT _deps/googletest-build/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o -MF _deps/googletest-build/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o.d -o _deps/googletest-build/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o -c /home/klinkovsky/build/builddir/ginkgo-hpc-git/src/build/_deps/googletest-src/googletest/src/gtest-all.cc
In file included from /usr/include/c++/12.2.0/ios:40,
                 from /usr/include/c++/12.2.0/ostream:38,
                 from /home/klinkovsky/build/builddir/ginkgo-hpc-git/src/build/_deps/googletest-src/googletest/include/gtest/gtest.h:58,
                 from /home/klinkovsky/build/builddir/ginkgo-hpc-git/src/build/_deps/googletest-src/googletest/src/gtest-all.cc:38:
In static member function ‘static std::char_traits<char>::char_type* std::char_traits<char>::copy(char_type*, const char_type*, std::size_t)’,
    inlined from ‘static void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy(_CharT*, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12.2.0/bits/basic_string.h:423:21,
    inlined from ‘std::__cxx11::basic_string<_CharT, _Traits, _Allocator>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_replace(size_type, size_type, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12.2.0/bits/basic_string.tcc:532:22,
    inlined from ‘std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::replace(size_type, size_type, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12.2.0/bits/basic_string.h:2171:19,
    inlined from ‘std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::insert(size_type, const _CharT*) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12.2.0/bits/basic_string.h:1928:22,
    inlined from ‘std::__cxx11::basic_string<_CharT, _Traits, _Allocator> std::operator+(const _CharT*, __cxx11::basic_string<_CharT, _Traits, _Allocator>&&) [with _CharT = char; _Traits = char_traits<char>; _Alloc = allocator<char>]’ at /usr/include/c++/12.2.0/bits/basic_string.h:3541:36,
    inlined from ‘static std::string testing::internal::StreamingListener::UrlEncode(const char*)’ at /home/klinkovsky/build/builddir/ginkgo-hpc-git/src/build/_deps/googletest-src/googletest/src/gtest.cc:4882:27:
/usr/include/c++/12.2.0/bits/char_traits.h:431:56: error: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ accessing 9223372036854775810 or more bytes at offsets [2, 9223372036854775807] and 1 may overlap up to 9223372036854775813 bytes at offset -3 [-Werror=restrict]
  431 |         return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n));
      |                                        ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors

This is with the external gtest downloaded by Ginkgo. When I use the system gtest-1.12.1-1 on Arch Linux, the build error is gone, but some of the tests abort at runtime due to assertion failures in libstdc++:

  1/273 Test  #85: omp/test/reorder/rcm_kernels .............................Subprocess aborted***Exception:   4.77 sec
Running main() from /build/gtest/src/googletest-release-1.12.1/googletest/src/gtest_main.cc
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Rcm
[ RUN      ] Rcm.OmpPermutationIsRcmOrdered
/usr/include/c++/12.2.0/bits/stl_vector.h:1123: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = int; _Alloc = gko::ExecutorAllocator<int>; reference = int&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.

        Start 233: test/matrix/matrix_cuda
  2/273 Test #232: test/matrix/matrix_omp ...................................Subprocess aborted***Exception:   4.82 sec
Running main() from /build/gtest/src/googletest-release-1.12.1/googletest/src/gtest_main.cc
[==========] Running 323 tests from 17 test suites.
[----------] Global test environment set-up.
[----------] 19 tests from Matrix/DenseWithDefaultStride, where TypeParam = DenseWithDefaultStride
[ RUN      ] Matrix/DenseWithDefaultStride.SpMVIsEquivalentToRef
/usr/include/c++/12.2.0/bits/uniform_int_dist.h:97: std::uniform_int_distribution<_IntType>::param_type::param_type(_IntType, _IntType) [with _IntType = long int]: Assertion '_M_a <= _M_b' failed.

        Start 213: test/factorization/par_ilut_kernels_cuda
  3/273 Test #210: test/factorization/par_ilu_kernels_omp ...................Subprocess aborted***Exception:   4.88 sec
Running main() from /build/gtest/src/googletest-release-1.12.1/googletest/src/gtest_main.cc
[==========] Running 56 tests from 8 test suites.
[----------] Global test environment set-up.
[----------] 7 tests from ParIlu/<float, int>, where TypeParam = std::tuple<float, int>
[ RUN      ] ParIlu/<float, int>.KernelAddDiagonalElementsSortedEquivalentToRef
[       OK ] ParIlu/<float, int>.KernelAddDiagonalElementsSortedEquivalentToRef (73 ms)
[ RUN      ] ParIlu/<float, int>.KernelAddDiagonalElementsUnsortedEquivalentToRef
/usr/include/c++/12.2.0/bits/uniform_int_dist.h:97: std::uniform_int_distribution<_IntType>::param_type::param_type(_IntType, _IntType) [with _IntType = int]: Assertion '_M_a <= _M_b' failed.

        Start 203: test/factorization/lu_kernels_cuda
  4/273 Test #211: test/factorization/par_ilu_kernels_cuda ..................Subprocess aborted***Exception:   9.61 sec
Running main() from /build/gtest/src/googletest-release-1.12.1/googletest/src/gtest_main.cc
[==========] Running 56 tests from 8 test suites.
[----------] Global test environment set-up.
[----------] 7 tests from ParIlu/<float, int>, where TypeParam = std::tuple<float, int>
[ RUN      ] ParIlu/<float, int>.KernelAddDiagonalElementsSortedEquivalentToRef
[       OK ] ParIlu/<float, int>.KernelAddDiagonalElementsSortedEquivalentToRef (717 ms)
[ RUN      ] ParIlu/<float, int>.KernelAddDiagonalElementsUnsortedEquivalentToRef
/usr/include/c++/12.2.0/bits/uniform_int_dist.h:97: std::uniform_int_distribution<_IntType>::param_type::param_type(_IntType, _IntType) [with _IntType = int]: Assertion '_M_a <= _M_b' failed.

        Start 209: test/factorization/par_ict_kernels_cuda
  5/273 Test #233: test/matrix/matrix_cuda ..................................Subprocess aborted***Exception:   9.19 sec
Running main() from /build/gtest/src/googletest-release-1.12.1/googletest/src/gtest_main.cc
[==========] Running 399 tests from 21 test suites.
[----------] Global test environment set-up.
[----------] 19 tests from Matrix/DenseWithDefaultStride, where TypeParam = DenseWithDefaultStride
[ RUN      ] Matrix/DenseWithDefaultStride.SpMVIsEquivalentToRef
/usr/include/c++/12.2.0/bits/uniform_int_dist.h:97: std::uniform_int_distribution<_IntType>::param_type::param_type(_IntType, _IntType) [with _IntType = long int]: Assertion '_M_a <= _M_b' failed.
@upsj upsj self-assigned this Oct 15, 2022
@upsj
Member

upsj commented Oct 15, 2022

I'll investigate the runtime issues. The build issues we can't do much about, since GTest insists on using -Werror in its flags - something something live at head. Does the gtest package have a workaround in place for this?

@lahwaacz
Contributor Author

Looking at its PKGBUILD, I don't think there is any specific workaround. But it builds 1.12.1 instead of the 1.11.0 that Ginkgo pulls.

@upsj
Member

upsj commented Nov 2, 2022

It seems to me that -D_GLIBCXX_DEBUG=1 somehow ended up in your compiler flags. Any idea how that happened? The uniform_int_distribution assertion failure is valid (the upper bound becomes -1 when the matrix has no columns), and the following change seems to fix it:

diff --git a/core/test/utils/matrix_generator.hpp b/core/test/utils/matrix_generator.hpp
index f4e6e4e26d..93b5166331 100644
--- a/core/test/utils/matrix_generator.hpp
+++ b/core/test/utils/matrix_generator.hpp
@@ -91,7 +91,7 @@ matrix_data<ValueType, IndexType> generate_random_matrix_data(
             size_type(0),
             std::min(static_cast<size_type>(nonzero_dist(engine)), num_cols));
         std::uniform_int_distribution<IndexType> col_dist{
-            0, static_cast<IndexType>(num_cols) - 1};
+            0, std::max(static_cast<IndexType>(num_cols) - 1, IndexType{})};
         if (nnz_in_row > num_cols / 2) {
             present_cols.assign(num_cols, true);
             // remove num_cols - nnz_in_row entries from present_cols
@@ -228,7 +228,7 @@ matrix_data<ValueType, IndexType> generate_random_triangular_matrix_data(
         // randomly generate number of nonzeros in this row
         const auto min_col = lower_triangular ? 0 : row;
         const auto max_col =
-            lower_triangular ? row : static_cast<IndexType>(size) - 1;
+            lower_triangular ? row : std::max(static_cast<IndexType>(size) - 1, IndexType{});
         const auto max_row_nnz = max_col - min_col + 1;
         const auto nnz_in_row = std::max(
             size_type(0), std::min(static_cast<size_type>(nonzero_dist(engine)),
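
To illustrate the idea behind the diff above: std::uniform_int_distribution requires a <= b, and with zero columns the expression num_cols - 1 yields -1 for a signed index type. The following is only a minimal sketch of the clamping; make_col_dist is a hypothetical helper, not a Ginkgo function, and it assumes a signed IndexType of at least int width.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical helper (not part of Ginkgo): builds a column-index
// distribution over [0, num_cols - 1]. The upper bound is clamped to 0 so
// that the precondition a <= b of std::uniform_int_distribution also holds
// when num_cols == 0. Assumes IndexType is signed (int, long, long long).
template <typename IndexType>
std::uniform_int_distribution<IndexType> make_col_dist(std::size_t num_cols)
{
    // Without the std::max, num_cols == 0 would give an upper bound of -1.
    const IndexType upper =
        std::max(static_cast<IndexType>(num_cols) - 1, IndexType{});
    return std::uniform_int_distribution<IndexType>{IndexType{}, upper};
}
```

With this sketch, make_col_dist<int>(0) yields the degenerate range [0, 0] instead of tripping the '_M_a <= _M_b' assertion; a caller would still have to avoid using the sampled index when there are no columns.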

@upsj
Member

upsj commented Nov 2, 2022

The RCM issue: @lksriemer how does this diff look?

diff --git a/omp/reorder/rcm_kernels.cpp b/omp/reorder/rcm_kernels.cpp
index 920b23d627..38d855fccc 100644
--- a/omp/reorder/rcm_kernels.cpp
+++ b/omp/reorder/rcm_kernels.cpp
@@ -715,7 +715,7 @@ void write_permutation(std::shared_ptr<const OmpExecutor> exec,
 
                 // Sort neighbours. Can not be more than there are nodes.
                 const IndexType size = valid_neighbours.size();
-                sort_small(&valid_neighbours[0], size,
+                sort_small(valid_neighbours.data(), size,
                            [&](IndexType l, IndexType r) {
                                return degrees[l] < degrees[r];
                            });

@lksriemer
Contributor

@upsj That diff looks good: .data() is well-defined for an empty vector, whereas &v[0] is not. However, I doubt we actually hit full-on UB from this in practice, though the compiler would technically be allowed to treat it as such.
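
The difference between the two spellings can be sketched as follows. This is an illustration only: sum_range and sum_vector are hypothetical names standing in for a pointer-plus-length routine such as sort_small, not Ginkgo code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a routine that takes a raw pointer and a length.
// It never dereferences ptr when size == 0.
template <typename T>
T sum_range(const T* ptr, std::size_t size)
{
    T result{};
    for (std::size_t i = 0; i < size; ++i) {
        result += ptr[i];
    }
    return result;
}

// v.data() may be called on an empty vector; the returned pointer is just
// not dereferenceable. Writing &v[0] instead would evaluate operator[] with
// an out-of-range index, which is what the '__n < this->size()' assertion
// in libstdc++ catches.
template <typename T>
T sum_vector(const std::vector<T>& v)
{
    return sum_range(v.data(), v.size());
}
```

Calling sum_vector on an empty std::vector<int> returns 0 without touching any element, with or without _GLIBCXX_ASSERTIONS defined.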

@lahwaacz
Contributor Author

lahwaacz commented Nov 2, 2022

@upsj Oh, Arch defines _GLIBCXX_ASSERTIONS by default for all packages: https://github.com/archlinux/svntogit-packages/blob/packages/pacman/trunk/makepkg.conf#L44
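
Since the thread mentions both _GLIBCXX_DEBUG and _GLIBCXX_ASSERTIONS, a small check like the one below can help tell which of the two a given translation unit was built with. It is a sketch only; libstdcxx_check_mode is a hypothetical helper name, and it relies on the documented libstdc++ behaviour that defining _GLIBCXX_DEBUG also enables the assertions.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports which libstdc++ checking macro was defined
// when this translation unit was compiled.
// - _GLIBCXX_DEBUG selects the full debug mode (and implies the assertions).
// - _GLIBCXX_ASSERTIONS only enables lightweight precondition checks such
//   as the bounds check in std::vector::operator[].
inline std::string libstdcxx_check_mode()
{
#if defined(_GLIBCXX_DEBUG)
    return "debug";
#elif defined(_GLIBCXX_ASSERTIONS)
    return "assertions";
#else
    return "none";
#endif
}
```

Built with Arch's default flags (-Wp,-D_GLIBCXX_ASSERTIONS) and without -D_GLIBCXX_DEBUG, this would be expected to report "assertions".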

@upsj upsj changed the title from "Fetched gtest does not build with GCC 12" to "libstdc++ assertion failures in debug mode" Nov 3, 2022
@upsj
Member

upsj commented Apr 4, 2024

An update on this: we will not be able to support _GLIBCXX_DEBUG with CUDA and HIP in the short term, because that requires significant changes to our internals (mixing flags leads to ABI incompatibilities if you try to link the libraries together), but a fix for the CPU side will be merged today.

@upsj upsj closed this as completed in #1176 Apr 4, 2024
@lahwaacz
Contributor Author

FWIW, building Ginkgo with -Wp,-D_GLIBCXX_ASSERTIONS in CMAKE_CXX_FLAGS but not CMAKE_HIP_FLAGS seems to work. Putting it in CMAKE_HIP_FLAGS results in compiler errors due to calling some host-only assertion functions from host-device functions.
