ParU: demo intermittently stuck indefinitely #474
We haven't tested ParU on Windows until now, if I recall. This is using OpenMP 4.5 for ParU itself, correct? @Aznaveh: We're making progress on getting the ParU cmake build system updated to the latest SuiteSparse, thanks to @mmuetzel's help. One solution would be to disable OpenMP for ParU itself on Windows entirely, since it might take some time to track this down. ParU would then use parallelism inside the BLAS and LAPACK alone, on Windows. That's not ideal, since ParU is meant to be able to factorize many frontal matrices in parallel, but this would at least be a stable temporary solution in order to get to a first stable release.
Seems to have happened in CI here: The job timed out after 6 hours.
Happened again: /~https://github.com/DrTimothyAldenDavis/SuiteSparse/actions/runs/6736593096/job/18312283038#step:9:4262 So far, this happened once for MINGW32 and three times for MINGW64 (afaict). It didn't happen on CLANG* or MSVC runners. This is still very speculative. But maybe that "pattern" solidifies if we wait a bit longer...
I will try to track it down, early next week or so. ParU has some parallel data structures and I'm guessing we're missing a #pragma omp flush somewhere. Some sort of race condition, anyway. |
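A minimal, hypothetical illustration of the kind of bug suspected here (this is not ParU's code; `flush_handoff`, `data` and `flag` are made-up names): one OpenMP thread publishes a value and raises a flag while another thread spins on that flag. The flushes and atomic accesses roughly follow the flush-based handoff pattern from the OpenMP examples. If the consumer's loop lacked them, the compiler would be free to keep `flag` in a register, and the waiting thread could spin forever, which would look exactly like an intermittent hang.

```c
/* Hypothetical sketch, not ParU code: flag-based handoff between two
 * OpenMP threads. Removing the flush/atomic in the spin loop could let
 * the consumer spin forever on a stale value of `flag`. */

static int data = 0;   /* payload written by the producer */
static int flag = 0;   /* set to 1 once `data` is ready */

int flush_handoff(void)
{
    int result = -1;   /* shared: written by the consumer section */
    data = 0;
    flag = 0;

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {   /* producer */
            data = 42;
            #pragma omp flush            /* make `data` visible before the flag */
            #pragma omp atomic write
            flag = 1;
            #pragma omp flush(flag)      /* push the flag out to memory */
        }
        #pragma omp section
        {   /* consumer */
            int ready = 0;
            while (!ready) {
                #pragma omp flush(flag)  /* force a fresh read of `flag` */
                #pragma omp atomic read
                ready = flag;
            }
            #pragma omp flush            /* now observe the producer's `data` */
            result = data;
        }
    }
    return result;
}
```

Compile with `-fopenmp`; without it the pragmas are ignored and the two sections simply run one after the other, so the function still returns the published value.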
ParU has extensive internal debug code that can print out data structures and status, check assertions, etc. I will turn that on (it takes a code edit, if I recall), and then the log should show me where it's stuck. I'm tied up most of today, though.
I opened #494 so that the runners aren't blocked for the full 6 hours in case this (or something akin) is happening.
I'm working on debugging this now by enabling ParU's debug mode with its extensive printing. I temporarily forced on the GraphBLAS COMPACT mode to speed up the tests. I wonder if the old MSVCRT libraries are thread-safe. ParU calls various C++ library functions from inside individual OpenMP threads, so several threads use those libraries at the same time. If those libraries are not thread-safe, then this will fail.
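If some library called from inside ParU's OpenMP threads turned out not to be thread-safe, one blunt mitigation would be to serialize calls into it. The sketch below assumes a hypothetical routine `legacy_next()` with hidden static state (it stands in for no particular library that ParU actually calls) and wraps each call in a named `critical` section so that concurrent threads cannot lose updates.

```c
/* Hypothetical stand-in for a library routine that is NOT thread-safe:
 * it updates hidden static state without any locking. */
static int legacy_state = 0;

static int legacy_next(void)
{
    int v = legacy_state;      /* unsynchronized read-modify-write */
    legacy_state = v + 1;
    return v;
}

/* Call the unsafe routine n times from an OpenMP loop. The named critical
 * section admits only one thread at a time into legacy_next(), so no
 * increments are lost. Returns the final counter value, which equals n. */
int call_serialized(int n)
{
    legacy_state = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp critical(legacy_lib)
        {
            (void) legacy_next();
        }
    }
    return legacy_state;
}
```

A named critical section only excludes other code using the same name, so every call site of the unsafe library would need the same `critical(legacy_lib)` guard. This also gives up the parallelism inside those calls.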
I don't know if the MSVCRT libraries differ from UCRT when it comes to thread safety. I didn't find anything about this online (but I might have used the wrong search terms). It might also be the compiler (not the C runtime) that makes the difference. On their "Environments" page (link in comment above), they list "Native support for TLS (Thread-local storage)" for the LLVM/Clang compiler. That might mean that GCC does not have native support for TLS (whatever "native" means in these circumstances).
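For context on the TLS point, here is a small, hypothetical C11 example of thread-local storage used from OpenMP threads (`tls_sum` and `tls_partial` are made-up names). Each thread accumulates into its own `_Thread_local` copy, and the copies are combined at the end. Which TLS mechanism backs `_Thread_local` depends on the toolchain; the assumption here, consistent with the emuTLS notes in this thread, is that MinGW GCC uses emulated TLS while Clang uses native TLS. The example only shows what TLS is; it does not reproduce the hang.

```c
/* Each thread gets its own copy of this variable (C11 _Thread_local).
 * Assumption: MinGW GCC lowers this to emulated TLS (emuTLS), Clang to
 * native TLS; the C source is the same either way. */
static _Thread_local long tls_partial = 0;

/* Sum 0 + 1 + ... + (n-1) using per-thread partial sums held in TLS. */
long tls_sum(int n)
{
    long total = 0;                      /* shared across the team */
    #pragma omp parallel
    {
        tls_partial = 0;                 /* reset this thread's copy */
        #pragma omp for
        for (int i = 0; i < n; i++)
            tls_partial += i;            /* no race: one copy per thread */
        #pragma omp atomic
        total += tls_partial;            /* combine per-thread results */
    }
    return total;
}
```

In real code a `reduction(+:total)` clause would be the idiomatic way to compute this sum; TLS is used here only to illustrate the mechanism.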
I asked on the MSYS2 Discord. @mati865 proposed trying to statically link and/or build with Clang in the MINGW* environments of MSYS2 for a test.
GCC with emuTLS on Windows might have trouble with shared linking. See DrTimothyAldenDavis#474.
(Meant for testing DrTimothyAldenDavis#474.)
There haven't been any cases of the CI getting stuck on this issue in a while.
Describe the bug
When running `make demos` for ParU, execution occasionally gets stuck indefinitely in `paru_demo`.
To Reproduce
Build ParU and run `make demos`.
It only happens occasionally, so it might be a threading issue. I'll try to attach a debugger to the process the next time it happens. Maybe that can give a clue where it is stuck.
Expected behavior
The demo executable terminates in a finite time.
Desktop (please complete the following information):