Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cygwin: signal: Do not handle signal when __SIGFLUSHFAST is sent #251

Merged
merged 1 commit into from
Dec 25, 2024

Conversation

dscho
Copy link
Collaborator

@dscho dscho commented Dec 25, 2024

After the commit d243e51, zsh sometimes hangs at startup. This occurs because SIGCHLD, which should trigger sigsuspend(), is handled in cygwait() that is used to wait for a wakeup event in sig_send(), even when __SIGFLUSHFAST is sent. Despite __SIGFLUSHFAST being required to return before handling the signal, this does not happen. With this patch, if the signal currently being sent is __SIGFLUSHFAST, do not handle the received signal and keep it asserted after the cygwait() for the wakeup event. Apply the same logic to the cygwait() in the retrying loop for WriteFile() as well.

Applied-from: https://inbox.sourceware.org/cygwin-patches/20241223013332.1269-1-takashi.yano@nifty.ne.jp Addresses: https://cygwin.com/pipermail/cygwin/2024-December/256954.html Fixes: d243e51 ("Cygwin: signal: Fix deadlock between main thread and sig thread")
Reported-by: Daisuke Fujimura booleanlabel@gmail.com
Reviewed-by:

@dscho dscho requested a review from jeremyd2019 December 25, 2024 18:27
@dscho dscho self-assigned this Dec 25, 2024
@dscho dscho mentioned this pull request Dec 25, 2024
Copy link
Member

@jeremyd2019 jeremyd2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This member was renamed on master but not in 3.5 branch

winsup/cygwin/sigproc.cc Outdated Show resolved Hide resolved
winsup/cygwin/sigproc.cc Outdated Show resolved Hide resolved
After the commit d243e51, zsh sometimes hangs at startup. This
occurs because SIGCHLD, which should trigger sigsuspend(), is handled
in cygwait() that is used to wait for a wakeup event in sig_send(),
even when __SIGFLUSHFAST is sent. Despite __SIGFLUSHFAST being
required to return before handling the signal, this does not happen.
With this patch, if the signal currently being sent is __SIGFLUSHFAST,
do not handle the received signal and keep it asserted after the
cygwait() for the wakeup event.  Apply the same logic to the cygwait()
in the retrying loop for WriteFile() as well.

Applied-from: https://inbox.sourceware.org/cygwin-patches/20241223013332.1269-1-takashi.yano@nifty.ne.jp
Addresses: https://cygwin.com/pipermail/cygwin/2024-December/256954.html
Fixes: d243e51 ("Cygwin: signal: Fix deadlock between main thread and sig thread")
Reported-by: Daisuke Fujimura <booleanlabel@gmail.com>
Reviewed-by:
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho force-pushed the special-case-sigflushfast branch from 6326293 to f1affcc Compare December 25, 2024 20:36
@dscho
Copy link
Collaborator Author

dscho commented Dec 25, 2024

This member was renamed on master but not in 3.5 branch

Thanks @jeremyd2019! The build is now successful.

@jeremyd2019
Copy link
Member

I tested the msys-2.0.dll from the CI artifact on the arm64 autobuild, rebuilding imagemagick which hung 3 times in a row with 3.5.5-1, and it succeeded.

@jeremyd2019 jeremyd2019 merged commit 33799c2 into msys2:msys2-3.5.5 Dec 25, 2024
1 check passed
@jeremyd2019
Copy link
Member

@dscho dscho deleted the special-case-sigflushfast branch December 26, 2024 08:27
@dscho
Copy link
Collaborator Author

dscho commented Dec 26, 2024

Hrm. This is apparently still not good enough. I noticed that Git's test suite would time out after 6h with MSYS2 runtime v3.5.5, opened git-for-windows/msys2-runtime#80 to apply this here fix, and then triggered another Git test suite run. It looks like it is hanging again (albeit in less matrix jobs, which is good).

lazka added a commit to lazka/MSYS2-packages that referenced this pull request Dec 26, 2024
lazka added a commit to msys2/MSYS2-packages that referenced this pull request Dec 26, 2024
@jeremyd2019
Copy link
Member

Have you reported this to upstream Cygwin/Takashi?

@dscho
Copy link
Collaborator Author

dscho commented Dec 26, 2024

Have you reported this to upstream Cygwin/Takashi?

@jeremyd2019 No, I didn't have time to investigate yet... so I do not even have a reproducer.

@dscho
Copy link
Collaborator Author

dscho commented Dec 26, 2024

Have you reported this to upstream Cygwin/Takashi?

@jeremyd2019 No, I didn't have time to investigate yet... so I do not even have a reproducer.

I spent quite a while on trying to reproduce, and unsurprisingly it is hard. It is almost certainly a race condition, indicating that some sort of locking is needed and just isn't there.

I'll just jot down some notes to be able to jog my memory later:

The test script in Git's test suite that I zoomed in on is t7814-grep-recurse-submodules (because it should have completed as part of /~https://github.com/dscho/git-for-windows-automation/actions/runs/12502682476/job/34882008623 but has not).

I actually ran make -j15 GIT_PROVE_OPTS=-j15 GIT_TEST_OPTS=-iVx T='t7602-merge-octopus-many.sh t7004-tag.sh t3425-rebase-topology-merges.sh t3600-rm.sh t7814-grep-recurse-submodules.sh' prove with the still-not-quite-fixed MSYS2 runtime, first thing, and t7602 hung in its 18th test case, "grep using relative path".

Apparently this was "lucky" because re-running that test script individually did not cause a hang.

Then, I ran sh t7814-grep-recurse-submodules.sh -iVx --run=1,18 --stress-jobs=20 --stress-limit=20 (the --stress-* options are designed to catch test failures that are hard to reproduce and require either a high CPU load or resource depletion to reproduce more reliably). It was a bit hard to see from its output that after 5 full rounds of 20 concurrent runs, one of them did not actually finish. And a couple more rounds later (must have been somewhere between 2-5 more rounds), a second job got stuck, but even after 15 more rounds no other job got stuck.

Now, this is where it gets interesting: both of those jobs got stuck in the first "test case" (Git's test suite uses a home-grown test framework that lacks modern test frameworks' features like "setup" and "teardown" functions, therefore the first test case frequently serves as a common "setup" step and it is quite normal, unfortunately so, for the test cases to depend on side effects of previous test cases; As you can tell, I am absolutely not a fan of the way Git's test suite is architected), when running git submodule add ./submodule, which is very similar to the git -C parent submodule add ../sub call that had originally hung in the 18th test case of that test script.

The git submodule command is still implemented as a shell script, therefore it is a frequent guest in any season of the show called "MSYS2 runtime regression detected via Git".

When I had Ctrl+C'ed the hung test script and there were still a couple of left-over processes blocking the working directory of that test from being deleted, I had actually tried to make some sense of the process topology when that git -C parent submodule add ../sub command hung. There were the following processes:

  • winpid 34996 (parent no longer existed): D:\git-sdk-64-minimal\usr\bin\sh.exe /d/git-sdk-64/usr/src/git/bin-wrappers/git -C parent submodule add ../sub
  • winpid 32256 (parent 34996): D:\git-sdk-64\usr\src\git\git.exe -C parent submodule add ../sub
  • winpid 34620 (parent 32256): sh "D:/git-sdk-64/usr/src/git\\git-submodule" add ../sub
  • winpid 14252 (parent no longer existed): D:\git-sdk-64-minimal\usr\bin\sh.exe t7814-grep-recurse-submodules.sh -iVx
  • winpid 32904 (parent no longer existed): D:\git-sdk-64-minimal\usr\bin\sh.exe t7814-grep-recurse-submodules.sh -iVx
  • winpid 6668 (parent 32904): D:\git-sdk-64-minimal\usr\bin\sh.exe t7814-grep-recurse-submodules.sh -iVx

I had to stop, manually (via wmic process ... delete) both 14252 and 34620 (but not 6668 nor any of the listed parent processes) for the worktree to be unblocked from being deleted.

This is an unsatisfying place to take a break from my investigation, but it'll have to do for now.

Potential next steps:

  • do some set -x style debugging
  • try to reproduce with a single script that simply adds a submodule, then removes it, then repeats, over and over again.
  • try to bisect when this started (although historically, it is my experience that it is unlikely to help because the commits I found typically had commit messages that left me as puzzled as before, and when I asked for clarification from the author, I got even less clear explanations). But at least it might be possible to revert those patches for now specifically for Git for Windows because we're super late in the v2.48.0 release cycle and I had actually intended to take some time off from work during my vacation.

@jeremyd2019
Copy link
Member

The git submodule command is still implemented as a shell script, therefore it is a frequent guest in any season of the show called "MSYS2 runtime regression detected via Git".

When I had Ctrl+C'ed the hung test script and there were still a couple of left-over processes blocking the working directory of that test from being deleted, I had actually tried to make some sense of the process topology when that git -C parent submodule add ../sub command hung. There were the following processes:

  • winpid 34996 (parent no longer existed): D:\git-sdk-64-minimal\usr\bin\sh.exe /d/git-sdk-64/usr/src/git/bin-wrappers/git -C parent submodule add ../sub
  • winpid 32256 (parent 34996): D:\git-sdk-64\usr\src\git\git.exe -C parent submodule add ../sub
  • winpid 34620 (parent 32256): sh "D:/git-sdk-64/usr/src/git\\git-submodule" add ../sub
  • winpid 14252 (parent no longer existed): D:\git-sdk-64-minimal\usr\bin\sh.exe t7814-grep-recurse-submodules.sh -iVx
  • winpid 32904 (parent no longer existed): D:\git-sdk-64-minimal\usr\bin\sh.exe t7814-grep-recurse-submodules.sh -iVx
  • winpid 6668 (parent 32904): D:\git-sdk-64-minimal\usr\bin\sh.exe t7814-grep-recurse-submodules.sh -iVx

I had to stop, manually (via wmic process ... delete) both 14252 and 34620 (but not 6668 nor any of the listed parent processes) for the worktree to be unblocked from being deleted.

This is odd, so far the issues that this patch helped could be killed with SIGTERM within Cygwin.

@lazka
Copy link
Member

lazka commented Dec 27, 2024

There were two stuck jobs here right now: /~https://github.com/msys2/msys2-autobuild/actions/runs/12514784401/job/34911395541

@dscho
Copy link
Collaborator Author

dscho commented Dec 29, 2024

This is odd, so far the issues that this patch helped could be killed with SIGTERM within Cygwin.

To be honest, I did not even try the kill command, as it has failed me so often in the past.

@dscho
Copy link
Collaborator Author

dscho commented Dec 30, 2024

There were two stuck jobs here right now: /~https://github.com/msys2/msys2-autobuild/actions/runs/12514784401/job/34911395541

I tried to figure out what command it would be stuck in, but failed (I saw that PROGS is defined in fftools/Makefile, but the STRIP command that was successfully executed is defined in a Makefile rule of the top-level Makefile).

@lazka
Copy link
Member

lazka commented Dec 31, 2024

@jeremyd2019
Copy link
Member

@lazka
Copy link
Member

lazka commented Jan 4, 2025

I had a hang locally right now building gimp2. pstree shows this:

?──mintty───bash───bash───bash───make───make───sh───make───make───sh─┬─grep
                                                                     ├─sed
                                                                     └─sh───sh─┬─sed
                                                                               └─sh───sh

stack traces of sed and grep:

sh (parent of grep and sed1)
Thread 5 (Thread 17492.0x1438 "waitproc"):
#0  0x00007fffbb87f7e4 in ntdll!ZwReadFile () from /c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00007fffb8ded16d in ReadFile () from /c/WINDOWS/System32/KERNELBASE.dll
#2  0x00000001800a1c14 in proc_waiter (arg=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/pinfo.cc:1268
#3  0x0000000180044e20 in cygthread::callfunc (this=this@entry=0x1802465e8 <threads+264>, issimplestub=issimplestub@entry=false) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygthread.cc:48
#4  0x00000001800453b9 in cygthread::stub (arg=arg@entry=0x1802465e8 <threads+264>) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygthread.cc:91
#5  0x0000000180045fc3 in _cygtls::call2 (this=0x3a4ce00, func=0x180045350 <cygthread::stub(void*)>, arg=0x1802465e8 <threads+264>, buf=buf@entry=0x3a4cd50) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:41
#6  0x0000000180046074 in _cygtls::call (func=<optimized out>, arg=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:28
#7  0x00007fffba9fe8d7 in KERNEL32!BaseThreadInitThunk () from /c/WINDOWS/System32/KERNEL32.DLL
#8  0x00007fffbb7ffbcc in ntdll!RtlUserThreadStart () from /c/WINDOWS/SYSTEM32/ntdll.dll
#9  0x0000000000000000 in ?? ()

Thread 4 (Thread 17492.0x2fb8 "waitproc"):
#0  0x00007fffbb87f7e4 in ntdll!ZwReadFile () from /c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00007fffb8ded16d in ReadFile () from /c/WINDOWS/System32/KERNELBASE.dll
#2  0x00000001800a1c14 in proc_waiter (arg=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/pinfo.cc:1268
#3  0x0000000180044e20 in cygthread::callfunc (this=this@entry=0x180246590 <threads+176>, issimplestub=issimplestub@entry=false) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygthread.cc:48
#4  0x00000001800453b9 in cygthread::stub (arg=arg@entry=0x180246590 <threads+176>) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygthread.cc:91
#5  0x0000000180045fc3 in _cygtls::call2 (this=0x384ce00, func=0x180045350 <cygthread::stub(void*)>, arg=0x180246590 <threads+176>, buf=buf@entry=0x384cd50) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:41
#6  0x0000000180046074 in _cygtls::call (func=<optimized out>, arg=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:28
#7  0x00007fffba9fe8d7 in KERNEL32!BaseThreadInitThunk () from /c/WINDOWS/System32/KERNEL32.DLL
#8  0x00007fffbb7ffbcc in ntdll!RtlUserThreadStart () from /c/WINDOWS/SYSTEM32/ntdll.dll
#9  0x0000000000000000 in ?? ()

Thread 3 (Thread 17492.0x4474 "waitproc"):
#0  0x00007fffbb87f7e4 in ntdll!ZwReadFile () from /c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00007fffb8ded16d in ReadFile () from /c/WINDOWS/System32/KERNELBASE.dll
#2  0x00000001800a1c14 in proc_waiter (arg=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/pinfo.cc:1268
#3  0x0000000180044e20 in cygthread::callfunc (this=this@entry=0x180246538 <threads+88>, issimplestub=issimplestub@entry=false) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygthread.cc:48
#4  0x00000001800453b9 in cygthread::stub (arg=arg@entry=0x180246538 <threads+88>) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygthread.cc:91
#5  0x0000000180045fc3 in _cygtls::call2 (this=0x364ce00, func=0x180045350 <cygthread::stub(void*)>, arg=0x180246538 <threads+88>, buf=buf@entry=0x364cd50) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:41
#6  0x0000000180046074 in _cygtls::call (func=<optimized out>, arg=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:28
--Type <RET> for more, q to quit, c to continue without paging--
#7  0x00007fffba9fe8d7 in KERNEL32!BaseThreadInitThunk () from /c/WINDOWS/System32/KERNEL32.DLL
#8  0x00007fffbb7ffbcc in ntdll!RtlUserThreadStart () from /c/WINDOWS/SYSTEM32/ntdll.dll
#9  0x0000000000000000 in ?? ()

Thread 2 (Thread 17492.0x274c "sig"):
#0  0x00007fffbb87f7e4 in ntdll!ZwReadFile () from /c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00007fffb8ded16d in ReadFile () from /c/WINDOWS/System32/KERNELBASE.dll
#2  0x00000001800c51f7 in wait_sig () at /c/S/B/src/msys2-runtime/winsup/cygwin/sigproc.cc:1378
#3  0x0000000180044e20 in cygthread::callfunc (this=this@entry=0x1802464e0 <threads>, issimplestub=issimplestub@entry=false) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygthread.cc:48
#4  0x00000001800453b9 in cygthread::stub (arg=arg@entry=0x1802464e0 <threads>) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygthread.cc:91
#5  0x0000000180045fc3 in _cygtls::call2 (this=0x291ce00, func=0x180045350 <cygthread::stub(void*)>, arg=0x1802464e0 <threads>, buf=buf@entry=0x291cd50) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:41
#6  0x0000000180046074 in _cygtls::call (func=<optimized out>, arg=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:28
#7  0x00007fffba9fe8d7 in KERNEL32!BaseThreadInitThunk () from /c/WINDOWS/System32/KERNEL32.DLL
#8  0x00007fffbb7ffbcc in ntdll!RtlUserThreadStart () from /c/WINDOWS/SYSTEM32/ntdll.dll
#9  0x0000000000000000 in ?? ()

Thread 1 (Thread 17492.0x1508 "sh"):
#0  0x00007fffbb880274 in ntdll!ZwWaitForMultipleObjects () from /c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00007fffb8dedc23 in WaitForMultipleObjectsEx () from /c/WINDOWS/System32/KERNELBASE.dll
#2  0x00007fffb8dedaf1 in WaitForMultipleObjects () from /c/WINDOWS/System32/KERNELBASE.dll
#3  0x00000001800461de in cygwait (object=<optimized out>, timeout=timeout@entry=0x0, mask=mask@entry=3) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygwait.cc:75
#4  0x00000001800f0cb7 in wait4 (intpid=-1, status=<optimized out>, options=0, r=0x0) at /c/S/B/src/msys2-runtime/winsup/cygwin/wait.cc:80
#5  0x000000018019a33b in _sigfe () at sigfe.s:36
sed 1
Thread 2 (Thread 18032.0x41e0 "sig"):
#0  0x00007fffbb87f7e4 in ntdll!ZwReadFile () from /c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00007fffb8ded16d in ReadFile () from /c/WINDOWS/System32/KERNELBASE.dll
#2  0x00000001800c51f7 in wait_sig () at /c/S/B/src/msys2-runtime/winsup/cygwin/sigproc.cc:1378
#3  0x0000000180044e20 in cygthread::callfunc (this=this@entry=0x1802464e0 <threads>, issimplestub=issimplestub@entry=false) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygthread.cc:48
#4  0x00000001800453b9 in cygthread::stub (arg=arg@entry=0x1802464e0 <threads>) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygthread.cc:91
#5  0x0000000180045fc3 in _cygtls::call2 (this=0x115ce00, func=0x180045350 <cygthread::stub(void*)>, arg=0x1802464e0 <threads>, buf=buf@entry=0x115cd50) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:41
#6  0x0000000180046074 in _cygtls::call (func=<optimized out>, arg=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:28
#7  0x00007fffba9fe8d7 in KERNEL32!BaseThreadInitThunk () from /c/WINDOWS/System32/KERNEL32.DLL
#8  0x00007fffbb7ffbcc in ntdll!RtlUserThreadStart () from /c/WINDOWS/SYSTEM32/ntdll.dll
#9  0x0000000000000000 in ?? ()

Thread 1 (Thread 18032.0x1d7c "sed"):
--Type <RET> for more, q to quit, c to continue without paging--
#0  0x00007fffbb87f944 in ntdll!ZwQueryInformationFile () from /c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x000000018012bac4 in fhandler_pipe::raw_read (this=0x800021240, ptr=0xa000023a0, len=@0x7ffffc800: 65536) at /c/S/B/src/msys2-runtime/winsup/cygwin/fhandler/pipe.cc:342
#2  0x00000001800ff173 in fhandler_base::read (this=0x800021240, in_ptr=0xa000023a0, len=@0x7ffffc800: 65536) at /c/S/B/src/msys2-runtime/winsup/cygwin/fhandler/base.cc:839
#3  0x00000001800d4614 in read (fd=0, ptr=ptr@entry=0xa000023a0, len=<optimized out>, len@entry=65536) at /c/S/B/src/msys2-runtime/winsup/cygwin/local_includes/dtable.h:64
#4  0x00000001801f59e3 in _read_r (ptr=ptr@entry=0x7ffffce00, fd=<optimized out>, buf=buf@entry=0xa000023a0, cnt=cnt@entry=65536) at /c/S/B/src/msys2-runtime/newlib/libc/reent/readr.c:49
#5  0x00000001801e1a44 in __sread (ptr=0x7ffffce00, cookie=0xa00001f98, buf=0xa000023a0 "", n=65536) at /c/S/B/src/msys2-runtime/newlib/libc/stdio/stdio.c:47
#6  0x00000001801e0e45 in __srefill_r (ptr=0x7ffffce00, fp=fp@entry=0xa00001f98) at /c/S/B/src/msys2-runtime/newlib/libc/stdio/refill.c:113
#7  0x00000001801e1010 in __srget_r (ptr=<optimized out>, ptr@entry=0x7ffffce00, fp=fp@entry=0xa00001f98) at /c/S/B/src/msys2-runtime/newlib/libc/stdio/rget.c:47
#8  0x00000001801e015b in __sgetc_r (__p=0xa00001f98, __ptr=0x7ffffce00) at /c/S/B/src/msys2-runtime/newlib/libc/include/stdio.h:675
#9  getc_unlocked (fp=fp@entry=0xa00001f98) at /c/S/B/src/msys2-runtime/newlib/libc/stdio/getc_u.c:85
#10 0x00000001801e02a0 in __getdelim (bufptr=0x10042d140, n=0x10042d138, delim=10, fp=0xa00001f98) at /c/S/B/src/msys2-runtime/newlib/libc/stdio/getdelim.c:87
#11 0x000000018019a33b in _sigfe () at sigfe.s:36
sed 2
Thread 2 (Thread 16220.0x32fc "sig"):
#0  0x00007fffbb87f7e4 in ntdll!ZwReadFile () from /c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00007fffb8ded16d in ReadFile () from /c/WINDOWS/System32/KERNELBASE.dll
#2  0x00000001800c51f7 in wait_sig () at /c/S/B/src/msys2-runtime/winsup/cygwin/sigproc.cc:1378
#3  0x0000000180044e20 in cygthread::callfunc (this=this@entry=0x1802464e0 <threads>, issimplestub=issimplestub@entry=false) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygthread.cc:48
#4  0x00000001800453b9 in cygthread::stub (arg=arg@entry=0x1802464e0 <threads>) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygthread.cc:91
#5  0x0000000180045fc3 in _cygtls::call2 (this=0x10cce00, func=0x180045350 <cygthread::stub(void*)>, arg=0x1802464e0 <threads>, buf=buf@entry=0x10ccd50) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:41
#6  0x0000000180046074 in _cygtls::call (func=<optimized out>, arg=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:28
#7  0x00007fffba9fe8d7 in KERNEL32!BaseThreadInitThunk () from /c/WINDOWS/System32/KERNEL32.DLL
#8  0x00007fffbb7ffbcc in ntdll!RtlUserThreadStart () from /c/WINDOWS/SYSTEM32/ntdll.dll
#9  0x0000000000000000 in ?? ()

Thread 1 (Thread 16220.0x4a04 "sed"):
#0  0x00007fffbb880274 in ntdll!ZwWaitForMultipleObjects () from /c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00007fffb8dedc23 in WaitForMultipleObjectsEx () from /c/WINDOWS/System32/KERNELBASE.dll
#2  0x00007fffb8dedaf1 in WaitForMultipleObjects () from /c/WINDOWS/System32/KERNELBASE.dll
#3  0x00000001800461de in cygwait (object=<optimized out>, timeout=timeout@entry=0x7ffffc648, mask=mask@entry=5) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygwait.cc:75
#4  0x000000018012bd2a in cygwait (mask=5, howlong=<optimized out>, h=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/local_includes/cygwait.h:45
#5  cygwait (howlong=<optimized out>, h=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/local_includes/cygwait.h:51
#6  fhandler_pipe::raw_read (this=0x800021920, ptr=0xa000049f0, len=@0x7ffffc800: 65536) at /c/S/B/src/msys2-runtime/winsup/cygwin/fhandler/pipe.cc:397
#7  0x00000001800ff173 in fhandler_base::read (this=0x800021920, in_ptr=0xa000049f0, len=@0x7ffffc800: 65536) at /c/S/B/src/msys2-runtime/winsup/cygwin/fhandler/base.cc:839
#8  0x00000001800d4614 in read (fd=0, ptr=ptr@entry=0xa000049f0, len=<optimized out>, len@entry=65536) at /c/S/B/src/msys2-runtime/winsup/cygwin/local_includes/dtable.h:64
#9  0x00000001801f59e3 in _read_r (ptr=ptr@entry=0x7ffffce00, fd=<optimized out>, buf=buf@entry=0xa000049f0, cnt=cnt@entry=65536) at /c/S/B/src/msys2-runtime/newlib/libc/reent/readr.c:49
#10 0x00000001801e1a44 in __sread (ptr=0x7ffffce00, cookie=0xa00004678, buf=0xa000049f0 '\001' <repeats 200 times>..., n=65536) at /c/S/B/src/msys2-runtime/newlib/libc/stdio/stdio.c:47
#11 0x00000001801e0e45 in __srefill_r (ptr=0x7ffffce00, fp=fp@entry=0xa00004678) at /c/S/B/src/msys2-runtime/newlib/libc/stdio/refill.c:113
#12 0x00000001801e1010 in __srget_r (ptr=<optimized out>, ptr@entry=0x7ffffce00, fp=fp@entry=0xa00004678) at /c/S/B/src/msys2-runtime/newlib/libc/stdio/rget.c:47
#13 0x00000001801e015b in __sgetc_r (__p=0xa00004678, __ptr=0x7ffffce00) at /c/S/B/src/msys2-runtime/newlib/libc/include/stdio.h:675
#14 getc_unlocked (fp=fp@entry=0xa00004678) at /c/S/B/src/msys2-runtime/newlib/libc/stdio/getc_u.c:85
#15 0x00000001801e02a0 in __getdelim (bufptr=0x10042d140, n=0x10042d138, delim=10, fp=0xa00004678) at /c/S/B/src/msys2-runtime/newlib/libc/stdio/getdelim.c:87
#16 0x000000018019a33b in _sigfe () at sigfe.s:36
grep
Thread 3 (Thread 11964.0x441c):
#0  0x00007fffbb8833d4 in ntdll!ZwWaitForWorkViaWorkerFactory () from /c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00007fffbb771d7e in ntdll!TpCallbackMayRunLong () from /c/WINDOWS/SYSTEM32/ntdll.dll
#2  0x0000000180045fc3 in _cygtls::call2 (this=0x5fce00, func=0x7fffbb771a00 <ntdll!TpCallbackMayRunLong+5440>, arg=0x7d1a80, buf=buf@entry=0x5fcd50) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:41
#3  0x0000000180046074 in _cygtls::call (func=<optimized out>, arg=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:28
#4  0x00007fffba9fe8d7 in KERNEL32!BaseThreadInitThunk () from /c/WINDOWS/System32/KERNEL32.DLL
#5  0x00007fffbb7ffbcc in ntdll!RtlUserThreadStart () from /c/WINDOWS/SYSTEM32/ntdll.dll
#6  0x0000000000000000 in ?? ()

Thread 2 (Thread 11964.0x4460 "sig"):
#0  0x00007fffbb87f7e4 in ntdll!ZwReadFile () from /c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00007fffb8ded16d in ReadFile () from /c/WINDOWS/System32/KERNELBASE.dll
#2  0x00000001800c51f7 in wait_sig () at /c/S/B/src/msys2-runtime/winsup/cygwin/sigproc.cc:1378
#3  0x0000000180044e20 in cygthread::callfunc (this=this@entry=0x1802464e0 <threads>, issimplestub=issimplestub@entry=false) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygthread.cc:48
#4  0x00000001800453b9 in cygthread::stub (arg=arg@entry=0x1802464e0 <threads>) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygthread.cc:91
#5  0x0000000180045fc3 in _cygtls::call2 (this=0x10cce00, func=0x180045350 <cygthread::stub(void*)>, arg=0x1802464e0 <threads>, buf=buf@entry=0x10ccd50) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:41
#6  0x0000000180046074 in _cygtls::call (func=<optimized out>, arg=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygtls.cc:28
#7  0x00007fffba9fe8d7 in KERNEL32!BaseThreadInitThunk () from /c/WINDOWS/System32/KERNEL32.DLL
#8  0x00007fffbb7ffbcc in ntdll!RtlUserThreadStart () from /c/WINDOWS/SYSTEM32/ntdll.dll
#9  0x0000000000000000 in ?? ()

Thread 1 (Thread 11964.0x1be0 "grep"):
#0  0x00007fffbb880274 in ntdll!ZwWaitForMultipleObjects () from /c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00007fffb8dedc23 in WaitForMultipleObjectsEx () from /c/WINDOWS/System32/KERNELBASE.dll
#2  0x00007fffb8dedaf1 in WaitForMultipleObjects () from /c/WINDOWS/System32/KERNELBASE.dll
#3  0x00000001800461de in cygwait (object=<optimized out>, timeout=timeout@entry=0x7ffffc648, mask=mask@entry=5) at /c/S/B/src/msys2-runtime/winsup/cygwin/cygwait.cc:75
#4  0x000000018012bd2a in cygwait (mask=5, howlong=<optimized out>, h=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/local_includes/cygwait.h:45
#5  cygwait (howlong=<optimized out>, h=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/local_includes/cygwait.h:51
#6  fhandler_pipe::raw_read (this=0x800038c40, ptr=0xa00010000, len=@0x7ffffc800: 65536) at /c/S/B/src/msys2-runtime/winsup/cygwin/fhandler/pipe.cc:397
#7  0x00000001800ff173 in fhandler_base::read (this=0x800038c40, in_ptr=0xa00010000, len=@0x7ffffc800: 65536) at /c/S/B/src/msys2-runtime/winsup/cygwin/fhandler/base.cc:839
#8  0x00000001800d4614 in read (fd=0, ptr=0xa00010000, len=<optimized out>) at /c/S/B/src/msys2-runtime/winsup/cygwin/local_includes/dtable.h:64
--Type <RET> for more, q to quit, c to continue without paging--
#9  0x000000018019a33b in _sigfe () at sigfe.s:36

@dscho
Copy link
Collaborator Author

dscho commented Jan 4, 2025

I have a suspicion that it could be cygwin/cygwin@9c34e02, but I still have no reliable fast reproducer.

@tyan0
Copy link
Contributor

tyan0 commented Jan 5, 2025

@lazka, thanks for the stack trace.

sh:
Thread 5,4,3: perhaps waiting for child process (grep, sed, sh).
Thread 2: waiting for a signal
Thread 1: waiting for what?

sed1:
Thread 2: waiting for a signal
Thread 1: waiting for reading pipe.

sed2:
Thread 2: waiting for a signal
Thread 1: waiting for reading pipe.

grep:
Thread 3: no idea.
Thread 2: waiting for a signal
Thread 1: waiting for reading pipe.

As for thread 2, each process has a thread that waits for the signal. So, it is normal thing that this thread is stopping on ReadFile().
I wonder who the pipe writer is. Terminated already? If so, why fhandler_pipe::raw_read() does not get STATUS_PIPE_BROKEN?

Or is there non-msys2 process that writes to the pipe?

@dscho
Copy link
Collaborator Author

dscho commented Jan 5, 2025

Or is there non-msys2 process that writes to the pipe?

That's almost certainly the case in Git for Windows' hangs. This used to be a well-supported scenario, could we please keep it that way?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants