Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Use POSIX::Spawn::Child instead of #popen3 to avoid deadlocks #738

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nachokb
Copy link

@nachokb nachokb commented May 11, 2018

Problem

We are seeing No live threads left. Deadlock? in production a lot. This is not easy to reproduce, and it only happens under load.

After upgrading to Ruby 2.4.2 and Rails 5.1 we started seeing this much more often.

Cause

Incorrect use of #popen3 (whose API is badly designed). Particularly,

Open3.popen3(*command) do |_stdin, _stdout, stderr|
  stderr.read
end

This produces a deadlock in case the command filled the whole stdout buffer. In that case, the subprocess is waiting for that buffer to be consumed, and the parent process is waiting for stderr's buffer to have content.

Ruby core team's position is that this should be handled with IO.select (check ref 1).

Solution

First alternative would be to use IO.select. But that sounds error-prone to me.

I decided to use the posix-spawn gem (particularly POSIX::Spawn::Child which handles this correctly). This behavior is optional (but default). This may break Windows compatibility (check Notes)

References

  1. https://bugs.ruby-lang.org/issues/9082
  2. nicer explanation: http://coldattic.info/post/63/

Notes

  1. I'm not sure about posix-spawn Windows support so I decided to make this behavior the default, but optional. Setting WICKED_POPEN env variable to ruby uses the standard library's Open3 instead. Gem dependencies still install posix-spawn, so it may still fail on Windows.
  2. I couldn't make Travis run tests successfully yet. Not even master as it is today upstream.
  3. This is a WIP. My intention is to bring this to your attention. Any feedback is welcome.

@nachokb nachokb force-pushed the ic.1.posix-spawn branch from ca1ac14 to 7ed3f91 Compare May 11, 2018 05:27
@unixmonkey
Copy link
Collaborator

Thank you for this. It sounds great. Have you been running this fork in your own apps and seen an improvement?

For an introductory transition period, I think it's be great if we could add an option to WickedPdf.config to opt into this, rather than an ENV variable, and making this option the default in a later release.

I just fixed up the test suite. I had a dependency that wasn't pinned to a version update. If you rebase, you should be able to run the tests locally or on Travis-CI.

@nachokb
Copy link
Author

nachokb commented Jun 12, 2018

sorry I missed the notification; I'll do the config change in a few hours (thanks for the tip, I honestly didn't know);

and dependencies versions… it happens

@nachokb
Copy link
Author

nachokb commented Jun 12, 2018

sorry, I didn't answer: yes, I've been using it in production for ~20 days and no deadlocks (had 1 - 2 per week before)

@nachokb
Copy link
Author

nachokb commented Jun 12, 2018

had 1 - 2 per week before

actually, historically we were seeing 1-2 per month (we process ~1000 PDFs a month), say ~0.2%; after an upgrade (Rails 4.1 -> 5.1, Ruby 2.1 -> 2.4) and a ~20% growth we started seeing ~1-2 per week, say ~0.6% (every once in a while we had weird spikes, which is the reason that I had to do this)

@allenwu1973
Copy link

Just curious why not use Open3.capture3 as that ruby-lang ticket suggested?

@midnight-wonderer
Copy link

@allenwu1973 there are other advantages using posix-spawn.
See it's readme for more information.

@nachokb
Copy link
Author

nachokb commented Oct 4, 2018

@MidnightWonderer basically I missed it BUT not even their authors had noticed it. It may be useful as a fallback.

@jjb
Copy link

jjb commented Feb 22, 2022

would be great to see a switch to posix-spawn

perhaps terrapin could help make the code simpler

/~https://github.com/thoughtbot/terrapin

It gives access to stdin and stderr - it returns this object (not documented in readme i believe)

/~https://github.com/thoughtbot/terrapin/blob/107e14937fd9aa710a79ceb3f304e168dae20496/lib/terrapin/command_line/output.rb#L1

not sure how to get access to that if process doesn't return 0, because terrapin raises an error in that case

you can pass in a logger to get ongoing output. not sure if that's out, or out+err. see "You can see what's getting run" in readme

@midnight-wonderer
Copy link

Years passed, I am surprised the thread has some updates. I moved to puppeteer already, which is more mainstream.
It's weird, as a contemplation, that probably all the old posters here don't care about the issue anymore.

@pboling
Copy link

pboling commented Mar 9, 2024

@midnight-wonderer LOL, I just found your very helpful comment over on posix-spawn, and it looks like this PR should be closed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants