-
-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: allow configurable child shutdown timeouts #565
feat: allow configurable child shutdown timeouts #565
Conversation
…rable-shutdown-timeouts mypy typing defs updated to match upstream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some questions
nox/_options.py
Outdated
group=options.groups["execution"], | ||
help=( | ||
"The number of seconds to wait for children to shut down cleanly after " | ||
"receiving a SIGTERM signal before sending them a SIGKILL signal." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I include the default value in the help here?
nox/_options.py
Outdated
try: | ||
return float(command_args.child_shutdown_timeout) | ||
except (ValueError, TypeError): | ||
try: | ||
return float(noxfile_args.child_shutdown_timeout) | ||
except (ValueError, TypeError): | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better method to validate/cast object types?
nox/popen.py
Outdated
with contextlib.suppress(subprocess.TimeoutExpired): | ||
return proc.communicate(timeout=0.3) | ||
return proc.communicate(timeout=child_timeout or 0.3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this timeout (the first one) be the one controlled by the variable or should it be the second? I'm on the fence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting together this PR!
Generally, it's preferred to open an issue first. This gives us an opportunity to talk about the problem you're trying to solve, and investigate different approaches before getting into implementation details of the specific approach you favor.
I have a few questions:
-
Have you considered using pytest fixtures for your Docker containers? pytest should provide all the flexibility you need, and it's a more general solution as it does not require Nox. Also, have you considered modifying the Docker containers to improve their shutdown time? While you can always argue for more configurability, every little knob we add increases maintainance effort permanently, so I'd like to first consider alternative solutions.
-
Have you considered passing the timeout as an argument toEdit: Just saw that your PR already allows passing the timeout tosession.run
and friends? This is a more fine-grained approach. The shutdown timeout is specified at the point where the expensive process is spawned. That also makes this approach clearer in my mind. Finally, it means that you don't need to pass the timeout with every Nox invocation, but specify it once in the Noxfile.session.run
. -
The option help says that it's for the delay between SIGTERM and SIGKILL. But the implementation is for the delay between SIGINT and SIGTERM. Possible confusion?
-
There are two timeouts involved in the existing shutdown process, the delay before SIGTERM and the delay before SIGKILL. In tox 3, both are configurable, and have names that make their purpose clear (
interrupt_timeout
andterminate_timeout
). You chose the generic nameshutdown_timeout
, and wired that to the interrupt timeout. Why did you select this one of the two timeouts, and how should we handle future feature requests for configuring the other timeout?
I'm not 100% against making this configurable, but it would be good to talk these points over.
Thanks for the feedback!
I'm sorry, I would if I had known. I don't think there is a PR template on this project is there? I'll maybe open an issue about adding one of those with some guidelines then.
Yes, actually that is what is going on :) Pytest spins up and down containers, but when it tries to shut them down after receiving SIGINT noxs end up SIGKILLing pytest and we get containers left around on out CI hosts. This has become apparent after I switched the project in question over to a matrix strategy build on github so after the first test failure a SIGINT gets sent to the other running matrix tests as I'm sure you know.
I 100% understand and agree. If there is a workaround I'm willing to implement it, but I think this will probably effect more than a few other users of nox unless I happen to be doing things in a strange way (which I hope I'm not)
I'm not really sure what all I could speed up, I think it's just a problem of combined IO load on the CI hosts at some points. It can be as fast as 1-2 seconds to clean it up sometimes but it's always longer than 0.3 seconds. Docker-compose with databases always seems a little sluggish to me on cleanup for some reason.
Oh you're totally right, I think in this case the configurable timeout should be moved to the second one? I was reading the code wrong.
Good to know about tox, I should have looked there for naming standards to follow. In this case because tox has the functionality already then I would personally want to match tox's naming and probably provide both as options to the user. If you would only want one, this should probably be named I should have set this to draft mode to begin with, I didn't mean for you to accept this without a discussion :) Thanks again |
I do see you have a CONTRIBUTING file however, I apologize I should have looked for one. |
Should I start a related Issue for this? |
Thanks for explaining your use case further. And no worries about opening an issue now, I think we can take it from here. What are your thoughts about adding parameters We'll also need docs and a test. For the docs, you can update the docstrings of |
I agree, that would be helpful. |
That actually makes a lot more sense, CLi options aren't probably needed for this and having separate variables that match
For sure! Thank you for keeping the test coverage high, nox has been incredibly stable for me. |
IMO it should be passed explicitly as a keyword argument. I don't think we need a configurable global default, and |
Still need to check test coverage and update docs |
nox/popen.py
Outdated
interrupt_timeout: float, | ||
terminate_timeout: float, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's allow waiting without timeout as well.
interrupt_timeout: float, | |
terminate_timeout: float, | |
interrupt_timeout: float | None, | |
terminate_timeout: float | None, |
nox/popen.py
Outdated
interrupt_timeout: float = 0.3, | ||
terminate_timeout: float = 0.2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
interrupt_timeout: float = 0.3, | |
terminate_timeout: float = 0.2, | |
interrupt_timeout: float | None = 0.3, | |
terminate_timeout: float | None = 0.2, |
tests/test_sessions.py
Outdated
"terminate_timeout_expected", | ||
), | ||
[ | ||
(None, None, 0.3, 0.2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If None is passed, None is the expected value (see above).
For the defaults, pass a different sentinel instead of None and check for that further down. For example, the string "default" would work.
Sorry for the delay |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This still needs to be documented. Other than that, looking great.
For the docs, you can update the docstrings of session.run and session.run_always.
Nice! Yep, will do. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thanks!
FTR, a small noxfile.py to test both timeouts: """
Usage: nox -- [interrupt_timeout] [terminate_timeout]
Examples:
$ nox -- 1
$ nox -- 1 2
$ nox -- None None
"""
import ast
import nox
bash_program = """
trap '' SIGINT SIGTERM
sleep inf
"""
@nox.session
def test(session):
timeouts = dict(
zip(
["interrupt_timeout", "terminate_timeout"],
map(ast.literal_eval, session.posargs),
)
)
session.run("bash", "-c", bash_program, external=True, **timeouts) |
Thank you for the work done in #393! It did fix some odd issues but brought to light one more. We have a few tests that spin up docker containers and then clean them up after the pytest session exits but these take upwards of 10-15 seconds to fully clean up. I think that making the timeout configurable may be acceptable although I'm not sure if this is the best approach. Let me know what you think.
The goal of this change is to allow you to set the timeout on the CLI, in the
noxfile.py
, or individually on eachsession.run
invocation. If this is acceptable I'll work on updating docs and tests. Thanks!