This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

improve signal handling and worker cleanup #5378

Merged 5 commits on Aug 28, 2021


epwalsh (Member) commented Aug 25, 2021

Fixes #5369. @dhruvdcoder could you please take a look and verify that this fixes your issues?

Changes proposed in this pull request:

  • allennlp will now catch SIGTERM signals and handle them the same way as SIGINT (keyboard interrupt).
  • The MultiProcessDataLoader will properly shut down its workers when a SIGTERM is received.
  • This also adds a worker safety mechanism: MultiProcessDataLoader workers continually check whether the main process is still alive and consuming, and shut themselves down if they determine that it has exited. This is useful when the main process exits abnormally. Unfortunately, it only works when start_method is "spawn".

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.
  • codecov/patch reports high test coverage (at least 90%).
    You can find this under the "Actions" tab of the pull request once the other checks have finished.

epwalsh self-assigned this Aug 25, 2021

dhruvdcoder commented

@epwalsh Thank you for the quick fix. I will incorporate this in my full sweep setup today and get back to you.

epwalsh requested review from AkshitaB and dirkgr August 27, 2021 21:41

dirkgr (Member) left a comment

This seems OK, though I'm wondering if you need two connections (the queue and tx). Can't you send a terminal value through the queue to tell workers to stop?

epwalsh (Member, Author) commented Aug 28, 2021

The existing queue moves data from the workers to the main process, not the other way around, so there needs to be a separate channel of some kind to move data in the opposite direction.
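
The two-channel arrangement described above might look roughly like the following: a queue carries data from the worker to the main process, and a one-way pipe carries a stop message in the opposite direction. This is only a sketch under assumptions, not the MultiProcessDataLoader implementation; `worker_loop`, `_should_stop`, the `STOP` sentinel, and the name `rx` for the receiving end are all hypothetical (only `tx` and the queue are mentioned in the review).

```python
import multiprocessing as mp
import queue

STOP = "stop"  # hypothetical sentinel sent by the main process over the pipe


def _should_stop(rx):
    # True if the main process sent STOP or its end of the pipe is gone.
    try:
        return rx.poll() and rx.recv() == STOP
    except (EOFError, OSError):
        return True


def worker_loop(items, out_queue, rx, poll_interval=0.05):
    """Push items onto out_queue until done or told to stop.

    Returns the number of items actually enqueued.
    """
    produced = 0
    for item in items:
        if _should_stop(rx):
            break
        while True:
            try:
                # A timeout keeps the worker from blocking forever on a
                # full queue when the main process has stopped consuming.
                out_queue.put(item, timeout=poll_interval)
                produced += 1
                break
            except queue.Full:
                if _should_stop(rx):
                    return produced
    return produced
```

In the main process, the pair would come from `rx, tx = mp.Pipe(duplex=False)`, with `rx` passed to the worker and `tx.send(STOP)` called during shutdown.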

epwalsh merged commit 2895021 into main Aug 28, 2021
epwalsh deleted the mp-data-loader-improvement branch August 28, 2021 03:49