feat: A different approach to warning users of fork() issues with Polars #19197

Merged: 3 commits merged into pola-rs:main on Nov 14, 2024

@itamarst (Contributor) commented Oct 11, 2024

Just print a warning when fork() is used in a process that has imported Polars, which is what JAX does. Users are less likely to notice a warning, but it's something, and it is hopefully less intrusive than the previous attempt, which got reverted.

@itamarst changed the title from "feature(python): A different approach to warning users of fork() issues with Polars" to "feat: A different approach to warning users of fork() issues with Polars" on Oct 11, 2024
@github-actions bot added the labels enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars), and rust (Related to Rust Polars), and removed the "title needs formatting" label on Oct 11, 2024
@itamarst marked this pull request as ready for review on October 12, 2024 12:27

codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 3 lines in your changes missing coverage. Please review.

Project coverage is 79.97%. Comparing base (20ead46) to head (a65272f).
Report is 236 commits behind head on main.

Files with missing lines: py-polars/polars/__init__.py, patch coverage 66.66% (2 lines missing, 1 partial) ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #19197      +/-   ##
==========================================
- Coverage   80.03%   79.97%   -0.06%     
==========================================
  Files        1528     1529       +1     
  Lines      209657   209822     +165     
  Branches     2416     2417       +1     
==========================================
+ Hits       167791   167809      +18     
- Misses      41317    41463     +146     
- Partials      549      550       +1     

@ritchie46 merged commit 9af7ccd into pola-rs:main on Nov 14, 2024
13 checks passed
@itamarst deleted the warn-user-on-fork-take-2 branch on November 21, 2024 23:02

@hassonofer commented

The fork() warning is problematic when the multiprocessing module isn't used directly by the user.
In frameworks like PyTorch, it's used internally through DataLoaders, torchrun, and other components.
The many different ways programs can be launched make it complex to switch the default start method.

Some related discussions are happening in the PyTorch community (like pytorch/pytorch#138957).
As a potential quick fix, would it make sense to add an environment variable to disable this warning?

@itamarst (Contributor, Author) commented

> Some related discussions are happening in the PyTorch community (like pytorch/pytorch#138957).
> As a potential quick fix, would it make sense to add an environment variable to disable this warning?

fork() is not safe in the presence of threads. It will deadlock mysteriously. People will waste days on this. I have written an article about it, and I keep hearing from people that it saved them from problems they were otherwise fundamentally unable to solve.
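
The deadlock mechanism can be demonstrated directly: fork() copies only the calling thread, so a lock held by any other thread at that moment stays locked in the child, with no thread left to release it. The sketch below is POSIX only, and its function name is illustrative; it uses a timeout so that it reports the problem instead of actually hanging.

```python
import os
import threading


def child_can_acquire_lock_after_fork(timeout: float = 1.0) -> bool:
    """Fork while another thread holds a lock; report whether the child can take it."""
    lock = threading.Lock()
    lock_held = threading.Event()
    may_release = threading.Event()

    def holder() -> None:
        # Background thread grabs the lock and keeps it until told to stop.
        with lock:
            lock_held.set()
            may_release.wait()

    worker = threading.Thread(target=holder)
    worker.start()
    lock_held.wait()

    pid = os.fork()  # POSIX only
    if pid == 0:
        # Child: only the forking thread was copied, so nothing will ever
        # release the inherited lock. Without a timeout this would hang forever.
        acquired = lock.acquire(timeout=timeout)
        os._exit(0 if acquired else 1)

    # Parent: let the holder thread finish normally, then reap the child.
    may_release.set()
    worker.join()
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status) == 0


if __name__ == "__main__":
    print(child_can_acquire_lock_after_fork())  # False
```

In real code the lock is usually hidden inside a library (an allocator, a logging handler, a thread pool), which is why the resulting hang looks so mysterious.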

So using the "fork" context is fundamentally broken, and Python 3.14 is switching to a different default for exactly this reason. Having an environment variable that says "stop telling me I'm shooting myself in the foot" doesn't stop you from shooting yourself in the foot.

Some other options:

  1. Instead of having Polars help people break their runtime, those libraries and frameworks should switch to the non-broken default.
  2. In addition, you can set the global multiprocessing start method to "spawn" by calling multiprocessing.set_start_method("spawn") early in program startup. This could be documented on the relevant Polars documentation page.
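
Option 2 might look like the following minimal sketch. The main() function and its data are illustrative; multiprocessing.set_start_method() and multiprocessing.Pool are the standard-library APIs.

```python
import multiprocessing


def main() -> list[int]:
    # Must run before any Pool/Process is created, and only once per program:
    # a second call raises RuntimeError("context has already been set").
    multiprocessing.set_start_method("spawn")
    # Under "spawn", each worker starts from a fresh interpreter, so the
    # callable must be importable by name; a builtin such as abs always is.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(abs, [-1, -2, 3])


if __name__ == "__main__":
    print(main())  # [1, 2, 3]
```

The `if __name__ == "__main__":` guard matters with "spawn": workers re-import the main module, and without the guard they would try to start pools of their own.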

@hassonofer commented

I'm not arguing with fork being problematic, and I value your work toward helping people.

What I was trying to convey is that setting multiprocessing.set_start_method("spawn") (or "forkserver", for that matter) is not as simple as one might think in some scenarios.
Also, the ability to influence large frameworks is limited.

The way I see it, my suggestion does not affect regular users, who will still get the warning and hopefully go read up on the subject.
But for people who are already aware of it, it can clean up the logs, which can get quite big when using tools like submitit or torchrun (the warning repeats for each worker).
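
When another component has already fixed the global start method, calling set_start_method() again raises RuntimeError. One standard-library way around that, sketched below under the assumption that you control the code creating the worker pool, is to request a "spawn" context object and pass it explicitly, leaving the global default alone. The function name abs_values_with_spawn is hypothetical. Some framework APIs expose a similar per-call setting; for example, PyTorch's DataLoader accepts a multiprocessing_context argument.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def abs_values_with_spawn(values: list[int]) -> list[int]:
    # get_context() returns a context object bound to one start method.
    # Passing it explicitly affects only this executor; the process-wide
    # default (which a framework may already have chosen) is left untouched,
    # so there is no RuntimeError from a second set_start_method() call.
    ctx = multiprocessing.get_context("spawn")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as executor:
        # abs is a builtin, so spawned workers can always look it up by name.
        return list(executor.map(abs, values))


if __name__ == "__main__":
    print(abs_values_with_spawn([-1, -2, 3]))  # [1, 2, 3]
```

This does not help with pools that a framework creates internally, which is the harder case described in the comment above.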

4 participants