Multiple PyTorch engines across threads appear to be sharing native instance #2825
Comments
For PyTorch, our recommendation is to set both of them to 1 at the beginning.
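A minimal sketch of that recommendation, assuming "both of them" refers to the two thread-related system properties DJL exposes for PyTorch (`ai.djl.pytorch.num_threads` and `ai.djl.pytorch.num_interop_threads`); they need to be set before the PyTorch engine is loaded for the first time:

```java
// Sketch only: set both PyTorch thread settings to 1 before anything
// touches the PyTorch engine. The property names are DJL's PyTorch
// system properties; where exactly you call this in your app is up to you.
public final class PyTorchThreadConfig {
    public static void main(String[] args) {
        System.setProperty("ai.djl.pytorch.num_threads", "1");
        System.setProperty("ai.djl.pytorch.num_interop_threads", "1");
        // ... only after this point load models / create NDManagers
    }
}
```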
No, all are in the same class loader. I'm calling
Can you initialize PtEngine before you start the threads? It looks like there is a bug in the getEngine() call.
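To illustrate the suggested workaround (this is a sketch, not the actual code from the thread): initialize the PyTorch engine once on the main thread before any worker threads are started, then let each thread create its own NDManager.

```java
import ai.djl.engine.Engine;
import ai.djl.ndarray.NDManager;

public final class EngineWarmUp {
    public static void main(String[] args) throws InterruptedException {
        // Force one-time initialization of the native PyTorch engine
        // on the main thread (the workaround suggested above).
        Engine.getEngine("PyTorch");

        Runnable job = () -> {
            // Each worker gets its own NDManager for its independent work.
            try (NDManager manager = NDManager.newBaseManager()) {
                // ... per-thread inference / computation
            }
        };

        Thread t1 = new Thread(job);
        Thread t2 = new Thread(job);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```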
I created a PR to address your issue: #2826
That seems to have worked without issues. |
After upgrading to the latest version and removing the previously suggested workaround, I'm getting a different error when instantiating an NDManager in each thread:
Description
Running on a very large server with many cores and using the PyTorch engine on CPU, I'm trying to parallelize largely independent jobs across multiple instances of PtEngine/NDManager, allocated one per thread.
I assumed each engine was independent of the others and set the environment variable "ai.djl.pytorch.num_interop_threads" to limit the number of threads to 1, but got the following error message when creating subsequent NDManager instances.
It appears as if the underlying PtEngine created via PyTorchLibrary is shared, since subsequent creation of an NDManager throws an exception with the error below.
I couldn't find any documentation on how exactly resources are shared across threads in the same JVM/ClassLoader and would appreciate some guidance on this.
Expected Behavior
Each PyTorch engine instance should be completely independent of the others.
Error Message
How to Reproduce?
Set the following property and create an NDManager per thread:
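The original reproduction snippet was not preserved here; below is a hedged sketch of what it likely looked like, based on the description above: set `ai.djl.pytorch.num_interop_threads` to 1 and create an NDManager in each worker thread (the thread-pool size and task body are placeholders).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import ai.djl.ndarray.NDManager;

public final class Reproducer {
    public static void main(String[] args) throws InterruptedException {
        // Property named in the description; setting it here, before any
        // engine use, is an assumption about the original setup.
        System.setProperty("ai.djl.pytorch.num_interop_threads", "1");

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                // Creating an NDManager per thread; the second and later
                // creations reportedly fail with the error above.
                try (NDManager manager = NDManager.newBaseManager()) {
                    // ... independent job per thread
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```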