This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
[MXNET-1398] Enable zero-copy from numpy to MXNet NDArray #14733
[MXNET-1398] Enable zero-copy from numpy to MXNet NDArray #14733
Changes from all commits
c174b2d
38383e7
9fb797b
5674684
e825e79
ff3e15e
28e4c2e
3ca90c9
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when doing zero-copy, should the ownership of the data pointer be transferred to the new ndarray? do we need to update the writable flag? what happens when the original data is updated by numpy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@szha They share the ownership - which means numpy's modification is transparent to mxnet and vice versa. As requested by @reminisce in his review, I added a testcase for this transparency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does it work when using the asynchronous engine? since numpy calls happen immediately, allowing shared ownership may give unexpected results. consider the following case:
given the asynchronous execution, it's hard to tell whether people should expect d to be based on the input before or after
expensive_op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(We actually have such discussion long time before, but it's good to make things open and achievable to the community)
My understanding is that your concern is a general question in multi-threading (see the code below). In any multi-threaded system, we may expect this issue to exist -- and MXNet has multi-threaded engine, so this issue exists, right?
We have discussion that how about the content of array is changed somewhere inside the system but user don't know that, should we set up the
WRITEABLE
flag to be true to prevent users from writing, or should we completely disallow users to access this numpy array?The question, in the high-level, is all about trade-off. Restricting the freedom of users may make things less error-prone, while giving users full freedom may make the system more powerful.
On one hand, in my humble opinion, restricting the freedom of users, like setting up
WRITEABLE
flag, does not completely resolve this issue, because users can still encounter pitfalls when they read the array using the numpy side API. The only helps in this case is to completely disable users from read or write the array, which grant them on access only the numpy side - This is more like guaranteeing safety by disallowing users to create threads/processes.On the other hand, giving users freedom enables them to do more things, and does not necessarily mean that they will definitely make a mistake, because we will encourage to use mxnet's numpy compatibility, which is on the way.
Anyway, I am open to any discussion and open to possible changes setting up several flags on the numpy array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API should have clear semantics regardless of users’ knowledge on how MXNet execution works. If you worry about flexibility I’d recommend making it an option to not disable writable flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we add a zero-copy option to asnumpy where the ownership is completely transferred back to numpy, I think there wouldn't be any need for co-ownership.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the following case:
It's necessary to keep co-ownership.
We can add extra option in
from_numpy
andasnumpy
to prompt users to call the functionwait_to_read()
.Alternatively, can we add a hook into
numpy.ndarray
? When users accessnumpy.ndarray
,wait_to_read()
will be called automatically.Reference: https://docs.scipy.org/doc/numpy/reference/arrays.classes.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If zero-copy is on, then
a
should not be usable after theasnumpy
call. The point is to let the framework deal with all synchronization, and co-ownership doesn't allow that. And for the same case, you can simply copyc
fromb
(the new numpy array) instead ofa
(the old ndarray), so it's not necessary to keep co-ownership.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hook in the context seems to refer to our own subclass of numpy array. Creating a new subclass seems like overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, @szha's suggestion was addressed in #14948