[RFC] Introducing NumPy-compatible coding experience into MXNet #14253
Comments
+1 for this RFC. NumPy compatibility has long been desired by both developers and users. It would be very meaningful if we could make it possible.
+1 for this RFC. The inconsistent APIs, even within MXNet's own operators, have caused much confusion for users. It will be a great usability improvement if we can make MXNet APIs compatible with NumPy. I would suggest that we establish a formal review process for PRs that include API changes or additions, to prevent inconsistent APIs from being created in the future.
+1 for this RFC. I especially like the numpy namespace proposal; it will help clean up a lot of things. In my experience, the major blocker for NumPy compatibility (and a cause of bad user experience) is the lack of dynamic shape inference. I cannot wait to have that out. Anyway, since I have already written a handful of operators, I am very happy to lend a hand in making MXNet fully NumPy-compatible once dynamic shape inference is done.
+1 for handling zero-size arrays. I'm not that concerned about NumPy compatibility, but the lack of zero-size arrays is something I would like to see fixed, since currently empty arrays have to be carefully padded to avoid causing problems.
+1 for this RFC. A consistent experience would also help the JVM language bindings stay in sync with Python. It lowers the bar for users familiar with Python to write the same thing in Scala.
+1 for this RFC. It will make MXNet more flexible to use, especially for slicing, and I hope mx.numpy can eliminate the divergence between mx.nd and mx.sym. : ) I wonder how mx.numpy will be implemented: using the Python ast module to extract the abstract syntax tree and then running it with a JIT, or implementing it entirely in Python? We should also focus on the deployment of mx.numpy. I do not think F.numpy.dot is a good idea, since having mx.numpy, mx.nd.numpy, and mx.sym.numpy all exist is confusing. We only need mx.numpy to support both mx.numpy.dot(a_nd, b_nd) and mx.numpy.dot(a_sym, b_sym).
@wkcn All of what you have said makes sense. :) Gluon APIs, GluonNLP, and GluonCV depend heavily on the current MXNet infrastructure, so we have to execute this in an organized and steady way in order not to break backward compatibility. The current NNVM has its own limitations in expressing dynamic shapes and control flow operators. We will eventually need a new IR (Relay is an option) to do AST transformation.
Thanks for the RFC!
mxnet.ndarray was originally meant to give you the experience of writing pure imperative code. Why can't we add the operators under this namespace and make the interface changes for existing operators? Is there a list of operators whose APIs diverge between NumPy and ndarray, and can the change be timed with the 2.0 release?
If I understand correctly, even when using the numpy namespace you need to toggle this switch (probably an environment variable?) to obtain the correct slicing? Have you also considered implementing a separate numpy ndarray, derived from the base one, with specific functions for slicing like …
We can. However, some operators in mxnet.ndarray have the same names as their NumPy counterparts but slightly different behavior, which means they cannot exist in the same namespace if we want to preserve backward compatibility. On the other hand, 2.0 is a good opportunity to fix many existing problems besides operator behaviors, so we'd likely want to take our time. Thus, to start now, a new namespace is the most straightforward way to go.
Yes. Creating different array types means we'd start to see diverging user code, some using ndarray and some using numpy ndarray, which would become harder to migrate later.
@reminisce @szha NumPy has references/views and strides in its ndarray structure, while MXNet's NDArray does not. How does this impact the design of the NumPy-compatible coding experience?
@TaoLv In neural nets, once you do backprop, you cannot overwrite data because it destroys checkpointing.
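A toy illustration of that point in plain Python (the `Square` op is hypothetical, not MXNet's autograd): the backward pass reads the input saved during the forward pass, so overwriting that input in place silently corrupts the gradient.

```python
class Square:
    """Hypothetical op that squares a list of floats elementwise."""

    def forward(self, x):
        self.saved = x  # keep a reference (not a copy) to the input for backward
        return [v * v for v in x]

    def backward(self, grad_out):
        # d(x^2)/dx = 2x, computed from the input saved during forward
        return [2 * v * g for v, g in zip(self.saved, grad_out)]


x = [1.0, 2.0, 3.0]
op = Square()
y = op.forward(x)

grad_ok = op.backward([1.0, 1.0, 1.0])   # uses the original x

x[:] = [0.0, 0.0, 0.0]                   # in-place overwrite of the saved input
grad_bad = op.backward([1.0, 1.0, 1.0])  # now computed from the overwritten values

print(grad_ok, grad_bad)  # [2.0, 4.0, 6.0] [0.0, 0.0, 0.0]
```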
Not sure I understand the …
@TaoLv MXNet can have the same concept of views as NumPy by implementing strides. But I think it's not our first priority, because views are rarely useful in training (maybe useful in data preprocessing). @junrushao1994's point is that in-place assignment is invalid in backpropagation, as it would wipe out the information stored for autograd. This is consistent with other DL frameworks.
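A minimal sketch of how strides let a view share storage with its base array, in plain Python; `view_offset` and `contiguous_strides` are hypothetical helpers, not MXNet or NumPy APIs. A transpose is just a swap of shape and strides, with no data copied.

```python
def contiguous_strides(shape):
    """Row-major (C-order) strides, measured in elements, for a given shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def view_offset(index, strides, offset=0):
    """Flat buffer position of a multi-dimensional index, given element strides."""
    return offset + sum(i * s for i, s in zip(index, strides))


buffer = list(range(12))              # backing storage for a 3x4 array
shape = (3, 4)
strides = contiguous_strides(shape)   # (4, 1)

# A transposed view over the same buffer: swap shape and strides, copy nothing.
t_shape, t_strides = (4, 3), (1, 4)

print(buffer[view_offset((2, 1), strides)])    # element [2, 1] of the original: 9
print(buffer[view_offset((1, 2), t_strides)])  # element [1, 2] of the transpose: 9
```

Because the view aliases the buffer, a write through it would also change the original, which is exactly the kind of in-place update that conflicts with state saved for autograd.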
Do we really have to carry this burden of backward compatibility all the way beyond 2.0? I feel the existing operators are confusing enough that 2.0 may be a good time for us to make the API clean and easy to use. Would adding a new namespace …
@apeforest Because MXNet guarantees backward compatibility, those two namespaces have to be kept until 2.0. Adding namespace …
@reminisce I am fine with keeping those two namespaces until 2.0 for backward compatibility. Starting from 2.0, I feel we may want to just drop …
+1 for this RFC.
What's the plan regarding: "Instead, users should be able to just write native Python code as the following and if required, let the framework serialize it into a computation graph for optimization and deployment."? I assume this means getting the Python AST and converting it into a computational graph. That part does not seem to be described in detail, so I guess it belongs to a long-term phase.
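A minimal sketch of the first step of that idea, using only Python's standard `ast` module (Python 3.9+ for `ast.unparse`). The flat list of tuples is a hypothetical stand-in for a real graph IR, not how MXNet does it.

```python
import ast
import textwrap

source = textwrap.dedent("""
    def cumsum(data):
        s = 0
        for i in range(len(data)):
            s = s + data[i]
        return s
""")

tree = ast.parse(source)

# Collect a flat, hypothetical "graph" of operations found in the function:
# loops become ("loop", target) and binary ops become (op_name, lhs, rhs).
nodes = []
for node in ast.walk(tree):
    if isinstance(node, ast.For):
        nodes.append(("loop", ast.unparse(node.target)))
    elif isinstance(node, ast.BinOp):
        nodes.append((type(node.op).__name__, ast.unparse(node.left), ast.unparse(node.right)))

print(nodes)  # [('loop', 'i'), ('Add', 's', 'data[i]')]
```

A real converter would also have to handle data dependencies, shapes, and Python features with no graph equivalent, which is why this is a long-term effort.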
This feature has been made available as an experimental feature in 1.6 and will be supported in 2.0. Thanks to everyone who contributed to this major feature.
Motivation
Today, deep learning scientists spend the majority of their time on data processing, debugging tensor algorithms, and tuning model parameters, rather than architecting models from scratch, thanks to the abundance of pre-trained models available in deep learning model zoos. This has made the usability of tensor APIs a key factor in whether a framework is widely adopted.
MXNet was originally designed with a focus on memory efficiency, computation throughput, and scalability. Usability problems have begun to show up as more and more models exhibit dynamic behavior, e.g. tensors whose shapes are unknown before runtime, control flow that depends on a runtime result, etc. Here we highlight the most frequent usability complaints from users.
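- Zero-dim (scalar) tensors are not supported. For example, given `a = [0, 1, 2]`, `a[1]` will generate an `NDArray` of shape `(1,)` instead of `()` as in NumPy.
- Zero-size tensors are not supported. For example, a tensor of shape `(0, 16, 256)` cannot be passed to an operator, because our system currently treats 0, the first dimension size, as unknown rather than as a concrete number.
- Many operator APIs are not NumPy-compatible, e.g. `nd.dot` vs. `np.dot`, `nd.concatenate` vs. `np.concatenate`, etc.
- Operators whose output shapes can only be determined at runtime are not supported, e.g. `data[data < 0]` cannot run.
- The programming experience diverges because imperative and symbolic operators are registered separately under `mxnet.ndarray` and `mxnet.symbol`.
- Control flow operators are hard to use; users cannot simply write native Python `for`, `while`, `if/else`, etc.

For example, we have learned (the hard way) that it does not make much sense to ask users to write code like the following to perform a cumulative sum.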
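A pure-Python sketch of that callback style, using a hypothetical `while_loop` helper loosely modeled on MXNet's contrib control-flow operators (not their actual signature or implementation). The loop condition and body become separate functions, and all state is threaded through explicit loop variables.

```python
def while_loop(cond, func, loop_vars, max_iterations):
    """Hypothetical functional loop: func(*loop_vars) returns
    (step_output, new_loop_vars); stops when cond is false or at the cap."""
    outputs = []
    for _ in range(max_iterations):
        if not cond(*loop_vars):
            break
        step_output, loop_vars = func(*loop_vars)
        outputs.append(step_output)
    return outputs, loop_vars


data = [1.0, 2.0, 3.0, 4.0]


def step(state, i):
    s = state + data[i]
    return s, (s, i + 1)  # (per-step output, updated loop variables)


def keep_going(state, i):
    return i < len(data)


outs, (total, _) = while_loop(keep_going, step, (0.0, 0), max_iterations=5)
print(outs, total)  # [1.0, 3.0, 6.0, 10.0] 10.0
```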
Instead, users should be able to just write native Python code like the following and, if required, let the framework serialize it into a computation graph for optimization and deployment.
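The same cumulative sum written as native Python. A plain list stands in for an `mxnet.numpy` array so that the sketch runs without MXNet installed (an assumption made for illustration). Under the proposal, turning such a loop into a graph would be the framework's job, not the user's.

```python
data = [1.0, 2.0, 3.0, 4.0]

# Native Python control flow: no callbacks, no explicit loop-variable threading.
s = 0.0
partial_sums = []
for i in range(len(data)):
    s += data[i]
    partial_sums.append(s)

print(partial_sums, s)  # [1.0, 3.0, 6.0, 10.0] 10.0
```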
It is not hard to see that all of the above pain points stem from the lack of a NumPy-compatible coding experience in MXNet. Better support for control flow operators and a consolidated, more flexible coding style for imperative and symbolic code require fundamental changes to the codebase, such as building a new graph IR and executor. This is extremely non-trivial and should be executed as part of a long-term plan. In the meantime, we can improve usability by fixing the issue of zero-dim/size tensors and implementing NumPy operators in MXNet. The following sections discuss how to achieve these short-term goals.
Support for zero-dim and zero-size tensors
What's the problem?
Zero-dim and zero-size tensors are valid tensors in NumPy. The former, whose shapes are `()`, represent scalars in `numpy.ndarray` format. The latter, which have one or more zero dimension sizes in their shapes, can be useful as placeholders in many `ndarray` operations, such as concatenating a zero-size `ndarray` with another `ndarray`. MXNet does not support them because it reserves the empty shape `()` and shapes with zero dimension sizes to indicate unknown shape information. Such information needs to be filled in during the shape inference stage before tensor computations can proceed.

How to resolve the problem?
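A toy model of the target semantics in plain Python, assuming the `-1` sentinel described in this section. The `Shape` class, `UNKNOWN` constant, and `concat_axis0` helper are hypothetical and are not MXNet's `TShape` API. The sketch shows that `()` and dimension sizes of 0 become ordinary known values, and only `-1` means "not inferred yet".

```python
from dataclasses import dataclass
from math import prod

UNKNOWN = -1  # sentinel for an unknown ndim or an unknown dim size


@dataclass(frozen=True)
class Shape:
    """Toy model of NumPy-compatible shape semantics (hypothetical, not TShape)."""
    ndim: int
    dims: tuple = ()

    @classmethod
    def of(cls, *dims):
        return cls(ndim=len(dims), dims=tuple(dims))

    @classmethod
    def unknown(cls):
        return cls(ndim=UNKNOWN)

    def is_known(self):
        # Only -1 means "unknown"; ndim == 0 and dim sizes of 0 are real values.
        return self.ndim != UNKNOWN and UNKNOWN not in self.dims

    def size(self):
        if not self.is_known():
            raise ValueError("size is undefined until shape inference fills it in")
        return prod(self.dims)  # prod(()) == 1: a zero-dim tensor holds one scalar


def concat_axis0(a, b):
    """Shape of concatenating two known shapes along axis 0."""
    if a.ndim != b.ndim or a.dims[1:] != b.dims[1:]:
        raise ValueError("shapes must match on all axes except axis 0")
    return Shape.of(a.dims[0] + b.dims[0], *a.dims[1:])


scalar = Shape.of()              # (): a known scalar with one element
empty = Shape.of(0, 16, 256)     # zero-size but fully known
pending = Shape.unknown()        # ndim == -1: not inferred yet
partial = Shape.of(UNKNOWN, 16)  # first dim size == -1: not inferred yet

# A zero-size array works as a placeholder when concatenating.
grown = concat_axis0(Shape.of(0, 3), Shape.of(2, 3))
print(scalar.size(), empty.size(), pending.is_known(), partial.is_known(), grown.dims)
# prints: 1 0 False False (2, 3)
```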
We can first change the current semantics to comply with the NumPy definition:

- Change `ndim = 0` to `ndim = -1` in the `TShape` class to indicate an unknown shape.
- Change `dim_size = 0` to `dim_size = -1` in the `TShape` class to indicate an unknown dimension size.

After this, we need to scan the whole codebase and modify the code where `shape.ndim() == 0` and `shape.Size() == 0` are used to perform unknown-shape checks.

Please note that although MXNet's shape is a type inheriting from `nnvm::Tuple`, which is often used to represent a list-like object such as `axis=(1, 2, 3)`, we will not change the meaning of an empty tuple. Keeping the definitions of an empty shape and an empty tuple separate keeps their roles clearly decoupled.

We propose to break down the efforts into the following steps.
1. Copy `tuple.h` from NNVM to MXNet and rename `nnvm::TShape` to `mxnet::TShape`.
2. Replace all places in MXNet where `nnvm::Tuple` and `nnvm::TShape` are used with `mxnet::Tuple` and `mxnet::TShape`, respectively.
3. Change the definition of `TShape` in `tuple.h` to use `ndim = -1` to indicate unknown shapes and `dim_size = -1` to indicate unknown dimension sizes.
4. Modify the existing code where `ndim == 0` and `dim_size == 0` are used, to accommodate the above changes.
5. Modify the NNVM passes `InferShape`, `PlanMemory`, and `Gradient`, where `nnvm::TShape` is used, to accommodate the above changes.

How is backward compatibility guaranteed?
By default, we do not change the original definition of output shapes in shape inference functions; we only change `ndim == 0` to `ndim == -1` for unknown-shape verification. No backward compatibility issues are expected except in one case: `NDArray` indexing. Currently, `x[i]` always returns a tensor with `ndim >= 1`. We can keep this behavior unchanged by default and implement a global switch that users can turn on to get NumPy-compatible results.

Previous discussion of this topic can be seen here.
Implementation of NumPy operators
What to do?
To address the problems of operator incompatibility with NumPy and to alleviate the pain of a diverged programming experience caused by the operator namespace separation:

- Rather than mixing NumPy operators with the existing operators under `mxnet.ndarray` and `mxnet.symbol`, we propose creating a new namespace `mxnet.numpy`, adopting operator APIs from NumPy, and implementing those operator APIs under the namespace.
- `mxnet.numpy` should provide the same imperative programming experience as NumPy and will gradually replace all the non-neural-network operators in the current codebase.
- While implementing NumPy operators in MXNet, we can leverage TVM to generate high-performance kernels (ref.).

Can `mxnet.numpy` operators be used in Gluon for hybridization?

The newly implemented NumPy operators can still be accessed through the module delegate `F` (`ndarray`/`symbol`) in Gluon, e.g. `F.numpy.dot`. This works because the new operators are still registered under `mxnet.ndarray` and `mxnet.symbol` behind the scenes. Users are simply encouraged to access NumPy operator APIs through `mxnet.numpy` when writing pure imperative code, and to use Gluon APIs to achieve a hybrid coding experience.

Where to contribute code?
A dev branch has been opened for this proposal.
/~https://github.com/apache/incubator-mxnet/tree/numpy
@junrushao1994 @szha @eric-haibin-lin @zheng-da @yzhliu