Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[WIP][MXNET-918] Random api #12489

Closed
wants to merge 8 commits into from
Closed

Conversation

mdespriee
Copy link
Contributor

@mdespriee mdespriee commented Sep 8, 2018

Description

Introduce Symbol.random and NDArray.random modules in scala API

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Symbol.random api generated by macros
  • NDArray.random api generated by macros

Comments

  • Is there any JIRA mentioning this feature ?
  • Only implemented for Symbol. Adding the same for NDArray seems quite straightforward. Tell me if I should to it in the same PR or in a separate
  • I moved functions from Symbol.api to Symbol.random. Is it OK ?
  • The produced API is not strictly similar to the one in python:
    • in python, each function, eg random.normal, has 2 overrides, one scalar, one symbolic.
    • the generated scala api has different namings: Symbol.random.sample_normal for symbol inputs, and Symbol.random.random_normal for scalars. The arguments naming is also not fully consistent (mu & sigma, vs loc & scale).
      => What are your recommendations on this ?
  • The unit-test is poor. What should be tested for generated code like this ?

Copy link
Member

@lanking520 lanking520 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please apply these changes

// scalastyle:off
val absClassFunctions = getSymbolNDArrayMethods(isSymbol)
// TODO: Add Filter to the same location in case of refactor
val absFuncs = absClassFunctions.filter(f => f.name.startsWith("sample") || f.name.startsWith("random"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_sample and _random?

@@ -97,34 +117,41 @@ private[mxnet] object SymbolImplMacros {
val newSymbolFunctions = {
if (isContrib) symbolFunctions.filter(
func => func.name.startsWith("_contrib_") || !func.name.startsWith("_"))
else symbolFunctions.filter(!_.name.startsWith("_"))
else symbolFunctions.filter(f => !f.name.startsWith("_") &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's think of how we can move the filter method out of this typedAPIImpl. So different method generator pass in different filters.

@lanking520
Copy link
Member

@mdespriee Firstly for unit test, there are tests covers in Python on these operators. In that case we decide not to test them on Scala end.

the generated scala api has different namings: Symbol.random.sample_normal for symbol inputs, and Symbol.random.random_normal for scalars. The arguments naming is also not fully consistent (mu & sigma, vs loc & scale).

For this question, let me consult with people working on Python.

Please also consider adding NDArray with these operators (in NDArrayAPI)

@lanking520
Copy link
Member

Hi @mdespriee , I have created a JIRA ticket for you: https://issues.apache.org/jira/browse/MXNET-918. You can put it on the title so people know what issue it is related.

@mdespriee mdespriee changed the title [WIP] Symbol Random api [WIP][MXNET-918] Symbol Random api Sep 10, 2018
Copy link
Member

@lanking520 lanking520 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, _random_ and random_ operators are different in backend, the major difference are the accepted input for the frontend, usually can either be a static type or NDArray.

@@ -91,69 +114,74 @@ private[mxnet] object NDArrayMacro {
val newNDArrayFunctions = {
if (isContrib) ndarrayFunctions.filter(
func => func.name.startsWith("_contrib_") || !func.name.startsWith("_"))
else ndarrayFunctions.filterNot(_.name.startsWith("_"))
else ndarrayFunctions.filterNot(f => f.name.startsWith("_") &&
!f.name.startsWith("sample_") && !f.name.startsWith("random_"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not try to avoid sample_ and random_ from generation:
This called by operator random_uniform.
and This called by operator _random_uniform

@kalyc
Copy link
Contributor

kalyc commented Sep 14, 2018

Thanks for your contribution @mdespriee
@mxnet-label-bot[pr-work-in-progress]

@marcoabreu marcoabreu added the pr-work-in-progress PR is still work in progress label Sep 14, 2018
Copy link
Member

@lanking520 lanking520 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly congratulation on your progress! I am pretty impressed on the progress you have made in here.

However, I also hold a little concern in here: Since the _random and _sample can take both Symbol/NDArray or a static type as param. This should catch the type diff. But it doesn't catch. I am curious what types does Macros generated in here.

ndarrayFunctions.filter(f => f.name.startsWith("_sample_") || f.name.startsWith("_random_"))

val functionDefs = rndFunctions.map(f => buildTypeSafeFunction(c)(f))
structGeneration(c)(functionDefs, annottees: _*)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The possible way to do it is maybe apply some regex pattern as a param to reconstruct the way you want to represent this.

@@ -97,34 +117,40 @@ private[mxnet] object SymbolImplMacros {
val newSymbolFunctions = {
if (isContrib) symbolFunctions.filter(
func => func.name.startsWith("_contrib_") || !func.name.startsWith("_"))
else symbolFunctions.filter(!_.name.startsWith("_"))
else symbolFunctions.filter(f => !f.name.startsWith("_"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use .filterNot in here, Sync with NDArray

@mdespriee
Copy link
Contributor Author

mdespriee commented Sep 16, 2018

@lanking520. Underscore removed. We now have this kind of signatures:

def random_normal (loc : Option[org.apache.mxnet.Base.MXFloat] = None, scale : Option[org.apache.mxnet.Base.MXFloat] = None, shape : Option[org.apache.mxnet.Shape] = None, ctx : Option[String] = None, dtype : Option[String] = None, name : String = null, attr : Map[String, String] = null) : org.apache.mxnet.Symbol

and

def sample_normal (mu : Option[org.apache.mxnet.Symbol] = None, shape : Option[org.apache.mxnet.Shape] = None, dtype : Option[String] = None, sigma : Option[org.apache.mxnet.Symbol] = None, name : String = null, attr : Map[String, String] = null) : org.apache.mxnet.Symbol

I don't think we can go further. If remove both _random_ and _sample_ prefixes, we'll have collisions that the scala compiler can't resolve since all parameters have default values.
(multiple overloaded alternatives of method uniform define default arguments)

@mdespriee mdespriee changed the title [WIP][MXNET-918] Symbol Random api [WIP][MXNET-918] Random api Sep 16, 2018
@lanking520
Copy link
Member

@mdespriee Firstly I think we have a collide operators. _sample and _random looks like the same with random and sample. So it looks like we shouldn't need to generate a new set of API in order to get this work. Instead, we should try to use Macros to do something similar to python. Create a helper function that can choose between sample and random. Change the implementation a little bit more for Random Module:

  1. Set the args type as Any for Random API in order to take in NDArray/Float or Symbol/Float
  2. Do a instance check such as asInstanceOf[NDArray] to see if it is an object or numeric value
  3. Do random_ or sample_ in the function

@vandanavk
Copy link
Contributor

Thanks for your contribution @mdespriee. Please have a look at @lanking520's feedback.

@vrakesh
Copy link
Contributor

vrakesh commented Oct 9, 2018

@mdespriee Have you had a chance to look at @lanking520 's feedback? Requesting an update from you on this.

@mdespriee
Copy link
Contributor Author

mdespriee commented Oct 9, 2018 via email

@lanking520
Copy link
Member

@mdespriee Good to see you back!

@mdespriee
Copy link
Contributor Author

mdespriee commented Oct 18, 2018

Firstly I think we have a collide operators. _sample and _random looks like the same with random and sample. So it looks like we shouldn't need to generate a new set of API in order to get this work. Instead, we should try to use Macros to do something similar to python. Create a helper function that can choose between sample and random. Change the implementation a little bit more for Random Module:

  1. Set the args type as Any for Random API in order to take in NDArray/Float or Symbol/Float
  2. Do a instance check such as asInstanceOf[NDArray] to see if it is an object or numeric value
  3. Do random_ or sample_ in the function

@lanking520
I've noticed the following inconsistencies between sample and random:

  • multinomial only exists in sample-> This is catchable at runtime in case of mismatch
  • normal has different params in both worlds : loc&scale vs mu&sigma. -> How do we manage this ?

@lanking520
Copy link
Member

@mdespriee Firstly thanks for coming back to maintain this PR!

I see your point, is there a mapping can be applied to these operators? If yes as they are just the difference like that, we can apply something in Macros end to make it change.

Apart from that, I think you can extend /~https://github.com/apache/incubator-mxnet/blob/master/scala-package/core/src/main/scala/org/apache/mxnet/Random.scala this class to support Random with your Macros design

@andrewfayres
Copy link
Contributor

Can we add a gist of the generated code?

Also, is there a reason that we're adding a new NDArray.random as an api instead of extending NDArray.api (same for Symbol)?

@lanking520
Copy link
Member

@andrewfayres here is what we are trying to do: https://mxnet.incubator.apache.org/api/python/ndarray/random.html. In NDArray API we already have some random generation function, but a Random namespace would be great since it is compatible with both NDArray/Float.

@mdespriee
Copy link
Contributor Author

mdespriee commented Oct 26, 2018

@lanking520 Here is an updated PR.
Still unfinished (see below), but you can give me your feedback on what's done so far.

  org.apache.mxnet.MXNetError: Cannot find argument 'lam', Possible Arguments:
----------------
shape : Shape(tuple), optional, default=[]
    Shape to be sampled from each random distribution.
dtype : {'None', 'float16', 'float32', 'float64'},optional, default='None'
    DType of the output in case this can't be inferred. Defaults to float32 if not defined (dtype=None).
, in operator _sample_poisson(name="", shape="(2,2)", lam="1.0")
  at org.apache.mxnet.Base$.checkCall(Base.scala:131)
  at org.apache.mxnet.Symbol$.createFromNamedSymbols(Symbol.scala:1178)
  at org.apache.mxnet.Symbol$.createSymbolGeneral(Symbol.scala:1106)
  at org.apache.mxnet.SymbolRandomAPI$.poisson(SymbolAPI.scala:12)

I'm investigating that. Any hint welcome !

  • Also, we have compilation warnings (in macro-generated code) about code erasure.

@lanking520
Copy link
Member

lanking520 commented Oct 26, 2018

@mdespriee thanks for your effort to refactor the code and I really appreciate your time spent on it. The issue shown above is the mismatch on the parameter. It can be caused by function name do not have this parameter, or you put a param in the wrong type. I would recommend you to split the refactoring portion and Random generation portion into two PRs. The good thing:

  1. you can focus on each of them and make the refactoring and Random part even better.

  2. Reduce the change on your code, it's easier for you to find where the problem is and easy for us to help you find where you can make a mistake.

  3. The change you have introduced is too big, it both influence the old code generation and the new feature generation. A lot of tests and verification has to be done before we merge the code. This can take a while...

I know this may introduce more work to you but I think this could be the best solution.

@mdespriee
Copy link
Contributor Author

Following @lanking520 comment, I split the work in 2 PRs:

@mdespriee mdespriee closed this Oct 30, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-work-in-progress PR is still work in progress Scala
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants