
[Roadmap] v0.2 release checklist #302

Closed · 26 tasks done
jermainewang opened this issue Dec 12, 2018 · 31 comments

@jermainewang (Member) commented Dec 12, 2018

Thanks, everyone, for the hard work. We did a lot for a smooth beta release. With the repo now open and more community help incoming, it is a good time to figure out the roadmap to the v0.2 release. Here is a draft proposal; feel free to reply, comment, and discuss. The list is long, but we can figure out the priorities later. We'd like to hear your opinions and push DGL to the next stage.

Model examples

Core system improvement

Tutorial/Blog

Project improvement

Deferred goals

(will not be included in this release unless someone takes over)

  • PinSage (@BarclayII)
  • More datasets:
    • Social networks: Reddit
    • Recommendation: Amazon product
    • Knowledge graphs: YAGO, Freebase
    • Web page graph: CommonCrawl
  • TensorFlow backend
  • Kernel support for max/min reducers.
  • Kernel support for vector-shaped edge features in src_mul_edge.
  • Kernel support for sparse src_mul_dst.
  • Improve scheduling.
    • Friendlier error messages and easier debugging.
    • Other scheduling strategies: degree padding.
    • Optimize schedulers for pull.
    • Cache the scheduling results for static graphs to improve performance.
  • PyTorch: Improve SPMM using coalesced indices.
  • MXNet: Support COO format.
  • MXNet: Speed up the conversions from COO to CSR and from NumPy to CSR.
  • MXNet: Support Gluon hybridization and optimize the computation graph for speed.
  • Distributed training
    • Simple RPC component (@aksnzhy)
    • Distributed sampling (@aksnzhy)
    • Simple KVStore for node embeddings
@BarclayII (Collaborator) commented Dec 12, 2018

My two cents.

Models:

Core system improvement:

  • Multigraph support: intelligently and efficiently check for duplicate edges on simple graphs, or whether the graph is a multigraph. Right now the burden is entirely on the users.
  • Fused operators: sparse softmax (nicer for GAT/transformer/etc.)
  • Group-applying on the outbound or inbound edges of the same node (nicer for Capsule). We may need built-in functions such as group-softmax for this case.
  • Specialization support: add complete graphs.
  • Think about possible specialization support for "combinations of regular graph components", such as the graph in transformer networks (a complete graph and a half-complete graph combined with a bipartite graph). We don't have to work on this in 0.2 if it sounds too complicated.
  • [EDIT] Node/edge removal support.

Project Improvement:

  • Linting using flake8
  • Type-checking using mypy (it supports comment-style type annotations that work with Python 2)

Others:

  • See how far symbolic computation graphs can go in general (relevant to TensorFlow and MXNet)

@zheng-da (Collaborator)

I think we should support an operator that computes dot(X1, X2.T) * adj. This is more general than SpMV, and once we generalize "multiply" and "add", it becomes more general than generalized SpMV. I think it's useful for the transformer.
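
A minimal dense sketch of what such an operator computes (names and shapes are illustrative only; a real kernel would iterate over the nonzeros of adj instead of materializing the dense score matrix):

```python
import torch

def masked_pairwise_dot(X1, X2, adj):
    # X1: (N, D) source features, X2: (M, D) destination features,
    # adj: (N, M) 0/1 adjacency mask. Keep dot(x1_i, x2_j) only where
    # edge (i, j) exists -- the SDDMM-style pattern behind src_mul_dst.
    scores = X1 @ X2.t()   # dense (N, M) pairwise dot products
    return scores * adj    # zero out non-edges

X1, X2 = torch.randn(4, 8), torch.randn(5, 8)
adj = (torch.rand(4, 5) < 0.3).float()
out = masked_pairwise_dot(X1, X2, adj)
```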

@zheng-da (Collaborator)

BTW, any action item for accelerating GAT?

@jermainewang (Member, Author) commented Dec 13, 2018

I think we should support an operator that computes dot(X1, X2.T) * adj. This is more general than SpMV, and once we generalize "multiply" and "add", it becomes more general than generalized SpMV. I think it's useful for the transformer.

@zheng-da Is it similar to "sparse src_mul_dst"?

@zheng-da (Collaborator) commented Dec 13, 2018

I see what you mean by src_mul_dst. I think so. We can use this form of operation to accelerate other models such as GAT (actually, any model that uses both source and destination vertices in the edge computation).

How are we going to implement these operators: in DGL or in the backend? If we implement them in DGL, how do we support async computation in MXNet?

@BarclayII (Collaborator)

BTW, any action item for accelerating GAT?

That would be the sparse softmax I proposed.

How are we going to implement these operators: in DGL or in the backend? If we implement them in DGL, how do we support async computation in MXNet?

It seems that PyTorch operators can be implemented externally (/~https://github.com/rusty1s/pytorch_scatter), so putting them in the DGL repo should be fine.

I don't know if/how external operators can hook into MXNet; should we compile MXNet from source? Also, I guess MXNet could implement these operators in its own repo regardless, since having these sparse operators should always be beneficial?

@jermainewang (Member, Author)

In terms of implementation, it's better for them to live in DGL so they can be used with every framework. In general, we should follow each framework's guidance on implementing custom operators (such as this guide in PyTorch). We should avoid dependencies on the frameworks' C++ libraries. This leaves us a few choices, including:
(1) Use a Python extension, such as https://mxnet.incubator.apache.org/tutorials/gluon/customop.html .
(2) Use a dynamic library, such as https://pytorch.org/docs/stable/cpp_extension.html . We don't know about MXNet's solution yet, but we should investigate.

In terms of async, is MXNet's CustomOp async or not?
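
A minimal sketch of the Python-extension route (1) in PyTorch terms, using torch.autograd.Function with hypothetical names; a C++ extension via route (2) would expose the same forward/backward structure:

```python
import torch

class SegmentSum(torch.autograd.Function):
    # Hypothetical custom op: sum edge features into their destination nodes.
    @staticmethod
    def forward(ctx, edge_feat, dst, num_nodes):
        # edge_feat: (E, D) edge features; dst: (E,) destination node ids.
        ctx.save_for_backward(dst)
        out = edge_feat.new_zeros(num_nodes, edge_feat.shape[1])
        out.index_add_(0, dst, edge_feat)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        (dst,) = ctx.saved_tensors
        # Gradient of a sum: gather the node gradient back to each edge.
        return grad_out.index_select(0, dst), None, None

edge_feat = torch.randn(5, 4, requires_grad=True)
dst = torch.tensor([0, 0, 1, 2, 2])
node_feat = SegmentSum.apply(edge_feat, dst, 3)
node_feat.sum().backward()
```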

@VoVAllen (Collaborator)

Is there any plan for a group_apply_edges API? I think this would be useful since we cannot do out-edge reduction at the current stage.

@zheng-da (Collaborator)

Previously, we discussed caching the results from the schedulers to avoid the expensive scheduling. I just realized that there are a lot of data copies from CPU to GPU during computation, even though we have already copied all the data in Frame to the GPU. The copies occur on Index (I suppose Index is always created on the CPU first). Caching the scheduling result can also help avoid these CPU-to-GPU copies.

@jermainewang (Member, Author)

@zheng-da, agreed. This should be put on the roadmap.

Is there any plan for a group_apply_edges API? I think this would be useful since we cannot do out-edge reduction at the current stage.

This is somewhat related to the sparse softmax proposed by @BarclayII. In my mind, there are two levels. The lower level is group_apply_edges, which can operate on both out-edges and in-edges. Built on top of it is the "sparse edge softmax" module that is widely used in many models. Agreed, this should be put on our roadmap.

@BarclayII (Collaborator)

I assume we also need a "sparse softmax" kernel (similar to TF's)? What I was thinking is to have group_apply accept a node UDF with incoming/outgoing edges (similar to the ones for reduce functions). sparse_softmax could be one such built-in UDF.
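
As a rough illustration of such a built-in, an edge softmax grouped by destination node could look like the sketch below (illustrative only, not DGL's actual API; a production kernel would also subtract a per-group max for numerical stability):

```python
import torch

def edge_softmax(scores, dst, num_nodes):
    # scores: (E,) unnormalized edge scores; dst: (E,) destination node ids.
    # Normalizes the scores over the incoming edges of each node.
    exp = scores.exp()
    denom = exp.new_zeros(num_nodes).index_add_(0, dst, exp)  # per-node sums
    return exp / denom[dst]

scores = torch.randn(5)
dst = torch.tensor([0, 0, 1, 2, 2])
alpha = edge_softmax(scores, dst, num_nodes=3)  # attention weights as in GAT
```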

@zheng-da (Collaborator)

We should add more MXNet tutorials to the website.

@zheng-da (Collaborator)

In terms of implementing the new operators, CustomOp in MXNet might not be a good way; it's usually very slow. For performance, it's still best to implement them in the backend frameworks directly. At least we can do that in MXNet. Not sure about PyTorch.

@jermainewang (Member, Author)

Do you know why it is slow? It might be a good chance to improve that part. Also, we need to benchmark PyTorch's custom op to see how much overhead it has. We should try our best to have them in DGL; otherwise, it will be really difficult to maintain them in every framework.

@zheng-da (Collaborator)

It calls Python code from C code. Because the operator is implemented in Python, its expressiveness is limited. Implementing sparse softmax efficiently in Python is hard.

@eric-haibin-lin (Member)

For sparse softmax, I created a feature request in the MXNet repo: apache/mxnet#12729

@VoVAllen (Collaborator) commented Dec 16, 2018

Minor suggestion for project improvement: switch from nose to pytest for unit tests, mainly for two reasons:

  • pytest has test-coverage reporting, which is useful for avoiding bugs (see the sketch below), and it is compatible with nose-style tests.
  • nose has been deprecated since 2016. It has a successor, nose2, but it seems more people choose pytest.
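
As a rough illustration of the pytest style (the file and test names here are hypothetical, not from the DGL test suite): pytest collects any `test_*` function automatically, and the coverage report comes from the pytest-cov plugin, e.g. `pytest --cov`.

```python
# test_graph_basic.py -- hypothetical module; pytest discovers functions
# named test_* automatically, no TestCase class required.
import networkx as nx

def test_star_graph_counts():
    g = nx.star_graph(4)  # one hub node connected to 4 leaves
    assert g.number_of_nodes() == 5
    assert g.number_of_edges() == 4
```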

@AIwem commented Dec 21, 2018

graph_nets?

@jermainewang (Member, Author)

graph_nets?

@AIwem, could you elaborate?

@AIwem commented Dec 22, 2018

@jermainewang Have you looked at the ideas in graph_nets? Some of their solutions seem to be good!

@jermainewang (Member, Author)

We did some investigation of graph_nets and found that DGL can cover all the models in graph_nets. Maybe we missed something; could you point it out?

@ghost commented Dec 29, 2018

Can node2vec with side information be trained on DGL? node2vec uses random walks to generate the sequences.

Will GraphRNN be added to DGL in the future? It performs better on large datasets.

@jermainewang (Member, Author)

Hi @Huangzhanpeng, thank you for the suggestion. It would be great if you could help contribute node2vec and GraphRNN to DGL. From my understanding, the random walk can be done in networkx first and then used in DGL. GraphRNN is similar to the DGMG model (see our tutorials here) in that it is a generative model trained on a sequence of nodes/edges. I guess there will be many shared building blocks between the two.
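
A minimal sketch of the networkx-first approach (a plain uniform random walk; node2vec's biased walk with its p/q parameters would need extra bookkeeping on the previous step):

```python
import random
import networkx as nx

def random_walk(g, start, length):
    # Uniform random walk of `length` steps starting at `start`.
    walk = [start]
    for _ in range(length):
        neighbors = list(g.neighbors(walk[-1]))
        if not neighbors:      # dead end: stop early
            break
        walk.append(random.choice(neighbors))
    return walk

g = nx.karate_club_graph()
walks = [random_walk(g, n, length=10) for n in g.nodes()]
# These node sequences could feed a skip-gram model, while the same graph
# can be loaded into DGL for message-passing models.
```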

@ghost commented Jan 1, 2019

@jermainewang Thank you for your response. In my actual work, node2vec's random walk on networkx is not feasible with large-scale data. If there is time, I really want to try to implement GraphRNN in DGL.

@jermainewang (Member, Author)

@Huangzhanpeng There is always time :). Please go ahead. If you encounter any problems during the implementation, feel free to raise questions on https://discuss.dgl.ai. The team is very responsive. About the random walk, @BarclayII is surveying common random walk algorithms, and we might include APIs for them in our next release.

jermainewang changed the title from "[Roadmap] v0.2 release" to "[Roadmap] v0.2 release checklist" on Feb 18, 2019
@jermainewang (Member, Author)

Just updated the roadmap with a checklist. Our tentative date for this release is this month (02/28).

For all committers @zheng-da @szha @BarclayII @VoVAllen @ylfdq1118 @yzh119 @GaiYu0 @mufeili @aksnzhy @zzhang-cn @ZiyueHuang , please vote +1 if you agree with this plan.

@BarclayII (Collaborator)

I would rather reply with an emoji; a +1 reply would pollute the thread.

szha pinned this issue on Feb 19, 2019
@jermainewang (Member, Author)

The release plan passed the vote.

@lgalke commented Mar 7, 2019

May I kindly ask whether there is an updated tentative date for the 0.2 release? I'm desperately waiting for some features and unfortunately cannot build DGL from source on the server. Thanks for your efforts!

@jermainewang (Member, Author)

@lgalke Thanks for asking. Our release is delayed by a week due to some performance issues found recently. We are waiting for the final PR (#434) to be merged, so you can expect a new release in two days! It's our first major release after open-sourcing, so we are still adapting to the release process. Thank you for your patience.

@jermainewang (Member, Author)

v0.2 has just been officially released. Thanks everyone for the support!
