Add MTGATConv (Paper: MTAG, NAACL 2021) - new GATConv version with node types, edge types and parallelism #3129

jedyang97 · 2021-09-14T03:43:58Z

jedyang97
Sep 14, 2021

Hi, I am the author of the NAACL 2021 paper: "MTAG: Modal-Temporal Attention Graph for Unaligned Human Multimodal Language Sequences".

Paper: https://aclanthology.org/2021.naacl-main.79/
GitHub: /~https://github.com/jedyang97/MTAG

Let me first say, PyG is awesome and we love to see it grows in users and reputation.

One of our main contribution in the paper was that we designed and implemented a new graph convolution called MTGATConv. I am wondering if it would be helpful to contribute the MTGATConv operation (implemented here) to the public PyG repo?

In short, this MTGATConv operation is a version of GATConv that uses distinct learnable parameters for different node types and edge types. Compared to the vanilla GATConv, MTGATConv takes the following additional arguments: num_node_types: int, num_edge_types: int in __init__()and x_type: Tensor, edge_type: Tensor in forward(). This could be useful for heterogeneous graphs where each node type and edge type should be treated differently. In our paper, we were dealing with multimodal sequence data (video) and had success in improving sentiment analysis/emotion classification performance.

I also took a look at the implementation of hetero_conv PyG 2.0 (Again, this is a super exciting new feature!). This operation kind of subsume the MTGATConv operation, if you define multiple (src_type, edge_type, dst_type) and use GATConv as the operation. However, looking at the implementation here, one major difference is that this would use a for-loop to process each (src_type, edge_type, dst_type) sub-graph, which may break parallelism; whereas the MTGATConv implementation linked above would do this in a parallel manner. I am not sure if this statement about parallelism is true as I haven't spent too long reading the PyG 2.0 implementation. Another consideration is how would this MTGATConv implementation fit into the Heterogeneous graph design pattern provided by PyG 2.0 - I would like to hear your architectural opinions about it!

If this is something that you think would be worth adding to the repo, please let me know if you have any suggestions for the current implementation. Thanks!

rusty1s · 2021-09-14T06:42:35Z

rusty1s
Sep 14, 2021
Maintainer

You are correct regarding your statement about parallelism. Parallelism can only be achieved in case different node types share the same feature dimensionality, which may not always be the case for input features in heterogeneous graphs. Furthermore, one of the main benefits of HeteroConv is to utilize different GNN operators for different edge types.

Therefore, I'm super happy to take a PR for this one. I'm nonetheless not sure if x_type is desperately needed, as you can also pass in a dictionary of node features and transform it into a unified feature matrix sequentially. This is a memory/runtime trade-off, as the line

node_type_specific_lin_l = self.lin_l[x_type]  # shape (num_nodes, in_channels, heads * out_channels)

will materialize a really huge tensor.

0 replies

jedyang97 · 2021-09-14T13:22:50Z

jedyang97
Sep 14, 2021
Author

Glad to hear! In this case, I will work on a PR to merge this.

For the node's transformation, I agree with your suggestion to use a Dict. The only thing is that perhaps we need an OrderedDict, because later we probably need to gather back nodes of all types into one tensor since the edge_index is indexing over all node types.

Now that you mention the materialization problem, I just realized that this line that is selecting the alpha for each edge to use would also cause a high GPU mem consumption:

alpha = alpha.reshape(-1, num_heads).index_select(dim=0, index=(torch.arange(0, num_edges).to(alpha.device) * num_edge_types + edge_type))

But maybe this is unavoidable if we want to at least have parallelism across the edges? The alternative is to also use a Dict for the edge types, which is like how it is done in hetero_conv, but that uses for-loop.

So in short, I think there are three solutions:

Use one tensor for nodes, use one tensor for edge_types (current implementation, fast run time, high mem comsumption)
Use OrderedDict for nodes, use one tensor for edge_types (not yet implemented, the slightly slower run time (due to node-transforming being sequential), lower mem comsumption)
Use Dict for nodes, use Dict for edge_types (implemented in hetero_conv, even slower run time (due to node-transforming and edge-processing being sequential), even lower mem consumption)

I am leaning toward solution 2, as usually there are fewer node types compared to edge types, so the slower run time might be acceptable. Another consideration is that solution 2 might add a slight burden on the users when they are constructing the edge_index and x_dict, as they need to group the nodes while also keep the indexing of edge_index global across nodes of all types.

What are your thoughts? Thanks!

1 reply

rusty1s Sep 15, 2021
Maintainer

I think the alpha.index_select call is reasonable, as the number of heads is usually small. You are right in your observation that version 2 has a conflict in input arguments. Mh, I think we should go with version 1 one first, or alternatively use version 3 in which we merge dictionaries into single tensors first.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add MTGATConv (Paper: MTAG, NAACL 2021) - new GATConv version with node types, edge types and parallelism #3129

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments 1 reply

{{title}}

{{title}}

{{title}}

Select a reply

Add MTGATConv (Paper: MTAG, NAACL 2021) - new GATConv version with node types, edge types and parallelism #3129

jedyang97 Sep 14, 2021

Replies: 2 comments · 1 reply

rusty1s Sep 14, 2021 Maintainer

jedyang97 Sep 14, 2021 Author

rusty1s Sep 15, 2021 Maintainer

jedyang97
Sep 14, 2021

Replies: 2 comments 1 reply

rusty1s
Sep 14, 2021
Maintainer

jedyang97
Sep 14, 2021
Author

rusty1s Sep 15, 2021
Maintainer