Add MTGATConv (Paper: MTAG, NAACL 2021) - new GATConv version with node types, edge types and parallelism #3129
-
You are correct regarding your statement about parallelism. Parallelism can only be achieved when different node types share the same feature dimensionality, which may not always be the case for input features in heterogeneous graphs. Therefore, I'm super happy to take a PR for this one. I'm nonetheless not sure if `node_type_specific_lin_l = self.lin_l[x_type]` (shape `(num_nodes, in_channels, heads * out_channels)`) will materialize a really huge tensor.
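For reference, a tiny sketch of the memory trade-off (all shapes and names here are made up for illustration, not taken from the MTAG code): indexing the stacked weights by `x_type` gathers one full weight matrix per node, whereas looping over node types only ever touches a single `(in_channels, heads * out_channels)` matrix at a time.

```python
import torch

num_nodes, in_channels, heads, out_channels, num_node_types = 10_000, 64, 4, 32, 3

x = torch.randn(num_nodes, in_channels)
x_type = torch.randint(0, num_node_types, (num_nodes,))

# Per-node-type weights, shape (num_node_types, in_channels, heads * out_channels).
lin_l = torch.randn(num_node_types, in_channels, heads * out_channels)

# Fully parallel: indexing by node gathers one weight matrix per node,
# i.e. a (num_nodes, in_channels, heads * out_channels) intermediate.
w_per_node = lin_l[x_type]
out_parallel = torch.bmm(x.unsqueeze(1), w_per_node).squeeze(1)

# Loop over node types instead: only one (in_channels, heads * out_channels)
# matrix is used at a time, at the cost of a short Python loop.
out_loop = x.new_empty(num_nodes, heads * out_channels)
for t in range(num_node_types):
    mask = x_type == t
    out_loop[mask] = x[mask] @ lin_l[t]

assert torch.allclose(out_parallel, out_loop, atol=1e-4)
```

Even with these toy sizes, the gathered tensor already holds roughly 80M floats (~320 MB in float32), while the looped variant never allocates it.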
-
Glad to hear! In this case, I will work on a PR to merge this. For the node transformation, I agree with your suggestion to use a Dict. Now that you mention the materialization problem, I just realized that the line that selects each edge's alpha, `alpha = alpha.reshape(-1, num_heads).index_select(dim=0, index=(torch.arange(0, num_edges).to(alpha.device) * num_edge_types + edge_type))`, would also cause high GPU memory consumption, since it computes attention for every edge type and only then discards all but one per edge. But maybe this is unavoidable if we want to at least have parallelism across the edges? The alternative is to also use a Dict for the edge types, similar to how it is done in `hetero_conv`.

So in short, I think there are three solutions:

1. Keep everything parallel across both node types and edge types, accepting the large intermediate tensors described above.
2. Use a Dict for the node-type-specific transformation (looping over node types), while keeping the edge-level attention parallel via `edge_type`.
3. Use Dicts for both node types and edge types, looping over every relation as `hetero_conv` does.

I am leaning toward solution 2, as there are usually fewer node types than edge types, so the slower run time might be acceptable. Another consideration is that solution 2 might add a slight burden on users when they construct `edge_index` and `x_dict`, as they need to group the nodes by type while keeping the indexing of `edge_index` global across nodes of all types. What are your thoughts? Thanks!
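To make what I mean by solution 2 a bit more concrete, here is a rough, self-contained sketch (all names, sizes, and the specific attention form are placeholders, not the final PR code): the projection loops over `x_dict`, while the attention stays vectorized over edges by gathering the per-edge-type parameters with `edge_type`.

```python
import torch

heads, out_channels, num_edge_types = 4, 8, 5

# (a) Node projection: loop over the few node types, but write the results
#     into one globally indexed feature matrix so edge_index can stay global.
x_dict = {
    'video': torch.randn(100, 35),
    'audio': torch.randn(120, 74),
    'text':  torch.randn(80, 300),
}
lin_dict = {t: torch.nn.Linear(x.size(-1), heads * out_channels, bias=False)
            for t, x in x_dict.items()}
offsets = {'video': 0, 'audio': 100, 'text': 220}   # global node numbering
num_nodes = 300

h = torch.zeros(num_nodes, heads, out_channels)
for t, x in x_dict.items():
    h[offsets[t]:offsets[t] + x.size(0)] = lin_dict[t](x).view(-1, heads, out_channels)

# (b) Edge attention: stay parallel over all edges by gathering the
#     per-edge-type attention vector with edge_type, instead of looping
#     over relations or computing alpha for every edge type.
num_edges = 2000
edge_index = torch.randint(0, num_nodes, (2, num_edges))
edge_type = torch.randint(0, num_edge_types, (num_edges,))
att = torch.randn(num_edge_types, heads, out_channels)

src, dst = edge_index
alpha = ((h[src] + h[dst]) * att[edge_type]).sum(dim=-1)   # (num_edges, heads)
```

Gathering `att[edge_type]` before the reduction keeps the intermediate at `num_edges × heads × out_channels` rather than growing with `num_edge_types` as in the `reshape`/`index_select` formulation above, though I haven't checked whether this matches MTGATConv's exact attention computation.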
-
Hi, I am the author of the NAACL 2021 paper: "MTAG: Modal-Temporal Attention Graph for Unaligned Human Multimodal Language Sequences".
Paper: https://aclanthology.org/2021.naacl-main.79/
GitHub: /~https://github.com/jedyang97/MTAG
Let me first say, PyG is awesome and we love to see it grow in users and reputation.
One of our main contributions in the paper was that we designed and implemented a new graph convolution called `MTGATConv`. I am wondering if it would be helpful to contribute the `MTGATConv` operation (implemented here) to the public PyG repo?

In short, `MTGATConv` is a version of `GATConv` that uses distinct learnable parameters for different node types and edge types. Compared to the vanilla `GATConv`, `MTGATConv` takes the following additional arguments: `num_node_types: int` and `num_edge_types: int` in `__init__()`, and `x_type: Tensor` and `edge_type: Tensor` in `forward()`. This could be useful for heterogeneous graphs where each node type and edge type should be treated differently. In our paper, we were dealing with multimodal sequence data (video) and had success in improving sentiment analysis/emotion classification performance.
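To give a clearer picture without opening the linked repo, the interface looks roughly like the following (a simplified sketch rather than the actual MTAG code; the attention form here, a LeakyReLU on a per-edge-type dot product followed by a neighborhood softmax, is my abbreviation):

```python
import torch
import torch.nn.functional as F
from torch import Tensor
from torch_geometric.nn.conv import MessagePassing
from torch_geometric.utils import softmax

class MTGATConvSketch(MessagePassing):
    """Rough skeleton of the proposed interface; not the code from the MTAG repo."""
    def __init__(self, in_channels: int, out_channels: int, heads: int = 1,
                 num_node_types: int = 1, num_edge_types: int = 1):
        super().__init__(aggr='add', node_dim=0)
        self.heads, self.out_channels = heads, out_channels
        # One projection per node type and one attention vector per edge type.
        self.lin = torch.nn.Parameter(
            torch.empty(num_node_types, in_channels, heads * out_channels))
        self.att = torch.nn.Parameter(
            torch.empty(num_edge_types, heads, out_channels))
        torch.nn.init.xavier_uniform_(self.lin)
        torch.nn.init.xavier_uniform_(self.att)

    def forward(self, x: Tensor, edge_index: Tensor,
                x_type: Tensor, edge_type: Tensor) -> Tensor:
        # x: (N, in_channels), x_type: (N,), edge_index: (2, E), edge_type: (E,)
        h = torch.einsum('ni,nio->no', x, self.lin[x_type])  # type-specific projection
        h = h.view(-1, self.heads, self.out_channels)
        out = self.propagate(edge_index, x=h, edge_type=edge_type)
        return out.reshape(-1, self.heads * self.out_channels)

    def message(self, x_j: Tensor, edge_type: Tensor, index: Tensor) -> Tensor:
        # Type-specific attention, normalized over each target node's neighborhood.
        alpha = F.leaky_relu((x_j * self.att[edge_type]).sum(dim=-1))
        alpha = softmax(alpha, index)
        return x_j * alpha.unsqueeze(-1)
```

Usage is then a single call, `conv(x, edge_index, x_type, edge_type)`, over one concatenated node feature matrix and one global `edge_index`.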
I also took a look at the implementation of `hetero_conv` in PyG 2.0 (again, this is a super exciting new feature!). This operation kind of subsumes the `MTGATConv` operation if you define multiple `(src_type, edge_type, dst_type)` relations and use `GATConv` as the per-relation operation. However, looking at the implementation here, one major difference is that it uses a for-loop to process each `(src_type, edge_type, dst_type)` sub-graph, which may break parallelism, whereas the `MTGATConv` implementation linked above does this in a parallel manner. I am not sure if this statement about parallelism is true, as I haven't spent too long reading the PyG 2.0 implementation; a schematic of the looped pattern I am referring to is shown below.
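For concreteness, the per-relation pattern looks roughly like this (a schematic illustration of the looped approach with made-up node types and sizes, not the actual `hetero_conv` source):

```python
import torch
from torch_geometric.nn import GATConv

# One GATConv per (src_type, edge_type, dst_type) relation, applied in a loop.
relations = [('video', 'v2a', 'audio'), ('audio', 'a2t', 'text')]
convs = {rel: GATConv((-1, -1), 32, heads=4, add_self_loops=False)
         for rel in relations}

def looped_forward(x_dict, edge_index_dict):
    out_dict = {}
    for rel, edge_index in edge_index_dict.items():
        src, _, dst = rel
        # Each relation is processed sequentially on its own bipartite sub-graph.
        out = convs[rel]((x_dict[src], x_dict[dst]), edge_index)
        out_dict[dst] = out if dst not in out_dict else out_dict[dst] + out
    return out_dict

x_dict = {'video': torch.randn(100, 35),
          'audio': torch.randn(120, 74),
          'text':  torch.randn(80, 300)}
edge_index_dict = {
    ('video', 'v2a', 'audio'): torch.stack([torch.randint(0, 100, (500,)),
                                            torch.randint(0, 120, (500,))]),
    ('audio', 'a2t', 'text'):  torch.stack([torch.randint(0, 120, (400,)),
                                            torch.randint(0, 80, (400,))]),
}
out_dict = looped_forward(x_dict, edge_index_dict)
```

Each relation here runs on its own sub-graph one after another, which is the serialization concern above, whereas the `MTGATConv` sketch earlier performs a single `propagate` over all edges at once.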
Another consideration is how this `MTGATConv` implementation would fit into the heterogeneous graph design pattern provided by PyG 2.0 - I would like to hear your architectural opinions about it!

If this is something that you think would be worth adding to the repo, please let me know if you have any suggestions for the current implementation. Thanks!