PyTorch version of GraphZoo.
Facilitating learning, using, and designing graph processing pipelines/models systematically.
We present GraphZoo, a novel framework that makes learning, using, and designing graph processing pipelines/models systematic by abstracting over their redundant components. The framework contains a powerful library that supports several hyperbolic manifolds and an easy-to-use modular framework for graph processing tasks. It helps researchers (i) reproduce evaluation pipelines of state-of-the-art approaches, (ii) design new hyperbolic or Euclidean graph networks and compare them against state-of-the-art approaches on standard benchmarks, (iii) add custom datasets for evaluation, and (iv) add new tasks and evaluation criteria. For more details, check out our paper and poster.
git clone https://github.com/AnoushkaVyas/GraphZoo.git
cd GraphZoo
python setup.py install
pip install graphzoo
To train a Hyperbolic Graph Convolutional Network (HGCN) model for the node classification task on the Cora dataset, use the GraphZoo APIs together with the customized loss functions and evaluation metrics for this task.
Download data:
import graphzoo as gz
import torch
from graphzoo.config import parser
params = parser.parse_args(args=[])
params.download_folder = './data/'
gz.dataloader.download_and_extract(params)
Prepare input data:
params.dataset='cora'
params.task='nc'
params.datapath='data/cora'
data = gz.dataloader.DataLoader(params)
Initialize the model and fine-tune the hyperparameters:
params.model='HGCN'
params.manifold='PoincareBall'
params.dim=128
model= gz.models.NCModel(params)
Trainer is used to control the training flow:
optimizer = gz.optimizers.RiemannianAdam(
    params=model.parameters(),
    lr=params.lr,
    weight_decay=params.weight_decay,
)
trainer = gz.trainers.Trainer(params, model, optimizer, data)
trainer.run()
trainer.evaluate()
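The `epochs`, `patience`, and `min-epochs` flags documented for the Trainer determine when training ends. The following is a minimal, library-independent sketch of how such an early-stopping rule can work; the function names are hypothetical and the exact comparison GraphZoo performs internally is an assumption, not taken from its source.

```python
def should_stop(epoch, epochs_without_improvement, patience=100, min_epochs=100):
    """Stop only once patience is exhausted and min_epochs have elapsed."""
    return epochs_without_improvement >= patience and epoch >= min_epochs


def run_with_early_stopping(val_scores, patience=100, min_epochs=100):
    """Return (best_score, last_epoch) for a sequence of per-epoch validation scores."""
    best_score = float("-inf")
    stale = 0          # epochs since the validation score last improved
    last_epoch = -1
    for epoch, score in enumerate(val_scores):
        last_epoch = epoch
        if score > best_score:
            best_score, stale = score, 0
        else:
            stale += 1
        if should_stop(epoch, stale, patience, min_epochs):
            break
    return best_score, last_epoch


# Toy validation curve that peaks at epoch 2 and then degrades.
scores = [0.50, 0.60, 0.70, 0.69, 0.68, 0.67, 0.66, 0.65]
best, stopped_at = run_with_early_stopping(scores, patience=3, min_epochs=2)
print(best, stopped_at)
```

Raising `min_epochs` delays the stop even when `patience` has already run out, which is useful when early epochs are noisy.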
To train a Hyperbolic Graph Convolutional Network (HGCN) model for the node classification task on the Cora dataset from the command line:
cd GraphZoo
python graphzoo/trainers/train.py --task nc --dataset cora --datapath ./data/cora --download_folder ./data/ --model HGCN --lr 0.01 --dim 16 --num-layers 2 --act relu --bias 1 --dropout 0.5 --weight-decay 0.001 --manifold PoincareBall --log-freq 5 --cuda 0 --c None
Various flags can be modified in the graphzoo.config module by the user.
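The flags in this document are listed as `'name': (default, 'description')` pairs, and the command line example uses the same names with a `--` prefix. As a rough illustration of how such a table can be turned into command line flags, here is a self-contained sketch using only the standard library. `EXAMPLE_FLAGS`, `none_or`, and `build_parser` are hypothetical names for this example and are not part of the GraphZoo API.

```python
import argparse

# Hypothetical excerpt of a flag table in the (default, description) style;
# the real table lives in graphzoo.config.
EXAMPLE_FLAGS = {
    "lr": (0.05, "initial learning rate (type: float)"),
    "num-layers": (2, "number of hidden layers in encoder (type: int)"),
    "model": ("HGCN", "which encoder to use (type: str)"),
    "c": (1.0, "hyperbolic radius, set to None for trainable curvature (type: float)"),
}


def none_or(cast):
    """Return a converter that maps the string 'None' to None, else applies cast."""
    def convert(text):
        return None if text == "None" else cast(text)
    return convert


def build_parser(flags):
    """Create an argparse parser with one --flag per table entry."""
    parser = argparse.ArgumentParser()
    for name, (default, description) in flags.items():
        # Infer the value type from the default; fall back to str for None.
        cast = type(default) if default is not None else str
        parser.add_argument(f"--{name}", type=none_or(cast),
                            default=default, help=description)
    return parser


# Override two flags; argparse exposes "num-layers" as params.num_layers.
params = build_parser(EXAMPLE_FLAGS).parse_args(["--lr", "0.01", "--c", "None"])
print(params.lr, params.num_layers, params.model, params.c)
```

Passing `args=[]`, as in the Python example above, yields an object holding only the defaults, which can then be overridden attribute by attribute.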
"""
GraphZoo Download and Extract
Input Parameters
----------
'dataset': (None, 'which dataset to use, can be any of [cora, pubmed, airport, disease_nc, disease_lp] (type: str)')
'download_folder': (None, 'path to the folder for raw data (type: str)')
API Input Parameters
----------
args: list of above defined input parameters from `graphzoo.config`
"""
"""
GraphZoo Dataloader
Input Parameters
----------
'dataset': ('cora', 'which dataset to use, can be any of [cora, pubmed, airport, disease_nc, disease_lp, ppi, citeseer, webkb] (type: str)'),
'datapath': (None, 'path to raw data (type: str)'),
'val-prop': (0.05, 'proportion of validation edges for link prediction (type: float)'),
'test-prop': (0.1, 'proportion of test edges for link prediction (type: float)'),
'use-feats': (1, 'whether to use node features (1) or not (0 in case of Shallow methods) (type: int)'),
'normalize-feats': (1, 'whether to normalize input node features (1) or not (0) (type: int)'),
'normalize-adj': (1, 'whether to row-normalize the adjacency matrix (1) or not (0) (type: int)'),
'split-seed': (1234, 'seed for data splits (train/test/val) (type: int)')
API Input Parameters
----------
args: list of above defined input parameters from `graphzoo.config`
"""
"""
Base model for graph embedding tasks
Input Parameters
----------
'task': ('nc', 'which tasks to train on, can be any of [lp, nc] (type: str)'),
'model': ('HGCN', 'which encoder to use, can be any of [Shallow, MLP, HNN, GCN, GAT, HGCN,HGAT] (type: str)'),
'dim': (128, 'embedding dimension (type: int)'),
'manifold': ('PoincareBall', 'which manifold to use, can be any of [Euclidean, Hyperboloid, PoincareBall] (type: str)'),
'c': (1.0, 'hyperbolic radius, set to None for trainable curvature (type: float)'),
'r': (2.0, 'fermi-dirac decoder parameter for lp (type: float)'),
't': (1.0, 'fermi-dirac decoder parameter for lp (type: float)'),
'pretrained-embeddings': (None, 'path to pretrained embeddings (.npy file) for Shallow node classification (type: str)'),
'num-layers': (2, 'number of hidden layers in encoder (type: int)'),
'bias': (1, 'whether to use bias (1) or not (0) (type: int)'),
'act': ('relu', 'which activation function to use or None for no activation (type: str)'),
'n-heads': (4, 'number of attention heads for graph attention networks, must be a divisor of dim (type: int)'),
'alpha': (0.2, 'alpha for leakyrelu in graph attention networks (type: float)'),
'use-att': (0, 'whether to use hyperbolic attention (1) or not (0) (type: int)'),
'local-agg': (0, 'whether to use local tangent space aggregation (1) or not (0) (type: int)')
API Input Parameters
----------
args: list of above defined input parameters from `graphzoo.config`
"""
"""
GraphZoo Trainer
Input Parameters
----------
'lr': (0.05, 'initial learning rate (type: float)'),
'dropout': (0.0, 'dropout probability (type: float)'),
'cuda': (-1, 'which cuda device to use or -1 for cpu training (type: int)'),
'repeat': (10, 'number of times to repeat the experiment (type: int)'),
'optimizer': ('Adam', 'which optimizer to use, can be any of [Adam, RiemannianAdam, RiemannianSGD] (type: str)'),
'epochs': (5000, 'maximum number of epochs to train for (type: int)'),
'weight-decay': (0.0, 'l2 regularization strength (type: float)'),
'momentum': (0.999, 'momentum in optimizer (type: float)'),
'patience': (100, 'patience for early stopping (type: int)'),
'seed': (1234, 'seed for training (type: int)'),
'log-freq': (5, 'how often to print train/val metrics in epochs (type: int)'),
'eval-freq': (1, 'how often to compute val metrics in epochs (type: int)'),
'save': (0, '1 to save model and logs and 0 otherwise (type: int)'),
'save-dir': (None, 'path to save training logs and model weights (type: str)'),
'lr-reduce-freq': (None, 'reduce lr every lr-reduce-freq or None to keep lr constant (type: int)'),
'gamma': (0.5, 'gamma for lr scheduler (type: float)'),
'grad-clip': (None, 'max norm for gradient clipping, or None for no gradient clipping (type: float)'),
'min-epochs': (100, 'do not early stop before min-epochs (type: int)'),
'betas': ((0.9, 0.999), 'coefficients used for computing running averages of gradient and its square (type: Tuple[float, float])'),
'eps': (1e-8, 'term added to the denominator to improve numerical stability (type: float)'),
'amsgrad': (False, 'whether to use the AMSGrad variant of this algorithm from the paper `On the Convergence of Adam and Beyond` (type: bool)'),
'stabilize': (None, 'stabilize parameters if they are off-manifold due to numerical reasons every ``stabilize`` steps (type: int)'),
'dampening': (0, 'dampening for momentum (type: float)'),
'nesterov': (False, 'enables Nesterov momentum (type: bool)')
API Input Parameters
----------
args: list of above defined input parameters from `graphzoo.config`
optimizer: a :class:`optim.Optimizer` instance
model: a :class:`BaseModel` instance
"""
- Add the dataset files in the data folder of the source code.
- To run this code on new datasets, please add corresponding data processing and loading in the load_data_nc and load_data_lp functions in dataloader/dataloader.py in the source code.
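When writing link prediction loading for a new dataset, the `val-prop`, `test-prop`, and `split-seed` flags describe how edges are held out. The sketch below shows one plausible way to implement such a seeded split with the standard library only. `split_edges` is a hypothetical helper, and the details (shuffling order, rounding down) are assumptions rather than GraphZoo's actual implementation.

```python
import random


def split_edges(edges, val_prop=0.05, test_prop=0.1, split_seed=1234):
    """Shuffle edges with a fixed seed and cut them into train/val/test lists."""
    shuffled = list(edges)
    # A dedicated Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(split_seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_prop)
    n_test = int(len(shuffled) * test_prop)
    val_edges = shuffled[:n_val]
    test_edges = shuffled[n_val:n_val + n_test]
    train_edges = shuffled[n_val + n_test:]
    return train_edges, val_edges, test_edges


# Toy graph: a path on 101 nodes, giving 100 undirected edges.
toy_edges = [(i, i + 1) for i in range(100)]
train, val, test = split_edges(toy_edges)
print(len(train), len(val), len(test))
```

Negative ("false") edges, which the link prediction output format also expects, would need to be sampled separately from node pairs that are not connected.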
Output format for node classification dataloader is:
data = {'adj_train': adj, 'features': features, 'labels': labels, 'idx_train': idx_train, 'idx_val': idx_val, 'idx_test': idx_test}
Output format for link prediction dataloader is:
data = {'adj_train': adj_train, 'features': features, 'train_edges': train_edges, 'train_edges_false': train_edges_false, 'val_edges': val_edges, 'val_edges_false': val_edges_false, 'test_edges': test_edges, 'test_edges_false': test_edges_false, 'adj_train_norm': adj_train_norm}
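A custom loader is easier to debug if its output is checked against these two formats before it is handed to a model. The following sketch does that check with plain Python; `NC_KEYS`, `LP_KEYS`, and `missing_keys` are hypothetical names introduced for this example, with the key sets copied from the formats above.

```python
# Keys copied from the node classification and link prediction output formats.
NC_KEYS = {"adj_train", "features", "labels", "idx_train", "idx_val", "idx_test"}
LP_KEYS = {
    "adj_train", "features", "train_edges", "train_edges_false",
    "val_edges", "val_edges_false", "test_edges", "test_edges_false",
    "adj_train_norm",
}


def missing_keys(data, task):
    """Return the sorted keys a loader output lacks for task 'nc' or 'lp'."""
    required = {"nc": NC_KEYS, "lp": LP_KEYS}[task]
    return sorted(required - set(data))


# A deliberately incomplete node classification output (values are placeholders).
partial = {"adj_train": None, "features": None, "labels": None}
print(missing_keys(partial, "nc"))
```

An empty list means every expected key is present; the check says nothing about the types or shapes of the values.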
- Attention layers can be added in layers/att_layers.py in the source code by adding a class in the file.
- Hyperbolic layers can be added in layers/hyp_layers.py in the source code by adding a class in the file.
- Other layers like a single GCN layer can be added in layers/layers.py in the source code by adding a class in the file.
- After adding custom layers, custom models can be added in models/encoders.py in the source code by adding a class in the file.
- After adding custom layers, custom decoders to calculate the final output can be added in models/decoders.py in the source code by adding a class in the file.
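To give a sense of what a decoder computes, the `r` and `t` flags in the model parameters are described as Fermi-Dirac decoder parameters for link prediction. The sketch below uses the standard Fermi-Dirac form on a scalar input. It assumes the decoder receives a squared embedding distance, which this document does not state, and `fermi_dirac_probability` is a hypothetical name rather than a GraphZoo function.

```python
import math


def fermi_dirac_probability(sq_dist, r=2.0, t=1.0):
    """Map a squared embedding distance to an edge probability in (0, 1).

    r shifts the distance at which the probability equals 0.5 and
    t controls how sharply the probability falls off around r.
    """
    return 1.0 / (math.exp((sq_dist - r) / t) + 1.0)


# Closer node pairs receive higher edge probabilities.
for sq_dist in (0.5, 2.0, 4.0):
    print(sq_dist, round(fermi_dirac_probability(sq_dist), 3))
```

A real decoder class would apply the same formula element-wise to a tensor of distances between the embeddings of candidate node pairs.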
The included datasets are the following and they need to be downloaded from the link:
- Cora
- Pubmed
- Disease
- Airport
- PPI
- Webkb
- Citeseer
- Shallow Euclidean
- Shallow Hyperbolic
- Multi-Layer Perceptron (MLP)
- Hyperbolic Neural Networks (HNN)
- Graph Convolutional Neural Networks (GCN)
- Graph Attention Networks (GAT)
- Hyperbolic Graph Convolutions (HGCN)
- Hyperbolic Graph Attention Networks (HGAT)
Some of the code was forked from the following repositories.
If you use GraphZoo in your research, please use the following BibTeX entry.
@inproceedings{10.1145/3487553.3524241,
author = {Vyas, Anoushka and Choudhary, Nurendra and Khatir, Mehrdad and Reddy, Chandan K.},
title = {GraphZoo: A Development Toolkit for Graph Neural Networks with Hyperbolic Geometries},
year = {2022},
isbn = {9781450391306},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3487553.3524241},
doi = {10.1145/3487553.3524241},
booktitle = {Companion Proceedings of the Web Conference 2022},
keywords = {graph learning, graph neural network, hyperbolic models, software},
location = {Lyon, France},
series = {WWW '22}
}
Copyright (c) 2022 Anoushka Vyas