Skip to content

Global Reasoning module for visual recognition

License

Notifications You must be signed in to change notification settings

Jiaxuan-Li/GloRe

 
 

Repository files navigation

GloRe

Implementation for: Graph-Based Global Reasoning Networks (CVPR19)

Software

  • Image recognition experiments are in MXNet @92053bd
  • Video and segmentation experiments are in PyTorch (0.5.0a0+783f2c6)

Train & Evaluate

Train kinetics (single node):

./run_local.sh

Train kinetics (multiple nodes):

# please setup ./Host before running
./run_dist.sh

Evaluate the trained model on kinetics:

cd test
# check $ROOT/test/*.txt for the testing log
python test-single-clip.py

Note:

  • The code is adapted from MFNet (ECCV18).
  • ImageNet pretrained models (R50, R101) might be required. Please put it under $ROOT/network/pretrained/.
  • For image classification and segmentation tasks, please refer the code below.

Results

Image Recognition (ImageNet-1k)

Model Method Res3 Res4 Code & Model Top-1
ResNet50 Baseline link 76.2 %
ResNet50 w/ GloRe +3 link 78.4 %
ResNet50 w/ GloRe +2 +3 link 78.2 %
SE-ResNet50 Baseline link 77.2 %
SE-ResNet50 w/ GloRe +3 link 78.7 %
Model Method Res3 Res4 Code & Model Top-1
ResNet200 w/ GloRe +3 link 79.4 %
ResNet200 w/ GloRe +2 +3 link 79.7 %
ResNeXt101 (32x4d) w/ GloRe +2 +3 link 79.8 %
DPN-98 w/ GloRe +2 +3 link 80.2 %
DPN-131 w/ GloRe +2 +3 link 80.3 %

* We use pre-activation[1] and strided convolution[2] for all networks for simplicity and consistency.

Video Recognition (Kinetics-400)

Model input frames stride Res3 Res4 Model Clip Top-1
Res50 (3D) + Ours 8 8 +2 +3 link 68.0 %
Res101 (3D) + Ours 8 8 +2 +3 link 69.2 %

* ImageNet-1k pretrained models: R50(link), R101(link).

Semantic Segmentation (Cityscapes)

Method Backbone Code & Model IoU cla. iIoU cla. IoU cat. iIoU cat.
FCN + 1 GloRe unit ResNet50 link 79.5% 60.3% 91.3% 81.5%
FCN + 1 GloRe unit ResNet101 link 80.9% 62.2% 91.5% 82.1%

* All networks are evaluated on Cityscapes test set by the testing server without using extra “coarse” training set.

Other Resources

ImageNet-1k Training/Validation List:

ImageNet-1k category name mapping table:

Kinetics Dataset:

Cityscapes Dataset:

FAQ

Where can I find the code for image classification and segmentation?

  • The code is packed with the model within the same *.tar file.

Do I need to convert the raw videos to specific format?

  • The `dataiter' supports reading from raw videos.

How can I make the training faster?

  • Remove HLS augmentation (won't make much difference); Try to convert the raw videos to lower resolution to reduce the decoding cost (We use <=288p for all experiment).

For example:

# convet to sort_edge_length <= 288
ffmpeg -y -i ${SRC_VID} -c:v mpeg4 -filter:v "scale=min(iw\,(288*iw)/min(iw\,ih)):-1" -b:v 640k -an ${DST_VID}
# or, convet to sort_edge_length <= 256
ffmpeg -y -i ${SRC_VID} -c:v mpeg4 -filter:v "scale=min(iw\,(256*iw)/min(iw\,ih)):-1" -b:v 512k -an ${DST_VID}
# or, convet to sort_edge_length <= 160
ffmpeg -y -i ${SRC_VID} -c:v mpeg4 -filter:v "scale=min(iw\,(160*iw)/min(iw\,ih)):-1" -b:v 240k -an ${DST_VID}

Reference

[1] He, Kaiming, et al. "Identity mappings in deep residual networks."
[2] /~https://github.com/facebook/fb.resnet.torch

Citation

@inproceedings{chen2019graph,
  title={Graph-based global reasoning networks},
  author={Chen, Yunpeng and Rohrbach, Marcus and Yan, Zhicheng and Shuicheng, Yan and Feng, Jiashi and Kalantidis, Yannis},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={433--442},
  year={2019}
}

License

The code and the models are MIT licensed, as found in the LICENSE file.

About

Global Reasoning module for visual recognition

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 98.8%
  • Shell 1.2%