This is the official implementation of Joint Inductive and Transductive Learning for Video Object Segmentation, to appear in ICCV 2021.
@inproceedings{joint_iccv_2021,
title={Joint Inductive and Transductive Learning for Video Object Segmentation},
author={Yunyao Mao and Ning Wang and Wengang Zhou and Houqiang Li},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month={October},
year={2021}
}
git clone https://github.com/maoyunyao/JOINT.git
Please check the detailed installation instructions.
The whole network is trained on 8 NVIDIA GTX 1080 Ti GPUs.
conda activate pytracking
cd ltr
python run_training.py joint joint_stage1 # stage 1
python run_training.py joint joint_stage2 # stage 2
Note: We initialize the ResNet backbone with pre-trained Mask R-CNN weights, as in LWL. These weights can be obtained from here. Before training, download them and save them in the directory given by env_settings().pretrained_networks.
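The two training stages can also be chained from a small Python script, so that stage 2 only starts if stage 1 finished without error. The following is a minimal sketch, not part of this repository: the function names and the `weights_dir` argument are hypothetical, the script assumes it is launched with the interpreter of the activated `pytracking` environment, and it treats "at least one file in the weights directory" as a rough stand-in for "the pre-trained weights are in place".

```python
import subprocess
import sys
from pathlib import Path

# Stage names are copied from the training commands in this README.
# Everything else (function names, weights_dir) is hypothetical.
STAGES = ["joint_stage1", "joint_stage2"]


def build_training_commands(stages=STAGES):
    """Return one run_training.py command (as an argument list) per stage."""
    return [[sys.executable, "run_training.py", "joint", stage] for stage in stages]


def has_pretrained_weights(weights_dir):
    """Return True if weights_dir exists and contains at least one file."""
    path = Path(weights_dir)
    return path.is_dir() and any(p.is_file() for p in path.iterdir())


def run_training(ltr_dir, weights_dir):
    """Run the stages in order from inside ltr_dir, stopping on the first failure."""
    if not has_pretrained_weights(weights_dir):
        raise FileNotFoundError(f"No pre-trained weights found in {weights_dir}")
    for cmd in build_training_commands():
        # check=True raises CalledProcessError on a non-zero exit status,
        # so stage 2 never starts if stage 1 fails.
        subprocess.run(cmd, cwd=ltr_dir, check=True)
```

A call such as `run_training("ltr", "/path/to/pretrained_networks")` would then replace the two manual commands, where the second argument is whatever directory `env_settings().pretrained_networks` points to on your machine.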
conda activate pytracking
cd pytracking
python run_tracker.py joint joint_davis --dataset_name dv2017_val # DAVIS 2017 Val
python run_tracker.py joint joint_ytvos --dataset_name yt2018_valid_all # YouTube-VOS 2018 Val
python run_tracker.py joint joint_ytvos --dataset_name yt2019_valid_all # YouTube-VOS 2019 Val
Note: Before evaluation, download the pretrained networks (see model zoo) and save them in the directory set by "network_path" in "pytracking/evaluation/local.py". By default, this is pytracking/networks.
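To evaluate on all three benchmarks in one go, the commands above can be generated from a small table of (parameter file, dataset name) pairs. This is a sketch under stated assumptions rather than a script shipped with the repository: `BENCHMARKS`, `build_eval_commands`, and `run_all` are hypothetical names, and only the parameter and dataset names are taken from the commands listed in this README.

```python
import shlex
import subprocess

# (tracker parameter file, dataset name) pairs copied from the commands above.
BENCHMARKS = [
    ("joint_davis", "dv2017_val"),        # DAVIS 2017 Val
    ("joint_ytvos", "yt2018_valid_all"),  # YouTube-VOS 2018 Val
    ("joint_ytvos", "yt2019_valid_all"),  # YouTube-VOS 2019 Val
]


def build_eval_commands(benchmarks=BENCHMARKS, python="python"):
    """Return one run_tracker.py command (as an argument list) per benchmark."""
    return [
        [python, "run_tracker.py", "joint", params, "--dataset_name", dataset]
        for params, dataset in benchmarks
    ]


def run_all(pytracking_dir, dry_run=True):
    """Print each command; unless dry_run is set, also run it inside pytracking_dir."""
    for cmd in build_eval_commands():
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing evaluation.
            subprocess.run(cmd, cwd=pytracking_dir, check=True)
```

Calling `run_all("pytracking")` only prints the three commands, which is a cheap way to verify them before passing `dry_run=False` inside the activated `pytracking` environment.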
Model | YouTube-VOS 2018 (Overall Score) | YouTube-VOS 2019 (Overall Score) | DAVIS 2017 val (J&F score) | Links | Raw Results |
---|---|---|---|---|---|
JOINT_ytvos | 83.1 | 82.8 | -- | model | results |
JOINT_davis | -- | -- | 83.5 | model | results |
- Our JOINT segmentation tracker is built on pytracking. We sincerely thank its authors, Martin Danelljan and Goutam Bhat, for providing such a great framework.
- We adopt the few-shot learner proposed in LWL as the Induction branch.