This document provides pretrained models for reproducing the results reported in our paper with the architectures searched by MMnas. The prototxt files for the mmnas architecture of each task can be found in the arch folder. To evaluate a model, download its ckpt file, place it in logs/ckpts/, and run the corresponding train_[vqa/vgd/itm].py script.
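Before launching an evaluation, it can help to confirm that the downloaded files actually sit in logs/ckpts/ and to pick the script that matches the task. The snippet below is a minimal sketch of that check, not part of the repository: the helper names `eval_script` and `find_checkpoints` are hypothetical, and because the checkpoint file names are not stated here, it simply lists every file in the folder.

```python
from pathlib import Path

# Assumption: the task names mirror the script suffixes in train_[vqa/vgd/itm].py.
TASKS = ("vqa", "vgd", "itm")


def eval_script(task):
    """Return the script name for a task (hypothetical helper)."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}, expected one of {TASKS}")
    return f"train_{task}.py"


def find_checkpoints(ckpt_dir="logs/ckpts"):
    """List the files found in the checkpoint folder, sorted by name.

    Returns an empty list when the folder does not exist yet.
    """
    folder = Path(ckpt_dir)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.is_file())
```

Calling `find_checkpoints()` from the repository root and getting an empty list back suggests the ckpt files have not been placed in logs/ckpts/ yet.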
We trained mmnas_vqa on the train+val+vg split and evaluated it on the test-dev split. The pretrained model can be downloaded here. For comparison, we also report the results of the previous state-of-the-art mcan model.
| Model | Base lr | Overall (%) | Yes/No (%) | Number (%) | Other (%) |
|---|---|---|---|---|---|
| mcan | 1e-4 | 70.69 | 87.08 | 53.16 | 60.66 |
| mmnas | 1e-4 | 71.25 | 87.20 | 55.63 | 61.15 |
We use the same mmnas_vgd architecture for all three datasets and train a separate model on each dataset. The pretrained models for the three datasets can be downloaded as follows.
| Dataset | Val (%) | TestA (%) | TestB (%) | Pretrained model |
|---|---|---|---|---|
| RefCOCO | 83.66 | 87.25 | 78.78 | model |
| RefCOCO+ | 74.48 | 81.00 | 65.15 | model |
| RefCOCOg | | | | model |
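The accuracy criterion behind these numbers is not spelled out here. As a hedged illustration, the sketch below assumes the convention commonly used on referring-expression benchmarks: a predicted box counts as correct when its intersection-over-union (IoU) with the ground-truth box is larger than 0.5. The function names, the (x1, y1, x2, y2) box format, and the threshold are assumptions for illustration, not taken from the mmnas code.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(pred_boxes, gt_boxes, threshold=0.5):
    """Percentage of predictions whose IoU with the ground truth exceeds the threshold."""
    if len(pred_boxes) != len(gt_boxes) or not gt_boxes:
        raise ValueError("need two non-empty box lists of equal length")
    correct = sum(iou(p, g) > threshold for p, g in zip(pred_boxes, gt_boxes))
    return 100.0 * correct / len(gt_boxes)
```

For example, two 2x2 boxes offset by one unit in both directions overlap in a 1x1 square, giving an IoU of 1/7, which would count as a miss under this criterion.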
Using the mmnas_itm architecture, we obtain the following results. The pretrained model can be downloaded here.
| Task | R@1 | R@5 | R@10 |
|---|---|---|---|
| Text Retrieval | 77.30 | 93.50 | 97.10 |
| Image Retrieval | 60.88 | 84.86 | 90.40 |
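For readers unfamiliar with the R@K columns, the sketch below shows one common way to compute Recall@K. It assumes the standard definition, under which a query counts as a hit when at least one correct match appears among its K highest-scoring candidates. The function name and the input layout are illustrative assumptions and are not taken from the mmnas evaluation code.

```python
def recall_at_k(scores, ground_truth, k):
    """Recall@K in percent.

    scores[i][j]     similarity between query i and candidate j
    ground_truth[i]  set of candidate indices that are correct for query i
    """
    if not scores or len(scores) != len(ground_truth):
        raise ValueError("need one ground-truth set per query")
    hits = 0
    for row, correct in zip(scores, ground_truth):
        # Candidate indices ordered from highest to lowest similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if any(j in correct for j in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(scores)
```

In text retrieval the queries would be images and the candidates captions, so each ground-truth set may hold several indices. In image retrieval the roles are swapped and each caption typically has a single correct image.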