Nicola Ferro, Stefano Marchesin, Alberto Purpura and Gianmaria Silvello
This is the Docker image of our implementation of the Neural Vector Space Model (NVSM), conforming to the OSIRRC jig for the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019. The image is available on Docker Hub and has been tested with the jig at commit ca31987 (6/5/2019).
- Supported test collections: `robust04`
- Supported hooks: `init`, `index`, `train`, `search`
The following jig command can be used to index TREC disks 4/5 for `robust04`:
python run.py prepare \
--repo albep/nvsm \
--collections robust04=/path/to/disk45=trectext
The following jig command can be used to train the retrieval model on the `robust04` collection:
python run.py train \
--repo albep/nvsm \
--model_folder path/model/directory \
--topic topics/topics.robust04.txt \
--test_split sample_training_validation_query_ids/robust04_test.txt \
--validation_split sample_training_validation_query_ids/robust04_validation.txt \
--qrels qrels/qrels.robust04.txt \
--opts epochs=12 \
--collection robust04
The following jig command can be used to perform a retrieval run on the `robust04` test collection:
python run.py search \
--repo albep/nvsm \
--output path/model/directory \
--qrels qrels/qrels.robust04.txt \
--topic topics/topics.robust04.txt \
--test_split sample_training_validation_query_ids/robust04_test.txt \
--collection robust04
MAP | NVSM CPU | NVSM GPU |
---|---|---|
Robust04 test split topics | 0.138 | 0.138* |
* Results with the NVSM GPU image may vary slightly. TensorFlow uses the Eigen library, which relies on CUDA atomics to implement reduction operations such as tf.reduce_sum. These operations are non-deterministic, and each one can introduce small variations. See this TensorFlow issue for more details.
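The usual reason such reductions vary is that floating-point addition is not associative, so the order in which partial sums are accumulated changes the rounded result. The sketch below illustrates that general mechanism in plain CPU Python. It does not use TensorFlow or CUDA, and the variable names and the random test data are assumptions made only for illustration.

```python
import random

# Floating-point addition is not associative: grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
print(left - right)   # a difference on the order of 1e-16

# The same values summed in two different orders. On a GPU, atomics
# make the accumulation order vary between runs in a similar way.
random.seed(0)
values = [random.uniform(-1.0, 1.0) * 10 ** random.randint(-8, 8)
          for _ in range(10000)]
shuffled = values[:]
random.shuffle(shuffled)

# Any difference printed here comes from rounding alone, since both
# lists contain exactly the same numbers.
print(sum(values) - sum(shuffled))
```

Differences of this kind are tiny per operation, which is consistent with the GPU score matching the CPU score at three decimal places in the table above.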
The path passed to `--model_folder` in the train command and the path passed to `--output` in the search command must point to the same directory.
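One way to keep the two paths in sync is to build both commands from a single variable. The following is a minimal sketch that only assembles and prints the command lines; it does not run the jig. The variable names (`REPO`, `COLLECTION`, `MODEL_DIR`, `SPLITS`) are hypothetical, and every flag and value is copied from the commands in this README.

```python
import shlex

REPO = "albep/nvsm"
COLLECTION = "robust04"             # collection name as passed to the prepare step
MODEL_DIR = "path/model/directory"  # placeholder path; the single shared value
SPLITS = "sample_training_validation_query_ids"

train_cmd = [
    "python", "run.py", "train",
    "--repo", REPO,
    "--model_folder", MODEL_DIR,    # where training writes the model
    "--topic", "topics/topics.robust04.txt",
    "--test_split", f"{SPLITS}/robust04_test.txt",
    "--validation_split", f"{SPLITS}/robust04_validation.txt",
    "--qrels", "qrels/qrels.robust04.txt",
    "--opts", "epochs=12",
    "--collection", COLLECTION,
]

search_cmd = [
    "python", "run.py", "search",
    "--repo", REPO,
    "--output", MODEL_DIR,          # same directory as --model_folder
    "--qrels", "qrels/qrels.robust04.txt",
    "--topic", "topics/topics.robust04.txt",
    "--test_split", f"{SPLITS}/robust04_test.txt",
    "--collection", COLLECTION,
]

# Print shell-ready command lines for inspection.
print(shlex.join(train_cmd))
print(shlex.join(search_cmd))
```

If the printed commands look right, each list could be passed to `subprocess.run(cmd, check=True)` from the root of a jig checkout; that step is left out here because it needs Docker and the prepared image.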
nvsm_gpu requires nvidia-docker (https://github.com/NVIDIA/nvidia-docker) to be installed on the host machine.