Add SentenceTransformersRanker with pre-trained Cross-Encoder #1209

julian-risch · 2021-06-18T09:51:03Z

In contrast to FARMRanker, SentenceTransformersRanker uses the logit as the similarity score rather than the classifier's probability of label "1".
See the example here: https://www.sbert.net/docs/pretrained-models/ce-msmarco.html#usage-with-transformer
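
To make the distinction concrete, here is a minimal sketch in plain Python. No model is loaded: the logit values and document names are made up for illustration, and the sigmoid is only a stand-in for the probability a binary classifier would assign to label "1".

```python
import math


def sigmoid(x: float) -> float:
    """Map an unbounded logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical raw logits for three query-document pairs (illustrative values only).
logits = {"doc_a": 8.2, "doc_b": -3.5, "doc_c": 1.1}

# Logit-based ranking: the raw model output is used directly as the score.
by_logit = sorted(logits, key=logits.get, reverse=True)

# Probability-based ranking: the score is squashed into (0, 1) first.
probs = {doc: sigmoid(score) for doc, score in logits.items()}
by_prob = sorted(probs, key=probs.get, reverse=True)

print(by_logit)
print(by_prob)
```

Because the sigmoid is monotonic, both orderings agree for a single-logit model; what changes is the scale of the scores (unbounded versus bounded between 0 and 1), which matters when scores are thresholded or compared across rankers.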

I tested with a subset of the nq_dev dataset. Here are the results of a pipeline with ElasticsearchRetriever and SentenceTransformersRanker, using "cross-encoder/ms-marco-MiniLM-L-12-v2" as the model:

EvalRetriever
-----------------
has_answer recall@2: 0.7200 (18/25)
no_answer recall@2:  1.00 (25/25) (no_answer samples are always treated as correctly retrieved)
has_answer mean_reciprocal_rank@2: 0.6200
no_answer mean_reciprocal_rank@2:  1.0000 (no_answer samples are always treated as correctly retrieved at rank 1)
recall@2: 0.8600 (43 / 50)
mean_reciprocal_rank@2: 0.8100

Retriever (Speed)
---------------
No indexing performed via Retriever.run()
Queries Performed: 50
Query time: 0.3390099899999086s
0.0067801997999981725 seconds per query

EvalRanker
-----------------
has_answer recall@2: 0.7600 (19/25)
no_answer recall@2:  1.00 (25/25) (no_answer samples are always treated as correctly retrieved)
has_answer mean_reciprocal_rank@2: 0.6600
no_answer mean_reciprocal_rank@2:  1.0000 (no_answer samples are always treated as correctly retrieved at rank 1)
recall@2: 0.8800 (44 / 50)
mean_reciprocal_rank@2: 0.8300

Ranker (Speed)
---------------
Queries Performed: 50
Query time: 161.3018365920002s
3.226036731840004 seconds per query
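
As a sanity check on how the overall numbers combine the has_answer and no_answer groups, the following sketch recomputes recall@2 and mean reciprocal rank@2. The per-query rank lists are an assumption: they are one distribution consistent with the reported counts (with top_k=2 a hit can only be at rank 1 or 2), not data taken from the evaluation run. The helper names are hypothetical.

```python
from typing import List, Optional


def recall_at_k(ranks: List[Optional[int]], k: int) -> float:
    """Fraction of queries whose relevant document appears within the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def mrr_at_k(ranks: List[Optional[int]], k: int) -> float:
    """Mean of 1/rank over all queries; a miss contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None and r <= k) / len(ranks)


# Assumed rank of the first relevant document per has_answer query (None = not in top 2).
retriever_has_answer = [1] * 13 + [2] * 5 + [None] * 7  # 18/25 hits
ranker_has_answer = [1] * 14 + [2] * 5 + [None] * 6     # 19/25 hits
# no_answer samples are always treated as correctly retrieved at rank 1.
no_answer = [1] * 25

retriever_all = retriever_has_answer + no_answer
ranker_all = ranker_has_answer + no_answer

for name, ranks in [("retriever", retriever_all), ("ranker", ranker_all)]:
    print(name, round(recall_at_k(ranks, 2), 4), round(mrr_at_k(ranks, 2), 4))

# Per-query latency, from the reported total query times over 50 queries.
retriever_per_query = 0.3390099899999086 / 50
ranker_per_query = 161.3018365920002 / 50
slowdown = ranker_per_query / retriever_per_query
print(retriever_per_query, ranker_per_query)
```

Under these assumptions, the ranker's gain comes from one additional has_answer query being answered within the top 2, while re-ranking takes roughly 3.2 s per query against about 7 ms for retrieval.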

Limitations: the documentation on the website has not been updated yet, so it may be unclear to users at the moment whether to use FARMRanker or SentenceTransformersRanker.

closes #1129

tholor

I think the separation makes sense (at least for now). We might combine them later on and instead add an arg similarity_type to differentiate between the two approaches.

Please add some more documentation (see comments) and a basic test case for both rankers that ensures the expected scores / sorting of some dummy docs (FARM + SentenceTransformers).
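
The requested test could take roughly the following shape. This is only a sketch: DummyDoc, overlap_score, and rank_documents are hypothetical stand-ins so the example runs without downloading a model. A real test would instantiate FARMRanker and SentenceTransformersRanker and pass Haystack documents instead.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DummyDoc:
    """Hypothetical minimal document type used only for this sketch."""
    text: str


def overlap_score(query: str, doc: DummyDoc) -> float:
    """Toy relevance score: number of distinct query words found in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.text.lower().split())
    return float(len(query_words & doc_words))


def rank_documents(
    query: str,
    docs: List[DummyDoc],
    score_fn: Callable[[str, DummyDoc], float],
    top_k: int,
) -> List[DummyDoc]:
    """Sort documents by descending score and keep the top_k best."""
    return sorted(docs, key=lambda d: score_fn(query, d), reverse=True)[:top_k]


def test_ranker_orders_dummy_docs() -> None:
    query = "what is the capital of germany"
    docs = [
        DummyDoc("Paris is known for the Eiffel Tower"),
        DummyDoc("Berlin is the capital of Germany"),
        DummyDoc("Pluto is a dwarf planet"),
    ]
    ranked = rank_documents(query, docs, overlap_score, top_k=2)
    # The most relevant dummy doc must come first, and top_k must be respected.
    assert len(ranked) == 2
    assert ranked[0].text == "Berlin is the capital of Germany"


test_ranker_orders_dummy_docs()
```

Asserting on the order of a few clearly distinguishable dummy docs keeps the test stable even if exact model scores shift slightly between library versions.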

haystack/ranker/sentence_transformers.py

julian-risch · 2021-07-07T13:40:48Z

@tholor Thank you for your feedback! I made the requested changes.

tholor

Looking good! Thx for the changes.

I added a short sentence to the docstring (we should keep in mind that some users might not know what re-ranking is, and explain at least very briefly the "value / use case").

Also added the import to the init so that we can simply write from haystack.ranker import SentenceTransformersRanker, similar to our other building blocks.

Add SentenceTransformersRanker with pre-trained Cross-Encoder

191d1f7

julian-risch marked this pull request as ready for review July 2, 2021 12:49

julian-risch requested a review from tholor July 2, 2021 12:49

julian-risch changed the title ~~WIP: Add SentenceTransformersRanker with pre-trained Cross-Encoder~~ Add SentenceTransformersRanker with pre-trained Cross-Encoder Jul 5, 2021

tholor requested changes Jul 6, 2021

Add test cases for Ranker nodes and update documentation

d6d47a9

julian-risch requested a review from tholor July 7, 2021 13:38

tholor added 4 commits July 7, 2021 17:09

update docstring

1d8d4d8

Update docstring

1f33fbe

Update __init__.py

403f184

update import for test

998d610

tholor approved these changes Jul 7, 2021

julian-risch merged commit dbb9efb into master Jul 7, 2021

julian-risch deleted the transformers_ranker branch July 7, 2021 15:31

julian-risch mentioned this pull request Jul 7, 2021

Input to FarmRanker model #1258

Closed
