Remove wrong retriever top_1 metrics from print_eval_report
#2510
Conversation
Looks good to me! 👍 Before merging, could you please have another look at test_eval.py to check whether there are other tests where TransformersReader can be excluded?
```diff
@@ -785,6 +785,7 @@ def test_extractive_qa_eval_wrong_examples(reader, retriever_with_docs):

 @pytest.mark.parametrize("retriever_with_docs", ["tfidf"], indirect=True)
 @pytest.mark.parametrize("document_store_with_docs", ["memory"], indirect=True)
+@pytest.mark.parametrize("reader", ["farm"], indirect=True)
```
This excludes the TransformersReader from this test, which is a good change in my opinion. I was just surprised to find it here. While you are at it, please also check the other tests in test_eval.py. For example, I see that the same change could be made to test_extractive_qa_eval_translation.
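If it helps, applying the same restriction there would look roughly like this (the decorator values are assumed to mirror the hunk above; the test body is elided):

```python
import pytest

@pytest.mark.parametrize("retriever_with_docs", ["tfidf"], indirect=True)
@pytest.mark.parametrize("document_store_with_docs", ["memory"], indirect=True)
@pytest.mark.parametrize("reader", ["farm"], indirect=True)
def test_extractive_qa_eval_translation(reader, retriever_with_docs):
    ...
```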
done.
LGTM! 👍
Currently, top_1 metrics are shown for the Retriever too, although they were calculated with simulated_top_k_reader=1. This is confusing and plain wrong. We should not show top_1 Retriever metrics at all.

Proposed changes:
- remove the top_1 Retriever metrics from print_eval_report
- skip printing wrong examples when n_wrong_examples is 0
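A minimal sketch of the intended behavior (apart from print_eval_report, n_wrong_examples, and simulated_top_k_reader, all names and the result structure here are assumptions, not the actual Haystack implementation):

```python
def print_eval_report(node_metrics: dict, n_wrong_examples: int = 3) -> None:
    """Sketch: print per-node metrics, hiding top_1 metrics for retriever
    nodes and omitting the wrong-examples section when it would be empty."""
    for node_name, metrics in node_metrics.items():       # assumed structure
        is_retriever = "retriever" in node_name.lower()   # assumed heuristic
        for metric_name, value in metrics.items():
            # top_1 metrics stem from simulated_top_k_reader=1, which is
            # meaningless for a retriever, so they are not printed there.
            if is_retriever and metric_name.startswith("top_1"):
                continue
            print(f"{node_name} | {metric_name}: {value:.4f}")
    if n_wrong_examples > 0:
        ...  # print the worst n_wrong_examples queries; skipped entirely at 0
```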
@Timoeller @neo-alex