-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EvaluationSetClient for deepset cloud to fetch evaluation sets and la… #2345
Conversation
…bels for one specific evaluation set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find the tests either, so please recheck for them.
The code looks already pretty good. Here and there you can make some improvements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests magically appeared again 🎉
haystack/utils/deepsetcloud.py
Outdated
for response in self.client.get_with_auto_paging( | ||
url=evaluation_set_url, query_params={"name": evaluation_set_name} | ||
): | ||
return response.json().get("data", []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't seem to fit together. get_with_auto_paging
returns a generator of the objects within the "data" property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh true, let me change that :)
haystack/utils/deepsetcloud.py
Outdated
@@ -5,6 +5,8 @@ | |||
import time | |||
from typing import Any, Dict, Generator, List, Optional, Tuple, Union | |||
|
|||
from haystack import Label, Document, Answer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this import introduces a cyclic dependency causing all tests to fail. Could you please try from haystack.schema import ...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my confusion. But let's use label_index
in DeepsetCloudDocumentStore
again. Besides that the imports in deepsetcloud
module seem to cause a cyclic dependency and we should return all the info about the data sets, not only the names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice ! :) Let's add the requested changes from @tstadel and merge it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more things need to be improved: get_labels
and get_label_count
need to match the base class's abstract method signature. Additionally label_index
should work like in any other DocumentStore and be a init param. And I think the url of test_DeepsetCloudDocumentStore_fetches_lables_for_evaluation_set
's responses definition is not correct.
def get_all_labels( | ||
self, | ||
index: Optional[str] = None, | ||
label_index: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be index
to match the abstractmethod's signature. Besides that, I think we should accept label_index
as an init-param too and take that if no index is being passed to these methods. This is actually how all the other DocumentStores work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've switch it to use the index
parameter from the init function, see line 68.
…djust tests for it, fix typos, make linter happy
…pset-ai/haystack into feature/fetch-evaluation-set-from-dc
…hen no evaluation set was found to count labels on
…pset-ai/haystack into feature/fetch-evaluation-set-from-dc
…fetch-evaluation-set-from-dc
…rename label_index to evaluation_set
…ponse as there is a name clash with the input variable
…pset-ai/haystack into feature/fetch-evaluation-set-from-dc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor changes to make it ready-to-merge left.
… string, rename label_index to evaluation_set in EvaluationSetClient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Found some minor docstrings that we should improve. Besides that: ready to merge!
…pset-ai/haystack into feature/fetch-evaluation-set-from-dc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good ! 🚀 Nice PR!
…bels for one specific evaluation set
Proposed changes:
This PR adds a EvaluationSetClient class, that will handle the communication with deepset cloud, to allow the fetching of labels of an evaluation set uploaded to deepset cloud.
New functionality:
Status (please check what you already did):
closes #1985