Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EvaluationSetClient for deepset cloud to fetch evaluation sets and la… #2345

Merged
merged 37 commits into from
Mar 31, 2022

Conversation

FHardow
Copy link
Member

@FHardow FHardow commented Mar 22, 2022

…bels for one specific evaluation set

Proposed changes:
This PR adds a EvaluationSetClient class, that will handle the communication with deepset cloud, to allow the fetching of labels of an evaluation set uploaded to deepset cloud.
New functionality:

  • list all labels for a given index name
  • fetch the number of labels for a given index
  • fetch index names from deepset cloud

Status (please check what you already did):

  • First draft (up for discussions & feedback)
  • Final code
  • Added tests
  • Updated documentation

closes #1985

Copy link
Member

@tstadel tstadel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find the tests either, so please recheck for them.
The code looks already pretty good. Here and there you can make some improvements.

Copy link
Member Author

@FHardow FHardow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests magically appeared again 🎉

Comment on lines 729 to 732
for response in self.client.get_with_auto_paging(
url=evaluation_set_url, query_params={"name": evaluation_set_name}
):
return response.json().get("data", [])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't seem to fit together. get_with_auto_paging returns a generator of the objects within the "data" property.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh true, let me change that :)

@@ -5,6 +5,8 @@
import time
from typing import Any, Dict, Generator, List, Optional, Tuple, Union

from haystack import Label, Document, Answer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this import introduces a cyclic dependency causing all tests to fail. Could you please try from haystack.schema import ...?

Copy link
Member

@tstadel tstadel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my confusion. But let's use label_index in DeepsetCloudDocumentStore again. Besides that the imports in deepsetcloud module seem to cause a cyclic dependency and we should return all the info about the data sets, not only the names.

Copy link
Member

@ArzelaAscoIi ArzelaAscoIi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice ! :) Let's add the requested changes from @tstadel and merge it :)

@FHardow FHardow requested review from tstadel and ArzelaAscoIi March 25, 2022 13:33
@FHardow FHardow marked this pull request as ready for review March 25, 2022 13:36
Copy link
Member

@tstadel tstadel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more things need to be improved: get_labels and get_label_count need to match the base class's abstract method signature. Additionally label_index should work like in any other DocumentStore and be a init param. And I think the url of test_DeepsetCloudDocumentStore_fetches_lables_for_evaluation_set's responses definition is not correct.

def get_all_labels(
self,
index: Optional[str] = None,
label_index: Optional[str] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be index to match the abstractmethod's signature. Besides that, I think we should accept label_index as an init-param too and take that if no index is being passed to these methods. This is actually how all the other DocumentStores work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've switch it to use the index parameter from the init function, see line 68.

@FHardow FHardow requested a review from tstadel March 29, 2022 14:56
Copy link
Member

@tstadel tstadel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor changes to make it ready-to-merge left.

FHardow and others added 2 commits March 30, 2022 16:03
Copy link
Member

@tstadel tstadel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Found some minor docstrings that we should improve. Besides that: ready to merge!

Copy link
Member

@ArzelaAscoIi ArzelaAscoIi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good ! 🚀 Nice PR!

@FHardow FHardow merged commit a273c3a into master Mar 31, 2022
@FHardow FHardow deleted the feature/fetch-evaluation-set-from-dc branch March 31, 2022 06:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow DCDocumentStore to read labels
3 participants