EvaluationSetClient for deepset cloud to fetch evaluation sets and la… #2345

Merged
merged 37 commits into from
Mar 31, 2022
d0106db
EvaluationSetClient for deepset cloud to fetch evaluation sets and la…
FHardow Mar 22, 2022
382c2fb
make DeepsetCloudDocumentStore able to fetch uploaded evaluation set …
FHardow Mar 22, 2022
45fcf05
fix missing renaming of get_evaluation_set_names in DeepsetCloudDocum…
FHardow Mar 22, 2022
1534666
update documentation for evaluation set functionality in deepset clou…
FHardow Mar 22, 2022
3f559f0
DeepsetCloudDocumentStore tests for evaluation set functionality
FHardow Mar 22, 2022
21de609
rename index to evaluation_set_name for DeepsetCloudDocumentStore eva…
FHardow Mar 23, 2022
7168807
raise DeepsetCloudError when no labels were found for evaluation set
FHardow Mar 23, 2022
8e68bff
make use of .get_with_auto_paging in EvaluationSetClient
FHardow Mar 23, 2022
dc16792
Return result of get_with_auto_paging() as it parses the response alr…
FHardow Mar 25, 2022
2d5219b
Make schema import source more specific
FHardow Mar 25, 2022
9229110
fetch all evaluation sets for a workspace in deepset Cloud
FHardow Mar 25, 2022
164b8ec
Rename evaluation_set_name to label_index
FHardow Mar 25, 2022
ab22631
make use of generator functionality for fetching labels
FHardow Mar 25, 2022
68d7fbb
Update Documentation & Code Style
github-actions[bot] Mar 25, 2022
fdefaa5
Adjust function input for DeepsetCloudDocumentStore.get_all_labels, a…
FHardow Mar 25, 2022
4487b53
Merge branch 'feature/fetch-evaluation-set-from-dc' of github.com:dee…
FHardow Mar 25, 2022
56bdb6c
Match error message with pytest.raises
FHardow Mar 25, 2022
23bee8d
Update Documentation & Code Style
github-actions[bot] Mar 25, 2022
e2154b2
DeepsetCloudDocumentStore.get_labels_count raises DeepsetCloudError w…
FHardow Mar 25, 2022
ab138a3
Merge branch 'feature/fetch-evaluation-set-from-dc' of github.com:dee…
FHardow Mar 25, 2022
6b6b8bf
remove unneeded import in tests
FHardow Mar 25, 2022
9604991
DeepsetCloudDocumentStore tests, make reponse bodies a string through…
FHardow Mar 25, 2022
86ee3f2
Merge branch 'master' of github.com:deepset-ai/haystack into feature/…
FHardow Mar 29, 2022
f3b03bc
DeepsetcloudDocumentStore.get_label_count - move raise to return
FHardow Mar 29, 2022
2e7059c
stringify uuid before json.dump as uuid is not serilizable
FHardow Mar 29, 2022
c75d5e6
DeepsetcloudDocumentStore - adjust response mocking in tests
FHardow Mar 29, 2022
5f3e4bb
DeepsetcloudDocumentStore - json dump response body in test
FHardow Mar 29, 2022
32b901a
DeepsetCloudDocumentStore introduce label_index, EvaluationSetClient …
FHardow Mar 30, 2022
2553129
Update Documentation & Code Style
github-actions[bot] Mar 30, 2022
eb1054d
DeepsetCloudDocumentStore rename evaluation_set to evaluation_set_res…
FHardow Mar 30, 2022
805b9ca
Merge branch 'feature/fetch-evaluation-set-from-dc' of github.com:dee…
FHardow Mar 30, 2022
f539c7c
DeepsetCloudDocumentStore - rename missed variable in test
FHardow Mar 30, 2022
a8b33aa
DeepsetCloudDocumentStore - rename missed label_index to index in doc…
FHardow Mar 30, 2022
a93e7f3
Update Documentation & Code Style
github-actions[bot] Mar 30, 2022
10eff9c
DeepsetCloudDocumentStore - update docstrings for EvaluationSetClient
FHardow Mar 30, 2022
6638585
Merge branch 'feature/fetch-evaluation-set-from-dc' of github.com:dee…
FHardow Mar 30, 2022
709ac21
DeepsetCloudDocumentStore - fix typo in doc string
FHardow Mar 30, 2022
55 changes: 55 additions & 0 deletions docs/_src/api/api/document_store.md
@@ -4257,6 +4257,61 @@ exists.

None

<a id="deepsetcloud.DeepsetCloudDocumentStore.get_evaluation_sets"></a>

#### get\_evaluation\_sets

```python
def get_evaluation_sets() -> List[dict]
```

Returns a list of evaluation sets uploaded to deepset Cloud.

**Returns**:

List of evaluation sets as dicts.
These contain ("name", "evaluation_set_id", "created_at", "matched_labels", "total_labels") as fields.
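The documented field set can be illustrated with a hypothetical response (all values below are invented for illustration; only the field names come from the documentation above):

```python
# Hypothetical example of what get_evaluation_sets() might return; the values
# are made up, only the field names are taken from the documentation.
evaluation_sets = [
    {
        "name": "squad-dev-sample",
        "evaluation_set_id": "c1b2a3d4-0000-0000-0000-000000000000",
        "created_at": "2022-03-22T10:00:00Z",
        "matched_labels": 50,
        "total_labels": 50,
    }
]

# Each entry exposes exactly the documented fields.
expected_fields = {"name", "evaluation_set_id", "created_at", "matched_labels", "total_labels"}
assert set(evaluation_sets[0]) == expected_fields
```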

<a id="deepsetcloud.DeepsetCloudDocumentStore.get_all_labels"></a>

#### get\_all\_labels

```python
def get_all_labels(index: Optional[str] = None, filters: Optional[Dict[str, Union[Dict, List, str, int, float, bool]]] = None, headers: Optional[Dict[str, str]] = None) -> List[Label]
```

Returns a list of labels for the given index name.

**Arguments**:

- `index`: Optional name of the evaluation set for which labels should be fetched.
If None, the DocumentStore's default label_index (self.label_index) will be used.
- `headers`: Not supported.

**Returns**:

list of Labels.

<a id="deepsetcloud.DeepsetCloudDocumentStore.get_label_count"></a>

#### get\_label\_count

```python
def get_label_count(label_index: Optional[str] = None, headers: Optional[Dict[str, str]] = None) -> int
```

Counts the number of labels for the given index and returns the value.

**Arguments**:

- `label_index`: Optional evaluation set name for which the labels should be counted.
If None, the DocumentStore's default index (self.index) will be used.
- `headers`: Not supported.

**Returns**:

number of labels for the given index
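The None-defaulting behavior described above can be sketched with a minimal stub (`StubEvaluationSetClient` and its counts are invented for illustration; the real `EvaluationSetClient` fetches counts from the deepset Cloud API):

```python
class StubEvaluationSetClient:
    """Stand-in for EvaluationSetClient: serves label counts from a dict
    instead of calling the deepset Cloud API."""

    def __init__(self, label_index, counts):
        self.label_index = label_index
        self.counts = counts

    def get_labels_count(self, label_index=None):
        # Fall back to the client's default evaluation set when none is given.
        if label_index is None:
            label_index = self.label_index
        return self.counts[label_index]


client = StubEvaluationSetClient(label_index="default", counts={"default": 10, "squad-dev": 42})
assert client.get_labels_count() == 10             # uses the default label_index
assert client.get_labels_count("squad-dev") == 42  # explicit evaluation set name
```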

<a id="pinecone"></a>

# Module pinecone
38 changes: 35 additions & 3 deletions haystack/document_stores/deepsetcloud.py
@@ -65,6 +65,10 @@ def __init__(
f"{indexing_info['pending_file_count']} files are pending to be indexed. Indexing status: {indexing_info['status']}"
)

self.evaluation_set_client = DeepsetCloud.get_evaluation_set_client(
api_key=api_key, api_endpoint=api_endpoint, workspace=workspace, label_index=index
)

super().__init__()

def get_all_documents(
@@ -452,16 +456,44 @@ def write_documents(
"""
raise NotImplementedError("DeepsetCloudDocumentStore currently does not support writing documents.")

def get_evaluation_sets(self) -> List[dict]:
"""
Returns a list of evaluation sets uploaded to deepset Cloud.

:return: List of evaluation sets as dicts.
These contain ("name", "evaluation_set_id", "created_at", "matched_labels", "total_labels") as fields.
"""
return self.evaluation_set_client.get_evaluation_sets()

def get_all_labels(
self,
index: Optional[str] = None,
filters: Optional[Dict[str, Union[Dict, List, str, int, float, bool]]] = None,
headers: Optional[Dict[str, str]] = None,
) -> List[Label]:
raise NotImplementedError("DeepsetCloudDocumentStore currently does not support labels.")
"""
Returns a list of labels for the given index name.

def get_label_count(self, index: Optional[str] = None, headers: Optional[Dict[str, str]] = None) -> int:
raise NotImplementedError("DeepsetCloudDocumentStore currently does not support labels.")
:param index: Optional name of the evaluation set for which labels should be fetched.
If None, the DocumentStore's default label_index (self.label_index) will be used.
:param filters: Not supported.
:param headers: Not supported.

:return: list of Labels.
"""
return self.evaluation_set_client.get_labels(label_index=index)

def get_label_count(self, label_index: Optional[str] = None, headers: Optional[Dict[str, str]] = None) -> int:
"""
Counts the number of labels for the given index and returns the value.

:param label_index: Optional evaluation set name for which the labels should be counted.
If None, the DocumentStore's default index (self.index) will be used.
:param headers: Not supported.

:return: number of labels for the given index
"""
return self.evaluation_set_client.get_labels_count(label_index=label_index)

def write_labels(
self,
134 changes: 134 additions & 0 deletions haystack/utils/deepsetcloud.py
@@ -5,6 +5,8 @@
import time
from typing import Any, Dict, Generator, List, Optional, Tuple, Union

from haystack.schema import Label, Document, Answer

try:
from typing import Literal
except ImportError:
@@ -637,6 +639,116 @@ def _build_workspace_url(self, workspace: Optional[str] = None):
return self.client.build_workspace_url(workspace)


class EvaluationSetClient:
def __init__(self, client: DeepsetCloudClient, workspace: Optional[str] = None, label_index: Optional[str] = None):
"""
A client to communicate with Deepset Cloud evaluation sets and labels.

:param client: Deepset Cloud client
:param workspace: workspace in Deepset Cloud
:param label_index: name of the label index

"""
self.client = client
self.workspace = workspace
self.label_index = label_index

def get_labels(self, label_index: Optional[str], workspace: Optional[str] = None) -> List[Label]:
"""
Searches for labels of a given evaluation set in deepset Cloud and returns all found labels.
Raises DeepsetCloudError if no evaluation set with the given name exists.

:param label_index: name of the evaluation set for which labels should be fetched
:param workspace: Optional workspace in Deepset Cloud
If None, the EvaluationSetClient's default workspace (self.workspace) will be used.

:return: list of Label
"""
try:
evaluation_set = next(self._get_evaluation_set(label_index=label_index, workspace=workspace))
except StopIteration:
raise DeepsetCloudError(f"No evaluation set found with the name {label_index}")

labels = self._get_labels_from_evaluation_set(
workspace=workspace, evaluation_set_id=evaluation_set["evaluation_set_id"]
)

return [
Label(
query=label_dict["query"],
document=Document(content=label_dict["context"]),
is_correct_answer=True,
is_correct_document=True,
origin="user-feedback",
answer=Answer(label_dict["answer"]),
id=label_dict["label_id"],
no_answer=not label_dict.get("answer"),
pipeline_id=None,
created_at=None,
updated_at=None,
meta=label_dict["meta"],
filters={},
)
for label_dict in labels
]

def get_labels_count(self, label_index: Optional[str] = None, workspace: Optional[str] = None) -> int:
"""
Counts labels for a given evaluation set in deepset cloud.

:param label_index: Optional name of the evaluation set in Deepset Cloud.
If None, the EvaluationSetClient's default evaluation set (self.label_index) will be used.
:param workspace: Optional workspace in Deepset Cloud
If None, the EvaluationSetClient's default workspace (self.workspace) will be used.

:return: Number of labels for the given (or defaulting) index
"""
try:
evaluation_set = next(self._get_evaluation_set(label_index=label_index, workspace=workspace))
except StopIteration:
raise DeepsetCloudError(f"No evaluation set found with the name {label_index}")

return evaluation_set["total_labels"]

def get_evaluation_sets(self, workspace: Optional[str] = None) -> List[dict]:
"""
Searches for all evaluation set names in the given workspace in Deepset Cloud.

:param workspace: Optional workspace in Deepset Cloud
If None, the EvaluationSetClient's default workspace (self.workspace) will be used.

:return: List of dictionaries that represent deepset Cloud evaluation sets.
These contain ("name", "evaluation_set_id", "created_at", "matched_labels", "total_labels") as fields.
"""
evaluation_sets_response = self._get_evaluation_set(label_index=None, workspace=workspace)

return list(evaluation_sets_response)

def _get_evaluation_set(self, label_index: Optional[str], workspace: Optional[str] = None) -> Generator:
if not label_index:
label_index = self.label_index

url = self._build_workspace_url(workspace=workspace)
evaluation_set_url = f"{url}/evaluation_sets"

for response in self.client.get_with_auto_paging(url=evaluation_set_url, query_params={"name": label_index}):
yield response

def _get_labels_from_evaluation_set(
self, workspace: Optional[str] = None, evaluation_set_id: Optional[str] = None
) -> Generator:
url = f"{self._build_workspace_url(workspace=workspace)}/evaluation_sets/{evaluation_set_id}"
labels = self.client.get(url=url).json()

yield from labels

def _build_workspace_url(self, workspace: Optional[str] = None):
if workspace is None:
workspace = self.workspace
return self.client.build_workspace_url(workspace)


class DeepsetCloud:
"""
A facade to communicate with Deepset Cloud.
@@ -685,3 +797,25 @@ def get_pipeline_client(
"""
client = DeepsetCloudClient(api_key=api_key, api_endpoint=api_endpoint)
return PipelineClient(client=client, workspace=workspace, pipeline_config_name=pipeline_config_name)

@classmethod
def get_evaluation_set_client(
cls,
api_key: Optional[str] = None,
api_endpoint: Optional[str] = None,
workspace: str = "default",
label_index: str = "default",
) -> EvaluationSetClient:
"""
Creates a client to communicate with Deepset Cloud labels.

:param api_key: Secret value of the API key.
If not specified, will be read from DEEPSET_CLOUD_API_KEY environment variable.
:param api_endpoint: The URL of the Deepset Cloud API.
If not specified, will be read from DEEPSET_CLOUD_API_ENDPOINT environment variable.
:param workspace: workspace in Deepset Cloud
:param label_index: name of the evaluation set in Deepset Cloud

"""
client = DeepsetCloudClient(api_key=api_key, api_endpoint=api_endpoint)
return EvaluationSetClient(client=client, workspace=workspace, label_index=label_index)
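The per-label conversion performed in `EvaluationSetClient.get_labels` — in particular deriving `no_answer` from an empty or missing answer — can be sketched in isolation. The sample dicts below are invented, and the sketch returns plain dicts rather than haystack `Label`/`Document`/`Answer` objects:

```python
def to_label_dict(raw: dict) -> dict:
    """Sketch of the per-label conversion in get_labels, returning a plain
    dict instead of a haystack Label object."""
    answer = raw.get("answer")
    return {
        "query": raw["query"],
        "document_content": raw["context"],
        "answer": answer,
        # A label with a missing or empty answer is a "no answer" label.
        "no_answer": not answer,
        "origin": "user-feedback",
        "id": raw["label_id"],
    }


with_answer = to_label_dict(
    {"query": "Who?", "context": "Some passage.", "answer": "Alice", "label_id": "1"}
)
without_answer = to_label_dict(
    {"query": "Who?", "context": "Some passage.", "answer": "", "label_id": "2"}
)
assert with_answer["no_answer"] is False
assert without_answer["no_answer"] is True
```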