DC SDK - load pipeline from deepset cloud #2013

ArzelaAscoIi · 2022-01-17T17:13:24Z

Proposed changes:
Deepset Cloud SDK: load, run, and evaluate pipelines from deepset Cloud. We can already load an existing pipeline from a YAML configuration. This PR adds deepset Cloud as a second source from which pipelines can be loaded.

Usage
Set environment variables:

DEEPSET_CLOUD_API_ENDPOINT=<future-deepset-cloud-url>
DEEPSET_CLOUD_API_KEY=<my-dc-api-key>

and run the following lines:

from haystack.pipelines import Pipeline

test_query = Pipeline.load_from_dc(pipeline_config_name="document_retrieval_1", pipeline_name="query")
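The environment-variable lookup described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper name `resolve_dc_credentials` and the rule that explicit arguments take precedence over the environment are assumptions. Only the two variable names come from the description.

```python
import os
from typing import Optional, Tuple


def resolve_dc_credentials(
    api_key: Optional[str] = None,
    api_endpoint: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: explicit arguments win, otherwise use the environment."""
    key = api_key or os.getenv("DEEPSET_CLOUD_API_KEY")
    endpoint = api_endpoint or os.getenv("DEEPSET_CLOUD_API_ENDPOINT")
    if not key:
        raise ValueError("No API key: pass api_key or set DEEPSET_CLOUD_API_KEY")
    if not endpoint:
        raise ValueError(
            "No API endpoint: pass api_endpoint or set DEEPSET_CLOUD_API_ENDPOINT"
        )
    return key, endpoint
```

Failing early with a clear message when either value is missing avoids a less obvious error later, at request time.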

Status (please check what you already did):

First draft (up for discussions & feedback)
Final code
Added tests
Updated documentation

closes #2002

tstadel · 2022-01-18T17:31:26Z

Hey @ArzelaAscoIi,
sorry for spoiling your PR. I should have made a pull request into your branch.
I introduced the concept of a pipeline_definition which is part of the pipeline_config (formerly known as yaml). pipeline_definition addresses exactly one pipeline.

I would suggest keeping the separation between pipeline_config and pipeline_definition in the load_from_dc method by specifying something like a pipeline_config_name or pipeline_config_id together with the pipeline_name. Then it feels much the same as load_from_yaml.
To emphasize the convention of always having one query and one indexing pipeline, we should mention this in the docstrings or even set the default of pipeline_name to "query". However, this is just API sugar: under the hood, Pipeline.load_from_config already takes the first pipeline if no pipeline_name has been specified, and this should always be the query pipeline in DC.
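The relationship between a pipeline_config and its pipeline_definitions, including the "take the first pipeline if no name is given" fallback, could look roughly like this. It is a sketch under assumptions: the function name `select_pipeline_definition` is hypothetical, and the dictionary layout (a top-level `pipelines` list whose entries carry a `name`) is assumed for illustration rather than taken from this PR.

```python
from typing import Any, Dict, Optional


def select_pipeline_definition(
    pipeline_config: Dict[str, Any],
    pipeline_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Return the single pipeline_definition addressed by pipeline_name.

    Falls back to the first pipeline in the config when no name is given.
    """
    pipelines = pipeline_config.get("pipelines", [])
    if not pipelines:
        raise ValueError("pipeline_config contains no pipelines")
    if pipeline_name is None:
        return pipelines[0]
    for definition in pipelines:
        if definition.get("name") == pipeline_name:
            return definition
    raise KeyError(f"No pipeline named {pipeline_name!r} in pipeline_config")


# Illustrative config following the one-query, one-indexing convention.
example_config = {
    "components": [],
    "pipelines": [
        {"name": "query", "nodes": []},
        {"name": "indexing", "nodes": []},
    ],
}

query_definition = select_pipeline_definition(example_config)
indexing_definition = select_pipeline_definition(example_config, "indexing")
```

With the query pipeline listed first, omitting the name and passing "query" explicitly select the same definition, which is why a default of "query" would only be API sugar.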

tholor

Looking good. Left a few minor comments - mainly around documentation.

A bigger conceptual question is whether we should stick with the "two pipelines in one YAML" design in the future. I see more and more cases where it would make more sense to define a Pipeline as the combination of one index and one query "pipeline/flow/graph".
In that case, you would always load both here, and maybe we would split pipeline.run() into pipeline.run() and pipeline.index()?
Probably out of scope for this issue, but we should discuss and decide on this in the next weeks.
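To make the idea concrete, here is one possible shape for such a combined object. It is purely a discussion sketch: the class name `CombinedPipeline`, its constructor arguments, and the toy flows are all hypothetical and are not part of Haystack.

```python
from typing import Any, Callable, Dict, List


class CombinedPipeline:
    """Hypothetical wrapper pairing one indexing flow with one query flow."""

    def __init__(
        self,
        index_flow: Callable[[List[str]], None],
        query_flow: Callable[[str], Dict[str, Any]],
    ) -> None:
        self._index_flow = index_flow
        self._query_flow = query_flow

    def index(self, documents: List[str]) -> None:
        # Delegates to the indexing flow.
        self._index_flow(documents)

    def run(self, query: str) -> Dict[str, Any]:
        # Delegates to the query flow.
        return self._query_flow(query)


# Toy flows backed by an in-memory list, only to show the call pattern.
store: List[str] = []
pipeline = CombinedPipeline(
    index_flow=store.extend,
    query_flow=lambda q: {
        "query": q,
        "documents": [d for d in store if q.lower() in d.lower()],
    },
)
pipeline.index(["Haystack loads pipelines from YAML.", "Unrelated text."])
result = pipeline.run("yaml")
```

Under this design, loading would always return both flows together, so callers would no longer need to pass a pipeline_name.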

haystack/pipelines/base.py

CLAassistant · 2022-01-20T14:32:39Z

All committers have signed the CLA.

* minimal DCDocumentStore
* support filters
* implement get_documents_by_id
* handle not existing documents
* add docstrings
* auth added
* add tests
* generate docs
* Add latest docstring and tutorial changes
* add responses to dev dependencies
* fix tests
* support query() and quey_by_embedding()
* Add latest docstring and tutorial changes
* query tests added
* read api_key and api_endpoint from env
* Add latest docstring and tutorial changes
* support query() and quey_by_embedding()
* query tests added
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
* support dynamic similarity and return_embedding values
* Add latest docstring and tutorial changes
* adjust KeywordDocumentStore description
* refactoring
* Add latest docstring and tutorial changes
* implement get_document_count and raise on all not implemented methods
* Add latest docstring and tutorial changes
* don't use abbreviation DC in comments and errors
* Add latest docstring and tutorial changes
* docstring added to KeywordDocumentStore
* Add latest docstring and tutorial changes
* enhanced api key set
* split tests into two parts
* change setup.py in order to work around build cache
* added link
* Add latest docstring and tutorial changes
* rename DCDocumentStore to DeepsetCloudDocumentStore
* Add latest docstring and tutorial changes
* remove dc.py
* reinsert link to docs
* fix imports
* Add latest docstring and tutorial changes
* better test structure

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>

… into pipelines-from-to-dc

tstadel

Good to go!

initial load_from_dc

849a1fd

ArzelaAscoIi marked this pull request as draft January 17, 2022 17:13

ArzelaAscoIi and others added 5 commits January 17, 2022 18:14

typo

516efd8

adjusted api endpoint

b7aa7c8

removed kwargs

463bbbc

added _load_from_dict

96c51ed

refactor pipeline loading mechanism

db12138

ArzelaAscoIi added 3 commits January 19, 2022 14:53

renaming load_from_dc api

4859398

Merge branch 'master' into pipelines-from-to-dc

3911168

renaming

262472e

ArzelaAscoIi marked this pull request as ready for review January 19, 2022 14:41

fixed errors

9f1ce41

ArzelaAscoIi requested a review from tholor January 19, 2022 15:16

tholor reviewed Jan 19, 2022

tstadel and others added 4 commits January 20, 2022 11:26

fix comments and environment variable overrides

ff42508

Add latest docstring and tutorial changes

7f35798

fix outdated YAML examples

bbd8dfd

Add latest docstring and tutorial changes

ff9b706

tstadel and others added 9 commits January 26, 2022 18:19

introduce DeepsetCloudAdapter

0dd4039

Merge branch 'master' into pipelines-from-to-dc

5db9646

Add latest docstring and tutorial changes

5ccd3e6

introduce DeepsetCloudClient

3cb4d04

Add latest docstring and tutorial changes

9dffc2e

use json api for pipeline_config

b1b83cc

Merge branch 'pipelines-from-to-dc' of github.com:deepset-ai/haystack…

98bf639

… into pipelines-from-to-dc

indexing pipeline test added

865c6d5

tstadel approved these changes Jan 27, 2022

pseudo change to force cache eviction

6887197

tstadel mentioned this pull request Jan 27, 2022

Fixed the Search Field mapping in ElasticSearch DocumentStore #2080

Merged

4 tasks

tstadel added 4 commits January 28, 2022 15:18

revert pseudo change to force cache eviction

44d0ece

remove conftest duplicates

92a9a22

minor formatting and docstring fixes

a519c92

fix tests when MOCK_DC=False

0d0fadd

tstadel merged commit 7764b69 into master Jan 28, 2022

tstadel deleted the pipelines-from-to-dc branch January 28, 2022 16:32
