-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Pipeline.components #2215
Fix Pipeline.components #2215
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice improvement, thanks! 🙂
However, testing looks a bit shallow. I'd add a couple of dedicated unit tests for self.components
and self.get_document_store()
. Being separate, they highlight where exactly the problem comes from, if/when it comes. Like this, if the test fails it's harder to understand what's wrong.
I think there are other corners of Pipeline
that can benefit of this new method. Grepping base.py
for ["component"]
returns 13 places where it's used, probably many of them could be simplified. There's also an occurrence in standard_pipelines.py
:
self.pipeline.add_node(name=node, component=graph.nodes[node]["component"], inputs=previous_node_name) |
matches = list( | ||
set([retriever.document_store for retriever in self.get_nodes_by_class(class_type=BaseRetriever)]) | ||
) | ||
|
||
if len(matches) > 1: | ||
raise Exception(f"Multiple Document Stores found in Pipeline: {matches}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is not part of your changes, but why having more than a docstore is a problem? I can imagine forked pipelines with two retrievers and a JoinDocuments node before the Reader could make sense.
In addition: if the exception should stay, then it's a PipelineError
😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the exception still makes sense in case there are really two different docstores in the game. In such a case the user has to dig out the needed docstore herself as we cannot give any general infos to identify them easily. Most of the time this should not be an issue in multi-retriever settings anyway: the case that you depicted should use only one instance of the docstore. That should be safely handled by the implicit unique operation executed by set(). But I'm gonna write a test for that case to ensure :-)
Regarding the PipelineError, yes I can understand it, but let's introduce this in a separate PR. I think this will be a bigger one to get it clean.
self.components: dict = {} | ||
|
||
@property | ||
def components(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor naming question: why components
and not nodes
? Users aren't aware of networkx
's naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use both notations in pipeline config: components define the "parts" of the pipeline as such. Nodes define the interplay of these components. Semantically components is the better name here.
# TODO make this generic for other pipelines with different naming | ||
RETRIEVER = PIPELINE.get_node(name="Retriever") | ||
DOCUMENT_STORE = RETRIEVER.document_store if RETRIEVER else None | ||
DOCUMENT_STORE = PIPELINE.get_document_store() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 Beautiful, thanks! 😄
I've added some tests. Regarding pipeline.component I decided to make it a convenience property for clients only: pipeline internal code should not make use of it as it is only a facade on top of the graph. Anything that deals with the graph and its nodes internally should better use the graph. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests are just beautiful 🤩 I wish all of our unit tests would look like that!
You may want to make a local fixture out of stuff like DummyDocumentStore
, DummyRetriever
and Node
before merge, but in either case it's good to go for me. 👍
Proposed changes:
components
property toPipeline
Pipeline.get_document_store()
by looking intoBaseRetriever
components'document_store
fieldStatus (please check what you already did):
fixes #1823