Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom id hashing on documentstore level #1910

Merged
merged 16 commits into from
Jan 3, 2022
7 changes: 4 additions & 3 deletions haystack/schema.py
Original file line number Diff line number Diff line change
Expand Up @@ -92,11 +92,12 @@ def __init__(
self.content_type = content_type
self.score = score
self.meta = meta or {}
self.id_hash_keys = id_hash_keys

allowed_hash_key_attributes = ["content", "content_type", "score", "meta", "embedding" ]
if not set(self.id_hash_keys) <= set(allowed_hash_key_attributes):
raise ValueError(f"You passed custom strings {id_hash_keys} to id_hash_keys which is deprecated. Supply instead a list of Document's attribute names that the id should be based on (e.g. {allowed_hash_key_attributes}). See /~https://github.com/deepset-ai/haystack/pull/1910 for details)")
self.id_hash_keys = id_hash_keys
if self.id_hash_keys != None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is not is preferred over != when comparing to None

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a brief explanation of the reasons: https://stackoverflow.com/a/2209781

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

if not set(self.id_hash_keys) <= set(allowed_hash_key_attributes):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this line is still failing with mypy I'd say we put a # type: ignore here at the end of the line and maybe another comment explaining that mypy fails because it thinks set(None) might be called.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I added a comment at that line :)

raise ValueError(f"You passed custom strings {self.id_hash_keys} to id_hash_keys which is deprecated. Supply instead a list of Document's attribute names that the id should be based on (e.g. {allowed_hash_key_attributes}). See /~https://github.com/deepset-ai/haystack/pull/1910 for details)")


if embedding is not None:
Expand Down