Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Weaviate] Exit the while loop when we query less documents than available #2537

Merged
merged 8 commits into from
May 20, 2022

Conversation

masci
Copy link
Contributor

@masci masci commented May 12, 2022

fixes #2517

When a query returns less documents than available in the store (e.g. when there are more documents than QUERY_MAXIMUM_RESULTS (see Weaviate docs), the while loop paginating the results never exits. This is because the offset increases as we fetch more documents, but after QUERY_MAXIMUM_RESULTS docs, the query starts returning an empty list, the offset stays the same and we get stuck.

Proposed changes:

  • While paginating results, we exit the loop if the query returns an empty list of documents.

Status (please check what you already did):

Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good 🙂

I wonder if a piece is missing though. Although the original issue is solved (get_all_documents work regardless of QUERY_MAXIMUM_RESULTS), still it would be confusing for users not to be able to query more than 10k docs no matter what. We could:

  • Add a very loud warning if get_documents_count is above QUERY_MAXIMUM_RESULTS, or
  • Add a very loud warning if a user tries to write more than QUERY_MAXIMUM_RESULTS into Weaviate.

In both cases, the user is at least aware that this might be an issue: I have the impression most Haystack users are not aware of this specific Weaviate quirk and would benefit a hint.

I think this is a small change worth adding here. Let me know if you prefer a separate PR for it, in this case I will approve this one.

@masci
Copy link
Contributor Author

masci commented May 18, 2022

We could:

  • Add a very loud warning if get_documents_count is above QUERY_MAXIMUM_RESULTS, or
  • Add a very loud warning if a user tries to write more than QUERY_MAXIMUM_RESULTS into Weaviate.

I went with the former because it looks less obtrusive.

@masci masci requested a review from ZanSara May 18, 2022 13:59
Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good! I left a couple of small comments but nothing important 😊

@masci masci force-pushed the massi/weaviate branch from 10c2612 to 9104353 Compare May 19, 2022 17:29
@masci masci merged commit a9a4156 into master May 20, 2022
@masci masci deleted the massi/weaviate branch May 20, 2022 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants