-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement proper FK in MetaDocumentORM
and MetaLabelORM
to work on PostgreSQL
#1990
Conversation
unique
to id
in ORMBase
to prevent PostgreSQL issues
…-ai/haystack into fix_postgresql_composite_index_2
unique
to id
in ORMBase
to prevent PostgreSQL issuesMetaDocumentORM
and MetaLabelORM
to work on PostgreSQL
Changes needed to fix the bug:
Changes required to make the tests runnable:
In addition, a few tests were relocated from |
…-ai/haystack into fix_postgresql_composite_index_2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL part looks good! Just wondering if we can simplify the tests a bit. Seems to me that we are convoluting things even more with this PR and make the tests harder to read.
@pytest.mark.skipif(sys.platform in ['win32', 'cygwin'], reason="Test with tmp_path not working on windows runner") | ||
def test_faiss_index_save_and_load(tmp_path): | ||
def test_faiss_index_save_and_load(tmp_path, sql_url): | ||
document_store = FAISSDocumentStore( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it simplify things if we use get_document_store()
here instead? Maybe we can get rid of the sql_url
fixture then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather use a fixture than a direct function call. What could be the advantage of using get_document_store()
here? In general direct use of get_document_store()
in the tests is something that gave me a lot of pain in this PR, and left me quite puzzled, so I might be missing something crucial...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. In general, a fixture would be nicer.
I think the history of this goes like this: some tests require passing quite special parameters to the document store (here: progress_bar=False
). Instead of adding this as an "optional parameter" to the document_store
fixture, we instantiated the document store just within the test. The get_document_store
is probably something in the middle of the two approaches that avoids duplicate code across a few tests that require the same instantiation without polluting the document_store fixture.
However, if we find an elegant way to add all those optional params to the document store fixture itself, this would make things probably cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix looks good to me. However I would prefer the global init/teardown of postgres instead of getting a lot of cluttered/ugly places in the tests.
…to faiss & milvus suite
Implement a proper composite FK key for
DocumentORM
andLabelORM
inMetaDocumentORM
andMetaLabelORM
respectively, to allow for the same ID in two different indices. This was not enforced by SQLite, but crashes "strict" databases like PostgreSQL.Also modifies the tests to make them runnable on PostgreSQL, although for now the tests will still run on SQLite on the CI.
It's a breaking change because it modifies the SQL schema. Previously saved SQLite databases might not work with the current implementation. It's recommended to re-create them from scratch.