-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add mmr reranking #180
Add mmr reranking #180
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to 0eb9cf7 in 22 seconds
More details
- Looked at
549
lines of code in10
files - Skipped
0
files when reviewing. - Skipped posting
5
drafted comments based on config settings.
1. tests/test_graphiti_int.py:88
- Draft comment:
Consider adding a test case forCOMBINED_HYBRID_SEARCH_MMR
to ensure the new MMR reranking functionality is tested. - Reason this comment was not posted:
Confidence changes required:50%
The test functiontest_graphiti_init
usesCOMBINED_HYBRID_SEARCH_RRF
but the PR is about adding MMR reranking. It would be beneficial to add a test case forCOMBINED_HYBRID_SEARCH_MMR
to ensure the new functionality is tested.
2. graphiti_core/embedder/openai.py:46
- Draft comment:
Thecreate
method inOpenAIEmbedder
andVoyageAIEmbedder
is almost identical. Consider refactoring to avoid code duplication and adhere to the DRY principle. - Reason this comment was not posted:
Confidence changes required:80%
The code ingraphiti_core/embedder/openai.py
andgraphiti_core/embedder/voyage.py
is not DRY. Thecreate
method in both classes is almost identical, except for the client method used to get embeddings. This violates the DRY principle.
3. graphiti_core/graphiti.py:579
- Draft comment:
Thebuild_communities
method is performing multiple tasks: clearing existing communities, building new communities, generating embeddings, and saving nodes and edges. Consider refactoring to adhere to the Single Responsibility Principle. - Reason this comment was not posted:
Confidence changes required:80%
Thebuild_communities
method ingraphiti_core/graphiti.py
is performing multiple tasks: clearing existing communities, building new communities, generating embeddings, and saving nodes and edges. This violates the Single Responsibility Principle.
4. graphiti_core/search/search_utils.py:584
- Draft comment:
Themaximal_marginal_relevance
function is complex. Consider adding a comment to explain its purpose and how it works. - Reason this comment was not posted:
Confidence changes required:70%
Themaximal_marginal_relevance
function ingraphiti_core/search/search_utils.py
is a bit complex and could benefit from a comment explaining its purpose and how it works.
5. graphiti_core/search/search.py:118
- Draft comment:
Themaximal_marginal_relevance
function is complex. Consider adding a comment to explain its purpose and how it works. (Also applicable innode_search
andcommunity_search
functions) - Reason this comment was not posted:
Confidence changes required:70%
Themaximal_marginal_relevance
function ingraphiti_core/search/search_utils.py
is a bit complex and could benefit from a comment explaining its purpose and how it works.
Workflow ID: wflow_4X7Ine7q20Np36ot
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 5896c6a in 13 seconds
More details
- Looked at
26
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
2
drafted comments based on config settings.
1. graphiti_core/helpers.py:65
- Draft comment:
Theelse
after anif
that returns is unnecessary. You can remove theelse
and directly execute the code block, which improves readability. - Reason this comment was not posted:
Confidence changes required:33%
The functionnormalize_l2
is designed to normalize a list of floats using L2 normalization. The code correctly handles both 1D and 2D arrays, ensuring that division by zero is avoided. The changes made in the PR are mostly variable renaming for clarity, which is a good practice. However, the use ofelse
after anif
that returns is unnecessary and can be removed for cleaner code.
2. graphiti_core/helpers.py:58
- Draft comment:
Function namenormalize_l2
could be more descriptive. Consider renaming it to something likenormalize_embedding
for clarity. - Reason this comment was not posted:
Confidence changes required:50%
The functionnormalize_l2
has been updated to use more descriptive variable names, which is good. However, the function namenormalize_l2
is not very descriptive of its purpose. It would be better to use a name that clearly indicates what the function does, such asnormalize_embedding
.
Workflow ID: wflow_aGp68NGLlMKziIra
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
@prasmussen15 it looks like we're normalizing vectors. Do we need to run a migration on existing installs? Would it now make sense to use dot product rather then cosine as distance metric? |
Neo4j only offers cosine similarity and not dot product so a migration on existing installs isn't necessary for the search. We could run a migration for MMR since I use dot product there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 7833adc in 39 seconds
More details
- Looked at
62
lines of code in4
files - Skipped
0
files when reviewing. - Skipped posting
4
drafted comments based on config settings.
1. graphiti_core/embedder/openai.py:49
- Draft comment:
The removal ofnormalize_l2
contradicts the PR description, which mentions embedding normalization. Consider re-addingnormalize_l2
to ensure embeddings are normalized. - Reason this comment was not posted:
Comment did not seem useful.
2. graphiti_core/embedder/voyage.py:48
- Draft comment:
The removal ofnormalize_l2
contradicts the PR description, which mentions embedding normalization. Consider re-addingnormalize_l2
to ensure embeddings are normalized. - Reason this comment was not posted:
Marked as duplicate.
3. graphiti_core/search/search_utils.py:593
- Draft comment:
Normalization in the MMR function is incorrect. Normalize vectors before computing dot products to ensure correct similarity calculation. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The code change seems to have addressed the issue mentioned in the comment by normalizing the vectors during the dot product calculation. This suggests that the comment might be outdated or already resolved. The normalization is now part of the calculation, which aligns with the comment's suggestion.
I might be missing some context about whether the normalization should occur outside the loop or if there's a specific reason for the current implementation. However, the change seems to align with the comment's suggestion.
The normalization within the loop seems to be a valid approach for calculating cosine similarity, which is likely the intended behavior. The comment might not be necessary anymore.
The comment is likely outdated as the code change seems to have resolved the issue by incorporating normalization into the dot product calculation.
4. graphiti_core/search/search_utils.py:588
- Draft comment:
Themaximal_marginal_relevance
function is complex and would benefit from comments explaining the logic, especially the calculation ofmax_sim
andmmr
. - Reason this comment was not posted:
Confidence changes required:70%
Themaximal_marginal_relevance
function insearch_utils.py
is quite complex and would benefit from comments explaining the logic, especially the calculation ofmax_sim
andmmr
.
Workflow ID: wflow_lhh1kcbQiz9009rN
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 6f69a61 in 27 seconds
More details
- Looked at
259
lines of code in3
files - Skipped
0
files when reviewing. - Skipped posting
16
drafted comments based on config settings.
1. graphiti_core/embedder/openai.py:45
- Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability. - Reason this comment was not posted:
Confidence changes required:50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
2. graphiti_core/embedder/voyage.py:44
- Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability. - Reason this comment was not posted:
Confidence changes required:50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
3. graphiti_core/search/search_utils.py:59
- Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability. - Reason this comment was not posted:
Confidence changes required:50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
4. graphiti_core/search/search_utils.py:74
- Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability. - Reason this comment was not posted:
Confidence changes required:50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
5. graphiti_core/search/search_utils.py:97
- Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability. - Reason this comment was not posted:
Confidence changes required:50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
6. graphiti_core/search/search_utils.py:120
- Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability. - Reason this comment was not posted:
Confidence changes required:50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
7. graphiti_core/search/search_utils.py:165
- Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability. - Reason this comment was not posted:
Confidence changes required:50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
8. graphiti_core/search/search_utils.py:215
- Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability. - Reason this comment was not posted:
Confidence changes required:50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
9. graphiti_core/embedder/openai.py:45
- Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.
self, input: str | List[str] | Iterable[int] | Iterable[Iterable[int]]
- Reason this comment was not posted:
Confidence changes required:80%
The code has inconsistent indentation for function parameters, which is not idiomatic.
10. graphiti_core/embedder/voyage.py:44
- Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.
self, input: str | List[str] | Iterable[int] | Iterable[Iterable[int]]
- Reason this comment was not posted:
Confidence changes required:80%
The code has inconsistent indentation for function parameters, which is not idiomatic.
11. graphiti_core/search/search_utils.py:59
- Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.
driver: AsyncDriver,
- Reason this comment was not posted:
Confidence changes required:80%
The code has inconsistent indentation for function parameters, which is not idiomatic.
12. graphiti_core/search/search_utils.py:74
- Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.
driver: AsyncDriver, episodes: list[EpisodicNode]
- Reason this comment was not posted:
Confidence changes required:80%
The code has inconsistent indentation for function parameters, which is not idiomatic.
13. graphiti_core/search/search_utils.py:97
- Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.
driver: AsyncDriver, nodes: list[EntityNode]
- Reason this comment was not posted:
Confidence changes required:80%
The code has inconsistent indentation for function parameters, which is not idiomatic.
14. graphiti_core/search/search_utils.py:120
- Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.
driver: AsyncDriver,
- Reason this comment was not posted:
Confidence changes required:80%
The code has inconsistent indentation for function parameters, which is not idiomatic.
15. graphiti_core/search/search_utils.py:165
- Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.
driver: AsyncDriver,
- Reason this comment was not posted:
Confidence changes required:80%
The code has inconsistent indentation for function parameters, which is not idiomatic.
16. graphiti_core/search/search_utils.py:215
- Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.
driver: AsyncDriver,
- Reason this comment was not posted:
Confidence changes required:80%
The code has inconsistent indentation for function parameters, which is not idiomatic.
Workflow ID: wflow_6xc70n8ou5fbogcl
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Important
Add MMR reranking to search, normalize embeddings, and update community building with group ID support.
search.py
for edges, nodes, and communities.search_config.py
andsearch_config_recipes.py
to include MMR reranking.maximal_marginal_relevance()
function tosearch_utils.py
.normalize_l2()
inopenai.py
andvoyage.py
to normalize embeddings.build_communities()
ingraphiti.py
andcommunity_operations.py
to acceptgroup_ids
.test_graphiti_int.py
to test new search configurations and community building.This description was created by
for 6f69a61. It will automatically update as commits are pushed.