Add mmr reranking #180

prasmussen15 · 2024-10-07T15:49:11Z

Important

Add MMR reranking to search, normalize embeddings, and update community building with group ID support.

Search Enhancements:
- Add MMR reranking method to search.py for edges, nodes, and communities.
- Update search_config.py and search_config_recipes.py to include MMR reranking.
- Add maximal_marginal_relevance() function to search_utils.py.
Embedding Normalization:
- Use normalize_l2() in openai.py and voyage.py to normalize embeddings.
Community Building:
- Update build_communities() in graphiti.py and community_operations.py to accept group_ids.
Testing:
- Update test_graphiti_int.py to test new search configurations and community building.

^{This description was created by}^{for 6f69a61. It will automatically update as commits are pushed.}

ellipsis-dev

👍 Looks good to me! Reviewed everything up to 0eb9cf7 in 22 seconds

More details

Looked at 549 lines of code in 10 files
Skipped 0 files when reviewing.
Skipped posting 5 drafted comments based on config settings.

1. tests/test_graphiti_int.py:88

Draft comment:
Consider adding a test case for COMBINED_HYBRID_SEARCH_MMR to ensure the new MMR reranking functionality is tested.
Reason this comment was not posted:
Confidence changes required: 50%
The test function test_graphiti_init uses COMBINED_HYBRID_SEARCH_RRF but the PR is about adding MMR reranking. It would be beneficial to add a test case for COMBINED_HYBRID_SEARCH_MMR to ensure the new functionality is tested.

2. graphiti_core/embedder/openai.py:46

Draft comment:
The create method in OpenAIEmbedder and VoyageAIEmbedder is almost identical. Consider refactoring to avoid code duplication and adhere to the DRY principle.
Reason this comment was not posted:
Confidence changes required: 80%
The code in graphiti_core/embedder/openai.py and graphiti_core/embedder/voyage.py is not DRY. The create method in both classes is almost identical, except for the client method used to get embeddings. This violates the DRY principle.

3. graphiti_core/graphiti.py:579

Draft comment:
The build_communities method is performing multiple tasks: clearing existing communities, building new communities, generating embeddings, and saving nodes and edges. Consider refactoring to adhere to the Single Responsibility Principle.
Reason this comment was not posted:
Confidence changes required: 80%
The build_communities method in graphiti_core/graphiti.py is performing multiple tasks: clearing existing communities, building new communities, generating embeddings, and saving nodes and edges. This violates the Single Responsibility Principle.

4. graphiti_core/search/search_utils.py:584

Draft comment:
The maximal_marginal_relevance function is complex. Consider adding a comment to explain its purpose and how it works.
Reason this comment was not posted:
Confidence changes required: 70%
The maximal_marginal_relevance function in graphiti_core/search/search_utils.py is a bit complex and could benefit from a comment explaining its purpose and how it works.

5. graphiti_core/search/search.py:118

Draft comment:
The maximal_marginal_relevance function is complex. Consider adding a comment to explain its purpose and how it works. (Also applicable in node_search and community_search functions)
Reason this comment was not posted:
Confidence changes required: 70%
The maximal_marginal_relevance function in graphiti_core/search/search_utils.py is a bit complex and could benefit from a comment explaining its purpose and how it works.

Workflow ID: wflow_4X7Ine7q20Np36ot

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 5896c6a in 13 seconds

More details

Looked at 26 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. graphiti_core/helpers.py:65

Draft comment:
The else after an if that returns is unnecessary. You can remove the else and directly execute the code block, which improves readability.
Reason this comment was not posted:
Confidence changes required: 33%
The function normalize_l2 is designed to normalize a list of floats using L2 normalization. The code correctly handles both 1D and 2D arrays, ensuring that division by zero is avoided. The changes made in the PR are mostly variable renaming for clarity, which is a good practice. However, the use of else after an if that returns is unnecessary and can be removed for cleaner code.

2. graphiti_core/helpers.py:58

Draft comment:
Function name normalize_l2 could be more descriptive. Consider renaming it to something like normalize_embedding for clarity.
Reason this comment was not posted:
Confidence changes required: 50%
The function normalize_l2 has been updated to use more descriptive variable names, which is good. However, the function name normalize_l2 is not very descriptive of its purpose. It would be better to use a name that clearly indicates what the function does, such as normalize_embedding.

Workflow ID: wflow_aGp68NGLlMKziIra

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

danielchalef · 2024-10-08T13:30:24Z

@prasmussen15 it looks like we're normalizing vectors. Do we need to run a migration on existing installs? Would it now make sense to use dot product rather then cosine as distance metric?

prasmussen15 · 2024-10-08T15:41:37Z

@prasmussen15 it looks like we're normalizing vectors. Do we need to run a migration on existing installs? Would it now make sense to use dot product rather then cosine as distance metric?

Neo4j only offers cosine similarity and not dot product so a migration on existing installs isn't necessary for the search. We could run a migration for MMR since I use dot product there.

ellipsis-dev

👍 Looks good to me! Incremental review on 7833adc in 39 seconds

More details

Looked at 62 lines of code in 4 files
Skipped 0 files when reviewing.
Skipped posting 4 drafted comments based on config settings.

1. graphiti_core/embedder/openai.py:49

Draft comment:
The removal of normalize_l2 contradicts the PR description, which mentions embedding normalization. Consider re-adding normalize_l2 to ensure embeddings are normalized.
Reason this comment was not posted:
Comment did not seem useful.

2. graphiti_core/embedder/voyage.py:48

Draft comment:
The removal of normalize_l2 contradicts the PR description, which mentions embedding normalization. Consider re-adding normalize_l2 to ensure embeddings are normalized.
Reason this comment was not posted:
Marked as duplicate.

3. graphiti_core/search/search_utils.py:593

Draft comment:
Normalization in the MMR function is incorrect. Normalize vectors before computing dot products to ensure correct similarity calculation.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The code change seems to have addressed the issue mentioned in the comment by normalizing the vectors during the dot product calculation. This suggests that the comment might be outdated or already resolved. The normalization is now part of the calculation, which aligns with the comment's suggestion.
I might be missing some context about whether the normalization should occur outside the loop or if there's a specific reason for the current implementation. However, the change seems to align with the comment's suggestion.
The normalization within the loop seems to be a valid approach for calculating cosine similarity, which is likely the intended behavior. The comment might not be necessary anymore.
The comment is likely outdated as the code change seems to have resolved the issue by incorporating normalization into the dot product calculation.

4. graphiti_core/search/search_utils.py:588

Draft comment:
The maximal_marginal_relevance function is complex and would benefit from comments explaining the logic, especially the calculation of max_sim and mmr.
Reason this comment was not posted:
Confidence changes required: 70%
The maximal_marginal_relevance function in search_utils.py is quite complex and would benefit from comments explaining the logic, especially the calculation of max_sim and mmr.

Workflow ID: wflow_lhh1kcbQiz9009rN

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 6f69a61 in 27 seconds

More details

Looked at 259 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 16 drafted comments based on config settings.

1. graphiti_core/embedder/openai.py:45

Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
Reason this comment was not posted:
Confidence changes required: 50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.

2. graphiti_core/embedder/voyage.py:44

Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
Reason this comment was not posted:
Confidence changes required: 50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.

3. graphiti_core/search/search_utils.py:59

Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
Reason this comment was not posted:
Confidence changes required: 50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.

4. graphiti_core/search/search_utils.py:74

Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
Reason this comment was not posted:
Confidence changes required: 50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.

5. graphiti_core/search/search_utils.py:97

Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
Reason this comment was not posted:
Confidence changes required: 50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.

6. graphiti_core/search/search_utils.py:120

Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
Reason this comment was not posted:
Confidence changes required: 50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.

7. graphiti_core/search/search_utils.py:165

Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
Reason this comment was not posted:
Confidence changes required: 50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.

8. graphiti_core/search/search_utils.py:215

Draft comment:
Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
Reason this comment was not posted:
Confidence changes required: 50%
The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.

9. graphiti_core/embedder/openai.py:45

Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.

        self, input: str | List[str] | Iterable[int] | Iterable[Iterable[int]]

Reason this comment was not posted:
Confidence changes required: 80%
The code has inconsistent indentation for function parameters, which is not idiomatic.

10. graphiti_core/embedder/voyage.py:44

Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.

        self, input: str | List[str] | Iterable[int] | Iterable[Iterable[int]]

Reason this comment was not posted:
Confidence changes required: 80%
The code has inconsistent indentation for function parameters, which is not idiomatic.

11. graphiti_core/search/search_utils.py:59

Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.

driver: AsyncDriver,

Reason this comment was not posted:
Confidence changes required: 80%
The code has inconsistent indentation for function parameters, which is not idiomatic.

12. graphiti_core/search/search_utils.py:74

Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.

driver: AsyncDriver, episodes: list[EpisodicNode]

Reason this comment was not posted:
Confidence changes required: 80%
The code has inconsistent indentation for function parameters, which is not idiomatic.

13. graphiti_core/search/search_utils.py:97

Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.

driver: AsyncDriver, nodes: list[EntityNode]

Reason this comment was not posted:
Confidence changes required: 80%
The code has inconsistent indentation for function parameters, which is not idiomatic.

14. graphiti_core/search/search_utils.py:120

Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.

driver: AsyncDriver,

Reason this comment was not posted:
Confidence changes required: 80%
The code has inconsistent indentation for function parameters, which is not idiomatic.

15. graphiti_core/search/search_utils.py:165

Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.

driver: AsyncDriver,

Reason this comment was not posted:
Confidence changes required: 80%
The code has inconsistent indentation for function parameters, which is not idiomatic.

16. graphiti_core/search/search_utils.py:215

Draft comment:
Fix the indentation for function parameters to be consistent and idiomatic.

driver: AsyncDriver,

Reason this comment was not posted:
Confidence changes required: 80%
The code has inconsistent indentation for function parameters, which is not idiomatic.

Workflow ID: wflow_6xc70n8ou5fbogcl

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

prasmussen15 added 7 commits October 7, 2024 11:48

mmr start

87164fc

add mmr function

67f2f1a

normalize

a2b3670

add mmr options to search

fe35c8b

update communities

8e539ac

build communities

87d9090

format

0eb9cf7

prasmussen15 requested a review from paul-paliychuk October 7, 2024 15:49

prasmussen15 temporarily deployed to development October 7, 2024 15:49 — with GitHub Actions Inactive

prasmussen15 had a problem deploying to development October 7, 2024 15:49 — with GitHub Actions Failure

prasmussen15 temporarily deployed to development October 7, 2024 15:49 — with GitHub Actions Inactive

ellipsis-dev bot reviewed Oct 7, 2024

View reviewed changes

clean up normalization

5896c6a

prasmussen15 temporarily deployed to development October 7, 2024 15:53 — with GitHub Actions Inactive

ellipsis-dev bot reviewed Oct 7, 2024

View reviewed changes

paul-paliychuk approved these changes Oct 8, 2024

View reviewed changes

normalize in mmr

7833adc

prasmussen15 temporarily deployed to development October 8, 2024 17:47 — with GitHub Actions Inactive

prasmussen15 had a problem deploying to development October 8, 2024 17:47 — with GitHub Actions Failure

ellipsis-dev bot reviewed Oct 8, 2024

View reviewed changes

update

6f69a61

prasmussen15 temporarily deployed to development October 8, 2024 17:50 — with GitHub Actions Inactive

ellipsis-dev bot reviewed Oct 8, 2024

View reviewed changes

prasmussen15 merged commit 49aeaf7 into main Oct 8, 2024
7 checks passed

prasmussen15 deleted the add-mmr-reranking branch October 8, 2024 17:55

github-actions bot locked and limited conversation to collaborators Oct 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add mmr reranking #180

Add mmr reranking #180

prasmussen15 commented Oct 7, 2024 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

danielchalef commented Oct 8, 2024 •

edited

Loading

prasmussen15 commented Oct 8, 2024

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

Add mmr reranking #180

Add mmr reranking #180

Conversation

prasmussen15 commented Oct 7, 2024 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

danielchalef commented Oct 8, 2024 • edited Loading

prasmussen15 commented Oct 8, 2024

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

prasmussen15 commented Oct 7, 2024 •

edited by ellipsis-dev bot

Loading

danielchalef commented Oct 8, 2024 •

edited

Loading