Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mmr reranking #180

Merged
merged 10 commits into from
Oct 8, 2024
Merged

Add mmr reranking #180

merged 10 commits into from
Oct 8, 2024

Conversation

prasmussen15
Copy link
Collaborator

@prasmussen15 prasmussen15 commented Oct 7, 2024

Important

Add MMR reranking to search, normalize embeddings, and update community building with group ID support.

  • Search Enhancements:
    • Add MMR reranking method to search.py for edges, nodes, and communities.
    • Update search_config.py and search_config_recipes.py to include MMR reranking.
    • Add maximal_marginal_relevance() function to search_utils.py.
  • Embedding Normalization:
    • Use normalize_l2() in openai.py and voyage.py to normalize embeddings.
  • Community Building:
    • Update build_communities() in graphiti.py and community_operations.py to accept group_ids.
  • Testing:
    • Update test_graphiti_int.py to test new search configurations and community building.

This description was created by Ellipsis for 6f69a61. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to 0eb9cf7 in 22 seconds

More details
  • Looked at 549 lines of code in 10 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 drafted comments based on config settings.
1. tests/test_graphiti_int.py:88
  • Draft comment:
    Consider adding a test case for COMBINED_HYBRID_SEARCH_MMR to ensure the new MMR reranking functionality is tested.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The test function test_graphiti_init uses COMBINED_HYBRID_SEARCH_RRF but the PR is about adding MMR reranking. It would be beneficial to add a test case for COMBINED_HYBRID_SEARCH_MMR to ensure the new functionality is tested.
2. graphiti_core/embedder/openai.py:46
  • Draft comment:
    The create method in OpenAIEmbedder and VoyageAIEmbedder is almost identical. Consider refactoring to avoid code duplication and adhere to the DRY principle.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The code in graphiti_core/embedder/openai.py and graphiti_core/embedder/voyage.py is not DRY. The create method in both classes is almost identical, except for the client method used to get embeddings. This violates the DRY principle.
3. graphiti_core/graphiti.py:579
  • Draft comment:
    The build_communities method is performing multiple tasks: clearing existing communities, building new communities, generating embeddings, and saving nodes and edges. Consider refactoring to adhere to the Single Responsibility Principle.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The build_communities method in graphiti_core/graphiti.py is performing multiple tasks: clearing existing communities, building new communities, generating embeddings, and saving nodes and edges. This violates the Single Responsibility Principle.
4. graphiti_core/search/search_utils.py:584
  • Draft comment:
    The maximal_marginal_relevance function is complex. Consider adding a comment to explain its purpose and how it works.
  • Reason this comment was not posted:
    Confidence changes required: 70%
    The maximal_marginal_relevance function in graphiti_core/search/search_utils.py is a bit complex and could benefit from a comment explaining its purpose and how it works.
5. graphiti_core/search/search.py:118
  • Draft comment:
    The maximal_marginal_relevance function is complex. Consider adding a comment to explain its purpose and how it works. (Also applicable in node_search and community_search functions)
  • Reason this comment was not posted:
    Confidence changes required: 70%
    The maximal_marginal_relevance function in graphiti_core/search/search_utils.py is a bit complex and could benefit from a comment explaining its purpose and how it works.

Workflow ID: wflow_4X7Ine7q20Np36ot


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 5896c6a in 13 seconds

More details
  • Looked at 26 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. graphiti_core/helpers.py:65
  • Draft comment:
    The else after an if that returns is unnecessary. You can remove the else and directly execute the code block, which improves readability.
  • Reason this comment was not posted:
    Confidence changes required: 33%
    The function normalize_l2 is designed to normalize a list of floats using L2 normalization. The code correctly handles both 1D and 2D arrays, ensuring that division by zero is avoided. The changes made in the PR are mostly variable renaming for clarity, which is a good practice. However, the use of else after an if that returns is unnecessary and can be removed for cleaner code.
2. graphiti_core/helpers.py:58
  • Draft comment:
    Function name normalize_l2 could be more descriptive. Consider renaming it to something like normalize_embedding for clarity.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The function normalize_l2 has been updated to use more descriptive variable names, which is good. However, the function name normalize_l2 is not very descriptive of its purpose. It would be better to use a name that clearly indicates what the function does, such as normalize_embedding.

Workflow ID: wflow_aGp68NGLlMKziIra


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@danielchalef
Copy link
Member

danielchalef commented Oct 8, 2024

@prasmussen15 it looks like we're normalizing vectors. Do we need to run a migration on existing installs? Would it now make sense to use dot product rather then cosine as distance metric?

@prasmussen15
Copy link
Collaborator Author

@prasmussen15 it looks like we're normalizing vectors. Do we need to run a migration on existing installs? Would it now make sense to use dot product rather then cosine as distance metric?

Neo4j only offers cosine similarity and not dot product so a migration on existing installs isn't necessary for the search. We could run a migration for MMR since I use dot product there.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 7833adc in 39 seconds

More details
  • Looked at 62 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. graphiti_core/embedder/openai.py:49
  • Draft comment:
    The removal of normalize_l2 contradicts the PR description, which mentions embedding normalization. Consider re-adding normalize_l2 to ensure embeddings are normalized.
  • Reason this comment was not posted:
    Comment did not seem useful.
2. graphiti_core/embedder/voyage.py:48
  • Draft comment:
    The removal of normalize_l2 contradicts the PR description, which mentions embedding normalization. Consider re-adding normalize_l2 to ensure embeddings are normalized.
  • Reason this comment was not posted:
    Marked as duplicate.
3. graphiti_core/search/search_utils.py:593
  • Draft comment:
    Normalization in the MMR function is incorrect. Normalize vectors before computing dot products to ensure correct similarity calculation.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The code change seems to have addressed the issue mentioned in the comment by normalizing the vectors during the dot product calculation. This suggests that the comment might be outdated or already resolved. The normalization is now part of the calculation, which aligns with the comment's suggestion.
    I might be missing some context about whether the normalization should occur outside the loop or if there's a specific reason for the current implementation. However, the change seems to align with the comment's suggestion.
    The normalization within the loop seems to be a valid approach for calculating cosine similarity, which is likely the intended behavior. The comment might not be necessary anymore.
    The comment is likely outdated as the code change seems to have resolved the issue by incorporating normalization into the dot product calculation.
4. graphiti_core/search/search_utils.py:588
  • Draft comment:
    The maximal_marginal_relevance function is complex and would benefit from comments explaining the logic, especially the calculation of max_sim and mmr.
  • Reason this comment was not posted:
    Confidence changes required: 70%
    The maximal_marginal_relevance function in search_utils.py is quite complex and would benefit from comments explaining the logic, especially the calculation of max_sim and mmr.

Workflow ID: wflow_lhh1kcbQiz9009rN


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 6f69a61 in 27 seconds

More details
  • Looked at 259 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 16 drafted comments based on config settings.
1. graphiti_core/embedder/openai.py:45
  • Draft comment:
    Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
2. graphiti_core/embedder/voyage.py:44
  • Draft comment:
    Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
3. graphiti_core/search/search_utils.py:59
  • Draft comment:
    Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
4. graphiti_core/search/search_utils.py:74
  • Draft comment:
    Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
5. graphiti_core/search/search_utils.py:97
  • Draft comment:
    Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
6. graphiti_core/search/search_utils.py:120
  • Draft comment:
    Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
7. graphiti_core/search/search_utils.py:165
  • Draft comment:
    Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
8. graphiti_core/search/search_utils.py:215
  • Draft comment:
    Inconsistent indentation for function parameters. Align with the rest of the codebase for better readability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The indentation in the function definitions is inconsistent with the rest of the codebase. This can lead to readability issues and should be corrected.
9. graphiti_core/embedder/openai.py:45
  • Draft comment:
    Fix the indentation for function parameters to be consistent and idiomatic.
        self, input: str | List[str] | Iterable[int] | Iterable[Iterable[int]]
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The code has inconsistent indentation for function parameters, which is not idiomatic.
10. graphiti_core/embedder/voyage.py:44
  • Draft comment:
    Fix the indentation for function parameters to be consistent and idiomatic.
        self, input: str | List[str] | Iterable[int] | Iterable[Iterable[int]]
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The code has inconsistent indentation for function parameters, which is not idiomatic.
11. graphiti_core/search/search_utils.py:59
  • Draft comment:
    Fix the indentation for function parameters to be consistent and idiomatic.
driver: AsyncDriver,
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The code has inconsistent indentation for function parameters, which is not idiomatic.
12. graphiti_core/search/search_utils.py:74
  • Draft comment:
    Fix the indentation for function parameters to be consistent and idiomatic.
driver: AsyncDriver, episodes: list[EpisodicNode]
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The code has inconsistent indentation for function parameters, which is not idiomatic.
13. graphiti_core/search/search_utils.py:97
  • Draft comment:
    Fix the indentation for function parameters to be consistent and idiomatic.
driver: AsyncDriver, nodes: list[EntityNode]
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The code has inconsistent indentation for function parameters, which is not idiomatic.
14. graphiti_core/search/search_utils.py:120
  • Draft comment:
    Fix the indentation for function parameters to be consistent and idiomatic.
driver: AsyncDriver,
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The code has inconsistent indentation for function parameters, which is not idiomatic.
15. graphiti_core/search/search_utils.py:165
  • Draft comment:
    Fix the indentation for function parameters to be consistent and idiomatic.
driver: AsyncDriver,
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The code has inconsistent indentation for function parameters, which is not idiomatic.
16. graphiti_core/search/search_utils.py:215
  • Draft comment:
    Fix the indentation for function parameters to be consistent and idiomatic.
driver: AsyncDriver,
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The code has inconsistent indentation for function parameters, which is not idiomatic.

Workflow ID: wflow_6xc70n8ou5fbogcl


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@prasmussen15 prasmussen15 merged commit 49aeaf7 into main Oct 8, 2024
7 checks passed
@prasmussen15 prasmussen15 deleted the add-mmr-reranking branch October 8, 2024 17:55
@github-actions github-actions bot locked and limited conversation to collaborators Oct 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants