-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed broken ainsert_custom_kg() #811
Conversation
lightrag/lightrag.py
Outdated
"content": chunk_content, | ||
"source_id": source_id, | ||
"tokens": len(encode_string_by_tiktoken(chunk_content)), | ||
"chunk_order_id": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure about chunk_order_id? it does not exist in the project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea is that the user defines the chunks manually, so they are already chunked and it is not traceable what source documents they were part of and should be treated as individual parts.
Otherwise, we'd have to make it possible for the user to specify the chunk_order_id manually in the custom_kg dictionary passed to the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sharing.
I didn’t fully understand.
If I understand correctly, chunk_order_id is not necessary because the chunks are manually defined by the user and treated as independent parts, without a link to their original source documents. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LarFii Could you have a look ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if I couldn't make it clear in my previous message. As far as I understand, the chunk_order_id
is to provide an ordering for chunks created from a full document in case it needs to be chunked. If that's the case, we don't have this information in the custom kg insertion as per example in the README:
custom_kg = {
"entities": [
...
],
"relationships": [
...
],
"chunks": [
{
"content": "ProductX, developed by CompanyA, has revolutionized the market with its cutting-edge features.",
"source_id": "Source1",
},
{
"content": "PersonA is a prominent researcher at UniversityB, focusing on artificial intelligence and machine learning.",
"source_id": "Source2",
},
{
"content": "None",
"source_id": "UNKNOWN",
},
],
}
rag.insert_custom_kg(custom_kg)
So a solution would be to either ask the user to pass a chunk_order_id
in the custom_kg
dict or assume all chunks are individual documents. I'd be happy to hear if I made a correct assumption or there is something I'm missing about the functionality of chunk_order_id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation.
Now it’s clear.
What do you think about changing custom_kg: dict[str, Any]
into a structured format, such as a dataclass? This way, we can provide chunk_order_id
along with all the necessary information more effectively.
Let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArnoChenFx or @LarFii Could help me in this issue ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I overlooked the fact it's called chunk_order_index
and not chunk_order_id
. It is fixed now. I proposed a solution where the user can specify a chunk_order_index in the custom_kg and if not it is assumed to be always 0.
@da-luggas Could please apply the lint ? Thanks |
@da-luggas Could you please update the documentation base and your changes ? And update the code sample. Thanks ! |
Super thanks for sharing ! |
Super Thanks for sharing ! |
Fix for #745
Custom knowledge graph insertion fails due to required properties not provided (tokens, chunk_order_id, full_doc_id, status).