[Improvement] Handle the scenario where the metadata between underlying sources and Graviton is inconsistent #250
This was referenced Aug 21, 2023
This was referenced Sep 4, 2023
jerryshao added a commit that referenced this issue Sep 6, 2023

…rwriteable (#330)

### What changes were proposed in this pull request?

This PR proposes to change the requirements of the `AuditInfo` fields to make them optional and overwriteable.

### Why are the changes needed?

This is the first change for #250. It addresses two problems:

1. If `AuditInfo` does not exist in either the Graviton store or the underlying source, we should support an empty `AuditInfo`, or an `AuditInfo` with only some fields set.
2. If `AuditInfo` is set in both the Graviton store and the underlying source, we should support merging the two.

Fix: #317

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Modified and added UTs.
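As a rough illustration of what "optional and mergeable" `AuditInfo` could look like, here is a minimal Java sketch. It is not the code from #330: the class name `AuditInfoSketch`, its four fields, and the rule that the overriding side wins whenever it has a value are all assumptions made for the example.

```java
import java.time.Instant;

// Hypothetical stand-in for AuditInfo; not Graviton's real class.
// Every field is optional: null means "not set".
final class AuditInfoSketch {
    // An AuditInfo with no fields set, for entities that have none on either side.
    static final AuditInfoSketch EMPTY = new AuditInfoSketch(null, null, null, null);

    final String creator;
    final Instant createTime;
    final String lastModifier;
    final Instant lastModifiedTime;

    AuditInfoSketch(String creator, Instant createTime,
                    String lastModifier, Instant lastModifiedTime) {
        this.creator = creator;
        this.createTime = createTime;
        this.lastModifier = lastModifier;
        this.lastModifiedTime = lastModifiedTime;
    }

    // Field-by-field merge. Assumption: a value set on the overriding side
    // replaces the value on this side; unset fields fall back to this side.
    AuditInfoSketch mergeWith(AuditInfoSketch overriding) {
        if (overriding == null) {
            return this;
        }
        return new AuditInfoSketch(
            pick(overriding.creator, creator),
            pick(overriding.createTime, createTime),
            pick(overriding.lastModifier, lastModifier),
            pick(overriding.lastModifiedTime, lastModifiedTime));
    }

    private static <T> T pick(T preferred, T fallback) {
        return preferred != null ? preferred : fallback;
    }
}
```

With this shape, an entity that has audit fields on only one side, or on neither, still yields a valid object, and merging `EMPTY` with anything returns the other side's values unchanged.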
jerryshao added a commit that referenced this issue Sep 8, 2023

… id for this field (#348)

### What changes were proposed in this pull request?

This PR proposes some changes to the entity's ID field:

1. Assign a unique ID to each entity when it is created.
2. Remove the parent ID field from the catalog, schema and table entities (since it is currently unused).

### Why are the changes needed?

This is a subtask of #250. With a unique ID assigned to each entity, we can use this ID as a "record" or "watermark" that binds the underlying sources to the Graviton store, which guarantees the SSOT of entities.

Fix: #168

### Does this PR introduce _any_ user-facing change?

1. This change removes the parent ID field from the proto definitions of the catalog, schema, and table entities.

### How was this patch tested?

With the existing UTs.
xunliu pushed a commit that referenced this issue Sep 24, 2023

…guarantee SSOT (#403)

### What changes were proposed in this pull request?

This is the final piece of work for #250. This PR contains several major refactorings:

1. Removing all the entity store operations from HiveCatalogOperation, so that each CatalogOperation focuses only on its own logic.
2. Processing all the additional metadata information in CatalogOperationDispatcher, which also guarantees the SSOT.
3. Refactoring the BaseXXX classes (BaseTable, BaseSchema and BaseColumn) to separate the metadata logic from the entity information.
4. Changing the UTs to match all of the above.

### Why are the changes needed?

This PR brings several advantages:

1. There is no need to handle entity store operations in each catalog; all of them are unified in the core module.
2. The complex transaction semantics are removed in favor of a best-effort SSOT mechanism.

Fix: #318

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added new UTs to cover the code.
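To make the "best effort" idea concrete, the following Java sketch shows one way a dispatcher could combine the two sides: the underlying source is treated as the source of truth, and the additional metadata from the store is attached only when an entity ID recorded in the source matches a store entry. Everything here is hypothetical and written only for illustration: the class `BestEffortDispatcherSketch`, the property key `sketch.entity.id`, and the in-memory maps that stand in for the source and the entity store are not taken from the actual PRs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; none of these names are Graviton's real classes.
final class BestEffortDispatcherSketch {
    // Assumed property key under which the entity ID is recorded in the source.
    static final String ID_KEY = "sketch.entity.id";

    // Stand-in for the underlying source: table name -> table properties.
    private final Map<String, Map<String, String>> sourceTables = new HashMap<>();
    // Stand-in for the entity store: entity ID -> creator (additional metadata).
    private final Map<String, String> storeCreatorById = new HashMap<>();

    void putSourceTable(String name, Map<String, String> properties) {
        sourceTables.put(name, new HashMap<>(properties));
    }

    void putStoreEntity(String id, String creator) {
        storeCreatorById.put(id, creator);
    }

    // The source decides whether the table exists. The store only contributes
    // extra metadata, and only if the ID recorded in the source matches.
    Optional<String> loadCreator(String tableName) {
        Map<String, String> properties = sourceTables.get(tableName);
        if (properties == null) {
            throw new IllegalArgumentException("No such table in source: " + tableName);
        }
        String id = properties.get(ID_KEY);
        if (id == null) {
            // Table was created directly in the source: nothing to attach.
            return Optional.empty();
        }
        // A missing or stale store entry is tolerated rather than treated as an error.
        return Optional.ofNullable(storeCreatorById.get(id));
    }
}
```

The design choice illustrated is that a mismatch between the two sides degrades to "no additional metadata" instead of failing the request, which is what removes the need for a transaction spanning both systems.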
What would you like to be improved?
Graviton is a federated metadata lake: it manages the metadata from underlying sources as well as the additional metadata stored in Graviton itself. Users get the complete metadata by combining the two parts. This can potentially cause several problems:
So basically, the problem is that users can manipulate the metadata both through Graviton and directly in the underlying sources, so inconsistency is unavoidable.
How should we improve?
How do we handle this inconsistency?
As I mentioned above, inconsistency is unavoidable, whether it is caused by an operation failure, introduced by operating on different systems, or the result of starting from scratch.
Here, I think:
Details will be posted here as they become available.
The related issues are:
Subtasks: