Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parquet: Add readers and writers for the internal object model #11904
base: main
Are you sure you want to change the base?
Parquet: Add readers and writers for the internal object model #11904
Changes from 4 commits
fe2c208
a2b449b
cd8edea
3eaf3bc
3bb323b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this has to return a generic fixed reader here to match the existing behavior. Because this class is public, people may have created subclasses for other object models, and
iceberg-parquet
is a module that is covered byrevapi
so we make guarantees about API and behavior stability.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this class is not widely used, could you also mark it deprecated so that we can make it package-private in 1.9.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't do it because I can't move the date/time classes to
GenericReader
in that case. It has to be in this class. But yeah, I got your point about behavior change. I will revert that date/time classes movement.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajantha-bhat, why can't the reader classes be moved if this returns an instance? The classes are currently
private
so there is no direct dependency on them. As long as an instance is returned, wouldn't it work?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should all return
ParquetValueReader<?>
rather than constrain the implementation to a subclass ofPrimitiveReader
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should not return
Optional
:null
over usingOptional
createStructReader
) do not wrap return types inOptional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the following 2 methods are the only changes between the implementations of this class, so a lot of code is duplicated. In addition, this already introduces abstract factory methods for some readers -- including timestamps. I think it would be much cleaner to reuse this and call factory methods instead:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this change, but please point these kinds of changes out for reviewers.
The old version worked because all of the supported logical type annotations had an equivalent
ConvertedType
(which is whatOriginalType
is called in Parquet format and the logical type docs).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this call
timestampReader(desc, NANOS, ((TimestampType) primitive).isAdjustedToUTC())
instead? The primitive type is available fromdesc
so that method can check and we don't need to have a factory method for INT96.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should assume
isAdjustedToUTC
to be true? because primitive don't haveTimestampType
and the reader was usingOffsetDateTime
,my understanding is if
OffsetDateTime
is used it is with timezone andLocalDateTime
is without timezone.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Because the reader produces and
OffsetDateTime
, it should only be used when the expected type is compatible.