
Correct parquet row group pruning logic #25186

Merged: 2 commits merged into trinodb:master from fix_parquet on Feb 28, 2025

@chenjian2664 (Contributor) commented Feb 28, 2025

Description

Fix #25151

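I found that the row group pruning logic introduced in 2ef6dc1c94 has a correctness issue, which caused the problems reported in #25151.

The cause is that when a row group is skipped, `fileRowCount` in `PredicateUtils` is not advanced by that row group's row count, so the file row positions of the row groups that follow are wrong.
This PR adds `fileRowCountOffset` to `BlockMetadata` so that each block carries its correct starting row position within the file.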
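To illustrate the failure mode, here is a minimal sketch. It is not the Trino code: `Block` is a hypothetical, simplified stand-in for `BlockMetadata`, and `offsetsFromKeptBlocksOnly` / `blocksWithOffsets` are hypothetical helpers. The first helper derives offsets only from row groups that survive pruning; the second advances the running count for every row group and stores the offset on the block.

```java
import java.util.ArrayList;
import java.util.List;

public class RowGroupOffsets {
    // Hypothetical, simplified stand-in for BlockMetadata: only the fields needed here.
    record Block(long fileRowCountOffset, long rowCount) {}

    // Buggy pattern: offsets are summed over the kept row groups only,
    // so each skipped row group shifts the positions of everything after it.
    static List<Long> offsetsFromKeptBlocksOnly(long[] rowCounts, boolean[] keep) {
        List<Long> offsets = new ArrayList<>();
        long fileRowCount = 0;
        for (int i = 0; i < rowCounts.length; i++) {
            if (!keep[i]) {
                continue; // skipped: fileRowCount is never advanced
            }
            offsets.add(fileRowCount);
            fileRowCount += rowCounts[i];
        }
        return offsets;
    }

    // Fixed pattern: the running count covers every row group, and each kept
    // block records its own file-relative starting row.
    static List<Block> blocksWithOffsets(long[] rowCounts, boolean[] keep) {
        List<Block> blocks = new ArrayList<>();
        long fileRowCount = 0;
        for (int i = 0; i < rowCounts.length; i++) {
            if (keep[i]) {
                blocks.add(new Block(fileRowCount, rowCounts[i]));
            }
            fileRowCount += rowCounts[i]; // advanced even for skipped row groups
        }
        return blocks;
    }

    public static void main(String[] args) {
        long[] rowCounts = {100, 200, 300};
        boolean[] keep = {false, true, true}; // first row group pruned
        System.out.println(offsetsFromKeptBlocksOnly(rowCounts, keep)); // [0, 200]
        System.out.println(blocksWithOffsets(rowCounts, keep).stream()
                .map(Block::fileRowCountOffset)
                .toList()); // [100, 300]
    }
}
```

Iceberg position deletes identify rows by their position in the data file, so with the buggy offsets a delete for file position 100 would be applied to the row actually at position 200, while the row at position 100 would survive.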

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Iceberg
* Fix incorrect results for reads on Iceberg tables with deletes. ({issue}`25151`)

@cla-bot added the cla-signed label Feb 28, 2025
@github-actions added the iceberg (Iceberg connector) label Feb 28, 2025
@chenjian2664 force-pushed the fix_parquet branch 2 times, most recently from 8546c4c to 60553c3, February 28, 2025 10:21
When filtering row groups based on split boundaries in `ParquetMetadata.getBlocks()`,
we need to maintain an accurate cumulative row count even for skipped row groups.
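A minimal sketch of split-boundary filtering that keeps a file-relative cumulative row count. This is not the actual `ParquetMetadata.getBlocks()` implementation: `RowGroup`, `SelectedBlock`, and this `getBlocks` signature are hypothetical, and it assumes a row group belongs to the split whose byte range `[splitStart, splitStart + splitLength)` contains the row group's start offset.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitRowGroupFilter {
    // Hypothetical row group descriptor: byte offset where its data starts, and its row count.
    record RowGroup(long startOffset, long rowCount) {}

    // Hypothetical result: a selected row group plus its starting row within the whole file.
    record SelectedBlock(int rowGroupIndex, long fileRowCountOffset, long rowCount) {}

    // Assumption: ownership is decided by the row group's start offset. The running
    // row count is advanced for every row group, including those outside the split,
    // so offsets stay relative to the start of the file rather than the split.
    static List<SelectedBlock> getBlocks(List<RowGroup> rowGroups, long splitStart, long splitLength) {
        List<SelectedBlock> selected = new ArrayList<>();
        long fileRowCount = 0;
        for (int i = 0; i < rowGroups.size(); i++) {
            RowGroup rowGroup = rowGroups.get(i);
            long start = rowGroup.startOffset();
            if (start >= splitStart && start < splitStart + splitLength) {
                selected.add(new SelectedBlock(i, fileRowCount, rowGroup.rowCount()));
            }
            fileRowCount += rowGroup.rowCount();
        }
        return selected;
    }

    public static void main(String[] args) {
        List<RowGroup> rowGroups = List.of(
                new RowGroup(4, 1_000),
                new RowGroup(64_000, 2_000),
                new RowGroup(128_000, 3_000));
        // This split covers bytes [64000, 192000), so it owns row groups 1 and 2.
        for (SelectedBlock block : getBlocks(rowGroups, 64_000, 128_000)) {
            System.out.println(block);
        }
    }
}
```

Because the split skips row group 0, its 1,000 rows must still be counted; otherwise row group 1 would report a starting row of 0 instead of 1,000.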
@raunaqmorarka merged commit 1df7d66 into trinodb:master Feb 28, 2025
62 checks passed
@github-actions added this to the 472 milestone Feb 28, 2025
@chenjian2664 deleted the fix_parquet branch February 28, 2025 23:37
Labels: cla-signed, iceberg (Iceberg connector)

Linked issue: Something wrong with MOR on Iceberg tables after deletes. 468 -> 469, 470, 471