-
Notifications
You must be signed in to change notification settings - Fork 224
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1149 +/- ##
==========================================
- Coverage 83.62% 83.62% -0.01%
==========================================
Files 366 366
Lines 35906 35911 +5
==========================================
+ Hits 30028 30032 +4
- Misses 5878 5879 +1
Continue to review full report at Codecov.
|
acc | ||
}, | ||
); | ||
projection.sort_unstable(); | ||
|
||
// check unique |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we already sort the projection, we don't need to collect into a HashSet
to check uniqueness.
@@ -337,14 +331,25 @@ pub fn prepare_projection( | |||
let map = indices.iter().copied().enumerate().fold( | |||
HashMap::default(), | |||
|mut acc, (index, new_index)| { | |||
if !acc.contains_key(&new_index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch could lead to less mappings than there were projections. I believe this was a bug, because all columns need to be mapped.
Chunk::new(arrays) | ||
let arrays = chunk.into_arrays(); | ||
let mut new_arrays = arrays.clone(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swapping could fail depending of the order of swappend indices. We need to make a copy to ensure we don't swap too early.
I could not create the ipc files so I did not create a test. The correctness is tested against polars tests, but I am happy to add one here as well. |
An IPC file is composed by columns written one after the other. The types of those columns is stored in a vector on the header. We have a |
Depending of the ordering of the hashmap, the result of projecting columns could be different. This PR fixes that.
Not something I want to include in this PR, but I am interested why we want to sort the projections? I saw that we had an iterator that had that assumption, but it is my understanding that we also could keep a count on that iterator and stop when we reached that count. If so, I am happy to make a follow up PR that leaves projection as is.