Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Fixed IPC projection #1149

Merged
merged 1 commit into from
Jul 8, 2022
Merged

Conversation

ritchie46
Copy link
Collaborator

Depending of the ordering of the hashmap, the result of projecting columns could be different. This PR fixes that.

Not something I want to include in this PR, but I am interested why we want to sort the projections? I saw that we had an iterator that had that assumption, but it is my understanding that we also could keep a count on that iterator and stop when we reached that count. If so, I am happy to make a follow up PR that leaves projection as is.

@codecov
Copy link

codecov bot commented Jul 8, 2022

Codecov Report

Merging #1149 (734805e) into main (e4947cc) will decrease coverage by 0.00%.
The diff coverage is 88.23%.

@@            Coverage Diff             @@
##             main    #1149      +/-   ##
==========================================
- Coverage   83.62%   83.62%   -0.01%     
==========================================
  Files         366      366              
  Lines       35906    35911       +5     
==========================================
+ Hits        30028    30032       +4     
- Misses       5878     5879       +1     
Impacted Files Coverage Δ
src/io/ipc/read/common.rs 94.92% <88.23%> (-0.30%) ⬇️
src/array/utf8/mod.rs 85.62% <0.00%> (-0.96%) ⬇️
src/bitmap/immutable.rs 83.95% <0.00%> (-0.62%) ⬇️
src/io/ipc/read/array/boolean.rs 98.14% <0.00%> (+7.40%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e4947cc...734805e. Read the comment docs.

@ritchie46 ritchie46 added the bug Something isn't working label Jul 8, 2022
acc
},
);
projection.sort_unstable();

// check unique
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we already sort the projection, we don't need to collect into a HashSet to check uniqueness.

@@ -337,14 +331,25 @@ pub fn prepare_projection(
let map = indices.iter().copied().enumerate().fold(
HashMap::default(),
|mut acc, (index, new_index)| {
if !acc.contains_key(&new_index) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch could lead to less mappings than there were projections. I believe this was a bug, because all columns need to be mapped.

Chunk::new(arrays)
let arrays = chunk.into_arrays();
let mut new_arrays = arrays.clone();

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swapping could fail depending of the order of swappend indices. We need to make a copy to ensure we don't swap too early.

@ritchie46
Copy link
Collaborator Author

I could not create the ipc files so I did not create a test. The correctness is tested against polars tests, but I am happy to add one here as well.

@jorgecarleitao
Copy link
Owner

Not something I want to include in this PR, but I am interested why we want to sort the projections?

An IPC file is composed by columns written one after the other. The types of those columns is stored in a vector on the header. We have a VecDeque with them and we pop from the deque as we read the file. Thus, need to read the file under the original order, thus the re-ordering when reading.

@jorgecarleitao jorgecarleitao merged commit 1f2116e into jorgecarleitao:main Jul 8, 2022
@jorgecarleitao jorgecarleitao changed the title fix ipc projection Fixed IPC projection Jul 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants