This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Made Buffer::offset public #652

Merged
1 commit merged into main on Dec 8, 2021

Conversation

ritchie46
Collaborator

I want to know whether a given primitive array has been sliced. The easiest way seems to be to check whether the backing buffer has a non-zero offset.
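A minimal sketch of the idea, assuming a buffer is modeled as shared storage plus an (offset, len) window. The `Buffer` type below is a hypothetical simplification written for illustration, not arrow2's actual `Buffer`; only the accessor name `offset` comes from this PR.

```rust
use std::sync::Arc;

/// Hypothetical, simplified immutable buffer: shared storage plus a window.
/// Not arrow2's real implementation.
#[derive(Clone, Debug)]
pub struct Buffer<T> {
    data: Arc<Vec<T>>, // shared backing allocation; slicing never copies it
    offset: usize,     // start of this view within `data`
    len: usize,        // number of elements visible through this view
}

impl<T> Buffer<T> {
    pub fn from_vec(v: Vec<T>) -> Self {
        let len = v.len();
        Buffer { data: Arc::new(v), offset: 0, len }
    }

    /// Zero-copy view of `len` elements starting at `offset`,
    /// relative to the current view.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Buffer {
            data: Arc::clone(&self.data),
            offset: self.offset + offset, // offsets accumulate across slices
            len,
        }
    }

    /// The accessor this PR makes public: where this view starts
    /// inside the backing allocation.
    pub fn offset(&self) -> usize {
        self.offset
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn as_slice(&self) -> &[T] {
        &self.data[self.offset..self.offset + self.len]
    }
}
```

Slicing only moves the window, so a non-zero `offset()` shows that the view no longer starts at the beginning of the shared allocation.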

@codecov

codecov bot commented Dec 1, 2021

Codecov Report

Merging #652 (40b0d3b) into main (021a8e3) will not change coverage.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##             main     #652   +/-   ##
=======================================
  Coverage   69.56%   69.56%           
=======================================
  Files         299      299           
  Lines       16738    16738           
=======================================
  Hits        11643    11643           
  Misses       5095     5095           
Impacted Files                        Coverage Δ
src/buffer/immutable.rs               97.72% <100.00%> (ø)
src/compute/arithmetics/time.rs       25.68% <0.00%> (-0.92%) ⬇️
src/io/parquet/read/nested_utils.rs   78.43% <0.00%> (+0.98%) ⬆️


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 021a8e3...40b0d3b.

@jorgecarleitao
Owner

I deliberately did not expose it, since the offset is an implementation detail of this crate (so far). Could you briefly describe the use case? Curious how you need this ^_^

@ritchie46
Collaborator Author

When I build an array of dtype Categorical and call n_unique or unique, I can return the hashmap's length or keys, respectively.

This is valid as long as the array is not sliced. To see whether it is sliced, I check the offset. If it is non-zero, we cannot take that fast path and must compute the unique values.

I could modify the slice operation if needed, but this implementation detail carries useful information for downstream crates.

Another possibility might be a function that returns the memory size of the original buffer (without taking offset and len into account)?
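A minimal sketch of the fast path described in this comment, under stated assumptions: the `Categorical` type, its fields, `slice`, and `n_unique` below are hypothetical stand-ins, not the real polars categorical. One detail matters here: a slice that starts at index 0 and only drops the tail leaves the offset at 0, so the offset alone cannot detect it. The sketch therefore also compares the visible length with the length of the original storage, which is the information the alternative suggested above would expose.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical categorical column: u32 codes seen through an (offset, len)
/// window, plus a map from category to code. Illustrative only.
pub struct Categorical {
    codes: Vec<u32>,               // full backing storage of codes
    offset: usize,                 // start of the visible window
    len: usize,                    // length of the visible window
    rev_map: HashMap<String, u32>, // every category ever inserted
}

impl Categorical {
    /// Builds a column, assigning codes in order of first appearance.
    pub fn from_strs(values: &[&str]) -> Self {
        let mut rev_map: HashMap<String, u32> = HashMap::new();
        let mut codes = Vec::with_capacity(values.len());
        for v in values {
            let next = rev_map.len() as u32;
            codes.push(*rev_map.entry(v.to_string()).or_insert(next));
        }
        let len = codes.len();
        Categorical { codes, offset: 0, len, rev_map }
    }

    /// Narrows the window. Codes and map are cloned for simplicity;
    /// a real implementation would share them (e.g. behind an Arc).
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Categorical {
            codes: self.codes.clone(),
            offset: self.offset + offset,
            len,
            rev_map: self.rev_map.clone(),
        }
    }

    /// True only if the view covers all of the backing storage:
    /// offset 0 AND the original length (slice(0, k) keeps offset 0).
    fn is_full_view(&self) -> bool {
        self.offset == 0 && self.len == self.codes.len()
    }

    pub fn n_unique(&self) -> usize {
        if self.is_full_view() {
            // Fast path: in this sketch every mapped category occurs in the view.
            self.rev_map.len()
        } else {
            // Slow path: some categories may have been sliced away,
            // so count the distinct codes that are actually visible.
            let visible = &self.codes[self.offset..self.offset + self.len];
            visible.iter().collect::<HashSet<_>>().len()
        }
    }
}
```

For `["a", "b", "c", "b"]`, the full view reports 3 via the map, while `slice(0, 2)` (offset still 0) and `slice(2, 2)` both fall back to counting the visible codes and report 2, even though the map still holds 3 entries.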

@jorgecarleitao
Owner

Categorical corresponds to a dictionary array, right?

@ritchie46
Collaborator Author

No, I mean the categorical in polars itself. It is backed by a UInt32Array and an internal hashmap. In most cases, the length of the hashmap equals the length of unique(array). Until we slice. :)

@jorgecarleitao
Owner

Ah, and if the array is sliced, the length of the hashmap is no longer necessarily equal to the number of unique values, because some items may have been "sliced away" even though they are still part of the HashMap?

@ritchie46
Collaborator Author

Indeed :)

@jorgecarleitao
Owner

Could you rebase to get a green CI?

@jorgecarleitao jorgecarleitao merged commit 6ec9cf5 into jorgecarleitao:main Dec 8, 2021
@jorgecarleitao jorgecarleitao added the enhancement An improvement to an existing feature label Dec 8, 2021
@jorgecarleitao jorgecarleitao changed the title make Buffer::offset public Made Buffer::offset public Dec 8, 2021