feat: support non streamable arrow file binary format #7025
Requesting review: @albertvillanova @lhoestq
Awesome, thank you! This will be pretty useful :)
Before we merge, could you also add a test in tests/packaged_modules/test_arrow.py?
I noticed it's pretty empty right now compared to test_json.py or test_csv.py, though. Maybe I can take care of it next week if needed.
@lhoestq I rebased the PR. It would be really helpful to have this feature in datasets. Please let me know if anything is still pending on this PR. Thanks.
@lhoestq any update on this thread? Thanks.
Timely PR!
Thank you for the useful enhancement and the test!
* feat: support non streamable arrow file binary format

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* use generator

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* feat: add unit test to load data in both arrow formats

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Support Arrow files (.arrow) that are in the non-streamable binary file format.
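For context, Arrow defines two IPC layouts: a streaming format and a random-access file format (the "non-streamable" one referred to here). The file format is wrapped in the magic bytes ARROW1 at both the start and the end, which the streaming format lacks. Below is a minimal, standard-library-only sketch of telling the two apart. The helper name is_arrow_file_format is hypothetical and is not taken from this PR; a real loader would more likely rely on PyArrow's pa.ipc.open_stream and pa.ipc.open_file rather than sniffing bytes by hand.

```python
import io

# Magic bytes that open and close an Arrow IPC *file* (random-access) payload.
ARROW_FILE_MAGIC = b"ARROW1"


def is_arrow_file_format(f) -> bool:
    """Hypothetical helper: return True if the seekable binary file object `f`
    looks like an Arrow IPC file (non-streamable) rather than an IPC stream.

    The original position of `f` is restored before returning.
    """
    start = f.tell()
    try:
        # Read the leading magic bytes.
        f.seek(0)
        head = f.read(len(ARROW_FILE_MAGIC))

        # Read the trailing magic bytes, if the payload is long enough.
        f.seek(0, io.SEEK_END)
        size = f.tell()
        tail = b""
        if size >= 2 * len(ARROW_FILE_MAGIC):
            f.seek(-len(ARROW_FILE_MAGIC), io.SEEK_END)
            tail = f.read(len(ARROW_FILE_MAGIC))
    finally:
        f.seek(start)
    return head == ARROW_FILE_MAGIC and tail == ARROW_FILE_MAGIC


# Synthetic byte strings for illustration only; these are not valid Arrow data.
file_like = io.BytesIO(b"ARROW1\x00\x00" + b"payload" + b"ARROW1")
stream_like = io.BytesIO(b"\xff\xff\xff\xff" + b"payload")
```

A check like this only inspects the framing bytes; it does not validate the schema or record batches, so it is best treated as a quick hint about which reader to try first.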