You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.
I'd like to work on implementing MutableStructArray. Any thoughts on how its API should look? The arrow-rs equivalent StructBuilder requires calling append on each child array, followed by append on the parent. Should we just follow the same pattern with push?
(Feel free to move this to Discussions, if that's a better place.)
The text was updated successfully, but these errors were encountered:
Hi @hohav , thanks for the suggestion. Could you describe what would be the use case?
I never seen the need for it other than the "struct of arrays" idiom. For that I would recommend this crate: /~https://github.com/DataEngineeringLabs/arrow2-derive - it allows deriving an arrow-compatible array of structs and use the struct directly. It has the advantage of making everything typed.
That's exactly my use case, but I can't currently use arrow2-derive because my structs are nested. And even if you added support for nested structs, I also need to omit certain fields conditionally, which would be hard to justify in a general-purpose macro.
I'd implemented this stuff in my own derive macro using arrow-rs before I came across arrow2. I was looking at porting things over more-or-less directly, but I'm open to another approach if that one's not going to be idiomatic for arrow2.
It is not that it would not be idiomatic, it is just because we need to downcast on every push. Essentially, the hot loops (i.e. when we push will be filled with conditionals branches, one per field). The equivalent typed approach would be equivalent to:
Now that I think about this, we do have a use-case already also, I did not realize until now that it was the same: when reading from avro where the schema is only known at runtime, we use Vec<Box<dyn MutableArray>> over all fields to read from a row to a Vec<Arc<dyn Array>> (which is basically a MutableStructArray.
So, in summary: if the schema is known at compile time, we can derive it and avoid a MutableStructArray; otherwise, we need one. Therefore, I agree with you that this is a valid use-case.
I'd like to work on implementing
MutableStructArray
. Any thoughts on how its API should look? The arrow-rs equivalentStructBuilder
requires callingappend
on each child array, followed byappend
on the parent. Should we just follow the same pattern withpush
?(Feel free to move this to Discussions, if that's a better place.)
The text was updated successfully, but these errors were encountered: