Implement `MutableStructArray` #703

hohav · 2021-12-23T17:04:37Z

I'd like to work on implementing MutableStructArray. Any thoughts on how its API should look? The arrow-rs equivalent StructBuilder requires calling append on each child array, followed by append on the parent. Should we just follow the same pattern with push?

(Feel free to move this to Discussions, if that's a better place.)

The text was updated successfully, but these errors were encountered:

jorgecarleitao · 2021-12-25T06:23:22Z

Hi @hohav , thanks for the suggestion. Could you describe what would be the use case?

I never seen the need for it other than the "struct of arrays" idiom. For that I would recommend this crate: /~https://github.com/DataEngineeringLabs/arrow2-derive - it allows deriving an arrow-compatible array of structs and use the struct directly. It has the advantage of making everything typed.

hohav · 2021-12-25T16:03:09Z

That's exactly my use case, but I can't currently use arrow2-derive because my structs are nested. And even if you added support for nested structs, I also need to omit certain fields conditionally, which would be hard to justify in a general-purpose macro.

I'd implemented this stuff in my own derive macro using arrow-rs before I came across arrow2. I was looking at porting things over more-or-less directly, but I'm open to another approach if that one's not going to be idiomatic for arrow2.

jorgecarleitao · 2021-12-30T05:25:45Z

Thanks a lot for the insight.

It is not that it would not be idiomatic, it is just because we need to downcast on every push. Essentially, the hot loops (i.e. when we push will be filled with conditionals branches, one per field). The equivalent typed approach would be equivalent to:

let mut f1: MutablePrimitiveArray<i32> = ...
let mut f2: MutableUtf8Array<i32> = ...

fn push(v1: i32, v2: Option<&str>) {
     f1.push(Some(v1));
     f2.push(v2);
}

Now that I think about this, we do have a use-case already also, I did not realize until now that it was the same: when reading from avro where the schema is only known at runtime, we use Vec<Box<dyn MutableArray>> over all fields to read from a row to a Vec<Arc<dyn Array>> (which is basically a MutableStructArray.

So, in summary: if the schema is known at compile time, we can derive it and avoid a MutableStructArray; otherwise, we need one. Therefore, I agree with you that this is a valid use-case.

jorgecarleitao added the feature A new feature label Dec 30, 2021

hohav mentioned this issue Jul 31, 2022

Added MutableStructArray #1196

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement `MutableStructArray` #703

Implement `MutableStructArray` #703

hohav commented Dec 23, 2021

jorgecarleitao commented Dec 25, 2021

hohav commented Dec 25, 2021

jorgecarleitao commented Dec 30, 2021

Implement MutableStructArray #703

Implement MutableStructArray #703

Comments

hohav commented Dec 23, 2021

jorgecarleitao commented Dec 25, 2021

hohav commented Dec 25, 2021

jorgecarleitao commented Dec 30, 2021

Implement `MutableStructArray` #703

Implement `MutableStructArray` #703