-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Struct layout eval with sub-expression slicing and push down #1893
Conversation
# Conflicts: # Cargo.lock # Cargo.toml # vortex-expr/Cargo.toml # vortex-expr/src/lib.rs
vortex-expr/src/lib.rs
Outdated
@@ -19,7 +19,7 @@ mod project; | |||
pub mod pruning; | |||
mod row_filter; | |||
mod select; | |||
mod transform; | |||
pub mod transform; | |||
#[allow(dead_code)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh, how did this land on develop!
@@ -5,6 +5,10 @@ use crate::{ExprRef, GetItem, Pack}; | |||
|
|||
pub struct ExprSimplify(); | |||
|
|||
pub fn simplify(e: ExprRef) -> VortexResult<ExprRef> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, no. You can decide if you want a function called simplify
, or ExprSimplify::simplify
. Now you have both!
todo!() | ||
impl ExprEvaluator for StructReader { | ||
async fn evaluate_expr(&self, row_mask: RowMask, expr: ExprRef) -> VortexResult<ArrayData> { | ||
// TODO: apply validity mask to row_mask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not exactly sure what this means?
|
||
/// Given an expression, an identity-type and a list of n fields return n optional expressions | ||
/// ones containing only references to the corresponding field and an expression defined in terms of | ||
/// the n expression which combines them back into a single expression. | ||
fn split_expression(expr: ExprRef, dtype: &DType) -> VortexResult<(ExprRef, ExpressionSplits)> { | ||
pub fn split_expression(expr: ExprRef, dtype: &DType) -> VortexResult<(ExprRef, ExpressionSplits)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this file be called split_expression
?
@@ -8,12 +8,12 @@ use vortex_error::{vortex_err, VortexExpect, VortexResult}; | |||
use crate::traversal::{FoldChildren, FoldDown, FoldUp, Folder, FolderMut, Node}; | |||
use crate::{get_item, ident, pack, ExprRef, GetItem, Identity, Select, SelectField}; | |||
|
|||
type ExpressionSplits = HashMap<Field, (FieldName, ExprRef)>; | |||
pub type ExpressionSplits = HashMap<Field, (FieldName, ExprRef)>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather a struct ExpressionSplits(HashMap<...>)
with named accessor fields. Type aliases can get messy pretty fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why the value is a tuple and not just ExprRef
?
Couldn't you make the value Pack(vec![FieldName], vec![ExprRef])
, and then Merge together the results?
results.push( | ||
self.child(field)? | ||
.evaluate_expr(row_mask.clone(), expr.clone()) | ||
.boxed_local(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you box these? They're all the same type being pushed to a local vec
result_field_name.into(), | ||
arrays, | ||
usize::try_from(row_count)?, | ||
// TODO: handle validity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use TODO(jisaacs):
or some identifier. It's more robust than git blame
let arrays = try_join_all(results).await?; | ||
|
||
let row_count = self.layout().row_count(); | ||
assert!(arrays.iter().all(|a| a.len() as u64 == row_count)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think the array lengths should equal row_mask.true_count()
no? Can you add a test for a partial row mask
layout: LayoutData, | ||
ctx: ContextRef, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this newline, then it groups together all the args that are passed in, vs the internal state below.
@@ -63,6 +63,7 @@ impl LayoutWriter for StructLayoutWriter { | |||
self.row_count += struct_array.len() as u64; | |||
|
|||
for i in 0..struct_array.nfields() { | |||
// TODO(joe): handle struct validity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you at least return an error if the struct is nullable and isn't AllValid?
Defines a transform on expression with an scope type of struct(..). The transform splits the expr in to a series of expressions that can be evaluated only using a single field of the struct, and a combination expr which combines the series of smaller expr into the original expr. This is used #1893. --------- Co-authored-by: Nicholas Gates <nick@nickgates.com>
.map(|(i, name)| (name.clone(), i)) | ||
.collect(); | ||
|
||
// TODO(joe): Think about only expanding the fields when we need them -- lazily in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can't do this. It forces eager evaluation of all fields, which goes against the point of lazy dtypes
No description provided.