Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Struct layout eval with sub-expression slicing and push down #1893

Merged
merged 30 commits into from
Jan 11, 2025

Conversation

joseph-isaacs
Copy link
Member

No description provided.

@joseph-isaacs joseph-isaacs changed the base branch from develop to ji/rescope-exprs January 10, 2025 16:04
@joseph-isaacs joseph-isaacs changed the title [wip] Struct layout eval with sub-expression slicing and push down Struct layout eval with sub-expression slicing and push down Jan 10, 2025
@joseph-isaacs joseph-isaacs requested a review from gatesn January 10, 2025 17:37
@@ -19,7 +19,7 @@ mod project;
pub mod pruning;
mod row_filter;
mod select;
mod transform;
pub mod transform;
#[allow(dead_code)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh, how did this land on develop!

@@ -5,6 +5,10 @@ use crate::{ExprRef, GetItem, Pack};

pub struct ExprSimplify();

pub fn simplify(e: ExprRef) -> VortexResult<ExprRef> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, no. You can decide if you want a function called simplify, or ExprSimplify::simplify. Now you have both!

todo!()
impl ExprEvaluator for StructReader {
async fn evaluate_expr(&self, row_mask: RowMask, expr: ExprRef) -> VortexResult<ArrayData> {
// TODO: apply validity mask to row_mask
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure what this means?


/// Given an expression, an identity-type and a list of n fields return n optional expressions
/// ones containing only references to the corresponding field and an expression defined in terms of
/// the n expression which combines them back into a single expression.
fn split_expression(expr: ExprRef, dtype: &DType) -> VortexResult<(ExprRef, ExpressionSplits)> {
pub fn split_expression(expr: ExprRef, dtype: &DType) -> VortexResult<(ExprRef, ExpressionSplits)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this file be called split_expression?

@@ -8,12 +8,12 @@ use vortex_error::{vortex_err, VortexExpect, VortexResult};
use crate::traversal::{FoldChildren, FoldDown, FoldUp, Folder, FolderMut, Node};
use crate::{get_item, ident, pack, ExprRef, GetItem, Identity, Select, SelectField};

type ExpressionSplits = HashMap<Field, (FieldName, ExprRef)>;
pub type ExpressionSplits = HashMap<Field, (FieldName, ExprRef)>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather a struct ExpressionSplits(HashMap<...>) with named accessor fields. Type aliases can get messy pretty fast.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why the value is a tuple and not just ExprRef?

Couldn't you make the value Pack(vec![FieldName], vec![ExprRef]), and then Merge together the results?

results.push(
self.child(field)?
.evaluate_expr(row_mask.clone(), expr.clone())
.boxed_local(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you box these? They're all the same type being pushed to a local vec

result_field_name.into(),
arrays,
usize::try_from(row_count)?,
// TODO: handle validity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use TODO(jisaacs): or some identifier. It's more robust than git blame

let arrays = try_join_all(results).await?;

let row_count = self.layout().row_count();
assert!(arrays.iter().all(|a| a.len() as u64 == row_count));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think the array lengths should equal row_mask.true_count() no? Can you add a test for a partial row mask

layout: LayoutData,
ctx: ContextRef,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this newline, then it groups together all the args that are passed in, vs the internal state below.

@@ -63,6 +63,7 @@ impl LayoutWriter for StructLayoutWriter {
self.row_count += struct_array.len() as u64;

for i in 0..struct_array.nfields() {
// TODO(joe): handle struct validity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you at least return an error if the struct is nullable and isn't AllValid?

gatesn added a commit that referenced this pull request Jan 11, 2025
Defines a transform on expression with an scope type of struct(..). The
transform splits the expr in to a series of expressions that can be
evaluated only using a single field of the struct, and a combination
expr which combines the series of smaller expr into the original expr.

This is used #1893.

---------

Co-authored-by: Nicholas Gates <nick@nickgates.com>
Base automatically changed from ji/rescope-exprs to develop January 11, 2025 15:00
.map(|(i, name)| (name.clone(), i))
.collect();

// TODO(joe): Think about only expanding the fields when we need them -- lazily in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can't do this. It forces eager evaluation of all fields, which goes against the point of lazy dtypes

@gatesn gatesn enabled auto-merge (squash) January 11, 2025 16:03
@gatesn gatesn merged commit cedcb24 into develop Jan 11, 2025
21 checks passed
@gatesn gatesn deleted the ji/-expr-proj-struct-layout branch January 11, 2025 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants