-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Added input validation for explode
operation in the array namespace
#19163
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #19163 +/- ##
==========================================
+ Coverage 79.65% 80.00% +0.35%
==========================================
Files 1532 1527 -5
Lines 209101 209221 +120
Branches 2417 2415 -2
==========================================
+ Hits 166557 167387 +830
+ Misses 41998 41286 -712
- Partials 546 548 +2 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @barak1412.
Adding an Explode
to the Array
methods is the good approach. We only don't have to add other physical explode implementations as we already have a working one.
Upon convert the DSL
to IR
we should translate Expr::Array(Explode) -> Expr::Explode
and during that conversion we should validate if the input type is indeed of DataType::Array
.
@@ -170,6 +170,10 @@ pub trait ArrayNameSpace: AsArray { | |||
}; | |||
Ok(out.into_series()) | |||
} | |||
fn array_explode(&self) -> PolarsResult<Series> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need it on the namespace.
@@ -249,3 +253,7 @@ pub(super) fn shift(s: &[Column]) -> PolarsResult<Column> { | |||
|
|||
ca.array_shift(n.as_materialized_series()).map(Column::from) | |||
} | |||
|
|||
pub(super) fn explode(s: &Column) -> PolarsResult<Column> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed.
@ritchie46 Thanks. So correct me if I am wrong - What do I miss? |
@@ -56,6 +57,7 @@ impl ArrayFunction { | |||
#[cfg(feature = "array_count")] | |||
CountMatches => mapper.with_dtype(IDX_DTYPE), | |||
Shift => mapper.with_same_dtype(), | |||
Explode => mapper.map_to_list_and_array_inner_dtype(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can do the dtype validation here - e.g.
Explode => mapper.try_map_to_array_inner_dtype()?,
Where try_map_to_array_inner_dtype
returns an error if the dtype isn't Array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, we could maybe even add a check before this match block to ensure the dtype is Array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's needed. As that check would be required on every get_dtype
. We already have this
fn check_namespace(function: &FunctionExpr, first_dtype: &DataType) -> PolarsResult<()> { |
So I think it will resolve itself. During simplification of expressions we can then rewrite Array(Explode)
to Explode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see - then adding Explode
under ArrayFunction
should already add input dtype validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nameexhaustion
Thanks, I was thinking about this solution, but I prefered one validation to all namespaces operations.
@ritchie46
It's not going to wok because we don't hit check_namespace
due to an extra condition in the match - if options.cast_to_supertypes.is_some()
:
/~https://github.com/pola-rs/polars/blob/main/crates/polars-plan/src/plans/conversion/type_coercion/mod.rs#L275
In addition, if we convert to Expr:Explode
, we may not hit that match at all, I need to verify it.
So I propose -
- Make
check_namespace
hit allFunctionExpr
. - If the
Array(Explode)
->Explode
conversion won't hit, we will implement @nameexhaustion 's idea.
What do you think?
Edit:
@ritchie46 I saw you added to Expr
the possibility to know its output type, so I will use it during conversion.
On Separate PR I will make all AExpr::Function
hit the namespace validation.
I think only make |
Almost there @barak1412, you need to set the branch in |
I am still little stuck on the type validation, and I will be glad for direction -
|
explode
part of the array namespaceexplode
operation in the array namespace
@ritchie46 I went for @nameexhaustion solution, due to what I wrote in the previous comment. How does it look like? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @barak1412. Sorry for the late reply, but it looks great!
Partly fixes #19049.
Now it has validation, but not in the place we want -
I want in seperate PR handle validation for all namespace operations, which suffer from same problem.