Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix select bug with copycols=false on SubDataFrame #3231

Merged
merged 5 commits into from
Nov 28, 2022
Merged

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Nov 25, 2022

Fixes the following bug:

julia> df = DataFrame(a=1:3)
3×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3

julia> dfv = view(df, :, :)
3×1 SubDataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3

julia> select(dfv, x -> true, copycols=false) # bug - this should error
3×1 SubDataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3

julia> select(dfv, x -> true) # correct behavior in case of copying, but impossible to achieve without copying when working on a view
3×1 DataFrame
 Row │ x1
     │ Bool
─────┼──────
   1 │ true
   2 │ true
   3 │ true

@bkamins bkamins added the bug label Nov 25, 2022
@bkamins bkamins added this to the patch milestone Nov 25, 2022
@bkamins bkamins requested a review from nalimilan November 25, 2022 16:52
@bkamins
Copy link
Member Author

bkamins commented Nov 25, 2022

Also fixed:

julia> select(df, 1.0)
ERROR: ArgumentError: Unrecognized column selector 1.0 in AsTable constructor
... (here removed a long stack-trace) ...
caused by: MethodError: no method matching getindex(::DataFrames.Index, ::Float64)

by

julia> select(df, 1.0)
ERROR: ArgumentError: invalid index: 1.0 of type Float64

which I think is nicer

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break code that incorrectly relies on getting no error, so maybe it could wait for 1.5 to limit disruption?

@bkamins
Copy link
Member Author

bkamins commented Nov 26, 2022

This will break code that incorrectly relies on getting no error

I do not see what correct code would be broken here.

The only possibility that code worked that currently fails is select(dfv, fun, copycols=false), where fun is just a function (taking a data frame as an input). In such a case no transformation is done but the function currently is wrapped in Cols.
This happens in this line view(dfv, :, Cols(newinds...)), which makes sense ONLY if newinds is actually a vector of column selectors only.
However, it sometimes works in the old code, as in the example:

julia> select(dfv, x -> true, copycols=false) # bug - this should error
3×1 SubDataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3

because Cols accepts taking a function as its argument (which probably was not the case when this code was written 😞, as we added this feature later to Cols is I remember correctly). But it changes its meaning. Now the function is not passed SubDataFrame, but is a predicate indicating which column should be kept (in the above example x -> true is interpreted as keep all columns selector).

For me it is a bug that never can produce what user wanted. And that is why I thought to make a patch release for it.

@nalimilan
Copy link
Member

OK, fair enough!

@bkamins bkamins merged commit cea2619 into main Nov 28, 2022
@bkamins bkamins deleted the bk/fix_selection branch November 28, 2022 13:37
@bkamins
Copy link
Member Author

bkamins commented Nov 28, 2022

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants