add support for getproperty broadcasting (#2655)
bkamins authored Mar 25, 2021
1 parent 94e17b8 commit 34307bc
Showing 5 changed files with 73 additions and 7 deletions.
11 changes: 11 additions & 0 deletions NEWS.md
@@ -26,12 +26,23 @@
additional column to be added in the last position in the resulting data frame
that will identify the source data frame.
([#2649](https://github.com/JuliaData/DataFrames.jl/pull/2649))
* since Julia 1.7, broadcasting assignment to a `DataFrame` column
selected as a property (e.g. `df.col .= 1`) is allowed when the column does not
exist; in this case a fresh column is allocated
([#2655](https://github.com/JuliaData/DataFrames.jl/pull/2655))

## Deprecated

* in `leftjoin`, `rightjoin`, and `outerjoin` the `indicator` keyword argument
is deprecated in favor of `source` keyword argument; `indicator` will be removed
in 2.0 release ([#2649](https://github.com/JuliaData/DataFrames.jl/pull/2649))
* broadcasting assignment to a `SubDataFrame` column selected as a property
(e.g. `sdf.col .= 1`) is deprecated; it will be disallowed in the future
([#2655](https://github.com/JuliaData/DataFrames.jl/pull/2655))
* broadcasting assignment to an existing column of a `DataFrame`
selected as a property (e.g. `df.col .= 1`) being an in-place
operation is deprecated; it will allocate a fresh column in the future
([#2655](https://github.com/JuliaData/DataFrames.jl/pull/2655))
* all deprecations present in 0.22 release now throw an error
([#2554](https://github.com/JuliaData/DataFrames.jl/pull/2554))

15 changes: 10 additions & 5 deletions docs/src/lib/indexing.md
@@ -183,19 +183,24 @@ In such an operation `AbstractDataFrame` is considered as two-dimensional and `D
`DataFrameRow` is considered to be column-oriented.

Additional rules:
* in the `df[CartesianIndex(row, col)] .= v`, `df[row, col] .= v` syntaxes `v` is broadcasted into the contents of `df[row, col]` (this is consistent with Julia Base);
* in the `df[CartesianIndex(row, col)] .= v`, `df[row, col] .= v` syntaxes `v` is
broadcasted into the contents of `df[row, col]` (this is consistent with Julia Base);
* in the `df[row, cols] .= v` syntax the assignment to `df` is performed in-place;
* in the `df[rows, col] .= v` and `df[rows, cols] .= v` syntaxes the assignment to `df` is performed in-place;
if `rows` is `:` and `col` is `Symbol` or `AbstractString` and it is missing from `df` then a new column is allocated and added;
* in the `df[rows, col] .= v` and `df[rows, cols] .= v` syntaxes the assignment to
`df` is performed in-place; if `rows` is `:` and `col` is `Symbol` or `AbstractString`
and it is missing from `df` then a new column is allocated and added;
the length of the column is always the value of `nrow(df)` before the assignment takes place;
* in the `df[!, col] .= v` syntax column `col` is replaced by a freshly allocated vector;
if `col` is `Symbol` or `AbstractString` and it is missing from `df` then a new column is allocated and added;
the length of the column is always the value of `nrow(df)` before the assignment takes place;
* the `df[!, cols] .= v` syntax replaces existing columns `cols` in data frame `df` with freshly allocated vectors;
* `df.col .= v` syntax is allowed and performs in-place assignment to an existing vector `df.col`.
* `df.col .= v` syntax currently performs in-place assignment to an existing vector `df.col`;
this behavior is deprecated and a new column will be allocated in the future.
Starting from Julia 1.7 if `:col` is not present in `df` then a new column will be created in `df`.
* in the `sdf[CartesianIndex(row, col)] .= v`, `sdf[row, col] .= v` and `sdf[row, cols] .= v` syntaxes the assignment to `sdf` is performed in-place;
* in the `sdf[rows, col] .= v` and `sdf[rows, cols] .= v` syntaxes the assignment to `sdf` is performed in-place;
* `sdf.col .= v` syntax is allowed and performs in-place assignment to an existing vector `sdf.col`.
* `sdf.col .= v` syntax performs an in-place assignment to an existing vector `sdf.col` and is deprecated;
in the future this operation will not be allowed.
* `dfr.col .= v` syntax is allowed and performs in-place assignment to a value extracted by `dfr.col`.

Note that the `sdf[!, col] .= v` and `sdf[!, cols] .= v` syntaxes are not allowed, as `sdf` can only be modified in-place.
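
The difference between the in-place (`:`) and replacing (`!`) row selectors described in the rules above can be illustrated with a minimal sketch. It assumes a DataFrames.jl version that follows these rules; the variable names (`old_a`, `inplace_seen`, `replaced`) are hypothetical and chosen only for this illustration. The `df.col .= v` form is left out on purpose, because its behavior depends on the Julia version and on the deprecation period.

```julia
using DataFrames

df = DataFrame(a=[1, 2, 3])
old_a = df.a                 # the vector stored in df, not a copy

# `:` row selector: values are written into the existing vector
df[:, :a] .= 10
inplace_seen = old_a == [10, 10, 10]

# `!` row selector: the column is replaced by a freshly allocated vector,
# so the element type may change and `old_a` is left untouched
df[!, :a] .= 'x'
replaced = df.a !== old_a

# `:` with a column name missing from df: a new column is allocated and added
df[:, :b] .= 1

# a SubDataFrame can only be modified in-place; the parent sees the change
sdf = view(df, 2:3, :)
sdf[:, :b] .= 0
```

After this runs, `old_a` still holds the integers written by the in-place assignment, while `df.a` is a new vector of `Char` values.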
22 changes: 22 additions & 0 deletions src/other/broadcasting.jl
@@ -121,6 +121,28 @@ end
Base.dotview(df::SubDataFrame, ::typeof(!), idxs) =
    throw(ArgumentError("broadcasting with ! row selector is not allowed for SubDataFrame"))


# TODO: remove the deprecations when Julia 1.7 functionality is commonly used
# by the community
if isdefined(Base, :dotgetproperty)
    function Base.dotgetproperty(df::DataFrame, col::SymbolOrString)
        if columnindex(df, col) == 0
            return LazyNewColDataFrame(df, Symbol(col))
        else
            Base.depwarn("In the future this operation will allocate a new column " *
                         "instead of performing an in-place assignment.", :dotgetproperty)
            return getproperty(df, col)
        end
    end

    function Base.dotgetproperty(df::SubDataFrame, col::SymbolOrString)
        Base.depwarn("broadcasting getproperty is deprecated for SubDataFrame and " *
                     "will be disallowed in the future. Use `df[:, $(repr(col))] .= ...` instead",
                     :dotgetproperty)
        return getproperty(df, col)
    end
end

function Base.copyto!(lazydf::LazyNewColDataFrame, bc::Base.Broadcast.Broadcasted{T}) where T
    if bc isa Base.Broadcast.Broadcasted{<:Base.Broadcast.AbstractArrayStyle{0}}
        bc_tmp = Base.Broadcast.Broadcasted{T}(bc.f, bc.args, ())
23 changes: 21 additions & 2 deletions test/broadcasting.jl
@@ -1458,8 +1458,13 @@ end
@test v1 == [100.0, 100.0, 100.0]

df = copy(refdf)
@test_throws ArgumentError df.newcol .= 'd'
@test df == refdf
if isdefined(Base, :dotgetproperty)
    df.newcol .= 'd'
    @test df == [refdf DataFrame(newcol=fill('d', 3))]
else
    @test_throws ArgumentError df.newcol .= 'd'
    @test df == refdf
end

df = view(copy(refdf), :, :)
v1 = df[!, 1]
@@ -1842,4 +1847,18 @@ end
@test_throws DimensionMismatch df[:, "z"] .= z
end

@testset "broadcasting of getproperty" begin
    if isdefined(Base, :dotgetproperty)
        df = DataFrame(a=1:4)
        df.b .= 1
        df.c .= 4:-1:1
        # TODO: enable this in the future when the deprecation period is finished
        # df.a .= 'a':'d'
        # @test df.a isa Vector{Char}
        # @test df == DataFrame(a='a':'d', b=1, c=4:-1:1)
        # dfv = view(df, 2:3, 2:3)
        # @test_throws ArgumentError dfv.b .= 0
    end
end

end # module
9 changes: 9 additions & 0 deletions test/deprecated.jl
@@ -10,6 +10,15 @@ const ≅ = isequal
@test_throws ArgumentError aggregate()
end

@testset "deprecated broadcasting assignment" begin
    df = DataFrame(a=1:4, b=1, c=2)
    df.a .= 'a':'d'
    @test df == DataFrame(a=97:100, b=1, c=2)
    dfv = view(df, 2:3, 2:3)
    dfv.b .= 0
    @test df.b == [1, 0, 0, 1]
end

@testset "All indexing" begin
df = DataFrame(a=1, b=2, c=3)
