Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ByRow without columns #2457

Closed
matthieugomez opened this issue Sep 28, 2020 · 6 comments · Fixed by #2476
Closed

ByRow without columns #2457

matthieugomez opened this issue Sep 28, 2020 · 6 comments · Fixed by #2476
Labels
Milestone

Comments

@matthieugomez
Copy link
Contributor

matthieugomez commented Sep 28, 2020

Would it be possible at some point to use DataFrames ByRow without columns? This would allow one to do things like:

transform(df, Any[] => ByRow(rand) => :z)
@bkamins
Copy link
Member

bkamins commented Sep 29, 2020

It was considered.

The reason why it is not allowed is that ByRow is essentially a shorthand for broadcasting currently. To keep consistency here broadcasting over an empty collection would give us an empty collection. If you pass at least one column then you are good as this column defines you the dimensions of the output.

Actually your example was the only use-case I could come-up with that would be useful and I thought that it is not worth to break the design to handle it.

Technically it would be possible to work around this and add this support but I was not sure if it is worth it. What do you think?

@bkamins bkamins added this to the 1.x milestone Sep 29, 2020
@pdeffebach
Copy link
Contributor

I would like it. I was working on a problem set the other day where I wanted to generate random numbers for error terms.

This actually seems quite hard to fix! we have to catch the combination of Any[] and ByRow and then use some sort sort of struct for dispatch that constains the number of rows in a DataFrame to call [f() for i in 1:N]

@pdeffebach
Copy link
Contributor

It will also make DataFramesMeta a lot easier in case you see something with the yet-to-be-implemented @row macro

@transform(df, @row y = 7)

This will get lowered to

transform(df, [] => ByRow(_f) => :y)

which will error.

@bkamins
Copy link
Member

bkamins commented Sep 29, 2020

This actually seems quite hard to fix!

If we add it (I will add it then) we have to hard-code this in every function that potentially might call ByRow.

Note that ByRow(sin) is just a function. It knows nothing about data frames existence at all and if x is a vector ByRow(sin)(x) is just the same as writing sin.(x) (and has nothing to do with data frames).

@matthieugomez
Copy link
Contributor Author

matthieugomez commented Sep 29, 2020 via email

@bkamins
Copy link
Member

bkamins commented Sep 29, 2020

I have added it to the PR fixing #2410.
In a few days I will open an issue for #2410 for AbstractDataFrame. Then looking at it will be welcome.
When we finalize that one I will open another PR for GroupedDataFrame (where we also need to change this in particular) and filter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants