Performance of bind_rows on frames with a Date column #5327
Comments
All the time is spent duplicating vectors in
@DavisVaughan We might have to rethink the ownership assumptions, and maybe consider reverting to a simple ownership parameter.
Slightly more minimal vctrs reprex:
library(vctrs)
df <- data.frame(date = rep(as.Date("2020-01-01"), 100))
lst <- rep(list(df), 1000)
lst_rbind <- function(x) {
  vec_rbind(!!!x)
}
bench::mark(lst_rbind(lst))
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 x 6
#>   expression          min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 lst_rbind(lst)    243ms    277ms      3.61     764MB     59.6
Created on 2020-06-15 by the reprex package (v0.3.0)
It seems like what is happening here is:
The problem is that So then when We might be able to avoid this with the following alternative approach to
@lionel- reminded me of r-lib/vctrs#1107, which this seems to be related to.
Beginning in dplyr 1.0.0, running bind_rows on frames with a Date column is slow and memory intensive.
Created on 2020-06-11 by the reprex package (v0.3.0)
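The behaviour described above can be sketched roughly as follows. This is not the reporter's original code: the data is hypothetical and mirrors the sizes used in the vctrs reprex in the comments (1000 frames of 100 Dates each), and timings will vary with machine and package versions.

```r
library(dplyr)

# Hypothetical input: 1000 small data frames, each with a single Date column.
df <- data.frame(date = rep(as.Date("2020-01-01"), 100))
lst <- rep(list(df), 1000)

# Time the row-bind; on affected versions this is reported to be slow
# and to allocate a large amount of memory.
timing <- system.time(out <- bind_rows(lst))
print(timing)

# Sanity checks on the result: all rows are present and the column is still a Date.
nrow(out)
class(out$date)
```

Swapping the Date column for a plain numeric one (for example `date = rep(18262, 100)`) gives a rough baseline to compare against.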
Possibly related, running tidyr::unnest on a nested frame with a Date column yields similar issues. I've reproduced this issue with dplyr 0.8.5, tibble 2.1.3, and tidyr 1.0.2.
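A minimal sketch of the unnest case, assuming a nested tibble built from the same kind of hypothetical data (a list-column of 1000 frames, each holding 100 Dates). The column names `id` and `data` are made up for illustration.

```r
library(tibble)
library(tidyr)

# Hypothetical nested frame: one row per group, with a list-column of
# small tibbles that each contain a Date column.
inner <- tibble(date = rep(as.Date("2020-01-01"), 100))
nested <- tibble(id = 1:1000, data = rep(list(inner), 1000))

# Time the unnest of the list-column.
timing <- system.time(flat <- unnest(nested, data))
print(timing)

# Sanity checks: one output row per inner row, and the Date class is preserved.
nrow(flat)
class(flat$date)
```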