Revert to a simpler ownership model - but always duplicate ALTREP #1151

DavisVaughan · 2020-06-15T14:42:58Z

Closes tidyverse/dplyr#5327
Closes #1124
Closes #1122

Reverts to a slightly simpler ownership model with two states, VCTRS_OWNED_true and VCTRS_OWNED_false.

When we own the object, we only ever attempt to duplicate it if it is ALTREP. This will be required if vec_init() ever creates ALTREP objects in the future (#837). When doing assignment, we have to duplicate ALTREP objects before dereferencing them, even if we own them, because we need access to the actual data that the object represents, not the ALTREP object's internals.

When we don't own the object, we use r_clone_referenced() to determine if we need to duplicate or not.
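The two rules above can be sketched as a single decision function. This is a toy model, not the code in src/owned.h: `struct toy_obj`, its `altrep` and `referenced` flags, and `toy_needs_clone()` are hypothetical stand-ins for a real R object, an ALTREP check, and the test that r_clone_referenced() performs. The enum values are arbitrary, and checking ALTREP first for both ownership states is an assumption based on the PR title ("always duplicate ALTREP").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an R object; NOT the real SEXP layout. */
struct toy_obj {
  bool altrep;      /* hypothetical flag: the object is ALTREP */
  bool referenced;  /* hypothetical flag: something else may hold a reference */
};

/* Names taken from the PR description; the numeric values are arbitrary here. */
enum vctrs_owned {
  VCTRS_OWNED_false = 0,
  VCTRS_OWNED_true
};

/* Returns true when `x` should be duplicated before it is assigned into. */
static bool toy_needs_clone(const struct toy_obj* x, enum vctrs_owned owned) {
  assert(x != NULL);

  /* ALTREP objects are always duplicated so the real data can be accessed. */
  if (x->altrep) {
    return true;
  }

  /* We own a non-ALTREP object: modify it in place. */
  if (owned == VCTRS_OWNED_true) {
    return false;
  }

  /* We do not own it: duplicate only if it might be referenced elsewhere. */
  return x->referenced;
}
```

Under this model, an owned object that is not ALTREP is never duplicated, regardless of whether anything else references it, which is the case that matters for repeated assignment into the same output.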

This fixed the repeated duplication that was occurring in df_assign()'s calls to vec_proxy_assign_opts(), but I then discovered that we still had repeated duplication in vec_restore(). To fix that, I borrowed from the ideas in #1124 and passed owned through to vec_restore() as well, which lets us avoid attempting duplication there too.
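A rough way to picture why threading owned through the restore step matters is to count duplications in a toy model. Everything here is hypothetical: `toy_col`, `toy_restore()`, and `toy_count_restore_clones()` are not vctrs functions, and the sketch assumes the output column counts as referenced because its parent data frame holds it, which is an assumption of this example rather than something stated in this PR.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are part of the vctrs C API. */
enum toy_owned { TOY_OWNED_false = 0, TOY_OWNED_true };

struct toy_col {
  bool referenced;  /* assumed true: the parent data frame holds the column */
  int  n_clones;    /* how many times this column has been duplicated */
};

/* Stand-in for a restore step that has to modify attributes of `x`.
 * It duplicates only when we do not own the column and it is referenced. */
static void toy_restore(struct toy_col* x, enum toy_owned owned) {
  if (owned == TOY_OWNED_false && x->referenced) {
    x->n_clones++;
  }
}

/* Restore the same output column `n_assign` times, as repeated assignment
 * into one column would, and report how many duplications that caused. */
static int toy_count_restore_clones(int n_assign, enum toy_owned restore_owned) {
  assert(n_assign >= 0);

  struct toy_col col = { .referenced = true, .n_clones = 0 };
  for (int i = 0; i < n_assign; ++i) {
    toy_restore(&col, restore_owned);
  }
  return col.n_clones;
}
```

In this model, calling `toy_count_restore_clones(n, TOY_OWNED_false)` duplicates the column once per assignment, while passing `TOY_OWNED_true` duplicates it zero times.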

With both of those in place, tidyverse/dplyr#5327 is fixed:

library(dplyr)

date_frame <- tibble(date = rep(lubridate::date("2020-01-01"), 100))
date_frames_to_bind <- rep(list(date_frame), 10000)
bench::mark(bind_rows(date_frames_to_bind))
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 x 6
#>   expression                          min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 bind_rows(date_frames_to_bind)    133ms    142ms      6.87    8.39MB     25.8

It is worth noting that @lionel- and I both think that we should eventually have recursive proxy and restore functions (#1107). This would allow df_assign() to avoid proxying and restoring the output columns repeatedly, which is where many of these duplication issues arise. It would also allow us to remove the owned argument from vec_restore() again, which I currently view as a temporary fix.

This also closes #1124, because it fixes the original problem reported there, where columns of df-cols were being duplicated.

lionel-

Looks good! I'm glad we're starting a set of performance tests.

src/owned.h

tests/testthat/helper-performance.R

DavisVaughan marked this pull request as ready for review June 15, 2020 15:01

DavisVaughan requested a review from lionel- June 15, 2020 15:06

lionel- approved these changes Jun 16, 2020


DavisVaughan added 6 commits June 24, 2020 11:16

Revert to a simpler ownership model - but always duplicate ALTREP

919ee1a

Reverse enum style

3858dbf

Thread owned through vec_restore() to prevent duplication

565c9a1

Add performance regression tests for significant time differences

696d34e

Unconditionally clone ALTREP objects

5eb14eb

time_expr() -> time_of()

e3a9840

DavisVaughan force-pushed the simple-ownership branch from 29886eb to e3a9840 Compare June 24, 2020 15:19

DavisVaughan merged commit a518ead into r-lib:master Jun 24, 2020

DavisVaughan deleted the simple-ownership branch June 24, 2020 15:46

DavisVaughan mentioned this pull request Oct 9, 2023

ALTREP list performance fix: Never clone in vec_clone_referenced() when owned #1884

Merged
