-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partial conversion to CppRef<T>
everywhere
#1429
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This was referenced Jan 24, 2025
This is unfortunate IMHO. |
These changes were made as part of the prototyping for use of the new arbitrary self types v2 Rust feature (#1429) but are worth promoting into main even before that's landed. Some tweaks: * We no longer depend on the `arbitrary_self_types_pointers` nightly feature but instead just `arbitrary_self_types`. * CppRef<T> no longer has an associated lifetime. Instead there's a separate CppLtRef<T> type.
Background: Rust references have certain rules, most notably that the underlying data cannot be changed while an immutable reference exists. That's essentially impossible to promise for any C++ data; C++ may retain references or pointers to data any modify it at any time. This presents a problem for Rust/C++ interop tooling. Various solutions or workarounds are possible: 1) All C++ data is represented as zero-sized types. This is the approach taken by cxx for opaque types. This sidesteps all of the Rust reference rules, since those rules only apply to areas of memory that are referred to. This doesn't really work well enough for autocxx since we want to be able to keep C++ data on the Rust stack, using all the fancy moveit shenanigans, and that means that Rust must know the true size and alignment of the type. 2) All C++ data is represented as UnsafeCell<MaybeUninit<T>>. This also sidesteps the reference rules. This would be a valid option for autocxx. 3) Have a sufficiently simple language boundary that humans can reasonably guarantee there are no outstanding references on the C++ side which could be used to modify the underlying data. This is the approach taken by cxx for cxx::kind::Trivial types. It's just about possible to cause UB using one of these types, but you really have to work at it. In practice such UB is unlikely. 4) Never allow Rust references to C++ types. Instead use a special smart pointer type in Rust, representing a C++ reference. This is the direction in this PR. More detail on this last approach here: https://medium.com/@adetaylor/are-we-reference-yet-c-references-in-rust-72c1c6c7015a This facility is already in autocxx, by adopting the safety policy "unsafe_references_wrapped". However, it hasn't really been battle tested and has a bunch of deficiencies. It's been awaiting formal Rust support for "arbitrary self types" so that methods can be called on such smart pointers. This is now [fairly close to stabilization](rust-lang/rust#44874 (comment)); this PR is part of the experimentation required to investigate whether that rustc feature should go ahead and get stabilized. This PR essentially converts autocxx to only operate in this mode - there should no longer ever be Rust references to C++ data. This PR is incomplete: * There are still 60 failing integration tests. Mostly these relate to subclass support, which isn't yet converted. * `ValueParam` and `RValueParam` need to support taking `CppPin<T>`, and possibly `CppRef<T: CopyNew>` etc. * Because we can't implement `Deref` for `cxx::UniquePtr<T>` to emit a `CppRef<T>`, unfortunately `cxx::UniquePtr<T>` can't be used in cases where we want to provide a `const T&`. It's necessary to call `.as_cpp_ref()` on the `UniquePtr`. This is sufficiently annoying that it might be necessary to implement a trait `ReferenceParam` like we have for `ValueParam` and `RValueParam`. (Alternatives include upstreaming `CppRef<T>` into cxx, but per reason 4 listed above, the complexity probably isn't merited for statically-declared cxx interfaces; or separating from cxx entirely.) This also shows up a [Rustc problem which is fixed here](rust-lang/rust#135179). Ergonomic findings: * The problem with `cxx::UniquePtr` as noted above. * It's nice that `Deref` coercion allows methods to be called on `CppPin` as well as `CppRef`. * To get the same benefit for parameters being passed in by reference, you need to pass in `&my_cpp_pin_wrapped_thing` which is weird given that the whole point is we're trying to avoid Rust references. Obviously, calling `.as_cpp_ref()` works too, so this weirdness can be avoided. * When creating some C++ data `T`, in Rust, it's annoying to have to decide a-priori whether it'll be Rust or C++ accessing the data. If the former, you just create a new `T`; if the latter you need to wrap it in `CppPin::new`. This is only really a problem when creating a C++ object on which you'll call methods. It feels like it'll be OK in practice. Possibly this can be resolved by making the method receiver some sort of `impl MethodReceiver<T>` generic; an implementation for `T` could be provided which auto-wraps it into a `CppPin` (consuming it at that point). This sounds messy though. A bit more thought required, but even if this isn't possible it doesn't sound like a really serious ergonomics problem, especially if we can use `#[diagnostic::on_unimplemented]` somehow to guide. Next steps here: * Stabilize arbitrary self types. This PR has gone far enough to show that there are no really serious unexpected issues there. * Implement `ValueParam` and `RValueParam` as necessary for `CppRef` and `CppPin` types. * Work on those ergonomic issues to the extent possible. * Make a bold decision about whether autocxx should shift wholesale away from `&` to `CppRef<T>`. If so, this will be a significant breaking change.
79fcd3b
to
389c868
Compare
Merged
Some of this has been moved into #1463. The rest probably needs to wait. Closing for now. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Background:
Rust references have certain rules, most notably that the underlying data cannot be changed while an immutable reference exists. That's essentially impossible to promise for any C++ data; C++ may retain references or pointers to data any modify it at any time. This presents a problem for Rust/C++ interop tooling. Various solutions or workarounds are possible:
All C++ data is represented as zero-sized types.
This is the approach taken by cxx for opaque types. This sidesteps all of the
Rust reference rules, since those rules only apply to areas of memory that
are referred to. This doesn't really work well enough for autocxx since
we want to be able to keep C++ data on the Rust stack, using all the fancy
moveit shenanigans, and that means that Rust must know the true size and
alignment of the type.
All C++ data is represented as
UnsafeCell<MaybeUninit<T>>
.This also sidesteps the reference rules. This would be a valid option for
autocxx.
Have a sufficiently simple language boundary that humans can
reasonably guarantee there are no outstanding references on the C++ side
which could be used to modify the underlying data.
This is the approach taken by cxx for
cxx::kind::Trivial
types. It'sjust about possible to cause UB using one of these types, but you really
have to work at it. In practice such UB is unlikely.
Never allow Rust references to C++ types. Instead use a special
smart pointer type in Rust, representing a C++ reference.
This is the direction in this PR.
More detail on this last approach here:
https://medium.com/@adetaylor/are-we-reference-yet-c-references-in-rust-72c1c6c7015a
This facility is already in autocxx, by adopting the safety policy "unsafe_references_wrapped". However, it hasn't really been battle tested and has a bunch of deficiencies.
It's been awaiting formal Rust support for "arbitrary self types" so that methods can be called on such smart pointers. This is now
fairly close to stabilization; this PR is part of the experimentation required to investigate whether that rustc feature should go ahead and get stabilized.
This PR essentially converts autocxx to only operate in this mode - there should no longer ever be Rust references to C++ data.
This PR is incomplete:
ValueParam
andRValueParam
need to support takingCppPin<T>
, and possiblyCppRef<T: CopyNew>
etc.Deref
forcxx::UniquePtr<T>
to emit aCppRef<T>
, unfortunatelycxx::UniquePtr<T>
can't be used in cases where we want to provide aconst T&
. It's necessary to call.as_cpp_ref()
on theUniquePtr
. This is sufficiently annoying that it might be necessary to implement a traitReferenceParam
like we have forValueParam
andRValueParam
. (Alternatives include upstreamingCppRef<T>
into cxx, but per reason 4 listed above, the complexity probably isn't merited for statically-declared cxx interfaces; or separating from cxx entirely.)This also shows up a Rustc problem which is fixed here.
Ergonomic findings:
cxx::UniquePtr
as noted above.Deref
coercion allows methods to be called onCppPin
as well asCppRef
.&my_cpp_pin_wrapped_thing
which is weird given that the whole point is we're trying to avoid Rust references. Obviously, calling.as_cpp_ref()
works too, so this weirdness can be avoided.T
, in Rust, it's annoying to have to decide a-priori whether it'll be Rust or C++ accessing the data. If the former, you just create a newT
; if the latter you need to wrap it inCppPin::new
. This is only really a problem when creating a C++ object on which you'll call methods. It feels like it'll be OK in practice. Possibly this can be resolved by making the method receiver some sort ofimpl MethodReceiver<T>
generic; an implementation forT
could be provided which auto-wraps it into aCppPin
(consuming it at that point). This sounds messy though. A bit more thought required, but even if this isn't possible it doesn't sound like a really serious ergonomics problem, especially if we can use#[diagnostic::on_unimplemented]
somehow to guide.Next steps here:
ValueParam
andRValueParam
as necessary forCppRef
andCppPin
types.&
toCppRef<T>
. If so, this will be a significant breaking change.Fixes #<issue_number_goes_here>