Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include struct layout in callbacks. #3143

Closed
wants to merge 2 commits into from

Conversation

adetaylor
Copy link
Contributor

This PR reports the memory layout information (size, alignment, packed-ness) for each struct into the ParseCallbacks.

The use-case for this is fairly complex. Postprocessors such as autocxx currently do not necessarily pass all bindgen's output through: autocxx is opinionated in deciding what C++ types are "reasonable" to expose to Rust. For example types containing C++ strings aren't represented to Rust by-value. Instead, these types are represented as "opaque" types in the cxx sense (https://cxx.rs/extern-c++.html#opaque-c-types). As far as Rust is concerned, these types have no fields and can only be poked at using methods.

In order to make these allocable on the stack, autocxx does need to know the layout of these structs, so it can make an "opaque" fake struct in lieu of the real one. That's what is enabled by this callback. autocxx and other tools can post-process the bindgen output to replace bindgen's generated type with an opaque data-only struct like this:

#[repr(align(16),packed)]
pub struct TheType {
_data: UnsafeCell<MaybeUninit<[u8;4]>> // etc.
} // the actual struct is quite a bit more complex

The extra information provided in the callback within this PR allows that.

For completeness, there are multiple ways that the whole stack could be rearchitected to avoid the need for this.

  1. We could teach bindgen all the rules for when a struct should be "opaque", then bindgen could generate it that way in the first place. This suggests major extensibility of bindgen to allow highly opinionated backends, along the lines of Pluggable output backends #2943.

  2. That decision could be delegated to the ParseCallbacks with a new fn make_struct_opaque(&self, some_struct_info) -> bool call. Although this sounds simple, the decisions required here are very complex and depend (for instance) upon all the different constructors, ability to represent template params, etc., so in practice this would require post-processors to run bindgen twice, once to gather all that information and then again to give truthful answers to that callback.

  3. Post-processors such as autocxx could instead generate opaque types like this: #[repr(transparent)] pub struct TheType { _hidden_field: UnsafeCell<MaybeUninit> } // it's more complex in reality or similar. This turns out to be very complex because TheTypeAsGeneratedByBindgen might include complex STL structures such as std::string which autocxx currently instructs bindgen to replace with a simpler type, precisely so it can avoid all this complexity.

Any of these options are months of work either in bindgen or autocxx or both, so I'm hoping that it's OK to simply emit the layout information which bindgen already knows.

Compatibility: this is a further compatibility break to the DiscoveredItem enum. However, it seems to be accepted that we're going to grow that enum over time. If the compatibility break is a concern this information could be reported via a separate callback.

Part of google/autocxx#124

This PR reports the memory layout information (size, alignment, packed-ness)
for each struct into the ParseCallbacks.

The use-case for this is fairly complex. Postprocessors such as autocxx
currently do not necessarily pass all bindgen's output through: autocxx is
opinionated in deciding what C++ types are "reasonable" to expose to Rust. For
example types containing C++ strings aren't represented to Rust
by-value. Instead, these types are represented as "opaque" types in the cxx
sense (https://cxx.rs/extern-c++.html#opaque-c-types). As far as Rust is
concerned, these types have no fields and can only be poked at using methods.

In order to make these allocable on the stack, autocxx _does_ need to know
the layout of these structs, so it can make an "opaque" fake struct in lieu
of the real one. That's what is enabled by this callback. autocxx and other
tools can post-process the bindgen output to replace bindgen's generated
type with an opaque data-only struct like this:

  #[repr(align(16),packed)]
  pub struct TheType {
    _data: UnsafeCell<MaybeUninit<[u8;4]>> // etc.
  } // the actual struct is quite a bit more complex

The extra information provided in the callback within this PR allows that.

For completeness, there are multiple ways that the whole stack could
be rearchitected to avoid the need for this.

1. We could teach bindgen all the rules for when a struct should be
   "opaque", then bindgen could generate it that way in the first place.
   This suggests major extensibility of bindgen to allow highly
   opinionated backends, along the lines of rust-lang#2943.

2. That decision could be delegated to the ParseCallbacks with a new
      fn make_struct_opaque(&self, some_struct_info) -> bool
   call. Although this sounds simple, the decisions required here are
   very complex and depend (for instance) upon all the different
   constructors, ability to represent template params, etc., so
   in practice this would require post-processors to run bindgen twice,
   once to gather all that information and then again to give
   truthful answers to that callback.

3. Post-processors such as autocxx could instead generate opaque types
   like this:
     #[repr(transparent)]
     pub struct TheType {
       _hidden_field:
         UnsafeCell<MaybeUninit<TheTypeAsGeneratedByBindgen>>
     } // it's more complex in reality
   or similar. This turns out to be very complex because
   TheTypeAsGeneratedByBindgen might include complex STL structures
   such as std::string which autocxx currently instructs bindgen to
   replace with a simpler type, precisely so it can avoid all this
   complexity.

Any of these options are months of work either in bindgen or autocxx
or both, so I'm hoping that it's OK to simply emit the layout
information which bindgen already knows.

Compatibility: this is a further compatibility break to the
DiscoveredItem enum. However, it seems to be accepted that we're going
to grow that enum over time. If the compatibility break is a concern
this information could be reported via a separate callback.

Part of google/autocxx#124
@adetaylor adetaylor mentioned this pull request Feb 20, 2025
8 tasks
@adetaylor
Copy link
Contributor Author

This may not be needed - if we alter autocxx per google/autocxx#1456, which we probably will. Review this one as lowest priority!

@adetaylor
Copy link
Contributor Author

This is not needed after all.

@adetaylor adetaylor closed this Feb 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant