Include struct layout in callbacks. #3143

adetaylor · 2025-02-20T22:35:23Z

This PR reports the memory layout information (size, alignment, packed-ness) for each struct into the ParseCallbacks.

The use-case for this is fairly complex. Postprocessors such as autocxx currently do not necessarily pass all bindgen's output through: autocxx is opinionated in deciding what C++ types are "reasonable" to expose to Rust. For example types containing C++ strings aren't represented to Rust by-value. Instead, these types are represented as "opaque" types in the cxx sense (https://cxx.rs/extern-c++.html#opaque-c-types). As far as Rust is concerned, these types have no fields and can only be poked at using methods.

In order to make these allocable on the stack, autocxx does need to know the layout of these structs, so it can make an "opaque" fake struct in lieu of the real one. That's what is enabled by this callback. autocxx and other tools can post-process the bindgen output to replace bindgen's generated type with an opaque data-only struct like this:

#[repr(align(16),packed)]
pub struct TheType {
_data: UnsafeCell<MaybeUninit<[u8;4]>> // etc.
} // the actual struct is quite a bit more complex

The extra information provided in the callback within this PR allows that.

For completeness, there are multiple ways that the whole stack could be rearchitected to avoid the need for this.

We could teach bindgen all the rules for when a struct should be "opaque", then bindgen could generate it that way in the first place. This suggests major extensibility of bindgen to allow highly opinionated backends, along the lines of Pluggable output backends #2943.
That decision could be delegated to the ParseCallbacks with a new fn make_struct_opaque(&self, some_struct_info) -> bool call. Although this sounds simple, the decisions required here are very complex and depend (for instance) upon all the different constructors, ability to represent template params, etc., so in practice this would require post-processors to run bindgen twice, once to gather all that information and then again to give truthful answers to that callback.
Post-processors such as autocxx could instead generate opaque types like this: #[repr(transparent)] pub struct TheType { _hidden_field: UnsafeCell<MaybeUninit> } // it's more complex in reality or similar. This turns out to be very complex because TheTypeAsGeneratedByBindgen might include complex STL structures such as std::string which autocxx currently instructs bindgen to replace with a simpler type, precisely so it can avoid all this complexity.

Any of these options are months of work either in bindgen or autocxx or both, so I'm hoping that it's OK to simply emit the layout information which bindgen already knows.

Compatibility: this is a further compatibility break to the DiscoveredItem enum. However, it seems to be accepted that we're going to grow that enum over time. If the compatibility break is a concern this information could be reported via a separate callback.

Part of google/autocxx#124

This PR reports the memory layout information (size, alignment, packed-ness) for each struct into the ParseCallbacks. The use-case for this is fairly complex. Postprocessors such as autocxx currently do not necessarily pass all bindgen's output through: autocxx is opinionated in deciding what C++ types are "reasonable" to expose to Rust. For example types containing C++ strings aren't represented to Rust by-value. Instead, these types are represented as "opaque" types in the cxx sense (https://cxx.rs/extern-c++.html#opaque-c-types). As far as Rust is concerned, these types have no fields and can only be poked at using methods. In order to make these allocable on the stack, autocxx _does_ need to know the layout of these structs, so it can make an "opaque" fake struct in lieu of the real one. That's what is enabled by this callback. autocxx and other tools can post-process the bindgen output to replace bindgen's generated type with an opaque data-only struct like this: #[repr(align(16),packed)] pub struct TheType { _data: UnsafeCell<MaybeUninit<[u8;4]>> // etc. } // the actual struct is quite a bit more complex The extra information provided in the callback within this PR allows that. For completeness, there are multiple ways that the whole stack could be rearchitected to avoid the need for this. 1. We could teach bindgen all the rules for when a struct should be "opaque", then bindgen could generate it that way in the first place. This suggests major extensibility of bindgen to allow highly opinionated backends, along the lines of rust-lang#2943. 2. That decision could be delegated to the ParseCallbacks with a new fn make_struct_opaque(&self, some_struct_info) -> bool call. Although this sounds simple, the decisions required here are very complex and depend (for instance) upon all the different constructors, ability to represent template params, etc., so in practice this would require post-processors to run bindgen twice, once to gather all that information and then again to give truthful answers to that callback. 3. Post-processors such as autocxx could instead generate opaque types like this: #[repr(transparent)] pub struct TheType { _hidden_field: UnsafeCell<MaybeUninit<TheTypeAsGeneratedByBindgen>> } // it's more complex in reality or similar. This turns out to be very complex because TheTypeAsGeneratedByBindgen might include complex STL structures such as std::string which autocxx currently instructs bindgen to replace with a simpler type, precisely so it can avoid all this complexity. Any of these options are months of work either in bindgen or autocxx or both, so I'm hoping that it's OK to simply emit the layout information which bindgen already knows. Compatibility: this is a further compatibility break to the DiscoveredItem enum. However, it seems to be accepted that we're going to grow that enum over time. If the compatibility break is a concern this information could be reported via a separate callback. Part of google/autocxx#124

adetaylor · 2025-02-26T19:28:42Z

This may not be needed - if we alter autocxx per google/autocxx#1456, which we probably will. Review this one as lowest priority!

adetaylor · 2025-02-26T22:54:19Z

This is not needed after all.

adetaylor added 2 commits February 20, 2025 22:33

Clippy format.

c49635e

adetaylor mentioned this pull request Feb 20, 2025

Unfork bindgen google/autocxx#124

Open

8 tasks

adetaylor closed this Feb 26, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Include struct layout in callbacks. #3143

Include struct layout in callbacks. #3143

adetaylor commented Feb 20, 2025

adetaylor commented Feb 26, 2025

adetaylor commented Feb 26, 2025

Include struct layout in callbacks. #3143

Include struct layout in callbacks. #3143

Conversation

adetaylor commented Feb 20, 2025

adetaylor commented Feb 26, 2025

adetaylor commented Feb 26, 2025