-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unfork bindgen #124
Comments
Not sure I understand -- I always thought a struct was exactly identical to a class, just that it has public visibility by default? And I believe it's legal for forward declarations of a |
This is #54 - let's discuss over there. |
Thanks for the clarification -- interesting! |
For #414 and #102 many types have 'annotations' attached (the |
Note from #426: The Rust function signature for copy constructors and move constructors is the same, so bindgen needs to add an annotation that allows us to distinguish between them. |
I've renamed this issue to track the hopes of unforking I intend to work on this somewhat over the next couple of weeks to see how far I can get. Even if I can reduce the scale of the fork, that's still beneficial. Right now, the need to roll bindgen is the major ongoing maintenance headache for autocxx, and we need to zap that. Ways forward:
|
This is a major change to autocxx-bindgen which aims to improve maintainability by reducing divergence from upstream bindgen. Specifically, it: * Significantly reduces the textual diffence * Makes the remaining changes less invasive, such that merge conflicts should be easier to resolve * Redoes the changes such that they're more attuned to the current evolution of upstream bindgen, so it may be possible to upstream some or ideally all of these changes. (The ultimate goal is to unfork bindgen!) See google/autocxx#124 and rust-lang#2943 for the background here. Specifically this change: * Removes the #[bindgen_semantic_attributes] which were added to all sorts of items. Instead, * Much more is communicated via the existing ParseCallbacks mechansim. * In some cases, it's still necessary to annotate individual types - in this case we generate a newtype wrapper instead of attributes. This commit also re-enables the bindgen test suite. It does not yet add tests for all the above new functionality; that's yet to come.
autocxx-bindgen has just been changed substantially to have less dramatic forking from bindgen. This PR makes the corresponding changes in autocxx. It's not quite perfect yet (e.g. some O(n^2) behavior) but should be good enough to run tests. Part of #124.
ParseCallbacks previously reported structs but not enums. Enhance it to do so. At the moment, little information is provided about enums - but bindgen doesn't handle (rare) anonymous enums so this seems the right amount of information to report. At the moment, effectively this just provides a mapping between name and DiscoveredItemId. One of a number of PRs I'll be raising for google/autocxx#124. In future PRs I'll be hoping to add further callbacks which report more information based on DiscoveredItemId, so having the DiscoveredItemId for each enum is an important pre-requisite.
ParseCallbacks previously reported structs but not enums. Enhance it to do so. At the moment, little information is provided about enums - but bindgen doesn't handle (rare) anonymous enums so this seems the right amount of information to report. At the moment, effectively this just provides a mapping between name and DiscoveredItemId. One of a number of PRs I'll be raising for google/autocxx#124. In future PRs I'll be hoping to add further callbacks which report more information based on DiscoveredItemId, so having the DiscoveredItemId for each enum is an important pre-requisite.
These files aspects of bindgen behavior which may not be generally useful to most consumers but are more important to downstream postprocessors such as autocxx. One of them tests enums embedded within classes, and the other tests various types of C++ constructor. Part of google/autocxx#124.
With a new command-line option, this ensures that char16_t is distinct from uint16_t in generated bindings. On some platforms these are distinct types, so it can be important for downstream post processors to spot the difference. See the documentation on the new command-line option for expected behavior and usage here. Part of google/autocxx#124.
No functional change - just deduplicating the logic which calls this callback, which will make it easier to make further changes in future. Part of google/autocxx#124
This adds more information to ParseCallbacks which indicates the location in the original source code at which a given item was found. This has proven to be useful in downstream code generators in providing diagnostics to explain why a given item can't be represented in Rust. (There are lots of reasons why this might not be the case - autocxx has around 100 which can be found here - /~https://github.com/google/autocxx/blob/d85eac76c9b3089d0d86249e857ff0e8c36b988f/engine/src/conversion/convert_error.rs#L39 - but irrespective of the specific reasons, it's useful to be able to point to the original location when emitting diagnostics). Should we make this a new callback or include this information within the existing callback? Pros of making it a new callback: * No compatibility breakage. Pros of including it in this existing callback: * No need to specify and test a policy about whether such callbacks always happen together, or may arrive individually * Easier for recipients (including bindgen's own test suite) to keep track of the source code location received. * Because we add new items to the DiscoveryItem enum anyway, we seem to have accepted it's OK to break compatibility in this callback (for now at least). Therefore I'm adding it as a parameter to the existing callback. If it's deemed acceptable to break compatibility in this way, I will follow the same thought process for some other changes too. Part of google/autocxx#124.
In its default behavior, bindgen does not emit information for any C++ operatorXYZ function (unless operatorXYZ is a valid identifier, which it isn't for C++ overloaded operators). This PR introduces a new command line option to represent all operators. This is not useful for those who directly consume the output of bindgen, but is useful for post-processors. Specifically, consumers will need to implement the callback to rename these items to something more useful. Part of google/autocxx#124
For virtual functions, bindgen has traditionally emitted a void* pointer because that's the least bad option for Rust code directly consuming the bindings. For downstream postprocessors and code generators, this throws away useful information about the actual type of the receiver. Provide a command-line option to put that back. Part of google/autocxx#124
For virtual functions, bindgen has traditionally emitted a void* pointer because that's the least bad option for Rust code directly consuming the bindings. For downstream postprocessors and code generators, this throws away useful information about the actual type of the receiver. Provide a command-line option to put that back. Part of google/autocxx#124
This PR reports the memory layout information (size, alignment, packed-ness) for each struct into the ParseCallbacks. The use-case for this is fairly complex. Postprocessors such as autocxx currently do not necessarily pass all bindgen's output through: autocxx is opinionated in deciding what C++ types are "reasonable" to expose to Rust. For example types containing C++ strings aren't represented to Rust by-value. Instead, these types are represented as "opaque" types in the cxx sense (https://cxx.rs/extern-c++.html#opaque-c-types). As far as Rust is concerned, these types have no fields and can only be poked at using methods. In order to make these allocable on the stack, autocxx _does_ need to know the layout of these structs, so it can make an "opaque" fake struct in lieu of the real one. That's what is enabled by this callback. autocxx and other tools can post-process the bindgen output to replace bindgen's generated type with an opaque data-only struct like this: #[repr(align(16),packed)] pub struct TheType { _data: UnsafeCell<MaybeUninit<[u8;4]>> // etc. } // the actual struct is quite a bit more complex The extra information provided in the callback within this PR allows that. For completeness, there are multiple ways that the whole stack could be rearchitected to avoid the need for this. 1. We could teach bindgen all the rules for when a struct should be "opaque", then bindgen could generate it that way in the first place. This suggests major extensibility of bindgen to allow highly opinionated backends, along the lines of rust-lang#2943. 2. That decision could be delegated to the ParseCallbacks with a new fn make_struct_opaque(&self, some_struct_info) -> bool call. Although this sounds simple, the decisions required here are very complex and depend (for instance) upon all the different constructors, ability to represent template params, etc., so in practice this would require post-processors to run bindgen twice, once to gather all that information and then again to give truthful answers to that callback. 3. Post-processors such as autocxx could instead generate opaque types like this: #[repr(transparent)] pub struct TheType { _hidden_field: UnsafeCell<MaybeUninit<TheTypeAsGeneratedByBindgen>> } // it's more complex in reality or similar. This turns out to be very complex because TheTypeAsGeneratedByBindgen might include complex STL structures such as std::string which autocxx currently instructs bindgen to replace with a simpler type, precisely so it can avoid all this complexity. Any of these options are months of work either in bindgen or autocxx or both, so I'm hoping that it's OK to simply emit the layout information which bindgen already knows. Compatibility: this is a further compatibility break to the DiscoveredItem enum. However, it seems to be accepted that we're going to grow that enum over time. If the compatibility break is a concern this information could be reported via a separate callback. Part of google/autocxx#124
This makes two complementary improvements to the ParseCallbacks. The first is that Mods are now announced, as a new type of DiscoveredItem. The second is that the parentage of each item is announced. The parent of an item is often a mod (i.e. a C++ namespace) but not necessarily - it might be a struct within a struct, or similar. The reported information here is dependent on two pre-existing bindgen options: * whether to report C++ namespaces at all * whether to report inline namespaces conservatively. For that reason, the test suite gains two new tests. Part of google/autocxx#124
This makes two complementary improvements to the ParseCallbacks. The first is that Mods are now announced, as a new type of DiscoveredItem. The second is that the parentage of each item is announced. The parent of an item is often a mod (i.e. a C++ namespace) but not necessarily - it might be a struct within a struct, or similar. The reported information here is dependent on two pre-existing bindgen options: * whether to report C++ namespaces at all * whether to report inline namespaces conservatively. For that reason, the test suite gains two new tests. Part of google/autocxx#124
This change reports extra C++ information about items: * Whether methods are virtual or pure virtual or neither * Whether a method is a "special member", e.g. a move constructor * Whether a method is defaulted or deleted * C++ visibility (for structs, enums, unions and methods) It builds on top of rust-lang#3145. This PR is not yet ready for merge, since we need to merge rust-lang#3139 and then enhance the tests to cover those cases too. Part of google/autocxx#124
This change reports extra C++ information about items: * Whether methods are virtual or pure virtual or neither * Whether a method is a "special member", e.g. a move constructor * Whether a method is defaulted or deleted * C++ visibility (for structs, enums, unions and methods) It builds on top of rust-lang#3145. A follow up PR should enhance the tests once rust-lang#3139 is merged. Part of google/autocxx#124
Downstream postprocessors such as autocxx may want to use bindgen's representation of types to generate additional C++ code. bindgen is remarkably faithful at passing through enough information to make this possible - but for some C++ types, bindgen chooses to elide some template parameters. It's not safe for additional C++ code to be generated which references that type. This adds a callback by which such tools can recognize types where template parameters have been elided like this, so that extra C++ is avoided. This information could be provided in the existing `new_item_found` callback, but it's very niche and unlikely to be used by the majority of consumers, so a new callback is added instead. Part of google/autocxx#124
Downstream postprocessors such as autocxx may want to use bindgen's representation of types to generate additional C++ code. bindgen is remarkably faithful at passing through enough information to make this possible - but for some C++ types, bindgen chooses to elide some template parameters. It's not safe for additional C++ code to be generated which references that type. This adds a callback by which such tools can recognize types where template parameters have been elided like this, so that extra C++ is avoided. This information could be provided in the existing `new_item_found` callback, but it's very niche and unlikely to be used by the majority of consumers, so a new callback is added instead. An alternative approach here is to provide a mode to bindgen where it *always* uses all template params, by adding additional PhantomData fields to structs where those params are not currently used. This is being prototyped in google/autocxx#1425 but is unlikely to be successful, on the assumption that lots of the templated types can't actually be properly represented by bindgen/Rust, so the current strategy of discarding them is more likely to work in the broad strokes. Part of google/autocxx#124
As standard, bindgen treats C++ pointers and references the same in its output. Downstream postprocessors might want to treat these things differently; for example to use a CppRef<T> type for C++ references to encode the fact that they're not (usually) null. This PR emits references wrapped in a newtype wrapper called bindgen_marker_Reference<T> This behavior is enabled by an option flag; it isn't default. This type isn't actually defined in bindgen; users are expected to define or replace this during postprocessing (e.g. by using syn to reinterpret bindgen's output, or perhaps there are ways to make this useful using --raw-line or other means to inject a transparent newtype wrapper). The same approach is taken to types which bindgen chooses to make opaque. This is done in various circumstances but the most common case is for non-type template parameters. Alternative designs considered: * Apply an attribute to the function taking or returning such a param, e.g. #[bindgen_marker_takes_reference_arg(1)] fn takes_reference(a: u32, b: *const Foo) This avoids the marker type, but the problem here is a more invasive change to bindgen because type information can no longer be contained in a syn::Type; type metadata needs to be communicated around inside bindgen. * Make a ParseCallbacks call each time a type is opaque or a reference. This would work for standalone opaque types, e.g. pub struct Bar { pub _bindgen_opaque_blob: u8 } but the main case where we need these is where bindgen is using an opaque or reference type in the signature of some function, and there's no real opportunity to describe this in any kind of callback, since the callback would have to describe where exactly in the function signature the opaque or reference type has been used (and then we run into the same problems of passing this metadata around in the innards of bindgen). In order to maintain the current simple design where any C++ type is represented as a simple syn::Type, the best approach seems to be to do this wrapping in a fake marker type. Another design decision here was to represent an RValue reference as: TypeKind::Reference(_, ReferenceKind::RValue) rather than a new variant: TypeKind::RValueReference(_) In the majority of cases references are treated the same way whether they're rvalue or lvalue, so this was less invasive, but either is fine. Part of google/autocxx#124
As standard, bindgen treats C++ pointers and references the same in its output. Downstream postprocessors might want to treat these things differently; for example to use a CppRef<T> type for C++ references to encode the fact that they're not (usually) null. This PR emits references wrapped in a newtype wrapper called bindgen_marker_Reference<T> This behavior is enabled by an option flag; it isn't default. This type isn't actually defined in bindgen; users are expected to define or replace this during postprocessing (e.g. by using syn to reinterpret bindgen's output, or perhaps there are ways to make this useful using --raw-line or other means to inject a transparent newtype wrapper). The same approach is taken to types which bindgen chooses to make opaque. This is done in various circumstances but the most common case is for non-type template parameters. Alternative designs considered: * Apply an attribute to the function taking or returning such a param, e.g. #[bindgen_marker_takes_reference_arg(1)] fn takes_reference(a: u32, b: *const Foo) This avoids the marker type, but the problem here is a more invasive change to bindgen because type information can no longer be contained in a syn::Type; type metadata needs to be communicated around inside bindgen. * Make a ParseCallbacks call each time a type is opaque or a reference. This would work for standalone opaque types, e.g. pub struct Bar { pub _bindgen_opaque_blob: u8 } but the main case where we need these is where bindgen is using an opaque or reference type in the signature of some function, and there's no real opportunity to describe this in any kind of callback, since the callback would have to describe where exactly in the function signature the opaque or reference type has been used (and then we run into the same problems of passing this metadata around in the innards of bindgen). In order to maintain the current simple design where any C++ type is represented as a simple syn::Type, the best approach seems to be to do this wrapping in a fake marker type. Another design decision here was to represent an RValue reference as: TypeKind::Reference(_, ReferenceKind::RValue) rather than a new variant: TypeKind::RValueReference(_) In the majority of cases references are treated the same way whether they're rvalue or lvalue, so this was less invasive, but either is fine. Part of google/autocxx#124
As standard, bindgen treats C++ pointers and references the same in its output. Downstream postprocessors might want to treat these things differently; for example to use a CppRef<T> type for C++ references to encode the fact that they're not (usually) null. This PR emits references wrapped in a newtype wrapper called bindgen_marker_Reference<T> This behavior is enabled by an option flag; it isn't default. This type isn't actually defined in bindgen; users are expected to define or replace this during postprocessing (e.g. by using syn to reinterpret bindgen's output, or perhaps there are ways to make this useful using --raw-line or other means to inject a transparent newtype wrapper). The same approach is taken to types which bindgen chooses to make opaque. This is done in various circumstances but the most common case is for non-type template parameters. Alternative designs considered: * Apply an attribute to the function taking or returning such a param, e.g. #[bindgen_marker_takes_reference_arg(1)] fn takes_reference(a: u32, b: *const Foo) This avoids the marker type, but the problem here is a more invasive change to bindgen because type information can no longer be contained in a syn::Type; type metadata needs to be communicated around inside bindgen. * Make a ParseCallbacks call each time a type is opaque or a reference. This would work for standalone opaque types, e.g. pub struct Bar { pub _bindgen_opaque_blob: u8 } but the main case where we need these is where bindgen is using an opaque or reference type in the signature of some function, and there's no real opportunity to describe this in any kind of callback, since the callback would have to describe where exactly in the function signature the opaque or reference type has been used (and then we run into the same problems of passing this metadata around in the innards of bindgen). In order to maintain the current simple design where any C++ type is represented as a simple syn::Type, the best approach seems to be to do this wrapping in a fake marker type. Another design decision here was to represent an RValue reference as: TypeKind::Reference(_, ReferenceKind::RValue) rather than a new variant: TypeKind::RValueReference(_) In the majority of cases references are treated the same way whether they're rvalue or lvalue, so this was less invasive, but either is fine. Part of google/autocxx#124
Provide an option to add a line to every mod generated by bindgen. Part of google/autocxx#124, though this is only in fact necessary if we make the changes given in google/autocxx#1456.
This uses an existing callback to allow overriding of constructor name. Part of google/autocxx#124, though only necessary if we also do google/autocxx#1456.
bindgen doesn't pass on quite all the information we need in order to be able to generate autocxx bindings. Options in the future might be to fork bindgen, ask very nicely if they mind us upstreaming patches to add metadata, or switch away from bindgen and use llvm directly.
In any case this bug will be a live tracker of all known cases where we don't have all the information that we require about the underlying C++.
struct
s in the bindgen output. When we then generate extra C++ we have to plump for one or the other. In some ABIs this will result in a binary incompatibility or failure to compile. Per Classes result in build warnings; build failure on Windows #54.a(uint32_t)
anda(uint8_t)
will be output asa
anda1
. There is no way for us to distinguish that from methods really calleda
anda1
. Worse, this is across all the types in a namespace, soMyStruct::a
andMyOtherStruct::a
becomeMyStruct::a
andMyOtherStruct::a1
.struct
nested inside anotherstruct
is not noted in the bindgen output, so we can't generate compatible C++ code - Nested types don't work #115.The text was updated successfully, but these errors were encountered: