Unfork bindgen #124

adetaylor · 2020-11-24T19:20:37Z

bindgen doesn't pass on quite all the information we need in order to be able to generate autocxx bindings. Options in the future might be to fork bindgen, ask very nicely if they mind us upstreaming patches to add metadata, or switch away from bindgen and use llvm directly.

In any case this bug will be a live tracker of all known cases where we don't have all the information that we require about the underlying C++.

Classes vs structs. Both present as structs in the bindgen output. When we then generate extra C++ we have to plump for one or the other. In some ABIs this will result in a binary incompatibility or failure to compile. Per Classes result in build warnings; build failure on Windows #54.
Overloaded methods vs methods ending in digits. Two methods a(uint32_t) and a(uint8_t) will be output as a and a1. There is no way for us to distinguish that from methods really called a and a1. Worse, this is across all the types in a namespace, so MyStruct::a and MyOtherStruct::a become MyStruct::a and MyOtherStruct::a1.
Nested types. A struct nested inside another struct is not noted in the bindgen output, so we can't generate compatible C++ code - Nested types don't work #115.
Private constructors. Per Types without constructors do not gain a make_unique #122.
Pointers vs references. Per Handle pointers #102.
Virtual functions. Per Support calling virtual functions #195 and Allow calling pure virtual functions #305.
When template params are unused, per Gracefully avoid cases where template parameters are skipped #414 and Skip cases where references to template parameters are skipped #416.
Deleted functions, per Don't generate bindings for deleted functions #426.

The text was updated successfully, but these errors were encountered:

martinboehme · 2021-04-16T10:22:41Z

Classes vs structs. Both present as structs in the bindgen output. When we then generate extra C++ we have to plump for one or the other. In some ABIs this will result in a binary incompatibility or failure to compile.

Not sure I understand -- I always thought a struct was exactly identical to a class, just that it has public visibility by default? And I believe it's legal for forward declarations of a struct to use class and vice versa (see this StackOverflow answer). I'm pretty sure there isn't an ABI difference either. Or am I misunderstanding what you're referring to here?

adetaylor · 2021-04-16T14:14:31Z

This is #54 - let's discuss over there.

martinboehme · 2021-04-16T14:19:28Z

Thanks for the clarification -- interesting!

adetaylor · 2021-04-25T00:31:30Z

For #414 and #102 many types have 'annotations' attached (the RustTy type in our fork of bindgen). Such annotations are currently discarded using an ignore_annotations method. Each time that happens, it's probably a bug. For example this means we would try (and fail) to generate POD struct fields using types with template parameters which bindgen has dropped. Once a plan presents itself here, we should ensure we never drop information like this.

martinboehme · 2021-04-29T04:28:42Z

Note from #426: The Rust function signature for copy constructors and move constructors is the same, so bindgen needs to add an annotation that allows us to distinguish between them.

bsilver8192 · 2022-02-17T09:48:24Z

#799 is also on this list, and I'm pretty sure #815 is too.

adetaylor · 2025-02-11T12:05:19Z

I've renamed this issue to track the hopes of unforking autocxx-bindgen and using plain old bindgen here.

I intend to work on this somewhat over the next couple of weeks to see how far I can get. Even if I can reduce the scale of the fork, that's still beneficial. Right now, the need to roll bindgen is the major ongoing maintenance headache for autocxx, and we need to zap that.

Ways forward:

bindgen's ParseCallbacks now receive quite a bit more information than they used to. Specifically, new_item_found probably enables us to eliminate a few of the bindgen_original_name annotations already, and I hope bindgen will accept PRs to expand the callbacks here — it seems that this callbacks mechanism is exactly intended for Fancy Postprocessing Use-Cases like ours.
A fair amount of the forking is attempting to identify unused template parameters. Perhaps bindgen could add a mode where it, err, doesn't — it just ensures all template parameters are used, if necessary by adding in extra PhantomData. I don't know if this is actually possible but this would greatly increase the percentage of C++ types which can be handled by autocxx. I'm trying this in Attempts to include all type params #1425.
I had a chat with one of the bindgen maintainers, and it resulted in this issue - they were open in principle to the idea of "pluggable output backends". Maybe that's not necessary given the callback mechanism.

This is a major change to autocxx-bindgen which aims to improve maintainability by reducing divergence from upstream bindgen. Specifically, it: * Significantly reduces the textual diffence * Makes the remaining changes less invasive, such that merge conflicts should be easier to resolve * Redoes the changes such that they're more attuned to the current evolution of upstream bindgen, so it may be possible to upstream some or ideally all of these changes. (The ultimate goal is to unfork bindgen!) See google/autocxx#124 and rust-lang#2943 for the background here. Specifically this change: * Removes the #[bindgen_semantic_attributes] which were added to all sorts of items. Instead, * Much more is communicated via the existing ParseCallbacks mechansim. * In some cases, it's still necessary to annotate individual types - in this case we generate a newtype wrapper instead of attributes. This commit also re-enables the bindgen test suite. It does not yet add tests for all the above new functionality; that's yet to come.

autocxx-bindgen has just been changed substantially to have less dramatic forking from bindgen. This PR makes the corresponding changes in autocxx. It's not quite perfect yet (e.g. some O(n^2) behavior) but should be good enough to run tests. Part of #124.

ParseCallbacks previously reported structs but not enums. Enhance it to do so. At the moment, little information is provided about enums - but bindgen doesn't handle (rare) anonymous enums so this seems the right amount of information to report. At the moment, effectively this just provides a mapping between name and DiscoveredItemId. One of a number of PRs I'll be raising for google/autocxx#124. In future PRs I'll be hoping to add further callbacks which report more information based on DiscoveredItemId, so having the DiscoveredItemId for each enum is an important pre-requisite.

These files aspects of bindgen behavior which may not be generally useful to most consumers but are more important to downstream postprocessors such as autocxx. One of them tests enums embedded within classes, and the other tests various types of C++ constructor. Part of google/autocxx#124.

With a new command-line option, this ensures that char16_t is distinct from uint16_t in generated bindings. On some platforms these are distinct types, so it can be important for downstream post processors to spot the difference. See the documentation on the new command-line option for expected behavior and usage here. Part of google/autocxx#124.

No functional change - just deduplicating the logic which calls this callback, which will make it easier to make further changes in future. Part of google/autocxx#124

This adds more information to ParseCallbacks which indicates the location in the original source code at which a given item was found. This has proven to be useful in downstream code generators in providing diagnostics to explain why a given item can't be represented in Rust. (There are lots of reasons why this might not be the case - autocxx has around 100 which can be found here - /~https://github.com/google/autocxx/blob/d85eac76c9b3089d0d86249e857ff0e8c36b988f/engine/src/conversion/convert_error.rs#L39 - but irrespective of the specific reasons, it's useful to be able to point to the original location when emitting diagnostics). Should we make this a new callback or include this information within the existing callback? Pros of making it a new callback: * No compatibility breakage. Pros of including it in this existing callback: * No need to specify and test a policy about whether such callbacks always happen together, or may arrive individually * Easier for recipients (including bindgen's own test suite) to keep track of the source code location received. * Because we add new items to the DiscoveryItem enum anyway, we seem to have accepted it's OK to break compatibility in this callback (for now at least). Therefore I'm adding it as a parameter to the existing callback. If it's deemed acceptable to break compatibility in this way, I will follow the same thought process for some other changes too. Part of google/autocxx#124.

In its default behavior, bindgen does not emit information for any C++ operatorXYZ function (unless operatorXYZ is a valid identifier, which it isn't for C++ overloaded operators). This PR introduces a new command line option to represent all operators. This is not useful for those who directly consume the output of bindgen, but is useful for post-processors. Specifically, consumers will need to implement the callback to rename these items to something more useful. Part of google/autocxx#124

For virtual functions, bindgen has traditionally emitted a void* pointer because that's the least bad option for Rust code directly consuming the bindings. For downstream postprocessors and code generators, this throws away useful information about the actual type of the receiver. Provide a command-line option to put that back. Part of google/autocxx#124

This PR reports the memory layout information (size, alignment, packed-ness) for each struct into the ParseCallbacks. The use-case for this is fairly complex. Postprocessors such as autocxx currently do not necessarily pass all bindgen's output through: autocxx is opinionated in deciding what C++ types are "reasonable" to expose to Rust. For example types containing C++ strings aren't represented to Rust by-value. Instead, these types are represented as "opaque" types in the cxx sense (https://cxx.rs/extern-c++.html#opaque-c-types). As far as Rust is concerned, these types have no fields and can only be poked at using methods. In order to make these allocable on the stack, autocxx _does_ need to know the layout of these structs, so it can make an "opaque" fake struct in lieu of the real one. That's what is enabled by this callback. autocxx and other tools can post-process the bindgen output to replace bindgen's generated type with an opaque data-only struct like this: #[repr(align(16),packed)] pub struct TheType { _data: UnsafeCell<MaybeUninit<[u8;4]>> // etc. } // the actual struct is quite a bit more complex The extra information provided in the callback within this PR allows that. For completeness, there are multiple ways that the whole stack could be rearchitected to avoid the need for this. 1. We could teach bindgen all the rules for when a struct should be "opaque", then bindgen could generate it that way in the first place. This suggests major extensibility of bindgen to allow highly opinionated backends, along the lines of rust-lang#2943. 2. That decision could be delegated to the ParseCallbacks with a new fn make_struct_opaque(&self, some_struct_info) -> bool call. Although this sounds simple, the decisions required here are very complex and depend (for instance) upon all the different constructors, ability to represent template params, etc., so in practice this would require post-processors to run bindgen twice, once to gather all that information and then again to give truthful answers to that callback. 3. Post-processors such as autocxx could instead generate opaque types like this: #[repr(transparent)] pub struct TheType { _hidden_field: UnsafeCell<MaybeUninit<TheTypeAsGeneratedByBindgen>> } // it's more complex in reality or similar. This turns out to be very complex because TheTypeAsGeneratedByBindgen might include complex STL structures such as std::string which autocxx currently instructs bindgen to replace with a simpler type, precisely so it can avoid all this complexity. Any of these options are months of work either in bindgen or autocxx or both, so I'm hoping that it's OK to simply emit the layout information which bindgen already knows. Compatibility: this is a further compatibility break to the DiscoveredItem enum. However, it seems to be accepted that we're going to grow that enum over time. If the compatibility break is a concern this information could be reported via a separate callback. Part of google/autocxx#124

This makes two complementary improvements to the ParseCallbacks. The first is that Mods are now announced, as a new type of DiscoveredItem. The second is that the parentage of each item is announced. The parent of an item is often a mod (i.e. a C++ namespace) but not necessarily - it might be a struct within a struct, or similar. The reported information here is dependent on two pre-existing bindgen options: * whether to report C++ namespaces at all * whether to report inline namespaces conservatively. For that reason, the test suite gains two new tests. Part of google/autocxx#124

This change reports extra C++ information about items: * Whether methods are virtual or pure virtual or neither * Whether a method is a "special member", e.g. a move constructor * Whether a method is defaulted or deleted * C++ visibility (for structs, enums, unions and methods) It builds on top of rust-lang#3145. This PR is not yet ready for merge, since we need to merge rust-lang#3139 and then enhance the tests to cover those cases too. Part of google/autocxx#124

This change reports extra C++ information about items: * Whether methods are virtual or pure virtual or neither * Whether a method is a "special member", e.g. a move constructor * Whether a method is defaulted or deleted * C++ visibility (for structs, enums, unions and methods) It builds on top of rust-lang#3145. A follow up PR should enhance the tests once rust-lang#3139 is merged. Part of google/autocxx#124

Downstream postprocessors such as autocxx may want to use bindgen's representation of types to generate additional C++ code. bindgen is remarkably faithful at passing through enough information to make this possible - but for some C++ types, bindgen chooses to elide some template parameters. It's not safe for additional C++ code to be generated which references that type. This adds a callback by which such tools can recognize types where template parameters have been elided like this, so that extra C++ is avoided. This information could be provided in the existing `new_item_found` callback, but it's very niche and unlikely to be used by the majority of consumers, so a new callback is added instead. Part of google/autocxx#124

Downstream postprocessors such as autocxx may want to use bindgen's representation of types to generate additional C++ code. bindgen is remarkably faithful at passing through enough information to make this possible - but for some C++ types, bindgen chooses to elide some template parameters. It's not safe for additional C++ code to be generated which references that type. This adds a callback by which such tools can recognize types where template parameters have been elided like this, so that extra C++ is avoided. This information could be provided in the existing `new_item_found` callback, but it's very niche and unlikely to be used by the majority of consumers, so a new callback is added instead. An alternative approach here is to provide a mode to bindgen where it *always* uses all template params, by adding additional PhantomData fields to structs where those params are not currently used. This is being prototyped in google/autocxx#1425 but is unlikely to be successful, on the assumption that lots of the templated types can't actually be properly represented by bindgen/Rust, so the current strategy of discarding them is more likely to work in the broad strokes. Part of google/autocxx#124

As standard, bindgen treats C++ pointers and references the same in its output. Downstream postprocessors might want to treat these things differently; for example to use a CppRef<T> type for C++ references to encode the fact that they're not (usually) null. This PR emits references wrapped in a newtype wrapper called bindgen_marker_Reference<T> This behavior is enabled by an option flag; it isn't default. This type isn't actually defined in bindgen; users are expected to define or replace this during postprocessing (e.g. by using syn to reinterpret bindgen's output, or perhaps there are ways to make this useful using --raw-line or other means to inject a transparent newtype wrapper). The same approach is taken to types which bindgen chooses to make opaque. This is done in various circumstances but the most common case is for non-type template parameters. Alternative designs considered: * Apply an attribute to the function taking or returning such a param, e.g. #[bindgen_marker_takes_reference_arg(1)] fn takes_reference(a: u32, b: *const Foo) This avoids the marker type, but the problem here is a more invasive change to bindgen because type information can no longer be contained in a syn::Type; type metadata needs to be communicated around inside bindgen. * Make a ParseCallbacks call each time a type is opaque or a reference. This would work for standalone opaque types, e.g. pub struct Bar { pub _bindgen_opaque_blob: u8 } but the main case where we need these is where bindgen is using an opaque or reference type in the signature of some function, and there's no real opportunity to describe this in any kind of callback, since the callback would have to describe where exactly in the function signature the opaque or reference type has been used (and then we run into the same problems of passing this metadata around in the innards of bindgen). In order to maintain the current simple design where any C++ type is represented as a simple syn::Type, the best approach seems to be to do this wrapping in a fake marker type. Another design decision here was to represent an RValue reference as: TypeKind::Reference(_, ReferenceKind::RValue) rather than a new variant: TypeKind::RValueReference(_) In the majority of cases references are treated the same way whether they're rvalue or lvalue, so this was less invasive, but either is fine. Part of google/autocxx#124

Provide an option to add a line to every mod generated by bindgen. Part of google/autocxx#124, though this is only in fact necessary if we make the changes given in google/autocxx#1456.

This uses an existing callback to allow overriding of constructor name. Part of google/autocxx#124, though only necessary if we also do google/autocxx#1456.

This was referenced Jan 5, 2021

building the demo segfaults #168

Closed

Remove nested struct support #169

Merged

adetaylor mentioned this issue Jan 12, 2021

Support calling virtual functions #195

Closed

martinboehme mentioned this issue Apr 23, 2021

Nested types don't work #115

Closed

adetaylor mentioned this issue Apr 25, 2021

Detect &T<param elided by bindgen> #417

Merged

adetaylor mentioned this issue Apr 27, 2021

Don't generate bindings for deleted functions #426

Closed

adetaylor mentioned this issue Jan 8, 2022

Regularize bindgen annotations. #734

Merged

This was referenced Feb 19, 2022

Member initializers are ignored when checking for default constructors #816

Open

Base classes without member variables are ignored #832

Open

adetaylor mentioned this issue Aug 4, 2022

Function with wchar_t generated using std::uint32_t instead of wchar_t #1141

Open

adetaylor mentioned this issue Sep 27, 2024

Pluggable output backends rust-lang/rust-bindgen#2943

Open

adetaylor changed the title ~~Information discarded by bindgen which we need (tracker bug)~~ Unfork bindgen Feb 11, 2025

adetaylor mentioned this issue Feb 14, 2025

Switch from attributes to callbacks. adetaylor/rust-bindgen#12

Merged

adetaylor mentioned this issue Feb 14, 2025

Switch to using new version of autocxx-bindgen #1442

Merged

adetaylor mentioned this issue Feb 16, 2025

Report enums in ParseCallbacks. rust-lang/rust-bindgen#3133

Merged

adetaylor mentioned this issue Feb 16, 2025

Add extra tests. rust-lang/rust-bindgen#3134

Merged

adetaylor mentioned this issue Feb 16, 2025

Distinguish char16_t. rust-lang/rust-bindgen#3135

Merged

adetaylor mentioned this issue Feb 20, 2025

Include source code location in discovery callback rust-lang/rust-bindgen#3141

Open

adetaylor mentioned this issue Feb 20, 2025

Option to represent all C++ operators. rust-lang/rust-bindgen#3142

Open

adetaylor mentioned this issue Feb 20, 2025

MSRV is not accurate #1444

Closed

adetaylor mentioned this issue Feb 20, 2025

Include struct layout in callbacks. rust-lang/rust-bindgen#3143

Closed

adetaylor mentioned this issue Feb 21, 2025

Report on C++ visibility and other attributes rust-lang/rust-bindgen#3146

Draft

adetaylor mentioned this issue Feb 21, 2025

Remove code for unused template param detection. #1446

Merged

adetaylor mentioned this issue Feb 22, 2025

Denote types with unused template params. rust-lang/rust-bindgen#3148

Draft

adetaylor mentioned this issue Feb 24, 2025

Wrap template and opaque types in a marker. rust-lang/rust-bindgen#3151

Draft

adetaylor mentioned this issue Feb 24, 2025

Abstract naming stuff #520

Open

adetaylor added the architectural Architectural improvement label Feb 24, 2025

adetaylor mentioned this issue Feb 24, 2025

noexcept not supported and can mess up subclasses #1435

Open

adetaylor mentioned this issue Feb 26, 2025

Add option for all module raw lines rust-lang/rust-bindgen#3163

Draft

adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 26, 2025

Allow constructor name override.

6828091

This uses an existing callback to allow overriding of constructor name. Part of google/autocxx#124, though only necessary if we also do google/autocxx#1456.

adetaylor mentioned this issue Feb 26, 2025

Allow constructor name override. rust-lang/rust-bindgen#3164

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unfork bindgen #124

Unfork bindgen #124

adetaylor commented Nov 24, 2020 •

edited

Loading

martinboehme commented Apr 16, 2021

adetaylor commented Apr 16, 2021

martinboehme commented Apr 16, 2021

adetaylor commented Apr 25, 2021

martinboehme commented Apr 29, 2021

bsilver8192 commented Feb 17, 2022

adetaylor commented Feb 11, 2025

Unfork bindgen #124

Unfork bindgen #124

Comments

adetaylor commented Nov 24, 2020 • edited Loading

martinboehme commented Apr 16, 2021

adetaylor commented Apr 16, 2021

martinboehme commented Apr 16, 2021

adetaylor commented Apr 25, 2021

martinboehme commented Apr 29, 2021

bsilver8192 commented Feb 17, 2022

adetaylor commented Feb 11, 2025

adetaylor commented Nov 24, 2020 •

edited

Loading