-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow "artifact dependencies" on bin, cdylib, and staticlib crates #3028
Conversation
bd46321
to
b921f08
Compare
Awesome! Additional usecase: building wasm32-wasi executable that is then loaded by an example binary that is a simple demo webserver, like is used in power-cpu-sim (hosted version, in case you like cpu design). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions from me, although its unclear to me what a lib
exactly can be. I guess it is either an rlib
(static with compiler linking) or staticlib
(static without compiler linking) or dylib
(dynamic library).
I am not sure at all, if proc-macro
has a special treatment within cargo or only from rustc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing this up! I feel like a feature along these lines is good to add to Cargo, but I do have some concerns about the motivation and some of the specifics below.
One concern I have is related to depending on binaries. I'm personally not convinced that what's proposed in this RFC is necessary or necessarily something that Cargo needs. One of the major drawback I see of "simply depend on a binary" is that running binaries is often quite involved and nontrivial. Fiddling with Command
, having good error reporting, and dealing with subprocess I/O is not always trivial and if a widely-used package simply provides a binary it means that all consumers are duplicating all this logic for handling the subprocess management. The alternative, however, having a library, provides an author an opportunity to make a much easier to use Rust API, document it, provide examples, etc. An author providing the binary would still be able to build the binary itself in the build script and reference it from the crate that runs the binary.
An additional concern I would have about binaries is that it sort of feels like we're pushing Cargo towards an entire package manager. The example given in the RFC is related to cmake
, but as the maintainer of the cmake
crate I would not want to be responsible for shipping cmake
's source code and building it if it's not already in the environment (or somehow handling "the environment is too old a version from what you requested). In general I can't ever really see myself managing build tools like that, especially for ubiquitous ones like make
, cmake
, clang
, etc.
Overall I feel that the most useful part of this RFC is being able to build multiple artifacts for possibly different targets, but I'm not sure that it's best served with a depencency-level directive rather than a package-level directive (commented more below). Otherwise I can't personally think of a concrete use case where this would be an improvement to a build that's otherwise quite complicated to do today (e.g. would presumably be multiple cargo
invocations driven by some sort of other build system)
|
||
There are many different possible use cases. | ||
|
||
- [Testing the behavior of a binary](/~https://github.com/rust-lang/cargo/issues/4316#issuecomment-361641639). Currently, this requires invoking `cargo build` recursively, or running `cargo build` before running `cargo test`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this RFC is necessary in this case for binaries defined in the local package. All integration test targets implicitly depend on all binary targets in a package today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @Twey might have been talking about using binaries in unit tests. I'll let them describe their use case, I don't know anything other than the linked comment.
There are many different possible use cases. | ||
|
||
- [Testing the behavior of a binary](/~https://github.com/rust-lang/cargo/issues/4316#issuecomment-361641639). Currently, this requires invoking `cargo build` recursively, or running `cargo build` before running `cargo test`. | ||
- [Running a binary that depends on another](/~https://github.com/rust-lang/rustc-perf/tree/master/collector#how-to-benchmark-a-change-on-your-own-machine). Currently, this requires running `cargo build`, making it difficult to keep track of when the binary was rebuilt. The use case for `rustc-perf` is to have a main binary that acts as an 'executor', which executes `rustc` many times, and a smaller 'shim' which wraps `rustc` with additional environment variables and arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this perhaps be expanded a bit? I'm not really sure what this is referring to, because cargo build
builds all the binaries today in a package. Does rustc-perf have different packages with binaries that depend on each other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, the binaries depend on each other, so you can't use cargo run
because that requires a specific binary. cargo build
does work, but means you have to run the binary as target/release/collector
, so it's easy to forget to recompile after a change.
There are four `type`s available: | ||
1. `"lib"`, the default | ||
2. `"bin"`, a crate building one or more binaries | ||
3. `"cdylib"`, a C-compatible dynamic library |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little worried about cdylib/staticlib here in the sense that these often have canonical meanings which Cargo would otherwise not be interpreting. For example if you depend on a lib
Cargo will pass all the right --extern
flags and make sure that it makes its way to the final binary. With a cdylib
and/or staticlib you may want that as well.
For example you might want to consume a crate's C API and want to have that automatically linked into the final format. By only supporting cdylib
and staticlib
here and nothing else about these crate types I'd fear that we'd get into a situation where depending on one of these crate types involves a lot of boilerplate that Cargo could otherwise handle if we took some time to game out what it meant.
Is it possible to game out here what the main use cases are here and see if we can improve the Cargo experience?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexcrichton This isn't intended for the use case where you just want to link your binary (or library) to the produced library. For that use case, I absolutely agree that cargo should provide more integration. This is intended for the use case where the library serves some other function, such as a runtime library for an interpreter (linked into interpreted programs rather than just the interpreter), or a VDSO for a kernel (linked into userspace programs run on that kernel), or a preloaded library for use with LD_PRELOAD
, or a remote debugging stub injected into another program or device. In those cases, you really do just want the compiled artifact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the problem lies exactly within type="lib"
providing some kind of integration (adding --extern
to rustc
invocations) and type="cdylib"
or type="staticlib"
not doing so. Or vice versa. This difference is not easily inferred by just reading a toml file that contains both or one of the settings.
And yet the options change enough about the build process that anybody who attempts to understand the dependency structure needs to be very aware of the difference in behaviour here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nagisa I don't think it's reasonably possible to have "default" handling for cdylib
, because linking a shared library doesn't guarantee it'll be available at runtime.
That said, if we believe there's some value in having non-artifact dependencies on cdylib or staticlib, with automatic handling, we could flag artifact dependencies in some way. For instance, type="artifact:cdylib"
. We could then reserve type="cdylib"
for something else. I don't think that's necessary, but it's a reasonable possibility, and this feature would still be suitable for the intended purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option that comes to mind is to just remove type="lib"
for now – making the default integrated beaviour only happen when the type
is absent; or perhaps make it behave much like other types, in that it would only provide the rlib
as an artifact, without any integrated behaviour.
In that case people who want both the "integrated" behaviour as well as an artifact, they could specify the dependency multiple times, once with a type
and once without.
- `[dependencies]` | ||
- `[dev-dependencies]` | ||
|
||
By default, `build-dependencies` are built for the host, while `dependencies` and `dev-dependencies` are built for the target. You can specify the `target` attribute to build for a specific target, such as `target = "wasm32-wasi"`; a literal `target = "target"` will build for the target even if specifing a build dependency. (If the target is not available, this will result in an error at build time, just as if building the specified crate with a `--target` option for an unavailable target.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One worry I would have about supplying a target
feature is that this sort of attribute is often better placed on the package itself. For example if you're depending on a wasm blob the package that compiles to wasm is unlikely to ever not want to get compiled to wasm, but this means that you'd have to specify the target every single time you depend on it.
I think that this sort of attribute on a dependency would work but it isn't clearly the best solution available to us I think. I think it would be worthwhile to explore the use cases of this and see if they'd be better suited with a dependency-level or package-level target declaration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting something like [package] default-target = "wasm32-wasi"
? I'd be interested in that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One worry I would have about supplying a target feature is that this sort of attribute is often better placed on the package itself. For example if you're depending on a wasm blob the package that compiles to wasm is unlikely to ever not want to get compiled to wasm, but this means that you'd have to specify the target every single time you depend on it.
I disagree: for power-cpu-sim (linked above), it is specifically designed to be able to compile to a command-line program for linux/etc. and the exact same crate is also designed to be compiled for the web using target wasm32-wasi and there is an example web server which runs on linux and requires the produced wasm file to run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I'm not saying that a dependency-level directive isn't useful. Nor does one reason why it is useful mean it's more useful than a package-level directive. My point is that the package-level directive is probably useful in more cases.
@jyn514 yes that's what I'm thinking (subject to bikeshedding of course)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexcrichton A package-level directive for the default also makes sense, but that would be a separate proposal. If a package truly supports only one target, a package-level directive would suffice. The target
attribute on a dependency is intended for cases where the dependency supports multiple targets.
For example, consider a crate that builds virtual machine firmware, and supports multiple architectures. If you're building a virtual machine for a specific architecture, you need firmware for that architecture.
Or, consider a kernel that supports both 64-bit and 32-bit userspace applications, and needs to build different versions of a VDSO for each case.
I also expect target = "target"
on build dependencies to be quite common: "no, I don't want to run this as part of the build process, I want this to be runnable on the target system instead".
I would also expect to see cases like this:
[build-dependencies]
thing32 = { package = "thing", version = "...", target = "i686-unknown-linux-gnu" }
thing64 = { package = "thing", version = "...", target = "x86_64-unknown-linux-gnu" }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't personally think of a concrete use case where this would be an improvement to a build that's otherwise quite complicated to do today (e.g. would presumably be multiple cargo invocations driven by some sort of other build system)
The goal here is to avoid having a build system on top of cargo - rustc-perf for instance just asks its users to deal with manually running cargo build && target/release/collector
.
Fiddling with Command, having good error reporting, and dealing with subprocess I/O is not always trivial and if a widely-used package simply provides a binary it means that all consumers are duplicating all this logic for handling the subprocess management. The alternative, however, having a library, provides an author an opportunity to make a much easier to use Rust API, document it, provide examples, etc. An author providing the binary would still be able to build the binary itself in the build script and reference it from the crate that runs the binary.
This requires the author to first be interested in providing a library. My original use case was for running compiletest
in integration tests, which is currently a binary and doesn't have a natural programmatic API. There is /~https://github.com/laumann/compiletest-rs, but that's a fork and the story around it is ... complicated. It would be nice to be able to use compiletest
in-tree without having to come up with a whole API no one will use but rustdoc.
There are many different possible use cases. | ||
|
||
- [Testing the behavior of a binary](/~https://github.com/rust-lang/cargo/issues/4316#issuecomment-361641639). Currently, this requires invoking `cargo build` recursively, or running `cargo build` before running `cargo test`. | ||
- [Running a binary that depends on another](/~https://github.com/rust-lang/rustc-perf/tree/master/collector#how-to-benchmark-a-change-on-your-own-machine). Currently, this requires running `cargo build`, making it difficult to keep track of when the binary was rebuilt. The use case for `rustc-perf` is to have a main binary that acts as an 'executor', which executes `rustc` many times, and a smaller 'shim' which wraps `rustc` with additional environment variables and arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, the binaries depend on each other, so you can't use cargo run
because that requires a specific binary. cargo build
does work, but means you have to run the binary as target/release/collector
, so it's easy to forget to recompile after a change.
|
||
There are many different possible use cases. | ||
|
||
- [Testing the behavior of a binary](/~https://github.com/rust-lang/cargo/issues/4316#issuecomment-361641639). Currently, this requires invoking `cargo build` recursively, or running `cargo build` before running `cargo test`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @Twey might have been talking about using binaries in unit tests. I'll let them describe their use case, I don't know anything other than the linked comment.
- `[dependencies]` | ||
- `[dev-dependencies]` | ||
|
||
By default, `build-dependencies` are built for the host, while `dependencies` and `dev-dependencies` are built for the target. You can specify the `target` attribute to build for a specific target, such as `target = "wasm32-wasi"`; a literal `target = "target"` will build for the target even if specifing a build dependency. (If the target is not available, this will result in an error at build time, just as if building the specified crate with a `--target` option for an unavailable target.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting something like [package] default-target = "wasm32-wasi"
? I'd be interested in that.
@alexcrichton, I responded inline to several of your comments, but I also wanted to respond to your top-level comment:
Some crates may want to wrap the usage of the binary in an API, but that isn't always the case. That assumes your primary use case is to invoke the binary in some standardized way from Rust code via
We can certainly change the example to a more hypothetical build tool, if you'd prefer. I'd like to talk to you at more length about the idea of building cmake from source, though. (I'd expect to put that in something like a For something like This would be even more helpful for less ubiquitous build tools that not everyone already has, particularly in environments where installing tools is not trivial.
Several of the use cases proposed in the RFC are specifically cases where today someone would have to drive Cargo with some other build system, and this RFC would allow just using |
That's exactly what I need for power-cpu-sim, and I have to take up half the readme with instructions for how to install clang-11 using LLVM's APT repos. Replacing that all with just cargo build or maybe rustup component add would be awesome! |
I submitted a PR at jyn514#3 to expand the alternatives section. (I don't think the alternative is something we should consider, but it's worth documenting.) |
I think that's a really good point about a binary-calling-a-binary as well as something like One question I would have though is how is it expected to hook up a foreign binary into Cargo? For example it looks like the env vars specified to downstream crates are based on Cargo-based- (also FWIW I don't mind I do feel like this is a bit pie-in-the-sky for tools like cmake/clang/gcc though given that their build times is so long and ubiquity is so high. For smaller binaries and/or less-widely-known ones, however, I could imagine this being quite useful.
I think this makes sense as motivation, but after reading the RFC I was left with the impression that this is just a small feature in Cargo which may not be able to supplant other wrapper build systems. That being said this seems like a necessary step in any case. I'm sure this will unlock some use cases but it seems like it will only make progress on others (which is of course fine!). |
One concern about this RFC with targets I'm realizing now is that "it's just |
Those are targets that would already be built anyway, right? Like if cargo is building from host to target, it needs both standard libraries anyway (one for build scripts and one for the library being cross-compiled). In that case I don't think that needs to be addressed in the RFC - the only time this would break is when |
My point is that one of the major motivations it seems for this RFC is "let's get more projects simply doing |
With the new rust-toolchain format it is possible to specify which targets should be installed automatically. |
I would like to add my use-case that makes this desirable, in case it helps folks working in it: OverviewI have a workspace project with multiple packages. The two important ones are:
Current SituationIn
The build compiles multiple packages at the same time in parallel:
and the Since the build continues and eventually finishes building Desired StateHave the |
Hi,
|
FWIW I would be ok with removing the target-specific parts of the RFC, or limiting it only to the host or |
I think keeping target-specific parts is a major benefit of this rfc. |
@demurgos that's actually an example I'm worried about with this RFC, actually. It seems like it should work for wasm-bindgen but this would not be usable for wasm-bindgen. This RFC isn't intended for postprocessing artifacts, but only processing preexisting artifacts with external tools into the rest of the build. Nothing here would help wasm-bindgen's experience, I believe. @jyn514 I'm not saying that the target-specific bits should be removed, what I'm trying to do is point out that this is sort of a half-design in a lot of ways. I think it's solving real problems but it's leaving a lot of other closely related problems unsolved. That's possible and reasonable to do since you don't want to solve anything at once, but at the same time I think it's important to acknowledge what's explicitly not being solved and what pain points will continue to be. |
Hey, just wanted to add a use case for artifact dependencies. In This isn't really ideal since there have been a number of issues with calling cargo inside of cargo (from it easily creating directories that are beyond MAX_PATH in Windows and failing to build, to environment variables from the top level cargo build affecting the shader build). It would be so much nicer and simpler if you could say |
There are four `type`s available: | ||
1. `"lib"`, the default | ||
2. `"bin"`, a crate building one or more binaries | ||
3. `"cdylib"`, a C-compatible dynamic library |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the problem lies exactly within type="lib"
providing some kind of integration (adding --extern
to rustc
invocations) and type="cdylib"
or type="staticlib"
not doing so. Or vice versa. This difference is not easily inferred by just reading a toml file that contains both or one of the settings.
And yet the options change enough about the build process that anybody who attempts to understand the dependency structure needs to be very aware of the difference in behaviour here.
@alexcrichton wrote:
With the RFC as specified, if you want to produce and depend on a bin artifact, it needs to be a Cargo Perhaps something like this? [[bin]]
name = "foo"
manual = true A Thoughts?
It's a tradeoff, just like any vendoring is a tradeoff. Many Also, if we have the above mechanism for supplying binaries via
As @bjorn3 observed, that's supported in rustup today, by specifying a
One day I hope we have a way to run Rust code post-build to generate or post-process artifacts. I want to avoid rolling too much into this RFC, though, and I'd expect post-build code to invite more controversy. In the meantime, though, if we add a mechanism for supplying binaries, people could technically handle this by building a workspace with two crates, one building the un-preprocessed binary, and the other depending on that and wasm-bindgen and emitting a processed binary. That's not ideal, but it works. @jyn514 wrote:
Given the ability to enable those targets via a rust-toolchain file, and the value of this for cases like wasm, I'd like to keep this in the RFC. @XAMPPRocky wrote:
That's an awesome example! I've added it to the examples in the RFC. |
Rather than omitting it only for cdylib and staticlib, omit it whenever the name matches the crate name.
Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. The RFC will be merged soon. |
Huzzah! The @rust-lang/cargo team has decided to accept this RFC. To track further discussion, subscribe to the tracking issue here: |
A number of projects, such as /~https://github.com/google/cargo-raze consume the output of `cargo metadata` when generating input to other build systems, such as Bazel and Buck. Typically, these projects use a `Cargo.toml` file as the input to drive their work, and it can be useful to express to these systems that a binary is required as part of the build. If such a dependency happens to have a lib target, sufficient dependency information happens to be exposed via the resolve field to recreate the dependency structure adequately. However, for binary-only targets, that entire branch of the dependency tree is cut off, making it hard to recreate this information. This PR adds an option to `cargo metadata` to allow rendering these subtrees, describing these deps as of kind "binary". This isn't really a concept in cargo-proper, but may become one via rust-lang/rfcs#3168 and/or rust-lang/rfcs#3028 - this output format is forwards-compatible with these proposals. In an RFC 3028 world, binary deps would appear as `dep_kind` "binary", and `cdylib` or `staticlib` deps would appear as new `DepKind` variants. We'd probably add a new value of the flag, `declared`, and set that to be the default. We may also want to deprecate the `include-if-no-library-dep` value, and make these ignored binary deps a hard error (replacing the existing warning log). In an RFC 3168 world, there's an interesting metadata question to have - there are (at least) three reasonable options in terms of handling metadata: 1. Require a direct dependency on any package whose binary deps are being used from the crate which uses them - this gives a nicely restricted search space, and allows for nicely curated metadata output. 2. Allow any transitive package dependencies to be used as binary deps - this may significantly expand the output produced, but would model the implementation of the feature. 3. Require explicitly tagging binary deps as being used as binary deps - this looks a lot like RFC 3028, and would tend in that direction.
Rendered
This RFC was co-authored by @jyn514 and @joshtriplett.
See rust-lang/cargo#4316 and rust-lang/cargo#7804 for previous discussions of this. Also discussed on Zulip.
Some motivating use cases (taken from the RFC):