-
Notifications
You must be signed in to change notification settings - Fork 277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove cmsis module; add acle module #557
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the way the registers are extended with traits and used is a bit opinionated. I would like to see how the C code compares to the Rust code for some very simple tasks, like just reading the value of a register.
Typically we've tried to map as closely to C as possible - I am not opposed to something that is stylistically better, but the RFC will have to justify this.
All of this is pretty much "untested", but I think that we should merge this anyways and improve that afterwards - before the RFCs this will need to be tested and verified somehow but I don't want to block that work, nor ecosystem adoption, on any of that.
cc @alexcrichton once this is merged we should try to update stdsimd upstream so that the arm ecosystem can start depending on these. That will catch more bugs than we ever could during review and hopefully serve as a test-bed for these.
ACLE uses strings. For example usage, this is how CMSIS is implemented using ACLE. We could use strings but they are error prone (e.g. what should |
It should, but if it doesn't that should be fixable, it just needs to support |
Ping me when CI is green. |
@gnzlbg CI is now green. |
whitelist some ARM features required for rust-lang/stdarch#557 r? @gnzlbg or @alexcrichton
@alexcrichton this direction is pretty much the consensus of the embedded-wg and this looks good to me. Do you want to / have time to review this? |
Sure, I'll do! Can We keep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parched commented here rust-embedded/wg#184 (comment) .
I think the following points should be addressed:
Arm generally prefers cross architecture portable solutions over adding intrinsics. For example, C11 atomics and in rust std::sync::atomic::spin_loop_hint is the portable YIELD. [...] yield isn't needed becuase it's already available as std::sync::atomic::spin_loop_hint.
Is there a "better" implementation of the YIELD
intrinsic (e.g. using llvm intrinsics) than using inline assembly? What does spin_loop_hint
use? If spin_loop_hint
is available on core, maybe we should just call that here, or maybe, once this is merged, it might be worth it to update it to call this intrinsic. I am unsure what to do here.
The barrier and hint intrinsics don't need to be marked unsafe, only the system register access ones need to be unsafe.
We should keep all intrinsics unsafe
for the time being, and if someone wants to relax that for some intrinsics, they can do so in a backwards compatible way in a follow up RFC after the main work is done.
To increase the portability, don't make the barrier argument availability conditional on target, just upgrade to next strongest when not available.
I think this could be built at zero-cost in a library on top of this one and I am unsure on whether we should add this type of portability here.
How about using an enum for the barrier parameter, rather than the specialization of some generic parameter? This will require a match for now as Rust can't yet enforce compile time arguments, but that match should be removed by the compiler when it's inlined. And hopefully be backwards compatible with const arguments in the future. What do you think @gnzlbg?
I think this is a good idea, and that we should make sure that it is possible to experiment with these APIs in a library on top of this one (maybe once this PR is merged those interested could try to build that and expose it on crates.io). It might be more worth it for the ecosystem to use a thin safer more ergonomic layer than these intrinsics directly.
Could wfi and nop be portable functions instead like above? Most architectures have equivalent instructions, and those that don't could just do nothing/compiler barrier.
I think it should also be possible to implement this in a crate on top of this.
Unfortunately rsr/wsr must take a &str parameter to allow for arbitrary "S_x_x_x_x" register names in future. For now, that means a match on the value to make a constant to pass to LLVM, but that should be fine given the limited number of named registers required for this initial M profile support. In future when compile time arguments are supported the match can be removed and all strings allowed.
@japaric what do you think? It might be worth it to just expose here a "stringy"-based API like ACLE does on C, and just expose an idiomatic API on top of it in crates.io.
Implementation of these intrinsics should probably use llvm.{arm,aarch64}.{hint,dmb,dsb.isb} and llvm.{read,write}_register rather than inline assembly as this gives LLVM (potentially) better understanding of what is going on.
@japaric there are examples of using link_llvm_intrinsics
for this through the library (just grep for llvm.
). I think this is worth addressing.
I would just put all the intrinsics in std::arch::arm (gated on target in the
future if needed) rather than duplicating them in std::arch::aarch64.
@parched sadly core
/std
are extremely explicit in that architecture-specific functionality is always behind a module containing the architecture name (arm
/aarch64
). The arm
module is not available when targetting aarch64 (the same applies to x86/x86_64). This can be annoying, but is a std
convention that might be an uphill battle to change at this point, and this can be worker around pretty easily in an external crate.
So from my POV, the bigger issue is that of whether we should expose a stringly typed API like ACLE does, or invent something new. As long as something better can be built on top of it at zero cost, I'd suggest "stick to what C does" here: it will get this merged sooner, the RFC will be smaller, the RFC process will be shorter because there will be less to discuss, etc. We are basically minimizing the risk of being incompatible with future changes / additions to ACLE.
You can check what IR clang lowers these to, in this case it's
Ok, that seems reasonable to me.
Ok, that's fair enough. |
That makes sense. I personally think this intrinsic belongs here, and that spin lock hint should just be upgraded to call it once we land this. I don't know if this intrinsic is only useful for spin lock hint, or if it potentially can have other uses. If its the later, then putting it here makes it more "discoverable". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine to me yeah! I don't personally know this well enough in the sense that I could provide any sort of API-design feedback or knowledge of the actual spec, but I'm willing to trust the embedded WG!
The first time I got a chance to look at this. May I ask in which place it will be documented how to properly use this ACLE implementation? For example that using dsb() can/has to be combined with the provided shareability domain structs might not be clear until you browse through all the code. Or will we spare this effort here because ideally it's us who wrap this right away into the cortex-m/r/a crates and elaborately document there? |
Meta-comment: The embedded WG has removed "stable core::arch::arm" from the 2018 milestone since we have a good enough stable
What's the motivation? The current approach is extensible and doesn't depend on unimplemented features. Using an enum means that, to account for future additions (e.g. ARMv8.5 gets a new shareability domain), a hidden variant will have to be added to it. Which means that intrinsics that use the enum will have to include a match arm that panics on use of the hidden variant. Which means that unoptimized calls to the intrinsic will include the panicking branch, which could pull in the formatting machinery bloating the binary by a few KB. If generics are so undesirable I would prefer to directly expose this: #[rustc_args_required_const(0)]
pub fn __dmb(val: i32) {
extern "C" {
#[link_name = "llvm.arm.dmb"]
fn __dmb(val: i32);
}
__dmb(val)
} It errors at compile time ("LLVM ERROR: Cannot select: intrinsic %llvm.arm.dmb") when you use an invalid value (e.g. greater than 16).
Could you give examples of these arbitrary registers? I'm aware of registers of the form
Could you elaborate how compile time arguments let us specify arbitrary register names w/o a match? To me the solution would require string interpolation via a macro (see below). macro_rules! __rsr {
($register:expr) => {
{
let r: u32;
asm!(concat!("mrs $0,", $register) : "=r"(r) : : : "volatile");
r
}
}
}
SGTM but the LLVM docs say about llvm.{read,write}_register:
Which mean I can't use those two in this PR.
I would prefer to avoid matches with panicking branches in this API because then anything built on top of this API will inherit them. Sure, the panicking branches will be optimized away in release but unoptimized code will keep them and will be bigger than it needs to be (because of the formatting machinery). If we could get the miri in the compiler to pre-process functions that have the Given that we are not time pressured I would personally prefer to optimize this API to be as slim as inline |
Probably this remark is outdated. I'm sure |
It just seems a bit more intuitive and nicer use to me, like
Yes, that's unfortunate, (
That could be fine, especially if the goal here is to be completely low level without any user friendliness. Although, I would prefer user friendliness if possible.
I mean the likes of implementation defined registers which rustc/LLVM will never be able to know all the names of, e.g., L2CTLR_EL1 on the Cortex-A73 which is
I was thinking by just delegating to |
Previous IR code generate By the way, due to the signature issue, if I try with #[link_name= ..] I get
|
Ah I see, I didn't notice the parameter was metadata. Looks like |
We probably need a
This does not need matches with panicking branches AFAICT, |
The only thing I am still unsure of is whether to use the
AFAICT we would need to handle those names in the intrinsic internals anyways, and if the intrinsic implementation does not support that, then newer register names won't work. I would prefer if we keep the approach that @japaric proposes here, since that would produce a compilation error in this case (e.g. because the register type is not available). This means we would need to add new types to support newer registers, but the implementation has to be upgraded anyways for those to work. |
It is quite the same. The constant string used by |
The AArch64 system registers take the form Ideally the intrinsic would validate they take the form of an architecturally defined name or the pattern above before letting LLVM error but that will likely depend on some detailed const argument support in rust. In C these instrinsics are normally compiler builtins so are not limited to what the language itself can do, could we do that here? |
We can't do this in |
@gnzlbg Just to summarize: basically you are saying to extend |
No, I am suggesting to implement new |
I'll merge this as soon as Travis-CI is green (please ping me if I do not immediately do so, I do not get automatically pinged when that happens). We can iterate on the API issues (string-API supporting all registers vs no-string-APIs supporting only some) in tree. |
Co-Authored-By: japaric <jorge@japaric.io>
Yet another error:
|
I have fixed most tests; now I'm seeing 8 errors like this one with
The instruction we want is there but there's a bunch of other stuff around. Should I increase the instruction limit to make these tests pass? |
Yes, if you have inspected the assembly and it looks reasonable, you can increase the limit for particular instructions here: /~https://github.com/rust-lang-nursery/stdsimd/blob/master/crates/stdsimd-test/src/lib.rs#L135 |
@gnzlbg Travis is green |
also make these available on architectures that don't have a dedicated DMB / DSB / ISB instruction addresses #557 (comment)
…ATURE_DSP addresses #557 (comment)
Thanks a lot everybody! As mentioned, we can continue to iterate on this in tree as we gain experience with it. I've released I'll send a PR upstream to update stdsimd to a version including this ASAP so that one can use it on nightly, but that always takes a while. @japaric and @rust-lang-nursery/embedded-wg : after drafting an RFC for this and achieving consensus within the embedded WG on it, it might be wise to achieve consensus on the draft with some of the interested parties that have shown up in these issues and PRs before submitting a final RFC. |
ACLE (ARM C Language Extensions) is more general (supports ARMv4 to ARMv8) than
CMSIS (ARMv7-M and ARMv7-R)
Previous discussion: rust-embedded/wg#184
TODO in follow up PRs:
Move
src/arm/dsp.rs
intosrc/acle/{dsp,simd32}.rs
. Also the names used inarm::dsp
don't match ACLE -- they should start with__
. cc @paolotetiAdd 64-bit system registers. cc @andre-richter
Pre-requisite: whitelist the aclass, v5te, v6k and v6t2 target features in rustc.
closes #437
r? @gnzlbg