From 8ca326a05373b668c5918392e2ec6f0707c07964 Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Sun, 4 Feb 2018 11:10:05 -0800 Subject: [PATCH 01/10] Stable SIMD in Rust The purpose of this RFC is to provide a framework for SIMD to be used on stable Rust. It proposes stabilizing x86-specific vendor intrinsics, but includes the scaffolding for other platforms as well as a future portable SIMD design (to be fleshed out in another RFC). --- text/0000-stable-simd.md | 718 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 718 insertions(+) create mode 100644 text/0000-stable-simd.md diff --git a/text/0000-stable-simd.md b/text/0000-stable-simd.md new file mode 100644 index 00000000000..9f8a3f0afd9 --- /dev/null +++ b/text/0000-stable-simd.md @@ -0,0 +1,718 @@ +- Feature Name: `vendor_intrinsics` +- Start Date: 2018-02-04 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary +[summary]: #summary + +The purpose of this RFC is to provide a framework for SIMD to be used on stable +Rust. It proposes stabilizing x86-specific vendor intrinsics, but includes the +scaffolding for other platforms as well as a future portable SIMD design (to be +fleshed out in another RFC). + +# Motivation +[motivation]: #motivation + +Stable Rust today does not typically expose all of the capabilities of the +platform that you're running on. A notable gap in Rust's support includes SIMD +(single instruction multiple data) support. For example on x86 you don't +currently have explicit access to the 128, 256, and 512 bit registers on the +CPU. LLVM is in general an excellent optimizing compiler and often attempts to +make use of these registers (auto vectorizing code), but it unfortunately is +still somewhat limited and doesn't express the full power of the various SIMD +intrinsics. + +The goal of this RFC is to enable using SIMD intrinsics on stable Rust, and in +general provide a means to access the architecture-specific functionality of +each vendor. For example the AES intrinsics on x86 would also be made available +through this RFC, not only the SIMD-related AVX intrinsics. + +Note that this is certainly not the first discussion to broach the topic of SIMD +in Rust, but rather this has been an ongoing discussion for quite some time now! +For example the [simd crate][simd-crate] started [long ago][simd-start], we've +had [rfcs][simd-rfc], we've had a [lot][i1] of [discussions][i2] on internals, +and the [stdsimd] crate has been implemented. + +This RFC draws from much of the historical feedback and design that we've done +around SIMD in Rust and is targeted at providing path forward for using SIMD on +stable Rust while allowing the compiler to change in the future and retain a +stable interface. + +[simd-rfc]: /~https://github.com/rust-lang/rfcs/pull/1199 +[simd-crate]: /~https://github.com/rust-lang-nursery/simd +[simd-start]: http://huonw.github.io/blog/2015/08/simd-in-rust/ +[stdsimd]: /~https://github.com/rust-lang-nursery/stdsimd +[i1]: https://internals.rust-lang.org/t/getting-explicit-simd-on-stable-rust/4380 +[i2]: https://internals.rust-lang.org/t/whats-the-next-step-towards-the-stabilization-of-simd/5867 + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +Let's say you've just heard about this fancy feature called "auto vectorization" +in LLVM and you want to take advantage of it. 
For example you've got a function +like this you'd like to make faster: + +```rust +pub fn foo(a: &[u8], b: &[u8], c: &mut [u8]) { + for ((a, b), c) in a.iter().zip(b).zip(c) { + *c = *a + *b; + } +} +``` + +When [inspecting the assembly][asm1] you notice that rustc is making use of the +`%xmmN` registers which you've read is related to SSE on your CPU. You know, +however, that your CPU supports up to AVX2 which has bigger registers, so you'd +like to get access to them! + +Your first solution to this problem is to compile with `-C +target-feature=+avx2`, and after that you see the `%ymmN` registers being used, +yay! Unfortunately though you're publishing this binary on CPUs which may not +actually have AVX2 as a feature, so you don't want to enable AVX2 for the entire +program. Instead what you can do is enable it for just this function: + +```rust +#[target_feature(enable = "avx2")] +pub unsafe fn foo(a: &[u8], b: &[u8], c: &mut [u8]) { + for ((a, b), c) in a.iter().zip(b).zip(c) { + *c = *a + *b; + } +} +``` + +And [sure enough][asm2] you see the `%ymmN` registers getting used in this +function! Note, however, that because you've explicitly enabled a feature you're +required to declare the function as `unsafe`, as specified in [RFC +2045][rfc2045] (although this requirement is likely to be relaxed in [RFC +2212][rfc2212]). This worked as a proof of concept but what you still need to do +is dispatch at runtime whether the local CPU that you're running on supports +AVX2 or not. Thankfully, though, libstd has a handy macro for this! + +[rfc2212]: /~https://github.com/rust-lang/rfcs/pull/2212 + +```rust +pub unsafe fn foo(a: &[u8], b: &[u8], c: &mut [u8]) { + // Note that this `unsafe` block is safe because we're testing + // that the `avx2` feature is indeed available on our CPU. + if cfg_feature_enabled!("avx2") { + unsafe { foo_avx2(a, b, c) } + } else { + foo_fallback(a, b, c) + } +} + +#[target_feature(enable = "avx2")] +unsafe fn foo_avx2(a: &[u8], b: &[u8], c: &mut [u8]) { + foo_fallback(a, b, c) // this will inline the function below +} + +unsafe fn foo_fallback(a: &[u8], b: &[u8], c: &mut [u8]) { + for ((a, b), c) in a.iter().zip(b).zip(c) { + *c = *a + *b; + } +} +``` + +And [sure enough once again][asm3] we see that `foo` is dispatching at runtime +to the appropriate function, and only `foo_avx2` is using our `%ymmN` registers! + +[asm1]: https://play.rust-lang.org/?gist=36b253cd70840ea2ce6aad90418ec58b&version=nightly&mode=release +[asm2]: https://play.rust-lang.org/?gist=a31bdd3ce2b9a60e3317ccafb0133490&version=nightly&mode=release +[asm3]: https://play.rust-lang.org/?gist=111fe75d944aaec409203a3b72e7d742&version=undefined +[rfc2045]: /~https://github.com/rust-lang/rfcs/blob/master/text/2045-target-feature.md + +Ok great! At this point we've seen how to enable CPU features for +functions-at-a-time as well as how they could be used in a larger context to do +runtime dispatch to the most appropriate implementation. As we saw in the +motivation, however, we're just relying on LLVM to auto-vectorize here which +often isn't good enough or otherwise doesn't expose the functionality we want. + +For **explicit and guaranteed simd** on stable Rust you'll be using a new module +in the standard library, `std::vendor`. The `std::vendor` module is defined by +vendors, not us actually! For example Intel [publishes a list of +intrinsics][intel-intr] as does [ARM][arm-intr]. These exact functions and their +signatures will be available in `std::vendor` with types translated to Rust +(e.g. 
`int32_t` becomes `i32`). Vendor specific types like `__m128i` on Intel +will also live in `std::vendor`. + +For example let's say that we're writing a function that encodes a `&[u8]` in +ascii hex and we want to convert `&[1, 2]` to `"0102"`. The [stdsimd] +crate currently has this [as an example][hex-example], and let's take a look at +a few snippets from that. + +First up you'll see the dispatch routine like we wrote above: + +```rust +fn hex_encode<'a>(src: &[u8], dst: &'a mut [u8]) -> Result<&'a str, usize> { + let len = src.len().checked_mul(2).unwrap(); + if dst.len() < len { + return Err(len); + } + + #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] + { + if cfg_feature_enabled!("avx2") { + return unsafe { hex_encode_avx2(src, dst) }; + } + if cfg_feature_enabled!("sse4.1") { + return unsafe { hex_encode_sse41(src, dst) }; + } + } + + hex_encode_fallback(src, dst) +} +``` + +Here we have some routine business about hex encoding in general, and then for +x86/x86\_64 platforms we have optimized versions specifically for avx2 and +sse41. Using the `cfg_feature_enabled!` macro in libstd we saw above we'll +dispatch to the correct one at runtime. + +Taking a closer look at [`hex_encode_sse41`] we see that it starts out with a +bunch of weird looking function calls: + +```rust +let ascii_zero = _mm_set1_epi8(b'0' as i8); +let nines = _mm_set1_epi8(9); +let ascii_a = _mm_set1_epi8((b'a' - 9 - 1) as i8); +let and4bits = _mm_set1_epi8(0xf); +``` + +As it turns out though, these are all Intel SIMD intrinsics! For example +[`_mm_set1_epi8`] is defined as creating an instance of `__m128i`, a 128-bit +integer register. The intrinsic specificall sets all bytes to the first +argument. + +These functions are all imported through `std::vendor::*` at the top of the +example (in this case `stdsimd::vendor::*`). We go on to use a bunch of these +intrinsics throughout the `hex_encode_sse41` function to actually do the hex +encoding. + +The example listed currently has some tests/benchmarks as well, and if we run +the benchmarks we'll see: + +``` +test benches::large_default ... bench: 73,432 ns/iter (+/- 12,526) = 14279 MB/s +test benches::large_fallback ... bench: 1,711,030 ns/iter (+/- 286,642) = 612 MB/s +test benches::small_default ... bench: 30 ns/iter (+/- 18) = 3900 MB/s +test benches::small_fallback ... bench: 204 ns/iter (+/- 74) = 573 MB/s +test benches::x86::large_avx2 ... bench: 69,742 ns/iter (+/- 9,157) = 15035 MB/s +test benches::x86::large_sse41 ... bench: 108,463 ns/iter (+/- 70,250) = 9667 MB/s +test benches::x86::small_avx2 ... bench: 25 ns/iter (+/- 8) = 4680 MB/s +test benches::x86::small_sse41 ... bench: 25 ns/iter (+/- 14) = 4680 MB/s +``` + +Or in other words, our runtime dispatch implementation ("default") is **20 times +faster** than the fallback implementation (no explicit SIMD). Furthermore the +AVX2 implementation is nearly 2x faster than the SSE4.1 implementation for large +inputs, and the SSE4.1 implementation is over 10x faster than the default +fallback as well. + +With `std::vendor` and `cfg_feature_enabled!` we've now written a program that's +20x faster on supported hardware, yet it also continues to run on older hardware +as well! Not bad for a few dozen lines on each function! 
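+
+To give a feel for how a few of these intrinsics compose, here's a rough
+sketch of the usual load/compute/store pattern (the helper function and its
+slice handling are invented for this illustration; the intrinsics themselves
+are the SSE2 ones Intel documents):
+
+```rust
+use std::vendor::*; // `stdsimd::vendor::*` in the stdsimd crate today
+
+// Adds the first 16 bytes of `a` and `b` into `out`, one 128-bit register per
+// input. Callers must ensure each slice has at least 16 bytes and that SSE2
+// is available (always true on x86_64, otherwise detect it at runtime).
+#[target_feature(enable = "sse2")]
+unsafe fn add_16_bytes(a: &[u8], b: &[u8], out: &mut [u8]) {
+    let va = _mm_loadu_si128(a.as_ptr() as *const __m128i); // unaligned load
+    let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
+    let sum = _mm_add_epi8(va, vb); // wrapping byte-wise addition
+    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, sum); // unaligned store
+}
+```
+
+In real code this would sit behind the same `cfg_feature_enabled!`-style
+dispatch used in `hex_encode` so that older CPUs still take the fallback path.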
+ +[intel-intr]: https://software.intel.com/sites/landingpage/IntrinsicsGuide/# +[arm-intr]: https://developer.arm.com/technologies/neon/intrinsics +[hex-example]: /~https://github.com/rust-lang-nursery/stdsimd/blob/ee046e0419e4d5e8f742b138313eeefd603326b5/examples/hex.rs +[`hex_encode_sse41`]: /~https://github.com/rust-lang-nursery/stdsimd/blob/ee046e0419e4d5e8f742b138313eeefd603326b5/examples/hex.rs#L114-L160 +[`_mm_set1_epi8`]: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_set1_epi8&expand=4669 + +--- + +Note that this RFC is explicitly not attempting to stabilize/design a set of +"portable simd operations". The contents of `std::vendor` are platform specific +and provide no guarantees about portability. Efforts in the past, however, such +as with [simd.js] and the [simd crate][simd-crate] show that it's desirable and +useful to have a set of types which are usable across platforms. + +Furthermore LLVM does quite a good job with a portable `u32x4` type, for +example, in terms of platform support and speed on platforms that support it. +This RFC is not going to go too much into the details about these types, but +rather these guidelines will still hold: + +* The vendor intrinsics **will not** take portable types as arguments. For + example `u32x4` and `__m128i` will be different types on x86. The two types, + however, will be convertible between one another (either via transmutes or via + explicit functions). This conversion will have zero run-time cost. +* The portable simd types will likely live in a module like `std::simd` rather + than `std::vendor`. + +The design around these portable types are ongoing, however, and stay tuned for +an RFC for the `std::simd` module! + +[simd.js]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SIMD + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +Stable SIMD in Rust ends up requiring a surprising number of both language and +library features to be productive. Thankfully, though, there's has been quite a +bit of experimentation over time with SIMD in Rust and we've gotten a lot of +good experience along the way! In this section, though, we'll be going into the +various features in detail. + +## The `#[target_feature]` Attribute + +The `#[target_feature]` attribute was specified in [RFC 2045][rfc2045] and +remains unchanged from that specification. As a quick recap it allows you to add +this attribute to functions: + +```rust +#[target_feature(enable = "avx2")] +``` + +The only currently allowed key is `enable` (one day we may allow `disable`). The +string values accepted by `enable` will be separately stabilized but are likely +to be guided by vendor definitions. For example in Intel's [intrinsic +guide][intel-intr] it lists functions under "AVX2", so we're likely to stabilize +the name `avx2` for Rust. + +There's a good number of these features supported by the compiler today. It's +expected that when stabilizing other pieces of this RFC the names of the +following existing features for x86 will be stabilized: + +* `aes` +* `avx2` +* `avx` +* `bmi2` +* `bmi` - to be renamed to `bmi1`, the name Intel gives it +* `fma` +* `fxsr` +* `lzcnt` +* `popcnt` +* `rdrnd` +* `rdseed` +* `sse2` +* `sse3` +* `sse4.1` +* `sse4.2` +* `sse` +* `ssse3` +* `xsave` +* `xsavec` +* `xsaveopt` +* `xsaves` + +Note that AVX-512 names are missing from this list, but that's because we +haven't implemented any AVX-512 intrinsics yet. 
Those'll get stabilized on their +own once implemented. Additionally note that `mmx` is missing from this list. +For reasons discussed later, it's proposed that MMX types are omitted from the +first pass of stabilization. AMD also has some specific features supported +(`sse4a`, `tbm`), and so do ARM, MIPS, and PowerPC, but none of these feature +names a proposed for becoming stable in the first pass. + +## The `target_feature` value in `#[cfg]` + +In addition to enabling target features for a function the compiler will also +allow statically testing whether a particular target feature is enabled. This +corresponds to the `cfg_target_feature` feature today in rustc, and can be seen +via: + +```rust +#[cfg(target_feature = "avx)] +fn foo() { + // implementation that can use `avx` +} + +#[cfg(not(target_feature = "avx))] +fn foo() { + // a fallback implementation +} +``` + +Additionally this is also made available to `cfg!`: + +```rust +if cfg!(target_feature = "avx") { + println!("this program was compiled with AVX support"); +} +``` + +The `#[cfg]` attribute and `cfg!` macro statically resolve and **do not do +runtime dispatch**. Tweaking these functions is currently done via the `-C +target-feature` flag to the compiler. This flag to the compiler accepts a +similar set of strings to the ones specified above and is already "stable". + +## The `cfg_feature_enabled!` Macro + +One mode of operation with vendor intrinsics is to compile *part* of a program +with certain CPU features enabled but not the entire program. This way a +portable program can be compiled which runs across a broad range of hardware +which can still benefit from optimized implementations for particular hardware +at runtime. + +The crux of this support in libstd is this macro provided by libstd, +`cfg_feature_enabled!`. The macro will accept one argument, a string literal. +The string can be any feature passed to `#[target_feature(enable = ...)]` for +the platform you're compiling for. Finally, the macro will resolve to a `bool` +result. + +For example on x86 you could write: + +```rust +if cfg_feature_enabled!("sse4.1") { + println!("this cpu has sse4.1 features enabled!"); +} +``` + +It would, however, be an error to write this on x86 cpus: + +```rust +cfg_feature_enabled!("neon"); //~ ERROR: neon is an ARM feature, not x86 +cfg_feature_enabled!("foo"); //~ ERROR: unknown target feature for x86 +``` + +The macro is intended to be implemented in the `std` crate (**not** `core`) and +made available via the normal macro preludes. The implementation of this macro +is expected to be what [`stdsimd`][stdsimd] does today, notably: + +* The first time the macro is invoked all the local CPU features will be + detected. +* The detected features will then be cached globally (currently in a bitset) for + the rest of the execution of the program. +* Further invocations of `cfg_feature_enabled!` are expected to be cheap runtime + dispatches. (aka load a value and check whether a bit is set) +* Exception: in some cases the result of the macro is statically known: for + example, `cfg_feature_enabled!("sse2")` when the binary is being compiled + with "sse42" globally. In these cases, none of the steps above are + performed and the macro just expands to `true`. + +The exact method of CPU feature detection various by platform, OS, and +architecture. For example on x86 we make heavy use of the `cpuid` instruction +whereas on ARM the implementation currently uses getauxval/`/proc` mounted +information on Linux. 
It's expected that the detection will vary for each +particular target, as necessary. + +Note that the implementation details of the macro today prevent it from being +located in libcore. If getauxval or `/proc` is used that requires libc to be +available or `File` in one form or another. These concerns are currently +std-only (not available in libcore). This is also a conservative route for x86 +where it is possible to do CPU feature detection in libcore (as it's just the +`cpuid` instruction), but for consistency across platforms the macro will only +be available in libstd for now. This placement can of course be relaxed in the +future if necessary. + +## The `std::vendor` Module + +This is where the real meat is. A new module will be added to the standard +library, `std::vendor`. This module will also be available in `core::vendor` +(and `std` will simply reexport it). The contents of this module provide no +portability guarantees (like `std::os` and unlike the rest of `std`). APIs +present on one platform may not be present on another. + +The contents of the `vendor` modules are defined by, well, vendors! For example +Intel has an [intrinsics guide][intr-guide] which will serve as a guideline for +all contents in the `vendor` module itself. The standard library will not +deviate in naming or type signature of any intrinsic defined by a vendor. + +For example most Intel intrinsics start with `_mm_` or `_mm256_` for 128 and +256-bit registers. While perhaps unergonomic, we'll be sticking to what Intel +says. Note that all intrinsics will also be `unsafe`, according to [RFC 2045]. + +Function signatures defined by vendors are typically defined in terms of C +types. In Rust, however, those aren't always available! Instead the intrinsics +will be defined in terms of Rust-specific types. Some types are easily +translated such as `int32_t`, but otherwise a different mapping may be applied +per-vendor. + +The current proposed mapping for x86 intrinsics is: + +| What Intel says | Rust Type | +|-----------------|-----------| +| `void*` | `*mut u8` | +| `char` | `i8` | +| `short` | `i16` | +| `int` | `i32` | +| `long long` | `i64` | +| `const int` | `i32` [0] | + +[0] required to be compile-time constants. + +Other than these exceptions the x86 intrinsics will be defined exactly as Intel +defines them. This will necessitate new types in the `std::vendor` modules for +SIMD registers! For example these new types will all be present in `std::vendor` +on x86 platforms: + +* `__m128` +* `__m128d` +* `__m128i` +* `__m256` +* `__m256d` +* `__m256i` + +(note that AVX-512 types will come in the future!) + +Infrastructure-wise the contents of `std::vendor` are expected to continue to be +defined in the [`stdsimd` crate/repository][stdsimd]. Intrinsics defined here go +through a rigorous test suite involving automatic verification against the +upstream vendor defintion, verification that the correct instruction is +generated by LLVM, and at least one runtime test for each intrinsic to ensure it +not only compiles but also produces correct results. It's expected that +stabilized intrinsics will meet these critera to the best of their ability. + +Currently today on x86 and ARM platforms the stdsimd crate performs all these +checks, but these checks are not yet implemented for PowerPC/MIPS/etc, but +that's always just some more work to do! + +It's not expected that the contents of `std::vendor` will remain static for all +time. 
Rather intrinsics will continue to be implemented in `stdsimd` and make +their way into the main Rust repository. For example there are not currently any +implemented AVX-512 intrinsics, but that doesn't mean we won't implement them! +Rather once implemented they'll be stabilized and included in libstd following +the Rust release model. + +## The types in `std::vendor` + +It's worth paying close attention to the types in `std::vendor`. Types like +`__m128i` are intended to represent a 128-bit packed SIMD register on x86, but +there's nothing stopping you from using types like `Option<__m128i>` in your +program! Most generic containers and such probably aren't written with packed +SIMD types in mind, and it'd be a bummer if everything stopped working once you +used a packed SIMD type in one of them. + +Instead it will be required that the types defined in `std::vendor` do indeed +work when used in "nonstandard" contexts. For example `Option<__m128i>` should +never produce a compiler error or a codegen error when used (it may just be +slower than you expect). This requires special care to be taken both in +representation of these arguments as well as their ABI. + +Implementation-wise these packed SIMD types are implemented in terms of a +"portable" vector in LLVM. LLVM as a results gets most of this logic correct for +us in terms of having these compile without errors in many contexts. The ABI, +however, had to be special cased as it was a location where LLVM didn't always +help us. + +The Rust ABI will currently be implemented such that all related packed-SIMD +types are passed via *memory* instead of by-value. This means that regardless of +the target features enabled for a function everything should agree on how packed +SIMD arguments are passed across boundaries and whatnot. + +Again though, note that this section is largely an implementation detail of SIMD +in Rust today, though it's enabling the usage Effortsfectively without a lot of +codegen errors popping up all over the place. + +## Intrinsics in `std::vendor` and constant arguments + +There are a number of intrinsics on x86 (and other) platforms that require their +arguments to be constants rather than decided at runtime. For example +[`_mm_insert_pi16`] requires its third argument to be a constant value where +only the lowest two bits are used. The Rust type system, however, does not +currently have a stable way of expressing this information. + +Eventually we will likely have some form of `const` arguments or `const` +machinery to guarantee that these functions are called and monomorphized with +constant arguments, but for now this RFC proposes taking a more conservative +route forward. Instead we'll, for the time being, forbid the functions from +being invoked with non-constant arguments. Prototyped in [#48018][const-pr] the +`stdsimd` crate will have an unstable attribute where the compiler can help +provide this guarantee. + +[const-pr]: /~https://github.com/rust-lang/rust/pull/48018 + +It's hoped that this restriction will allow `stdsimd` to be forward compatible +with a future const-powered world of Rust but in the meantime not otherwise +block stabilization of these intrinsics. + +## Portable packed SIMD + +So-called "portable" packed SIMD types are currently implemented in both the +[stdsimd] and [simd][simd-crate] crates. These types look like `u8x16` and +explicitly specify how many lanes they have (16 in this case) and what type each +line is (`u8` in this case). 
These types are intended to unconditionally +available (like the rest of libstd) and simply optimized much more aggressively +on platforms that have native support for the various operations. + +For example `u8x16::add` may be implemented differently on i586 vs i686, and +also entirely differently implemented on ARM. The idea with portable packed SIMD +types is that they represent a broad intersection of fast behavior across a +broad range of platforms. + +It's intended that this RFC neither includes nor rules out the addition of +portable packed-SIMD types in Rust. It's expected that an upcoming RFC will +propose the addition of these types in a `std::simd` module. These types will be +orthogonal to scalable-vector types which are expected to be proposed in +another, also different, RFC. What this RFC does do, however, is explicitly +specify that: + +* The portable SIMD types (both packed and scalable) will not be used in vendor + intrinsics. +* The vendor SIMD types will be distinct types from the portable SIMD types. + +Or, in other words, it's intended that portable SIMD types are entirely +decoupled from vendor intrinsics. If they both end up being implemented then +there will be zero-cost interoperation between them, but neither +will necessarily depend on the other. + +[soundbug]: /~https://github.com/rust-lang/rust/issues/44367 + +## Not stabilizing MMX in this RFC + +This RFC proposed notably omitting the MMX intrinsics, or those related to +`__m64` in other words. The MMX type `__m64` and the intrinsics have been +somewhat problematic in a number of ways. Known cases include: + +* [MMX intrinsics aren't always desirable][mmx1] +* [LLVM codegen errors happen with debuginfo enabled and MMX][mmx2] +* [LLVM codegen errors with MMX types and i586][mmx3] + +[mmx1]: /~https://github.com/rust-lang/rust/pull/45367#issuecomment-337883136 +[mmx2]: /~https://github.com/rust-lang-nursery/stdsimd/issues/246 +[mmx3]: /~https://github.com/rust-lang-nursery/stdsimd/issues/300 + +Due to these issues having an unclear conclusion as well as a seeming lack of +desire to stabilize MMX intrinsics, the `__m64` and all related intrinsics +**will not be stabilized** via this RFC. + +# Drawbacks + +[drawbacks]: #drawbacks + +This RFC represents a *significant* addition to the standard library, maybe one +of the largest we've ever done! As a result alternate implementations of Rust +will likely have a difficult time catching up to rustc/LLVM with all the SIMD +intrinsics. Additionaly the semantics of "packed SIMD types should work everywhere" +may be overly difficult to implement in alternate implementations. It is worth +noting that both Cretonne and GCC support packed SIMD types. + +Due to the enormity of what's being added to the standard library it's also +infeasible to carefully review each addition in isolation. While there are a +number of automatic verifications in place we're likely to inevitably make a +mistake when stabilizing something. *Fixing* a stabilization can often be quite +difficult and costly. + +# Rationale and alternatives +[alternatives]: #alternatives + +Over the years quite a few iterations have happened for SIMD in Rust. This RFC +draws from as many of those as it can and attempts to strike a balance between +exposing functionality whiel still allowing us to implement everything in a +stable fashion for years to come (and without blocking us from updating LLVM, +for example). Despite this there's a few alternatives we could do as well. 
+ +## Portable types in vendor interfaces + +It was initially attempted in the [stdsimd] crate that we would use the portable +types on all of the vendor intrinsics. For example instead of: + +```rust +pub unsafe fn _mm_set1_epi8(val: i8) -> __m128i; +``` + +we would instead define + +```rust +pub unsafe fn _mm_set1_epi8(val: i8) -> i8x16; +``` + +The latter definition here is much easier for a beginner to SIMD to read (or at +least I gawked when I first saw `__m128i`). + +The downside of this approach, however, is that Intel isn't telling us what to +do. While that may sound simple, this RFC is proposing an addition of +**thousands** of functions to the standard library in a stable manner. It's +infeasible for any one person (or even the entire libs team) to scrutinize all +functions and assess whether the correct signature is applied (aka was it +`i8x16` or `i16x8`?) + +Furthermore not all intrinsics from Intel actually have an interpretation with +one of the portable types. For example some intrinsics take an integer constant +which when 0 interprets the input as `u8x16` and when 1 interprets it as +`u16x8` (as an example). This effectively means that there *isn't* a correct +choice in all situations for what portable type should be used. + +Consequently it's proposed that instead of portable types the exact vendor types +are used in all vendor intrinsics. This provides us a much easier route to +stabilization ("make sure it's what Intel says") along with no need to interpret +what Intel does and attempt to find the most appropriate type. + +There is interest by both current `stdsimd` maintainers and users to expose +a "better-typed" SIMD API in crates.io that builds on top of the intrinsics +proposed for stabilization here. + +## Stabilizing SIMD implementation details + +Another alternative to the bulk of this RFC is allowing more raw access to the +internals of LLVM. For example stabilizing `#[repr(simd)]` or the ability to +write `extern "platform-intrinsics" { ... }` or `#[link_llvm_intrinsic...]`. +This is certainly a *much* smaller surface area to stabilize (aka not +thousands of intrinsics). + +This avenue was decided against, however, for a few reasons: + +* Such raw interfaces may change over time as they simply represent LLVM as a + current point in time rather than what LLVM wants to do in the future. +* Alternate implementations of rustc or alternate rustc backends like Cretonne + may not expose the same sort of functionality that LLVM provides, or + implementing the interfaces may be much more difficult in alternate backends + than in LLVM's. + +As a result, it's intended that instead of exposing raw building blocks (and +allowing `stdsimd` to live on crates.io) we'll instead pull in `stdsimd` to the +standard library and expose it as the stable interface to SIMD in Rust. + +# Unresolved questions +[unresolved]: #unresolved-questions + +There's a number of unresolved questions around stabilizing SIMD today which +don't pose serious blockers and may also wish to be considered open bugs rather +than blocking stabilization: + +## Relying on unexported LLVM APIs + +The static detection performed by `cfg!` and `#[cfg]` currently relies on a +[Rust-specific patch to LLVM][llvm-patch]. LLVM internal knows all about +hierarchies of features and such. For example if you compile with `-C +target-feature=+avx2` then `cfg!(target_feature = "sse2")` also needs to resolve +to `true`. 
Rustc, however, does not know about these features and relies on +learning this information through LLVM. + +Unfortunately though LLVM does not actually export this information for us to +consume (as far as we know). As a result we have a [local patch][llvm-patch] +which exposed this information for us to read. The consequence of this +implementation detail is that when compiled against the system LLVM the `cfg!` +macro may not work correctly when used in conjunction with `-C target-feature` +or `-C target-cpu` flags. + +It appears that clang [vendors and/or duplicates][clang] LLVM's functionality in +this regard. It's an option for rustc to do the same but it may also be an +option to expose the information in upstream LLVM. So far there appears to have +been no attempts to upstream this patch into LLVM itself. + +[llvm-patch]: /~https://github.com/rust-lang/llvm/commit/68e1e29618b2bd094d82faac16cf8e89959bbd68 +[clang]: /~https://github.com/llvm-mirror/clang/blob/679d846fcc73bd213347785185006d591698a132/lib/Basic/Targets/X86.cpp + +## Intrinsics that take constant arguments and function pointers + +Mentioned above it's planned to give intrinsics that take constant arguments +some special treatment, but unfortunately we still haven't forbidden function +pointers to these intrinsics. That means that it's still possible to pass +runtime-determined values which may otherwise not be possible to do in a world +where it's powered by `const` features rather than special attributes. + +## Packed SIMD types in `extern` functions are not sound + +The packed SIMD types have particular care paid to them with respect to their +ABI in Rust and how they're passed between functions, notably to ensure that +they work properly throughout Rust programs. The "fix" to pass them in memory +over function calls, however, was only applied to the "Rust" ABI and not any +other function ABIs. + +A consequence of this change is that if you instead label all your functions +`extern` then the [same bug][soundbug] will arise. It may be possible to +implement a "lint" or a compiler error of sorts to forbid this situation in the +short term. We could also possibly accept this as a known bug for the time +being. + +## What if we're wrong? + +Despite the CI infrastructure of the `stdsimd` crate it seems inevitable that +we'll get an intrinsic wrong at some point. What do we do in a situation like +that? This situation is somewhat analagous to the `libc` crate but there you can +fix the problem downstream (just have a corrected type/definition) for for +vendor intrinsics it's not so easy. + +Currently it seems that our only recourse would be to add a `2` suffix to the +function name or otherwise indicate there's a corrected version, but that's not +always the best... From e8312d1be96eb42816df3a63e1e36aaa4d3f6456 Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Wed, 7 Feb 2018 09:33:13 -0800 Subject: [PATCH 02/10] typos --- text/0000-stable-simd.md | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/text/0000-stable-simd.md b/text/0000-stable-simd.md index 9f8a3f0afd9..95e4fdaa087 100644 --- a/text/0000-stable-simd.md +++ b/text/0000-stable-simd.md @@ -403,13 +403,14 @@ portability guarantees (like `std::os` and unlike the rest of `std`). APIs present on one platform may not be present on another. The contents of the `vendor` modules are defined by, well, vendors! 
For example -Intel has an [intrinsics guide][intr-guide] which will serve as a guideline for +Intel has an [intrinsics guide][intel-intr] which will serve as a guideline for all contents in the `vendor` module itself. The standard library will not deviate in naming or type signature of any intrinsic defined by a vendor. For example most Intel intrinsics start with `_mm_` or `_mm256_` for 128 and 256-bit registers. While perhaps unergonomic, we'll be sticking to what Intel -says. Note that all intrinsics will also be `unsafe`, according to [RFC 2045]. +says. Note that all intrinsics will also be `unsafe`, according to [RFC +2045][rfc2045]. Function signatures defined by vendors are typically defined in terms of C types. In Rust, however, those aren't always available! Instead the intrinsics @@ -490,16 +491,16 @@ the target features enabled for a function everything should agree on how packed SIMD arguments are passed across boundaries and whatnot. Again though, note that this section is largely an implementation detail of SIMD -in Rust today, though it's enabling the usage Effortsfectively without a lot of -codegen errors popping up all over the place. +in Rust today, though it's enabling usage without a lot of codegen errors +popping up all over the place. ## Intrinsics in `std::vendor` and constant arguments There are a number of intrinsics on x86 (and other) platforms that require their arguments to be constants rather than decided at runtime. For example -[`_mm_insert_pi16`] requires its third argument to be a constant value where -only the lowest two bits are used. The Rust type system, however, does not -currently have a stable way of expressing this information. +[`_mm_insert_pi16`][_mm_insert_pi16] requires its third argument to be a +constant value where only the lowest two bits are used. The Rust type system, +however, does not currently have a stable way of expressing this information. Eventually we will likely have some form of `const` arguments or `const` machinery to guarantee that these functions are called and monomorphized with @@ -510,6 +511,7 @@ being invoked with non-constant arguments. Prototyped in [#48018][const-pr] the provide this guarantee. [const-pr]: /~https://github.com/rust-lang/rust/pull/48018 +[_mm_insert_pi16]: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=insert_pi&expand=2973 It's hoped that this restriction will allow `stdsimd` to be forward compatible with a future const-powered world of Rust but in the meantime not otherwise @@ -587,7 +589,7 @@ difficult and costly. Over the years quite a few iterations have happened for SIMD in Rust. This RFC draws from as many of those as it can and attempts to strike a balance between -exposing functionality whiel still allowing us to implement everything in a +exposing functionality while still allowing us to implement everything in a stable fashion for years to come (and without blocking us from updating LLVM, for example). Despite this there's a few alternatives we could do as well. From a65759e4081c51bc615c835d2ba9bbd274179eb9 Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Wed, 7 Feb 2018 09:36:56 -0800 Subject: [PATCH 03/10] Correct errant unsafety --- text/0000-stable-simd.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-stable-simd.md b/text/0000-stable-simd.md index 95e4fdaa087..6ef68f7330b 100644 --- a/text/0000-stable-simd.md +++ b/text/0000-stable-simd.md @@ -92,7 +92,7 @@ AVX2 or not. Thankfully, though, libstd has a handy macro for this! 
[rfc2212]: /~https://github.com/rust-lang/rfcs/pull/2212 ```rust -pub unsafe fn foo(a: &[u8], b: &[u8], c: &mut [u8]) { +pub fn foo(a: &[u8], b: &[u8], c: &mut [u8]) { // Note that this `unsafe` block is safe because we're testing // that the `avx2` feature is indeed available on our CPU. if cfg_feature_enabled!("avx2") { @@ -107,7 +107,7 @@ unsafe fn foo_avx2(a: &[u8], b: &[u8], c: &mut [u8]) { foo_fallback(a, b, c) // this will inline the function below } -unsafe fn foo_fallback(a: &[u8], b: &[u8], c: &mut [u8]) { +fn foo_fallback(a: &[u8], b: &[u8], c: &mut [u8]) { for ((a, b), c) in a.iter().zip(b).zip(c) { *c = *a + *b; } From f1bddc39836e1162112ff0ebe2f3ac7dbf72a0ae Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Wed, 7 Feb 2018 09:40:22 -0800 Subject: [PATCH 04/10] Tweak wording --- text/0000-stable-simd.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-stable-simd.md b/text/0000-stable-simd.md index 6ef68f7330b..9dded64035e 100644 --- a/text/0000-stable-simd.md +++ b/text/0000-stable-simd.md @@ -104,7 +104,7 @@ pub fn foo(a: &[u8], b: &[u8], c: &mut [u8]) { #[target_feature(enable = "avx2")] unsafe fn foo_avx2(a: &[u8], b: &[u8], c: &mut [u8]) { - foo_fallback(a, b, c) // this will inline the function below + foo_fallback(a, b, c) // the function below is inlined here } fn foo_fallback(a: &[u8], b: &[u8], c: &mut [u8]) { From 7fe2ee33f910b9b8d874f6df0e3087c584d748e9 Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Wed, 7 Feb 2018 09:43:09 -0800 Subject: [PATCH 05/10] Fix the third play link --- text/0000-stable-simd.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-stable-simd.md b/text/0000-stable-simd.md index 9dded64035e..f0f509e936c 100644 --- a/text/0000-stable-simd.md +++ b/text/0000-stable-simd.md @@ -119,7 +119,7 @@ to the appropriate function, and only `foo_avx2` is using our `%ymmN` registers! [asm1]: https://play.rust-lang.org/?gist=36b253cd70840ea2ce6aad90418ec58b&version=nightly&mode=release [asm2]: https://play.rust-lang.org/?gist=a31bdd3ce2b9a60e3317ccafb0133490&version=nightly&mode=release -[asm3]: https://play.rust-lang.org/?gist=111fe75d944aaec409203a3b72e7d742&version=undefined +[asm3]: https://play.rust-lang.org/?gist=5083f8af2563889a49fe0a3219a1f236&version=nightly&mode=release [rfc2045]: /~https://github.com/rust-lang/rfcs/blob/master/text/2045-target-feature.md Ok great! At this point we've seen how to enable CPU features for From 73e734322da134582fad6d6ad15afceabbf16074 Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Wed, 7 Feb 2018 09:45:09 -0800 Subject: [PATCH 06/10] Even better third link! --- text/0000-stable-simd.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-stable-simd.md b/text/0000-stable-simd.md index f0f509e936c..54b33176e53 100644 --- a/text/0000-stable-simd.md +++ b/text/0000-stable-simd.md @@ -119,7 +119,7 @@ to the appropriate function, and only `foo_avx2` is using our `%ymmN` registers! [asm1]: https://play.rust-lang.org/?gist=36b253cd70840ea2ce6aad90418ec58b&version=nightly&mode=release [asm2]: https://play.rust-lang.org/?gist=a31bdd3ce2b9a60e3317ccafb0133490&version=nightly&mode=release -[asm3]: https://play.rust-lang.org/?gist=5083f8af2563889a49fe0a3219a1f236&version=nightly&mode=release +[asm3]: https://play.rust-lang.org/?gist=cfce0743910291517aae3b15f70a7cbd&version=nightly&mode=release [rfc2045]: /~https://github.com/rust-lang/rfcs/blob/master/text/2045-target-feature.md Ok great! 
At this point we've seen how to enable CPU features for From 12186153aebab9b1f52b72945caaf1954742a972 Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Wed, 7 Feb 2018 09:55:05 -0800 Subject: [PATCH 07/10] More typos! --- text/0000-stable-simd.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-stable-simd.md b/text/0000-stable-simd.md index 54b33176e53..3142290c465 100644 --- a/text/0000-stable-simd.md +++ b/text/0000-stable-simd.md @@ -311,12 +311,12 @@ corresponds to the `cfg_target_feature` feature today in rustc, and can be seen via: ```rust -#[cfg(target_feature = "avx)] +#[cfg(target_feature = "avx")] fn foo() { // implementation that can use `avx` } -#[cfg(not(target_feature = "avx))] +#[cfg(not(target_feature = "avx"))] fn foo() { // a fallback implementation } From e8cf49ba4f32cfd396b72698ca35da17ef9a3a3e Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Wed, 7 Feb 2018 10:22:09 -0800 Subject: [PATCH 08/10] typo --- text/0000-stable-simd.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-stable-simd.md b/text/0000-stable-simd.md index 3142290c465..c4c8d048e8e 100644 --- a/text/0000-stable-simd.md +++ b/text/0000-stable-simd.md @@ -712,7 +712,7 @@ being. Despite the CI infrastructure of the `stdsimd` crate it seems inevitable that we'll get an intrinsic wrong at some point. What do we do in a situation like that? This situation is somewhat analagous to the `libc` crate but there you can -fix the problem downstream (just have a corrected type/definition) for for +fix the problem downstream (just have a corrected type/definition) for vendor intrinsics it's not so easy. Currently it seems that our only recourse would be to add a `2` suffix to the From 8c6200f4ad6adbf09632b50219dc7e1a5ea60b09 Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Thu, 8 Feb 2018 13:31:07 -0800 Subject: [PATCH 09/10] Minor updates: * Rename `std::vendor` to `std::arch` * Resolve an unresolved question about function pointers * Rename `cfg_feature_enabled!` to `is_target_feature_detected!` --- text/0000-stable-simd.md | 147 +++++++++++++++++++-------------------- 1 file changed, 72 insertions(+), 75 deletions(-) diff --git a/text/0000-stable-simd.md b/text/0000-stable-simd.md index c4c8d048e8e..42666e7592b 100644 --- a/text/0000-stable-simd.md +++ b/text/0000-stable-simd.md @@ -95,7 +95,7 @@ AVX2 or not. Thankfully, though, libstd has a handy macro for this! pub fn foo(a: &[u8], b: &[u8], c: &mut [u8]) { // Note that this `unsafe` block is safe because we're testing // that the `avx2` feature is indeed available on our CPU. - if cfg_feature_enabled!("avx2") { + if is_target_feature_detected!("avx2") { unsafe { foo_avx2(a, b, c) } } else { foo_fallback(a, b, c) @@ -129,12 +129,12 @@ motivation, however, we're just relying on LLVM to auto-vectorize here which often isn't good enough or otherwise doesn't expose the functionality we want. For **explicit and guaranteed simd** on stable Rust you'll be using a new module -in the standard library, `std::vendor`. The `std::vendor` module is defined by -vendors, not us actually! For example Intel [publishes a list of +in the standard library, `std::arch`. The `std::arch` module is defined by +vendors/architectures, not us actually! For example Intel [publishes a list of intrinsics][intel-intr] as does [ARM][arm-intr]. These exact functions and their -signatures will be available in `std::vendor` with types translated to Rust +signatures will be available in `std::arch` with types translated to Rust (e.g. 
`int32_t` becomes `i32`). Vendor specific types like `__m128i` on Intel -will also live in `std::vendor`. +will also live in `std::arch`. For example let's say that we're writing a function that encodes a `&[u8]` in ascii hex and we want to convert `&[1, 2]` to `"0102"`. The [stdsimd] @@ -152,10 +152,10 @@ fn hex_encode<'a>(src: &[u8], dst: &'a mut [u8]) -> Result<&'a str, usize> { #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] { - if cfg_feature_enabled!("avx2") { + if is_target_feature_detected!("avx2") { return unsafe { hex_encode_avx2(src, dst) }; } - if cfg_feature_enabled!("sse4.1") { + if is_target_feature_detected!("sse4.1") { return unsafe { hex_encode_sse41(src, dst) }; } } @@ -166,8 +166,8 @@ fn hex_encode<'a>(src: &[u8], dst: &'a mut [u8]) -> Result<&'a str, usize> { Here we have some routine business about hex encoding in general, and then for x86/x86\_64 platforms we have optimized versions specifically for avx2 and -sse41. Using the `cfg_feature_enabled!` macro in libstd we saw above we'll -dispatch to the correct one at runtime. +sse41. Using the `is_target_feature_detected!` macro in libstd we saw above +we'll dispatch to the correct one at runtime. Taking a closer look at [`hex_encode_sse41`] we see that it starts out with a bunch of weird looking function calls: @@ -184,7 +184,7 @@ As it turns out though, these are all Intel SIMD intrinsics! For example integer register. The intrinsic specificall sets all bytes to the first argument. -These functions are all imported through `std::vendor::*` at the top of the +These functions are all imported through `std::arch::*` at the top of the example (in this case `stdsimd::vendor::*`). We go on to use a bunch of these intrinsics throughout the `hex_encode_sse41` function to actually do the hex encoding. @@ -209,9 +209,9 @@ AVX2 implementation is nearly 2x faster than the SSE4.1 implementation for large inputs, and the SSE4.1 implementation is over 10x faster than the default fallback as well. -With `std::vendor` and `cfg_feature_enabled!` we've now written a program that's -20x faster on supported hardware, yet it also continues to run on older hardware -as well! Not bad for a few dozen lines on each function! +With `std::arch` and `is_target_feature_detected!` we've now written a program +that's 20x faster on supported hardware, yet it also continues to run on older +hardware as well! Not bad for a few dozen lines on each function! [intel-intr]: https://software.intel.com/sites/landingpage/IntrinsicsGuide/# [arm-intr]: https://developer.arm.com/technologies/neon/intrinsics @@ -222,7 +222,7 @@ as well! Not bad for a few dozen lines on each function! --- Note that this RFC is explicitly not attempting to stabilize/design a set of -"portable simd operations". The contents of `std::vendor` are platform specific +"portable simd operations". The contents of `std::arch` are platform specific and provide no guarantees about portability. Efforts in the past, however, such as with [simd.js] and the [simd crate][simd-crate] show that it's desirable and useful to have a set of types which are usable across platforms. @@ -232,12 +232,12 @@ example, in terms of platform support and speed on platforms that support it. This RFC is not going to go too much into the details about these types, but rather these guidelines will still hold: -* The vendor intrinsics **will not** take portable types as arguments. For - example `u32x4` and `__m128i` will be different types on x86. 
The two types, - however, will be convertible between one another (either via transmutes or via - explicit functions). This conversion will have zero run-time cost. +* The intrinsics **will not** take portable types as arguments. For example + `u32x4` and `__m128i` will be different types on x86. The two types, however, + will be convertible between one another (either via transmutes or via explicit + functions). This conversion will have zero run-time cost. * The portable simd types will likely live in a module like `std::simd` rather - than `std::vendor`. + than `std::arch`. The design around these portable types are ongoing, however, and stay tuned for an RFC for the `std::simd` module! @@ -335,24 +335,24 @@ runtime dispatch**. Tweaking these functions is currently done via the `-C target-feature` flag to the compiler. This flag to the compiler accepts a similar set of strings to the ones specified above and is already "stable". -## The `cfg_feature_enabled!` Macro +## The `is_target_feature_detected!` Macro -One mode of operation with vendor intrinsics is to compile *part* of a program -with certain CPU features enabled but not the entire program. This way a -portable program can be compiled which runs across a broad range of hardware -which can still benefit from optimized implementations for particular hardware -at runtime. +One mode of operation with intrinsics is to compile *part* of a program with +certain CPU features enabled but not the entire program. This way a portable +program can be compiled which runs across a broad range of hardware which can +still benefit from optimized implementations for particular hardware at +runtime. The crux of this support in libstd is this macro provided by libstd, -`cfg_feature_enabled!`. The macro will accept one argument, a string literal. -The string can be any feature passed to `#[target_feature(enable = ...)]` for -the platform you're compiling for. Finally, the macro will resolve to a `bool` -result. +`is_target_feature_detected!`. The macro will accept one argument, a string +literal. The string can be any feature passed to `#[target_feature(enable = +...)]` for the platform you're compiling for. Finally, the macro will resolve to +a `bool` result. For example on x86 you could write: ```rust -if cfg_feature_enabled!("sse4.1") { +if is_target_feature_detected!("sse4.1") { println!("this cpu has sse4.1 features enabled!"); } ``` @@ -360,8 +360,8 @@ if cfg_feature_enabled!("sse4.1") { It would, however, be an error to write this on x86 cpus: ```rust -cfg_feature_enabled!("neon"); //~ ERROR: neon is an ARM feature, not x86 -cfg_feature_enabled!("foo"); //~ ERROR: unknown target feature for x86 +is_target_feature_detected!("neon"); //~ COMPILE ERROR: neon is an ARM feature, not x86 +is_target_feature_detected!("foo"); //~ COMPILE ERROR: unknown target feature for x86 ``` The macro is intended to be implemented in the `std` crate (**not** `core`) and @@ -370,13 +370,13 @@ is expected to be what [`stdsimd`][stdsimd] does today, notably: * The first time the macro is invoked all the local CPU features will be detected. -* The detected features will then be cached globally (currently in a bitset) for - the rest of the execution of the program. -* Further invocations of `cfg_feature_enabled!` are expected to be cheap runtime - dispatches. (aka load a value and check whether a bit is set) +* The detected features will then be cached globally (when possible and + currently in a bitset) for the rest of the execution of the program. 
+* Further invocations of `is_target_feature_detected!` are expected to be cheap + runtime dispatches. (aka load a value and check whether a bit is set) * Exception: in some cases the result of the macro is statically known: for - example, `cfg_feature_enabled!("sse2")` when the binary is being compiled - with "sse42" globally. In these cases, none of the steps above are + example, `is_target_feature_detected!("sse2")` when the binary is being + compiled with "sse42" globally. In these cases, none of the steps above are performed and the macro just expands to `true`. The exact method of CPU feature detection various by platform, OS, and @@ -394,29 +394,30 @@ where it is possible to do CPU feature detection in libcore (as it's just the be available in libstd for now. This placement can of course be relaxed in the future if necessary. -## The `std::vendor` Module +## The `std::arch` Module This is where the real meat is. A new module will be added to the standard -library, `std::vendor`. This module will also be available in `core::vendor` +library, `std::arch`. This module will also be available in `core::arch` (and `std` will simply reexport it). The contents of this module provide no portability guarantees (like `std::os` and unlike the rest of `std`). APIs present on one platform may not be present on another. -The contents of the `vendor` modules are defined by, well, vendors! For example -Intel has an [intrinsics guide][intel-intr] which will serve as a guideline for -all contents in the `vendor` module itself. The standard library will not -deviate in naming or type signature of any intrinsic defined by a vendor. +The contents of the `arch` modules are defined by, well, architectures! For +example Intel has an [intrinsics guide][intel-intr] which will serve as a +guideline for all contents in the `arch` module itself. The standard library +will not deviate in naming or type signature of any intrinsic defined by an +architecture. For example most Intel intrinsics start with `_mm_` or `_mm256_` for 128 and 256-bit registers. While perhaps unergonomic, we'll be sticking to what Intel says. Note that all intrinsics will also be `unsafe`, according to [RFC 2045][rfc2045]. -Function signatures defined by vendors are typically defined in terms of C +Function signatures defined by architectures are typically defined in terms of C types. In Rust, however, those aren't always available! Instead the intrinsics will be defined in terms of Rust-specific types. Some types are easily translated such as `int32_t`, but otherwise a different mapping may be applied -per-vendor. +per-architecture. The current proposed mapping for x86 intrinsics is: @@ -432,8 +433,8 @@ The current proposed mapping for x86 intrinsics is: [0] required to be compile-time constants. Other than these exceptions the x86 intrinsics will be defined exactly as Intel -defines them. This will necessitate new types in the `std::vendor` modules for -SIMD registers! For example these new types will all be present in `std::vendor` +defines them. This will necessitate new types in the `std::arch` modules for +SIMD registers! For example these new types will all be present in `std::arch` on x86 platforms: * `__m128` @@ -445,10 +446,10 @@ on x86 platforms: (note that AVX-512 types will come in the future!) -Infrastructure-wise the contents of `std::vendor` are expected to continue to be +Infrastructure-wise the contents of `std::arch` are expected to continue to be defined in the [`stdsimd` crate/repository][stdsimd]. 
Intrinsics defined here go through a rigorous test suite involving automatic verification against the -upstream vendor defintion, verification that the correct instruction is +upstream architecture defintion, verification that the correct instruction is generated by LLVM, and at least one runtime test for each intrinsic to ensure it not only compiles but also produces correct results. It's expected that stabilized intrinsics will meet these critera to the best of their ability. @@ -457,23 +458,23 @@ Currently today on x86 and ARM platforms the stdsimd crate performs all these checks, but these checks are not yet implemented for PowerPC/MIPS/etc, but that's always just some more work to do! -It's not expected that the contents of `std::vendor` will remain static for all +It's not expected that the contents of `std::arch` will remain static for all time. Rather intrinsics will continue to be implemented in `stdsimd` and make their way into the main Rust repository. For example there are not currently any implemented AVX-512 intrinsics, but that doesn't mean we won't implement them! Rather once implemented they'll be stabilized and included in libstd following the Rust release model. -## The types in `std::vendor` +## The types in `std::arch` -It's worth paying close attention to the types in `std::vendor`. Types like +It's worth paying close attention to the types in `std::arch`. Types like `__m128i` are intended to represent a 128-bit packed SIMD register on x86, but there's nothing stopping you from using types like `Option<__m128i>` in your program! Most generic containers and such probably aren't written with packed SIMD types in mind, and it'd be a bummer if everything stopped working once you used a packed SIMD type in one of them. -Instead it will be required that the types defined in `std::vendor` do indeed +Instead it will be required that the types defined in `std::arch` do indeed work when used in "nonstandard" contexts. For example `Option<__m128i>` should never produce a compiler error or a codegen error when used (it may just be slower than you expect). This requires special care to be taken both in @@ -494,7 +495,7 @@ Again though, note that this section is largely an implementation detail of SIMD in Rust today, though it's enabling usage without a lot of codegen errors popping up all over the place. -## Intrinsics in `std::vendor` and constant arguments +## Intrinsics in `std::arch` and constant arguments There are a number of intrinsics on x86 (and other) platforms that require their arguments to be constants rather than decided at runtime. For example @@ -508,9 +509,12 @@ constant arguments, but for now this RFC proposes taking a more conservative route forward. Instead we'll, for the time being, forbid the functions from being invoked with non-constant arguments. Prototyped in [#48018][const-pr] the `stdsimd` crate will have an unstable attribute where the compiler can help -provide this guarantee. +provide this guarantee. As an extra precaution as well [#48078][const-pr2] also +implements disallowing taking a function pointer to these intrinsics, requiring +a direct invocation. 
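+
+As a rough illustration of what this restriction means for callers (the
+surrounding function is invented for the example; the intrinsic and its
+required-constant operand are the ones Intel documents):
+
+```rust
+use std::arch::*; // `__m128i` and the intrinsic come from the arch module
+
+#[target_feature(enable = "sse2")]
+unsafe fn third_lane(v: __m128i) -> i32 {
+    // Rejected under this proposal:
+    //   let idx = 3;
+    //   _mm_extract_epi16(v, idx)  // lane index is not a compile-time constant
+    //   let f = _mm_extract_epi16; // taking the intrinsic's address
+    //
+    // Accepted: the lane index is a literal the compiler can see.
+    _mm_extract_epi16(v, 3)
+}
+```
+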
[const-pr]: /~https://github.com/rust-lang/rust/pull/48018
+[const-pr2]: /~https://github.com/rust-lang/rust/pull/48078
[_mm_insert_pi16]: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=insert_pi&expand=2973

It's hoped that this restriction will allow `stdsimd` to be forward compatible
@@ -538,13 +542,14 @@ orthogonal to scalable-vector types which are expected to be proposed in
another, also different, RFC. What this RFC does do, however, is explicitly
specify that:

-* The portable SIMD types (both packed and scalable) will not be used in vendor
+* The portable SIMD types (both packed and scalable) will not be used in
  intrinsics.
-* The vendor SIMD types will be distinct types from the portable SIMD types.
+* The per-architecture SIMD types will be distinct types from the portable SIMD
+  types.

Or, in other words, it's intended that portable SIMD types are entirely
-decoupled from vendor intrinsics. If they both end up being implemented then
-there will be zero-cost interoperation between them, but neither
+decoupled from intrinsics. If they both end up being implemented then
+there will be zero-cost interoperation between them, but neither
will necessarily depend on the other.

[soundbug]: /~https://github.com/rust-lang/rust/issues/44367
@@ -574,9 +579,9 @@ desire to stabilize MMX intrinsics, the `__m64` and all related intrinsics

This RFC represents a *significant* addition to the standard library, maybe one
of the largest we've ever done! As a result alternate implementations of Rust
will likely have a difficult time catching up to rustc/LLVM with all the SIMD
-intrinsics. Additionaly the semantics of "packed SIMD types should work everywhere"
-may be overly difficult to implement in alternate implementations. It is worth
-noting that both Cretonne and GCC support packed SIMD types.
+intrinsics. Additionally the semantics of "packed SIMD types should work
+everywhere" may be overly difficult to implement in alternate implementations.
+It is worth noting that both Cretonne and GCC support packed SIMD types.

Due to the enormity of what's being added to the standard library it's also
infeasible to carefully review each addition in isolation. While there are a
@@ -593,10 +598,10 @@ exposing functionality while still allowing us to implement everything in a
stable fashion for years to come (and without blocking us from updating LLVM,
for example). Despite this there's a few alternatives we could do as well.

-## Portable types in vendor interfaces
+## Portable types in architecture interfaces

It was initially attempted in the [stdsimd] crate that we would use the portable
-types on all of the vendor intrinsics. For example instead of:
+types on all of the intrinsics. For example instead of:

```rust
pub unsafe fn _mm_set1_epi8(val: i8) -> __m128i;
@@ -624,8 +629,8 @@ which when 0 interprets the input as `u8x16` and when 1 interprets it as
`u16x8` (as an example). This effectively means that there *isn't* a correct
choice in all situations for what portable type should be used.

-Consequently it's proposed that instead of portable types the exact vendor types
-are used in all vendor intrinsics. This provides us a much easier route to
+Consequently it's proposed that instead of portable types the exact architecture
+types are used in all intrinsics. This provides us a much easier route to
stabilization ("make sure it's what Intel says") along with no need to interpret
what Intel does and attempt to find the most appropriate type.
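
As an illustration of why there is no single "correct" portable signature, the
sketch below (not from the RFC) reinterprets the bits of a `__m128i` in two
different lane widths; the plain-array `transmute`s merely stand in for the
future portable types, and a little-endian x86-64 target is assumed:

```rust
// Sketch: the same __m128i produced by _mm_set1_epi8 can legitimately be
// viewed as sixteen u8 lanes or eight u16 lanes, so neither u8x16 nor u16x8
// is the one "right" return type for the intrinsic.
#[cfg(target_arch = "x86_64")]
fn two_views() {
    use std::arch::x86_64::{__m128i, _mm_set1_epi8};
    use std::mem::transmute;

    // SAFETY: SSE2 is assumed to be part of the x86-64 baseline, and __m128i
    // is 16 bytes, matching both array types below.
    let v: __m128i = unsafe { _mm_set1_epi8(0x7f) };
    let as_u8: [u8; 16] = unsafe { transmute(v) };
    let as_u16: [u16; 8] = unsafe { transmute(v) };

    assert_eq!(as_u8[0], 0x7f);
    assert_eq!(as_u16[0], 0x7f7f);
}
```
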
@@ -685,14 +690,6 @@ been no attempts to upstream this patch into LLVM itself. [llvm-patch]: /~https://github.com/rust-lang/llvm/commit/68e1e29618b2bd094d82faac16cf8e89959bbd68 [clang]: /~https://github.com/llvm-mirror/clang/blob/679d846fcc73bd213347785185006d591698a132/lib/Basic/Targets/X86.cpp -## Intrinsics that take constant arguments and function pointers - -Mentioned above it's planned to give intrinsics that take constant arguments -some special treatment, but unfortunately we still haven't forbidden function -pointers to these intrinsics. That means that it's still possible to pass -runtime-determined values which may otherwise not be possible to do in a world -where it's powered by `const` features rather than special attributes. - ## Packed SIMD types in `extern` functions are not sound The packed SIMD types have particular care paid to them with respect to their From 7ce3862f1e354aac841e5b35fe873ac86c7da01a Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Mon, 26 Feb 2018 13:10:23 -0800 Subject: [PATCH 10/10] RFC 2325 --- text/{0000-stable-simd.md => 2325-stable-simd.md} | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) rename text/{0000-stable-simd.md => 2325-stable-simd.md} (99%) diff --git a/text/0000-stable-simd.md b/text/2325-stable-simd.md similarity index 99% rename from text/0000-stable-simd.md rename to text/2325-stable-simd.md index 42666e7592b..a2c0e42b048 100644 --- a/text/0000-stable-simd.md +++ b/text/2325-stable-simd.md @@ -1,7 +1,7 @@ - Feature Name: `vendor_intrinsics` - Start Date: 2018-02-04 -- RFC PR: (leave this empty) -- Rust Issue: (leave this empty) +- RFC PR: [rust-lang/rfcs#2325](/~https://github.com/rust-lang/rfcs/pull/2325) +- Rust Issue: [rust-lang/rust#48556](/~https://github.com/rust-lang/rust/issues/48556) # Summary [summary]: #summary