How should we handle matrix ABIs? #133144
afaik PowerPC MMA doesn't change the ABI: #131800 (comment)
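Here is one plausible reason, as we understand Power ISA 3.1. Each of the eight 512-bit MMA accumulators is architecturally associated with four consecutive VSX registers (ACC i with VSR 4i through 4i+3), so there is no separate register file for the ABI to describe. The sketch below only models that register mapping. The helper `acc_to_vsrs` is a hypothetical name, not a Rust, LLVM or GCC API.

```rust
/// Number of MMA accumulators (ACC0..ACC7), as we understand Power ISA 3.1.
const NUM_ACCUMULATORS: usize = 8;
/// Each 512-bit accumulator is associated with four 128-bit VSX registers.
const VSRS_PER_ACC: usize = 4;
const VSR_BITS: usize = 128;

/// Hypothetical helper: the VSX register numbers associated with accumulator `acc`.
/// Returns None for an out-of-range accumulator index.
fn acc_to_vsrs(acc: usize) -> Option<[usize; VSRS_PER_ACC]> {
    if acc >= NUM_ACCUMULATORS {
        return None;
    }
    let base = acc * VSRS_PER_ACC;
    Some(std::array::from_fn(|i| base + i))
}

fn main() {
    for acc in 0..NUM_ACCUMULATORS {
        let vsrs = acc_to_vsrs(acc).unwrap();
        println!("ACC{} -> VSR{}..=VSR{}", acc, vsrs[0], vsrs[VSRS_PER_ACC - 1]);
    }
    // One accumulator spans 4 x 128 = 512 bits of existing vector state.
    println!("accumulator width: {} bits", VSRS_PER_ACC * VSR_BITS);
}
```

With this mapping, all eight accumulators together cover exactly VSR0 through VSR31.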
It is good this issue is about handling ABIs rather than merely describing them, then? Specifically, if we want to avoid involving this in our ABIs, we need to adopt the same bans.
How does LLVM even represent these types in function signatures? Sounds to me like this will require
I'm not sure if there's much in common that would justify
My current understanding is:
PowerISA's Matrix Multiply Assist
C Interop
Intrinsics
Arm Scalable Matrix Extensions
It is almost more like a dedicated thread-local allocation... the "ZA array"... that gets reinterpreted or examined along various dimensions. Then you set the CPU into Matrix Math... sorry, "Arm Streaming SVE" state... and Big Array Math happens, accumulating into the ZA array. The Big Array Math, however, is expressible as vector operations that just might use a different size than the normal Arm SVE operations, which is why it's "Streaming SVE": the model is "matrix math is mostly a pile of vector operations, done really fast". This does remove the ability to use some of the more complicated Arm SVE2 operations while in it.
C Interop
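The SME model of "a pile of vector operations accumulating into the ZA array" can be illustrated with an outer-product-and-accumulate, which is what SME instructions such as FMOPA perform. Two vectors a and b update a 2D tile as ZA[i][j] += a[i] * b[j], and a full matrix product is just K such updates. The scalar Rust below is only a model of that arithmetic and makes several assumptions. The tile size is fixed at 4 here, whereas real ZA tiles are sized by the streaming vector length. Predication is ignored. The function names are hypothetical.

```rust
/// Assumed tile dimension for illustration only; real ZA tile sizes
/// depend on the streaming vector length (SVL).
const N: usize = 4;

/// Hypothetical scalar model of an outer-product-and-accumulate:
/// za[i][j] += a[i] * b[j] for every (i, j).
fn outer_product_accumulate(za: &mut [[f32; N]; N], a: &[f32; N], b: &[f32; N]) {
    for i in 0..N {
        for j in 0..N {
            za[i][j] += a[i] * b[j];
        }
    }
}

/// C = A * B computed as N outer products (column k of A times row k of B),
/// all accumulated into the same tile, mirroring the "accumulate into ZA" model.
fn matmul_via_outer_products(a: &[[f32; N]; N], b: &[[f32; N]; N]) -> [[f32; N]; N] {
    let mut za = [[0.0f32; N]; N];
    for k in 0..N {
        let col_a: [f32; N] = std::array::from_fn(|i| a[i][k]);
        outer_product_accumulate(&mut za, &col_a, &b[k]);
    }
    za
}

fn main() {
    let a: [[f32; N]; N] = [[1.0, 2.0, 0.0, 0.0], [3.0, 4.0, 0.0, 0.0], [0.0; N], [0.0; N]];
    let b: [[f32; N]; N] = [[5.0, 6.0, 0.0, 0.0], [7.0, 8.0, 0.0, 0.0], [0.0; N], [0.0; N]];
    let c = matmul_via_outer_products(&a, &b);
    // prints "[19.0, 22.0] [43.0, 50.0]"
    println!("[{:?}, {:?}] [{:?}, {:?}]", c[0][0], c[0][1], c[1][0], c[1][1]);
}
```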
x86 AMX Tiles
The tiles seem to be more "classic" registers, but use an interesting API. They are also "shape-changing" in a way. I assume @sayantn knows more about this.
C Interop
Intrinsics
fn some_tile_intrinsic(dst: &mut __tile1024i, src_a: __tile1024i, src_b: __tile1024i)
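A signature like the one above takes the destination as `&mut` because AMX dot-product instructions read the destination tile and accumulate into it. The scalar Rust below is a hedged illustration of the semantics Intel documents for TDPBSSD: signed 8-bit dot products, four bytes at a time, accumulated into 32-bit lanes, with the B operand in a 4-byte-interleaved layout. `Tile` and `tdpbssd_ref` are hypothetical stand-ins, not the `core::arch` API. The tile is modeled as a plain row-major byte buffer rather than a register.

```rust
/// Hypothetical model of a tile: `rows` rows of `colsb` bytes, row-major.
/// A real AMX tile holds at most 16 rows x 64 bytes.
struct Tile {
    rows: usize,
    colsb: usize,
    data: Vec<u8>,
}

impl Tile {
    fn zeroed(rows: usize, colsb: usize) -> Self {
        Tile { rows, colsb, data: vec![0; rows * colsb] }
    }
    /// Byte at (row r, byte column c), reinterpreted as signed.
    fn byte(&self, r: usize, c: usize) -> i8 {
        self.data[r * self.colsb + c] as i8
    }
    /// 32-bit lane n of row r (little-endian).
    fn dword(&self, r: usize, n: usize) -> i32 {
        let off = r * self.colsb + 4 * n;
        i32::from_le_bytes(self.data[off..off + 4].try_into().unwrap())
    }
    fn set_dword(&mut self, r: usize, n: usize, v: i32) {
        let off = r * self.colsb + 4 * n;
        self.data[off..off + 4].copy_from_slice(&v.to_le_bytes());
    }
}

/// Scalar reference for a TDPBSSD-style update:
/// dst[m][n] += sum over k, i of a[m][4k+i] * b[k][4n+i] (signed bytes, wrapping i32).
/// `dst` is `&mut` because the operation is read-modify-write.
fn tdpbssd_ref(dst: &mut Tile, a: &Tile, b: &Tile) {
    let k_steps = a.colsb / 4; // 4-byte groups per row of A
    let n_cols = dst.colsb / 4; // i32 lanes per row of dst
    assert_eq!(a.rows, dst.rows, "A must have as many rows as dst");
    assert_eq!(b.rows, k_steps, "B needs one row per 4-byte group of A");
    assert_eq!(b.colsb, dst.colsb, "B and dst must have the same row width");
    for m in 0..dst.rows {
        for n in 0..n_cols {
            let mut acc = dst.dword(m, n);
            for k in 0..k_steps {
                for i in 0..4 {
                    let x = a.byte(m, 4 * k + i) as i32;
                    let y = b.byte(k, 4 * n + i) as i32;
                    acc = acc.wrapping_add(x * y);
                }
            }
            dst.set_dword(m, n, acc);
        }
    }
}

fn main() {
    let a = Tile { rows: 1, colsb: 4, data: vec![1, 2, 3, 4] };
    // B in the interleaved layout: 4 bytes per output lane; 255u8 as i8 is -1.
    let b = Tile { rows: 1, colsb: 8, data: vec![1, 1, 1, 1, 255, 255, 255, 255] };
    let mut dst = Tile::zeroed(1, 8);
    tdpbssd_ref(&mut dst, &a, &b);
    // prints "dst row 0 = [10, -10]"
    println!("dst row 0 = [{}, {}]", dst.dword(0, 0), dst.dword(0, 1));
}
```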
For AMX, the tile registers are nothing complicated: just plain old registers (with an 8192-bit size, so it is not enabled by default on Linux). The interesting bit is the Take for example the instruction Intel lists 2 intrinsics, the The Also, the type for
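The "8192-bit" figure and the "shape-changing" behaviour can both be made concrete, based on our reading of Intel's documentation. In palette 1 there are eight tiles, each at most 16 rows of 64 bytes, which is 1024 bytes or 8192 bits per tile. The active shape of every tile comes from a 64-byte configuration loaded with LDTILECFG. The `#[repr(C)]` struct below sketches that configuration layout as it appears in Intel's sample code. `TileConfig` and `configure` are hypothetical names. Nothing here executes AMX instructions or requests the Linux permission (`arch_prctl` with `ARCH_REQ_XCOMP_PERM`) that a real program must obtain first.

```rust
/// Palette 1 limits for AMX tiles (assumed from Intel's documentation).
const MAX_TILES: usize = 8;
const MAX_ROWS: usize = 16;
const MAX_COLSB: usize = 64; // bytes per row

/// Sketch of the 64-byte memory operand read by LDTILECFG.
/// Entries 8..16 of `colsb` and `rows` are reserved and stay zero in palette 1.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
struct TileConfig {
    palette_id: u8,
    start_row: u8,
    reserved: [u8; 14],
    colsb: [u16; 16], // bytes per row, per tile
    rows: [u8; 16],   // number of rows, per tile
}

// The layout must be exactly 64 bytes to match the hardware format.
const _: () = assert!(std::mem::size_of::<TileConfig>() == 64);

/// Hypothetical helper: give tile `t` a shape of `rows` x `colsb` bytes.
fn configure(cfg: &mut TileConfig, t: usize, rows: u8, colsb: u16) -> Result<(), String> {
    if t >= MAX_TILES || rows as usize > MAX_ROWS || colsb as usize > MAX_COLSB {
        return Err(format!("tile {t}: {rows}x{colsb} exceeds palette 1 limits"));
    }
    cfg.palette_id = 1;
    cfg.rows[t] = rows;
    cfg.colsb[t] = colsb;
    Ok(())
}

fn main() {
    let tile_bytes = MAX_ROWS * MAX_COLSB;
    // prints "one tile: 1024 bytes = 8192 bits"
    println!("one tile: {} bytes = {} bits", tile_bytes, tile_bytes * 8);
    let mut cfg = TileConfig::default();
    configure(&mut cfg, 0, 16, 64).unwrap(); // full-size tile
    configure(&mut cfg, 1, 4, 16).unwrap(); // same register, smaller shape
    println!("{:?}", cfg);
}
```

Because the shape lives in this configuration rather than in the register name, the same `tmm` register can hold differently shaped data after each LDTILECFG. That is presumably what makes the tiles feel "shape-changing".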
Some CPU architectures have developed "matrix extensions". These are sometimes equivalent to "vectors, but bigger" in terms of how the ABI should be handled (reusing the same architectural state, thus having similar concerns). But not always! They may use entirely different architectural state, usually entirely "caller-save" (i.e. always "volatile" or "call-clobbered").
AArch64
- Scalable Matrix Extensions
PowerPC
- MMA
x86
- AMX
  - x86_amx_intrinsics #126622
  - amx_tile type, AKA x86_amx or __tile1024i
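The distinction drawn above separates extensions that reuse vector state from those with separate, usually call-clobbered state. It can be sketched as a small classification. This is an illustrative model only: `MatrixState` and `caller_must_spill` are hypothetical names with no counterpart in rustc. The two example entries follow this thread: PowerPC MMA reuses vector state, and x86 AMX tiles are separate registers. The model also assumes, as the issue says is usual, that separate state is call-clobbered.

```rust
/// How a matrix extension's architectural state relates to the existing ABI.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MatrixState {
    /// Reuses vector registers, so the existing vector-ABI rules already apply.
    SharedWithVectors,
    /// Separate architectural state; `call_clobbered` means a callee may
    /// freely overwrite it ("caller-save" / "volatile").
    Separate { call_clobbered: bool },
}

/// Hypothetical query: must a caller holding live matrix state save it
/// around a call? Shared state defers to the vector ABI, which this sketch
/// does not model, so it returns None.
fn caller_must_spill(state: MatrixState, live_across_call: bool) -> Option<bool> {
    match state {
        MatrixState::SharedWithVectors => None,
        MatrixState::Separate { call_clobbered } => Some(live_across_call && call_clobbered),
    }
}

fn main() {
    // Illustrative entries following the discussion in this issue (assumptions).
    let examples = [
        ("PowerPC MMA", MatrixState::SharedWithVectors),
        ("x86 AMX tiles", MatrixState::Separate { call_clobbered: true }),
    ];
    for (name, state) in examples {
        println!("{}: {:?} -> spill if live: {:?}", name, state, caller_must_spill(state, true));
    }
}
```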