Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing functions for f16 and f128 #587

Closed
wants to merge 13 commits into from

Conversation

tgross35
Copy link
Contributor

@tgross35 tgross35 commented Apr 7, 2024

Fixes #567

README.md Outdated Show resolved Hide resolved
@tgross35 tgross35 force-pushed the f16-f128-intrinsics branch from fc0051a to 405f0aa Compare April 7, 2024 17:15
@tgross35
Copy link
Contributor Author

@Amanieu this will be the corresponding update to builtins once rust-lang/rust#122470 merges. How do I need to gate things here since it will only work with a recent nightly?

@Amanieu
Copy link
Member

Amanieu commented Apr 10, 2024

compiler-builtins is effectively an internal implementation detail of rustc, you're not meant to use it directly. So it therefore only supports the latest rustc anyways.

However you may have problems with bootstrap since that uses the latest beta which won't have support for the f16/f128 types. I believe cfg(bootstrap) should work for dependencies of the standard library though.

README.md Outdated
@@ -264,7 +269,6 @@ These builtins involve floating-point types ("`f128`", "`f80`" and complex numbe
- ~~ppc/gcc_qmul.c~~
- ~~ppc/gcc_qsub.c~~
- ~~ppc/multc3.c~~
- ~~subtf3.c~~
- ~~trunctfdf2.c~~
- ~~trunctfsf2.c~~
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure these aren't needed? I would expect all the tf builtins to be needed for f128.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are, I just wanted to get arithmetic passing before working on those.

@tgross35
Copy link
Contributor Author

Awesome, thanks for the information. Luckily compiler support did get merged right before the beta branch so I have been able to avoid cfg(not(bootstrap))ing in most places.

@tgross35 tgross35 force-pushed the f16-f128-intrinsics branch 2 times, most recently from d6c05d0 to e5e93e8 Compare April 12, 2024 02:15
@tgross35
Copy link
Contributor Author

What is the best way to test this locally? For everything I have tried, the macros at

compiler_builtins::foreach_cas!(cas::test);
compiler_builtins::foreach_cas16!(test_cas16);
compiler_builtins::foreach_swp!(swap::test);
compiler_builtins::foreach_ldadd!(add::test);
compiler_builtins::foreach_ldclr!(clr::test);
compiler_builtins::foreach_ldeor!(xor::test);
compiler_builtins::foreach_ldset!(or::test);
can't be located and running ./ci/run-docker.sh doesn't find the right linker. Then the tests at /~https://github.com/tgross35/compiler-builtins/blob/7d347bb9e348e070a87b4123a29c3fdd796d4dee/testcrate/tests/addsub.rs#L112 fail because the symbols to compare against aren't available on my machine (which is aarch64, maybe I just need to gate this?).

Looking forward to being able to get some output from CI but I'm still waiting on nightly to catch up 🙂

@Amanieu
Copy link
Member

Amanieu commented Apr 13, 2024

Normally run-docker.sh just works, but I guess nobody has tried running it on aarch64. It works fine for me when run on an x86_64 host.

If you're feeling adventurous, you can try fixing the tests to work on an aarch64 host.

Regarding the LSE issues, I think that may be because you are on an Apple system where these symbols don't exist. That test should be gated on target_os = "linux".

@tgross35
Copy link
Contributor Author

The linker just wasn't available on ubuntu 18, but I had to add a way to not use host rustc. Opened #591 to add a way to use a different image

@tgross35 tgross35 force-pushed the f16-f128-intrinsics branch from 7d347bb to a467dd2 Compare April 13, 2024 19:31
@@ -201,6 +201,10 @@ intrinsics! {
add(a, b)
}

pub extern "C" fn __addtf3(a: f128, b: f128) -> f128 {
add(a, b)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will unconditionally codegen f16 and f128 usage, breaking cg_clif and cg_gcc. Maybe add a cargo feature to disable these new intrinsics?

@tgross35 tgross35 force-pushed the f16-f128-intrinsics branch 2 times, most recently from cb353e2 to 44c414f Compare April 14, 2024 07:30
@tgross35 tgross35 force-pushed the f16-f128-intrinsics branch 2 times, most recently from 32d6c0f to db3fb45 Compare April 14, 2024 09:05
@tgross35
Copy link
Contributor Author

tgross35 commented Apr 15, 2024

What is the reason tests don't seem to get run on these platforms? (aarch, i586, thumb, wasm). Wondering if they should be enabled for this.

image

@tgross35 tgross35 force-pushed the f16-f128-intrinsics branch from a19f4ec to 53203b0 Compare April 17, 2024 06:08
@tgross35
Copy link
Contributor Author

tgross35 commented Apr 17, 2024

My widening function is failing at "0xffffffffffffffffffffffffffffffff * 0xffffffffffffffffffffffffffffffff = 0xfffffffffffffffffffffffffffffffe00000000000000000000000000000001 got 0xfffffffffffffffffffffffffffffffd00000000000000000000000000000001", /~https://github.com/rust-lang/compiler-builtins/actions/runs/8717331100/job/23912367209?pr=587#step:9:370. This seems to be identical to what LLVM's wideMultiply does, https://clang.godbolt.org/z/Y4vMxd6se (original source), but from Python it seems like the upper half should end with an e rather than a d:

>>> hex(0xffffffffffffffffffffffffffffffff * 0xffffffffffffffffffffffffffffffff)
'0xfffffffffffffffffffffffffffffffe00000000000000000000000000000001'

Relevant code if anyone has any ideas

fn zero_widen_mul(self, rhs: Self) -> Self::D {
let product11: u64 = word!(1, self) * word!(1, rhs);
let product12: u64 = word!(1, self) * word!(2, rhs);
let product13: u64 = word!(1, self) * word!(3, rhs);
let product14: u64 = word!(1, self) * word!(4, rhs);
let product21: u64 = word!(2, self) * word!(1, rhs);
let product22: u64 = word!(2, self) * word!(2, rhs);
let product23: u64 = word!(2, self) * word!(3, rhs);
let product24: u64 = word!(2, self) * word!(4, rhs);
let product31: u64 = word!(3, self) * word!(1, rhs);
let product32: u64 = word!(3, self) * word!(2, rhs);
let product33: u64 = word!(3, self) * word!(3, rhs);
let product34: u64 = word!(3, self) * word!(4, rhs);
let product41: u64 = word!(4, self) * word!(1, rhs);
let product42: u64 = word!(4, self) * word!(2, rhs);
let product43: u64 = word!(4, self) * word!(3, rhs);
let product44: u64 = word!(4, self) * word!(4, rhs);
let sum0: u128 = u128::from(product44);
let sum1: u128 = u128::from(product34) + u128::from(product43);
let sum2: u128 = u128::from(product24) + u128::from(product33) + u128::from(product42);
let sum3: u128 = u128::from(product14)
+ u128::from(product23)
+ u128::from(product32)
+ u128::from(product41);
let sum4: u128 = u128::from(product13) + u128::from(product22) + u128::from(product31);
let sum5: u128 = u128::from(product12) + u128::from(product21);
let sum6: u128 = u128::from(product11);
let r0: u128 =
(sum0 & u128::from(WORD_FULL_MASK)) + ((sum1 & u128::from(WORD_LO_MASK)) << 32);
let r1: u128 = (sum0 >> 64)
+ ((sum1 >> 32) & u128::from(WORD_FULL_MASK))
+ (sum2 & u128::from(WORD_FULL_MASK))
+ ((sum3 << 32) & u128::from(WORD_HI_MASK));
let lo = r0.wrapping_add(r1 << 64);
let hi = (r1 >> 64)
+ (sum1 >> 96)
+ (sum2 >> 64)
+ (sum3 >> 32)
+ sum4
+ (sum5 << 32)
+ (sum6 << 64);
u256([
(lo & U128_LO_MASK) as u64,
((lo >> 64) & U128_LO_MASK) as u64,
(hi & U128_LO_MASK) as u64,
((hi >> 64) & U128_LO_MASK) as u64,
])
}

@tgross35 tgross35 force-pushed the f16-f128-intrinsics branch from 5dc5fda to e03afc8 Compare April 19, 2024 17:45
@marthadev
Copy link

My widening function is failing at "0xffffffffffffffffffffffffffffffff * 0xffffffffffffffffffffffffffffffff = 0xfffffffffffffffffffffffffffffffe00000000000000000000000000000001 got 0xfffffffffffffffffffffffffffffffd00000000000000000000000000000001", /~https://github.com/rust-lang/compiler-builtins/actions/runs/8717331100/job/23912367209?pr=587#step:9:370. This seems to be identical to what LLVM's wideMultiply does, https://clang.godbolt.org/z/Y4vMxd6se (original source), but from Python it seems like the upper half should end with an e rather than a d:

AFAICT the code is dropping a carry with the wrapping_add

let lo = r0.wrapping_add(r1 << 64);
let hi = (r1 >> 64)
+ (sum1 >> 96)
+ (sum2 >> 64)
+ (sum3 >> 32)
+ sum4
+ (sum5 << 32)
+ (sum6 << 64);

Should probably be something like:

    let (lo, carry) = r0.overflowing_add(r1 << 64);
    let hi = (r1 >> 64)
        + (sum1 >> 96)
        + (sum2 >> 64)
        + (sum3 >> 32)
        + sum4
        + (sum5 << 32)
        + (sum6 << 64)
        + u128::from(carry);

@tgross35
Copy link
Contributor Author

Thanks @marthadev, I just today realized that we have a nightly carrying_add too to make this even easier. Any idea if what LLVM has is also incorrect then, or am I missing something in that comparison?

@marthadev
Copy link

marthadev commented Apr 25, 2024

I just today realized that we have a nightly carrying_add

TIL, it does indeed look nice

Any idea if what LLVM has is also incorrect then, or am I missing something in that comparison?

The LLVM implementation seems to have the same problem, which honestly surprises me given that the code is 10 years old. I have no clue though how widely that function is being used, but having a logic bug in integer arithmetic seems pretty bad, so I am glad you found it.

Edit: Curiously enough, the LLVM 64x64 version even explicitly comments about the carry

@tgross35
Copy link
Contributor Author

I opened an issue at llvm/llvm-project#90009 for the LLVM side to take a look at, thanks for confirming.

@tgross35 tgross35 force-pushed the f16-f128-intrinsics branch from fcd9612 to 6d70c0f Compare May 6, 2024 05:37
Try splitting part of 'Int' into 'MinInt' so we don't need to implement everything on u256/i256

Add addsub test

Add mul/div/rem tests

Add cmp test

Remove 32-bit div implementation

formatting updates

disable div tests for now

Bigint updates

Big update

Fix widen mul

wrapping add

disable duplicate symbols in builtins

Apply temporary unord fix from @beetrees rust-lang#593

tests

add lowerhex display

errors by ref

tests

fix-test

Update big tests

Fix core calls

Disable widen_mul for signed

Test adding symbols in build.rs

Add a feature to compile intrinsics that are missing on the system for testing

update

Disable f128 tests on platforms without system support

add missing build.rs file

pull cas file from master

testgs

print more div values

Add a benchmark

Work on fixing bit widths

Update benchmark
@tgross35 tgross35 force-pushed the f16-f128-intrinsics branch from 6d70c0f to bbe3aef Compare May 6, 2024 06:34
@tgross35
Copy link
Contributor Author

Everything except division has been cleaned up and split off to #606

@tgross35
Copy link
Contributor Author

Superceded by a few PRs, many have merged already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to find long double symbols on aarch64-apple-darwin
5 participants