Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang's isSignedCharDefault() is incorrect for multiple architectures #115957

Closed
4 tasks done
arichardson opened this issue Nov 12, 2024 · 13 comments · Fixed by #115964
Closed
4 tasks done

Clang's isSignedCharDefault() is incorrect for multiple architectures #115957

arichardson opened this issue Nov 12, 2024 · 13 comments · Fixed by #115964
Labels
ABI Application Binary Interface clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' confirmed Verified by a second party diverges-from:gcc Does the clang frontend diverge from gcc on this issue release:backport

Comments

@arichardson
Copy link
Member

arichardson commented Nov 12, 2024

While debugging a signed/unsigned char issue I started looking at the list of triples that use unsigned chars by default and it appears the logic in Clang does not match the ABI documents for some architectures. I noticed some issues while comparing isSignedCharDefault() with the Rust list of systems with unsigned char: /~https://github.com/rust-lang/rust/blob/6503543d11583d1686d4989847b2afbec8d9fdba/library/core/src/ffi/mod.rs#L92

So far it appears s390x, csky, xtensa, msp430 are missing from isSignedCharDefault(), but there might be other omissions.

I don't see a CSKY GCC on godbolt, but https://godbolt.org/z/qaaW18znY confirms that the other architectures diverge from GCC

@arichardson arichardson added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" ABI Application Binary Interface diverges-from:gcc Does the clang frontend diverge from gcc on this issue labels Nov 12, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 12, 2024

@llvm/issue-subscribers-clang-frontend

Author: Alexander Richardson (arichardson)

While debugging a signed/unsigned char issue I started looking at the list of triples that use unsigned chars by default and it appears the logic in Clang does not match the ABI documents for some architectures. I noticed some issues while isSignedCharDefault() with the Rust list of systems with unsigned char: /~https://github.com/rust-lang/rust/blob/6503543d11583d1686d4989847b2afbec8d9fdba/library/core/src/ffi/mod.rs#L92

So far it appears, s390x and csky, xtensa and msp430 are missing from isSignedCharDefault(), but there might be other omissions.

I don't see a CSKY GCC on godbolt, but https://godbolt.org/z/qaaW18znY confirms that the other architectures diverge from GCC

@shafik
Copy link
Collaborator

shafik commented Nov 13, 2024

CC @erichkeane @AaronBallman

@AaronBallman AaronBallman added the confirmed Verified by a second party label Nov 13, 2024
@AaronBallman
Copy link
Collaborator

Something to consider as we fix these: we may need to provide ABI tags so users can still get the old (incorrect) ABI as needed.

@arichardson
Copy link
Member Author

Do we really need a compatibility fallback here? The sign of char rarely matters for code generation and I believe none of the affected targets are "tier 1"? Affected projects can always use -f(no)-signed-char.

@AaronBallman
Copy link
Collaborator

Do we really need a compatibility fallback here?

Because this impacts the answer you get back from type traits like is_signed_v, and those tend to get used when specializing templates, the ABI impact can be surprisingly wide.

The sign of char rarely matters for code generation and I believe none of the affected targets are "tier 1"? Affected projects can always use -f(no)-signed-char.

That's a good point I hadn't considered, but it seems somewhat unclean to me. That gets us to "to get the Clang 19 ABI, pass -fclang-abi-compat=19 but you may also need to set -f(no)-signed-char as well" which isn't great.

@arichardson
Copy link
Member Author

The template specialization is a good point, I had not considered that.

I think the main question is whether the affected targets have a real binary compatibility story or tend to be more of an embedded "recompile the whole system" target.

  • MSP430 is a microcontroller, so I'd imagine limited binary compatibility concerns.
  • I don't know anything about CSKY but it is 32-bit embedded with Linux support, so that may have stronger requirements.
  • I also don't know much about XTensa, but it also appears to be used for embedded architectures.

We could forward args from RenderCharacterOptions to isSignedCharDefault add a check for OPT_fclang_abi_compat_EQ inside this function, but that would require some duplication unless we move this logic from the Driver to actual per-target LangOptions overrides.

Arguably the latter would be cleaner than handling this in the Driver (and there is already a FIXME). This would allow test/Preprocessor to check for the value of __CHAR_UNSIGNED__ instead of writing Driver tests as I did in #115967 and #115964. But I don't think I will be able find the time to do this larger refactoring any time soon.

arichardson added a commit to arichardson/rust that referenced this issue Nov 13, 2024
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).
@AaronBallman
Copy link
Collaborator

I think the main question is whether the affected targets have a real binary compatibility story or tend to be more of an embedded "recompile the whole system" target.

* MSP430 is a microcontroller, so I'd imagine limited binary compatibility concerns.

* I don't know anything about CSKY but it is 32-bit embedded with Linux support, so that may have  stronger requirements.

* I also don't know much about XTensa, but it also appears to be used for embedded architectures.

Yeah, my hope is always that people will be able to recompile everything at once, but my experience has been that there's a surprising number of situations where that's not plausible. My bigger concern is more that we tell users to use -fclang-abi-compat=X to get ABI compatibility with version X of Clang; I think it's useful for there to be one flag to regain ABI compatibility instead of having to combine flags to do so.

We could forward args from RenderCharacterOptions to isSignedCharDefault add a check for OPT_fclang_abi_compat_EQ inside this function, but that would require some duplication unless we move this logic from the Driver to actual per-target LangOptions overrides.

Arguably the latter would be cleaner than handling this in the Driver (and there is already a FIXME). This would allow test/Preprocessor to check for the value of __CHAR_UNSIGNED__ instead of writing Driver tests as I did in #115967 and #115964. But I don't think I will be able find the time to do this larger refactoring any time soon.

Yeah, I think handling this on a per-target basis would be cleaner than handling this in the driver. However, I don't think that larger refactoring should block fixing the behavior of whether char is signed or not.

Do we document ABI stability guarantees for any of the target tiers? e.g., can we skip ABI tags because these targets are not tier 1 and we document somewhere that they're not ABI stable as a result?

arichardson added a commit to arichardson/rust that referenced this issue Nov 13, 2024
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).
arichardson added a commit to arichardson/rust that referenced this issue Nov 13, 2024
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).
@EugeneZelenko EugeneZelenko added clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' and removed clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Nov 14, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 14, 2024

@llvm/issue-subscribers-clang-driver

Author: Alexander Richardson (arichardson)

While debugging a signed/unsigned char issue I started looking at the list of triples that use unsigned chars by default and it appears the logic in Clang does not match the ABI documents for some architectures. I noticed some issues while comparing isSignedCharDefault() with the Rust list of systems with unsigned char: /~https://github.com/rust-lang/rust/blob/6503543d11583d1686d4989847b2afbec8d9fdba/library/core/src/ffi/mod.rs#L92

So far it appears s390x, csky, xtensa, msp430 are missing from isSignedCharDefault(), but there might be other omissions.

I don't see a CSKY GCC on godbolt, but https://godbolt.org/z/qaaW18znY confirms that the other architectures diverge from GCC

@arichardson
Copy link
Member Author

MSP430 has been fixed (so github autoclosed this) but others may need a compatibility storage.

@arichardson arichardson reopened this Nov 14, 2024
arichardson added a commit to arichardson/rust that referenced this issue Nov 15, 2024
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
@arichardson arichardson reopened this Nov 23, 2024
arichardson added a commit to arichardson/rust that referenced this issue Dec 4, 2024
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
arichardson added a commit to arichardson/rust that referenced this issue Dec 10, 2024
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Dec 11, 2024
De-duplicate and improve definition of core::ffi::c_char

Instead of having a list of unsigned char targets for each OS, follow the logic Clang uses and instead set the value based on architecture with a special case for Darwin and Windows operating systems. This makes it easier to support new operating systems targeting Arm/AArch64 without having to modify this config statement for each new OS. The new list does not quite match Clang since I noticed a few bugs in the Clang implementation (llvm/llvm-project#115957).

Fixes rust-lang#129945
Closes rust-lang#131319
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Dec 11, 2024
Rollup merge of rust-lang#132975 - arichardson:ffi-c-char, r=tgross35

De-duplicate and improve definition of core::ffi::c_char

Instead of having a list of unsigned char targets for each OS, follow the logic Clang uses and instead set the value based on architecture with a special case for Darwin and Windows operating systems. This makes it easier to support new operating systems targeting Arm/AArch64 without having to modify this config statement for each new OS. The new list does not quite match Clang since I noticed a few bugs in the Clang implementation (llvm/llvm-project#115957).

Fixes rust-lang#129945
Closes rust-lang#131319
arichardson added a commit that referenced this issue Feb 9, 2025
@arichardson
Copy link
Member Author

This should all be fixed now.

/cherry-pick d204724

@llvmbot
Copy link
Member

llvmbot commented Feb 9, 2025

This should all be fixed now.

/cherry-pick d204724

Error: Command failed due to missing milestone.

@arichardson
Copy link
Member Author

/cherry-pick d204724

@llvmbot
Copy link
Member

llvmbot commented Feb 9, 2025

/pull-request #126436

github-actions bot pushed a commit to arm/arm-toolchain that referenced this issue Feb 9, 2025
@nikic nikic moved this from Needs Triage to Done in LLVM Release Status Feb 10, 2025
Icohedron pushed a commit to Icohedron/llvm-project that referenced this issue Feb 11, 2025
github-actions bot pushed a commit to tautschnig/verify-rust-std that referenced this issue Feb 20, 2025
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
github-actions bot pushed a commit to tautschnig/verify-rust-std that referenced this issue Feb 20, 2025
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
github-actions bot pushed a commit to carolynzech/rust that referenced this issue Feb 20, 2025
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
github-actions bot pushed a commit to model-checking/verify-rust-std that referenced this issue Feb 20, 2025
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
github-actions bot pushed a commit to thanhnguyen-aws/verify-rust-std that referenced this issue Feb 21, 2025
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
github-actions bot pushed a commit to thanhnguyen-aws/verify-rust-std that referenced this issue Feb 21, 2025
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this issue Feb 21, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this issue Feb 21, 2025
github-actions bot pushed a commit to model-checking/verify-rust-std that referenced this issue Feb 22, 2025
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
github-actions bot pushed a commit to carolynzech/rust that referenced this issue Feb 22, 2025
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
github-actions bot pushed a commit to tautschnig/verify-rust-std that referenced this issue Feb 22, 2025
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
github-actions bot pushed a commit to thanhnguyen-aws/verify-rust-std that referenced this issue Feb 22, 2025
Instead of having a list of unsigned char targets for each OS, follow the
logic Clang uses and instead set the value based on architecture with
a special case for Darwin and Windows operating systems. This makes it
easier to support new operating systems targeting Arm/AArch64 without
having to modify this config statement for each new OS. The new list does
not quite match Clang since I noticed a few bugs in the Clang
implementation (llvm/llvm-project#115957).

Fixes: rust-lang#129945
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ABI Application Binary Interface clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' confirmed Verified by a second party diverges-from:gcc Does the clang frontend diverge from gcc on this issue release:backport
Projects
Development

Successfully merging a pull request may close this issue.

5 participants