Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove basic_string<unsigned char> from embind #23070

Merged

Conversation

aheejin
Copy link
Member

@aheejin aheejin commented Dec 4, 2024

Only char, wchar, char8, char16, and char32 are valid specializations for std::basic_string:
https://en.cppreference.com/w/cpp/string/basic_string

But libc++ had a base template for basic_string that allowed any type to be passed for a long time. It looks there have been several attempts to remove this after which they restored it due to complaints, in chronological order:
llvm/llvm-project@aeecef0 llvm/llvm-project@08a0faf llvm/llvm-project@e30a148 llvm/llvm-project#66153
llvm/llvm-project#72694

The last one, llvm/llvm-project#72694, eventually removed it. So std::basic_string<unsigned char> is not allowed anymore. This removes all uses of std::basic_string<unsigned char> from embind.

This needs to be done to update libc++ to LLVM 19 (#22994). I'm uploading this as a separate PR because this removes a functionality from embind.

@aheejin aheejin requested review from sbc100 and brendandahl December 4, 2024 10:27
@aheejin aheejin changed the title Remove basic_string<unsigned_char> from embind Remove basic_string<unsigned char> from embind Dec 4, 2024
@brendandahl
Copy link
Collaborator

I suppose this could break some people, though I'm not aware of any active users of it. For c++20, we could change it to register char8_t, however I lean towards removing it and bringing it back if needed.

If we do remove it, we should also update the code here. The condition (name === "std::string") will always be true then, so we can use the preprocessor macro to only emit the code in the corresponding branches of the if (stdStringIsUTF8).

test("can pass Uint8Array to std::basic_string<unsigned char>", function() {
var e = cm.emval_test_take_and_return_std_basic_string_unsigned_char(new Uint8Array([65, 66, 67, 68]));
assert.equal('ABCD', e);
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change the test such that it tests that this does not work?

Copy link
Member Author

@aheejin aheejin Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that, and to do that I needed to revive emval_test_take_and_return_std_basic_string_unsigned_char in embind_test.cpp and the error message is like Cannot call emval_test_take_and_return_std_basic_string_unsigned_char due to unbound types: NSt3__212basic_stringIhNS_11char_traitsIhEENS_9allocatorIhEEEE, so the first test would be like

        test("cannot pass Uint8Array to std::basic_string<unsigned char>", function() {       
            var e = assert.throws(TypeError, function() {                        
              cm.emval_test_take_and_return_std_basic_string_unsigned_char(new Uint8Array([65, 66, 67, 68]));      
            });                                                                  
            assert.equal('Cannot call emval_test_take_and_return_std_basic_string_unsigned_char due to unbound types: NSt3__212basic_stringIhNS_11char_traitsIhEENS_9allocatorIhEEEE', e.message);
        }); 

Are you suggesting to change five deleted tests here like this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping 😀 By the way I wasn't able to get that code to work, it says the throwing error kind is different...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping ping 😀 I think we can just delete this. The libc++ change means they don't allow invalid specializations anymore and unsigned char is one of many possible invalid specializations after all (Even basic_string<int> was possible before), and I don't think we can cover all invalid specializations by error tests anyway.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

@aheejin
Copy link
Member Author

aheejin commented Dec 5, 2024

If we do remove it, we should also update the code here. The condition (name === "std::string") will always be true then, so we can use the preprocessor macro to only emit the code in the corresponding branches of the if (stdStringIsUTF8).

I tried to replace if (stdStringIsUTF8) { then_body } else { else_body } with

#if EMBIND_STD_STRING_IS_UTF8
then_body
#else
else_body
#endif

but stdStringIsUTF8 variable is also being used elsewhere in that file too:

if (stdStringIsUTF8 && valueIsOfTypeString) {

if (stdStringIsUTF8 && valueIsOfTypeString) {

So removing the variable completely and replace its usage with the EMBIND_STD_STRING_IS_UTF8 predecessor doesn't seem to work. Should I just do

    var stdStringIsUTF8
#if EMBIND_STD_STRING_IS_UTF8
    = true;
#else
    = false;
#endif

? Done here (49bd6fb), please let me know if you would like to do something else.

Only `char`, `wchar`, `char8`, `char16`, and `char32` are valid
specialization for `std::basic_string`:
https://en.cppreference.com/w/cpp/string/basic_string

But libc++ had a base template for `basic_string` that allowed any type
to be passed for a long time. It looks there have been several attempts
to remove this after which they restored it due to complaints, in
chronological order:
llvm/llvm-project@aeecef0
llvm/llvm-project@08a0faf
llvm/llvm-project@e30a148
llvm/llvm-project#66153
llvm/llvm-project#72694

The last one, llvm/llvm-project#72694,
eventually removed it. So `std::basic_string<unsigned_char>` is not
allowed anymore. This removes all uses of
`std::basic_string<unsigned_char>` from embind.

This needs to be done to update libc++ to LLVM 19 (emscripten-core#22994). I'm
uploading this as a separate PR because this removes a functionality
from embind.
@aheejin aheejin force-pushed the remove_basic_string_unsigned_char branch from aac49df to 49bd6fb Compare December 5, 2024 05:21
@brendandahl
Copy link
Collaborator

I think we could get rid of stdStringIsUTF8 since it's now a static value, but that's going to add a lot more preprocessor junk. Leaving stdStringIsUTF8 should be fine as closure will inline and remove any dead code.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 10, 2024

I'll defer to brendan on for the lgtm

Copy link
Collaborator

@brendandahl brendandahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r+ but add a note in the changelog that this no longer supported.

@aheejin
Copy link
Member Author

aheejin commented Dec 10, 2024

r+ but add a note in the changelog that this no longer supported.

Done

@aheejin aheejin merged commit fb77f78 into emscripten-core:main Dec 11, 2024
28 checks passed
@aheejin aheejin deleted the remove_basic_string_unsigned_char branch December 11, 2024 00:43
hedwigz pushed a commit to hedwigz/emscripten that referenced this pull request Dec 18, 2024
Only `char`, `wchar`, `char8`, `char16`, and `char32` are valid
specializations for `std::basic_string`:
https://en.cppreference.com/w/cpp/string/basic_string

But libc++ had a base template for `basic_string` that allowed any type
to be passed for a long time. It looks there have been several attempts
to remove this after which they restored it due to complaints, in
chronological order:

llvm/llvm-project@aeecef0
llvm/llvm-project@08a0faf
llvm/llvm-project@e30a148
llvm/llvm-project#66153
llvm/llvm-project#72694

The last one, llvm/llvm-project#72694,
eventually removed it. So `std::basic_string<unsigned char>` is not
allowed anymore. This removes all uses of `std::basic_string<unsigned
char>` from embind.

This needs to be done to update libc++ to LLVM 19 (emscripten-core#22994). I'm
uploading this as a separate PR because this removes a functionality
from embind.
aheejin added a commit that referenced this pull request Jan 3, 2025
This updates libcxx and libcxxabi to LLVM 19.1.4:
/~https://github.com/llvm/llvm-project/releases/tag/llvmorg-19.1.4

The initial update was done using `update_libcxx.py` and
`update_libcxxabi.py`, and subsequent fixes were made in indidual
commits. The commit history here is kind of messy because of CI testing
so not all individual commits are noteworthy.

Additional changes:

- Build libcxx and libcxxabi with C++23:
8b0bfdf

/~https://github.com/llvm/llvm-project/blob/aadaa00de76ed0c4987b97450dd638f63a385bed/libcxx/src/expected.cpp
was added in llvm/llvm-project#87390 and this
file assumes C++23 to be compiled. Apparently libc++ sources are always
built with C++23 so they don't guard things against it in `src/`:
llvm/llvm-project#87390 (comment)
This commit also builds libc++abi with C++23 because it doesn't seem
there's any downside to it.

- Exclude newly added `compiler_rt_shims.cpp`:
5bbcbf0
We have excluded files in
/~https://github.com/emscripten-core/emscripten/tree/main/system/lib/libcxx/src/support/win32.
This is a new file added in this directory in
llvm/llvm-project#83575.

- Disable time zone support:
a5f2cbe
We disabled C++20 time zone support in LLVM 18 update (#21638):
df9af64
The list of source files related to time zone support has changed in
llvm/llvm-project#74928, so this commit reflects
it.

- Re-add + update `__assertion_handler` from
`default_assertion_handler.in`:
41f8037
This file was added as a part of LLVM 18 update (#21638) in
8d51927
and mistakenly deleted when I ran `update_libcxx.py`. This file was
copied from
/~https://github.com/llvm/llvm-project/blob/aadaa00de76ed0c4987b97450dd638f63a385bed/libcxx/vendor/llvm/default_assertion_handler.in,
so this also updates the file with the newest
`default_assertion_handler.in`.

- `_LIBCPP_PSTL_CPU_BACKEND_SERIAL` -> `_LIBCPP_PSTL_BACKEND_SERIAL`:
4b969c3
The name changed in this update, so reflecting it on our
`__config_site`.

- Directly include `pthread.h` from `emscripten/val.h`:
a5a76c3
  Due to file rearrangements happened, this was necessary.

---

Other things to note:

- `std::basic_string<unsigned_char>` is not supported anymore
The support for `std::basic_string<unsigned_char>`, which was being used
by embind, is removed in this version.#23070 removes the uses from
embind.

- libcxxabi uses `__FILE__` in more places
llvm/llvm-project#80689 started to use
`_LIBCXXABI_ASSERT`, which
[uses](/~https://github.com/llvm/llvm-project/blob/aadaa00de76ed0c4987b97450dd638f63a385bed/libcxxabi/src/abort_message.h#L22)
`__FILE__`, in `private_typeinfo.cpp`.
`__FILE__` macro produces different paths depending on the local
directory names and how the file is given to clang in the command line,
and this file is included in the result of one of our code size tests,
`other.test_minimal_runtime_code_size_hello_embind`, which caused the
result of the test to vary depending on the CI bots and how the library
is built (e.g., whether by embuilder, ninja, or neither).
Even though this was brought to surface during this LLVM 19 update,
`__FILE__` macro could be a problem for producing reproducible builds
anyway. We discussed this problem in #23195, and the fixes landed in
#23222, #23225, and #23256.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants