Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use consistent prefix map when building deterministic system libraries #23222

Merged
merged 3 commits into from
Dec 19, 2024

Conversation

aheejin
Copy link
Member

@aheejin aheejin commented Dec 19, 2024

When deterministic_paths is set, we are currently using -ffile-prefix-map to produce the same path in data and debug info. In the case of absolute paths, their emscripten path is replaced with a fake path /emsdk/emscripten, and in the case of relative paths, all path relative to the emscripten directory is removed, so ../../system/lib/somefile.c becomes system/lib/somefiles.c.

if self.deterministic_paths:
source_dir = utils.path_from_root()
relative_source_dir = os.path.relpath(source_dir, build_dir)
cflags += [f'-ffile-prefix-map={source_dir}=/emsdk/emscripten',
f'-ffile-prefix-map={relative_source_dir}/=',
'-fdebug-compilation-dir=/emsdk/emscripten']
if self.deterministic_paths:
source_dir = utils.path_from_root()
if batch_inputs:
relative_source_dir = os.path.relpath(source_dir, build_dir)
cflags += [f'-ffile-prefix-map={relative_source_dir}/=']
cflags += [f'-ffile-prefix-map={source_dir}=/emsdk/emscripten',
'-fdebug-compilation-dir=/emsdk/emscripten']

So this does not make relative paths and absolute paths the same, which can lead to different builds depending on whether the command line uses absolute paths vs. relative ones. Currently we use relative paths when EMCC_BATCH_BUILD is set. And Ninja builds cannot use relative paths.

This is also what was suggested in #23195 (comment) while discussing __FILE__ problem in #23195.

This does not change any code size tests because no code size tests happens to contain __FILE__ at the moment. (One will be added in #22994)

When `deterministic_paths` is set, we are currently using
`-ffile-prefix-map` to produce the same path in data and debug info. In
the case of absolute paths, their emscripten path is replaced with a
fake path `/emsdk/emscripten`, and in the case of relative paths, all
path relative to the emscripten directory is removed, so
`../../system/lib/somefile.c` becomes `system/lib/somefiles.c`.
/~https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L472-L477
/~https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L495-L501

So this does not make relative paths and absolute paths the same, which
can lead to different builds depending on whether the command line uses
absolute paths vs. relative ones. Currently we use relative paths when
`EMCC_BATCH_BUILD` is set. And Ninja builds cannot use relative paths.

This is also what was suggested in
emscripten-core#23195 (comment)
while discussins `__FILE__` problem in #23915.

This does not change any code size tests because no code size tests
happens to contain `__FILE__` at the moment. (One will be added in
 emscripten-core#22994)
@sbc100
Copy link
Collaborator

sbc100 commented Dec 19, 2024

Looks like the test_dwarf_system_lib failure is real.

Can we come up with a better PR title? Something like "Use consistent prefix map when building deterministic system libraries"?

@@ -42,6 +42,10 @@
# link time.
USE_NINJA = int(os.environ.get('EMCC_USE_NINJA', '0'))

# A fake emscripten path to use in __FILE__ macro and debug info to produce
# reproducible builds across platforms.
FAKE_EMSCRIPTEN_PATH = '/emsdk/emscripten'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about DETERMINISITIC_PREFIX?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments and tests passing

@aheejin aheejin changed the title Use /emsdk/emscripten for relative path substitution Use consistent prefix map when building deterministic system libraries Dec 19, 2024
@aheejin
Copy link
Member Author

aheejin commented Dec 19, 2024

Looks like the test_dwarf_system_lib failure is real.

Fixed.

Can we come up with a better PR title? Something like "Use consistent prefix map when building deterministic system libraries"?

Done.

'-fdebug-compilation-dir=/emsdk/emscripten']
cflags += [f'-ffile-prefix-map={relative_source_dir}={DETERMINISITIC_PREFIX}']
cflags += [f'-ffile-prefix-map={source_dir}={DETERMINISITIC_PREFIX}',
f'-fdebug-compilation-dir={DETERMINISITIC_PREFIX}']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if these two chunks could be combined into self.get_cflags() rather than duplicated? Can be part of the followup of course.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #23225.

@aheejin aheejin merged commit 971d4f6 into emscripten-core:main Dec 19, 2024
29 checks passed
@aheejin aheejin deleted the fake_path branch December 19, 2024 04:35
aheejin added a commit to aheejin/emscripten that referenced this pull request Dec 19, 2024
This factors duplicate routines to add deterministic prefix paths out to
`Library.get_cflags()`. Suggested in
emscripten-core#23222 (comment).
aheejin added a commit that referenced this pull request Dec 20, 2024
This factors duplicate routines to add deterministic prefix paths out to
`Library.get_cflags()`. Suggested in

#23222 (comment).
aheejin added a commit that referenced this pull request Jan 3, 2025
This updates libcxx and libcxxabi to LLVM 19.1.4:
/~https://github.com/llvm/llvm-project/releases/tag/llvmorg-19.1.4

The initial update was done using `update_libcxx.py` and
`update_libcxxabi.py`, and subsequent fixes were made in indidual
commits. The commit history here is kind of messy because of CI testing
so not all individual commits are noteworthy.

Additional changes:

- Build libcxx and libcxxabi with C++23:
8b0bfdf

/~https://github.com/llvm/llvm-project/blob/aadaa00de76ed0c4987b97450dd638f63a385bed/libcxx/src/expected.cpp
was added in llvm/llvm-project#87390 and this
file assumes C++23 to be compiled. Apparently libc++ sources are always
built with C++23 so they don't guard things against it in `src/`:
llvm/llvm-project#87390 (comment)
This commit also builds libc++abi with C++23 because it doesn't seem
there's any downside to it.

- Exclude newly added `compiler_rt_shims.cpp`:
5bbcbf0
We have excluded files in
/~https://github.com/emscripten-core/emscripten/tree/main/system/lib/libcxx/src/support/win32.
This is a new file added in this directory in
llvm/llvm-project#83575.

- Disable time zone support:
a5f2cbe
We disabled C++20 time zone support in LLVM 18 update (#21638):
df9af64
The list of source files related to time zone support has changed in
llvm/llvm-project#74928, so this commit reflects
it.

- Re-add + update `__assertion_handler` from
`default_assertion_handler.in`:
41f8037
This file was added as a part of LLVM 18 update (#21638) in
8d51927
and mistakenly deleted when I ran `update_libcxx.py`. This file was
copied from
/~https://github.com/llvm/llvm-project/blob/aadaa00de76ed0c4987b97450dd638f63a385bed/libcxx/vendor/llvm/default_assertion_handler.in,
so this also updates the file with the newest
`default_assertion_handler.in`.

- `_LIBCPP_PSTL_CPU_BACKEND_SERIAL` -> `_LIBCPP_PSTL_BACKEND_SERIAL`:
4b969c3
The name changed in this update, so reflecting it on our
`__config_site`.

- Directly include `pthread.h` from `emscripten/val.h`:
a5a76c3
  Due to file rearrangements happened, this was necessary.

---

Other things to note:

- `std::basic_string<unsigned_char>` is not supported anymore
The support for `std::basic_string<unsigned_char>`, which was being used
by embind, is removed in this version.#23070 removes the uses from
embind.

- libcxxabi uses `__FILE__` in more places
llvm/llvm-project#80689 started to use
`_LIBCXXABI_ASSERT`, which
[uses](/~https://github.com/llvm/llvm-project/blob/aadaa00de76ed0c4987b97450dd638f63a385bed/libcxxabi/src/abort_message.h#L22)
`__FILE__`, in `private_typeinfo.cpp`.
`__FILE__` macro produces different paths depending on the local
directory names and how the file is given to clang in the command line,
and this file is included in the result of one of our code size tests,
`other.test_minimal_runtime_code_size_hello_embind`, which caused the
result of the test to vary depending on the CI bots and how the library
is built (e.g., whether by embuilder, ninja, or neither).
Even though this was brought to surface during this LLVM 19 update,
`__FILE__` macro could be a problem for producing reproducible builds
anyway. We discussed this problem in #23195, and the fixes landed in
#23222, #23225, and #23256.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants