Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-101525: Disable peephole optimization process of BOLT #103187

Merged
merged 1 commit into from
Apr 5, 2023

Conversation

corona10
Copy link
Member

@corona10 corona10 commented Apr 2, 2023

Background

Experimental optimization techniques using LLVM-BOLT have been applied in CPython 3.12, but in order to provide this as an official feature, it must be ensured that the optimized binary does not change the expected behavior.

It has been confirmed that the peephole optimization provided by LLVM-BOLT re-raises the #53093 issue, and it passes the unit tests normally when this is disabled.

A detailed explanation of LLVM_BOLT will be shared at this year's Python Language Summit lightning talk, and materials will be published after the presentation.
(Also, I already shared the draft material at Discord of Python core team)

Performance Impact

I measured that enabling or disabling peephole optimization of LLVM-BOLT has almost no impact on performance, and it shows only noise-level results in the pyperformance benchmarks also.

pyperformance

Benchmark as-is to-be
async_generators 531 ms 550 ms: 1.04x slower
async_tree_cpu_io_mixed 1.10 sec 1.10 sec: 1.01x slower
chameleon 10.3 ms 10.4 ms: 1.01x slower
chaos 104 ms 103 ms: 1.01x faster
bench_thread_pool 1.24 ms 1.25 ms: 1.01x slower
coroutines 44.0 ms 44.2 ms: 1.01x slower
coverage 151 ms 147 ms: 1.02x faster
crypto_pyaes 118 ms 118 ms: 1.01x slower
deepcopy_reduce 4.68 us 4.70 us: 1.01x slower
deltablue 5.32 ms 5.28 ms: 1.01x faster
django_template 53.0 ms 54.3 ms: 1.02x slower
dulwich_log 110 ms 111 ms: 1.01x slower
fannkuch 630 ms 611 ms: 1.03x faster
float 116 ms 115 ms: 1.01x faster
generators 112 ms 112 ms: 1.00x slower
genshi_text 34.3 ms 34.5 ms: 1.01x slower
go 214 ms 212 ms: 1.01x faster
hexiom 9.28 ms 9.32 ms: 1.01x slower
json_loads 38.6 us 39.1 us: 1.01x slower
logging_format 10.1 us 10.0 us: 1.01x faster
logging_silent 157 ns 156 ns: 1.01x faster
logging_simple 9.04 us 9.14 us: 1.01x slower
mako 15.4 ms 15.7 ms: 1.02x slower
meteor_contest 165 ms 166 ms: 1.01x slower
nbody 166 ms 160 ms: 1.04x faster
nqueens 130 ms 132 ms: 1.02x slower
pathlib 26.9 ms 27.3 ms: 1.01x slower
pickle_list 5.94 us 5.90 us: 1.01x faster
pidigits 303 ms 303 ms: 1.00x slower
pprint_safe_repr 1.11 sec 1.11 sec: 1.01x slower
pprint_pformat 2.26 sec 2.27 sec: 1.00x slower
python_startup 12.7 ms 12.7 ms: 1.00x slower
python_startup_no_site 9.40 ms 9.39 ms: 1.00x faster
raytrace 461 ms 463 ms: 1.00x slower
regex_compile 194 ms 196 ms: 1.01x slower
regex_effbot 4.87 ms 4.80 ms: 1.01x faster
richards 66.6 ms 67.5 ms: 1.01x slower
scimark_fft 497 ms 492 ms: 1.01x faster
scimark_lu 178 ms 183 ms: 1.03x slower
scimark_monte_carlo 107 ms 106 ms: 1.01x faster
scimark_sor 182 ms 180 ms: 1.01x faster
scimark_sparse_mat_mult 6.30 ms 6.36 ms: 1.01x slower
spectral_norm 165 ms 169 ms: 1.02x slower
sqlglot_optimize 83.0 ms 83.4 ms: 1.00x slower
sqlglot_normalize 172 ms 173 ms: 1.00x slower
sqlite_synth 3.89 us 3.84 us: 1.01x faster
sympy_expand 742 ms 745 ms: 1.00x slower
unpickle_list 7.53 us 7.50 us: 1.00x faster
xml_etree_generate 132 ms 131 ms: 1.01x faster
Geometric mean (ref) 1.00x slower

Benchmark hidden because not significant (31): 2to3, async_tree_none, async_tree_io, async_tree_memoization, bench_mp_pool, deepcopy, deepcopy_memo, docutils, genshi_xml, html5lib, json_dumps, mdp, pickle, pickle_dict, pickle_pure_python, pyflate, regex_dna, regex_v8, sqlglot_parse, sqlglot_transpile, sympy_integrate, sympy_sum, sympy_str, telco, tornado_http, unpack_sequence, unpickle, unpickle_pure_python, xml_etree_parse, xml_etree_iterparse, xml_etree_process

@corona10
Copy link
Member Author

corona10 commented Apr 2, 2023

@gvanrossum @carljm @erlend-aasland
I will merge this PR by next week if there is no serious objection.

Copy link
Member

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me.

Do we know why/how BOLT's peephole optimizations break compatibility? It seems that either we are implicitly relying on some undefined behavior, or the optimization is broken (and the BOLT team would likely be interested to know about that.)

@corona10
Copy link
Member Author

corona10 commented Apr 5, 2023

Do we know why/how BOLT's peephole optimizations break compatibility?

I haven't reached that point yet, I think I need to acquire assembly code after BOLTed,
I will try to investigate the point too.

@corona10 corona10 merged commit a62ff97 into python:main Apr 5, 2023
@corona10 corona10 deleted the gh-101525 branch April 5, 2023 00:10
gaogaotiantian pushed a commit to gaogaotiantian/cpython that referenced this pull request Apr 8, 2023
warsaw pushed a commit to warsaw/cpython that referenced this pull request Apr 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants