-
Notifications
You must be signed in to change notification settings - Fork 326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speeding up v8 heap snapshots #702
Conversation
jdapena
commented
Jun 21, 2023
- Add "Speeding up V8 heap snapshots" blog post.
* Add "Speeding up V8 heap snapshots" blog post.
My email address has now been registered as CLA compliant via my employer. How do we make the CLA bot reassess? |
Co-authored-by: Rob Palmer <rpalmer57@bloomberg.net>
cf33dbe
to
485d04f
Compare
CLA is ready now. I cannot approve running the workflow (nor assign anybody to review the patch). |
@syg are you able to run the workflow? |
Several fixes in accuracy, specially related to NODE_OPTIONS processing, coming from the code review. Also some writing improvements. Co-authored-by: Joyee Cheung <joyeec9h3@gmail.com>
When we tried to make the post less Node.js specific, we want a bit too far, as the parts related to --heapsnapshot-near-heap-limit are indeed Node.js specific. Provided more clarity on that.
…ation Co-authored-by: Joyee Cheung <joyeec9h3@gmail.com>
The level of details in teh source position caching fix explanation was quite poor, not even explaining why the source line position was not cached. Expand it after proposal from Joyee Cheung.
In previous version, we talked about "unoptimized development JS" and "optimized production JS", that could be misleading, as Joyee Cheung pointed out, because it could induce reader to think about JS runtime optimizations. So this change removes reference there to optimized/unoptimized JS to later detail how production source code is optimized using bundling and minification.
In the original description, we wrongly state that the heap limit is set by V8 to be 1400MB. Instead of this, it was an application specific limit. Also clarify that, in the test case, hitting Out-Of-Memory would hint there was a leak. Co-authored-by: Joyee Cheung <joyeec9h3@gmail.com>
@syg < Apparently I cannot add you or any other reviewer to the ticket. Who should review blog posts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of wording suggestions and some remaining questions, but overall looks good, thanks
src/_img/speeding-up-v8-heap-snapshots/59293b17-6d52-4d9e-8737-5b23d038d50a.png
Outdated
Show resolved
Hide resolved
- Once we had strings with a hash key value in lower positions, then the storing of new numbers would offset all the positions. | ||
- Or even worse: if we would introduce a string with a hash key value that would be a low number, and we already found several hundreds or thousands of consecutive numbers, then that value would be moved several hundreds or thousands of positions. | ||
|
||
What did I do to fix it? As the problem comes mostly from numbers represented as strings that would fall in consecutive positions, I modified the hash function so we would rotate the resulting hash value 2 positions to the left. So, for each number, we would introduce 3 free positions. Why 2? Empirical testing across several work-sets showed this number was the best choice to minimize collisions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you perhaps add code snippets demonstrating the improved hash directly? V8 blog audience should be used to bitwise manipulation code.
|
||
## Narrowing down the Problem | ||
|
||
Jason Williams started investigating the issue using some V8 parameters. As described in the previous post, V8 has some nice command line parameters that can help with that. These options were used to curate the heap snapshots , simplify the reproduction, and improve observability: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-reading this section after reading the entire post, I'm unclear on the point of listing the flags in detail. It seems almost incidental: here are the flags that we were using to capture heap snapshots, and where we observed surprising slowdowns. Could this section be shortened? Perhaps I'm missing the intention though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially the problem was detected just extracting regular heap snapshots from DevTools.
But using these command line arguments allowed to get a detailed breakdown of what was happening, and also allowed to reproduce the heap snapshot performance problem several times, without requiring to use a remote DevTools connection.
So this is part of the investigation steps (increasing and simplifying reproducibility).
In general, the intent of the whole post is not only explaining the specific fixes, but the whole investigation procedure that led to them.
Many insightful improvements to the writing of heap snapshot optimizations blog post, that can be landed altogether. Co-authored-by: Joyee Cheung <joyeec9h3@gmail.com>
"And old friend" reference was for the ETW fix that I presented in a different blog post that is not even linked. And it is redundant as later I explain that this issue is similar to another one fixed in ETW. Co-authored-by: Joyee Cheung <joyeec9h3@gmail.com>
The images in the v8 heap snapshot optimizations post were too much big, with lots of empty space that was not relevant and made it harder to read the contents. After cropping the images, not it is more focused on the information that matters.
Co-authored-by: Shu-yu Guo <syg@chromium.org>
Manually apply suggestions that could not be applied directly from the GH UI.
…vestigation Following @syg recommendation, avoid explicit engineers reference while explaining the procedure to investigate the performance problem.
V8 team is not sponsored. Work is actually done by engineers of different companies. Rewrite the credits section accordingly.
df811e8
to
3811c74
Compare
Provided a new version of the explanation of the hash algorithm changes, with value examples, and snippets of the original and the new hash functions. |
Co-authored-by: Joyee Cheung <joyeec9h3@gmail.com>
…previous post Originally the blog post was part of a series until it was proposed to be an independent post in v8.dev. Make the reference to the previous post independent of the fact by referring to the link of the previous post.
…mplified writing Several changes to reduce redundancies and simplify reading the post: - Remove step-by-step guide of using Windows Performance Recording, referring to upstream documentation now. - Snippets are now almost C++ code instead of pseudocode. - Some typos. - Simplified line break caching explanation, and removed reference to equivalent ETW fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good % remaining comments! Thanks for the many rounds of iterations.
|
||
Furthermore, we found the problem happened on both Windows and Linux. The problem was also not platform-specific. | ||
|
||
## Windows Performance Analyzer to the rescue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this section adds to the article.
Co-authored-by: Shu-yu Guo <syg@chromium.org>
Applied changes. Only pending running the workflow and merging. I'd prefer to keep the Windows Performance Analyzer section, that is now really small, as it still give readers hint about the tool used for the investigation. |
I still think something as short as "I used Windows Performance Toolkit to do the profiling" suffices. I feel like that workflow (1) isn't generally applicable and (2) out of character for the V8 blog. V8 blog posts tend to be about highlighting improvements to the infrastructure, or VM guts, or language hacks, not play-by-play accounts of how we achieved the results. |
d930645
to
b065c43
Compare
@syg < I finally removed the WPA section, and just made the reference that it was the tool used (that is visible in the screenshots of the reports). I cannot apply the resulting patch myself now it is approved. BTW, thanks everybody for all the careful review. Post is now far better! |
|
||
Normally, after V8 finishes calculating the offsets of line breaks in a script, it caches them in a newly allocated array attached to the script. Unfortunately, the snapshot implementation cannot modify the heap when traversing it, so the newly calculated line information cannot be cached. | ||
|
||
The solution? Before generating the heap snapshot, we now iterate over all the scripts in the V8 context to compute and cache the offsets of the line breaks. As this is not done when we traverse the heap for heap snapshot generation, it is still possible to modify the heap and store the source line positions as a cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The solution? Before generating the heap snapshot, we now iterate over all the scripts in the V8 context to compute and cache the offsets of the line breaks. As this is not done when we traverse the heap for heap snapshot generation, it is still possible to modify the heap and store the source line positions as a cache. | |
The solution? Before generating the heap snapshot, we now iterate over all the scripts in the V8 context to compute and cache the offsets of the line breaks. As this is done when we traverse the heap for heap snapshot generation, it is still possible to modify the heap and store the source line positions as a cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original sentence was OK. When we traverse the heap for heap snapshot generation we cannot add entries to the heap. That's the reason we do that before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, I think I misread 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks for the many rounds
Any last concerns before I publish? @joyeecheung? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks