fs: improve promise based readFile performance for big files #44295

Merged

BridgeAR
Member

This significantly reduces the peak memory for the promise
based readFile operation by reusing a single memory chunk after
each read and stringifying that chunk immediately.
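
To illustrate the idea, here is a minimal userland sketch of the technique. It is not the code from this PR: `readFileChunked` and its `chunkSize` default are hypothetical names chosen for the example, and the real change lives in Node's internal fs promises implementation. It reads into one reusable buffer and decodes each chunk right away, so only one chunk plus the growing string is held at a time.

```javascript
'use strict';
const { open } = require('node:fs/promises');
const { StringDecoder } = require('node:string_decoder');

// Hypothetical helper for illustration only; not the implementation in this PR.
async function readFileChunked(path, encoding = 'utf8', chunkSize = 64 * 1024) {
  const handle = await open(path, 'r');
  try {
    // A single buffer that is reused for every read.
    const chunk = Buffer.allocUnsafe(chunkSize);
    // StringDecoder keeps incomplete multi-byte characters between writes,
    // so a character split across two chunks is still decoded correctly.
    const decoder = new StringDecoder(encoding);
    let result = '';
    while (true) {
      // position = null reads from the current file position.
      const { bytesRead } = await handle.read(chunk, 0, chunkSize, null);
      if (bytesRead === 0) break; // end of file
      // Stringify the chunk immediately instead of collecting buffers.
      result += decoder.write(chunk.subarray(0, bytesRead));
    }
    return result + decoder.end();
  } finally {
    await handle.close();
  }
}

module.exports = { readFileChunked };
```

The alternative of collecting all chunks and calling `Buffer.concat(...).toString()` at the end keeps the raw bytes and the final string alive at the same time, which is the peak this approach avoids.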

Refs: #44239 (reply in thread)

Signed-off-by: Ruben Bridgewater <ruben@bridgewater.de>

Benchmark with a few runs:

                                                              confidence improvement accuracy (*)    (**)   (***)
fs/readfile-promises.js concurrent=1 len=1024 duration=5                     -1.18 %       ±7.07%  ±9.67% ±13.12%
fs/readfile-promises.js concurrent=1 len=16777216 duration=5                 -2.11 %       ±3.34%  ±4.56%  ±6.20%
fs/readfile-promises.js concurrent=1 len=33554432 duration=5         ***     59.48 %       ±5.41%  ±7.45% ±10.25%
fs/readfile-promises.js concurrent=1 len=4194304 duration=5          ***      9.95 %       ±5.10%  ±6.93%  ±9.32%
fs/readfile-promises.js concurrent=1 len=524288 duration=5                    1.88 %       ±6.77%  ±9.22% ±12.45%
fs/readfile-promises.js concurrent=1 len=8388608 duration=5           **      6.18 %       ±4.19%  ±5.70%  ±7.69%
fs/readfile-promises.js concurrent=10 len=1024 duration=5                    -1.21 %       ±8.11% ±11.12% ±15.15%
fs/readfile-promises.js concurrent=10 len=16777216 duration=5        ***      9.72 %       ±5.07%  ±6.92%  ±9.36%
fs/readfile-promises.js concurrent=10 len=33554432 duration=5        ***     26.58 %       ±6.29%  ±8.56% ±11.56%
fs/readfile-promises.js concurrent=10 len=4194304 duration=5          **      9.66 %       ±5.88%  ±8.00% ±10.77%
fs/readfile-promises.js concurrent=10 len=524288 duration=5                   1.38 %       ±5.49%  ±7.47% ±10.07%
fs/readfile-promises.js concurrent=10 len=8388608 duration=5         ***     16.84 %       ±4.35%  ±5.93%  ±8.02%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 12 comparisons, you can thus
expect the following amount of false-positive results:
  0.60 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.12 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)
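
The figures in this note are consistent with multiplying the number of comparisons by the accepted risk. A small sketch of that arithmetic, assuming this is how the numbers are derived (`expectedFalsePositives` is a hypothetical helper, not part of the benchmark tooling):

```javascript
'use strict';

// Hypothetical helper: expected number of false positives when running
// `comparisons` independent tests at a given risk acceptance level.
function expectedFalsePositives(comparisons, risk) {
  return comparisons * risk;
}

for (const risk of [0.05, 0.01, 0.001]) {
  const expected = expectedFalsePositives(12, risk);
  console.log(`${expected.toFixed(2)} false positives at ${risk * 100}% risk acceptance`);
}
```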

@BridgeAR BridgeAR requested a review from mcollina August 19, 2022 15:12
@nodejs-github-bot nodejs-github-bot added fs Issues and PRs related to the fs subsystem / file system. needs-ci PRs that need a full CI run. labels Aug 19, 2022
Member

@mcollina mcollina left a comment

lgtm

@BridgeAR BridgeAR marked this pull request as draft August 19, 2022 18:50
@BridgeAR BridgeAR force-pushed the improve-fs-promises-readfile branch from 744bbd9 to e72802b Compare August 20, 2022 01:10
@BridgeAR BridgeAR marked this pull request as ready for review August 20, 2022 01:11
@BridgeAR
Member Author

BridgeAR commented Aug 20, 2022

We might consider a similar fix for #41435. If I checked correctly, it outperforms the regular readFile variant for big files.
The lower peak memory and chunked stringify operations outweigh having fewer C++/JS boundary crossings.

@mcollina
Member

Does this also impact the "old" callback-based readFile?

@BridgeAR
Member Author

Does this also impact the "old" callback-based readFile?

No. This is solely about the promise-based readFile with encodings. The non-encoding path is also not impacted.
The implementation of the callback-based readFile differs from the promise-based one.
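
For readers less familiar with the API surface, here is a short sketch of the three call shapes being distinguished. `compareReadFileVariants` is a hypothetical helper written for this illustration; only the first call corresponds to the path this PR is described as changing.

```javascript
'use strict';
const fs = require('node:fs');
const fsPromises = require('node:fs/promises');

// Hypothetical helper that exercises the variants discussed above.
async function compareReadFileVariants(path) {
  // Promise-based readFile with an encoding: the case this PR targets.
  const text = await fsPromises.readFile(path, 'utf8');

  // Promise-based readFile without an encoding: resolves to a Buffer.
  const raw = await fsPromises.readFile(path);

  // Callback-based readFile: a separate implementation.
  const viaCallback = await new Promise((resolve, reject) => {
    fs.readFile(path, 'utf8', (err, data) => (err ? reject(err) : resolve(data)));
  });

  return { text, raw, viaCallback };
}

module.exports = { compareReadFileVariants };
```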

                                                                              confidence improvement accuracy (*)    (**)   (***)
fs/readfile-promises.js concurrent=10 encoding='utf8' len=1024 duration=5              *     -9.10 %       ±8.50% ±11.31% ±14.73%
fs/readfile-promises.js concurrent=10 encoding='utf8' len=16777216 duration=5         **      7.96 %       ±5.27%  ±7.02%  ±9.16%
fs/readfile-promises.js concurrent=10 encoding='utf8' len=33554432 duration=5        ***     63.48 %       ±6.59%  ±8.81% ±11.56%
fs/readfile-promises.js concurrent=10 encoding='utf8' len=4194304 duration=5                  2.20 %       ±3.73%  ±4.97%  ±6.47%
fs/readfile-promises.js concurrent=10 encoding='utf8' len=524288 duration=5                  -0.89 %       ±2.44%  ±3.25%  ±4.23%
fs/readfile-promises.js concurrent=10 encoding='utf8' len=8388608 duration=5                  4.81 %       ±5.08%  ±6.77%  ±8.84%
fs/readfile-promises.js concurrent=1 encoding='utf8' len=1024 duration=5                     -0.63 %       ±2.87%  ±3.82%  ±4.98%
fs/readfile-promises.js concurrent=1 encoding='utf8' len=16777216 duration=5         ***     -9.93 %       ±4.71%  ±6.28%  ±8.21%
fs/readfile-promises.js concurrent=1 encoding='utf8' len=33554432 duration=5         ***     75.47 %      ±10.06% ±13.56% ±17.98%
fs/readfile-promises.js concurrent=1 encoding='utf8' len=4194304 duration=5          ***    -11.12 %       ±5.45%  ±7.30%  ±9.61%
fs/readfile-promises.js concurrent=1 encoding='utf8' len=524288 duration=5                   -2.80 %       ±5.00%  ±6.65%  ±8.67%
fs/readfile-promises.js concurrent=1 encoding='utf8' len=8388608 duration=5          ***      9.33 %       ±3.50%  ±4.66%  ±6.07%

I ran the benchmark on our machines, and for some file sizes there seems to be an odd performance drop. I guess it has to do with how V8 handles some things internally.
Reads up to 512 KiB are not impacted (the one star is misleading) and big reads (all tests above 32 MiB) are always significantly faster. In between, some file sizes profit and some take a ~10% performance hit.

Member

@mcollina mcollina left a comment

lgtm

@mcollina
Member

@BridgeAR are you planning to do the same for the sync and callback versions too?

@BridgeAR BridgeAR force-pushed the improve-fs-promises-readfile branch 3 times, most recently from 3f028d6 to 83562ca Compare October 4, 2022 23:06
This significantly reduces the peak memory for the promise
based readFile operation by reusing a single memory chunk after
each read and stringifying that chunk immediately.

Signed-off-by: Ruben Bridgewater <ruben@bridgewater.de>
@BridgeAR BridgeAR force-pushed the improve-fs-promises-readfile branch from 4d4997b to 6c92996 Compare October 4, 2022 23:07
@BridgeAR BridgeAR added request-ci Add this label to start a Jenkins CI on a PR. and removed needs-ci PRs that need a full CI run. labels Oct 5, 2022
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 5, 2022
@BridgeAR
Member Author

BridgeAR commented Oct 5, 2022

are you planning to do the same for the sync and callback versions too?

@mcollina I suggest merging this first; I'll look into the other part afterwards.

@mcollina mcollina added the commit-queue Add this label to land a pull request using GitHub Actions. label Oct 6, 2022
@mcollina
Member

mcollina commented Oct 6, 2022

Yes, let's do it.

@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Oct 6, 2022
@nodejs-github-bot
Collaborator

Commit Queue failed
- Loading data for nodejs/node/pull/44295
✔  Done loading data for nodejs/node/pull/44295
----------------------------------- PR info ------------------------------------
Title      fs: improve promise based readFile performance for big files (#44295)
Author     Ruben Bridgewater  (@BridgeAR)
Branch     BridgeAR:improve-fs-promises-readfile -> nodejs:main
Labels     fs
Commits    1
 - fs: improve promise based readFile performance for big files
Committers 1
 - Ruben Bridgewater 
PR-URL: https://github.com/nodejs/node/pull/44295
Reviewed-By: Matteo Collina 
Reviewed-By: James M Snell 
------------------------------ Generated metadata ------------------------------
PR-URL: https://github.com/nodejs/node/pull/44295
Reviewed-By: Matteo Collina 
Reviewed-By: James M Snell 
--------------------------------------------------------------------------------
   ⚠  Commits were pushed since the last review:
   ⚠  - fs: improve promise based readFile performance for big files
   ℹ  This PR was created on Fri, 19 Aug 2022 15:12:52 GMT
   ✔  Approvals: 2
   ✔  - Matteo Collina (@mcollina) (TSC): https://github.com/nodejs/node/pull/44295#pullrequestreview-1080779016
   ✔  - James M Snell (@jasnell) (TSC): https://github.com/nodejs/node/pull/44295#pullrequestreview-1087869731
   ✔  Last GitHub CI successful
   ℹ  Last Benchmark CI on 2022-08-20T10:00:08Z: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1176/
   ℹ  Last Full PR CI on 2022-10-06T13:18:58Z: https://ci.nodejs.org/job/node-test-pull-request/47101/
- Querying data for job/node-test-pull-request/47101/
   ✔  Last Jenkins CI successful
--------------------------------------------------------------------------------
   ✔  Aborted `git node land` session in /home/runner/work/node/node/.ncu
https://github.com/nodejs/node/actions/runs/3198926242

@nodejs-github-bot nodejs-github-bot added the commit-queue-failed An error occurred while landing this pull request using GitHub Actions. label Oct 6, 2022
Member

@mcollina mcollina left a comment

lgtm

@mcollina mcollina added commit-queue Add this label to land a pull request using GitHub Actions. and removed commit-queue-failed An error occurred while landing this pull request using GitHub Actions. labels Oct 6, 2022
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Oct 6, 2022
@nodejs-github-bot nodejs-github-bot merged commit d3dd49f into nodejs:main Oct 6, 2022
@nodejs-github-bot
Collaborator

Landed in d3dd49f

danielleadams pushed a commit that referenced this pull request Oct 11, 2022
This significantly reduces the peak memory for the promise
based readFile operation by reusing a single memory chunk after
each read and stringifying that chunk immediately.

Signed-off-by: Ruben Bridgewater <ruben@bridgewater.de>
PR-URL: #44295
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>