Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Next >12.0.1 causes memory leak after upgrading from Next 11 #36703

Closed
1 task done
jflayhart opened this issue May 5, 2022 · 26 comments
Closed
1 task done

Next >12.0.1 causes memory leak after upgrading from Next 11 #36703

jflayhart opened this issue May 5, 2022 · 26 comments
Labels
bug Issue was opened via the bug report template.

Comments

@jflayhart
Copy link
Contributor

jflayhart commented May 5, 2022

Verify canary release

  • I verified that the issue exists in Next.js canary release

Provide environment information

$ next info
(node:39290) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

What browser are you using? (if relevant)

Chrome 100

How are you deploying your application? (if relevant)

custom express server running on GCP distributed across multiple containers running 1GB memory

Describe the Bug

We're in a bit of a catch22 because we tried to upgrade to 12.0.0, which works, but then our Lighthouse scores started failing due to a security vulnerability in 12.0.0. I don't know if y'all ever fixed that but something was done in 12.0.5 it seems based on the change log, so we tried that, which created another issue: memory leaks.

After upgrading to 12.0.5 a new issue occurred where the prod optimized build app simply crashes due to OOM errors:

<--- Last few GCs --->
12:22:59.957 dev-dev-uc1-gke next-4072-webserver
12:22:59.958 dev-dev-uc1-gke next-4072-webserver [27:0x7f24d13882c0]   755827 ms: Mark-sweep (reduce) 770.6 (773.1) -> 770.3 (774.9) MB, 998.9 / 0.0 ms  (average mu = 0.100, current mu = 0.001) allocation failure scavenge might not succeed
12:22:59.958 dev-dev-uc1-gke next-4072-webserver [27:0x7f24d13882c0]   756837 ms: Mark-sweep (reduce) 771.3 (776.9) -> 771.1 (777.6) MB, 1009.1 / 0.0 ms  (average mu = 0.052, current mu = 0.001) allocation failure scavenge might not succeed
12:22:59.958 dev-dev-uc1-gke next-4072-webserver
12:22:59.958 dev-dev-uc1-gke next-4072-webserver
12:22:59.958 dev-dev-uc1-gke next-4072-webserver <--- JS stacktrace --->
12:22:59.958 dev-dev-uc1-gke next-4072-webserver
12:22:59.958 dev-dev-uc1-gke next-4072-webserver FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
12:23:00.016 dev-dev-uc1-gke next-4072-webserver error Command failed with signal "SIGABRT".

I tried 12.1 and even canary releases to see if anything worked itself out. Same problem. I tracked down a memory leak with growing Promises and Arrays within webpack runtime and cjs-loader and tslib (whatever that means) when building prod build locally and running node --inspect:

image

Screen Shot 2022-05-03 at 1 52 29 PM

In the end, I found 12.0.1 to be the only version after 12.0.0 with no leak; going to 12.0.2 starts the memory leak and crashes the app in the same way.

The prime suspects are the following deps we have installed based on their interaction with promises and arrays:

  • next >=12.0.2
  • node 14.17.6
  • express 4.17.1
  • babel?
  • axios?
  • styled-components 5.3.3
  • react 17.0.2
  • redux
  • typescript 4.3.2

Already tried upgrading node to latest LTS (16), still doesn't work. Local dev runs "fine" albeit slow, but local builds don't crash. Only prod optimized builds

Expected Behavior

App is expected to NOT crash, so I upgraded my personal website to 12.1, which is hosted on Vercel and has less dependencies than the enterprise, e-commerce site I am working on. Well, it is fine and running great, so makes me think the issue is more complex possibly around the use of a custom server in the enterprise app and/or Next12 isn't playing nice with some node deps.

When I ran this in 12.0.0 and 12.0.1 the memory leak is non-existent:

Screen Shot 2022-05-03 at 3 30 33 PM

It is not until 12.0.2 that we experience the memory leak! and I've gone through your change logs and nothing looks suspicious.

To Reproduce

Unable to reproduce with custom server next example, so it's likely because we're using an enterprise app and something (ie other dependencies) isn't playing nice with next12 and would like to understand what could be causing that.

I can only assume something around SWC would have changed the build process so when it runs in node it gets wonky? Just speculating at this point because we can't upgrade to next12 at all.

But in our enterprise app, I simply do the following to reproduce consistently every time:

  • Next >12.0.1
  • yarn build (uses babel, not SWC)
  • NODE_ENV=production node --inspect build/server/index.js
@jflayhart jflayhart added the bug Issue was opened via the bug report template. label May 5, 2022
@jflayhart
Copy link
Contributor Author

jflayhart commented May 5, 2022

Curious if there are any known dependencies that Next requires to be updated for Next12+? What changed specifically in >=12.0.2 that could be causing growing Promises and array objects on a node server?

I could provide our package.json deps if that helps, but I am not exactly sure if it would be the fault of prod deps or devDeps due to the build process possibly causing issue.

@jflayhart
Copy link
Contributor Author

jflayhart commented May 5, 2022

Possibly related

As a temporary workaround you can disable esmExternals

What does this mean exactly? What is this disabling as a workaround? I'd prefer a real fix rather than a workaround :D What are the tradeoffs with disabling esmExternals?

@jflayhart
Copy link
Contributor Author

jflayhart commented May 5, 2022

@timneutkens sorry to directly ping you, but es-modules-support definitely seems to be the culprit and I don't understand why. Could you please shed some light onto what changed in Next12? Y'all said Next12 is "backwards compatible" but it seems that is not the case with enabling by default instead of opt-in:

ES Modules support includes backward compatibility to support the previous import behavior of CommonJS. In Next.js 12, esmExternals: true will become the default

That doesn't seem like a backwards compat change given the amount of mem leaks people are running into 🤷🏻 But maybe I am misunderstanding. Setting to false "works" so I can now run Next >12.0.1 (including latest), but is that the right thing to do? Again, what are we disabling exactly and why do we want to do this? I don't want to do this just cause "now it works" I'd like to understand any tradeoffs. Thanks!

@timneutkens
Copy link
Member

timneutkens commented May 5, 2022

given the amount of mem leaks people are running into

As far as we've seen on the other issues the webpack config was customized through next-transpile-modules which is not officially supported as it's a community customization of webpack config, could you confirm you're also using next-transpile-modules? If you have a reproduction that'd be great given that the other issues don't have a reproduction provided which makes it impossible to investigate further, so far my suspicion is that next-transpile-modules has a memory leak but we couldn't reproduce it in a new app so it might be related to specific combination of imports.

We're planning to build something like next-transpile-modules into Next.js to have an official supported way to do something like that: #35150

Y'all said Next12 is "backwards compatible" but it seems that is not the case with enabling by default instead of opt-in:

It is backwards compatible. It passes all integration tests and we test Next.js against the internal apps that Vercel has like the vercel.com app. These are among the largest apps we've seen to date.
What you can't expect from customizing webpack config is that your customization of the build pipeline is backwards compatible between versions as you get full control over the webpack config.

What are the tradeoffs with disabling esmExternals?

Resolving of ESM packages in node_modules will be incorrect / not work. It won't resolve packages with "type": "module". More here: https://nextjs.org/blog/next-11-1#es-modules-support

@jflayhart
Copy link
Contributor Author

jflayhart commented May 5, 2022

@timneutkens awesome thanks so much for that explanation! I trust that y'all have the right tests in place and it's hard to know what enterprise, custom apps are doing with your framework. I get it.

We do NOT use next-transpile-modules

Considering that the esmExternals: false change seems to help most people out of this OOM issue, including me, should we add it to docs as a last resort "backwards compat" config change? For example, in Next12 upgrade guide:

"if you experience [this OOM error] first try setting esmExternals: false since it was turned on by default in Next12. Doing this will...."?

What do you think?

@jflayhart
Copy link
Contributor Author

jflayhart commented May 5, 2022

If you have a reproduction that'd be great given that the other issues don't have a reproduction provided which makes it impossible to investigate further, so far my suspicion is that next-transpile-modules has a memory leak but we couldn't reproduce it in a new app so it might be related to specific combination of imports.

I've spent so much time already, so I may as well work on a solid reproduction outside a large enterprise app 😅 The problem is I don't know what in our npm spaghetti is causing the conflict with Next12, so hard to reproduce with only a handful of deps in an isolated sandbox. Will try my best...

@timneutkens
Copy link
Member

@timneutkens awesome thanks so much for that explanation! I trust that y'all have the right tests in place and it's hard to know what enterprise, custom apps are doing with your framework. I get it.

Considering that the esmExternals: false change seems to help most people out of this OOM issue, should we add it to docs as a last resort "backwards compat" config change? For example, in Next12 upgrade guide:

"if you experience [this OOM error] first try setting esmExternals: false since it was turned on by default in Next12. Doing this will...."?

What do you think?

I'd prefer to fix the OOM even if it's not specifically in Next.js through a combination of deps. That flag will definitely go away in the future. Main problem so far is that there is no reproduction provided that would allow us to spend time on it as I'd be happy to have someone on our team look at it if one is provided.

I've spent so much time already, so I may as well work on a solid reproduction outside a large enterprise app 😅 The problem is I don't know what in our npm spaghetti is causing the conflict with Next12, so hard to reproduce with only a handful of deps in an isolated sandbox. Will try my best...

That'd be great! 🙏

@jflayhart
Copy link
Contributor Author

jflayhart commented May 5, 2022

@timneutkens I can start to reproduce what may eventually lead to a OOM server kill in my sandbox env, but it is so small of an app I can't get it to crash (yet). It makes sense why a large, enterprise site would crash more quickly due to its size and need for more resources.

That being said, the memory retained size trends are the same in heap snapshots between our proprietary enterprise site and my personal sandbox site, where obviously the latter is far less resource intensive. But I have reproduced the same memory leak!

To reproduce, I installed deps based on what I assumed to be the main offenders from preliminary R&D on the enterprise site:

  • custom express server
  • styled components (oh btw, it returns a Promise which is sus)
  • typescript
  • babel (no swc)
  1. checkout /~https://github.com/jflayhart/website/tree/repro-oom
  2. yarn install
  3. yarn build
  4. yarn start
  5. Use Chrome and go to http://localhost:8080 and use chrome://inspect to open node DevTools
  6. Take heap snapshot
  7. refresh http://localhost:8080 a ton of times (eg for mac users, hold down cmd+r for about 30s)
  8. Take heap snapshot
  9. repeat steps 7-8 and even wait longer holding down the refresh hotkey to rapid refresh the site
  10. You will see some constructors grow over time; Promises almost exponentially

At its core, my sandboxed env is barebones version of our enterprise app, where the memory leak is mainly around Promises being held in the global namespace:

Screen Shot 2022-05-05 at 3 16 08 PM

Screen Shot 2022-05-05 at 3 16 22 PM

Screen Shot 2022-05-05 at 3 16 59 PM

You can imagine this is orders of magnitude worse for an enterprise app where, assuming the culprit is styled-components, there are thousands more lines of css-in-jss.

This is all a working theory, but it is fact that Promises and other constructors are retaining memory over time (see promises start growing up to the top in the last screenshot):

Screen Shot 2022-05-05 at 3 24 11 PM

Screen Shot 2022-05-05 at 3 24 24 PM


Could this mean something is wrong with _document's initialProps usage with styled-component? My working theory is this is part of the issue

UPDATE: Setting esmExternals: false on sandbox project still shows growing retained memory size, but at a MUCH slower rate and after a while most of it gets GCd correctly:
Screen Shot 2022-05-06 at 12 49 30 PM

@nol13
Copy link

nol13 commented May 6, 2022

not much additional info to provide, but we had to downgrade from 12.1.4 to 12.0.7 to fix a memory leak crash in the node process. can say that it was unrelated to which compiler we used, tried both. 12.0.7 not having the issue though

@jchatrny
Copy link

jchatrny commented May 9, 2022

I've just tested and downgrading to 12.0.1 with babel fallback didn't solve issue for us. Issue described in #32314 still occurs (FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory).

  • node 16.13.1
  • babel 7.16.7
  • styled-components 5.3.3
  • react 17.0.2
  • typescript 4.5.4
  • redux 4.1.1
  • @reduxjs/toolkit 1.8.1

For now we use dynamic imports for reducers as workaround.

@jflayhart
Copy link
Contributor Author

jflayhart commented May 10, 2022

not much additional info to provide, but we had to downgrade from 12.1.4 to 12.0.7 to fix a memory leak crash in the node process. can say that it was unrelated to which compiler we used, tried both. 12.0.7 not having the issue though

@nol13 could you please list out your deps as well? I'm trying to see if there's a pattern here, for example, do you have styled-components perhaps? Axios? react-query? What could be causing Node's Promises leak (eg. nodejs/node#34328)?

@nol13
Copy link

nol13 commented May 12, 2022

none of those

"next": "12.0.7",
"next-connect": "0.10.1",
"react": "17.0.2",
"react-redux": "7.2.4",
"request": "^2.88.2",
"send": "^0.17.1",
"async-retry": "1.2.3",

@jflayhart
Copy link
Contributor Author

Hmm is redux the common thread here? @jchatrny what do you mean when you say "we use dynamic imports for reducers as workaround"?

@hypeJunction
Copy link

I have been fighting with heap memory issues for hours now, and nothing seemed to help. It started randomly this morning, and have been driving me crazy since. The only thing that helped is wiping .next directory and starting the server again. Maybe it helps someone with the knowledge of internals to figure out what's happening. Seems that webpack is trying to do something it shouldn't. I tried to inspect with node --inspect, but the traces were quite useless.

@LuudJanssen
Copy link
Contributor

LuudJanssen commented Jun 20, 2022

We are also experiencing this issue on our enterprise grade app, where we see exactly the same behavior as @jflayhart. When trying to reproduce it on a small project, the node process doesn't get killed, because it never reaches the critical memory point. However, for our large project, memory usage skyrockets to about 5GB before ending on heap errors.

Turning off esmExternals solved the issue.

It seems to be happening after the compilation has been done (the server logs that the compilation of the page has been done).

These are the dependencies we use:

  1. next 12.1.5
  2. typescript 4.6.3
  3. react 17.0.2
  4. @chakra-ui/react 1.8.8
  5. @chakra-ui/theme 1.14.1
  6. @chakra-ui/theme-tools 1.3.6
  7. next-i18next 11.0.0
  8. Joi 17.6.0
  9. dayjs 1.11.1
  10. lottie-react 2.2.1
  11. lottie-web 5.9.2
  12. @emotion/react 11.9.0
  13. @emotion/styled 11.8.1
  14. framer-motion 6.3.0
  15. wretch 1.7.9
  16. winston 3.7.2
  17. ... others

@balazsorban44
Copy link
Member

balazsorban44 commented Jun 20, 2022

Hi, we recently landed some changes (#37397) to canary that might help fix this issue, without setting esmExternals: false.

Please try it out by installing next@canary and let us know!

It seemed to help a lot of people in this thread: #32314

@LuudJanssen
Copy link
Contributor

@balazsorban44 seems to have fixed the issue! I'll check later for a 100% confirm, because I had to disable middleware because of breaking changes in the latest canary. If the memory leak was because of the middleware, it might not have been fixed.

@balazsorban44
Copy link
Member

Thanks for checking. I would say the memory issue is unlikely to be middleware related, but happy to get a 100% confirmation. 👍

@jflayhart could you check canary as well?

@glitteringkatie
Copy link

I was lurking on here and want to confirm that switching to canary seems to have gotten rid of the memory leak in our enterprise-grade app as well! 🎉

@jflayhart
Copy link
Contributor Author

jflayhart commented Jun 21, 2022

@balazsorban44 I just pulled down v12.1.7-canary.42 and it's working great! Not seeing the same memory leaks 🎉 Certainly not around Promises. Great work!

Curious what was the exact issue in webpack (presuming since that was updated)?

Screen Shot 2022-06-21 at 12 06 25 PM

@glitteringkatie
Copy link

I was lurking on here and want to confirm that switching to canary seems to have gotten rid of the memory leak in our enterprise-grade app as well! 🎉

Follow up to my lurk, turns out the memory leak just changed 😞 We are using next in 2 repos: one with a lot of files and one with LONG files. Prior to the canary, long files had the memory leak and lots of files did not, whereas with the canary our repo with long files is now fine and our lots of files repo is getting OOM. I know this is a vague report but wanted to pass this insight along before we file an issue through vercel!

@nol13
Copy link

nol13 commented Sep 7, 2022

This issues is (tentatively) fixed for us in 12.2.6-canary.8. Guessing it might be #39902 that did the trick

@moshie
Copy link

moshie commented Sep 9, 2022

Confirmed here early reports are showing that 12.3.0 has fixed the leak our end 🎉

@nol13
Copy link

nol13 commented Sep 12, 2022

Cool we might have to test out 12.3.0 then. Was worried we'd have to go with the canary since #39902 got reverted.

@balazsorban44
Copy link
Member

Based on multiple reports, going to close this for now. Please open a new issue (testing next@canary) with a new reproduction if you have similar issues.

@github-actions
Copy link
Contributor

This closed issue has been automatically locked because it had no new activity for a month. If you are running into a similar issue, please create a new issue with the steps to reproduce. Thank you.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Issue was opened via the bug report template.
Projects
None yet
Development

No branches or pull requests

9 participants