[NEXT-1314] High memory usage in deployed Next.js project #49929

Closed
1 task done
ProchaLu opened this issue May 17, 2023 · 131 comments
Labels
bug Issue was opened via the bug report template. linear: next Confirmed issue that is tracked by the Next.js team.

Comments

@ProchaLu
Contributor

ProchaLu commented May 17, 2023

Verify canary release

  • I verified that the issue exists in the latest Next.js canary release

Provide environment information

Operating System:
      Platform: darwin
      Arch: arm64
      Version: Darwin Kernel Version 22.1.0: Sun Oct  9 20:14:30 PDT 2022; root:xnu-8792.41.9~2/RELEASE_ARM64_T8103
    Binaries:
      Node: 18.15.0
      npm: 9.5.0
      Yarn: 1.22.19
      pnpm: 8.5.0
    Relevant packages:
      next: 13.4.3-canary.1
      eslint-config-next: N/A
      react: 18.2.0
      react-dom: 18.2.0
      typescript: 5.0.4

Which area(s) of Next.js are affected? (leave empty if unsure)

No response

Link to the code that reproduces this issue

https://codesandbox.io/p/github/ProchaLu/next-js-ram-example/

To Reproduce

Describe the Bug

I have been working on a small project to reproduce an issue related to memory usage in Next.js. The project is built using the Next.js canary version 13.4.3-canary.1. It utilizes Next.js with App Router and Server Actions and does not use a database.

The problem arises when deploying the project on different platforms and observing the memory usage behavior. I have deployed the project on multiple platforms for testing purposes, including Vercel and Fly.io.

  • On Vercel: https://next-js-ram-example.vercel.app/
    When interacting with the deployed version on Vercel, the project responds as expected. The memory usage remains stable and does not show any significant increase or latency

  • On Fly.io: https://memory-test.fly.dev/
    However, when deploying the project on Fly.io, I noticed that the memory usage constantly remains around 220 MB, even during normal usage scenarios
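The figures above can be cross-checked from inside the server process with Node's built-in `process.memoryUsage()`. The following is only a minimal sketch and is not part of the reproduction repo; the helper name `formatMemoryUsage` and the 10-second interval in the comment are assumptions made for illustration.

```javascript
// Hypothetical helper (not from the reproduction repo): converts the byte
// counts returned by process.memoryUsage() into megabytes, one decimal place.
function formatMemoryUsage(usage = process.memoryUsage()) {
  const toMB = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  return {
    rssMB: toMB(usage.rss), // resident set size: total memory held by the process
    heapUsedMB: toMB(usage.heapUsed), // live JavaScript objects
    heapTotalMB: toMB(usage.heapTotal), // heap currently reserved by V8
    externalMB: toMB(usage.external), // C++ objects bound to JavaScript objects
  };
}

// To sample periodically, something like the following could be used
// (interval chosen arbitrarily; unref() stops the timer from keeping the
// process alive on its own):
// setInterval(() => console.log(formatMemoryUsage()), 10000).unref();
console.log(formatMemoryUsage());
```

Comparing `rssMB` with `heapUsedMB` over time gives a rough indication of whether growth comes from the JavaScript heap or from memory outside it.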

Expected Behavior

I expect the small project to run smoothly without encountering any memory-related issues when deployed on various platforms, including Fly.io. Considering the previous successful deployment on Fly.io, which involved additional resource usage and utilized Next.js 13 with App Router and Server Actions, my anticipation is that the memory usage will remain stable and within acceptable limits.

Fly.io discussion: https://community.fly.io/t/high-memory-usage-in-deployed-next-js-project/12954?u=upleveled

Which browser are you using? (if relevant)

Chrome

How are you deploying your application? (if relevant)

Vercel, fly.io

NEXT-1314

@ProchaLu ProchaLu added the bug Issue was opened via the bug report template. label May 17, 2023
@ProchaLu ProchaLu changed the title from "High memory usage in Deployed Next.js Project" to "High memory usage in deployed Next.js project" May 17, 2023
@thexpand

Is this related to Server Actions, have you isolated the case?

@TheBit

TheBit commented May 22, 2023

@thexpand This is not related to Server Actions. It is a severe memory leak starting from v13.3.5-canary.9. I was going to open a bug but found this one.

@shuding I suspect your PR #49116, since the other changes in that canary release are unlikely to cause this. Could you please take a look? This blocks us from upgrading to the latest Next.js.

Operating System:
      Platform: darwin
      Arch: arm64
      Version: Darwin Kernel Version 22.4.0: Mon Mar  6 20:59:58 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6020
    Binaries:
      Node: 18.15.0
      npm: 9.5.0
      Yarn: 1.22.19
      pnpm: 7.9.0
    Relevant packages:
      next: 13.4.4-canary.0
      eslint-config-next: N/A
      react: 18.2.0
      react-dom: 18.2.0
      typescript: 4.6.2

Tech Stack:

  • Rush.js monorepo (on pnpm)
  • Several Next.js apps using SSG (no SSR, no app dir, but with next/image (not the legacy one), middleware.js, and a few redirects and rewrites in next.config.js)
  • All of them use the latest (v7) common/storybook for 90% of their code base (so transpileModules is used for it)
  • External configuration and MaterialUI styles come from Apollo Client, used in getStaticProps and pointing at Strapi GraphQL (so on-demand revalidation is used)
  • Everything is deployed to our Kubernetes cluster (build and runtime both use the latest Debian Bullseye so that sharp for next/image works correctly) and monitored in Grafana
  • BTW, Next.js is in standalone mode

Proofs:

13.3.5-canary.8 vs 13.3.5-canary.9

alt

13.3.5-canary.8 vs 13.3.5-canary.9 with all images unoptimized

alt

13.3.5-canary.8 vs 13.4.4-canary.0 (to test latest canary) with all images unoptimized + middleware removed

alt

So, as you can see, the leak does not come from next/image or middleware. As far as I can tell, the only PR in canary.9 that could theoretically cause this is #49116.

P.S. I also checked 13.3.4 and found no leak there. However, on that version we get an Internal Server Error from middleware, so we can't use it. I had to find the earliest canary version where that problem is fixed, which is https://github.com/vercel/next.js/releases/tag/v13.3.5-canary.2, so we are locked on that version for now (PR #48723 probably fixed the middleware problem).

@Josehower
Contributor

@shuding or @ijjk any thoughts on this issue? Can you confirm a current memory leak in Next.js?

@Josehower

This comment was marked as resolved.

@Josehower
Contributor

Josehower commented Jun 7, 2023

I created this reproduction repo using the latest canary version of Next.js for the error documented before. In the repo I am using autocannon to request multiple pages in quick succession, simulating traffic to the website.

I documented this in a new issue since it seems to be a different error: #50909

@Josehower
Contributor

Josehower commented Jun 9, 2023

I created a different reproduction repo using the latest canary version of Next.js. The error is crashing the dev server when an import is missing.

https://nextjs.org/docs/messages/module-not-found

<--- Last few GCs --->

[2218:0x5eb9a70]    40167 ms: Mark-sweep 252.1 (263.9) -> 250.1 (263.7) MB, 206.0 / 0.0 ms  (average mu = 0.174, current mu = 0.125) allocation failure scavenge might not succeed
[2218:0x5eb9a70]    40404 ms: Mark-sweep 252.4 (263.9) -> 250.6 (264.2) MB, 216.7 / 0.0 ms  (average mu = 0.135, current mu = 0.086) allocation failure scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb02930 node::Abort() [/usr/local/bin/node]
 2: 0xa18149 node::FatalError(char const*, char const*) [/usr/local/bin/node]
 3: 0xcdd16e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 4: 0xcdd4e7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 5: 0xe94b55  [/usr/local/bin/node]
 6: 0xe95636  [/usr/local/bin/node]
 7: 0xea3b5e  [/usr/local/bin/node]
 8: 0xea45a0 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
 9: 0xea751e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
10: 0xe68a5a v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/usr/local/bin/node]
11: 0x11e17c6 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/usr/local/bin/node]
12: 0x15d5439  [/usr/local/bin/node]

I documented this in a new issue since it seems a different error #51025
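The GC lines in the log above show the heap sitting at roughly 250 MB immediately before the "near heap limit" failure. As a rough sketch (the function name `heapHeadroom` is made up for this example), the configured limit and the current usage can be read with Node's built-in `v8` module:

```javascript
const v8 = require('node:v8');

// Hypothetical helper: reports how close the current process is to V8's heap limit.
function heapHeadroom(stats = v8.getHeapStatistics()) {
  const toMB = (bytes) => Math.round(bytes / 1024 / 1024);
  return {
    limitMB: toMB(stats.heap_size_limit),
    usedMB: toMB(stats.used_heap_size),
    // Fraction of the limit currently in use, between 0 and 1.
    usedRatio: stats.used_heap_size / stats.heap_size_limit,
  };
}

console.log(heapHeadroom());
```

The limit can be changed with Node's `--max-old-space-size` flag (value in MB). That only moves the ceiling; it does not address a leak.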

@Josehower
Contributor

Josehower commented Jun 13, 2023

I created a reproduction using Docker to showcase how a simple project using Next.js crashes when run in environments with ~225 MB of memory.

Steps to reproduce:

  1. run docker pull josehower/next-js-memory-leak-reproduction-example:latest
  2. run docker run -p 3000:3000 --memory=225m josehower/next-js-memory-leak-reproduction-example:latest
    • NOTE: in some environments the app does not even start with this memory restriction; in that case, allow more memory with --memory=256m
  3. visit http://localhost:3000/
  4. click fire
  5. confirm that the app becomes unresponsive and throws the following error:
Error: socket hang up
    at connResetException (node:internal/errors:717:14)
    at Socket.socketOnEnd (node:_http_client:526:23)
    at Socket.emit (node:events:525:35)
    at endReadableNT (node:internal/streams/readable:1359:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  code: 'ECONNRESET'
}
  6. you can confirm the app runs normally when the --memory=225m option is removed from the command

I built this reproduction by creating a Docker image from a simple Next.js app that uses autocannon to fake traffic to the website when a button is clicked.

  • Created the reproduction repo
  • Added a Dockerfile and set up the image
  • Built it with docker build . -t <username>/<image-name>:<version>
  • Published it to Docker with docker push <username>/<image-name>:<version>

@Starefossen

Is this related to #49677 maybe?

@TheBit

TheBit commented Jun 14, 2023

@Starefossen thanks! I've read that issue and found one connection with my setup: #49677 (comment). I also run my Next.js with NODE_OPTIONS='--dns-result-order=ipv4first' due to this, which was done here: nodejs/node#39987, and our migration from Node 16 to 18.

I will try to find time and repeat my upgrade without this option or some other change related to this. I can see that it already fixes the problem for this developer.

@karl-run

Here are two pods in a k8s cluster: the first two lines are my pages-based branch, versus the default memory usage of my app-dir-based branch. Literally 10x from the start, before any requests. The app-dir branch also hits 700-800 MB of usage after a while.

image

@karl-run

It also jumps up to 2x the ram usage after a single request. Example of a pages dir vs app dir deployed app:
image

@gopherine

gopherine commented Jun 16, 2023

+1, the same is happening on our systems. No dev on my team with 8 GB of RAM is able to work with it. It happens specifically when we use the App Router, and it is painful to switch back and forth between routing definitions. Next.js version 13.4.4.

@gabemeola

gabemeola commented Jun 17, 2023

Usage graph running on Railway. Image optimization was disabled on June 15th. The cache rate also increased drastically, which may be related.

Not sure how much of an issue this is in serverless land since processes don't run long enough to have memory leaks.
CleanShot 2023-06-16 at 21 05 52@2x

Similar: #44685

@SanderRuusmaa

+1 same situation can be seen in our deployed next js app :/

@hampuskraft

Same issue here, running on Next.js 13.4.6 deployed on Fly.io. I worked around the problem by allocating 2048 MiB of memory to the instance and a 512 MiB swap as a buffer. As you can see, I'm only delaying the inevitable OOM, but this at least makes the issue much less frequent.

fly-metrics net_d_fly-app_fly-app_orgId=96146 var-app=arewepomeloyet-stg from=now-2d to=now (1)

You can find the source code here: https://github.com/hampuskraft/arewepomeloyet.com.

@LukeTarr

Yep, I can confirm this seems to be a leak somewhere. I'm running a super basic Next server on Railway, and you can see the memory usage while completely idle here:

Screenshot 2023-06-21 at 12 57 52 PM

Here's a list of packages and versions being used if this helps anyone debug:

    "@clerk/nextjs": "^4.21.7",
    "@types/node": "20.3.1",
    "@types/react": "18.2.13",
    "@types/react-dom": "18.2.6",
    "autoprefixer": "10.4.14",
    "eslint": "8.43.0",
    "eslint-config-next": "13.4.6",
    "next": "13.4.6",
    "postcss": "8.4.24",
    "react": "18.2.0",
    "react-dom": "18.2.0",
    "tailwindcss": "3.3.2",
    "typescript": "5.1.3"

@karl-run

karl-run commented Jun 22, 2023

Tried the newest 13.4.7, things look roughly the same:

Here's another example, from a deployment with min: 1, max: 2 replicas. The green pod has been alive for a while; the yellow pod came up and initially used 300 MB, then jumped to 520 MB as soon as a single request hit it.

image

This app isn't using a single next/image-component. So it's definitely not related to that.

Here's the same app in production that's actually getting a few thousand visits:

image

@remorses
Contributor

Try uninstalling sharp; it made memory usage much lower in my case.

@hampuskraft

I already uninstalled sharp and disabled image optimization, it didn't help in my case.

@broksonic21

Not sure if this is Next's supported solution, so YMMV, but the only thing that helped us on 13.4.4+ was to set:

  experimental: {
    // Setting appDir to false also stops the extra workers from running
    // (see https://github.com/vercel/next.js/issues/45508#issuecomment-1597087133).
    // Those workers are causing memory issues in 13.4.4, so until that is
    // fixed we don't want this enabled.
    appDir: false,
  },

That disables the new appDir support, which became the default in 13.4, and also turns off the extra workers. It also fixed the leaked-socket issue that was causing crashes and timeouts (#51560), which appears related. As far as I can tell, the extra processes (see #45508 for build, but this also applies to next start) are leaking, causing everyone's memory issues. This might not be the exact cause, but it is highly correlated for sure.

@hampuskraft

This issue is indeed about the App Router (appDir) feature. The feature is supposedly stable (but it's clear that it isn't), which is why it was adopted. Turning it off would require rewriting our codebases.

image

@broksonic21

@hampuskraft I should say: our site is using pages. We haven't done any work for app yet, but things were still broken unless we turned off the appDir experimental feature/default.

@ungarida

This comment was marked as off-topic.

@ungarida

This comment was marked as off-topic.

@ungarida

This comment was marked as off-topic.

@broksonic21

This comment was marked as outdated.

@tghpereira

This comment was marked as off-topic.

@ungarida

This comment was marked as off-topic.

@ungarida

This comment was marked as off-topic.

@timneutkens
Member

@tghpereira again, please read my earlier posts...

If you're running into issues with memory usage in development please follow this issue instead: #46756 (similar to this one, please provide the source code or a heap profile).
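For anyone unsure how to produce the heap profile requested above, one possible approach is Node's built-in `v8.writeHeapSnapshot()`. This is a sketch under stated assumptions: the file naming scheme, the helper names, and the choice of `SIGUSR2` are illustrative and are not something the Next.js team prescribes in this thread.

```javascript
const v8 = require('node:v8');
const path = require('node:path');

// Hypothetical naming scheme: heap-<pid>-<ISO timestamp>.heapsnapshot
function snapshotFileName(pid, date) {
  const stamp = date.toISOString().replace(/[:.]/g, '-');
  return `heap-${pid}-${stamp}.heapsnapshot`;
}

// Writes a heap snapshot into `dir` and returns the path of the written file.
// The call is synchronous and blocks the event loop while it runs.
function dumpHeap(dir = process.cwd()) {
  const file = path.join(dir, snapshotFileName(process.pid, new Date()));
  return v8.writeHeapSnapshot(file);
}

// Call once at startup; afterwards `kill -USR2 <pid>` triggers a snapshot.
function installSnapshotSignal(signal = 'SIGUSR2') {
  process.on(signal, () => {
    console.log('heap snapshot written to', dumpHeap());
  });
}
```

Node also has a `--heapsnapshot-signal=<signal>` command-line flag that achieves the same without code changes. The resulting `.heapsnapshot` file can be loaded in the Memory tab of Chrome DevTools.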

renchap added a commit to renchap/joinmastodon that referenced this issue Aug 17, 2023
This should fix the memleak issue we are seeing
See vercel/next.js#49929
@timneutkens
Member

Hey everyone,
Got another update on this: we've landed the changes to reduce the number of processes from 3 to 2:

  • One for routing, App Router rendering
  • One for Pages Router rendering (see my previous posts for reasoning why this needs a separate process)

It's out on next@canary, please give it a try.

We've also changed the implementation that uses Sharp to reduce the amount of concurrency it handles (by default it would use all CPUs). That should help a bit with peak memory usage when using Image Optimization. I'd like to get a reproduction of Image Optimization causing high memory usage so that it can be investigated in a new issue, so if someone has one, please provide it.

With these changes landed I think it's time to close this issue as these changes cover the majority of comments posted. We can post a new issue specifically tracking memory usage with image optimization. There is a separate issue for development memory usage already.

@karlhorky
Contributor

karlhorky commented Aug 23, 2023

karlhorky added a commit to upleveled/security-vulnerability-examples-next-js-postgres that referenced this issue Aug 23, 2023
@karlhorky
Contributor

karlhorky commented Aug 23, 2023

cc Fly.io folks @michaeldwan @rubys @jeromegn @dangra @mrkurt so that you're aware that deploying Next.js apps with App Router can lead to OOM (Out of Memory) errors on Fly.io with the free tier ("Free allowances") with 256MB RAM - in case this would represent a business reason for Fly.io to upgrade the base free allowance RAM to 512MB

As mentioned in my last message above, we have now upgraded to the latest Next.js version, and I have yet to see a crash on Fly.io because of OOM errors, but in case the issue persists after some time, you may also hear this from other customers in the future.

@dangra

dangra commented Aug 23, 2023

@karlhorky thanks for letting us know.

FTR: for the case of apps running on 256MB machines, adding swap memory usually helps https://fly.io/docs/reference/configuration/#swap_size_mb-option

@karlhorky
Contributor

Thanks for the extra tip about the swap memory. We tried that as well, after getting that tip in the community post.

@dangra

dangra commented Aug 24, 2023

@karlhorky all good. The only nuance is that the post describes how to set up swap space manually, while the link I shared only requires adding a swap_size_mb = 512 directive to fly.toml and redeploying. The swap space is set up automatically before your app starts; no changes to the Dockerfile are needed.

@timneutkens
Member

timneutkens commented Aug 24, 2023

Going to close this issue as mentioned yesterday, since all changes and investigations have landed and there haven't been new reports since my changes shipped.

Keep in mind that Node.js below 18.17.0 has a memory leak in fetch() so you have to upgrade Node.js too.
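A startup check along the following lines could flag a runtime older than the version mentioned above. It is only a sketch: `isAtLeast` is a hypothetical helper that handles plain `x.y.z` versions and does not understand pre-release tags.

```javascript
// Hypothetical helper: compares dotted numeric versions such as "18.15.0"
// and "18.17.0". Returns true when `version` is >= `minimum`.
function isAtLeast(version, minimum) {
  const a = version.split('.').map(Number);
  const b = minimum.split('.').map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i += 1) {
    const diff = (a[i] || 0) - (b[i] || 0);
    if (diff !== 0) return diff > 0;
  }
  return true; // all components are equal
}

// process.versions.node is the running Node.js version without the leading "v".
if (!isAtLeast(process.versions.node, '18.17.0')) {
  console.warn(
    `Node.js ${process.versions.node} is older than 18.17.0; consider upgrading.`
  );
}
```

The components are compared numerically, so "18.9.0" is correctly treated as older than "18.17.0", which a plain string comparison would get wrong.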

I've opened a separate issue about the Image Optimization memory usage. We'll need a reproduction there; if one isn't provided the issue will auto-close, so please provide one. Thank you! Link: #54482.

Thanks to everyone that provided a reproduction that we could actually investigate.

@vercel vercel locked and limited conversation to collaborators Aug 24, 2023