Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

vm: memory leak in vm.runInNewContext #6552

Closed
crosleyj opened this issue Nov 21, 2013 · 19 comments
Closed

vm: memory leak in vm.runInNewContext #6552

crosleyj opened this issue Nov 21, 2013 · 19 comments

Comments

@crosleyj
Copy link

There is a memory leak in vm.runInNewContext. If you call it in a loop your machine will quickly run out of memory. It can be reproduced with the code below. Run with "node --trace-gc"--trace-gc-verbose to see garbage collection start to have problems. Running a few processes with the code below will likely cause your box to stop working when it runs out of memory. Use "top" in another console to see the memory usage quickly go up.

We have a use case where we want to use vm.runInNewContext on each request to a web service, so we quickly run out of memory and the service stops working under load.

We are running node v0.10.18.

var vm = require('vm');

var i = 0;
while (true) {
    i++;
    console.log(process.pid + ' still running ' + i);
    vm.runInNewContext('helpers="hello"');
}
@OrangeDog
Copy link

Your code never yeilds - it's an infinite blocking busy loop. The garbage collector can never run. Whatever you were doing, you would run out of memory.

@crosleyj
Copy link
Author

Thanks for letting me know. I apologize for submitting code with that problem and not knowing that. I've verified that if we run node with the --expose-gc flag and do a manual global.gc() in the reproduction code it doesn't increase in memory, and the same is true if we do a global.gc() on every request in our service.

However, in our service, if we don't do a global.gc(), then the memory is constantly increasing from this vm.runInNewContext call. If I comment it out, I don't get the memory increase. It's possible that our service is also not yielding. I don't actually know how to make node js yield without manually calling the garbage collector. Is there a way to change my reproduction code to make sure it will yield without manually invoking global.gc()?

Note: I am seeing the processes occasionally try to garbage collect in our service from the --trace-gc flag after x number of requests, but if I'm not calling global.gc() manually it doesn't seem to reduce the memory that the process is using at all.

@OrangeDog
Copy link

I don't actually know how to make node js yield

Do something asynchronous (nextTick doesn't count). Don't write infinite loops.

the memory is constantly increasing

How high does it go? V8 won't start aggressively collecting garbage unless it's running out of memory or past the high water mark.

@crosleyj
Copy link
Author

Thanks for the quick and I tried putting some async callbacks in there and you are correct that the memory does eventually stabilize, in the case of the sample code at about 400Mb-600Mb.

So maybe this problem simply results from my lack of understanding about V8 garbage collection. In the case of our service, we are running one parent process with many child processes. At some point it stops working when garbage collection happens and I've blamed this on a memory leak. If I used only 2 worker processes for the service and let it run, it does garbage collect periodically and the service never stops working completely, but it does take up almost the entire memory of the box. Here is the result of a top command near the end of that testing. You can see that each of the two node workers takes up about 1.3G in memory, and overall they only leave a few Mb of memory free on the box. Then they scavenge it a bit at a time so that they each always have just enough memory to continue. Is this normal behavior?:

If it is, then this probably results from something about doing garbage collection with many node processes at once. Perhaps it gets in a similar state to this, with little memory left on the machine, and then several processes try to allocate space and fail, and then something goes wrong with memory collisions or cleanup. I'll continue to try to pinpoint exactly what is going on there. Do you know of anything that might go wrong with garbage collection with many node processes taking load?

Mem: 3844596k total, 3774756k used, 69840k free, 124224k buffers
Swap: 0k total, 0k used, 0k free, 417700k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3690 nodejs 20 0 1777m 1.3g 5640 R 45.0 36.2 7:23.74 node
3551 nodejs 20 0 1788m 1.3g 5640 R 44.3 36.7 10:00.90 node
2979 nodejs 20 0 1023m 123m 6820 R 9.0 3.3 2:52.21 node

Here is one of their garbage collections:

Map space, used: 34349 KB, available: 12135 KB, committed: 46484 KB
[3690] Cell space, used: 29199 KB, available: 4113 KB, committed: 33383 KB
[3690] Large object space, used: 1536 KB, available: 87499 KB, committed: 1552 KB
[3690] All spaces, used: 916325 KB, available: 438690 KB, committed: 1372336 KB
[3690] Total time spent in GC : 131349 ms
[3690] 2219342 ms: Scavenge 919.8 (1381.5) -> 912.5 (1385.5) MB, 106 ms [allocation failure].
[3690] Memory allocator, used: 1418788 KB, available: 80348 KB
[3690] New space, used: 4030 KB, available: 12353 KB, committed: 32768 KB
[3690] Old pointers, used: 620719 KB, available: 396608 KB, committed: 1017331 KB
[3690] Old data space, used: 51267 KB, available: 3281 KB, committed: 54610 KB
[3690] Code space, used: 192060 KB, available: 1201 KB, committed: 194220 KB
[3690] Map space, used: 35115 KB, available: 11369 KB, committed: 46484 KB
[3690] Cell space, used: 29709 KB, available: 3385 KB, committed: 33383 KB
[3690] Large object space, used: 1536 KB, available: 79307 KB, committed: 1552 KB
[3690] All spaces, used: 934439 KB, available: 428198 KB, committed: 1380351 KB
[3690] Total time spent in GC : 131455 ms
[3690] 2222939 ms: Scavenge 937.7 (1389.5) -> 930.5 (1393.5) MB, 114 ms [allocation failure].
[3690] Memory allocator, used: 1426980 KB, available: 72156 KB
[3690] New space, used: 4030 KB, available: 12353 KB, committed: 32768 KB
[3690] Old pointers, used: 633719 KB, available: 387642 KB, committed: 1021362 KB
[3690] Old data space, used: 52058 KB, available: 1888 KB, committed: 54610 KB
[3690] Code space, used: 195412 KB, available: 1203 KB, committed: 197208 KB
[3690] Map space, used: 35892 KB, available: 10592 KB, committed: 46484 KB
[3690] Cell space, used: 30227 KB, available: 2483 KB, committed: 33383 KB
[3690] Large object space, used: 1536 KB, available: 71115 KB, committed: 1552 KB
[3690] All spaces, used: 952876 KB, available: 416162 KB, committed: 1387370 KB
[3690] Total time spent in GC : 131569 ms
[3690] 2226527 ms: Scavenge 956.1 (1397.5) -> 948.9 (1401.5) MB, 80 ms [allocation failure].
[3690] Memory allocator, used: 1435172 KB, available: 63964 KB
[3690] New space, used: 4030 KB, available: 12353 KB, committed: 32768 KB
[3690] Old pointers, used: 647081 KB, available: 379319 KB, committed: 1026401 KB
[3690] Old data space, used: 52843 KB, available: 1020 KB, committed: 54610 KB
[3690] Code space, used: 198739 KB, available: 1204 KB, committed: 200196 KB
[3690] Map space, used: 36662 KB, available: 9821 KB, committed: 46484 KB
[3690] Cell space, used: 30740 KB, available: 2483 KB, committed: 33383 KB
[3690] Large object space, used: 1536 KB, available: 62923 KB, committed: 1552 KB
[3690] All spaces, used: 971635 KB, available: 406200 KB, committed: 1395396 KB
[3690] Total time spent in GC : 131649 ms
[3690] 2230094 ms: Scavenge 973.8 (1405.5) -> 966.6 (1409.5) MB, 104 ms [allocation failure].
[3690] Memory allocator, used: 1443364 KB, available: 55772 KB
[3690] New space, used: 4030 KB, available: 12353 KB, committed: 32768 KB
[3690] Old pointers, used: 659870 KB, available: 370549 KB, committed: 1030432 KB
[3690] Old data space, used: 53625 KB, available: 846 KB, committed: 54610 KB
[3690] Code space, used: 202053 KB, available: 1204 KB, committed: 204180 KB
[3690] Map space, used: 37431 KB, available: 9051 KB, committed: 46484 KB
[3690] Cell space, used: 31253 KB, available: 1475 KB, committed: 33383 KB
[3690] Large object space, used: 1536 KB, available: 54731 KB, committed: 1552 KB
[3690] All spaces, used: 989801 KB, available: 395480 KB, committed: 1403411 KB
[3690] Total time spent in GC : 131753 ms
[3690] 2233719 ms: Scavenge 991.7 (1413.5) -> 984.5 (1417.5) MB, 95 ms [Runtime::PerformGC].

@crosleyj
Copy link
Author

I now believe this may be an issue with the garbage collector change is Node v10. I ran the code below using node v0.8.5 and node v0.10.22. In v0.8.5 the memory starts at about 35Mb and never increases significantly. Running with --trace-gc shows that it is doing Mark-sweep collections very frequently and successfully collection garbage. Running with v0.10.22 shows the process increasing in memory usage by about 30Mb per second until it the box runs out of memory. Looking at the memory collections, it is doing them much loss frequently and it is mostly doing Scavenge, rarely Mark-sweep. Even when it is doing these it is not collecting much of the memory.

This may have been a desired change between these node versions. Is Node now meant to basically take up all the memory on the box until it needs to scavenge some? If not, then this appears to be a garbage collection bug. At least for our use case this causes all the node processes to go into a state that they recover from very slowly when they finally do start scavenging garbage.

var vm = require('vm');
var Q = require('q');

var i = 0;
var sandbox = {};

setInterval( function() {
for ( var i = 0; i < 100; i++) {
createNewVM();
}
}, 5);

function createNewVM() {
var deferred = Q.defer();
i++;
setTimeout( function() {
vm.runInNewContext('2+2', sandbox);
deferred.resolve();
i--;
}, 5);
return deferred.promise;
}

setInterval(function() {
console.log('i: ' + i);
}, 1000);

@bnoordhuis
Copy link
Member

This may have been a desired change between these node versions. Is Node now meant to basically take up all the memory on the box until it needs to scavenge some?

Not all - the default upper limit on most systems is around 1.6 GB. You can limit it further with --old_max_space_size=n where n is in MB.

I see you've already figured it out but to confirm, yes, the garbage collector in v0.10 allocates memory more aggressively. A common complaint about the GC in v0.8 is that it's slow (and it is), that's why the GC in v0.10 trades space for time.

@crosleyj
Copy link
Author

Thanks for clarifying that. Sorry to keep bothering you with this but I have one more test case I ran that I think invalidates that the new garbage collection performance is better specifically if vm.runInNewContext is run. Please take a look at this once more.

I wrote the code at the end of this post to try to call vm.runInNewContext 100 times every 5 milliseconds. Then every second I outputted how many of these requests had actually processed and the total execution time, as well as how many requests I had made for this that had not yet been processed.

When run with Node v0.10.10, I get the output below. You can see that it is only managing to request 100 of these per interval that it outputs. When you run with --trace-gc the rest seems to go to ineffective garbage collection.

requested: 100. stillWaiting: 100. Execution time: 2062
requested: 200. stillWaiting: 100. Execution time: 4237
requested: 300. stillWaiting: 100. Execution time: 6329
requested: 400. stillWaiting: 100. Execution time: 8550
requested: 500. stillWaiting: 100. Execution time: 10714
requested: 600. stillWaiting: 100. Execution time: 12989
requested: 700. stillWaiting: 100. Execution time: 15097
requested: 800. stillWaiting: 100. Execution time: 17408
requested: 900. stillWaiting: 100. Execution time: 19488
requested: 1000. stillWaiting: 100. Execution time: 21629

When run with Node v0.8.5, I get the output below. Despite garbage collecting a lot more often, it manages to make many more of these requests.
requested: 2200. stillWaiting: 200. Execution time: 994
requested: 4400. stillWaiting: 200. Execution time: 2001
requested: 6800. stillWaiting: 200. Execution time: 3066
requested: 9200. stillWaiting: 200. Execution time: 4078
requested: 11700. stillWaiting: 200. Execution time: 5098
requested: 14100. stillWaiting: 200. Execution time: 6115
requested: 16500. stillWaiting: 200. Execution time: 7120
requested: 19100. stillWaiting: 100. Execution time: 8179
requested: 21600. stillWaiting: 200. Execution time: 9163
requested: 24300. stillWaiting: 100. Execution time: 10213

Next, I wanted to make sure this only happened with vm.runInNewContext and not other memory intensive objects. I replace vm.runInNewContext with "theThing" commented out in the code below. When I did this, the new version of node collected garbage more effectively and the two versions run comparably.

Finally, I wanted to look at the while(true) loop once more from my original posting for an even simpler test case, to verify that it had been an issue of not releasing to the garbage collector. When I replace vm.runInNewContext in the while true loop with "theThing", as in the code below, garbage collection does happen and works, as in:
[29227] 32 ms: Scavenge 1.7 (34.0) -> 1.5 (35.0) MB, 0 ms [Runtime::PerformGC].
[29227] 35 ms: Mark-sweep 1.5 (35.0) -> 1.4 (36.0) MB, 2 ms [Runtime::PerformGC] [GC in old space requested].
[29227] 68 ms: Scavenge 2.4 (36.0) -> 2.4 (37.0) MB, 5 ms [Runtime::PerformGC].
[29227] 74 ms: Scavenge 3.4 (37.0) -> 3.4 (38.0) MB, 0 ms [Runtime::PerformGC].
[29227] 91 ms: Scavenge 5.3 (38.0) -> 4.3 (39.0) MB, 0 ms [Runtime::PerformGC].
[29227] 103 ms: Scavenge 6.3 (39.0) -> 5.3 (40.0) MB, 0 ms [Runtime::PerformGC].
[29227] 114 ms: Scavenge 7.2 (40.0) -> 6.2 (40.0) MB, 0 ms [Runtime::PerformGC].
[29227] 131 ms: Scavenge 9.2 (40.0) -> 6.2 (40.0) MB, 0 ms [Runtime::PerformGC].
[29227] 149 ms: Scavenge 9.2 (40.0) -> 6.2 (40.0) MB, 0 ms [Runtime::PerformGC

// 1) while true loop
var theThing = null;
while(true) {
    var originalThing = theThing;
    theThing = {
        longStr: new Array(1000000).join('*')
    };
}



// 2) interval code
var vm = require('vm');
var Q = require('q');

var requested = 0;
var stillWaiting = 0;
var sandbox = {};
var theThing = null;

var start = new Date().getTime();

// Try to run vm.runInNewContext 100 times every x milliseconds, x=5
setInterval( function() {
  //console.log('Creating another 100 requests for vm');
  for ( var i = 0; i < 100; i++) {
    createNewVM();
  }
}, 5);

// Run vm.runInNewContext
function createNewVM() {
    stillWaiting++;
    var deferred = Q.defer();
    setTimeout( function() { 
        requested++;
        vm.runInNewContext('2+2');
       //            var originalThing = theThing;
       //            theThing = {
      //              longStr: new Array(1000000).join('*')
      //            };
        stillWaiting--;
        deferred.resolve();
    }, 5);
    return deferred.promise;
}

// Every second, log the number of times we've run createNewVM and the execution time
setInterval(function() {
  var end = new Date().getTime();
  var time = end - start;
  console.log('requested: ' + requested + '. stillWaiting: ' + stillWaiting + '. Execution time: ' + time);
}, 1000);

@bnoordhuis bnoordhuis reopened this Dec 1, 2013
@crosleyj
Copy link
Author

crosleyj commented Dec 2, 2013

Thanks for reopening. If anyone on the Node team could take a look that would be appreciated.

@indutny indutny self-assigned this Mar 2, 2014
@mitkodev
Copy link

mitkodev commented Mar 7, 2014

any new info on this?

@pedrofranceschi
Copy link

+1

andreialecu added a commit to andreialecu/deployd that referenced this issue Jan 27, 2015
There is a bug in node with vm.runInNewContext: nodejs/node-v0.x-archive#6552
This was used for every script call by deployd, and for example GET scripts are being run for every record returned -> so if you need to return 1000 records, the scripts would run 1000 times resulting in big memory leaks.

With this change, even 15000 executions of a script take now just 300 ms on my machine and use virtually no extra memory, compared to huge memory leaks and over several minutes with the old code.
@andreialecu
Copy link

+1

Note that this also occurs with runInContext() even if you reuse the same context that was created just once.

runInThisContext() however has no issues.

Some code exhibiting the problem: https://gist.github.com/billywhizz/813257 (tested on node 0.10.26)

ericfong added a commit to deployd/deployd that referenced this issue Mar 8, 2015
There is a bug in node with vm.runInNewContext: nodejs/node-v0.x-archive#6552
This was used for every script call by deployd, and for example GET scripts are being run for every record returned -> so if you need to return 1000 records, the scripts would run 1000 times resulting in big memory leaks.

With this change, even 15000 executions of a script take now just 300 ms on my machine and use virtually no extra memory, compared to huge memory leaks and over several minutes with the old code.

Minor refactoring

Remove unneceesary try/catch and add another test.

Refactor to remove dependency on 'vm'.
andreialecu added a commit to deployd/deployd that referenced this issue Mar 8, 2015
There is a bug in node with vm.runInNewContext: nodejs/node-v0.x-archive#6552
This was used for every script call by deployd, and for example GET scripts are being run for every record returned -> so if you need to return 1000 records, the scripts would run 1000 times resulting in big memory leaks.

With this change, even 15000 executions of a script take now just 300 ms on my machine and use virtually no extra memory, compared to huge memory leaks and over several minutes with the old code.

Minor refactoring

Remove unneceesary try/catch and add another test.

Refactor to remove dependency on 'vm'.
@astoilkov
Copy link

+1 for the issue. The solution for deployd does not work for me because I am executing a script that is assigning a global variable and then is not using this to reference it:

this.newValue = 'value';

// fails with newValue is not defined
console.log(newValue);

@astoilkov
Copy link

My problem is that I am executing a client-side JavaScript code using runInContext which is leaking about 1 MB every time i call runInContext. I am creating browser variables like window and document in the context. I am not aware of another solution which could run the code in a context where window and document could be available in the global context and in the same time the Node.js global scope to not be accessible.

function executeCode(browserEnv, html, code) {
  var context = vm.createContext(browserEnv.getObject());
  var script = vm.createScript(code);

  extend(context, {
    server: {
      html: html,
      data: {},
      rendered: '',
      applications: []
    }
  });

  script.runInContext(context);

  return context.server.rendered;
}

Tell me if you need more information on the problem.

@andreialecu
Copy link

@astoilkov Sorry, I got confused about what issue this project is in. :)

If this is a blocking bug for you and you can use io.js instead of node, it should have this fixed, see here: /~https://github.com/iojs/io.js/blob/v1.x/CHANGELOG.md#vm

@jasnell
Copy link
Member

jasnell commented May 28, 2015

@joyent/node-coreteam ... looks like a fix may have gone into io.js. We may want to investigate further and backport

@viking
Copy link

viking commented Dec 14, 2015

This is still a problem with Node 4 and 5. Any updates?

@viking
Copy link

viking commented Dec 14, 2015

Here is a test script:

var vm = require("vm");

function run(index) {
  var context = vm.createContext();
  vm.runInContext("1+1", context);
  console.log("run " + index + ":", process.memoryUsage().heapTotal);
}

for (var i = 0; i < 20; i++) {
  run(i);
}

@viking
Copy link

viking commented Dec 14, 2015

Sorry, just realized this project was archived. I'll look in the new project.

@ChALkeR
Copy link
Member

ChALkeR commented Dec 15, 2015

Is this issue the same as nodejs/node#3113 + nodejs/node#1741?
If so, I guess that this could be closed and discussion moved there.

gibfahn pushed a commit to ibmruntimes/node that referenced this issue May 6, 2016
Notable changes

* buffer: safeguard against accidental kNoZeroFill (Сковорода Никита Андреевич) [nodejs/node-private#35](/~https://github.com/nodejs/node-private/pull/35)
* deps: upgrade openssl sources to 1.0.2h (Shigeki Ohtsu) [nodejs#6552](nodejs/node#6552)
utterstep pushed a commit to lutik-inc/node that referenced this issue Jun 1, 2016
Notable changes

* buffer: safeguard against accidental kNoZeroFill (Сковорода Никита Андреевич) [nodejs/node-private#35](/~https://github.com/nodejs/node-private/pull/35)
* deps: upgrade openssl sources to 1.0.2h (Shigeki Ohtsu) [nodejs#6552](nodejs/node#6552)
@Trott Trott closed this as completed Apr 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests