Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iojs.exe randomly terminates on Windows #1512

Closed
alexlamsl opened this issue Apr 23, 2015 · 63 comments
Closed

iojs.exe randomly terminates on Windows #1512

alexlamsl opened this issue Apr 23, 2015 · 63 comments
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. confirmed-bug Issues with confirmed bugs. windows Issues and PRs related to the Windows platform.

Comments

@alexlamsl
Copy link

The iojs.exe process in question is running one of http and https servers.

The problem seems to occur if a separate iojs process is running, and is subsequently terminated, e.g. manually by pressing Ctrl+C within a Command Prompt window, then the aforementioned process would terminate as well, whether it is running in the background or in (a separate) Command Prompt window.

@Fishrock123
Copy link
Contributor

@alexlamsl can you give us any other info? io.js and windows versions? Memory usage?

@alexlamsl
Copy link
Author

Sorry @Fishrock123 - mistyped TAB then Enter and submitted a mostly empty comment above (=.=")

@Fishrock123 Fishrock123 added the windows Issues and PRs related to the Windows platform. label Apr 23, 2015
@alexlamsl
Copy link
Author

When the iojs process terminates unexpected, most of the time there isn't any output to stdin or stderr - so far only once have I see the following output:

Assertion failed: (req_.data) == (this), file c:\workspace\iojs+release\nodes\iojs-win2008r2-release-x64\src\req-wrap-inl.h, line 29

I have encountered this issue on io.js 1.7.1 and 1.8.1 - I have not tried any earlier versions.

@Fishrock123
Copy link
Contributor

@alexlamsl does it happen if you have any two iojs processes running and quite one, or only when one is running http?

@alexlamsl
Copy link
Author

In addition, as I first suspected this issue might have something to do with https, I ran other iojs processes to continuously hit the first process via https. It ran fine after churning through ~50GB of data in both occasions (takes about 3 hours), then only when I press Ctrl+C on the tester process would I get the original process to terminate.

I tried running the tester process on a separate PC over the network, and cannot reproduce this problem so far.

@alexlamsl
Copy link
Author

Under normal circumstances, the https process would run in the background forever, and every now and then some scheduled iojs process would run to perform some http requests to external resources (NOT to the background https process), and they are expected to exit gracefully.

I suspect there might be occasions when those schedule processes terminate in an erroneous condition, and triggers this random termination behaviour on the https process.

@alexlamsl
Copy link
Author

Running just iojs (the prompt mode?) and press Ctrl+C twice to quite does not trigger the https process to terminate.

@alexlamsl
Copy link
Author

Something else that is probably irrelevant as well - when I listen to clientError events in an attempt to debug, I get these every now and then:

{ [Error: Parse Error] bytesParsed: 0, code: 'HPE_INVALID_METHOD' }
{ _connecting: false,
  _hadError: false,
  _handle: null,
  _parent: null,
  _host: null,
  _readableState:
   { objectMode: false,
     highWaterMark: 16384,
     buffer: [],
     length: 0,
     pipes: null,
     pipesCount: 0,
     flowing: true,
     ended: false,
     endEmitted: false,
     reading: true,
     sync: false,
     needReadable: true,
     emittedReadable: false,
     readableListening: false,
     defaultEncoding: 'utf8',
     ranOut: false,
     awaitDrain: 0,
     readingMore: true,
     decoder: null,
     encoding: null,
     resumeScheduled: false },
  readable: false,
  domain: null,
  _events:
   { end: [ [Object], [Function: socketOnEnd] ],
     finish: [Function: onSocketFinish],
     _socketEnd: [Function: onSocketEnd],
     drain: [ [Function: ondrain], [Function: socketOnDrain] ],
     timeout: [Function],
     error: [Function: socketOnError],
     close: [Function: serverSocketCloseListener],
     data: [Function: socketOnData],
     connect: { [Function: g] listener: [Function] } },
  _eventsCount: 9,
  _maxListeners: undefined,
  _writableState:
   { objectMode: false,
     highWaterMark: 16384,
     needDrain: false,
     ending: false,
     ended: false,
     finished: false,
     decodeStrings: false,
     defaultEncoding: 'utf8',
     length: 0,
     writing: false,
     corked: 0,
     sync: true,
     bufferProcessing: false,
     onwrite: [Function],
     writecb: null,
     writelen: 0,
     bufferedRequest: null,
     lastBufferedRequest: null,
     pendingcb: 0,
     prefinished: false,
     errorEmitted: true },
  writable: false,
  allowHalfOpen: true,
  destroyed: true,
  bytesRead: 38,
  _bytesDispatched: 0,
  _pendingData: null,
  _pendingEncoding: '',
  server:
   { domain: null,
     _events:
      { request: [Function],
        connection: [Function: connectionListener],
        clientError: [Object] },
     _eventsCount: 3,
     _maxListeners: undefined,
     _connections: 0,
     _handle:
      { fd: -1,
        reading: false,
        owner: [Circular],
        onread: null,
        onconnection: [Function: onconnection],
        writeQueueSize: 0 },
     _usingSlaves: false,
     _slaves: [],
     _unref: false,
     allowHalfOpen: true,
     pauseOnConnect: false,
     httpAllowHalfOpen: false,
     timeout: 120000,
     _connectionKey: '6::::80' },
  _idleTimeout: -1,
  _idleNext: null,
  _idlePrev: null,
  _idleStart: 4540549,
  parser:
   { '0': [Function: parserOnHeaders],
     '1': [Function: parserOnHeadersComplete],
     '2': [Function: parserOnBody],
     '3': [Function: parserOnMessageComplete],
     _headers: [],
     _url: '',
     socket: [Circular],
     incoming: null,
     maxHeaderPairs: 2000,
     onIncoming: [Function: parserOnIncoming] },
  _paused: false,
  read: [Function],
  _consuming: true }

@alexlamsl
Copy link
Author

Memory usage is around 60~250MB, on a machine that has 64GB, running Windows Server 2012 R2

@alexlamsl
Copy link
Author

If I were to run 3 tester processes simultaneously on the same machine as the https process, quitting one of them would trigger this issue.

However, the other 2 tester processes would continue to run, albeit not being able to connect to anything obviously, as the https process has (unexpectedly) terminated.

@alexlamsl
Copy link
Author

I just tried forceful termination of iojs processes which I spawn and does not use http - does not reproduce this issue.

@alexlamsl
Copy link
Author

@Fishrock123 any other combination of tests you would like me to try?

I will try and see if I can narrow something down over the weekend - but without any console output, I find a bit hard to even give an educated guess atm.

@silverwind
Copy link
Contributor

Just a wild guess, but could you remove the node.exe hardlink and rename iojs.exe to node.exe and check if it still occurs?

I once had a similar issue on Linux with a node process backgrounded through the shell, and launching another node would terminate the first. Launching the second process with iojs did not kill it.

@alexlamsl
Copy link
Author

@silverwind on Windows, iojs.exe and node.exe looks like two byte-by-byte identical files (as opposed to hardlinks)

Anyway, I tried removing node.exe, then renaming iojs.exe to node.exe, then run the tests:

// leave this one running
// https server that does fs.createReadStream(...).pipe(response)
> node .


// run for ~5 mins, then Ctrl+C
// hits first server with 10 simultaneous, continuous https requests
// ~400req/s, ~20MB/s
> node test.js

Problem still persists (in that the first process sometimes terminates, and when it does it may print an assertion error).

Assertion failed: (req_.data) == (this), file c:\workspace\iojs+release\nodes\iojs-win2008r2-release-x64\src\req-wrap-inl.h, line 29

@alexlamsl
Copy link
Author

So far I can only get this to happen using https - http seems to be okay (or more resilient, as the behaviour is not 100% repeatable)

@silverwind
Copy link
Contributor

@silverwind on Windows, iojs.exe and node.exe looks like two byte-by-byte identical files (as opposed to hardlinks)

I'm not sure you can discern a hardlink from a copy in the GUI, but it should be one. Anyways, it's not the cause of this issue at least, and I think a reduced test case is needed.

@alexlamsl
Copy link
Author

@silverwind @Fishrock123 This is my best attempt thus far:

server.js

var fs = require('fs');
require('https').createServer({
  key:fs.readFileSync('test.key'),
  cert:fs.readFileSync(test.crt'),
  rejectCert:false,
  rejectUnauthorized:false
}, function(request, response) {
  fs.createReadStream('test.crt').pipe(response);
}).listen(443);

client.js

var https = require('https');

function get() {
  https.get({
    url:'https://localhost/',
    rejectUnauthorized:false
  }, function(response) {
    response.resume();
    get();
  }).on('error', console.error);
}

for (var i = 0; i < 10; i++) {
  get();
}

Steps:

  • Run iojs server.js under Command Prompt ("Window 1")
  • Run iojs client.js under Command Prompt ("Window 2")
  • Leave them running for ~5 minutes
  • In "Window 2", press Ctrl+C

Expected:

  • iojs in "Window 2" terminates
  • iojs in "Window 1" continues to run

Actual (sometimes, not always):

  • both iojs processes terminates
  • sometimes an assertion error is printed in "Window 1"

@alexlamsl
Copy link
Author

On the chances of hitting this bug, bigger files seem to help:

  // 2KB, hits bug <20% of the time
  fs.createReadStream('test.crt').pipe(response);

  // 628KB, reproducible >50% of the time
  fs.createReadStream('mongodb.log').pipe(response);

Sleeping does not help:

  // no "improvement" in chances
  setTimeout(function() {
    fs.createReadStream('test.crt').pipe(response);
  }, 200);

@silverwind silverwind added confirmed-bug Issues with confirmed bugs. c++ Issues and PRs that require attention from people who are familiar with C++. labels Apr 26, 2015
@silverwind
Copy link
Contributor

Was able to reproduce, test repo here: /~https://github.com/silverwind/iojs-issue-1512.git. With the repo cloned, run server.js, and repeatedly start and CTRL-C the client.js until the server shuts down.

It looks to be Windows only, couldn't get it to reproduce on Linux. Also couldn't reproduce with http instead of https. No other indication except sometimes this:

Assertion failed: (req_.data) == (this), file c:\workspace\iojs+release\nodes\iojs
-win2008r2-release-x64\src\req-wrap-inl.h, line 29

@silverwind
Copy link
Contributor

The assertion points to a file added in #667, but it might've been in place before. Might be worth to try v1.2.0.

@alexlamsl
Copy link
Author

@silverwind Using the test case I've got (which I checked and believe to be identical to yours):

Works fine:

  • 1.2.0
  • 1.3.0
  • 1.4.3
  • 1.5.1

Crashes:

  • 1.6.4
  • 1.7.1
  • 1.8.1

@Fishrock123
Copy link
Contributor

That would mean it's somewhere in here: v1.5.1...v1.6.4

@alexlamsl can you narrow it down more between 1.5.1 and 1.6.4?

@alexlamsl
Copy link
Author

@Fishrock123 tested on a different machine (Win10 Pro x64, 4GB RAM)

Working:

  • 1.5.0
  • 1.5.1
  • 1.6.0
  • 1.6.1
  • 1.6.2

Crashes:

  • 1.6.3
  • 1.6.4
  • 1.7.1
  • 1.8.1

When the Assertion failed is printed out (sometimes), the messages are identical.

@silverwind
Copy link
Contributor

I'll git bisect this later, unless someone finds the offending commit in the 1.6.3 merge window.

@alexlamsl
Copy link
Author

@silverwind not sure how much this helps, but I have an updated test case which decouples fs from this issue (client.js is the same as before):

server.js

var fs = require('fs');
var data = fs.readFileSync('mongodb.log');
require('https').createServer({
  key:fs.readFileSync(test.key'),
  cert:fs.readFileSync('test.crt')
  rejectCert:false,
  rejectUnauthorized:false
}, function(request, response) {
  response.end(data);
}).listen(443);

@silverwind
Copy link
Contributor

You could also create a big dataset through new Array(1e6).join('.')

@alexlamsl
Copy link
Author

Yeah, doing that works, or just this even:

var data = new Buffer(1e6);

indutny added a commit that referenced this issue May 1, 2015
Dispatch requests in the implementation of the stream, not in the code
creating these requests. The requests might be piled up and invoked
internally in the implementation, so it should know better when it is
the time to dispatch them.

In fact, TLS was doing exactly this thing which led us to...

Fix: #1512
PR-URL: #1563
Reviewed-By: Shigeki Ohtsu <ohtsu@iij.ad.jp>
@Fishrock123
Copy link
Contributor

Should be temporarily fixed in 30b7349, and properly in a future libuv patch. Re-open if still an issue.

rvagg added a commit that referenced this issue May 4, 2015
PR-URL: #1532

Notable Changes:

* crypto: significantly reduced memory usage for TLS (Fedor Indutny & Сковорода
  Никита Андреевич) #1529
* net: socket.connect() now accepts a 'lookup' option for a custom DNS
  resolution mechanism, defaults to dns.lookup() (Evan Lucas) #1505
* npm: Upgrade npm to 2.9.0. See the v2.8.4 and v2.9.0 release notes for
  details. Notable items:
  - Add support for default author field to make npm init -y work without
    user-input (@othiym23) npm/npm/d8eee6cf9d
  - Include local modules in npm outdated and npm update (@ArnaudRinquin)
    npm/npm#7426
  - The prefix used before the version number on npm version is now configurable
    via tag-version-prefix (@kkragenbrink) npm/npm#8014
* os: os.tmpdir() is now cross-platform consistent and will no longer returns a
  path with a trailling slash on any platform (Christian Tellnes) #747
* process:
  - process.nextTick() performance has been improved by between 2-42% across the
    benchmark suite, notable because this is heavily used across core (Brian White) #1548
  - New process.geteuid(), process.seteuid(id), process.getegid() and
    process.setegid(id) methods allow you to get and set effective UID and GID
    of the process (Evan Lucas) #1536
* repl:
  - REPL history can be persisted across sessions if the NODE_REPL_HISTORY_FILE
    environment variable is set to a user accessible file,
    NODE_REPL_HISTORY_SIZE can set the maximum history size and defaults to 1000
    (Chris Dickinson) #1513
  - The REPL can be placed in to one of three modes using the NODE_REPL_MODE
    environment variable: sloppy, strict or magic (default); the new magic mode
    will automatically run "strict mode only" statements in strict mode
    (Chris Dickinson) #1513
* smalloc: the 'smalloc' module has been deprecated due to changes coming in V8
  4.4 that will render it unusable
* util: add Promise, Map and Set inspection support (Christopher Monsanto) #1471
* V8: upgrade to 4.2.77.18, see the ChangeLog for full details. Notable items:
  - Classes have moved out of staging; the class keyword is now usable in strict
    mode without flags
  - Object literal enhancements have moved out of staging; shorthand method and
    property syntax is now usable ({ method() { }, property })
  - Rest parameters (function(...args) {}) are implemented in staging behind the
    --harmony-rest-parameters flag
  - Computed property names ({['foo'+'bar']:'bam'}) are implemented in staging
    behind the --harmony-computed-property-names flag
  - Unicode escapes ('\u{xxxx}') are implemented in staging behind the
    --harmony_unicode flag and the --harmony_unicode_regexps flag for use in
    regular expressions
* Windows:
  - Random process termination on Windows fixed (Fedor Indutny) #1512 / #1563
  - The delay-load hook introduced to fix issues with process naming (iojs.exe /
    node.exe) has been made opt-out for native add-ons. Native add-ons should
    include 'win_delay_load_hook': 'false' in their binding.gyp to disable this
    feature if they experience problems . (Bert Belder) #1433
* Governance:
  - Rod Vagg (@rvagg) was added to the Technical Committee (TC)
  - Jeremiah Senkpiel (@Fishrock123) was added to the Technical Committee (TC)
@alexlamsl
Copy link
Author

@Fishrock123 I have just tried iojs 2.0.0, and the problem persists.

@alexlamsl
Copy link
Author

Reproduced >10 times without any errors in the console - iojs just terminates unexpectedly.

@mathiask88
Copy link
Contributor

Yes, the access violation is still there.

@indutny
Copy link
Member

indutny commented May 5, 2015

May I ask you to give a try to indutny@df28a73 commit? Does applying it fix it?

@mathiask88
Copy link
Contributor

@indutny I tried https://gist.github.com/saghul/e05b99b1b16a7c9c15e9 but got some errors appying the patch

Error   3   error C2039: 'u' : is not a member of 'uv_write_s'  ...\uv\src\win\tcp.c    872 1   libuv
Error   4   error C2039: 'stream' : is not a member of 'uv_tcp_s'   ...\uv\src\win\tcp.c    874 1   libuv

so I changed the patch like this (Really not knowing what I am doing there but I wanted it to compile to test if it works if libuv does not return the error synchronously)

    req->queued_bytes = 0;
    handle->reqs_pending++;
    handle->write_reqs_pending++;
    REGISTER_HANDLE_REQ(loop, handle, req);
    SET_REQ_ERROR(req, WSAGetLastError());
    uv_insert_pending_req(loop, (uv_req_t*)req);

And I tested a while and it seems to work. Sure, I can test your commit.

@alexlamsl
Copy link
Author

@indutny the patchset is in C++, and I don't have the compiler infrastructure ready over here to test it. Sorry.

@mathiask88
Copy link
Contributor

@indutny I cherrypicked your commit, rebuilt and the server is still crashing but due to an other error HEAP[iojs.exe]: HEAP: Free Heap block 000000C5AD74CA50 modified at 000000C5AD74CA98 after it was freed

@silverwind
Copy link
Contributor

@mathiask88 you were using the patch for latest libuv master, https://gist.github.com/saghul/52677dba652ab8e78051 is for the version io.js uses (#1563).

@indutny
Copy link
Member

indutny commented May 5, 2015

Will try figure it out today on io.js level. I think that df28a73 should work after some tweaking.

@mathiask88
Copy link
Contributor

@silverwind Ah, ok thanks. Then I did everything right ;) However with the libuv patch this problem seems to be gone.

@silverwind
Copy link
Contributor

I think @saghul intends to include the original fix for libuv/libuv#339 in libuv 1.5.0 (#1563 (comment)), maybe it' best to wait on that before applying more 'hacks'. 😉

@silverwind
Copy link
Contributor

And it landed: libuv/libuv@f880023.

bnoordhuis pushed a commit that referenced this issue May 6, 2015
Fixes: #1397
Fixes: #1512
Fixes: #1621
Fixes: #862
PR-URL: #1646
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@bnoordhuis
Copy link
Member

Should be fixed by 04cc03b. Closing, holler if I should reopen it.

@silverwind
Copy link
Contributor

I verified my test case again, seems to be fixed for real this time.

@alexlamsl
Copy link
Author

Verified on iojs 2.0.1. Thanks everyone!

Fishrock123 pushed a commit to Fishrock123/node that referenced this issue May 19, 2015
Dispatch requests in the implementation of the stream, not in the code
creating these requests. The requests might be piled up and invoked
internally in the implementation, so it should know better when it is
the time to dispatch them.

In fact, TLS was doing exactly this thing which led us to...

Fix: nodejs#1512
PR-URL: nodejs#1563
Reviewed-By: Shigeki Ohtsu <ohtsu@iij.ad.jp>
Fishrock123 pushed a commit to Fishrock123/node that referenced this issue May 19, 2015
Fixes: nodejs#1397
Fixes: nodejs#1512
Fixes: nodejs#1621
Fixes: nodejs#862
PR-URL: nodejs#1646
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@reverofevil
Copy link

@silverwind Nah, you didn't.

Assertion failed: (consumed_) == (false), file c:\workspace\iojs+release\nodes\w
in2008r2-release-x64\src\stream_base.h, line 203

Node 5.0.0

@alexlamsl
Copy link
Author

@polkovnikov-ph I believe you are hitting a different bug - may be file a new issue to track it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. confirmed-bug Issues with confirmed bugs. windows Issues and PRs related to the Windows platform.
Projects
None yet
Development

No branches or pull requests

7 participants