Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

clarify the 'close' event #8209

Closed
wants to merge 1 commit into from
Closed

clarify the 'close' event #8209

wants to merge 1 commit into from

Conversation

jeromew
Copy link

@jeromew jeromew commented Aug 20, 2014

After a discussion with @jonathanong and @dougwilson related to the raw-body module, we realized that maybe the documentation for the "close" event is not correct on node.js documentation.

This PR proposes to start a discussion on the meaning of the 'close' event or accept the definition given by @jonathanong which seem to be more in line with what is done in stream modules on npm.

The current definition seem to imply that the close event is only related to the closing of an underlying ressource, meaning that we could receive and 'end' event after the 'close' event if the underlying resource has been close but we still have some data in the internal buffers.

The current definition is also silent on the case when there are many underlying resources (for example many file descriptors). Should we have many 'close' event or just one ?

added to that, it seems that a dest.'close' event is also used to unpipe automatically a destination in /~https://github.com/joyent/node/blob/master/lib/_stream_readable.js#L564 which probably makes sense but seem to be far fetched if 'close' is only related to underlying ressources of the Readable part of a stream.

I don't know if they are all interested in this thread but their stream knowledge could be very beneficial if they can share their experience working with streams: @dominictarr, @substack, @chrisdickinson

@jeromew
Copy link
Author

jeromew commented Aug 20, 2014

link to original discussion: stream-utils/raw-body#27 (comment)

@dominictarr
Copy link

I agree the official documentation for "close" has never really been good. Officially it said "when the underlieing resource has been disposed of" but after reading all the classic streams code I formed an interpretation of the meaning of close not from the documentation but from the actual way that streams actually behaved.

I see "close" as "this stream is totally ended, and will never emit another event" anything active about the stream has finished, flushed to disk, flushed to the kernel etc. Streams are an abstraction - not every stream has an underling resource - but that doesn't mean they shouldn't have a close event.

The close event is still useful - but it's more useful if all streams have it.

It sounds like @jonathanong has come to the same conclusion.

There was a brief period in 0.8 where all core streams had close but
i think that went away with streams2.

@jeromew
Copy link
Author

jeromew commented Aug 20, 2014

@dominictarr, since node core cannot know exactly when a userland readable stream has finished its operations on its underlying resources, the 'close' event cannot always be emitted by node core.

maybe we could require that 'close' should always be sent. This way it could be used as a cleanup-catch-all as @jonathanong seem to be using it.

Do you see a need for optional reserved events related to the underlying ressources (like 'attached' or 'detached' ?)

I think the semantic of 'close' is not clear because we have a kind of silent pre-roll, post-roll currently in streams2 :

  1. setup readable stream
  2. open underlying ressource
  3. pre-roll: internal buffers may be filled up to highWaterMark
  4. waiting for a consumer to be piped
  5. piping
  6. no more data on underlying ressource
  7. post-roll: internal buffers are flushed
  8. 'end' is emitted

The documentation seem to state that 'close' may be sent right after 6 ; you seem to agree with @jonathanong that 'close' should not be emitted before 8.

in the hope that I am not just creating more noise, I add @isaacs in this thread maybe he can help us understand the role of the 'close' event since from what I remember he has worked very hard on 'stream2' at the time.

@dominictarr
Copy link

close is usually more significant on a writable stream - when close is emitted on a fs writable then it means you are guaranteed to be able to open that file and read the same thing back out.

emitting close should be the last thing that happens. I don't think there needs to be a standard for streams that happen to have many underling resources. not only is that rare, but if a user needs to know about that, they are already coupled to the particular type of stream.

@trevnorris
Copy link

I think this would be self resolved by finishing up #6802.

See, right now the 'close' event is run in a process.nextTick(), even though it hasn't actually been fully closed. Thus far it hasn't been a priority so I've let the PR get behind, but IMO it's important to run the callback in the correct phase of the event loop.

@jeromew
Copy link
Author

jeromew commented Aug 21, 2014

@trevnorris I tend to agree that calling the callback in the correct phase of the event loop is a + (I have no idea of the impact such a move could have).

nevertheless, finishing #6802 would not change the fact that we are not clear on the semantics of the 'close' event.

If I read the current documentation, I understand that we could have a case where a descriptor is closed because no more data will be fecthed from it (and the 'close' callback is called). Some data could still be in internal buffer (because the downstream stream has reached its highWaterMark).
And when this data is flushed, and 'end' will be emitted.

Thus, in my understanding, the doc says it is ok to have 'close' before 'end'.

This is what @dominictarr and @jonathanong say we should not have because 'close' should be the last event if we want it to be useful.

@trevnorris do you agree with the fact that the 'close' event should be the last event ever emited by a stream (be it readable, writable or both) ?

@trevnorris
Copy link

@jeromew That's not possible to guarantee with the new pull based Streams3 API. When incoming data was immediately fed to the user on the 'data' event, then yes I would agree. Though since the new 'readable' API it's possible for the user to close the connection and still have data internally buffered and waiting to be read.

'close' and 'end' are related to different resources. As soon as the connection is fully closed then I expect to be notified. In fact it is an advantage to the developer to know that it's impossible for more data to be received from, say, a readable stream but that there is still more data internally queued and waiting to be read.

So I disagree that 'close' must always fire after 'end'. It would be incorrect. The event should be emitted when the event is completed. Not artificially withheld. That would mislead the developer to how many resources are actually in use. e.g. the connection closes and low level resources are cleaned up, classes are destructed, etc. The only thing that remains is the Streams JS object that contains the buffered data waiting to be read by the user.

The current definition is also silent on the case when there are many underlying resources (for example many file descriptors). Should we have many 'close' event or just one ?

Can you please give an example scenario where a single stream is linked to multiple file descriptors? If that's possible then, IMO, it's a design flaw.

@dominictarr

I see "close" as "this stream is totally ended, and will never emit another event"

That's impossible to guarantee. Streams inherit from EventEmitter and can emit whenever and however they want. In fact, I could even hijack a TCP Stream and emit data from JS even though the data hasn't come over TCP. That's the double edge sword of an abstraction layer.

But honestly, this is all just bikeshedding the name of events. For the sake of argument let's rename 'close' to 'cannotReceiveMoreDataFromIntendedResource'. The event itself is important. The timing of when the event is fired is important. So it seems there's a bigger question of whether more events should be added, and what they should be named.

@dominictarr
Copy link

@trevnorris hmm, well that interpretation certainly worked well in classic streams.

I think the problem here is that we are discussing this totally in abstract - such discussions normally get nowhere, and or wind up going down some weird path that you'd probably never actually use in practice.

So - we should ask, what do we use the close event for, in practice.

I can think of a few examples where I have used the close event.
I've written to an fs stream, and waited for close to know that it's certainly on disk,
(or at least, passed to the kernel) and so a any fs.read will get the same data.

I also used the close event in the same way leveldb, writing to a stream, and waiting for the close event before reading again. Hence, the interpretation that 'close' means the stream is done.

I am having trouble thinking of a situation where I have needed close on a readable stream.
normally I'd just set a listener on close to check if the stream has ended.
(in 0.8 and before, some net streams could close without emitting end, if they terminated before getting a fin packet.)

@jeromew
Copy link
Author

jeromew commented Aug 22, 2014

I can answer with both a theoric & concrete example.

@trevnorris you ask what kind of stream could have 2 fd open. I can totally imagine a userland readable stream opening 2 files containing respectively data for left and right speakers ; ans mixin them. or a userland writable stream opening 2 files to separate audio from video in a video file.
That may be too far fetched I don't know but those are things I imagine node.js could do with streams.

clearly your description corresponds to the current documentation for readable streams (by the way, the "close" event documentation is located in the Readable part) :

  • there is a subsystem resource that we opened because we want to read from it
  • we close is as fast as we can to avoid keeping things open and emit 'close' immediatly (net: WIP run close callbacks in correct eloop phase #6802 would certainly help with that)
  • js land can take its time to finish pushing the internal buffers (it could take time in the case of a very slow consumer with low highWaterMark)
  • then the 'end' is emitted

so in a r.pipe(w) case, r will emit close (optional) then end

Reading @dominictarr, the 'close' event seem to have also been used in userland for Writable streams. In the r.pipe(w) scenario

  • r emits 'end' event which signals that the readable part of the stream has been totally flushed
  • the stream has to finish writing to its ressources
  • and then w emits 'close' event as a guarantee that everything is finished and that the data we just wrote can be read safely from the resource (modulo kernel)

@trevnorris do you agree that 'close' can be rightfully sent by a writable stream ?

what happens with a duplex stream that writes to the input pin of a dsp and reads from the output pin ? should it send 2 events ?

Now my practical issue from which all of this is coming is an incompatibility between the 2 modules

duplex-child-process tries to wrap a child_process.spawn as a duplex stream.
`raw-body' is used to re-assemble a buffer from the stream.

The problem comes when I use a duplex-child-process proc with raw-body' in the scenariogetRawBody(proc)`

raw-body listens to both 'end' and 'close' events from proc (cf /~https://github.com/stream-utils/raw-body/blob/master/index.js#L89) but 'close' does a cleanup that wipes the already received buffer parts. so if proc emits 'close' then 'end', it breaks.

before I did a hack (0.0.5) to make sure 'close' is sent after 'end', duplex-child-process used to send 'close' when spawn sends 'exit', because we could consider that the resource (the process) is closed. The readable part of the process may end later when it is piped somewhere. It may as well never 'end' if it is not piped somewhere.

In the case getRawBody(proc), since there is a listener on 'data', 'end' is emitted.

In my mind, duplex-child-process should be allowed to send 'end' after 'close' but this is clearly incompatible with raw-body

According to @jonathanong 'close' should always be sent after 'end' and this is the way it goes in many modules.

Does it makes things more clear on my initial pain point and my API-wise headache ?

Here are a few questions for which the answer does not seem clear to me if you still have some courage (thanks a lot @trevnorris and @dominictarr for giving your very insightfull help on this)

  • do we agree that writable streams are allowed to emit 'close' events ? (cf http://nodejs.org/api/stream.html)
  • do you think that the fact that raw-body breaks when it receives 'close' before 'end' is a bug ? Can we guarantee that 'end' is always fired on an incomingConnection in ALL cases ? It seems that 'close' is used here as an anti-leak fallback in the case that 'end' is not sent.
  • in the case of a duplex that has a resource on the write and a resource on the read side, should it be allowed to emit 2 'close' events ? should we add a reference to the closed resource in the 'close' event callback ?
  • in the r.pipe(w) scenario, if w emits 'close' then it is unpiped automatically. what does it say about 'close' ?

@dominictarr
Copy link

@jeromew the most important part of my question is how do the clients of the streams interface with the close event?
In the case of a T stream (that splits one input into two streams), I think this could be defined much better if it just took two streams instead of two files. That way you could use it to write to two network sockets, also. It should emit close when all of it's component streams have emitted close.

@jasnell
Copy link
Member

jasnell commented Aug 14, 2015

Closing this here. A new PR has been opened at http://github.com/nodejs/node

@jasnell jasnell closed this Aug 14, 2015
jasnell added a commit to jasnell/node that referenced this pull request Aug 26, 2015
jasnell added a commit to nodejs/node that referenced this pull request Aug 26, 2015
per: nodejs/node-v0.x-archive#8209

originally submitted by @jeromew

Reviewed By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
PR-URL: #2378
jasnell added a commit to nodejs/node that referenced this pull request Aug 26, 2015
per: nodejs/node-v0.x-archive#8209

originally submitted by @jeromew

Reviewed By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
PR-URL: #2378
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants