Browser crashes (OOM?) when using a custom ReadableStream
#145
Comments
StreamSaver isn't a silver bullet when it comes to pulling/pushing data to the service worker over MessageChannel; that part is sort of half implemented. MessageChannel is used when transferable streams aren't available, and it flushes the queue pretty much instantaneously. The writable/readable strategies were never implemented before I learned that transferable streams had become available, but it would be possible to emulate that behaviour on top of postMessage() (see the sketch after this comment).

Browsers aren't a silver bullet when it comes to pulling data either. Downloads are atomic, meaning the browser starts pulling data and writing it to a temporary file immediately, even before you have selected a place/name to save to. When the stream closes, it moves/renames the temporary file to where you want your file to end up. Still, you are able to pause the stream so it stops pulling data - this is possible in Canary or with some experimental flags enabled, but that is b/c it then uses transferable streams instead of MessageChannel.

In most scenarios you don't have all the data buffered up in the browser at one single point, and this hasn't been an issue for anyone yet - often you are waiting on some data to arrive anyway. So maybe one "dirty" solution would be just to delay the writes:

```js
async pull (controller) {
  await sleep(100)
  controller.enqueue(data)
}
```
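A minimal sketch of what that postMessage()-based backpressure emulation could look like - this is not StreamSaver's actual protocol; the `'pull'` message and the `makePullDrivenWriter` helper are invented for illustration:

```js
// Hypothetical sketch: pull-driven backpressure over a MessageChannel.
// The consumer (e.g. the service worker) posts 'pull' each time it has
// flushed a chunk and is ready for the next one; the producer waits for
// an outstanding pull before posting more data.
function makePullDrivenWriter (port) {
  let pending = 0   // pulls received but not yet matched with a chunk
  let wakeUp = null // resolver for a write() currently waiting on a pull

  port.onmessage = evt => {
    if (evt.data === 'pull') {
      pending++
      if (wakeUp) { wakeUp(); wakeUp = null }
    }
  }

  // Assumes a single sequential writer awaiting each write in turn.
  return async function write (chunk) {
    while (pending === 0) {
      await new Promise(resolve => { wakeUp = resolve })
    }
    pending--
    port.postMessage(chunk)
  }
}
```

The consumer side would then post `'pull'` back on its port whenever its own write to disk resolves.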
Thanks a lot for this information @jimmywarting, I didn't realize some of the implementation quirks (clearly, I haven't spent enough time reading the source code). I will try out your suggestion, and also look into changing my implementation a little: I will report back with the results.
Hey @jimmywarting, your suggestion works perfectly - yielding to the browser for a millisecond or so seems to prevent the OOM described in this ticket. My final implementation, which worked for >10 GB sizes, is:

```js
const TOTAL_SIZE = 10_000_000_000;
const CHUNK_SIZE = Math.pow(2, 18);
const fileStream = window.streamSaver.createWriteStream('filename.txt', {
size: TOTAL_SIZE,
// These *should* ensure that only one chunk is queued before
// flushing to the output file?
writableStrategy: new CountQueuingStrategy({ highWaterMark: 1 }),
readableStrategy: new CountQueuingStrategy({ highWaterMark: 1 }),
});
function sleep(msec) {
return new Promise((resolve, reject) => {
window.setTimeout(() => resolve(), msec);
});
}
// A source that just produces NUL bytes per-pull until all data is
// pulled
class UnderlyingSource {
constructor() {
this.remaining = TOTAL_SIZE;
}
start(controller) {
}
async pull(controller) {
await sleep(1);
const buffer = new ArrayBuffer(CHUNK_SIZE);
if (this.remaining > 0) {
const readSize = Math.min(this.remaining, CHUNK_SIZE);
this.remaining -= readSize;
controller.enqueue(new Uint8Array(buffer, 0, readSize));
} else {
controller.close();
}
}
}
const ponyfill = window.WebStreamsPolyfill || {};
const reader = window.ReadableStream.prototype.pipeTo ?
new window.ReadableStream(new UnderlyingSource()) :
new ponyfill.ReadableStream(new UnderlyingSource());
reader.pipeTo(fileStream).catch(err => console.error(err));
```
ps. another performance trick would be to make all the chunks you are sending transferable as well, so the browser doesn't have to copy the data over.

Lines 263 to 266 in a6ec1df

```js
channel.port1.postMessage(chunk, [ chunk ])
```

But doing so can have side effects in the main thread if you plan to reuse the chunk.
@jimmywarting great suggestion - I've been getting high CPU usage from having to copy the chunks. Looking at the Transferable docs, it looks like it's an interface that the browser vendors implement (as in, I can't implement it for a custom class). I don't need to recycle the chunks, but the issue (I think?) is that I would need to pass either an `ArrayBuffer` or another natively transferable object. What's the best strategy for me to adapt the above code to get transferable semantics?
Correct, you can't implement transferability on a custom class. As for the readable stream chunks: the Response expects a byte stream of Uint8Array chunks and nothing else, so all chunks need to be written as such. But it is also possible to transfer ArrayBuffer views as well - you transfer the underlying buffer:

```js
var chunk = new Uint8Array(20)
console.log(chunk.length) // 20
mc.port1.postMessage(chunk, [chunk.buffer])
console.log(chunk.length) // 0 (the underlying buffer is now detached)
```

So if you would like to go fancy and have some zip protocol or something, you can have multiple views that use the same shared buffer:

```js
const buffer = new ArrayBuffer(20)
const payload = {
  header: new Uint8Array(buffer, 0, 5),
  body: new Uint8Array(buffer, 5, 10),
  footer: new Uint8Array(buffer, 15)
}
postMessage(payload, [buffer])
```
Now that I think of it... it may possibly be that MessageChannel is faster than transferable streams - I'd expect transferable streams to clone the data, but I haven't measured it.
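For what it's worth, a rough way to compare the clone vs. transfer cost of postMessage() over a MessageChannel could look like the following - an illustrative sketch only, not a rigorous benchmark, and it doesn't measure transferable streams themselves:

```js
// Time posting N chunks over a MessageChannel, first structured-cloned,
// then transferred (zero-copy). Numbers will vary wildly between browsers.
const mc = new MessageChannel()
const N = 100
const CHUNK_SIZE = 1 << 20 // 1 MiB

function postMany (transfer) {
  return new Promise(resolve => {
    let received = 0
    const t0 = performance.now()
    mc.port2.onmessage = () => {
      if (++received === N) resolve(performance.now() - t0)
    }
    for (let i = 0; i < N; i++) {
      const chunk = new Uint8Array(CHUNK_SIZE)
      mc.port1.postMessage(chunk, transfer ? [chunk.buffer] : [])
    }
  })
}

postMany(false).then(cloneMs =>
  postMany(true).then(transferMs =>
    console.log({ cloneMs, transferMs })))
```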
As expected - I didn't read it all.
Would you like to help take a stab at improving StreamSaver with a PR? It could help solve this backpressure issue (including sending back a cancel event, #13), and potentially include an option for making the chunks transferable:

```js
if (option.transfer) {
  channel.port1.postMessage(chunk, [ chunk ])
} else {
  channel.port1.postMessage(chunk)
}
```
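From the caller's side, such an option might be opted into roughly like this (hypothetical - no `transfer` option exists in StreamSaver today):

```js
// Hypothetical API sketch: the caller opts into transferable chunks and
// promises not to touch a chunk after writing it, since its underlying
// buffer would be detached by the transfer.
const fileStream = streamSaver.createWriteStream('filename.txt', {
  size: TOTAL_SIZE,
  transfer: true // proposed option, not implemented
})
```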
Looking at this upcoming project I'm spinning up on, I will almost certainly hit some performance and compatibility issues that might necessitate upstreaming some features into this library, yeah. However, I'll need to talk with my colleagues, and also push my work forward a bit more, before I can commit to anything concrete - it may turn out that a completely unrelated project constraint means we don't move forward with an implementation that uses this. If I do get the time, in or outside work, I'll have a stab at those PRs, but I can't currently make any promises. Regardless, the library looks very promising and has definitely opened up some new avenues I didn't previously think were possible in a browser sandbox.
As for this particular issue, your feedback has been really helpful - I now have an implementation that can stream large (40 GB) datasets to a file on Chrome, Edge (80+; I had to monkeypatch for earlier versions), and Firefox 73.
Hi, this library seems very useful for a project I'm working on, which involves downloading extremely large (>10 GB) datasets (with client-side transformations). Skipping the details a bit, my use-case is going to involve copying a lot of buffers from wasm contexts, rather than the data source being a `Response` object (for example).

As a first step, I wrote a prototype `underlyingSource` that generates null-initialized buffers and `enqueue`s them onto the controller (code example below). This implementation works as expected for smaller file sizes on the browsers I'm targeting at the moment (Firefox 73, Chromium 80); however, once I start increasing `TOTAL_SIZE` above ~100 MB on my dev machine, the browser session will just crash.

I understand that the implementation below could recycle buffers using something like a `ReadableByteStream` or something more appropriate, but my understanding is that the implementation should start sinking the file one chunk at a time, applying backpressure where necessary (so that `pull` isn't over-called). However, debugging shows that the implementation calls `pull` thousands of times, with no file prompt, and then crashes. What have I done wrong?
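As an aside, the buffer-recycling idea mentioned above could look roughly like the sketch below, using a byte stream (`type: 'bytes'`) so the consumer can supply its own buffers. This is illustrative only - BYOB support varies across the browsers discussed in this thread, and it was not part of the workaround that resolved this issue:

```js
// Sketch: a byte source that answers reader-supplied (or auto-allocated)
// buffer requests on demand instead of allocating a fresh chunk per pull.
const TOTAL_SIZE = 10_000_000_000;
const CHUNK_SIZE = 1 << 18;
let remaining = TOTAL_SIZE;

const byteStream = new ReadableStream({
  type: 'bytes',
  autoAllocateChunkSize: CHUNK_SIZE, // provides a buffer even for default readers
  pull(controller) {
    const request = controller.byobRequest;
    if (remaining <= 0) {
      controller.close();
      // A pending BYOB request must be answered with respond(0) after close.
      if (request) request.respond(0);
      return;
    }
    if (request) {
      // Commit a prefix of the buffer handed to us. For this NUL-byte demo
      // we rely on auto-allocated buffers being zero-filled; a real source
      // would write its data into request.view here.
      const readSize = Math.min(remaining, request.view.byteLength);
      remaining -= readSize;
      request.respond(readSize);
    } else {
      const readSize = Math.min(remaining, CHUNK_SIZE);
      remaining -= readSize;
      controller.enqueue(new Uint8Array(readSize));
    }
  }
});
```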