
Running out of memory copying large files #164

Closed · AndreiRinea opened this issue Jun 15, 2020 · 5 comments

@AndreiRinea commented Jun 15, 2020

Hi,

I am doing a PoC with StreamSaver.js in which the user supplies a (very large) file that I read sequentially and then save to the downloads via StreamSaver.js. The issue is that with a 64 GB file the browser runs out of memory (around 60 GB of RAM) and the download fails (the process seems to be terminated). Am I using StreamSaver.js wrong? Here's the code:

    const streamSaver = window.streamSaver;
    const chunkSize = 64 * 1024 * 1024; // 64 MB

    // Reads the file sequentially in chunkSize slices via FileReader,
    // calling process(arrayBuffer) per chunk, then end() or error().
    function parseFile(file, process, end, error) {
        const fileSize = file.size;
        let offset = 0;
        let chunkReaderBlock = null;

        let readEventHandler = function (evt) {
            if (evt.target.error == null) {
                offset += evt.target.result.byteLength;
                process(evt.target.result);
            } else {
                error(evt.target.error);
                return;
            }
            if (offset >= fileSize) {
                end();
                return;
            }
            // noinspection JSValidateTypes
            chunkReaderBlock(offset, chunkSize, file);
        }

        chunkReaderBlock = function (_offset, length, _file) {
            const fileReader = new FileReader();
            const blob = _file.slice(_offset, length + _offset);
            fileReader.onload = readEventHandler;
            fileReader.readAsArrayBuffer(blob);
        }

        chunkReaderBlock(offset, chunkSize, file);
    }

    function copyFile() {
        const file = document.getElementById("fileInput").files[0]; // skipped file selection validation

        const writeStream = streamSaver.createWriteStream(file.name/*, {size: file.size}*/);
        const writer = writeStream.getWriter();

        parseFile(file, async ab => {
            await writer.write(new Uint8Array(ab));
        }, async () => {
            await writer.close();
        }, async () => {
            await writer.abort();
        });
    }

The HTML around it is:

    <html>
    <head>
        <title>File copying code sample</title>
    </head>
    <body>
    <script src="https://cdn.jsdelivr.net/npm/web-streams-polyfill@2.0.2/dist/ponyfill.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/streamsaver@2.0.3/StreamSaver.min.js"></script>
    
    <script type="text/javascript">
        // the JavaScript code shown above goes here
    </script>
    <h3>(Very large) file copying client-side example</h3>
    <input type="file" id="fileInput"/>
    <button onclick="copyFile()">Copy file</button>
    </body>
    </html>
@AndreiRinea (Author)

As another test I commented out the await writer.write(new Uint8Array(ab)); line, and that way it doesn't end up in an out-of-memory state. Obviously it won't write the destination file either, but it makes me think that the leak comes from either the Uint8Array instance or writer.write(...)...

@jimmywarting (Owner) commented Jun 18, 2020

Backpressure isn't perfect in StreamSaver.
There are two ways chunks can be transferred to the service worker:

  • Transferable streams: the first method StreamSaver will use, if it's available
  • postMessage: the second method; this has some issues since the two sides aren't tightly coupled (piped)

Now, you will likely not be using transferable streams, since at the moment they are only available behind a flag in the browsers that have them (see the detection sketch below).
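
For context, here is a minimal sketch (an illustration, not StreamSaver's actual code) of how transferable-stream support is typically detected: try to transfer a ReadableStream over a MessageChannel and treat a thrown DataCloneError as "not supported".

// Hedged sketch: feature-detect transferable streams by attempting to
// transfer a ReadableStream over a MessageChannel. Browsers without support
// throw a DataCloneError when a stream appears in the transfer list.
function supportsTransferableStreams () {
  try {
    const { readable } = new TransformStream()
    const mc = new MessageChannel()
    mc.port1.postMessage(readable, [readable])
    mc.port1.close()
    mc.port2.close()
    return true
  } catch (err) {
    return false
  }
}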

So with the postMessage method, when you write a chunk and await it, all you are really doing is sending that chunk with a postMessage, and then it's done. There is nothing the promise waits for, like "is it done writing to the disk?". All it really does is roughly:

async function write (chunk) {
  messageChannel.port1.postMessage(chunk)
}

...with some abstraction on top. The postMessage could be improved by transferring the chunks to the service worker, instead of copying them with the default structured-clone algorithm:

async function write (chunk) {
  // transfer the chunk's underlying ArrayBuffer instead of cloning it
  messageChannel.port1.postMessage(chunk, [chunk.buffer])
}

But I haven't implemented it; I have no idea what developers do with the chunk afterwards (they might reuse it, and a transferred buffer is detached on the sender's side). It could be made an option, though.

The other issue is that the service worker doesn't poll (ask for more data when it's ready to accept more) or send back any kind of message telling you that you're filling the stream's bucket too fast.
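
For illustration only, here is a hypothetical sketch (nothing like this exists in StreamSaver today) of the kind of acknowledgement handshake that would bound memory on the postMessage path: the page keeps at most one chunk in flight and waits for the service worker to confirm it before posting the next.

// Hypothetical sketch only: an ack-based handshake StreamSaver does NOT
// currently implement. The page posts one chunk, then waits for the service
// worker to confirm it has been consumed before posting the next one.
function createAckWriter (port) {
  let pendingAck = null
  port.onmessage = evt => {
    if (evt.data === 'ack' && pendingAck) {
      pendingAck()
      pendingAck = null
    }
  }
  return {
    async write (chunk) {
      port.postMessage(chunk, [chunk.buffer])                 // transfer, don't clone
      await new Promise(resolve => { pendingAck = resolve })  // resolves on the worker's ack
    }
  }
}

The service-worker side would then postMessage('ack') on the same port once it has actually enqueued or written the chunk.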


All of these problems go away if you have support for transferable streams, so I haven't put too much effort into making a solid "main thread > service worker" pipeline.

You are not experiencing a memory leak; you are simply flooding the stream's bucket faster than it can be written to disk.
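
For completeness, and assuming a browser where backpressure actually reaches the page (i.e. the transferable-streams path), the standard WHATWG Streams pattern of awaiting writer.ready before each write keeps buffering bounded by the stream's high-water mark:

// Sketch, assuming backpressure propagates to the page (transferable streams).
async function writeChunk (writer, arrayBuffer) {
  // writer.ready resolves only once the internal queue has drained below
  // its high-water mark, so chunks cannot pile up in memory.
  await writer.ready
  await writer.write(new Uint8Array(arrayBuffer))
}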

@jimmywarting (Owner)

Going to close this since it's pretty much the same as #145, so we can move the discussion over there instead.

@jimmywarting (Owner) commented Jun 18, 2020

If you want to be fancy, you could just do:

function copyFile() {
  const [ file ] = document.querySelector('#fileInput').files
  const writeStream = streamSaver.createWriteStream(file.name)

  file.stream().pipeTo(writeStream)
  // or 
  new Response(file).body.pipeTo(writeStream)
}

But that has lower browser support, unfortunately.
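
As a small follow-up (my own sketch, not from the thread), the two variants can be combined with a simple feature check, falling back to new Response(file).body where Blob.prototype.stream() is missing:

// Sketch: prefer file.stream() when Blob.prototype.stream() exists,
// otherwise wrap the Blob in a Response and use its body stream.
function copyFile () {
  const [ file ] = document.querySelector('#fileInput').files
  // the size option is optional; StreamSaver uses it as a Content-Length hint
  const writeStream = streamSaver.createWriteStream(file.name, { size: file.size })
  const readable = typeof file.stream === 'function'
    ? file.stream()
    : new Response(file).body
  return readable.pipeTo(writeStream)
}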

@AndreiRinea (Author)

Thank you!
