
Running out of memory copying large files #164

Closed · AndreiRinea opened this issue Jun 15, 2020 · 5 comments

@AndreiRinea commented Jun 15, 2020

Hi,

I am doing a PoC with StreamSaver.js in which the user supplies a (very large) file that I read sequentially and then save to the downloads via StreamSaver.js. The issue is that with a 64 GB file the browser runs out of memory (around 60 GB of RAM) and the download fails (the process seems to be terminated). Am I using StreamSaver.js wrong? Here's the code:

    const streamSaver = window.streamSaver;
    const chunkSize = 64 * 1024 * 1024; // 64 MB

    // Reads the file sequentially in chunkSize slices via FileReader,
    // calling process(arrayBuffer) per chunk, then end() or error().
    function parseFile(file, process, end, error) {
        const fileSize = file.size;
        let offset = 0;
        let chunkReaderBlock = null;

        let readEventHandler = function (evt) {
            if (evt.target.error == null) {
                offset += evt.target.result.byteLength;
                process(evt.target.result);
            } else {
                error(evt.target.error);
                return;
            }
            if (offset >= fileSize) {
                end();
                return;
            }
            // noinspection JSValidateTypes
            chunkReaderBlock(offset, chunkSize, file);
        }

        chunkReaderBlock = function (_offset, length, _file) {
            const fileReader = new FileReader();
            const blob = _file.slice(_offset, length + _offset);
            fileReader.onload = readEventHandler;
            fileReader.readAsArrayBuffer(blob);
        }

        chunkReaderBlock(offset, chunkSize, file);
    }

    function copyFile() {
        const file = document.getElementById("fileInput").files[0]; // skipped file selection validation

        const writeStream = streamSaver.createWriteStream(file.name/*, {size: file.size}*/);
        const writer = writeStream.getWriter();

        parseFile(file, async ab => {
            await writer.write(new Uint8Array(ab));
        }, async () => {
            await writer.close();
        }, async () => {
            await writer.abort();
        });
    }

The HTML around it is:

    <html>
    <head>
        <title>File copying code sample</title>
    </head>
    <body>
    <script src="https://cdn.jsdelivr.net/npm/web-streams-polyfill@2.0.2/dist/ponyfill.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/streamsaver@2.0.3/StreamSaver.min.js"></script>
    
    <script type="text/javascript">
        // the JavaScript code shown above goes here
    </script>
    <h3>(Very large) file copying client-side example</h3>
    <input type="file" id="fileInput"/>
    <button onclick="copyFile()">Copy file</button>
    </body>
    </html>
@AndreiRinea (Author)

As another test I commented out the await writer.write(new Uint8Array(ab)); line, and that way it doesn't end up in an out-of-memory state. Obviously it won't write the destination file either, but it makes me think that the leak comes from either the Uint8Array instance or writer.write(...)...

@jimmywarting (Owner) commented Jun 18, 2020

Backpressure isn't perfect in StreamSaver.
There are two ways chunks can be transferred to the service worker:

  • Transferable streams: the first method StreamSaver will use, if it's available
  • postMessage: the second method; this has some issues since the two sides aren't tightly coupled (piped)

Now, you will likely not be using transferable streams, since at the moment they are only available behind a flag in the browsers that have them (see the detection sketch below).
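
For context, here is a minimal sketch (an illustration, not StreamSaver's actual code) of how transferable-stream support is typically detected: try to transfer a ReadableStream over a MessageChannel and treat a thrown DataCloneError as "not supported".

// Hedged sketch: feature-detect transferable streams by attempting to
// transfer a ReadableStream over a MessageChannel. Browsers without support
// throw a DataCloneError when a stream appears in the transfer list.
function supportsTransferableStreams () {
  try {
    const { readable } = new TransformStream()
    const mc = new MessageChannel()
    mc.port1.postMessage(readable, [readable])
    mc.port1.close()
    mc.port2.close()
    return true
  } catch (err) {
    return false
  }
}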

So with the postMessage method, when you write a chunk and await it, all you are really doing is sending that chunk with a postMessage, and then it's done. There is nothing the promise waits for, like "is it done writing to the disk?". All it really does is roughly:

async function write (chunk) {
  messageChannel.port1.postMessage(chunk)
}

...with some abstraction on top. The postMessage could be improved by transferring the chunks to the service worker, instead of copying them with the default structured-clone algorithm:

async function write (chunk) {
  // transfer the chunk's underlying ArrayBuffer instead of cloning it
  messageChannel.port1.postMessage(chunk, [chunk.buffer])
}

But I haven't implemented it; I have no idea what developers do with the chunk afterwards (they might reuse it, and a transferred buffer is detached on the sender's side). It could be made an option, though.

The other issue is that the service worker doesn't poll (ask for more data when it's ready to accept more) or send back any kind of message telling you that you're filling the stream's bucket too fast.
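
For illustration only, here is a hypothetical sketch (nothing like this exists in StreamSaver today) of the kind of acknowledgement handshake that would bound memory on the postMessage path: the page keeps at most one chunk in flight and waits for the service worker to confirm it before posting the next.

// Hypothetical sketch only: an ack-based handshake StreamSaver does NOT
// currently implement. The page posts one chunk, then waits for the service
// worker to confirm it has been consumed before posting the next one.
function createAckWriter (port) {
  let pendingAck = null
  port.onmessage = evt => {
    if (evt.data === 'ack' && pendingAck) {
      pendingAck()
      pendingAck = null
    }
  }
  return {
    async write (chunk) {
      port.postMessage(chunk, [chunk.buffer])                 // transfer, don't clone
      await new Promise(resolve => { pendingAck = resolve })  // resolves on the worker's ack
    }
  }
}

The service-worker side would then postMessage('ack') on the same port once it has actually enqueued or written the chunk.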


All of these problems go away if you have support for transferable streams, so I haven't put too much effort into making a solid "main thread > service worker" pipeline.

You are not experiencing a memory leak; you are simply flooding the stream's bucket faster than it can be written to disk.
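
For completeness, and assuming a browser where backpressure actually reaches the page (i.e. the transferable-streams path), the standard WHATWG Streams pattern of awaiting writer.ready before each write keeps buffering bounded by the stream's high-water mark:

// Sketch, assuming backpressure propagates to the page (transferable streams).
async function writeChunk (writer, arrayBuffer) {
  // writer.ready resolves only once the internal queue has drained below
  // its high-water mark, so chunks cannot pile up in memory.
  await writer.ready
  await writer.write(new Uint8Array(arrayBuffer))
}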

@jimmywarting (Owner)

Going to close this since it's pretty much the same as #145, so we can move the discussion over there instead.

@jimmywarting (Owner) commented Jun 18, 2020

If you want to be fancy, you could just do:

function copyFile() {
  const [ file ] = document.querySelector('#fileInput').files
  const writeStream = streamSaver.createWriteStream(file.name)

  file.stream().pipeTo(writeStream)
  // or 
  new Response(file).body.pipeTo(writeStream)
}

But that has lower browser support, unfortunately.
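
As a small follow-up (my own sketch, not from the thread), the two variants can be combined with a simple feature check, falling back to new Response(file).body where Blob.prototype.stream() is missing:

// Sketch: prefer file.stream() when Blob.prototype.stream() exists,
// otherwise wrap the Blob in a Response and use its body stream.
function copyFile () {
  const [ file ] = document.querySelector('#fileInput').files
  // the size option is optional; StreamSaver uses it as a Content-Length hint
  const writeStream = streamSaver.createWriteStream(file.name, { size: file.size })
  const readable = typeof file.stream === 'function'
    ? file.stream()
    : new Response(file).body
  return readable.pipeTo(writeStream)
}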

@AndreiRinea (Author)

Thank you!
