
System.IO.Compression: ZipArchive loads entire file in memory on .Dispose #1543

Open · Tracked by #62658
qmfrederik opened this issue Sep 13, 2016 · 5 comments
Labels: area-System.IO.Compression, enhancement, help wanted
Milestone: Future

@qmfrederik
Contributor

When you open a ZipArchive in Update mode, the entire zip file is loaded into memory when the .Dispose method is invoked.

This is because .Dispose calls .WriteFile, which:

  • Calls LoadLocalHeaderExtraFieldAndCompressedBytesIfNeeded for all entries, which loads the compressed data into memory for those entries
  • Truncates the .zip archive to zero length by calling _archiveStream.SetLength(0);
  • Writes out all entries one by one.

As a result:

  • A lot of memory is used, because the compressed data for every entry is loaded into memory.
  • A lot of unnecessary disk I/O is performed, because all entries are written out again, even if they were not modified.

An alternative would be to update the zip archive incrementally, rewriting only the entries that have changed.
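For illustration, a minimal repro sketch of the scenario (the archive and entry names are just placeholders): replacing a single small entry is enough to trigger the full in-memory rewrite on Dispose.

```csharp
using System.IO;
using System.IO.Compression;

class UpdateModeRepro
{
    static void Main()
    {
        // Open an existing (potentially large) archive in Update mode.
        using (FileStream stream = File.Open("large-archive.zip", FileMode.Open, FileAccess.ReadWrite))
        using (ZipArchive archive = new ZipArchive(stream, ZipArchiveMode.Update))
        {
            // Replace a single small entry.
            archive.GetEntry("readme.txt")?.Delete();
            ZipArchiveEntry entry = archive.CreateEntry("readme.txt");
            using (StreamWriter writer = new StreamWriter(entry.Open()))
            {
                writer.Write("updated contents");
            }
        } // Dispose -> WriteFile: buffers every entry's compressed bytes,
          // truncates the stream, and rewrites the whole archive.
    }
}
```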

@jakubsuchybio

This is a real pain. We use the full .NET Framework and have some very large zip files in which we need to update small files. Because of this loading into memory we get an OutOfMemoryException: our application is 32-bit, and we cannot switch to 64-bit because the driver DLLs we depend on are 32-bit only.

@carlossanlop
Member

Triage:
We should scope this to only loading the entries that have changed.

@carlossanlop transferred this issue from dotnet/corefx on Jan 9, 2020
@Dotnet-GitSync-Bot added the area-System.IO.Compression and untriaged labels on Jan 9, 2020
@carlossanlop added the enhancement and help wanted labels, removed the untriaged label, and added this to the Future milestone on Jan 9, 2020
@Jlalond
Contributor

Jlalond commented Jan 17, 2020

@carlossanlop Hmm, it looks like it only writes files on creates/updates.

But it looks like it still iterates over them all, and I don't see any property on ZipArchiveEntry that denotes whether it has been modified. I'm going to keep doing research once I get home, but I think this would be fun to pick up.
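To make the idea concrete, here is a rough sketch of the kind of helper an incremental rewrite could use. ZipArchiveEntry has no modification flag today, so the dirty-tracking and the recorded offsets that would drive this are hypothetical; this is not the actual runtime code.

```csharp
using System;
using System.IO;

static class ZipRewriteSketch
{
    // Copies an unchanged entry's raw compressed bytes from its old position
    // to its new position in bounded chunks, so memory use stays constant
    // instead of growing with the entry size. Assumes the destination region
    // does not overlap a not-yet-copied part of the source; otherwise the
    // bytes would have to go through a temporary file.
    public static void CopyEntryBytes(Stream archive, long oldOffset, long newOffset, long length)
    {
        byte[] buffer = new byte[81920];
        long copied = 0;
        while (copied < length)
        {
            archive.Seek(oldOffset + copied, SeekOrigin.Begin);
            int read = archive.Read(buffer, 0, (int)Math.Min(buffer.Length, length - copied));
            if (read <= 0)
                throw new EndOfStreamException("Archive is shorter than the recorded entry size.");

            archive.Seek(newOffset + copied, SeekOrigin.Begin);
            archive.Write(buffer, 0, read);
            copied += read;
        }
    }
}
```

The awkward part is that Dispose currently truncates the stream to zero before writing anything back, so unchanged bytes would have to be moved (or left) in place rather than re-read from a file that no longer contains them.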

@IDisposable
Contributor

Is anyone working on this? I could pick it up if not.

@ulrichb

ulrichb commented Nov 7, 2023

Just found this ticket after creating a duplicate (#94455), and wanted to cross-post our use case, where the current behavior is a big issue:

a) We're dealing with potentially large user-provided ZIP files (in the GB range),
b) we need to update entries (actually not directly but via System.IO.Packaging.ZipPackage for OPC file processing), and
c) all of this can happen in parallel.

This means that with the current ZipArchive implementation we would need to reserve dozens of GB of virtual memory just for System.IO.Packaging.ZipPackage processing; otherwise we risk exceeding the container memory limit and getting OOM exceptions.
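Not a fix for the issue itself, but for anyone else hitting this: one way to keep memory bounded today is to avoid Update mode entirely and stream the entries from a Read-mode archive into a new Create-mode archive, replacing only the entry that changed. The paths and the replaced entry name below are placeholders.

```csharp
using System.IO;
using System.IO.Compression;

static class ZipRewriteWorkaround
{
    // Rebuilds the archive by streaming every entry through a small buffer,
    // substituting new content for one entry, so no entry is ever held in
    // memory as a whole.
    public static void RewriteWithReplacement(string sourcePath, string destPath,
        string entryToReplace, Stream newContent)
    {
        using (ZipArchive source = ZipFile.OpenRead(sourcePath))
        using (FileStream destStream = File.Create(destPath))
        using (ZipArchive dest = new ZipArchive(destStream, ZipArchiveMode.Create))
        {
            foreach (ZipArchiveEntry entry in source.Entries)
            {
                ZipArchiveEntry copy = dest.CreateEntry(entry.FullName);
                using (Stream output = copy.Open())
                {
                    if (entry.FullName == entryToReplace)
                    {
                        newContent.CopyTo(output);          // write the updated payload
                    }
                    else
                    {
                        using (Stream input = entry.Open()) // decompress and recompress in a streaming fashion
                        {
                            input.CopyTo(output);
                        }
                    }
                }
            }
        }
    }
}
```

The downside is that every unchanged entry is still decompressed and recompressed, so the I/O and CPU cost remains, but peak memory stays at roughly one copy buffer instead of the whole archive.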
