Brotli defaults to quality 11 #26097

Closed
Genbox opened this issue May 7, 2018 · 12 comments

Genbox commented May 7, 2018

The newly added compression algorithm Brotli as tracked by #23936 defaults to quality 11 as seen here:
https://github.com/dotnet/corefx/blob/8f9cfa462dfc2ec5bdbbe2462693036861fb0199/src/System.IO.Compression.Brotli/src/System/IO/Compression/BrotliUtils.cs#L13

From a quick look at Brotli's code (https://github.com/google/brotli), it seems levels 5 and 6 are intended to give the best ratio/speed trade-off, and the Brotli team seems to tweak the algorithm to make sure that stays the case.

I tested against a 518,461,845 byte file I use for compression benchmarks.

  • With Brotli and CompressionLevel.Optimal (quality 11, on a scale of 0 to 11) it takes ~14 minutes to compress.
  • With Deflate and CompressionLevel.Optimal (quality 6, on a scale of 0 to 9) it takes ~6 seconds to compress.

It seems that defaulting to 11 makes the algorithm very slow for the common case.
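
A rough way to reproduce this kind of comparison is sketched below. `CompressionTiming` and `Measure` are hypothetical names used only for illustration, not part of .NET, and the numbers will vary with the input data and the runtime version.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

static class CompressionTiming
{
    // Compresses `input` through the stream produced by `wrap` and
    // returns the elapsed time and the compressed size in bytes.
    public static (TimeSpan Elapsed, long Size) Measure(byte[] input, Func<Stream, Stream> wrap)
    {
        var output = new MemoryStream();
        var clock = Stopwatch.StartNew();
        using (Stream compressor = wrap(output))
        {
            compressor.Write(input, 0, input.Length);
        } // disposing the compressor writes the final block
        clock.Stop();

        // ToArray still works if the compressor closed the MemoryStream.
        return (clock.Elapsed, output.ToArray().Length);
    }
}
```

For example, `CompressionTiming.Measure(data, s => new BrotliStream(s, CompressionLevel.Optimal))` can be compared against the same call with `DeflateStream`.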

Author

Genbox commented May 7, 2018

That being said, CompressionLevel only has 3 levels:

  1. Optimal
  2. Fastest
  3. NoCompression

I would suggest having 4 levels:

  1. NoCompression
  2. Fastest
  3. Optimal
  4. Best

Note that I've also ordered the levels to reflect their compression ratio, which is currently not the case. If needed, I can create a separate issue for this suggestion.

Contributor

jnm2 commented May 8, 2018

‘Optimal’ and ‘best’ sound like synonyms to me.

Author

Genbox commented May 8, 2018

@jnm2 Optimal does mean the balance between speed and ratio, and best does mean the highest ratio, but you are right; it is not intuitive to developers. Words like "minimal", "normal" and "maximum" are used in common compression applications. Most applications have about 5 levels:

  1. Store
  2. Fast
  3. Normal
  4. High
  5. Maximum

I think it would make better sense to use those levels than the current ones. However, this issue is really about the default quality level of 11 in the BrotliEncoder which is just crazy.

Author

Genbox commented May 8, 2018

@asthana86 could you label this with area-System.IO.Compression instead? I would also say that it should be marked as a bug as this setting prevents all use of the new algorithm.

Contributor

jnm2 commented May 8, 2018

@Genbox

> Optimal does mean the balance between speed and ratio, and best does mean the highest ratio, but you are right; it is not intuitive to developers.

It's just as easy for me to talk about the best balance between speed and ratio, versus the optimal compression ratio. 😊

@stephentoub
Member

@joshfree, @ianhays, anything that needs to be done here for 2.1?

Contributor

ianhays commented May 11, 2018

Google's C brotli implementation uses 11 as the default quality. Our C# brotli code is built on top of Google's C code, so the default compression level is the same.

If you want another value like 5 or 6, you can always just pass an integer for the exact level you want, whether using BrotliEncoder or BrotliStream.
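
As a minimal sketch of the BrotliEncoder route, the static `BrotliEncoder.TryCompress` overload accepts an explicit quality and window size. The `BrotliQuality` wrapper class is hypothetical, and the window value of 22 is an assumption chosen for illustration.

```csharp
using System;
using System.IO.Compression;

static class BrotliQuality
{
    // Compresses a buffer in one shot at an explicit Brotli quality (0 to 11).
    public static byte[] Compress(byte[] input, int quality)
    {
        // Worst-case compressed size for an input of this length.
        byte[] buffer = new byte[BrotliEncoder.GetMaxCompressedLength(input.Length)];

        // Assumption: a window of 22 bits; pick whatever suits your data.
        const int window = 22;

        if (!BrotliEncoder.TryCompress(input, buffer, out int written, quality, window))
            throw new InvalidOperationException("Brotli compression failed.");

        // Trim the buffer down to the bytes actually written.
        Array.Resize(ref buffer, written);
        return buffer;
    }
}
```

Calling `BrotliQuality.Compress(data, 5)` would then request quality 5 instead of the default of 11.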

> Optimal does mean the balance between speed and ratio, and best does mean the highest ratio, but you are right; it is not intuitive to developers.

Optimal does not mean optimal balance; it means optimal compression ratio, i.e. best compression. See the docs for CompressionLevel.

Closing this as by-design.

@ianhays ianhays closed this as completed May 11, 2018
Contributor

ianhays commented May 11, 2018

FWIW, I agree with you about the need for a middle-ground CompressionLevel that represents the best balance between speed and compression ratio. There were enough compat concerns about adding a new enum that we decided against it.

However, that did result in us flip-flopping: we made Optimal for DeflateStream map to 9, then changed it back to 6 for balance. For DeflateStream, the compression ratio shows heavy diminishing returns past level 6 while speed takes a huge hit, so we determined it was not worth it.

For Brotli, I chose 11 as the Optimal for two main reasons:

  • The docs define Optimal as the absolute best compression without regard to compression time, and for the scenarios we expect Brotli to be primarily used in (server-side compression, browser-based decompression, preference for smallest files) every little bit of compression ratio is important.
  • 11 is the default C encoder compression level, and I like keeping our defaults for our wrapper implementations the same as the defaults for their underlying native implementations.

Just for fun (because it's far too breaking to do), if I was designing CompressionLevel as a new enum, I would have made it:

  • NoCompression (Deflate 0, Brotli 0)
  • Fastest (Deflate 1, Brotli 1)
  • Balanced (Deflate 6, Brotli 9)
  • Best (Deflate 9, Brotli 11)
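
The hypothetical enum above could be sketched as follows. None of these types exist in .NET; `ProposedCompressionLevel` and `ProposedLevels` are illustrative names, and the numbers are simply the ones from the list.

```csharp
using System;

// Hypothetical enum mirroring the list above, ordered by compression ratio.
enum ProposedCompressionLevel
{
    NoCompression,
    Fastest,
    Balanced,
    Best
}

static class ProposedLevels
{
    // Maps a proposed level to a zlib/Deflate level (0 to 9).
    public static int ToDeflateLevel(ProposedCompressionLevel level) => level switch
    {
        ProposedCompressionLevel.NoCompression => 0,
        ProposedCompressionLevel.Fastest => 1,
        ProposedCompressionLevel.Balanced => 6,
        ProposedCompressionLevel.Best => 9,
        _ => throw new ArgumentOutOfRangeException(nameof(level))
    };

    // Maps a proposed level to a Brotli quality (0 to 11).
    public static int ToBrotliQuality(ProposedCompressionLevel level) => level switch
    {
        ProposedCompressionLevel.NoCompression => 0,
        ProposedCompressionLevel.Fastest => 1,
        ProposedCompressionLevel.Balanced => 9,
        ProposedCompressionLevel.Best => 11,
        _ => throw new ArgumentOutOfRangeException(nameof(level))
    };
}
```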

Author

Genbox commented May 12, 2018

It is not acceptable to choose the highest level as the default for a library like .NET Core. You say the use case is server-side compression, browser-based decompression and a preference for the smallest files. However, those scenarios are irrelevant: this is just a technology, and users will use it for whatever they need.

From the announcement here: https://blogs.msdn.microsoft.com/dotnet/2018/05/07/announcing-net-core-2-1-rc-1/ it says: "The BrotliStream behavior is the same as that of DeflateStream or GZipStream to allow easily converting DeflateStream/GZipStream code to use BrotliStream."

That's not the case! We switched from DeflateStream to BrotliStream and our application took between 30x and 20000x the time to compress the same data due to this default value. Compatibility is not just about APIs; it is also about similar performance characteristics, especially when there is a global announcement that BrotliStream is better than DeflateStream and that people can just switch...

I understand you like to have backwards compatibility to the original C code, but there is no need to move Google's choice of default into .NET Core, especially considering the huge impact it will have on applications worldwide once 2.1 is out.

Member

saucecontrol commented Mar 20, 2019

> Just for fun (because it's far too breaking to do), if I was designing CompressionLevel as a new enum, I would have made it:
>
>   • NoCompression (Deflate 0, Brotli 0)
>   • Fastest (Deflate 1, Brotli 1)
>   • Balanced (Deflate 6, Brotli 9)
>   • Best (Deflate 9, Brotli 11)

Was this considered "far too breaking" only for a point release, or would it also be too breaking for 3.0?

Although it's possible to explicitly set the desired Brotli quality level in BrotliEncoder, there are other places, such as ASP.NET Core's BrotliCompressionProvider, where config is restricted to the CompressionLevel enum.

In fact, the Response Compression Middleware docs actually recommend using CompressionLevel.Optimal with Brotli, which could be catastrophic for performance.

> The Brotli Compression Provider defaults to the fastest compression level (CompressionLevel.Fastest), which might not produce the most efficient compression. If the most efficient compression is desired, configure the middleware for optimal compression.

It's also not ideal that zlib level 9 is no longer available at all.

The revised enum suggested above would really clear things up.


droyad commented Nov 26, 2019

For BrotliStream and likely other places that are restricted to CompressionLevel, you can use the workaround described by @Taritsyn in #1556: new BrotliStream(dest, (CompressionLevel) level).

I moved from another library that implemented a BrotliStream and the performance got 30x worse. I did some (rough) testing, and for my data (a 45 MB JSON file) I got the following results, a massive 385x difference between Fastest and Optimal:

[Two images of the benchmark results; contents not preserved.]

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020

prj commented Dec 6, 2020

This issue makes the provided .NET functionality completely unusable. Quality 1 is pretty much the same as gzip, with perhaps some speedup, and Quality 11 is too slow to ever be practical.

The comment by @ianhays absolutely baffles me. Are you guys making an API as a brain exercise in design or do you expect people to actually use it?

The only way to use it is through a hack, for example: new BrotliStream(target, (CompressionLevel)9);
Optimal should always be 9 on Brotli. There is never any reason to use anything higher. For anything higher you should use a different algorithm.

@ghost ghost locked as resolved and limited conversation to collaborators Jan 5, 2021