-
Notifications
You must be signed in to change notification settings - Fork 288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhancement] Achieve better audio compression using variable block size #639
Comments
Somebody already created a somewhat brute-force variable blocksize encoder by modifying libFLAC. I tested this and these were my results. As you can see, gains are not tremendous, we're talking about 0.3% here. I cannot read your graph, for some reason Firefox, GIMP and Inkscape just render a big black rectangle. Anyway, I've already tried an university library a few times, went through google scholar and the likes and tried a few algorithms, but I haven't found an algorithm that can pick reasonable blocksizes reliably. I will not implement brute-force or search, it needs to be somewhat intelligent. Also, there is a problem of compatibility. Quite a lot of (hardware) decoders are unable to decode variable blocksize files. See here. To name a big one, Android devices refuse playback. That means implementing this in FLAC will cause somewhat of a rift. I've made the decoder testbench in the hope that developers properly implement playback of variable blocksize files, but it doesn't seem to have reached the Android people yet (or they don't care). So, summing up: there is a potential 0.5% gain possible (I think I'm being optimistic here), at the cost of (1) tremendously slow encoding (as long as no suitable algorithm is found), (2) adding a lot complexity, and (3) compatibility with hardware devices. Sure, I want to implement this at some point. Mostly to have libFLAC cover as much of the specification as is possible. But I need a good algorithm and a lot of time. So this won't be included anytime soon, maybe never. |
Thank you for your answer! I wasn't aware of the black box of the SVG (since the preview in GitHub is shown just fine). Sorry for the inconvenience. I also provide here all the SVG in an archive, so they should be rendered correctly: The purpose of benchmarking a faster preset (using Thank you for sharing flaccid, a FLAC encoder that I never saw nor heard of. I'll read and test it ASAP. I quite expected that hardware decoders would not support variable blocksize (even if some of them might), but Android ? I might as well have misunderstood something with how data is encoded in FLAC files: you're saying there is a 0.5% gain possible while being optimistic. Thanks for your time. |
@T-3B: Just to note, CUETools.Flake has disabled its variable blocksize mode last year, after it was discovered it encodes wrong: gchudov/cuetools.net#220 . I don't whether that bug is present in Justin Ruggles' original Flake. If there were some smart and quick way to estimate optimal block size based on signal characteristics - maybe based on an initial -b4096 run - then maybe that could also improve a fixed-block strategy. Just make the second run and pick the smallest file. In both cases (variable or optimized-fixed) I suspect the impact could be bigger on material with higher sampling rates - simply because, over the years, the reference implementation has gotten well tuned for 44.1/48 (firing a shotgun blindly has less chances of improving there, better chances at high resolution). For one thing, the default block size of 4096 corresponds to half the duration if sample rate is doubled ... Also, if one wants to stay within the streamable subset, the -b limit is raised from 4608 to 16384, so there are more opportunities to test. But, if one wants to stay within subset, one should take into account there are signals where the -r8 limit actually binds. Not so many with CDDA in my experience (although particular examples are not hard to find), but what if it so happens that "optimal partition size" is 32 on a particular high resolution signal? -8r7 is subset, but -b16384 -r9 is not, and for subset one will have to choose between doubling the partition size and halving the frame size. |
For one thing, when I say 0.5% I actually mean 0.5%-point, that is my bad. So, you're comparing the gain to the compressed size, and I'm comparing the gain to the uncompressed size. Generally, 0.5%-point is about 1% here. Also, my numbers are means to be an average over a large number of tracks, while it seems to me the 0.5% you mention is a peak value for certain tracks, some tracks do not improve at all. |
Only a minor overhead of a few bytes per frame, thanks to fixed blocksize frames being indexed by frame number whereas variable blocksize frames are indexed by sample number. Single digit kilobytes per track. IMO flaccid is best used when aiming for maximal compression, none of my experiments with fast algorithms to try and compete with fixed have been successful albeit they are very simple algorithms that to an extent rely on weak brute force so that's no surprise. I'd argue that peakset is good enough as a strong brute-force technique, it doesn't cover every possible blocksize but what it does cover it does optimally as it's not greedy, it weaves the best chain in a large window with a set of blocksizes. There are many rarely useful blocksizes which all but the most expensive algorithms should probably avoid entirely just because they are that much less likely to be beneficial. Very small or very large blocksizes rarely help, also blocksizes that aren't divisible by a large enough power of 2 (preferably 64 but arguably minimally 8 or 16) limits the maximum partition order the residual can be stored in which disadvantages them by default. Sane peakset settings naturally avoid these, so it's not as detrimental as it appears for peakset to only be considering a handful of blocksizes out of the potential ~65536. IMO the ideal non-brute variable blocksize algorithm would probably vary more than just blocksize, there's probably a corelation between blocksize (or rather time frame) and the optimal choice of for example apod settings. Not something I can see someone being able to come up with short of some ML black box magic. |
I would like to know if variable block size encoding in
flac
is planned or not.If not, I'm writing this issue to demonstrate its benefits (the main one being better compression).
Introduction
Currently,
flac
is the encoder providing the smallest files when used with the right parameters.I tested many FLAC encoders:
FLACOut.exe
,CUETools.Flake.exe
, justinruggles'Flake
,CUETools.FLACCL.cmd.exe
, and xiph'sflac
(ask for more details about the args tested if needed).The files used for the test: different genres, 44.1 and 96 kHz, 16 and 24-bit, stereo.
Only the first 3 encoders allows variable block size encoding.
Even comparing
flac
(which currently only supports constant blocksize encoding) with other encoders supporting variable block size encoding,flac
still managed to beat all others (in term of size).Command used:
flac in.wav -smepb 4096 -P 0 -r 15 -l 32 --lax -A subdivide_tukey\(6\)
The great advantage
flac
has compared to others (if I'm not misunderstanding), is its good predictors likesubdivide_tukey()
. Increasing the number given to this function always result in a smaller output file (but takes much more time to encode).Why allow variable block size encoding?
The answer is simple: better compression (smaller output file).
Intelligent selection of different block sizes within a single FLAC stream can reduce file size (at the cost of longer encoding times).
What to expect?
It is a bit difficult to answer that one.
I ran quite many benchmarks, ending up with testing different constant block sizes with
flac
- 40 short files (PCM s16 44.1kHz, from 5 to 30 seconds). I don't remember from where I got them, ask them if you want.We are using the
flac
command from above (only blocksize is changed)These are the results (y-axis = filesize divided by filesize at 4096 blocksize; x-axis = blocksize; x-step = 32):
For easier reading (samples splitted in multiple graphs), see tukey6.tar.gz.
We can see that with in best cases, we can reach 99.5% of the file size of the 4096-constant blocksize.
This is quite small (0.5% better), but it shows that there is room for improvements.
Also, we are encoding FLACs with almost best options (those providing better compression: max-lpc-order, ...).
We can therefore expect a more significant compression gain for faster presets/encodes (for example a simple
-8ep
).So I would say FLAC files with intelligent variable blocksize encoding could be compressed 1% more (compared to a constant blocksize of 4096).
How could it be implemented?
I already thought about that. Or at least only about useful/meaningful presets.
This is an extrapolation of the data I collected with different constant blocksizes. Applying it to variable block size encoding is perhaps a bad thing, but imho is still relevant.
We can have 2 different approaches:
We can either compute all blocks or estimate the final size (for example, comparing the size of the frame using 4096-blocksize with double the size of the frame using 2048-blocksize).
Conclusion
I tried to modify the
flac
code, but unfortunately it is too hard for me. I never worked on a big project like this one, and also never modified other people's code. I'm woefully inexperienced in this kind of project.Further more,
flac
actually only support constant block size encoding, and changing it to support variable block size encoding would require many modifications in the code.I'm not an expert (and far from it)! I greatly appreciate
flac
and its contributors for their good work, and please correct me for any mistake I could have written.Thank you for your attention to this demand.
The text was updated successfully, but these errors were encountered: