Corrupt result of flate2::read::GzEncoder #158

Closed
Martin1887 opened this issue Jun 2, 2018 · 8 comments

@Martin1887

I am using flate2::read::GzEncoder to gzip a stream. However, when I try to ungzip the result I get an Err value: Custom { kind: InvalidInput, error: StringError("corrupt gzip stream does not have a matching checksum") }.

I'm probably doing something wrong. This is the code that I'm using:

// gzip Rocket body
let mut compressed = [0u8; 100];
success = GzEncoder::new(plain.into_inner(), flate2::Compression::default())
    .read(&mut compressed[..])
    .is_ok();

// In tests, ungzipping it
let mut s = String::new();
GzDecoder::new(&response.body_bytes().unwrap()[..])
    .read_to_string(&mut s)
    .unwrap();
assert_eq!(s, String::from(HELLO));

HELLO has the following value: "Hello world!", whose bytes are: [72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 33].

&response.body_bytes().unwrap()[..] has the following value (these are the bytes of the corrupted gzip): [31, 139, 8, 0, 0, 0, 0, 0, 0, 255, 1, 12, 0, 243, 255, 72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 33, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0].

I have tried ungzipping a custom vector of bytes without the trailing zeroes, and I get the same error.

Am I doing something wrong? Is something missing?

Thanks.

@vitalyd

vitalyd commented Jun 2, 2018

You probably want to use read_to_end in the encoding phase:

let mut compressed = vec![];
success = GzEncoder::new(plain.into_inner(), flate2::Compression::default())
    .read_to_end(&mut compressed)
    .is_ok();

But if you want to read into an array using read(), you'll need to loop until you get a read of 0:

let mut compressed = [0u8; 100];
let mut enc = GzEncoder::new(plain.into_inner(), flate2::Compression::default());
let mut num_read = 0;
loop {
    let count = enc.read(&mut compressed[num_read..]).unwrap();
    if count == 0 {
        break;
    }
    num_read += count;
}
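To see why one read() call is not enough, here is a std-only sketch. `ChunkedReader` is a hypothetical stand-in for read::GzEncoder that returns at most a few bytes per call (this thread shows GzEncoder likewise leaving the trailer for a later call), and `read_fill` is a hypothetical helper version of the loop above. It also guards against one edge case of that loop: once the array is full, `&mut compressed[num_read..]` is empty, so read() returns 0 even if more data is pending, and the loop would stop silently.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for read::GzEncoder: returns at most `chunk` bytes
/// per read() call, so a single call can stop short of the end of the data.
struct ChunkedReader {
    data: Vec<u8>,
    pos: usize,
    chunk: usize,
}

impl Read for ChunkedReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = buf.len().min(self.chunk).min(self.data.len() - self.pos);
        buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

/// Hypothetical helper: keeps calling read() until it returns 0 (EOF) and
/// returns how many bytes of `buf` were filled. If `buf` fills up first, a
/// 1-byte probe distinguishes "exactly full" from "too small".
fn read_fill<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => return Ok(filled), // EOF
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    // Buffer is full; make sure the reader has nothing left.
    let mut probe = [0u8; 1];
    match reader.read(&mut probe)? {
        0 => Ok(filled),
        _ => Err(io::Error::new(io::ErrorKind::Other, "buffer too small")),
    }
}

fn main() -> io::Result<()> {
    let make = || ChunkedReader { data: b"Hello world!".to_vec(), pos: 0, chunk: 5 };
    let mut buf = [0u8; 100];

    // One read() call: only the first 5-byte chunk arrives.
    let single = make().read(&mut buf)?;
    // Looping until read() returns 0 collects all 12 bytes.
    let total = read_fill(&mut make(), &mut buf)?;
    println!("single read: {} bytes, read_fill: {} bytes", single, total);
    Ok(())
}
```

This prints `single read: 5 bytes, read_fill: 12 bytes`. The probe costs one extra call, but it turns a silently truncated result into an explicit error.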

@Martin1887
Author

Thanks for answering in such a short time.

I used read_to_end before, but I cannot use it because the compression should be done in a streamed manner (rwf2/Rocket#550).

I think your read example effectively reimplements read_to_end with read, so it would not be suitable for my case either.

The problem is not that read returns too few bytes, because my test example is shorter than 100 bytes. I wonder whether, after reading all the bytes, it is necessary to call a function to finish the gzip stream, or something similar.

Do you have any idea what I'm doing wrong?

@Martin1887
Author

I have compared the results of using read and read_to_end, and these are the results.

Using read with a buffer of 100 bytes, the result is:

[31, 139, 8, 0, 0, 0, 0, 0, 0, 255, 1, 12, 0, 243, 255, 72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 33, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Using read_to_end the result is:

[31, 139, 8, 0, 0, 0, 0, 0, 0, 255, 1, 12, 0, 243, 255, 72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 33, 149, 25, 133, 27, 12, 0, 0, 0]

The difference is in the last bytes: 149, 25, 133, 27, 12 are missing from the read result.
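The missing bytes are the gzip trailer. Per RFC 1952, a gzip member ends with 8 bytes: the CRC-32 of the uncompressed data, then ISIZE (the uncompressed length modulo 2^32), both little-endian. The std-only sketch below (`split_trailer` is a hypothetical helper) decodes them from the read_to_end output. In the single read() buffer those 8 positions were never written; only five bytes differ because the last three ISIZE bytes happen to be 0, like the untouched buffer.

```rust
/// Hypothetical helper: splits a complete gzip member into the bytes before
/// the trailer, the stored CRC-32 and ISIZE (RFC 1952: both little-endian).
fn split_trailer(member: &[u8]) -> Option<(&[u8], u32, u32)> {
    // A member is at least a 10-byte header plus the 8-byte trailer.
    if member.len() < 18 {
        return None;
    }
    let (body, t) = member.split_at(member.len() - 8);
    let crc = u32::from_le_bytes([t[0], t[1], t[2], t[3]]);
    let size = u32::from_le_bytes([t[4], t[5], t[6], t[7]]);
    Some((body, crc, size))
}

fn main() {
    // The read_to_end output quoted above.
    let full: [u8; 35] = [
        31, 139, 8, 0, 0, 0, 0, 0, 0, 255, 1, 12, 0, 243, 255, 72, 101, 108, 108, 111, 32,
        119, 111, 114, 108, 100, 33, 149, 25, 133, 27, 12, 0, 0, 0,
    ];
    let (body, crc, size) = split_trailer(&full).expect("too short for gzip");
    println!("{} bytes before trailer, CRC-32 {:#010x}, ISIZE {}", body.len(), crc, size);
}
```

This prints `27 bytes before trailer, CRC-32 0x1b851995, ISIZE 12`, and 12 is exactly the length of "Hello world!". Without these 8 bytes the decoder has no checksum to compare against, which matches the "does not have a matching checksum" error.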

Maybe this finding can be helpful.

@vitalyd

vitalyd commented Jun 3, 2018

For streaming, you should use flate2::write::GzEncoder; it has a finish() method (or try_finish()) that finalizes the compressed stream. Here is a playground example using an array, but you can write to any W: Write, which presumably is a socket in your (Rocket) case.
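A rough std-only illustration of why an explicit finish step matters. `StoredGzWriter` below is hypothetical and is not flate2: it does no compression, buffers all input, and only in finish() writes the gzip header, uncompressed ("stored") deflate blocks, and the CRC-32/ISIZE trailer. The read_to_end output quoted in this thread also used a single stored block (the bytes 1, 12, 0, 243, 255), so for "Hello world!" this sketch is meant to reproduce those same 35 bytes. The CRC helper is a plain bitwise CRC-32 with the reflected polynomial 0xEDB88320.

```rust
use std::io::{self, Write};

/// Plain bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320), as used in
/// the gzip trailer. Can be applied incrementally: pass the previous result.
fn crc32_update(crc: u32, data: &[u8]) -> u32 {
    let mut c = !crc;
    for &b in data {
        c ^= u32::from(b);
        for _ in 0..8 {
            // mask is all ones when the low bit is set, otherwise zero
            let mask = (c & 1).wrapping_neg();
            c = (c >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !c
}

/// Hypothetical writer, NOT flate2: emits gzip using only uncompressed
/// ("stored") deflate blocks. Input is buffered; nothing is emitted until
/// finish() is called.
struct StoredGzWriter<W: Write> {
    inner: W,
    pending: Vec<u8>,
    crc: u32,
    len: u32, // ISIZE: input length modulo 2^32
}

impl<W: Write> StoredGzWriter<W> {
    fn new(inner: W) -> Self {
        StoredGzWriter { inner, pending: Vec::new(), crc: 0, len: 0 }
    }

    /// Writes header, stored blocks and the 8-byte trailer, then hands back
    /// the inner writer.
    fn finish(mut self) -> io::Result<W> {
        // Header: magic 31 139, CM=8 (deflate), FLG=0, MTIME=0, XFL=0, OS=255.
        self.inner.write_all(&[31, 139, 8, 0, 0, 0, 0, 0, 0, 255])?;
        let mut blocks: Vec<&[u8]> = self.pending.chunks(65_535).collect();
        if blocks.is_empty() {
            blocks.push(&self.pending[..]); // empty input still needs one final block
        }
        let last = blocks.len() - 1;
        for (i, block) in blocks.iter().enumerate() {
            let header = if i == last { 1u8 } else { 0u8 }; // BFINAL bit, BTYPE=00
            let n = block.len() as u16;
            self.inner.write_all(&[header])?;
            self.inner.write_all(&n.to_le_bytes())?; // LEN
            self.inner.write_all(&(!n).to_le_bytes())?; // NLEN, one's complement of LEN
            self.inner.write_all(block)?;
        }
        // Trailer: CRC-32 then ISIZE, both little-endian.
        self.inner.write_all(&self.crc.to_le_bytes())?;
        self.inner.write_all(&self.len.to_le_bytes())?;
        Ok(self.inner)
    }
}

impl<W: Write> Write for StoredGzWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.crc = crc32_update(self.crc, buf);
        self.len = self.len.wrapping_add(buf.len() as u32);
        self.pending.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

fn main() -> io::Result<()> {
    let mut enc = StoredGzWriter::new(Vec::new());
    enc.write_all(b"Hello world!")?;
    let out: Vec<u8> = enc.finish()?; // the trailer exists only after finish()
    println!("{:?}", out);
    Ok(())
}
```

In this sketch, dropping the writer without calling finish() emits nothing at all. A real encoder streams compressed data as it goes, but the CRC and length still cannot be written until the input is known to be complete, which is the job of finish() (or try_finish()).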

@Martin1887
Copy link
Author

Ok. I had started using flate2::write::GzEncoder, but I was not using it correctly.

Thanks.

@denizs

denizs commented Jul 21, 2018

@vitalyd - How would you go about reading out only the bytes written to the Cursor's inner buffer? My use case involves payloads of varying size, so I'm initializing an oversized buffer, which leaves me with unnecessary 0 values that I would like to slice out to keep my footprint as small as possible.

@denizs

denizs commented Jul 21, 2018

I solved this by switching to the bufread implementation.
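For the Cursor question above, one std-only approach: `Cursor::position()` reports how far the cursor has advanced, so after writing into an oversized array you can keep `&buf[..position]` and drop the unused zeroes. `write_into` is a hypothetical helper. flate2's write::GzEncoder::finish() returns the inner writer, so wrapping the array in a Cursor should let you call position() on what finish() returns; that part is not exercised in this sketch.

```rust
use std::io::{self, Cursor, Write};

/// Hypothetical helper: writes `payload` into a fixed, oversized buffer
/// through a Cursor and returns how many bytes were actually written.
fn write_into(buf: &mut [u8], payload: &[u8]) -> io::Result<usize> {
    let mut cursor = Cursor::new(buf);
    // Fails with ErrorKind::WriteZero if the buffer is too small.
    cursor.write_all(payload)?;
    // position() is how far the cursor advanced, i.e. the bytes written here.
    Ok(cursor.position() as usize)
}

fn main() -> io::Result<()> {
    let mut buf = [0u8; 64]; // oversized on purpose
    let n = write_into(&mut buf, b"Hello world!")?;
    let used = &buf[..n]; // only the written bytes, no trailing zeroes
    println!("{} of {} bytes used: {:?}", n, buf.len(), used);
    Ok(())
}
```

Copying `used` into a `Vec` (or just keeping the slice) then gives a result with no padding.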

@milesgranger

What an interesting ride. I've been using a few other de/compression libs, and they always return the total number of bytes written into the output buffer, so I ran into this issue: when reading into a buffer with read::GzEncoder, a single read() doesn't output the final 8 bytes (checksum and length). (I guess this is just the way it should be; you all seem far smarter than me. :-) )

Maybe I was being too obtuse (it is late), but write::GzEncoder wouldn't work for me, because my use case also overallocates a buffer and then needs to know how many bytes were written to it, not how many bytes were consumed from the input, which is what write::GzEncoder::write returns.

Anyway, maybe this deserves a laugh? I got my desired output... 😅 Is there a better way?

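use std::io::Read;
use flate2::{Compression, Crc};
use flate2::read::GzEncoder;

// compute checksum
let mut crc = Crc::new();
crc.update(&data);

// Encode (this relies on one read() returning the whole deflate stream
// but not the 8-byte trailer, as observed earlier in this thread)
let mut encoder = GzEncoder::new(data, Compression::new(level));
let n_bytes = encoder.read(slice)?;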

// insert checksum as bytes into output
let mut checksum_bytes = crc.sum().to_le_bytes();
slice[n_bytes..n_bytes + 4].swap_with_slice(&mut checksum_bytes);

// insert data len as bytes into output
let mut data_len_bytes = (data.len() as u32).to_le_bytes();
slice[n_bytes + 4..n_bytes + 8].swap_with_slice(&mut data_len_bytes);

// Ka-pow, total bytes affected output
Ok(n_bytes + checksum_bytes.len() + data_len_bytes.len())

4 participants