-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read::read_to_end
time is quadratic for input over 8KB
#45851
Comments
Sorry, the description here is incorrect. Since the |
I measured Running target/release/deps/read_to_end_bench-9122fdf004b0166c
running 24 tests
test a_read_128::vec_new ... bench: 1,380 ns/iter (+/- 49) = 92 MB/s
test a_read_128::vec_with_capacity ... bench: 1,416 ns/iter (+/- 107) = 90 MB/s
test b_read_512::vec_new ... bench: 1,690 ns/iter (+/- 78) = 302 MB/s
test b_read_512::vec_with_capacity ... bench: 1,736 ns/iter (+/- 52) = 294 MB/s
test c_read_2k::vec_new ... bench: 2,085 ns/iter (+/- 113) = 982 MB/s
test c_read_2k::vec_with_capacity ... bench: 2,015 ns/iter (+/- 72) = 1016 MB/s
test d_read_8k::vec_new ... bench: 2,778 ns/iter (+/- 131) = 2948 MB/s
test d_read_8k::vec_with_capacity ... bench: 2,516 ns/iter (+/- 89) = 3255 MB/s
test e_read_32k::vec_new ... bench: 5,107 ns/iter (+/- 103) = 6416 MB/s
test e_read_32k::vec_with_capacity ... bench: 4,404 ns/iter (+/- 81) = 7440 MB/s
test f_read_128k::vec_new ... bench: 16,232 ns/iter (+/- 106) = 8074 MB/s
test f_read_128k::vec_with_capacity ... bench: 12,223 ns/iter (+/- 103) = 10723 MB/s
test g_read_512k::vec_new ... bench: 39,179 ns/iter (+/- 210) = 13381 MB/s
test g_read_512k::vec_with_capacity ... bench: 31,704 ns/iter (+/- 98) = 16536 MB/s
test h_read_1m::vec_new ... bench: 463,119 ns/iter (+/- 2,354) = 2264 MB/s
test h_read_1m::vec_with_capacity ... bench: 251,983 ns/iter (+/- 3,072) = 4161 MB/s
test i_read_2m::vec_new ... bench: 668,742 ns/iter (+/- 8,317) = 3135 MB/s
test i_read_2m::vec_with_capacity ... bench: 383,879 ns/iter (+/- 1,269) = 5463 MB/s
test j_read_4m::vec_new ... bench: 1,188,553 ns/iter (+/- 13,409) = 3528 MB/s
test j_read_4m::vec_with_capacity ... bench: 857,714 ns/iter (+/- 8,254) = 4890 MB/s
test k_read_8m::vec_new ... bench: 2,302,201 ns/iter (+/- 15,746) = 3643 MB/s
test k_read_8m::vec_with_capacity ... bench: 1,947,089 ns/iter (+/- 10,942) = 4308 MB/s
test l_read_32m::vec_new ... bench: 8,990,230 ns/iter (+/- 18,718) = 3732 MB/s
test l_read_32m::vec_with_capacity ... bench: 8,648,669 ns/iter (+/- 23,157) = 3879 MB/s
test result: ok. 0 passed; 0 failed; 0 ignored; 24 measured; 0 filtered out #![feature(test)]
extern crate test;
extern crate tempdir;
use std::fs::File;
use std::io::{Write, Read};
use test::Bencher;
fn run<F>(bencher: &mut Bencher, size: usize, new_buffer: F)
where F: Fn(&File) -> Vec<u8>
{
let dir = tempdir::TempDir::new("bench").unwrap();
let path = dir.path().join("something");
File::create(&path).unwrap().write_all(&vec![42; size]).unwrap();
bencher.bytes = size as u64;
bencher.iter(|| {
let mut file = File::open(&path).unwrap();
let mut buffer = new_buffer(&file);
file.read_to_end(&mut buffer).unwrap();
})
}
macro_rules! sizes {
($( $name: ident $size: expr )+) => {
$(
mod $name {
use super::*;
#[bench]
fn vec_new(bencher: &mut Bencher) {
run(bencher, $size, |_| Vec::new())
}
#[bench]
fn vec_with_capacity(bencher: &mut Bencher) {
run(bencher, $size, |file| {
Vec::with_capacity(file.metadata().unwrap().len() as usize)
})
}
}
)+
}
}
sizes! {
a_read_128 128
b_read_512 512
c_read_2k 2 * 1024
d_read_8k 8 * 1024
e_read_32k 32 * 1024
f_read_128k 128 * 1024
g_read_512k 512 * 1024
h_read_1m 1024 * 1024
i_read_2m 2 * 1024 * 1024
j_read_4m 4 * 1024 * 1024
k_read_8m 8 * 1024 * 1024
l_read_32m 32 * 1024 * 1024
} |
PR with a prototype: #45928 |
The default implementation of
read_to_end
grows its buffer by at mostDEFAULT_BUF_SIZE
bytes (8KB) at a time. When the input is large, this takes O(n²) time due to repeated reallocation and copying. Callingread_to_end
on a 100MB file with an empty buffer will resize and copy the buffer over 10,000 times.When the input size is known ahead of time, callers can avoid this problem by passing in a pre-allocated buffer. It would be nice if this was not necessary in cases where the standard library could do this automatically.
read_to_end
could be specialized to pre-reserve space for types that implementSeek
.File
that have other ways to determine the input size could provide their own implementations ofread_to_end
.This still leaves potentially quadratic behavior for cases where the input size can't be determined up front. This could be fixed using exponential growth, at the cost of wasting more memory due to unused buffer space.
The text was updated successfully, but these errors were encountered: