Poor performance on Connection.recv() with large values of bufsiz. #577

Lukasa · 2016-11-27T15:44:38Z

Discovered while investigating kennethreitz/requests#3729.

When allocating a buffer using CFFI's new method (as Connection.recv does via _ffi.new("char[]", bufsiz), CFFI kindly zeroes that buffer for us. That means this call has a performance cost that is linear in bufsiz. This can lead to nasty negative behaviour if really large values of bufsiz are used, e.g. 1GB.

We don't need a zeroed buffer for this: SSL_read returns a length, so we know exactly how many bytes to copy out. Unfortunately though, CFFI doesn't provide any method to obtain a non-zeroed buffer, at least not as far as I can see.

Instead, a coping method might be to silently clamp the maximum value of bufsiz to the return value of self._socket.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF). This is the size of the socket receive buffer, and therefore represents the maximum amount of data the kernel can possibly return in one recv call. On my OS X machine, this is ~129 kB by default. An alternative option is to clamp to 16kB, the maximum TLS record size, or to clamp to the nearest multiple of 16kB near SO_RCVBUF. Either of these will greatly shrink the allocations. Finally, we can just pick an arbitrary value: a few hundred KB is probably safe.

I should note, however, that this coping method still imposes a substantial cost when downloading data that is not incurred by the standard library's ssl module. Specifically, CFFI will require that we zero at least as many bytes as the size of the download. This is all wasted effort, and for multi-gigabyte files represents really quite a substantial waste of time. It would be best to do both of these things: to shrink the size of our allocations and to try to avoid zeroing data when we don't need to.

The text was updated successfully, but these errors were encountered:

Lukasa · 2016-11-27T20:02:18Z

So, Armin has pointed out that this actually already exists: ffi.new_allocator(should_clear_after_alloc=False)("char[]", bufsiz) should be a solution to the zeroing out of memory. I'm going to investigate it now.

Lukasa · 2016-11-27T20:03:37Z

Excellently, that does work. I'll start working on a PR for this tomorrow.

Lukasa changed the title ~~Poor performance on Connection.recv() with large values of amt.~~ Poor performance on Connection.recv() with large values of bufsiz. Nov 27, 2016

pdknsk mentioned this issue Nov 27, 2016

iter_content slow with large chunk size on HTTPS connection psf/requests#3729

Closed

Lukasa mentioned this issue Nov 27, 2016

Don't zero memory when we don't have to. #578

Merged

hynek closed this as completed in #578 Nov 28, 2016

github-actions bot locked as resolved and limited conversation to collaborators Aug 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Poor performance on Connection.recv() with large values of bufsiz. #577

Poor performance on Connection.recv() with large values of bufsiz. #577

Lukasa commented Nov 27, 2016

Lukasa commented Nov 27, 2016 •

edited

Loading

Lukasa commented Nov 27, 2016

Poor performance on Connection.recv() with large values of bufsiz. #577

Poor performance on Connection.recv() with large values of bufsiz. #577

Comments

Lukasa commented Nov 27, 2016

Lukasa commented Nov 27, 2016 • edited Loading

Lukasa commented Nov 27, 2016

Lukasa commented Nov 27, 2016 •

edited

Loading