Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use_file: Remove use of spin-locks #125

Merged
merged 3 commits into from
Jan 7, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
128 changes: 96 additions & 32 deletions src/use_file.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,11 @@
// except according to those terms.

//! Implementations that just need to read from a file
use crate::util_libc::{last_os_error, open_readonly, sys_fill_exact, LazyFd};
use crate::util::LazyUsize;
use crate::util_libc::{open_readonly, sys_fill_exact};
use crate::Error;
use core::cell::UnsafeCell;
use core::sync::atomic::{AtomicUsize, Ordering::Relaxed};

#[cfg(target_os = "redox")]
const FILE_PATH: &str = "rand:\0";
Expand All @@ -21,10 +24,11 @@ const FILE_PATH: &str = "rand:\0";
target_os = "illumos"
))]
const FILE_PATH: &str = "/dev/random\0";
#[cfg(any(target_os = "android", target_os = "linux"))]
const FILE_PATH: &str = "/dev/urandom\0";

pub fn getrandom_inner(dest: &mut [u8]) -> Result<(), Error> {
static FD: LazyFd = LazyFd::new();
let fd = FD.init(init_file).ok_or_else(last_os_error)?;
let fd = get_rng_fd()?;
let read = |buf: &mut [u8]| unsafe { libc::read(fd, buf.as_mut_ptr() as *mut _, buf.len()) };

if cfg!(target_os = "emscripten") {
Expand All @@ -38,36 +42,96 @@ pub fn getrandom_inner(dest: &mut [u8]) -> Result<(), Error> {
Ok(())
}

cfg_if! {
if #[cfg(any(target_os = "android", target_os = "linux"))] {
fn init_file() -> Option<libc::c_int> {
// Poll /dev/random to make sure it is ok to read from /dev/urandom.
let mut pfd = libc::pollfd {
fd: unsafe { open_readonly("/dev/random\0")? },
events: libc::POLLIN,
revents: 0,
};

let ret = loop {
// A negative timeout means an infinite timeout.
let res = unsafe { libc::poll(&mut pfd, 1, -1) };
if res == 1 {
break unsafe { open_readonly("/dev/urandom\0") };
} else if res < 0 {
let e = last_os_error().raw_os_error();
if e == Some(libc::EINTR) || e == Some(libc::EAGAIN) {
continue;
}
}
// We either hard failed, or poll() returned the wrong pfd.
break None;
};
unsafe { libc::close(pfd.fd) };
ret
// Returns the file descriptor for the device file used to retrieve random
// bytes. The file will be opened exactly once. All successful calls will
// return the same file descriptor. This file descriptor is never closed.
fn get_rng_fd() -> Result<libc::c_int, Error> {
static FD: AtomicUsize = AtomicUsize::new(LazyUsize::UNINIT);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how often this needs to happen, or whether this happening is worth optimizing, but a different way to implement this is to do something like this to keep the common path branchless: /~https://github.com/calebzulawski/multiversion/pull/6/files#diff-0a878480aac95b54dd822d02c4ad345eR76

cc @TethysSvensson - it might be a thing worth attempting in a subsequent PR after an approach that works "correctly" gets merged.

Copy link

@TethysSvensson TethysSvensson Jan 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gnzlbg The reason why that trick works in multiversion is, that the value we are trying to lazily compute is already a function pointer. This means that the best-case performance is always going to involve at least 1 indirect function call. The trick makes sure that the common path does not pay anything in addition to that 1 indirect function.

Here it looks like you are trying to lazily compute a file descriptor. As far as I can tell, the current cost in the common path is a single conditional branch on a relaxed load. This appears to me to be cheaper than alternative cost of 1 indirect function call on a relaxed load.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do have one suggestion though, assuming that you would ever want to inline get_rng_fd. I think in that case, you would want to inline the initial check, while keeping the slow initializer un-inlined. I am imagining something like this:

#[inline]
fn get_rng_fd() -> Result<libc::c_int, Error> {
    static FD: AtomicUsize = AtomicUsize::new(LazyUsize::UNINIT);

    if let Some(fd) = get_fd() {
        Ok(fd)
    } else {
        initialize_slow()
    }

    #[inline(always)]
    fn get_fd() -> Option<libc::c_int> { ... }
    #[inline(always)]
    fn initialize_slow() -> Result<libc::c_int, Error> { ... }
}

However if I understand this code correctly, the I don't think this function is meant to be inlined(?), so in that case it does not really matter either way.

Copy link
Member Author

@josephlr josephlr Jan 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However if I understand this code correctly, the I don't think this function is meant to be inlined(?), so in that case it does not really matter either way.

This is correct, the only external function of this crate is getrandom::getrandom which is not marked #[inline], so I don't think marking internal functions #[inline] really does anything.

In fact it looks like (in release mode) the compiler just inlines everything in this file into use_file::getrandom_inner. I checked with the current nightly build.

EDIT: here's the ASM when compiled with --release.

fn get_fd() -> Option<libc::c_int> {
match FD.load(Relaxed) {
LazyUsize::UNINIT => None,
val => Some(val as libc::c_int),
}
} else {
fn init_file() -> Option<libc::c_int> {
unsafe { open_readonly(FILE_PATH) }
}

// Use double-checked locking to avoid acquiring the lock if possible.
if let Some(fd) = get_fd() {
return Ok(fd);
}

// SAFETY: We use the mutex only in this method, and we always unlock it
// before returning, making sure we don't violate the pthread_mutex_t API.
static MUTEX: Mutex = Mutex::new();
unsafe { MUTEX.lock() };
let _guard = DropGuard(|| unsafe { MUTEX.unlock() });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it guaranteed that destructor will run right before function exit (be it return or panic)? For some reason I thought that with NLL drop place can change depending on liveness analysis.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NLL (and its future extensions w/ Polonious or what not) is just for Lifetimes. Drop is still scope based, so the drop method is always executed when the object leaves scope. As the guard is at function scope, it drops on return or on panic unwinding.

The liveness analysis is just used to determine if a set of borrows are permitted. This is why (modulo compiler bugs) NLL was introduced without it being a breaking change.

Docs: https://doc.rust-lang.org/book/ch15-03-drop.html


if let Some(fd) = get_fd() {
return Ok(fd);
}

// On Linux, /dev/urandom might return insecure values.
#[cfg(any(target_os = "android", target_os = "linux"))]
wait_until_rng_ready()?;

let fd = unsafe { open_readonly(FILE_PATH)? };
// The fd always fits in a usize without conflicting with UNINIT.
debug_assert!(fd >= 0 && (fd as usize) < LazyUsize::UNINIT);
FD.store(fd as usize, Relaxed);

Ok(fd)
}

// Succeeds once /dev/urandom is safe to read from
#[cfg(any(target_os = "android", target_os = "linux"))]
fn wait_until_rng_ready() -> Result<(), Error> {
// Poll /dev/random to make sure it is ok to read from /dev/urandom.
let fd = unsafe { open_readonly("/dev/random\0")? };
let mut pfd = libc::pollfd {
fd,
events: libc::POLLIN,
revents: 0,
};
let _guard = DropGuard(|| unsafe {
libc::close(fd);
});

loop {
// A negative timeout means an infinite timeout.
let res = unsafe { libc::poll(&mut pfd, 1, -1) };
if res >= 0 {
assert_eq!(res, 1); // We only used one fd, and cannot timeout.
return Ok(());
}
let err = crate::util_libc::last_os_error();
match err.raw_os_error() {
Some(libc::EINTR) | Some(libc::EAGAIN) => continue,
_ => return Err(err),
}
}
}

struct Mutex(UnsafeCell<libc::pthread_mutex_t>);

impl Mutex {
const fn new() -> Self {
Self(UnsafeCell::new(libc::PTHREAD_MUTEX_INITIALIZER))
}
unsafe fn lock(&self) {
let r = libc::pthread_mutex_lock(self.0.get());
debug_assert_eq!(r, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not check the return value in all builds? (Also unlock.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stdlib also uses debug_assert, and man page specifically says that lock/unlock basically can't fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha. Closer reading of the possible error values shows that none should be applicable in this case, and if any were that would be a code error, so okay. (Assuming EAGAIN is specific to recursive locks.)

}
unsafe fn unlock(&self) {
let r = libc::pthread_mutex_unlock(self.0.get());
debug_assert_eq!(r, 0);
}
}

unsafe impl Sync for Mutex {}

struct DropGuard<F: FnMut()>(F);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this isn't a part of the core library!

Copy link

@matklad matklad Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Often times, this does not work in higher level code, because closure borrows the environment, and so you can't modify it. We only able to use it because we close over low-level file descriptor, which is Copy.

To be clear, it is an oftentimes useful utility, but it is not as useful in practice as it could seem.

EDIT: you can, of course, make a guard smart-pointer with DerefMut (https://docs.rs/scopeguard/1.0.0/scopeguard/#scope-guard-with-value), but that's a more complex API with some design choices.


impl<F: FnMut()> Drop for DropGuard<F> {
fn drop(&mut self) {
self.0()
}
}
32 changes: 0 additions & 32 deletions src/util.rs
Original file line number Diff line number Diff line change
Expand Up @@ -35,8 +35,6 @@ impl LazyUsize {

// The initialization is not completed.
pub const UNINIT: usize = usize::max_value();
// The initialization is currently running.
pub const ACTIVE: usize = usize::max_value() - 1;

// Runs the init() function at least once, returning the value of some run
// of init(). Multiple callers can run their init() functions in parallel.
Expand All @@ -50,36 +48,6 @@ impl LazyUsize {
}
val
}

// Synchronously runs the init() function. Only one caller will have their
// init() function running at a time, and exactly one successful call will
// be run. init() returning UNINIT or ACTIVE will be considered a failure,
// and future calls to sync_init will rerun their init() function.
pub fn sync_init(&self, init: impl FnOnce() -> usize, mut wait: impl FnMut()) -> usize {
// Common and fast path with no contention. Don't wast time on CAS.
match self.0.load(Relaxed) {
Self::UNINIT | Self::ACTIVE => {}
val => return val,
}
// Relaxed ordering is fine, as we only have a single atomic variable.
loop {
match self.0.compare_and_swap(Self::UNINIT, Self::ACTIVE, Relaxed) {
Self::UNINIT => {
let val = init();
self.0.store(
match val {
Self::UNINIT | Self::ACTIVE => Self::UNINIT,
val => val,
},
Relaxed,
);
return val;
}
Self::ACTIVE => wait(),
val => return val,
}
}
}
}

// Identical to LazyUsize except with bool instead of usize.
Expand Down
39 changes: 4 additions & 35 deletions src/util_libc.rs
Original file line number Diff line number Diff line change
Expand Up @@ -89,37 +89,6 @@ impl Weak {
}
}

pub struct LazyFd(LazyUsize);

impl LazyFd {
pub const fn new() -> Self {
Self(LazyUsize::new())
}

// If init() returns Some(x), x should be nonnegative.
pub fn init(&self, init: impl FnOnce() -> Option<libc::c_int>) -> Option<libc::c_int> {
let fd = self.0.sync_init(
|| match init() {
// OK as val >= 0 and val <= c_int::MAX < usize::MAX
Some(val) => val as usize,
None => LazyUsize::UNINIT,
},
|| unsafe {
// We are usually waiting on an open(2) syscall to complete,
// which typically takes < 10us if the file is a device.
// However, we might end up waiting much longer if the entropy
// pool isn't initialized, but even in that case, this loop will
// consume a negligible amount of CPU on most platforms.
libc::usleep(10);
},
);
match fd {
LazyUsize::UNINIT => None,
val => Some(val as libc::c_int),
}
}
}

cfg_if! {
if #[cfg(any(target_os = "linux", target_os = "emscripten"))] {
use libc::open64 as open;
Expand All @@ -129,15 +98,15 @@ cfg_if! {
}

// SAFETY: path must be null terminated, FD must be manually closed.
pub unsafe fn open_readonly(path: &str) -> Option<libc::c_int> {
pub unsafe fn open_readonly(path: &str) -> Result<libc::c_int, Error> {
debug_assert!(path.as_bytes().last() == Some(&0));
let fd = open(path.as_ptr() as *mut _, libc::O_RDONLY | libc::O_CLOEXEC);
let fd = open(path.as_ptr() as *const _, libc::O_RDONLY | libc::O_CLOEXEC);
if fd < 0 {
return None;
return Err(last_os_error());
}
// O_CLOEXEC works on all Unix targets except for older Linux kernels (pre
// 2.6.23), so we also use an ioctl to make sure FD_CLOEXEC is set.
#[cfg(target_os = "linux")]
libc::ioctl(fd, libc::FIOCLEX);
Some(fd)
Ok(fd)
}