Optionally support parking_lot within global allocators. #305

sunfishcode · 2021-12-07T00:53:04Z

This adds a feature to support using parking_lot locks in
#[global_allocator] implementations. The main hazard is parking_lot calling
into the global allocator, and then the global allocator calling back into
parking_lot, at a point where parking_lot is not prepared to be re-entered on
the same thread.

There are currently two places where this comes up:

When a new thread is created, parking_lot needs to allocate space in its
hashtable for the new thread. It uses the global allocator to allocate this
space, and if the global allocator uses parking lot, it doesn't work because
the hashtable is not ready for the new thread yet.
If the new hashtable is allocated while bucket locks are held, and the
global allocator attempts to acquire a parking lot lock, it deadlocks
because all the buckets are locked.

For the first, this adds a new reserve function, defined when the "reserve"
feature is enabled. This allocates space in the hashtable, and if users call
it before each thread creation in the program, parking_lot won't ever need to
call grow_hashtable on new threads. This is admittedly awkward to guarantee
in general, however it's not a problem in my own use case. And this function
might also be useful in cases where users want to create a pool of threads all
at once, to reduce temporary allocations, leakage, and rehashing. However, I'm
open to other ideas here.

For the second, when "reserve" is enabled, create_hashtable will allocate the
new HashTable before locking all the bucket locks, to avoid deadlock if
allocating the HashTable attempts to take a lock.

Amanieu · 2021-12-07T01:30:46Z

core/src/parking_lot.rs

+        // For feature = "reserve", create the new table here, before we lock
+        // all the buckets, in case the allocator needs to acquire a lock.
+        #[cfg(feature = "reserve")]
+        let new_table = HashTable::new(num_threads, table);


This doesn't sound right: if the allocator attempts to take a lock at this point, won't it just trigger hashtable growth again? This would result in infinite recursion instead of a deadlock, which isn't much better.

If the user calls reserve() before all new threads, then grow_hashtable will only ever run on threads which already have enough hashtable space for themselves, so the allocator can take locks here.

Except that there's an issue with calling reserve before creating a thread: thread creation may fail, but there's no way to undo the reservation. Every time creating a thread fails, NUM_RESERVED will still be increased.

If you instead call reserve as the first action within a new thread then you run into the problem I described above where the allocator can cause infinite recursion.

I think the correct approach here is to do hashtable growth as the first thing in a thread, while using a thread-local bool to detect recursion and skip the growth if it happens within the allocator. This is fine because the hash table here uses chaining so its capacity is not limited to the number of buckets.

This adds a feature to support using parking_lot locks in `#[global_allocator]` implementations. The main hazard is parking_lot calling into the global allocator, and then the global allocator calling back into parking_lot, at a point where parking_lot is not prepared to be re-entered on the same thread. There are currently two places where this comes up: - When a new thread is created, parking_lot needs to allocate space in its hashtable for the new thread. It uses the global allocator to allocate this space, and if the global allocator uses parking lot, it doesn't work because the hashtable is not ready for the new thread yet. - If the new hashtable is allocated while bucket locks are held, and the global allocator attempts to acquire a parking lot lock, it deadlocks because all the buckets are locked. This defines a new "global_allocator_compat" feature. When defined: The hashtable growing code checks whether it's being reentered, and does no growth in that case. And, `create_hashtable` will allocate the new `HashTable` before locking all the bucket locks, to avoid deadlock if allocating the `HashTable` attempts to take a lock. And, a new function, `init`, is defined, for allocating the initial hashtable. This must be done before performing any syncronization.

sunfishcode · 2021-12-08T23:17:45Z

I've now implemented the thread-local bool recursion check, as you suggested. The main tricky part is that it doesn't work for creating the initial hash table. In that case we can't defer allocating, because the HASHTABLE pointer starts out null. I've fixed that for now by adding an init function, which allows the hash table to be initialized before any other locks are taken.

I looked into statically initializing the hashtable, it's doable doable, though it looks awkward because Bucket contains a Timestamp which is not statically initializable. Once something like Lazy is stablized, this might be less awkward to implement.

Amanieu

I think this is good enough as a hack for now, but I'd like to explore the possibility of a proper solution.

I'm somewhat concerned about what happens if an allocator ends up re-entering itself from unexpected places, which could cause a deadlock. Consider the case where an allocator needs to acquire 2 locks and triggers hashtable growth when trying to acquire the second one.

If global allocators have to explicitly support parking_lot, could we simply expose a critical section API which inhibits hashtable growth for the duration of a call?

For example:

fn alloc(...) {
    parking_lot_core::inhibit_allocations(|| {
        // actual allocator implementation
    })
}

core/src/parking_lot.rs

sunfishcode · 2021-12-11T01:00:24Z

If global allocators have to explicitly support parking_lot, could we simply expose a critical section API which inhibits hashtable growth for the duration of a call?

For example:
fn alloc(...) {
    parking_lot_core::inhibit_allocations(|| {
        // actual allocator implementation
    })
}

Interesting. I'll need to experiment with that.

As another option, what would you think of exposing WordLock as a public API? I haven't studied the implementation here, but the webkit blog about it suggests its main downside is just the size of the lock type, and most allocators only use a small fixed number of locks. Then the solution could just be that allocators should use WordLock instead of RawMutex and friends.

Amanieu · 2021-12-11T02:10:16Z

I haven't studied the implementation here, but the webkit blog about it suggests its main downside is just the size of the lock type, and most allocators only use a small fixed number of locks.

Another downside is that WordLock can't be used with a CondVar, but that's not going to be an issue in practice.

As another option, what would you think of exposing WordLock as a public API?

I am certainly open to the idea! Perhaps it should be renamed to something else before being re-exported? After all the important part isn't that it fits in a word, it's that it doesn't require any dynamic allocations.

Amanieu · 2021-12-11T02:11:18Z

cc @m-ou-se since solving this issue properly is an important step towards replacing the standard library locks with parking lot.

sunfishcode · 2022-01-03T16:05:51Z

One wrinkle with exporting WordLock is that it doesn't currently have any fairness mechanism. That may be ok for parking_lot's internal needs, but it's less ok for general-purpose use.

And a concern with porting parking_lot's fairness code to WorkLock is that the fairness code depends on std::time::Instant::now, which on some platforms needs to acquire a lock (to ensure monotonicity). If parking_lot is to replace the standard library locks, that would create another circular dependency.

Amanieu · 2022-01-12T21:33:16Z

I spent some time thinking about this. The only way I can see this working is for parking_lot to use a custom allocator which is guaranteed to not call back int parking_lot. A function pointer to an allocator function could be provided via an API during early initialization.

The actual allocator could be implemented in several ways:

Without using any locks by directly calling mmap.
Using only libc locks such as pthread_mutex_t.
Using only WordLock.

Regarding std::time::Instant::now, the lock is going to be removed soon so we don't need to worry about it.

quininer · 2022-01-27T04:01:13Z

core/src/parking_lot.rs

+        // letting the outer call perform the grow.
+        #[cfg(feature = "global_allocator_compat")]
+        let grow = {
+            thread_local!(static ACTIVE: Cell<bool> = Cell::new(false));


This seems unreliable, if the target platform uses pthread tls, there will be multiple memory operations inside std.

see /~https://github.com/rust-lang/rust/blob/master/library/std/src/thread/local.rs#L740 /~https://github.com/rust-lang/rust/blob/master/library/std/src/sys_common/thread_local_dtor.rs#L19

sunfishcode · 2022-05-23T17:12:54Z

After rust-lang/rust#93740 and related developments, I myself no longer have a need for this PR.

Amanieu reviewed Dec 7, 2021

View reviewed changes

sunfishcode force-pushed the reserve branch from ca57261 to e7e94cd Compare December 8, 2021 22:25

Minor code simplification.

0afa01d

Amanieu reviewed Dec 10, 2021

View reviewed changes

core/src/parking_lot.rs Outdated Show resolved Hide resolved

Use Cell<bool> instead of AtomicBool.

b9e0b54

quininer reviewed Jan 27, 2022

View reviewed changes

m-ou-se mentioned this pull request Feb 7, 2022

Tracking issue for improving std::sync::{Mutex, RwLock, Condvar} rust-lang/rust#93740

Closed

63 tasks

cameronelliott mentioned this pull request May 11, 2022

Exposing RawMutex and RawRwLock in the public API? #68

Closed

sunfishcode closed this May 23, 2022

sunfishcode deleted the reserve branch June 22, 2022 23:52

Amanieu mentioned this pull request Oct 28, 2022

Apparent bug/hang in Mutex #361

Closed

cstrahan-blueshift mentioned this pull request Oct 28, 2022

parking_lot's Mutex allocates, causing deadlock nnethercote/dhat-rs#25

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optionally support parking_lot within global allocators. #305

Optionally support parking_lot within global allocators. #305

sunfishcode commented Dec 7, 2021

Amanieu Dec 7, 2021

sunfishcode Dec 7, 2021

Amanieu Dec 8, 2021

sunfishcode commented Dec 8, 2021

Amanieu left a comment

sunfishcode commented Dec 11, 2021

Amanieu commented Dec 11, 2021

Amanieu commented Dec 11, 2021

sunfishcode commented Jan 3, 2022

Amanieu commented Jan 12, 2022

quininer Jan 27, 2022

sunfishcode commented May 23, 2022

Optionally support parking_lot within global allocators. #305

Optionally support parking_lot within global allocators. #305

Conversation

sunfishcode commented Dec 7, 2021

Amanieu Dec 7, 2021

Choose a reason for hiding this comment

sunfishcode Dec 7, 2021

Choose a reason for hiding this comment

Amanieu Dec 8, 2021

Choose a reason for hiding this comment

sunfishcode commented Dec 8, 2021

Amanieu left a comment

Choose a reason for hiding this comment

sunfishcode commented Dec 11, 2021

Amanieu commented Dec 11, 2021

Amanieu commented Dec 11, 2021

sunfishcode commented Jan 3, 2022

Amanieu commented Jan 12, 2022

quininer Jan 27, 2022

Choose a reason for hiding this comment

sunfishcode commented May 23, 2022