-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement emutls inside compiler_rt.zig #7577
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, I started implementing this but then life got in the way.
We need some way to force the emulated TLS on for some other platforms and/or get a OpenBSD CI machine to make sure this works (and keeps working) as intended.
|
||
/// Retrieve the pointer at request index, using control to initialize it if needed. | ||
pub inline fn get_pointer(self: *ObjectArray, index: usize, control: *emutls_control) !ObjectPointer { | ||
if (self.slots[index] == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like using and abusing orelse
like this
return self.slots[index] orelse block: {
// init code here...
break :block xxx;
};
var mutex: std.c.pthread_mutex_t = std.c.PTHREAD_MUTEX_INITIALIZER; | ||
|
||
/// Return a next free index, incrementing global counter. | ||
pub fn new() usize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you just need to pull a fresh index I'd turn this into a single LOCK+GET+INCREMENT+UNLOCK sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure to understand the exact context of the remark.
if it is just about the new()
method, it is already done under locking (i will rename it to new_locked()
to be more explicit).
if it is about the global index managment (getIndex()
), grabing a full mutex would be too inefficient: the function is called at every threadvar access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it is just about the new() method, it is already done under locking (i will rename it to new_locked() to be more explicit).
Instead of requiring the caller to lock, call new
then unlock the mutex my suggestion was to replace it with a single method like this:
fn next_index() usize {
mutex.lock();
defer mutex.unlock();
const old = index;
index += 1;
return index;
}
After all you only need the mutex to make sure each index is unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the general usage of the index is: "get the index but if 0 initialize it with the next global value". with the fact that several threads could race to read/set the index.
the simple method could be:
- lock globally all threads for the global counter (it will also serialize access on control index)
- check the control index
- if 0, get the next index (and incr the global value)
- return the index (and unlock)
it is would involve taking a lock for every access of the index. lock couldn't be taken per variable, as emutls_control
doesn't have structure to hold it.
the approch taken by LLVM emutls.c (and so by me too), is to use atomic reading lockless (so with possible races). in the general case (index already initialized) there is no problem and access is quick. for first access, it needs locking (and race checking as two threads could have read 0
index value and want to initialize it). so it can't be a all-in-one function for getting the next index.
eventually I could merge Index.new()
inside control.get_index()
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind then, if multiple threads can race to initialize the same emutls_control
this makes sense.
Packing mutex
with the index is misleading, one is tempted to assume its only task is to protect the global counter while here it serves as a synchronization point for every thread/TLS object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reorganized the code. the global variable for the index has been put toplevel. and the mutex is a global variable inside emutls_control
and lcok()
and unlock()
helpers.
and now I wonder if I should put the global variable global_index
inside emutls_control
too.
|
||
/// Invoked by pthread specific destructor. the passed argument is the ObjectArray pointer. | ||
fn deinit(arrayPtr: *c_void) callconv(.C) void { | ||
var array = @ptrCast( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's about time this @ptrCast(*T, @alignCast(@alignOf(T), ptr)
pattern is turned into a nice helper in std.mem
.
And... are you sure @alignOf(*ObjectArray)
is not @alignOf(ObjectArray)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And... are you sure @Alignof(*ObjectArray) is not @Alignof(ObjectArray) ?
the passed argument is the same object set with pthread_setspecific()
. I passed a *ObjectArray
, so I should get a *ObjectArray
. Regarding the alignment, I am a bit unexperimented with them. I am casting to *ObjectArray
so I keep the same for the alignment. But I could be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This small snippet should help understand the subtle difference between the two:
const A = struct {
x: u32,
};
var x: A = undefined;
@compileLog(@typeInfo(@TypeOf(&x)).Pointer.alignment); // 4
@compileLog(@typeInfo(@TypeOf(@alignCast(@alignOf(A), &x))).Pointer.alignment); // 4
@compileLog(@typeInfo(@TypeOf(@alignCast(@alignOf(*A), &x))).Pointer.alignment); // 8
to force emulated TLS, it would need a new |
I can confirm this implementation works just fine on Linux w/ forced emulated TLS, thanks to this I discovered #7609. |
@andrewrk could you considere the PR to be merged ? (the freebsd CI failure isn't related) |
Yes I will have a look at this tomorrow! |
I folded few commits and rebased it on top of master. and maybe having a green CI and some activity might help to make progress on this. |
Nice work @semarie, thanks for your patience on the merge. |
return control.getPointer(); | ||
} | ||
|
||
/// Simple allocator interface, to avoid pulling in the while |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: s/while/whole
This implementation follows LLVM compiler_rt emutls implementation (release 8.0 - MIT).
It is required for at least OpenBSD and Android (see #5921), but as I can't test for Android, so I don't have setup it for it.
There are still some
XXX
points which would need discussion:Allocator
interface doesn't permit that (only compile-know alignement), so the current code directly useposix_memalign()
(like LLVM emutls.c)emutls_control
struct is using data with special size:gcc_word
which is defined in C astypedef unsigned int gcc_word __attribute__((mode(word)))
. for now I just checked the value on OpenBSD x68_64 and use itRegarding the code itself, the entrypoint of emutls is
__emutls_get_address()
. The structureemutls_control
is part of the ABI (what GCC or LLVM will pass to__emutls_get_address()
for asking a memory pointer).Memory is automatically reclaimed when a thread is terminated (emutls memory and threadvar memory).