-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: New HPC-GAP guard checking without using ward #2845
base: master
Are you sure you want to change the base?
Conversation
e97e1dd
to
2869ca9
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2845 +/- ##
==========================================
+ Coverage 93.44% 93.93% +0.48%
==========================================
Files 716 667 -49
Lines 812058 787729 -24329
==========================================
- Hits 758860 739933 -18927
+ Misses 53198 47796 -5402
|
22eade5
to
9c78401
Compare
This PR does not actually change the optimization level to -O3, wasn't that expected to be required for optimal performance? Where the benchmark numbers to mentioned earlier done with the default -O2, or with -O3 (perhaps manually inserted into the build system)? |
fa474ef
to
e482ed5
Compare
For now, I'm leaving the optimization level at -O2, as -O3 doesn't give me a significant speedup and slows down compilation. This has to be benchmarked more extensively, of course. |
make for unward produces:
EDIT: re2c is a hard requirement (if anyone else hits this) |
@ChrisJefferson Sorry, I did forget to mention re2c. It's possible to package unward with the generated C++ sources (and the Makefile should work in that case, too), but I hadn't made such a package yet. Edit: I've included the files in unward and it now also builds on system without re2c. |
The current status is that HPC-GAP should compile and run up to the prompt (and it doesn't blow up for at least the basic operations). As an additional featured, if you build with |
So, with the instructions above one still has to have ward installed and the source is processed through ward; is this intentional/needed? |
I don't want to slow things down, but at some point it would be good to have an explanation on where and why UNSAFE calls are put -- is it for correctness or efficiency (or both?) |
@rbehrends I am surprised to read that regarding -O2 vs -O3, as in previous discussions, you emphasized that -O3 was necessary in order for the @markuspf I think this is because this PR simply contains no build system changes at all. As far as I understood, the @ChrisJefferson no worries re "slowing down" things; this is a prototype, and it's only here because I badgered @rbehrends into putting it here. I hope he'll find time to finish it, but since he has many other obligations, I wanted to make sure that at least his partial work is available, so that we can see and discuss the general ideas at least, and so that in the worst case, somebody else can pick up the work... |
90ee28f
to
4e40031
Compare
I just submitted PR #2870 which removes use of ward from the build system, and also changed the title of this PR, as it does not actually do anything about ward. As @rbehrends pointed out to me (and which I verified separately), in |
I had a lengthy comment here but my laptop ran out of power and crashed :/. But roughly it said that I think several Anyway, @rbehrends FYI I also pushed two commits here which constify more stuff. More could/should be done on that front. |
I the CONST bits were pulled out, I'd be happy to merge them now (well, assuming packages compile and stuff, obviously). |
These existed to assist Ward's data flow analysis and serve no purpose without Ward.
3b944e5
to
bbff21a
Compare
src/lists.c
Outdated
@@ -146,7 +146,6 @@ static Obj AttrLENGTH(Obj self, Obj list) | |||
/* internal list types */ | |||
#ifdef HPCGAP | |||
ReadGuard(list); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this ReadGuard
needed, though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that can be removed, too, now I look at it. It used to be paired with an ImpliedWriteGuard()
in order to inform data flow analysis in Ward. On its own, it doesn't serve a purpose anymore (and given how the code has changed, probably hasn't for a while).
src/objects.h
Outdated
#ifdef HPCGAP | ||
MEMBAR_WRITE(); | ||
// Only require a read guard | ||
((Obj *) CONST_ADDR_OBJ(obj))[0] = val; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think that in general the code which access the type of pos/com/dat objects is messy, and in big parts due to inconsistent HPC-GAP changes. Can we please try to resolve those? Note that contrary to claims I heard in the past, many T_DATOBJS are not readonly in practice, and they can change their type (even though they might rarely do so). There certainly are no rules anywhere that forbid this, and esp. not in any place where anybody would have a chance to learn about them. There are also no measures in places to prevent this (changing types and write access to T_DATOBJ is in fact explicitly supported by kernel APIs). While we might want to rethink that in the future, for now we need to deal with things how they are in reality. And I just don't understand how we can justify dealing with T_DATOBJ differently than with T_COMOBJ and T_POSOBJ (despite having asked about this many times, and getting answers to it many times, I still don't get it, and I still don't see any kind of written documentation).
In particular:
- Why do we have custom
SET_TYPE_COMOBJ
andSET_TYPE_POSOBJ
but not a customSET_TYPE_DATOBJ
(and thus we do haveUNSAFE_SET_TYPE_DATOBJ
...) ? They should either all be the same, or else there should be a comment that justifies the difference - similar for the getters: for
TYPE_COMOBJ
andTYPE_POSOBJ
there areUNSAFE_
counterparts, but not forTYPE_DATOBJ
. Why? Well, partially because their callers are different:
#ifndef WARD_ENABLED
static Obj TypePosObj(Obj obj)
{
Obj result = UNSAFE_TYPE_POSOBJ( obj );
#ifdef HPCGAP
MEMBAR_READ();
#endif
return result;
}
#endif
...
static Obj TypeDatObj(Obj obj)
{
return TYPE_DATOBJ( obj );
}
but why are these different? Again, these functions should either be aligned, or their differences be documented.
BTW this is of course not new with this PR, but rather has been an annoying problem with HPC-GAP for many years now. But @rbehrends is the unique person who might be able to address these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the path forward here is indeed to handle T_DATOBJ
consistently with the rest rather than preserving the existing implementation. I'll look into how to best to clean this up.
src/gasman.h
Outdated
@@ -293,18 +298,71 @@ EXPORT_INLINE UInt SIZE_BAG_CONTENTS(const void *ptr) { | |||
** the application must inform {\Gasman} that it has changed the bag, by | |||
** calling 'CHANGED_BAG(old)' in the above example (see "CHANGED_BAG"). | |||
*/ | |||
#ifdef HPCGAP | |||
EXPORT_INLINE int ReadCheck(Bag bag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping?
#ifdef USE_HPC_GUARDS | ||
if (!WriteCheck(bag)) | ||
HandleWriteGuardError(bag); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing I never understood: Why don't we need some kind of read/write guard in BAG_HEADER
as well? Right now, is there anything that would stop two threads from using ResizeBag/RetypeBag concurrently, possibly leading in a garbage output?
(This thought just motivated me to submit PR #3884).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are dealing with two cases here. One is TNUM_OBJ()
, which we don't want to burden with additional guard checks, as it's used all the time. Changes are generally either harmless or in the rare cases where they do matter, handled explicitly. The biggest unpleasantness here is KTNumPlist()
, which has an explicit guard check for this reason.
You are correct that with resizing read-only objects, we run into a problem. I believe the code has assumed that the functions that resize objects (such as GROW_PLIST()
would fail naturally when changing the length word in the object). However, it looks like sometimes the change happens rather late.
I'd still simply add a write guard to ResizeBag()
, where it causes minimal overhead, rather than making all accesses to the header more expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I don't want to burden TNUM_BAG, SIZE_BAG, TEST_OBJ_FLAGS with guards; but they can be written to, and so at least RetypeBage / ResizeBag and perhaps also the SET/CLEAR_OBJ_FLAGS functions should have guards... I always wondered why they didn't.
src/gasman.h
Outdated
@@ -293,18 +298,71 @@ EXPORT_INLINE UInt SIZE_BAG_CONTENTS(const void *ptr) { | |||
** the application must inform {\Gasman} that it has changed the bag, by | |||
** calling 'CHANGED_BAG(old)' in the above example (see "CHANGED_BAG"). | |||
*/ | |||
#ifdef HPCGAP | |||
EXPORT_INLINE int ReadCheck(Bag bag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked myself, by looking at EqRange
, which is rather simple and ideally only would need to ReadGuard
calls, but naively requires six. And sadly, what I am seeing are the full six calls. Here is disassembly in plain GAP with your PR:
$ otool -p _EqRange -x -V ./gap
./gap:
(__TEXT,__text) section
_EqRange:
0000000100107940 pushq %rbp
0000000100107941 movq %rsp, %rbp
0000000100107944 movq (%rdi), %rax
0000000100107947 movq (%rsi), %rcx
000000010010794a movq (%rcx), %rdx
000000010010794d xorq (%rax), %rdx
0000000100107950 cmpq $0x3, %rdx
0000000100107954 ja 0x100107977
0000000100107956 movq 0x8(%rcx), %rdx
000000010010795a xorq 0x8(%rax), %rdx
000000010010795e cmpq $0x3, %rdx
0000000100107962 ja 0x10010797b
0000000100107964 movq 0x10(%rcx), %rcx
0000000100107968 xorq 0x10(%rax), %rcx
000000010010796c xorl %eax, %eax
000000010010796e cmpq $0x4, %rcx
0000000100107972 setb %al
0000000100107975 popq %rbp
0000000100107976 retq
0000000100107977 xorl %eax, %eax
0000000100107979 popq %rbp
000000010010797a retq
000000010010797b xorl %eax, %eax
000000010010797d popq %rbp
000000010010797e retq
000000010010797f nop
And here is what I get in HPC-GAP with your PR:
$ otool -p _EqRange -x -V ./gap
./gap:
(__TEXT,__text) section
_EqRange:
0000000100156730 pushq %rbp
0000000100156731 movq %rsp, %rbp
0000000100156734 pushq %r15
0000000100156736 pushq %r14
0000000100156738 pushq %rbx
0000000100156739 pushq %rax
000000010015673a movq %rsi, %r14
000000010015673d movq %rdi, %r15
0000000100156740 movq 0x8(%rdi), %rax
0000000100156744 testq %rax, %rax
0000000100156747 je 0x10015676b
0000000100156749 movq %rbp, %rcx
000000010015674c andq $-0x100000, %rcx
0000000100156753 cmpq %rcx, 0x28(%rax)
0000000100156757 je 0x10015676b
0000000100156759 movslq (%rcx), %rcx
000000010015675c cmpb $0x0, 0x50(%rax,%rcx)
0000000100156761 jne 0x10015676b
0000000100156763 movq %r15, %rdi
0000000100156766 callq _HandleReadGuardError
000000010015676b movq (%r15), %rax
000000010015676e movq (%rax), %rbx
0000000100156771 movq 0x8(%r14), %rax
0000000100156775 testq %rax, %rax
0000000100156778 je 0x10015679c
000000010015677a movq %rbp, %rcx
000000010015677d andq $-0x100000, %rcx
0000000100156784 cmpq %rcx, 0x28(%rax)
0000000100156788 je 0x10015679c
000000010015678a movslq (%rcx), %rcx
000000010015678d cmpb $0x0, 0x50(%rax,%rcx)
0000000100156792 jne 0x10015679c
0000000100156794 movq %r14, %rdi
0000000100156797 callq _HandleReadGuardError
000000010015679c movq (%r14), %rax
000000010015679f xorq (%rax), %rbx
00000001001567a2 cmpq $0x3, %rbx
00000001001567a6 ja 0x100156885
00000001001567ac movq 0x8(%r15), %rax
00000001001567b0 testq %rax, %rax
00000001001567b3 je 0x1001567d7
00000001001567b5 movq %rbp, %rcx
00000001001567b8 andq $-0x100000, %rcx
00000001001567bf cmpq %rcx, 0x28(%rax)
00000001001567c3 je 0x1001567d7
00000001001567c5 movslq (%rcx), %rcx
00000001001567c8 cmpb $0x0, 0x50(%rax,%rcx)
00000001001567cd jne 0x1001567d7
00000001001567cf movq %r15, %rdi
00000001001567d2 callq _HandleReadGuardError
00000001001567d7 movq (%r15), %rax
00000001001567da movq 0x8(%rax), %rbx
00000001001567de movq 0x8(%r14), %rax
00000001001567e2 testq %rax, %rax
00000001001567e5 je 0x100156809
00000001001567e7 movq %rbp, %rcx
00000001001567ea andq $-0x100000, %rcx
00000001001567f1 cmpq %rcx, 0x28(%rax)
00000001001567f5 je 0x100156809
00000001001567f7 movslq (%rcx), %rcx
00000001001567fa cmpb $0x0, 0x50(%rax,%rcx)
00000001001567ff jne 0x100156809
0000000100156801 movq %r14, %rdi
0000000100156804 callq _HandleReadGuardError
0000000100156809 movq (%r14), %rax
000000010015680c xorq 0x8(%rax), %rbx
0000000100156810 cmpq $0x3, %rbx
0000000100156814 ja 0x100156885
0000000100156816 movq 0x8(%r15), %rax
000000010015681a testq %rax, %rax
000000010015681d je 0x100156841
000000010015681f movq %rbp, %rcx
0000000100156822 andq $-0x100000, %rcx
0000000100156829 cmpq %rcx, 0x28(%rax)
000000010015682d je 0x100156841
000000010015682f movslq (%rcx), %rcx
0000000100156832 cmpb $0x0, 0x50(%rax,%rcx)
0000000100156837 jne 0x100156841
0000000100156839 movq %r15, %rdi
000000010015683c callq _HandleReadGuardError
0000000100156841 movq (%r15), %rax
0000000100156844 movq 0x10(%rax), %rbx
0000000100156848 movq 0x8(%r14), %rax
000000010015684c testq %rax, %rax
000000010015684f je 0x100156873
0000000100156851 movq %rbp, %rcx
0000000100156854 andq $-0x100000, %rcx
000000010015685b cmpq %rcx, 0x28(%rax)
000000010015685f je 0x100156873
0000000100156861 movslq (%rcx), %rcx
0000000100156864 cmpb $0x0, 0x50(%rax,%rcx)
0000000100156869 jne 0x100156873
000000010015686b movq %r14, %rdi
000000010015686e callq _HandleReadGuardError
0000000100156873 movq (%r14), %rax
0000000100156876 xorq 0x10(%rax), %rbx
000000010015687a xorl %eax, %eax
000000010015687c cmpq $0x4, %rbx
0000000100156880 setb %al
0000000100156883 jmp 0x100156887
0000000100156885 xorl %eax, %eax
0000000100156887 addq $0x8, %rsp
000000010015688b popq %rbx
000000010015688c popq %r14
000000010015688e popq %r15
0000000100156890 popq %rbp
0000000100156891 retq
0000000100156892 nopw %cs:(%rax,%rax)
000000010015689c nopl (%rax)
Of course that particular function could (should?) easily be optimized, in fact it could simply call memcmp
. I only chose it because it's easy to see with a glance what is going on, even if one is not familiar with assembly.
As for the code size problem, I realize now where this is coming from. I had erroneously assumed that when I saw no differences in performance for removing the "pure" attribute trick that the optimizer had managed to handle the cases regardless. However, it seems it doesn't do that, because it has to handle the case where the error handling functions can return and it has to assume that they may change arbitrary memory. I propose a cleaner solution than using that hack again, however. I'll split the |
This also adds some missing memory barriers.
@@ -456,6 +456,7 @@ UInt ResizeBag(Bag bag, UInt new_size) | |||
CollectBags(0, 0); | |||
#endif | |||
|
|||
WriteGuard(bag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't RetypeBag
need this as well?
Also, shouldn't it be inside an #ifdef HPCGAP
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this push went through prematurely. I've been experimenting with various checks on the header functions. ResizeBag()
seems to not cause any problems; but other functions — such as SIZE_BAG()
— do when you add guards to them. This seems to be mostly related to some internal kernel objects, for reasons that I don't fully understand yet.
If I add checks to BAG_HEADER()
directly, this seems to cause an overhead on the order of 10%, possibly a bit less (with clang).
a91c64b
to
3af3664
Compare
The recog package assigns to SMTX.InvariantBilinearForm in c6.gi; however, for HPCGAP, this record is made read-only. We change it to an atomic record so that we don't get a write guard error.
3af3664
to
0bc8e5e
Compare
0bc8e5e
to
777b36c
Compare
This permits packages to add entries after initialization, while still allowing access from multiple threads.
In order to make PackageInfo records accessible from other threads, they need to be immutable, readonly, or atomic. Previously, they were made immutable, but that broke packages that altered them after initialization. The new workaround makes the package record itself atomic and its contents immutable. See issue gap-system#2568 for further discussion.
a66cb80
to
758a08c
Compare
690d95f
to
8ade01f
Compare
This is the start of the work on getting rid of Ward as a dependency.
To try this code out, first install
unward
from /~https://github.com/rbehrends/unward; we assume that it is stored in a directory called$UNWARD
. To build it, do:After that, go to the GAP directory and build that:
Fixes #2565