Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: New HPC-GAP guard checking without using ward #2845

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

rbehrends
Copy link
Contributor

@rbehrends rbehrends commented Sep 20, 2018

This is the start of the work on getting rid of Ward as a dependency.

To try this code out, first install unward from /~https://github.com/rbehrends/unward; we assume that it is stored in a directory called $UNWARD. To build it, do:

cd $UNWARD
./configure
make

After that, go to the GAP directory and build that:

./configure --enable-hpcgap
$UNWARD/bin/unward --inplace src
make

Fixes #2565

@rbehrends rbehrends added do not merge PRs which are not yet ready to be merged (e.g. submitted for discussion, or test results) do not review PRs which are not yet ready for a proper external review (e.g. only submitted for test results) topic: HPC-GAP Issues and PRs related to HPC-GAP labels Sep 20, 2018
@rbehrends rbehrends changed the title WIP: Initial work on tracking spurious guard errors with unward. WIP: Eliminate Ward as a dependency. Sep 20, 2018
@rbehrends rbehrends force-pushed the hpcgap-noward branch 4 times, most recently from e97e1dd to 2869ca9 Compare September 20, 2018 09:25
@codecov
Copy link

codecov bot commented Sep 20, 2018

Codecov Report

Attention: Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 93.93%. Comparing base (9723019) to head (8ade01f).
Report is 1562 commits behind head on master.

Files with missing lines Patch % Lines
lib/helpview.gd 80.00% 1 Missing ⚠️
lib/meatauto.gi 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2845      +/-   ##
==========================================
+ Coverage   93.44%   93.93%   +0.48%     
==========================================
  Files         716      667      -49     
  Lines      812058   787729   -24329     
==========================================
- Hits       758860   739933   -18927     
+ Misses      53198    47796    -5402     
Files with missing lines Coverage Δ
lib/helpview.gi 37.39% <ø> (-0.54%) ⬇️
lib/package.gi 74.30% <ø> (-0.05%) ⬇️
src/gasman.h 80.43% <ø> (+1.26%) ⬆️
src/gvars.c 90.25% <ø> (+2.80%) ⬆️
src/lists.c 82.59% <ø> (+0.60%) ⬆️
src/objects.c 88.50% <ø> (+2.43%) ⬆️
src/objects.h 78.87% <100.00%> (+3.15%) ⬆️
src/opers.cc 95.99% <ø> (+2.43%) ⬆️
lib/helpview.gd 85.71% <80.00%> (-14.29%) ⬇️
lib/meatauto.gi 95.18% <0.00%> (-0.11%) ⬇️

... and 186 files with indirect coverage changes

@rbehrends rbehrends force-pushed the hpcgap-noward branch 2 times, most recently from 22eade5 to 9c78401 Compare September 20, 2018 11:48
@fingolfin
Copy link
Member

This PR does not actually change the optimization level to -O3, wasn't that expected to be required for optimal performance? Where the benchmark numbers to mentioned earlier done with the default -O2, or with -O3 (perhaps manually inserted into the build system)?

lib/filter.g Outdated Show resolved Hide resolved
@rbehrends rbehrends force-pushed the hpcgap-noward branch 2 times, most recently from fa474ef to e482ed5 Compare September 20, 2018 16:53
@rbehrends
Copy link
Contributor Author

For now, I'm leaving the optimization level at -O2, as -O3 doesn't give me a significant speedup and slows down compilation. This has to be benchmarked more extensively, of course.

@fingolfin fingolfin added the gapdays2018-fall Issues and PRs that arose at https://www.gapdays.de/gapdays2018-fall label Sep 20, 2018
@ChrisJefferson
Copy link
Contributor

ChrisJefferson commented Sep 21, 2018

make for unward produces:

g++ -g -O2 -DADLIB_DEBUG=0  -I. -c -o build/analyzer.o src/analyzer.cc
make: *** No rule to make target 'src/args.cc', needed by 'build/args.o'. Stop.

EDIT: re2c is a hard requirement (if anyone else hits this)

@rbehrends
Copy link
Contributor Author

rbehrends commented Sep 21, 2018

@ChrisJefferson Sorry, I did forget to mention re2c. It's possible to package unward with the generated C++ sources (and the Makefile should work in that case, too), but I hadn't made such a package yet.

Edit: I've included the files in unward and it now also builds on system without re2c.

@rbehrends
Copy link
Contributor Author

The current status is that HPC-GAP should compile and run up to the prompt (and it doesn't blow up for at least the basic operations).

As an additional featured, if you build with -DDEBUG_GUARDS, then the GAP variable GUARD_ERROR_STACK will contain a C stack backtrace if your system supports the backtrace() functionality. This can make it easier to troubleshoot problematic guard errors without having to look at them in the debugger.

@markuspf
Copy link
Member

So, with the instructions above one still has to have ward installed and the source is processed through ward; is this intentional/needed?

@ChrisJefferson
Copy link
Contributor

I don't want to slow things down, but at some point it would be good to have an explanation on where and why UNSAFE calls are put -- is it for correctness or efficiency (or both?)

@fingolfin
Copy link
Member

@rbehrends I am surprised to read that regarding -O2 vs -O3, as in previous discussions, you emphasized that -O3 was necessary in order for the pure functions optimization trick to be fully effective. So now you are saying this is not necessary after all (or perhaps the other way around: even -O3 is not enough to optimize the guards use in e.g. loops sufficiently)?

@markuspf I think this is because this PR simply contains no build system changes at all. As far as I understood, the ward support could/should simply be removed completely.

@ChrisJefferson no worries re "slowing down" things; this is a prototype, and it's only here because I badgered @rbehrends into putting it here. I hope he'll find time to finish it, but since he has many other obligations, I wanted to make sure that at least his partial work is available, so that we can see and discuss the general ideas at least, and so that in the worst case, somebody else can pick up the work...

lib/coll.gd Outdated Show resolved Hide resolved
src/gasman.h Outdated Show resolved Hide resolved
src/gasman.h Outdated Show resolved Hide resolved
src/listfunc.c Outdated Show resolved Hide resolved
@fingolfin fingolfin changed the title WIP: Eliminate Ward as a dependency. WIP: New HPC-GAP guard checking without using ward Sep 25, 2018
@fingolfin
Copy link
Member

I just submitted PR #2870 which removes use of ward from the build system, and also changed the title of this PR, as it does not actually do anything about ward.

As @rbehrends pointed out to me (and which I verified separately), in master it turns out that ward is now emitting zero guards. So there is really no point in keeping it.

@fingolfin
Copy link
Member

I had a lengthy comment here but my laptop ran out of power and crashed :/. But roughly it said that I think several WARD_ENABLED sections cover too much, resp. the code unward produces needs to be audited: e.g. many string accesses in integer.c and src/hpc/aobjects should not really be made UNSAFE.

Anyway, @rbehrends FYI I also pushed two commits here which constify more stuff. More could/should be done on that front.

@ChrisJefferson
Copy link
Contributor

I the CONST bits were pulled out, I'd be happy to merge them now (well, assuming packages compile and stuff, obviously).

src/hpc/guards.h Outdated Show resolved Hide resolved
src/hpc/guards.h Outdated Show resolved Hide resolved
These existed to assist Ward's data flow analysis and serve no purpose
without Ward.
src/lists.c Outdated
@@ -146,7 +146,6 @@ static Obj AttrLENGTH(Obj self, Obj list)
/* internal list types */
#ifdef HPCGAP
ReadGuard(list);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this ReadGuard needed, though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that can be removed, too, now I look at it. It used to be paired with an ImpliedWriteGuard() in order to inform data flow analysis in Ward. On its own, it doesn't serve a purpose anymore (and given how the code has changed, probably hasn't for a while).

src/objects.h Outdated Show resolved Hide resolved
src/objects.h Outdated
#ifdef HPCGAP
MEMBAR_WRITE();
// Only require a read guard
((Obj *) CONST_ADDR_OBJ(obj))[0] = val;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that in general the code which access the type of pos/com/dat objects is messy, and in big parts due to inconsistent HPC-GAP changes. Can we please try to resolve those? Note that contrary to claims I heard in the past, many T_DATOBJS are not readonly in practice, and they can change their type (even though they might rarely do so). There certainly are no rules anywhere that forbid this, and esp. not in any place where anybody would have a chance to learn about them. There are also no measures in places to prevent this (changing types and write access to T_DATOBJ is in fact explicitly supported by kernel APIs). While we might want to rethink that in the future, for now we need to deal with things how they are in reality. And I just don't understand how we can justify dealing with T_DATOBJ differently than with T_COMOBJ and T_POSOBJ (despite having asked about this many times, and getting answers to it many times, I still don't get it, and I still don't see any kind of written documentation).

In particular:

  • Why do we have custom SET_TYPE_COMOBJ and SET_TYPE_POSOBJ but not a custom SET_TYPE_DATOBJ (and thus we do have UNSAFE_SET_TYPE_DATOBJ...) ? They should either all be the same, or else there should be a comment that justifies the difference
  • similar for the getters: for TYPE_COMOBJ and TYPE_POSOBJ there are UNSAFE_ counterparts, but not for TYPE_DATOBJ. Why? Well, partially because their callers are different:
#ifndef WARD_ENABLED
static Obj TypePosObj(Obj obj)
{
    Obj result = UNSAFE_TYPE_POSOBJ( obj );
#ifdef HPCGAP
    MEMBAR_READ();
#endif
    return result;
}
#endif
...
static Obj TypeDatObj(Obj obj)
{
    return TYPE_DATOBJ( obj );
}

but why are these different? Again, these functions should either be aligned, or their differences be documented.

BTW this is of course not new with this PR, but rather has been an annoying problem with HPC-GAP for many years now. But @rbehrends is the unique person who might be able to address these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the path forward here is indeed to handle T_DATOBJ consistently with the rest rather than preserving the existing implementation. I'll look into how to best to clean this up.

src/gasman.h Outdated
@@ -293,18 +298,71 @@ EXPORT_INLINE UInt SIZE_BAG_CONTENTS(const void *ptr) {
** the application must inform {\Gasman} that it has changed the bag, by
** calling 'CHANGED_BAG(old)' in the above example (see "CHANGED_BAG").
*/
#ifdef HPCGAP
EXPORT_INLINE int ReadCheck(Bag bag)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping?

#ifdef USE_HPC_GUARDS
if (!WriteCheck(bag))
HandleWriteGuardError(bag);
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing I never understood: Why don't we need some kind of read/write guard in BAG_HEADER as well? Right now, is there anything that would stop two threads from using ResizeBag/RetypeBag concurrently, possibly leading in a garbage output?

(This thought just motivated me to submit PR #3884).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are dealing with two cases here. One is TNUM_OBJ(), which we don't want to burden with additional guard checks, as it's used all the time. Changes are generally either harmless or in the rare cases where they do matter, handled explicitly. The biggest unpleasantness here is KTNumPlist(), which has an explicit guard check for this reason.

You are correct that with resizing read-only objects, we run into a problem. I believe the code has assumed that the functions that resize objects (such as GROW_PLIST() would fail naturally when changing the length word in the object). However, it looks like sometimes the change happens rather late.

I'd still simply add a write guard to ResizeBag(), where it causes minimal overhead, rather than making all accesses to the header more expensive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I don't want to burden TNUM_BAG, SIZE_BAG, TEST_OBJ_FLAGS with guards; but they can be written to, and so at least RetypeBage / ResizeBag and perhaps also the SET/CLEAR_OBJ_FLAGS functions should have guards... I always wondered why they didn't.

src/gasman.h Outdated
@@ -293,18 +298,71 @@ EXPORT_INLINE UInt SIZE_BAG_CONTENTS(const void *ptr) {
** the application must inform {\Gasman} that it has changed the bag, by
** calling 'CHANGED_BAG(old)' in the above example (see "CHANGED_BAG").
*/
#ifdef HPCGAP
EXPORT_INLINE int ReadCheck(Bag bag)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked myself, by looking at EqRange, which is rather simple and ideally only would need to ReadGuard calls, but naively requires six. And sadly, what I am seeing are the full six calls. Here is disassembly in plain GAP with your PR:

$ otool -p _EqRange -x -V ./gap
./gap:
(__TEXT,__text) section
_EqRange:
0000000100107940	pushq	%rbp
0000000100107941	movq	%rsp, %rbp
0000000100107944	movq	(%rdi), %rax
0000000100107947	movq	(%rsi), %rcx
000000010010794a	movq	(%rcx), %rdx
000000010010794d	xorq	(%rax), %rdx
0000000100107950	cmpq	$0x3, %rdx
0000000100107954	ja	0x100107977
0000000100107956	movq	0x8(%rcx), %rdx
000000010010795a	xorq	0x8(%rax), %rdx
000000010010795e	cmpq	$0x3, %rdx
0000000100107962	ja	0x10010797b
0000000100107964	movq	0x10(%rcx), %rcx
0000000100107968	xorq	0x10(%rax), %rcx
000000010010796c	xorl	%eax, %eax
000000010010796e	cmpq	$0x4, %rcx
0000000100107972	setb	%al
0000000100107975	popq	%rbp
0000000100107976	retq
0000000100107977	xorl	%eax, %eax
0000000100107979	popq	%rbp
000000010010797a	retq
000000010010797b	xorl	%eax, %eax
000000010010797d	popq	%rbp
000000010010797e	retq
000000010010797f	nop

And here is what I get in HPC-GAP with your PR:

$ otool -p _EqRange -x -V ./gap
./gap:
(__TEXT,__text) section
_EqRange:
0000000100156730	pushq	%rbp
0000000100156731	movq	%rsp, %rbp
0000000100156734	pushq	%r15
0000000100156736	pushq	%r14
0000000100156738	pushq	%rbx
0000000100156739	pushq	%rax
000000010015673a	movq	%rsi, %r14
000000010015673d	movq	%rdi, %r15
0000000100156740	movq	0x8(%rdi), %rax
0000000100156744	testq	%rax, %rax
0000000100156747	je	0x10015676b
0000000100156749	movq	%rbp, %rcx
000000010015674c	andq	$-0x100000, %rcx
0000000100156753	cmpq	%rcx, 0x28(%rax)
0000000100156757	je	0x10015676b
0000000100156759	movslq	(%rcx), %rcx
000000010015675c	cmpb	$0x0, 0x50(%rax,%rcx)
0000000100156761	jne	0x10015676b
0000000100156763	movq	%r15, %rdi
0000000100156766	callq	_HandleReadGuardError
000000010015676b	movq	(%r15), %rax
000000010015676e	movq	(%rax), %rbx
0000000100156771	movq	0x8(%r14), %rax
0000000100156775	testq	%rax, %rax
0000000100156778	je	0x10015679c
000000010015677a	movq	%rbp, %rcx
000000010015677d	andq	$-0x100000, %rcx
0000000100156784	cmpq	%rcx, 0x28(%rax)
0000000100156788	je	0x10015679c
000000010015678a	movslq	(%rcx), %rcx
000000010015678d	cmpb	$0x0, 0x50(%rax,%rcx)
0000000100156792	jne	0x10015679c
0000000100156794	movq	%r14, %rdi
0000000100156797	callq	_HandleReadGuardError
000000010015679c	movq	(%r14), %rax
000000010015679f	xorq	(%rax), %rbx
00000001001567a2	cmpq	$0x3, %rbx
00000001001567a6	ja	0x100156885
00000001001567ac	movq	0x8(%r15), %rax
00000001001567b0	testq	%rax, %rax
00000001001567b3	je	0x1001567d7
00000001001567b5	movq	%rbp, %rcx
00000001001567b8	andq	$-0x100000, %rcx
00000001001567bf	cmpq	%rcx, 0x28(%rax)
00000001001567c3	je	0x1001567d7
00000001001567c5	movslq	(%rcx), %rcx
00000001001567c8	cmpb	$0x0, 0x50(%rax,%rcx)
00000001001567cd	jne	0x1001567d7
00000001001567cf	movq	%r15, %rdi
00000001001567d2	callq	_HandleReadGuardError
00000001001567d7	movq	(%r15), %rax
00000001001567da	movq	0x8(%rax), %rbx
00000001001567de	movq	0x8(%r14), %rax
00000001001567e2	testq	%rax, %rax
00000001001567e5	je	0x100156809
00000001001567e7	movq	%rbp, %rcx
00000001001567ea	andq	$-0x100000, %rcx
00000001001567f1	cmpq	%rcx, 0x28(%rax)
00000001001567f5	je	0x100156809
00000001001567f7	movslq	(%rcx), %rcx
00000001001567fa	cmpb	$0x0, 0x50(%rax,%rcx)
00000001001567ff	jne	0x100156809
0000000100156801	movq	%r14, %rdi
0000000100156804	callq	_HandleReadGuardError
0000000100156809	movq	(%r14), %rax
000000010015680c	xorq	0x8(%rax), %rbx
0000000100156810	cmpq	$0x3, %rbx
0000000100156814	ja	0x100156885
0000000100156816	movq	0x8(%r15), %rax
000000010015681a	testq	%rax, %rax
000000010015681d	je	0x100156841
000000010015681f	movq	%rbp, %rcx
0000000100156822	andq	$-0x100000, %rcx
0000000100156829	cmpq	%rcx, 0x28(%rax)
000000010015682d	je	0x100156841
000000010015682f	movslq	(%rcx), %rcx
0000000100156832	cmpb	$0x0, 0x50(%rax,%rcx)
0000000100156837	jne	0x100156841
0000000100156839	movq	%r15, %rdi
000000010015683c	callq	_HandleReadGuardError
0000000100156841	movq	(%r15), %rax
0000000100156844	movq	0x10(%rax), %rbx
0000000100156848	movq	0x8(%r14), %rax
000000010015684c	testq	%rax, %rax
000000010015684f	je	0x100156873
0000000100156851	movq	%rbp, %rcx
0000000100156854	andq	$-0x100000, %rcx
000000010015685b	cmpq	%rcx, 0x28(%rax)
000000010015685f	je	0x100156873
0000000100156861	movslq	(%rcx), %rcx
0000000100156864	cmpb	$0x0, 0x50(%rax,%rcx)
0000000100156869	jne	0x100156873
000000010015686b	movq	%r14, %rdi
000000010015686e	callq	_HandleReadGuardError
0000000100156873	movq	(%r14), %rax
0000000100156876	xorq	0x10(%rax), %rbx
000000010015687a	xorl	%eax, %eax
000000010015687c	cmpq	$0x4, %rbx
0000000100156880	setb	%al
0000000100156883	jmp	0x100156887
0000000100156885	xorl	%eax, %eax
0000000100156887	addq	$0x8, %rsp
000000010015688b	popq	%rbx
000000010015688c	popq	%r14
000000010015688e	popq	%r15
0000000100156890	popq	%rbp
0000000100156891	retq
0000000100156892	nopw	%cs:(%rax,%rax)
000000010015689c	nopl	(%rax)

Of course that particular function could (should?) easily be optimized, in fact it could simply call memcmp. I only chose it because it's easy to see with a glance what is going on, even if one is not familiar with assembly.

@rbehrends
Copy link
Contributor Author

As for the code size problem, I realize now where this is coming from. I had erroneously assumed that when I saw no differences in performance for removing the "pure" attribute trick that the optimizer had managed to handle the cases regardless. However, it seems it doesn't do that, because it has to handle the case where the error handling functions can return and it has to assume that they may change arbitrary memory.

I propose a cleaner solution than using that hack again, however. I'll split the HandleReadGuardError() and HandleWriteGuardError() in two functions, one of which is actually pure (and marked as such) and contains the extra checks, while the actual error handler is properly marked with the noreturn attribute. This allows the optimizer to treat the checks as common subexpressions or to hoist them out of loops. Importantly, they use the function attributes accurately.

@@ -456,6 +456,7 @@ UInt ResizeBag(Bag bag, UInt new_size)
CollectBags(0, 0);
#endif

WriteGuard(bag);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't RetypeBag need this as well?

Also, shouldn't it be inside an #ifdef HPCGAP?

Copy link
Contributor Author

@rbehrends rbehrends Feb 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this push went through prematurely. I've been experimenting with various checks on the header functions. ResizeBag() seems to not cause any problems; but other functions — such as SIZE_BAG() — do when you add guards to them. This seems to be mostly related to some internal kernel objects, for reasons that I don't fully understand yet.

If I add checks to BAG_HEADER() directly, this seems to cause an overhead on the order of 10%, possibly a bit less (with clang).

@rbehrends rbehrends force-pushed the hpcgap-noward branch 2 times, most recently from a91c64b to 3af3664 Compare February 10, 2020 10:21
The recog package assigns to SMTX.InvariantBilinearForm in c6.gi;
however, for HPCGAP, this record is made read-only. We change it
to an atomic record so that we don't get a write guard error.
This permits packages to add entries after initialization, while
still allowing access from multiple threads.
In order to make PackageInfo records accessible from other threads,
they need to be immutable, readonly, or atomic. Previously, they
were made immutable, but that broke packages that altered them after
initialization. The new workaround makes the package record itself
atomic and its contents immutable.

See issue gap-system#2568 for further discussion.
@fingolfin fingolfin modified the milestones: GAP 4.12.0, GAP 4.13.0 Feb 23, 2021
@fingolfin fingolfin removed this from the GAP 4.13.0 milestone Mar 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do not merge PRs which are not yet ready to be merged (e.g. submitted for discussion, or test results) do not review PRs which are not yet ready for a proper external review (e.g. only submitted for test results) gapdays2018-fall Issues and PRs that arose at https://www.gapdays.de/gapdays2018-fall topic: HPC-GAP Issues and PRs related to HPC-GAP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HPC-GAP: access to plist IntFF in src/finfield.c is not thread safe
7 participants