Understanding a HPC-GAP speed penalty #499

fingolfin · 2016-01-19T16:24:58Z

Consider this command: SylowSubgroup(SymmetricGroup(350),2);;

In HPC-GAP this takes 7.3 seconds for me, in classic GAP only 5.3 seconds -- so HPC-GAP is about 30% slower.

It would be nice to know why this is so.... And perhaps we can improve this a little bit?

UPDATE: The original example now just takes a few milliseconds on either system due to algorithmic improvements. But other examples exist. E.g. from PR #1326 (which includes more details):
g:=WreathProduct(MathieuGroup(9),Group((1,2)));; ConjugacyClassesSubgroups(g);;
For me that takes about 3.2 seconds in GAP, and 5.7 seconds in HPC-GAP.

The text was updated successfully, but these errors were encountered:

hulpke · 2016-01-19T16:44:38Z

This is really weird, because the Sylow subgroup is just written down -- there is no calculation, but just construction of permutations and of a group.

ChrisJefferson · 2016-01-19T17:11:18Z

I have tried profiling this (you can see the outputs at https://caj.host.cs.st-andrews.ac.uk/covs/gap-out/ - gap and https://caj.host.cs.st-andrews.ac.uk/covs/hpc-out/ - HPCgap). These were generated by:

ProfileLineByLine("~/profout.gz"); SylowSubgroup(SymmetricGroup(350),2);; UnprofileLineByLine(); (remember to call the two prof outputs different names!) then running
LoadPackage("profiling"); OutputAnnotatedCodeCoverageFiles("~/profout.gz", "~/temp/hpc-out"); (this uses profiling 0.4.0, released today).

In both cases the time is being spent in StabChainRandomPermGroup, which spends most of it's time in SCRSift. However, it is ImageInWord which seems much slower. Not sure why (and of course, these profiles could be misleading!)

rbehrends · 2016-01-19T18:23:11Z

About half the overhead is due to read- and writeguards. You can see this by compiling with make WARD= to disable Ward. It may well be that this involves code where Ward doesn't do a good job of hoisting checks out of loops (another reason why I think a rewrite of Ward matters, incidentally).

I'm really not sure where the other 15% come from right now, though. If I had to guess, I'd say it's a performance regression either in the lib or in the kernel, but even with Instruments, I can't find a hotspot in the kernel that looks particularly crowded.

stevelinton · 2016-01-20T12:23:14Z

It looks like the main culprit here is the number of permutation objects that are being created and immediately discarded in line 671 of stbcrand.gi. This throws up a few things:

This whole routine SCRSift is hot spot and should probably be recoded in C.
Adding a kernel routine to multiply a whole list of permutations without creating intermediates might help this and similar code by reducing waste bag creation, at the cost of making code less elegant
Quite often SCRSift is called in the for IsOne(SCRSift(...)) This can be speeded up (a) because there is no need to allocate a bag at all and (b) because as soon as we find one point not fixed by the sifted word we know the answer.

One problem is that the optimisations which are right for small base (avoid multiplying permutations at any cost) are not right in other settings.

Anyway I'll try a kernel version of SCRSift that multiplies a permutation in-place.

rbehrends · 2016-01-20T12:37:44Z

It looks like the main culprit here is the number of permutation objects that are being created and immediately discarded in line 671 of stbcrand.gi.

While that may contribute performance loss, I did not observe a whole lot of time being spent in the GC for this particular example. In fact, disabling the GC entirely did only improve performance by 3%-4% on its own (though this does not account for allocations being more expensive on balance). Most likely, it's a lot of little things coming together.

stevelinton · 2016-01-24T11:15:33Z

Kernel version of SCRSift in #525. That's set up as a patch to master but I see no reason why it shouldn't work in HPCGAP.

fingolfin added topic: HPC-GAP Issues and PRs related to HPC-GAP gapsagedays2016 Issues and PRs that arose at https://www.gapdays.de/gap-sage-days2016 labels Jan 19, 2016

stevelinton mentioned this issue Jan 24, 2016

Tell Sylow subgroups of natural An or Sn their size when we make them #529

Merged

fingolfin added the regression A bug that only occurs in the branch, not in a release label May 9, 2017

fingolfin added the topic: performance bugs or enhancements related to performance (improvements or regressions) label Jun 23, 2017

fingolfin added the kind: discussion discussions, questions, requests for comments, and so on label Mar 21, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Understanding a HPC-GAP speed penalty #499

Understanding a HPC-GAP speed penalty #499

fingolfin commented Jan 19, 2016 •

edited

Loading

hulpke commented Jan 19, 2016

ChrisJefferson commented Jan 19, 2016

rbehrends commented Jan 19, 2016

stevelinton commented Jan 20, 2016

rbehrends commented Jan 20, 2016

stevelinton commented Jan 24, 2016

Understanding a HPC-GAP speed penalty #499

Understanding a HPC-GAP speed penalty #499

Comments

fingolfin commented Jan 19, 2016 • edited Loading

hulpke commented Jan 19, 2016

ChrisJefferson commented Jan 19, 2016

rbehrends commented Jan 19, 2016

stevelinton commented Jan 20, 2016

rbehrends commented Jan 20, 2016

stevelinton commented Jan 24, 2016

fingolfin commented Jan 19, 2016 •

edited

Loading