Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Understanding a HPC-GAP speed penalty #499

Open
fingolfin opened this issue Jan 19, 2016 · 6 comments
Open

Understanding a HPC-GAP speed penalty #499

fingolfin opened this issue Jan 19, 2016 · 6 comments
Labels
gapsagedays2016 Issues and PRs that arose at https://www.gapdays.de/gap-sage-days2016 kind: discussion discussions, questions, requests for comments, and so on regression A bug that only occurs in the branch, not in a release topic: HPC-GAP Issues and PRs related to HPC-GAP topic: performance bugs or enhancements related to performance (improvements or regressions)

Comments

@fingolfin
Copy link
Member

fingolfin commented Jan 19, 2016

Consider this command: SylowSubgroup(SymmetricGroup(350),2);;

In HPC-GAP this takes 7.3 seconds for me, in classic GAP only 5.3 seconds -- so HPC-GAP is about 30% slower.

It would be nice to know why this is so.... And perhaps we can improve this a little bit?

UPDATE: The original example now just takes a few milliseconds on either system due to algorithmic improvements. But other examples exist. E.g. from PR #1326 (which includes more details):
g:=WreathProduct(MathieuGroup(9),Group((1,2)));; ConjugacyClassesSubgroups(g);;
For me that takes about 3.2 seconds in GAP, and 5.7 seconds in HPC-GAP.

@fingolfin fingolfin added topic: HPC-GAP Issues and PRs related to HPC-GAP gapsagedays2016 Issues and PRs that arose at https://www.gapdays.de/gap-sage-days2016 labels Jan 19, 2016
@hulpke
Copy link
Contributor

hulpke commented Jan 19, 2016

This is really weird, because the Sylow subgroup is just written down -- there is no calculation, but just construction of permutations and of a group.

@ChrisJefferson
Copy link
Contributor

I have tried profiling this (you can see the outputs at https://caj.host.cs.st-andrews.ac.uk/covs/gap-out/ - gap and https://caj.host.cs.st-andrews.ac.uk/covs/hpc-out/ - HPCgap). These were generated by:

ProfileLineByLine("~/profout.gz"); SylowSubgroup(SymmetricGroup(350),2);; UnprofileLineByLine(); (remember to call the two prof outputs different names!) then running
LoadPackage("profiling"); OutputAnnotatedCodeCoverageFiles("~/profout.gz", "~/temp/hpc-out"); (this uses profiling 0.4.0, released today).

In both cases the time is being spent in StabChainRandomPermGroup, which spends most of it's time in SCRSift. However, it is ImageInWord which seems much slower. Not sure why (and of course, these profiles could be misleading!)

@rbehrends
Copy link
Contributor

About half the overhead is due to read- and writeguards. You can see this by compiling with make WARD= to disable Ward. It may well be that this involves code where Ward doesn't do a good job of hoisting checks out of loops (another reason why I think a rewrite of Ward matters, incidentally).

I'm really not sure where the other 15% come from right now, though. If I had to guess, I'd say it's a performance regression either in the lib or in the kernel, but even with Instruments, I can't find a hotspot in the kernel that looks particularly crowded.

@stevelinton
Copy link
Contributor

It looks like the main culprit here is the number of permutation objects that are being created and immediately discarded in line 671 of stbcrand.gi. This throws up a few things:

  1. This whole routine SCRSift is hot spot and should probably be recoded in C.
  2. Adding a kernel routine to multiply a whole list of permutations without creating intermediates might help this and similar code by reducing waste bag creation, at the cost of making code less elegant
  3. Quite often SCRSift is called in the for IsOne(SCRSift(...)) This can be speeded up (a) because there is no need to allocate a bag at all and (b) because as soon as we find one point not fixed by the sifted word we know the answer.

One problem is that the optimisations which are right for small base (avoid multiplying permutations at any cost) are not right in other settings.

Anyway I'll try a kernel version of SCRSift that multiplies a permutation in-place.

@rbehrends
Copy link
Contributor

It looks like the main culprit here is the number of permutation objects that are being created and immediately discarded in line 671 of stbcrand.gi.

While that may contribute performance loss, I did not observe a whole lot of time being spent in the GC for this particular example. In fact, disabling the GC entirely did only improve performance by 3%-4% on its own (though this does not account for allocations being more expensive on balance). Most likely, it's a lot of little things coming together.

@stevelinton
Copy link
Contributor

Kernel version of SCRSift in #525. That's set up as a patch to master but I see no reason why it shouldn't work in HPCGAP.

@fingolfin fingolfin added the regression A bug that only occurs in the branch, not in a release label May 9, 2017
@fingolfin fingolfin added the topic: performance bugs or enhancements related to performance (improvements or regressions) label Jun 23, 2017
@fingolfin fingolfin added the kind: discussion discussions, questions, requests for comments, and so on label Mar 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gapsagedays2016 Issues and PRs that arose at https://www.gapdays.de/gap-sage-days2016 kind: discussion discussions, questions, requests for comments, and so on regression A bug that only occurs in the branch, not in a release topic: HPC-GAP Issues and PRs related to HPC-GAP topic: performance bugs or enhancements related to performance (improvements or regressions)
Projects
None yet
Development

No branches or pull requests

5 participants