Understanding a HPC-GAP speed penalty #499
Comments
This is really weird, because the Sylow subgroup is just written down -- there is no calculation, but just construction of permutations and of a group.
I have tried profiling this (you can see the outputs at https://caj.host.cs.st-andrews.ac.uk/covs/gap-out/ - gap and https://caj.host.cs.st-andrews.ac.uk/covs/hpc-out/ - HPCgap). These were generated by:
In both cases the time is being spent in StabChainRandomPermGroup, which spends most of its time in SCRSift. However, it is
About half the overhead is due to read- and writeguards. You can see this by compiling with I'm really not sure where the other 15% come from right now, though. If I had to guess, I'd say it's a performance regression either in the lib or in the kernel, but even with Instruments, I can't find a hotspot in the kernel that looks particularly crowded.
It looks like the main culprit here is the number of permutation objects that are being created and immediately discarded in line 671 of stbcrand.gi. This throws up a few things:
One problem is that the optimisations which are right for a small base (avoid multiplying permutations at any cost) are not right in other settings. Anyway, I'll try a kernel version of SCRSift that multiplies a permutation in-place.
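To illustrate the in-place idea only: the sketch below works on plain image lists at the GAP level, which is an assumption made for illustration and not the kernel code proposed here. `RightMulInPlace` is a hypothetical helper name. It overwrites one list with the product instead of allocating a new permutation object for every multiplication.

```gap
# Hypothetical helper: overwrite the image list p with the image list of p*q.
# GAP's convention is i^(p*q) = (i^p)^q, so the new image of i is q[p[i]].
# Each step reads and writes only position i of p, so no scratch buffer is needed.
RightMulInPlace := function(p, q)
    local i;
    for i in [1 .. Length(p)] do
        p[i] := q[p[i]];
    od;
end;

n := 5;;
p := ShallowCopy(ListPerm((1,2,3), n));;   # mutable image list of (1,2,3)
q := ListPerm((3,4,5), n);;
RightMulInPlace(p, q);

# Compare against GAP's own permutation arithmetic; this should evaluate to true.
p = ListPerm((1,2,3) * (3,4,5), n);
```

Whether this saves time in practice depends on how much of the cost is really allocation, which the thread has not settled.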
While that may contribute to the performance loss, I did not observe a whole lot of time being spent in the GC for this particular example. In fact, disabling the GC entirely improved performance by only 3%-4% on its own (though this does not account for allocations being more expensive on balance). Most likely, it's a lot of little things coming together.
Kernel version of SCRSift in #525. That's set up as a patch to master but I see no reason why it shouldn't work in HPC-GAP.
Consider this command:
SylowSubgroup(SymmetricGroup(350),2);;
In HPC-GAP this takes 7.3 seconds for me, in classic GAP only 5.3 seconds -- so HPC-GAP is about 30% slower.
It would be nice to know why this is so. And perhaps we can improve this a little bit?
UPDATE: The original example now just takes a few milliseconds on either system due to algorithmic improvements. But other examples exist. E.g. from PR #1326 (which includes more details):
g:=WreathProduct(MathieuGroup(9),Group((1,2)));; ConjugacyClassesSubgroups(g);;
For me that takes about 3.2 seconds in GAP, and 5.7 seconds in HPC-GAP.
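For anyone wanting to reproduce the comparison, a minimal timing sketch follows. It assumes the same session commands are run once in GAP and once in HPC-GAP. `Runtime()` reports CPU time in milliseconds, so the figures may not match wall-clock time exactly; `t0` is just an illustrative variable name.

```gap
# Record the CPU time used so far, run the example, then print the difference.
t0 := Runtime();;
g := WreathProduct(MathieuGroup(9), Group((1,2)));;
ConjugacyClassesSubgroups(g);;
Print("elapsed: ", Runtime() - t0, " ms\n");
```

Running this in a fresh session each time avoids cached attributes of `g` skewing the second measurement.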