-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problems requiring heavy library usage might not be good for benchmarking. #220
Comments
Only when a program is using its standard library, FFI bindings are allowed (e.g. bigint in julia, crystal uses GMP under the hood). The third-party libraries I choose are all lang-native implementations (plz feel free to point out exceptions, I might have miss-read some of them)
bigint, http server/client, coroutines are the exceptions as it's impossible to implement these in ~100 LOC. For other cases 3rd-party libs are avoided as much as possible. e.g. for secp256k1, only 0.go uses cgo bindings to C libsecp256k1, and 0.rs uses pure rust implementation, all others are implemented with only bigint library.
Again, FFI bindings(e.g. crystal bigint) / assemblies (e.g. go bigint) are accepted here only when it's part of lang's stdlib. |
Please note that Common Lisp was standardized in 1994, therefore all new language extensions/libraries are external in that sense. Should it be frozen in its core implementation? I myself bag to differ. Programmers, engineers are developing tools to be reused as well to make our life less miserable. Library developers are developing optimized libraries for the same reason other reusable tools in any other field on engineering are developed. I would also add that it is one of the core responsibility of an engineer to use the best tool for a job and that includes the selection of language used for that specific problem and the selection of the libraries used for that specific problem. Not every language is optimal for every problem and if anyone feels that a language is at short in any way than just opt not to use it. Other people might still like it because it has other advantages apart from being the fastest running one. And how about a language that translates itself to C and uses the C native compiler. Should it be allowed to use only native or C libraries as well: In my humble opinion one of the site maintainer's core responsibly is to make sure that no code is cheating by reducing the problem space in any way and also giving clear instructions what is allowed and what is not. This site maintainer is definitely doing the first one very vigorously! (maybe even too vigorously:) |
I hadn't noticed that nearly all the third party libraries in use are also programmed in the same language as the program. That seems like a decent improvement over the Computer Language Benchmarks Game to see that third party libraries programmed in different languages generally are not being allowed. However it still seems unfair to have problems that will typically rely on a good library to achieve good performance, allow some programs to use good libraries programmed in a different language just as long as they are part of the standard libraries, and then disallow other programs from using those same libraries. This is a valid reason for people to question the legitimacy of the Computer Language Benchmarks Games and I'm hoping these benchmarks might still address that.
Shouldn't that 0.go program be removed since it's using a C library that isn't a part of Go's standard library? I can't see any reason why an exception to your rules should be allowed for that Go program. If that Go program is allowed to use a C library, it seems only fair that every other language should get to use C libraries. Lastly, I also just want to put a little more emphasis on how much the libraries can affect performance. Libraries can have vastly different performance from each other. For regex libraries, they could be implemented using NFAs, DFAs, backtracking, etc... and they could also make use of assembly code, JIT compilation, etc... For arbitrary precision math libraries, some will make use of assembly code and they can also use different algorithms for various operations (multiplication alone can be done using long multiplication, Karatsuba multiplication, Toom-Cook multiplication, or even Fourier transforms). For hash table libraries, there are a wide variety of hash functions that can be used and additionally there are also multiple ways to deals with collisions like chaining or open addressing and with open addressing you still have more variation with things like linear probing, quadratic probing, double hashing, etc... All these different choices can give people another valid reason to question the legitimacy of benchmarks which is why I think it's good idea to try to avoid heavy library usage. On a side note, I'm kind of wondering if it's a good idea to have benchmark problems that are likely to have programs use crypto libraries that have a legitimate concern about dealing with side-channel timing attacks? I was looking at some of the source code for libsecp256k1 and it makes a couple mentions about avoiding side-channel timing attacks (which often means intentionally making the code slower). This is just another example of how libraries can have different performance from other libraries, in this case because a library was designed to be more secure. |
I don't think my suggestion applies to what you mentioned. My suggestion is for trying to avoid having problems where the libraries do much of the work. I'm OK with allowing libraries to enable the use of primitive/basic type things (say like adding support for multithreading or SIMD) which may be important for performance but ultimately are not responsible for doing most of the work. What I'm not OK with is problems like the pidigits problem where often 90+% of the time is spent in arbitrary precision math libraries or the regex-redux problem where often 60+% of the time is spent in regular expression libraries. |
And if those arbitrarily precision math are part of the language and regex is written in the same language? To be frank I don’t get your point. Using languages without libraries are like building a house without tools. Though, I kind of agree that the absolute best would be to use libraries written in the same language (it would be quite mad to do it in python and ….), however, why should all the existing libraries be rewritten for all the languages? Each and every language has FFI to tap into existing libraries. Your job is to pick the one best suited for the job. Nevertheless, even after my last paragraph, I do agree that for this particular benchmark I tend to agree to only allow to use libraries written in the same language without exception. |
Even if the libraries are written in the same language, the libraries can still have very different performance from other libraries because one library might be using a much better algorithm (see my previous comment for examples of why this can happen). This project's website calls itself the Programming Language and compiler Benchmarks, not the the Programming, Language, Library, Algorithm, and compiler Benchmarks. These benchmarks are turning into library and algorithm benchmarks if libraries are doing most of the work instead of code made in the programming language.
Yes, in most cases doing programming without using libraries is a bad idea, but that's not what I'm arguing. What I'm arguing is that in this case it is a bad idea to be benchmarking programming languages and their implementations by making programs that make heavy usage of libraries that may possibly be written in different programming languages and using different algorithms. To continue with your analogy, let's say that you and I wanted to determine who is faster at digging a hole for a house's basement. I start using a bucket to move the dirt somewhere else. You take this a step further and use two buckets (kind of like using multithreading in a program) to move dirt even faster. Fair enough, you're still doing the work, just doing it more efficiently. Then you take things a step further and get a wheelbarrow (maybe kind of like using SIMD in a program) so you can move yet more dirt faster. OK, still fair enough because you're still doing the work but even more efficiently. I get fed up with this and then hire some large company to move the dirt for me (kind of like using a library in a program) and now we're going way faster than you. In this case could I now legitimately say that I'm faster than you or would it be more appropriate to say that the large company I hired is faster than you? I think it would be the latter because they were the ones that did most of the work. |
Let's make the topic more realistic, when ppl are making choices from many langs, do they take standard or most popular libraries into consideration. Let's say you're choosing a language for your new program which will heavily depend on bigint / regex, do you think pidigits / edigits / regex-redux numbers are helpful for you to make the decision or not? Do you agree that the performance of a language should be evaluated in a library-agnostic way or not? Besides these, there're also numbers for the same source code with different compilers in the same language, do you think it makes the comparison invalid by using libraries? At the end of the day, if you write your real-life program with those libraries, do the benchmarks of those libraries reflect an important aspect of the language's performance or not? |
How about if your program is using better algorithm and running much faster? That only means that you are a better programmer and the same applies if you are using a better library. Though as I mentioned I tend to agree in this case that the libraries should be written in the same language.
Different language can be questionable, but different algorithm? Just use the better one as a good engineer would do.
Were the conditions agreed upon in prior about the allowable use of “tools”? If yes and even subcontractors were allowed and you picked a better, in this case faster, one than what is the problem? One more note: If the library is written in the same language isn't that automatically implies that you benchmarking that particular language? Though I do agree that not all libraries are the same quality. But than you can ask the library developers to improve it or help them out by providing better/faster code. |
It depends. I know many Python and Java developers love how the languages include a lot of functionality in their standard libraries which usually allows for faster and easier development. On the other hand developers for high performance software (OS kernels, AAA 3D games, multimedia software, etc...) often won't care too much and they'll use whatever is the fastest thing available to them (or just make their own software if they can't find anything that meets their needs).
They'd probably be at least partially useful to a wise person that looks at the source code of the programs and the libraries that are being used. To a person that just looks at the benchmark results, there is potential that the results could lead to a bad decision. Case in point, do you know why regex-redux is called regex-REDUX? Basically it's because over two dozen of the original regex-DNA programs weren't following the CLBG (Computer Language Benchmark Games) rules about using regular expressions to do searching and instead were just doing plain text searching. See this thread where I brought up this issue several years ago. The regex-DNA benchmark was flawed and it would have been a bad idea to make a decision based on its results. Similarly, one of the PCRE developers made some benchmarks at https://zherczeg.github.io/sljit/regex_perf.html that shows RE2 struggling pretty badly when using a regular expression like "[a-q][^u-z]{13}x" but otherwise performing well. If the regex-redux problem was only using regular expressions like that, someone might think that RE2 is no good whereas better benchmarks like Zoltan's would show that it is actually quite good.
Yes, benchmarks of a programming language and its implementation should be done in a way that tries to avoid heavy influence from the quality of any libraries being used.
Again, it depends. If you're using the exact same compiled libraries with multiple programs, that should probably lead to the libraries performing the same for each program and any performance differences in the programs then being a result of the remaining non-library code. This should result in a valid comparison for seeing which program has the fastest non-library code. However if the goal is to compare the programming languages and their implementations and some libraries are doing ~99% of the program work (which is the case for some of the pidigits programs), then the comparison can be pretty useless (but still valid!) if the remaining ~1% of the program work is being done by actual code from that programming language and its implementation. Comparisons of programs can easily be invalidated. You can invalidate a comparison of two programs using the same program source code, with the same library source code, and all compiled by the same compiler if you just use different compiler options for the library for instance. I'm pretty sure this is already happening for the pidigits problem in this project and, at least for a while, the CLBG. Chapel is using its own GMP library (Edit: This is correct for Chapel's preferred configuration (see https://chapel-lang.org/docs/usingchapel/QUICKSTART.html#using-chapel-in-its-preferred-configuration) and probably correct for the CLBG but it turns out this project is using the chapel/chapel:latest docker image which sets the CHPL_GMP environment variable to "system" which should make Chapel use Debian's GMP library.) that is a little more optimized than the GMP library being used by some of the other programs. I was looking into this many months ago for the CLBG and came to the conclusion that it was due to better compilation options being used and/or GMP's source code being better configured (this could be considered as using different library source code but GMP has a program for tuning some of its parameters, see https://gmplib.org/manual/Performance-optimization). I verified for the CLBG that it is in fact the library that is faster by making a program use Chapel's GMP library which causes it to run just as fast as the Chapel program.
Again, it depends. For comparing programming languages and their implementations, I think that avoiding heavy usage of libraries leads to much more useful and valid comparisons. |
If someone is using a better, accepted algorithm in their program, then the other programmers can just copy that same algorithm if it works better for them too. This has happened many times on the CLBG. The same does NOT currently apply if someone uses a better library. If some programming language implementation's standard libraries are using some great library made in assembly but another programming language implementation does not use that same library in its standard libraries, then programs made with that second programming language implementation are at a disadvantage and nothing can be done about it except try to convince the developers of that programming language implementation to include that library in a future release.
Yeah, you can likely just use the better algorithm but you can NOT just use a different programming language for the library (disallowed by the rules) and that can almost always put you at a potential disadvantage unless you're programming in assembly.
You're missing the point. What if we agree to allow the hiring of companies but for some reason I have more money than you and can hire a better company so we still are faster than you? The point is that >> I << would not be doing the work so it wouldn't be right to say that >> I << am faster than you.
Yes, you'd be evaluating the performance of the programming language implementation that the library is implemented in. However you might be adding a lot of other variables that you don't want to be adding. Maybe the library is using a different programming implementation than the one your program is using. Maybe that library's programming implementation doesn't support all the same features (like SIMD or multithreading) your program's programming implementation does. Maybe that library was compiled with less optimal compilation options. In my opinion it's better to avoid heavy use of libraries so that you reduce the influence of these other variables. |
This is a tough one and we can argue back and forth for ages. How about letting the site maintainer decide about the rules. It should be his responsibility. After he or she communicates it clearly we have to abide the rules and stick with it. At the moment I kind of thinking about dropping out all together since it looks like I can’t even convince the site maintainer about a simple thing that SBCL is always allocates on the heap without specifically requesting stack allocation even after the language maintainers confirmed it and the heap allocation profiler clearly confirmed it I am not getting any reply for long days. This is just embarrassing and very frustrating. It looks like being right from the beginning is something that is not welcomed here. All CL binary tree codes are removed for being too fast compared to Java. I don’t even want to go into the tricky area that knowing about Java’s very advanced escape analysis what guarantees do we have in Java. And if the compiler, current or future, decide to put something on stack should I intervene somehow. I have even started improving one of the C++ codes since I am worried what can happen with the nsieve codes that are faster then the C++ using the same algorithm since C++ is using the slowish std::vector against the fast simple-bitvector in the Common Lisp code. I rewrote the C++ code using the Boost library but I don’t get reply if that is allowed or not either. Could you please have a look at the 3.cpp nsieve code sitting in my unaccepted PR and speed it up by at least 25% using standard library only to beat the 3.cl code that is using the same algorithm and is also sitting in my unaccepted PR? So it looks like at the moment that I am wasting my time. Correction: nsieve 3.cl is about 25% faster than 3.cpp and about 30% faster than 2.cpp on my CPU (i7-7700HQ) but both C.pp codes are a bit faster on the Intel(R) Xeon(R) Platinum 8272CL. Good. I don’t have to worry any more. |
I'm not sure if you were referring to me. If you were, I think I'd rather port your existing, accepted 5.cl program to C/C++ instead. It looks like that is using a different and more optimal algorithm. |
There must be having a code somewhere since as far as I know the CL code is a loose port from C. |
I might have found the C code. |
I have made the amendments to the above C code to work here and it runs quite a bit faster than the CL code on my particular CPU. However I don't see any point sending a PR since it looks like nothing is accepted from me. No matter how nicely or what I ask about or for I don't get any kind of reply or action for something like 7 days. Not even a yes or no or a get out of here since then. The last message I've received 7 days ago was " I believe I've clearly explained everything I could with examples and profiling reports, and all I need is a profiler report to prove the fairness which I believe is clear and actionable. If you still cannot or refuse to understand my point, please feel free to fork the code and publish your own results, as this talk is leading nowhere but wasting my time, I'm closing it." I have contacted the SBCL language maintainers who confirmed my point and a day or two later with the help of the head SBCL maintainer I did send the profiling report clearly proving that I was right from the beginning but no reply or action since then. And all this hassle and annoyance and time wasting is based on this:
OCaml code was profiles. And if OCaml is doing something than CL must be doing the same. The numbers are not even better!!! They are better for small trees like d=14 and close or slightly better at 18 but when the JIT can really kick in at 20 or 21 Java clearly wins.
Of coerce the multi-threaded CL codes are still faster than the single-threaded Java code for a reason. I hope I just imagining it but it feels very upsetting, annoying and extremely unprofessional how this issue is handled. At least in the CLBG site first we got something like "You are not welcomed here", "Harassment is not acceptable", "Bullying is not acceptable" "Anything you've asked for is rejected" message before cancelling my account altogether. But anyway, I am still hoping that the site maintainer is just away from the site and then my apology is very much warranted, though he was replying to you and one of the other guy in the meanwhile so I don't know. |
I would wait at least several more days before complaining. We just had a couple holidays in the last couple weeks and it's not uncommon for people to be taking extended vacations this time of the year so people might not be responding to things as quickly as usual. I've also got a pull request that hasn't had any responses or actions for six days so you're not the only one.
You did make a couple mistakes (which Isaac and I pointed out), called Isaac's results misleading and said he must have messed up something (I verified that the results were probably correct and that he didn't mess up anything), made an unreasonable request for him to benchmark on newer hardware with at least 8 GiB of memory (for a benchmark that was demonstrated to not even require much memory), were making a lot of comments/edits/new issues. It was definitely wasting a decent amount of his time and overall I could see how he considered it to be harassment. I do think it was wrong for him not to accept the code from others though unless he had a good reason to believe that they were just doing it on your behalf. |
Jeremy, When you see speed figures on about 15 more recent CPUs that are completely different to what is shown on a benchmark site using a, at that time, 12 year-old CPU you try to guess what is going on. I guessed the lack of RAM as well not just the CPU. I might have been wrong there. I was indeed wrong there. But my main point was that the 12 year-old CPU he used was absolutely not adequate for comparing languages and reporting speed figures when, I am quite certain, >95% of the developers had long forgotten about that CPU architecture. As I mentioned I did spend a lot of time and bench-marked on about 15 more recent CPU's, to see if I was doing something wrong, ranging from something like 6-7 years-old to 1-2 year-old. You quite rightly protected Isaac for using 2Gb RAM but if I remember correctly you did not do the same for the CPU. When you do something and you wanna do it with credential you are not doing it with ancient tools. The small CPU upgrade at CLBG and the relative speed figures on this site did prove my point. I am still standing by it and in my opinion it wasn't a mistake to point out that the figures were very misleading to people. I have never accused him that he did anything wrong intentionally. But anyway since I am not allowed to contribute there anymore it is just pointless to argue further about it, however, I sometimes connect back to CLBG to see what is going on there and what Isaac was doing with those other guys who might have been acted upon my request to improve and contribute my code was the most shameful thing I have seen from any site maintainer. How shameful was it to punish someone whose only sin was to try to help someone else for sympathy? I hope this site maintainer is indeed enjoying his well earned holiday and I cordially apologize to him for saying anything that might have offended him. |
I am not missing the point. What if a sportsman, say a runner, have more money and hires a better trainer and uses better training facilities and becomes faster? He is abiding the rules and gets faster then you. It happens everywhere including software development. |
I'm not sure if you're trying to support your argument or my argument here. In your example the sportsman is still doing the work, no one else is doing the work for him. Similarly I'm trying to argue that the programming language and its implementation should be doing most of the work for these benchmarks, not libraries. Additionally if your sportsman is using a better training regimen, I should be allowed and capable of copying that training regimen to make things more fair. Similarly I'm OK with allowing programs to copy different algorithms (and I even argued that it should be allowed) as long as it's allowed for everyone to do it, feasible for everyone to do it, and is done without using a different programming language. I'm trying to argue that libraries should not be doing most of the work for these benchmarks since these are supposed to be benchmarks for programming languages and their implementations, not libraries that will likely be using different algorithms than one another and implemented in different programming languages. P.S. I could comment a little bit more about your other comment regarding the CLBG but I'm going to ignore it so this issue doesn't get off track. If you want to continue that discussion, send me a message at https://JeremyZerfas.com/Contact.jsp . |
I was referring to “have more money” part of you message and see how this wasn’t clear whose argument I was supporting:) I fully understand your point and I also fully with you on, if allowed at all, only allow to use same language libraries without exception. I can’t be though with you at the moment to altogether disallow using external libraries since one language might have lot of thing in the standard library that in the other might only be supported externally. My preferred language Common Lisp were standardized in 94 and all new language inventions and extensions can only be in external libraries. And how should those languages be handled where even standard libraries are written in different language? It is a tricky field for sure. Nevertheless, I could also accept your standing to limit the use of libraries and even support it as well if it is decided and can be done in a transparent and enforceable way. This transparency and enforcement requirements though might be also not the easiest to do. My only strong points are that the rules should be clear for everyone from the beginning and also enforceable without much room for argument. However, will it be possible? |
I said it earlier but I'll say it again, I'm OK with allowing libraries to enable the use of primitive/basic type things (say like adding support for multithreading or SIMD) which may be important for performance but ultimately are not responsible for doing most of the work. What I'm not OK with is problems like the pidigits problem where often 90+% of the time is spent in arbitrary precision math libraries or the regex-redux problem where often 60+% of the time is spent in regular expression libraries.
That's fine just as long as the problems aren't making heavy usage of the libraries. Problems like fannkuch-redux, fasta, mandelbrot, nbody, and spectral-norm are good problems because they don't make heavy usage of libraries (pretty much the most complex things libraries are used for in these problems is doing fairly basic things like square roots and outputting data and they don't make up the vast majority of the work). They primarily have the program running lots of its own loops, conditionals statements, functions, etc... knucleotide is a bit borderline because of its heavy hash table usage but it still involves the program doing a decent amount of the work.
If you design the problems so that they don't require heavy library usage (and assuming programs are allowed to use different algorithms), then there shouldn't be a need for many rules and enforcement should be fairly easy. Pretty much the only rules you would need is that a submitted program can not have any code in a different programming language and it should only be allowed to use its own standard libraries and a few other approved libraries for enabling the use of primitive/basic type things. |
But as far as I know those two benchmarks were specifically developed to test arbitrary precision math and regular expression operations. Of course most of the time is spent on those operations.
I don't think that use of hash table should be a problem at all. Regarding your last paragraph I can relate to it but still feeling a bit uncomfortable to limit the benchmark problems to such an extent but could easily accept it if needed. |
That brings us back to my original issue, they are "Problems requiring heavy library usage" and they "might not be good for benchmarking." See my original comment for reasons why this is bad.
Slight nitpick but from my count slightly over half the programming languages in these benchmarks now have built-in support for arbitrary precision math. I've pretty sure regular expression support is even better. |
Slight nitpick but if there is such a strong built in language support for those than why do we need heavy library usage?:) Someone has to decide what is allowed and what is not and then we have to stick to it. |
We don't need heavy library usage (as demonstrated by some of the good problems I mentioned two comments ago) and we should avoid it. Although about half of the programming languages have built-in support for arbitrary precision math, most of the best performing programs for those languages on the CLBG pi-digits benchmark use the GMP C/assembly library instead. They're presumably using GMP because it is faster than the built-in arbitrary precision math code, it almost certainly wasn't done because it was easier or required less source code. Just goes to show that GMP is possibly the fastest performing arbitrary math precision library and if problems are made to require heavy library usage, then it will just turn those problems into benchmarks for comparing libraries instead of programming languages and their implementations. |
But if we limit the use of libraries to same language ones only than the above problem just vanishes. You have a strong enough argument and I can go with either decision, so let the site maintainer decide when he is back. |
That wouldn't make all of the above problems vanish, it would only solve one part of the above problems. Multiple libraries made in the same programming language can and often will have different performance from each other due to the use of different algorithms and different code quality. I previously linked to some benchmarks from one of the PCRE developers and that alone shows how different algorithms in PCRE have different performance from one another depending on whether you use the basic, JIT, or DFA engines. It also shows how other C regex engines like TRE and Oniguruma have different performance. It would also be impractical to limit the use of libraries to ones that are written in the same languages since some of the standard, built-in libraries for some language implementations are written in different programming languages. Chapel, GHC, Julia, and PHP usually/always include GMP for arbitrary precision math for instance. Even more implementations have arbitrary precision math code which is not written purely in the relevant programming language but instead uses a considerable amount of assembly or C/C++ code for doing the hard work. |
I did get your point from the beginning but this logic leads us to not allowing the use of any libraries whatsoever since even the standard libraries if written in the same language or not can have wildly different performances. So what would be your recommended solution? No library use at all not even the std libs? So all the problems should be defined to use some kind of common denominator approach, and if not possible, then if a particular language missing that certain feature than the programmer should reimplement it? And who would be the wise one to define or pick the problems and then control it in a transparent way acceptable by everyone. As I reiterated several times I could go with any approach when clearly defined but what I foresee by this approach is an even bigger opportunity for disagreement. |
Remember that I keep on saying that HEAVY library usage is not good for benchmarking. I certainly don't think that ALL library usage should be disallowed, that would be somewhat crazy (it would make it harder to do input/output from the programs for one). In my opinion the problems should be designed so that probably no more than a third (and preferably less) of the CPU time is spent in libraries. The fannkuch-redux, fasta, mandelbrot, nbody, and spectral-norm problems are ones where I think this is mostly the case and why I think those are good problems (and it's probably also part of the reason why there hasn't been much controversy on the CLBG for these benchmarks as compared to the binary-trees, k-nucleotide, pi-digits, regex-redux, and reverse-complement problems which rely more on libraries). The problems should be designed to be more like those problems. |
"Chapel uses an unmodified copy of the latest GMP release, but makes use of its custom allocation routines so that it can take advantage of Chapel's fast and thread-scalable allocator." |
It's probably true that Chapel is using a mostly unmodified copy of GMP. (Edit: This is correct for Chapel's preferred configuration (see https://chapel-lang.org/docs/usingchapel/QUICKSTART.html#using-chapel-in-its-preferred-configuration) and probably correct for the CLBG but it turns out this project is using the chapel/chapel:latest docker image which sets the CHPL_GMP environment variable to "system" which should make Chapel use Debian's GMP library.) My previous statements that Chapel will preferably use its own GMP library and that Chapel's GMP library is a bit more optimized than the GMP library used by many other programs are also true and that's the important point. If you build and install Chapel on Ubuntu, you should probably be able to find its GMP library inside Just to better show how much of a difference the library can make, I compiled and have included results of running (on a Core 2 Duo E8400) the Chapel #2 and C #1 pidigits programs from your site which seem to be using roughly the same algorithm (the Chapel program seems to have been based on that particular C program). I also compiled the C program using several different libraries (but no other changes): the Ubuntu provided GMP library, Chapel's GMP library, a GMP library I built from source with the default configuration (which should be similar to how Chapel's GMP library was built), and a GMP library I built from source but which I configured for a generic x86-64 processor using
I think this pretty clearly shows that just changing compilation settings of a library can have a considerable impact on performance and that Chapel's GMP library is a bit more optimized than the Ubuntu GMP library that many programs will be using. Getting back to the original issue, it's not hard to imagine that changing libraries can have even more effects on the performance of the programs. Finally, for the pidigits problem/benchmark, changing the memory allocation routines will have a negligible effect on performance. I did some profiling and the memory allocation routines only use ~0.1% of the CPU time for the above programs. |
Thanks for sharing the facts, that's what I'm interested in. On this topic, I think either way makes sense, depending on ppl's perspective, it's an important goal of this repo to provide a tool(offline data generation+ online data visualization) for ppl to easily benchmark their own selection of programs on the hardware they choose. I don't think anyone has to agree on some compromised rules, as that also means no one agrees with it 100%. The term performance is vague enough and some of the disagreements are not compromisable, like some only care about numbers on a single core machine, but I'm not likely to set core affinity, or some may care about the performance of avx512, which I'm not going to enable either. Some may only care about perf w/o even using stdlib (e.g. ppl who work on compilers), but I care about perf of the stdlib + most popular libs in a language's ecosystem (e.g. ppl who are evaluating tech stack). Then the best solution is not to work on some compromised goal(which is not fun and just wastes time), but they can just maintain a different subset of programs that they can make the most sense of and the tool here is always ready to help, and waiting for improvements. P.S. I've already regretted borrowing or accepting many of the programs. It is a non-goal for myself to make the number smaller, by using parallelization or different algos, which are adding maintenance burden but I personally cannot make extra sense out of them. Although at the same time, I recognize the value of these programs for many ppl, I'm not going to compromise like I mentioned above, meaning of performance can be very personal |
Hi Jeremy, Do you get similar relative differences on more recent CPUs as well? |
I looked a bit more into how GMP is used by Chapel in this project and it looks like I was wrong that Chapel is using its own GMP library in this project. It looks like this project uses the chapel/chapel:latest Docker image at https://hub.docker.com/layers/chapel/chapel/latest/images/sha256-c58392f33922079a6a202d1405a7176d6c6263f443abd411babc8a7cac38fc1a?context=explore and that sets the CHPL_GMP environment variable to "system" which should make Chapel use Debian's GMP library. On the other hand, Chapel's preferred configuration (see https://chapel-lang.org/docs/usingchapel/QUICKSTART.html#using-chapel-in-its-preferred-configuration) does cause it to use its own GMP library and I'm still pretty sure that that is the setup being used on the CLBG. I'll also update my previous comments to better reflect this. |
Below are results from my previous tests but with an i7-3615QM and EPYC 7551 included. The E8400 and i7-3615QM were running a live version of Xubuntu Desktop 21.10 booted from an ISO on bare metal but the EPYC 7551 is running from a fully installed version of Ubuntu 21.10 on an Oracle Cloud Free Tier VM.Standard.E2.1.Micro VM (which is throttled). I wish I thought of this earlier but I compiled the C programs using static linking but it looks like Chapel uses dynamic linking by default. I did verify on the EPYC 7551 though that changing between static and dynamic linking for both the Chapel and C programs has a negligible effect on performance.
Below are results of comparing the CLBG Chapel #2 and C #1 programs with results from the CLBG included. On the E8400, i7-3615QM, and EPYC 7551 systems the C #1 program was using static linking but the CLBG systems used dynamic linking instead. On the 8400, i7-3615QM, and EPYC 7551 systems the C #1 program was linked to the Ubuntu GMP library and I think that was also the case for all the CLBG systems with the possible exception of the i5-3330 Ubuntu 21.10 system. I see that sometime after December 12 Isaac upgraded Ubuntu on the i5-3330 and now the C #1 program as well as several other programs using GMP got a noticeable performance boost. On the i5-3330 Ubuntu 21.10 system the C #1 program is now slightly faster than the Chapel #2 program and it's also beating itself running on my i7-3615QM with the Ubuntu GMP library so I have a suspicion that the i5-3330 Ubuntu 21.10 system's library setup for GMP changed so that the Ubuntu GMP library is not being used.
It appears that on newer hardware, the Chapel #2 and C #1 programs that used either Chapel's GMP library or a GMP library built with GMP's default configuration had similar performance changes. It also appears that the default (more optimal) configuration GMP library is getting faster than a generic x86-64 configuration GMP library. |
Note "chpl version 1.25.1" "built with LLVM version 11.0.1" I should be more explicit:
|
I think I've heard enough from hanabi1224 to get his stance on this issue and now think that a summary of the situation should be left for any other people that are also interested in this issue. @hanabi1224: Can you look over the following and add any corrections? If I'm correct about your stance, I think it would also be good for you to label this issue as "wontfix" and close it.
|
Thank you @JZerf for the summary.
I think these problems by nature depends on arbitrary precision big integer or regular expression engine (maybe 256bit big integer is a better suit for secp256k1, but it may not be feasible in all languages), IMHO these problems are more reflecting the ecosystem of a language since it's not likely possible to implement an efficient version of such kind of libs in ~100 lines of code, FFI is only allowed for stdlib because FFI is tricky for cross-platform compatibility, which typically only stdlib would guarantee. Since these problems are more reflecting the ecosystem instead of performance of language/compiler, I feel cross-platform compatibility should be taken into consideration. (e.g. Go's big module in stdlib is written in assembly, performant with good cross-platform compatibility, this standard might be too high for a third-party lib)
For other problems, I would prefer not to use third-party libs when everything can be implemented in ~100 LOC, which should be included in per-problem guidance in the future. (OTOH, I feel it's OK to directly use some common data structures like ordered map, doubly linked list for LRU, bit vector for nsieve, when they are missing from stdlib). coro-prime-sieve is another exception, the goal is to evaluate the infrastructure of coroutine/fiber/green thread/virtual thread and their schedulers, so solutions that use semi-coroutine(generator) are not accepted. Implementations that use third-party libs should be specially marked (e.g. with name 0 or some suffix) to indicate it's showing reference perf numbers of some state-of-art implementations but not really comparable to others. (some of go and rust implementations under LRU are violating the naming rule ATM)
There's only 0-ffi.go under secp256k1 doing this. I use it to show the perf number of state-of-art implementations as a reference. I have to admit that, I have not really thoroughly thought over the problems made by myself, some of them are not good enough. Most of the secp256k1 implementations are ported from an algorithm optimized for javascript but not low-level languages, and it's mostly testing performance of arbitrary precision big int lib, without showing extra values beyond pidigits and edigits. And I've made some updates to LRU as others suggest, the initial setup of the LRU size was bad indeed. Also, I personally feel some of the problems from CLBG have small flaws, fasta, mandelbrot and pidigits are printing too much text to stdout, making it a key influencer to use buffered stdout, some programs are slow only because they fail to do so. e.g. non-parallelized fast #1 is much slower than parallelized fast #5 on CLBG, which is actually a misleading conclusion, after fixing fast1 with buffered stdout, it's much faster than fast5. The same problem exits in many implementations of these problems, making the numbers a bit misleading when ppl don't realize, and that's why I changed mandelbrot output to print md5 hash of the graph instead of its binary. (CC @igouy on this part) |
I tried your fasta # 1 but unfortunately:
fwiw Once the output is checked the benchmarksgame programs are run again with output redirected to
|
@igouy This can be fixed by replacing |
This is what I see:
Maybe your fast5 is different?
? |
Maybe 4 cores help, I c 1 being consistently faster than 5 on the 2 core VM : ) |
So a misleading conclusion because someone might run a program written for multicore on a single core? |
I think there is something messed up here. For the CLBG benchmarks, the pidigits programs only generate a few KB of output and the mandelbrot programs only generates ~32 MB of output. These are quite reasonable in my opinion and writing to stdout generally shouldn't be very slow as long as the program isn't doing something really stupid and whatever stdout is connected to isn't being a bottleneck. For comparison, on Isaac's reverse-complement benchmark the fastest program reads, processes, and outputs over a GB of data in just ~0.4s on Isaac's computer. It's kind of rare for a single threaded program to be faster than a multi-threaded program when ran on a multi-core system. It certainly can happen in some cases but often people wouldn't bother dealing with the multi-threaded program if that was the case. It's quite sensible for the CLBG multi-threaded Rust #5 program to be faster than the CLBG single threaded Rust #1 program. Just like for Isaac, the CLBG #5 program was faster than the CLBG #1 program on both my E8400 and i7-3615QM systems (and I tested the i7-3615QM with both 4 cores and 2 cores enabled). The 5-m.rs program from your site was also faster than the 1.rs program from your site. I think it's actually your benchmarks that are being misleading in this case by showing a multi-threaded program as being slower than a single threaded program while running on a multi-core system. Looking more closely at the fasta results on your site, I now notice that for many of the programs the run times are considerably higher than the CPU time (user + system time) which is really unexpected. For instance the top program has a run time of 336 ms but the CPU time is only 177 ms. I think this is indicative of a problem with your benchmarking process and we should look into this more to find out what is going on. I was just using the |
I must have measured However, measurements for
|
You are right currently stdout and stderr are redirected to the tool process and then discarded, it might indeed create extra overhead for the toy app (e.g. buffer size too small for this problem, or reading and discarding the stdout is a heavy task for the tool so that it competes for cpu cycles with the parallelized program). Plz note that time(user) and time(sys) are directly read from /proc/{pid}/stat instead of being measured by the tool, they show the same result when output is not redirected to /dev/null But anyway, I've updated the tool to make the redirection, new results will be published soon, thx for ur testing. And I might consider migrating to hyperfine as well at some point. |
Feels like it's been many years since anyone made the too much text to stdout criticism. Perhaps other criticism has become more fashionable; perhaps fewer people actually look at the programs. There have been several who did not understand that large output was being sent to
The benchmarks game measurements are not made with hyperfine, but I do find hyperfine convenient for this kind-of discussion. The benchmarks game scripts are unnecessarily complicated because they were written to provide "features" that are no longer needed. If I wanted to rewrite now, I would try to use existing tools like hyperfine as-much-as possible.
Sorry to say but I try hard not to look at your project — I try hard not to interfere :-)
All of them have small flaws and some of them have big flaws. What's shown is provisional not definitive. |
@igouy I'd like to submit the new fasta1 to CLBG since it does improve the number, do I need to create a ticket in the repo or is it OK u just take it from here? BTW, 1c.rs is even faster with compile-time calculations but it requires nightly rust to compile |
Yeah, run-time will be less if the work is moved to compile-time. |
Getting the issue back on-topic, @hanabi1224, you said that you would prefer not to use third-party libraries when everything can be implemented in ~100 LOC. Are you sure that is a good idea? Only allowing third-party libraries when some code reaches an arbitrary amount of LOC seems like a bad idea to me. A couple of the benchmark programs are already using a third-party library to provide MD5 functions but it looks like that is something that could be implemented in ~100 LOC (see https://openwall.info/wiki/people/solar/software/public-domain-source-code/md5 for an example). Should these programs be forced to write their own MD5 code instead? For some things it might also be possible to create simple implementations that meet a LOC requirement but creating fast implementations might require more LOC. As an example, see /~https://github.com/jserv/cregex and /~https://github.com/mattbucknall/subreg for some simple regular expression engines and /~https://github.com/ankushagarwal/nweb for a simple HTTP server that aren't too far over that ~100 LOC limit (and could probably be made smaller to be closer to that limit). These examples would almost certainly be generally outperformed by more popular and complex regular expression engines and HTTP servers. I think it would be a bad idea to require some people to write simple regular expression engines and HTTP servers and then use them for benchmarking against more complex and faster regular expression engines and HTTP servers. If you're going to have benchmarks that are going to make heavy usage of libraries, I think it would be a better idea to allow programs to use any third-party library made in the same programming language as long as it doesn't create any large resource/maintenance requirements. Regarding resource/maintenance requirements, maybe anyone adding a new third-party library should be required to provide a pull request that updates the project to add the library and those changes should be required to download no more than say 50 MB of additional data from reasonably trustworthy sources, use no more than 50 MB of additional disk space, require no more than 5 minutes of additional build/bench time, etc...? |
Yes, Julia is using a 3rd-party lib for md5, currently, md5 is only used to reduce the size of output text, and is not used in the critical path for any problems (at least not designed to be), so I think either way is fine, better to save some bytes by using a lib
The
Same as
This should really depend on the goal of the benchmark and whether it's used in the critical path. There should be detailed rules for each problem in the future. |
But is it REQUIRED for people to write their own HTTP code? You previously said that you would prefer not to use third-party libs when everything can be implemented in ~100 LOC and it appears that this is something that could be implemented in ~100 LOC. Same question could possibly be asked of regular expression engines and JSON serialization/deserialization. I think it would be a bad idea to require people to write these things themselves but that seems to go against what you said. I think it would be better to get rid of the arbitrary ~100 LOC limit and just allow all third-party library use as long as it's written in the same language, doesn't create any large resource/maintenance requirements, and isn't explicitly prohibited for certain problems.
I don't think it should matter whether the library is used for critical path code or not. You currently are allowing third-party libraries for MD5 and HTTP functionally even though both could probably be implemented in ~100 LOC and the MD5 libraries are not used in critical path code but the HTTP libraries are. I'm trying to summarize under what circumstances you allow third-party library usage based on what you've said and programs that have been accepted so far and this is roughly what I have so far:
With the notable exception of libraries for memory allocations, pools, arenas, etc... (which would be useful for the binarytrees and merkletrees problems), you're pretty much already allowing every type of third-party library that would be needed. I think it would be better to just allow all third-party library use as long as it's written in the same language, doesn't create any large resource/maintenance requirements, and isn't explicitly prohibited for certain problems. |
Maybe, yes, as long as it does not invalidate the goal. Most of the problems are either evaluating libs on purpose (pidigits, http-server, coro-prime-sieve, json-serde, regex-redux) or the main part cannot be solved directly by libs (all the rest except secp256k1) |
Similar issues discussed
|
One way to make them aware is to present the library usage. However much one may attempt to make them aware, people will only see for themselves.
Valid not to bother with those particular tasks, although used as a stick to beat the benchmarks game pinata as a whole. Is SIMD use in the benchmarks game an example of heavy library use? |
@hanabi1224 > I've already regretted borrowing or accepting many of the programs. You can un-borrow and un-accept ;-) Maybe you'd like the simplicity of The Fastest Programming Language? |
Generally I would say "no". I see SIMD usage as being kind of similar to floating point number usage. Nowadays I (and I imagine most people) generally wouldn't say that floating point number use is an example of heavy library usage either. Now on the other hand if you were trying to use SIMD or floating point instructions on an old 486SX that lacks support for either and had to resort to using software routines, I think in that case an argument could be made that this could be considered heavy library usage. Luckily just about every modern (non-microcontroller) processor has floating point instructions and just about every programming language supports the direct use of those. SIMD support is much more varied at this time so some programming languages might have to use libraries/intrinsics/functions to make use of those SIMD instructions and more annoyingly it often has to be done in some platform specific manner instead of just a generic manner. |
@hanabi1224: Have I correctly summarized below this project's stance over library usage? If so, can you label this issue as "wontfix" and close it?
|
Some of the current problems being used for benchmarking usually result in program authors using a library to do much of the work. Examples of this are the edigits and pidigits problems that usually require an arbitrary precision math library like GMP, the regex-redux problem that usually requires a regular expression library like PCRE or RE2, and the secp256k1 problem that usually requires a crypto or arbitrary precision math library. It seems like the goal of these benchmarks is to benchmark programming languages and their implementations so it might not be a good idea to have problems that will typically be heavily dependent on libraries to do much of the work.
Many of the libraries will be implemented in a different programming language (like C, C++, or assembly) than the one the program is written in and additionally libraries can have greatly different performance from other libraries. This results in these problems being more of a benchmark of the libraries than the programming languages and their implementations.
Also if there is one highly dominant library (like GMP for arbitrary precision math), this can result in many ties. This was demonstrated about a year ago on the pidigits benchmark on the Computer Language Benchmarks game when there was roughly a 10 way tie for best performance. This is highly indicative of just how much program performance for these problems is dependent on the libraries being used.
Many people probably won't even be aware of this library usage but those who are probably won't find benchmarking of libraries to be quite as interesting as benchmarking programming languages and their implementations. I know at least a couple other people have the same thoughts that I do. I would suggest that the edigits, pidigits, regex-redux, and secp256k1 problems (as well as any others I may be missing) should be removed and future problems should try to avoid the use of libraries.
The text was updated successfully, but these errors were encountered: