-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MD5 SIMD #56558
Comments
How are you doing md5? In order support simd in this fashion, you'd need to batch the md5 requests and execute them in a separate thread anyway (maybe a libuv worker). It has to be asynchronous because you want other http requests to come in and request md5 hashing. Literally this is worth doing only if your application is doing many md5 hashes concurrently (to the point it's mostly doing md5s). Have you tried wrapping that module in a native addon and experimenting? |
Could you share the JavaScript/Node.js code that led to these numbers? |
I think the information and discussion in fastify/fastify-etag#91 can be relevant. |
FWIW, MD5 is not a good hash function by any means and new applications should avoid it except to interact with older protocols. That being said, even for hash functions that should be used, I am not sure that this kind of optimization makes sense within Node.js. I am not particularly optimistic about OpenSSL adding this internally, but that still seems like a better place to start optimizing the hash function implementation. |
Indeed. However, Content-MD5 is commonly used https://datatracker.ietf.org/doc/html/rfc1864
Why?
Why does it need to live in OpenSSL?
I don't think the MD5 hash implementation can be much more optimized than it already is in terms of processing a single stream. |
On ARM hardware, Bun is measurably faster than Node at MD5 hashing. JavaScript hashing speed comparison: MD5 versus SHA-256 So I would not rule out small gains (e.g. 20%). |
Yes, this falls under the "older protocols" I mentioned above, but of course I did not mean to imply that these older protocols are not being used anymore. The unfortunate popularity of MD5 is also why OpenSSL has not removed it yet despite being fundamentally broken and slow.
We already use OpenSSL's implementation of MD5, so if that can be made faster for all downstream consumers, that'd be my first approach. Avoiding yet another dependency or even our own bespoke implementation of cryptographic primitives is a good rule of thumb in my opinion. |
If you are using crypto.createHash() the overhead primarily comes from wrapper management/V8's GC performance, instead of having something to do with actual hashing, like what we already found in nodejs/performance#136 - to eliminate these factors, use crypto.hash() (which is why it's significantly faster than crypto.createHash() for one off hashes, though this doesn't matter all that much when hashing a big stream in chunks which is what crypto.createHash() is really for). Even using cppgc-based wrappers would be enough to make crypto.createHash() 8-10% faster in a microbenchmark that misuse it for one-off hashing, which shows how little this have to do with actual hashing and more having to do with GC... |
What is the problem this feature will solve?
MD5 hashing is quite commonly used for file handling in HTTP. However, the algorithm is quite slow and is not able to fully utilize modern hardware.
What is the feature you are proposing to solve the problem?
While it's not possible to do much more to optimize a single hashing instance there are techniques such as /~https://github.com/minio/md5-simd which is able to run multiple hashing instances over SIMD which can process 8 MD5 hashes in parallel using SIMD instructions.
In a real world application such as a http file server/client (using Content-MD5) with many parallel request this would provide a 2-6x real-world performance improvement. In some of our applications the MD5 hash takes more than 50% of cpu time.
This should be possible to implement without changing our API's.
What alternatives have you considered?
Run hashing in a thread pool. One does not necessarily exclude the other. Using a thread pool would be more about avoiding latency spikes as in terms of throughput just forking the http server provides similar results.
The text was updated successfully, but these errors were encountered: