Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[11.x] Replace MD5 with xxh128 in File::hasSameHash() #54690

Merged
merged 1 commit into from
Feb 19, 2025

Conversation

vlakoff
Copy link
Contributor

@vlakoff vlakoff commented Feb 19, 2025

In File::hasSameHash(), replace MD5 with xxh128, which has been available since PHP 8.1.

xxh128 has the same 128-bit length, but is much faster and avoids the security issues associated with MD5. See also this article.

Quick benchmark for comparing 100 MB files:

Benchmark code
// Using MD5 hashes

$start = microtime(true);

$hash1 = md5_file('./random_100MB_file_1.bin');
$hash2 = md5_file('./random_100MB_file_2.bin');
$areHashesIdentical = ($hash1 === $hash2);

$timeMd5Hash = microtime(true) - $start;

// Using xxh128 hashes

$start = microtime(true);

$hash1 = hash_file('xxh128', './random_100MB_file_3.bin');
$hash2 = hash_file('xxh128', './random_100MB_file_4.bin');
$areHashesIdentical = ($hash1 === $hash2);

$timeXxh128Hash = microtime(true) - $start;

// Direct comparison of contents

$start = microtime(true);

$content1 = file_get_contents('./random_100MB_file_5.bin');
$content2 = file_get_contents('./random_100MB_file_6.bin');
$areContentsIdentical = ($content1 === $content2);

$timeContent = microtime(true) - $start;

// Results

echo 'MD5 hashes: ' . round($timeMd5Hash * 1000) . " ms\n";
echo 'xxh128 hashes: ' . round($timeXxh128Hash * 1000) . " ms\n";
echo 'file_get_contents(): ' . round($timeContent * 1000) . " ms\n";

Results:

MD5 hashes: 375 ms
xxh128 hashes: 76 ms
file_get_contents(): 69 ms

(The code using file_get_contents() is only for reference. Large files would exhaust memory, and hash_equals() (#49721) is designed for short, same-length parameters.)

Using xxh128 virtually removes the hashing overhead. 🚀

xxh128 is already used in some places in Laravel, for instance see #45371 and discussion #46074.

Drop-in replacement. Much faster than MD5, and avoids its security issues.
@crynobone
Copy link
Member

#52301

ping @GrahamCampbell

@vlakoff
Copy link
Contributor Author

vlakoff commented Feb 19, 2025

For the case at hand, I can't think of any regression. The hashes don't go outside the function.

@crynobone crynobone changed the title Replace MD5 with xxh128 in File::hasSameHash() [11.x] Replace MD5 with xxh128 in File::hasSameHash() Feb 19, 2025
@taylorotwell taylorotwell merged commit bf58d21 into laravel:11.x Feb 19, 2025
46 checks passed
@vlakoff vlakoff deleted the Filesystem/hasSameHash branch February 20, 2025 02:39
@francoism90
Copy link

Today I've learned this method actually exists. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants