Suggest: Add xxHash support for faster content dedupe

Discussion related to "Everything" 1.5 Alpha.
bit
Posts: 38
Joined: Fri Feb 17, 2023 8:57 am

Post by bit »

When finding dupes, a faster hash algorithm is preferable.
We have dupe:size;sha256 (also sha1/md5/sha512), but it's a bit slow.
I found a better hash algorithm for non-cryptographic use that fits the need for fast file dedupe:
xxHash, especially XXH3, which is the fastest I know of.
Looking forward to seeing it supported in Everything.
xxHash - Extremely fast hash algorithm
https://github.com/Cyan4973/xxHash
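For illustration, the size-then-hash idea behind dupe:size;sha256 can be sketched in Python: group files by size first, and only hash files whose size collides with another file's, so the hash only runs where it can matter. This is a hypothetical sketch, not how Everything implements it; find_dupes, file_digest and the algo parameter are made up for the example. Any fixed-length hashlib name (md5, sha1, sha256, ...) can be passed. An xxHash function would slot into the same place, but Python's standard library doesn't ship one.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algo="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large files never need to fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def find_dupes(paths, algo="sha256"):
    """Return groups of paths with identical content (size first, then hash)."""
    # Pass 1: group by size; a file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    # Pass 2: hash only files that share their size with at least one other file.
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p, algo)].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

The hash speed only matters for the second pass, which is why a faster hash helps most when many files share sizes.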
void
Developer
Posts: 16775
Joined: Fri Oct 16, 2009 11:31 pm

Post by void »

I will consider support for xxHash.

Thank you for the suggestion.
therube
Posts: 4985
Joined: Thu Sep 03, 2009 6:48 pm

Post by therube »

While xxHash is an extremely fast hash algorithm (and while I too would like to see it, preferably XXH3_64b*, or failing that XXH64), there are caveats.

The main one is throughput.

You can only go as fast as the slowest part of the equation.

Say you have a fast hash, but I/O to your drive is slow (think a USB 2.0 drive, or worse): at that point the efficiency of xxHash is pretty much for naught. If you benchmark, you'll find that md5 or sha1 give very close results.
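A back-of-the-envelope model of the "slowest part" point: if reading and hashing overlap, the effective rate is roughly the slower of the two; if each block is read and then hashed, the per-MB times add. The 35 MB/s drive figure and both hash rates below are hypothetical placeholders, not measurements of any real drive or algorithm, and effective_rate is a made-up helper.

```python
def effective_rate(io_mb_s: float, hash_mb_s: float, overlapped: bool = True) -> float:
    """Approximate MB/s for reading and hashing a file.

    overlapped=True: reading and hashing run concurrently, so the slower stage dominates.
    overlapped=False: each MB is read, then hashed, so the per-MB times add up.
    """
    if overlapped:
        return min(io_mb_s, hash_mb_s)
    return 1.0 / (1.0 / io_mb_s + 1.0 / hash_mb_s)


# Hypothetical rates: a slow USB 2.0-class drive vs. a slower and a much faster hash.
usb2 = 35.0
for name, hash_rate in [("slower hash", 500.0), ("much faster hash", 10000.0)]:
    print(name, round(effective_rate(usb2, hash_rate, overlapped=False), 1))
# slower hash 32.7
# much faster hash 34.9
```

Even a 20x faster hash only moves the total from about 32.7 to 34.9 MB/s in this model, because the drive dominates.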

Now, given fast I/O all around, xxHash will be faster.

Implementation also matters.
FastCopy, for example, which can use xxHash, seems to get better throughput (during a copy/verify cycle) than expected,
because it uses multiple threads for Read/Write/Verify, Overlapped I/O, and Direct I/O, which brings out the best speed of the devices.
(And as far as that goes, voidhash seems rather efficient too.)
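To illustrate just one of those ideas (overlapping reads with hashing), here is a minimal Python sketch in which a background thread reads ahead into a bounded queue while the calling thread hashes. It is not FastCopy's code and does nothing about Direct I/O; hash_file_overlapped and its parameters are hypothetical. It assumes CPython, where file reads and hashlib's update on large buffers typically release the GIL, so the two threads can actually overlap.

```python
import hashlib
import queue
import threading


def hash_file_overlapped(path, algo="sha256", chunk_size=1 << 20, depth=4):
    """Hash a file while a reader thread reads ahead (hypothetical helper)."""
    chunks = queue.Queue(maxsize=depth)  # bounded: caps read-ahead memory
    errors = []

    def reader():
        try:
            with open(path, "rb") as f:
                while chunk := f.read(chunk_size):
                    chunks.put(chunk)
        except OSError as e:
            errors.append(e)  # hand the error to the hashing thread
        finally:
            chunks.put(None)  # sentinel: end of data (or failure)

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    h = hashlib.new(algo)
    while (chunk := chunks.get()) is not None:
        h.update(chunk)  # hashing chunk N while the reader fetches chunk N+1
    t.join()
    if errors:
        raise errors[0]
    return h.hexdigest()
```

The bounded queue is the design choice that matters here: the reader can stay a few chunks ahead, but a fast drive can't fill memory if the hash is the slower stage.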


XXH3_64b*
Theoretically a bit faster than XXH64, with the same (physical) hash size in bytes, and it also takes less screen real estate than the longer XXH128 hash.


Oh, x64 only, so it would not work with Everything x86.
Rene
Posts: 58
Joined: Fri Nov 04, 2016 6:16 am

Post by Rene »

Faaassst-cinating! :mrgreen:

xxHash also has XXH32, so welcome back Everything x86.
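XXH32 only needs 32-bit adds, multiplies and rotates, which is why it suits 32-bit builds. To show how small the algorithm is, here is a plain-Python transcription of XXH32 following the algorithm description in the xxHash repository. It is for illustration only: pure Python is far too slow for real dedupe, where the C library's speed is the whole point.

```python
import struct

MASK32 = 0xFFFFFFFF
PRIME32_1 = 0x9E3779B1
PRIME32_2 = 0x85EBCA77
PRIME32_3 = 0xC2B2AE3D
PRIME32_4 = 0x27D4EB2F
PRIME32_5 = 0x165667B1


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def _round(acc: int, lane: int) -> int:
    # One accumulator step: add lane * P2, rotate left 13, multiply by P1.
    acc = (acc + lane * PRIME32_2) & MASK32
    return (_rotl32(acc, 13) * PRIME32_1) & MASK32


def xxh32(data: bytes, seed: int = 0) -> int:
    n, i = len(data), 0
    if n >= 16:
        # Four parallel accumulators consume 16-byte stripes.
        v1 = (seed + PRIME32_1 + PRIME32_2) & MASK32
        v2 = (seed + PRIME32_2) & MASK32
        v3 = seed & MASK32
        v4 = (seed - PRIME32_1) & MASK32
        while i + 16 <= n:
            a, b, c, d = struct.unpack_from("<4I", data, i)
            v1, v2, v3, v4 = _round(v1, a), _round(v2, b), _round(v3, c), _round(v4, d)
            i += 16
        h = (_rotl32(v1, 1) + _rotl32(v2, 7) + _rotl32(v3, 12) + _rotl32(v4, 18)) & MASK32
    else:
        h = (seed + PRIME32_5) & MASK32
    h = (h + n) & MASK32
    # Remaining 4-byte lanes, then single bytes.
    while i + 4 <= n:
        (lane,) = struct.unpack_from("<I", data, i)
        h = (_rotl32((h + lane * PRIME32_3) & MASK32, 17) * PRIME32_4) & MASK32
        i += 4
    while i < n:
        h = (_rotl32((h + data[i] * PRIME32_5) & MASK32, 11) * PRIME32_1) & MASK32
        i += 1
    # Final avalanche mixes all bits.
    h ^= h >> 15
    h = (h * PRIME32_2) & MASK32
    h ^= h >> 13
    h = (h * PRIME32_3) & MASK32
    h ^= h >> 16
    return h


print(format(xxh32(b""), "08x"))  # 02cc5d05
```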