Blake3 (hash function) implementation

Discussion related to "Everything" 1.5 Alpha.
Roby.One
Posts: 2
Joined: Mon Jun 10, 2024 1:13 pm

Blake3 (hash function) implementation

Post by Roby.One »

Hi,

Would it be possible to have an implementation of the BLAKE3 hash function?

Here is a C implementation: https://github.com/BLAKE3-team/BLAKE3?t ... ementation

thanks in advance.
Bye bye
therube
Posts: 4985
Joined: Thu Sep 03, 2009 6:48 pm

Re: Blake3 (hash function) implementation

Post by therube »

(Makes sense if you need a crypto hash. It is crypto, I think?
Though xxHash is much preferred if you don't need crypto.


Since I just did this the other day...


@ office, on SSD, i get 411 MB/s (using FcHash.exe %*) [fchash v5.5.0]


T: @ home 172 MB/s, 171, 176, 174, 166 (w/4 files rather than 2), 55 MB/s when combining T: & slow W:


K: USB Flash Drive (2.0 connector)
T: Toshiba 7200 spinner

O: SSD
X: [same] USB Flash Drive ("3.0" connector, not quite sure?)
B: [same] USB Flash Drive ("3.0" connector, not quite sure? - on back of computer [twas the connector i had my mouse plugged into])

Code: Select all

hash --sha1 e*
K: Total: 8files, 2129.6MiB, 64.7sec,  32.9MiB/s
T: Total: 8files, 2129.6MiB, 15.1sec, 140.9MiB/s
T: Total: 8files, 2129.6MiB, 12.3sec, 173.5MiB/s   (a 2nd run. 1st was immediately after copying the files to T:)
X: Total: 8files, 2129.6MiB, 11.6sec, 184.2MiB/s
B: Total: 8files, 2129.6MiB, 66.4sec,  32.1MiB/s   (so this is AS SLOW as K:)
O: Total: 8files, 2129.6MiB,  4.9sec, 430.7MiB/s

Code: Select all

hash --xxh3 e*
K: Total: 8files, 2129.6MiB, 64.0sec,  33.3MiB/s
T: Total: 8files, 2129.6MiB, 12.2sec, 173.9MiB/s
X: Total: 8files, 2129.6MiB, 11.0sec, 194.5MiB/s
B: Total: 8files, 2129.6MiB, 66.1sec,  32.2MiB/s
O: Total: 8files, 2129.6MiB,  4.4sec, 484.1MiB/s

Code: Select all

hash --sha256 e*
K: Total: 8files, 2129.6MiB, 64.6sec,  32.4MiB/s
T: Total: 8files, 2129.6MiB, 12.3sec, 173.5MiB/s   (i sure was not expecting this)
X: Total: 8files, 2129.6MiB, 11.6sec, 183.2MiB/s
B: Total: 8files, 2129.6MiB, 68.2sec,  31.2MiB/s
O: Total: 8files, 2129.6MiB,  7.0sec, 305.4MiB/s   (substantially slower)
)
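For anyone who wants to reproduce numbers like these without a dedicated tool, here is a minimal sketch. It is not how FcHash or hash.exe work internally; it just reads each file in chunks, feeds a hashlib algorithm, and divides bytes by elapsed seconds. The function names and the 1 MiB chunk size are my own choices, and xxh3 is not in Python's standard library, so only algorithms such as sha1 or sha256 apply here.

```python
import hashlib
import time

CHUNK = 1024 * 1024  # read size; 1 MiB is an arbitrary choice


def hash_file(path, algo="sha1"):
    """Return (hex digest, bytes read) for one file, read in chunks."""
    h = hashlib.new(algo)
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            h.update(block)
            total += len(block)
    return h.hexdigest(), total


def throughput_mib_s(total_bytes, seconds):
    """Throughput in MiB/s, the unit used in the tables above."""
    return total_bytes / (1024 * 1024) / seconds


def benchmark(paths, algo="sha1"):
    """Hash every path once and return (total bytes, seconds, MiB/s)."""
    start = time.perf_counter()
    total = sum(hash_file(p, algo)[1] for p in paths)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero
    return total, elapsed, throughput_mib_s(total, elapsed)
```

A second run over the same files will mostly measure the OS file cache rather than the drive, so the first run after a reboot (or after reading a lot of other data) is the one that says something about the disk.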
therube
Posts: 4985
Joined: Thu Sep 03, 2009 6:48 pm

Re: Blake3 (hash function) implementation

Post by therube »

Blake3 actually does rather well on my SSD (essentially the same as xxh).
(Do note that you must clear cache when "benchmarking" or you'll get erroneous results, i.e., reading from cache.)

Code: Select all

TimeThis :  Command Line :  fchash --xxh3 e*
TimeThis :    Start Time :  Mon Jun 10 11:01:34 2024
TimeThis :      End Time :  Mon Jun 10 11:01:40 2024
TimeThis :  Elapsed Time :  00:00:05.709

Code: Select all

TimeThis :  Command Line :  b3sum e*
TimeThis :    Start Time :  Mon Jun 10 11:02:06 2024
TimeThis :      End Time :  Mon Jun 10 11:02:12 2024
TimeThis :  Elapsed Time :  00:00:05.877
(Oh, same data set as above.)
Roby.One
Posts: 2
Joined: Mon Jun 10, 2024 1:13 pm

Re: Blake3 (hash function) implementation

Post by Roby.One »

In fact I don't need a cryptographic hash, just a fast algorithm.

I need the hash to verify duplicate files.

It would be more useful to have a hash of the video or audio without the metadata, because I change the metadata often.

With ffmpeg there is a way to calculate the hash of the streams inside a file, while FLAC calculates it automatically.

I should test whether ffmpeg gives the same result as the calculation done by flac.

MD5 is probably enough, even though it's broken, but it is slow.

Anyway, I have to sit down and think of a way to uniquely identify my media files even if I change the metadata.

It would be nice to have a fast hash function for video and audio that is something other than SHA1 or MD5.

Maybe we can think of a method to do this, but I wouldn't want to burden the development of this product which is already fantastic as it is!
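
To illustrate the duplicate check described above, a minimal sketch: group files by size first (files of different sizes cannot be identical), then hash only the candidates. This is not how Everything does it. BLAKE3 and xxHash are not in Python's standard library, so blake2b stands in here as an assumption; the function names are made up for the example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algo="blake2b", chunk=1024 * 1024):
    """Hash a whole file in chunks; blake2b is only a stdlib stand-in."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def find_duplicates(paths, algo="blake2b"):
    """Return lists of paths whose size and digest both match."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p, algo)].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

This compares whole files, so two copies of the same video with different metadata will not match; that case needs a hash of the stream contents instead.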
therube
Posts: 4985
Joined: Thu Sep 03, 2009 6:48 pm

Re: Blake3 (hash function) implementation

Post by therube »

As far as I understand it, hashmedia.bat.

Take a .mp4
Copy (remux, no re-encoding) it to .mkv
ffmpeg -i input.mp4 -c copy out.mkv
Do the same a second time (output to a different filename) .mkv
ffmpeg -i input.mp4 -c copy out2.mkv

Because .mkv has a ... "Unique ID", the two .mkv will not hash compare.
out.mkv will have different file hash from out2.mkv
But if you compare the files' media contents, you will see they do compare.
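
A rough sketch of that comparison, assuming ffmpeg is on the PATH. The options are the ones used in the timing tests further down this thread (-map 0:0 -c copy -f md5 -); the Python wrapper and its function names are only for illustration. It compares whatever ffmpeg prints for the selected stream and does not try to parse it.

```python
import subprocess


def stream_hash_command(path, stream="0:0", fmt="md5"):
    """Build the ffmpeg command line that hashes one stream without re-encoding."""
    return ["ffmpeg", "-hide_banner", "-loglevel", "error",
            "-i", path, "-map", stream, "-c", "copy", "-f", fmt, "-"]


def stream_hash(path, stream="0:0", fmt="md5"):
    """Run ffmpeg and return its hash output as printed (whitespace stripped)."""
    result = subprocess.run(stream_hash_command(path, stream, fmt),
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()


def same_media(path_a, path_b, stream="0:0"):
    """True if the selected stream hashes identically in both files."""
    return stream_hash(path_a, stream) == stream_hash(path_b, stream)
```

With the two files from the example, same_media("out.mkv", "out2.mkv") would be the check, while an ordinary file hash of the two would differ.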
Roby.One
Posts: 2
Joined: Mon Jun 10, 2024 1:13 pm

Re: Blake3 (hash function) implementation

Post by Roby.One »

Tools Versions used in test:

Code: Select all

ffmpeg version 6.1.1-full_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --pkg-config=pkgconf --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libaribb24 --enable-libaribcaption --enable-libdav1d --enable-libdavs2 --enable-libuavs3d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libaom --enable-libjxl --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi
--enable-libharfbuzz --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-dxva2 --enable-d3d11va --enable-libvpl --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libcodec2 --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
libavutil      58. 29.100 / 58. 29.100
libavcodec     60. 31.102 / 60. 31.102
libavformat    60. 16.100 / 60. 16.100
libavdevice    60.  3.100 / 60.  3.100
libavfilter     9. 12.100 /  9. 12.100
libswscale      7.  5.100 /  7.  5.100
libswresample   4. 12.100 /  4. 12.100
libpostproc    57.  3.100 / 57.  3.100

Code: Select all

xxhsum.exe 0.8.2 by Yann Collet
compiled as 64-bit x86_64 autoVec little endian with GCC 13.1.0

Code: Select all

b3sum 1.5.1

Tests: (run times taken with Oh My Posh!)

Code: Select all

17 files - 54,8 GB - HDD Toshiba 8TB 7200rpm

[ 3:44.769s]   ffmpeg -hide_banner -loglevel error -i $f -map 0:0 -c copy -f md5 -
[ 3:42.909s]   ffmpeg -hide_banner -loglevel error -i $f -map 0:0 -c copy -f hash - (default hash SHA256)
[ 3:41.846s]   xxhsum.exe -H0 $f
[ 3:41.898s]   xxhsum.exe -H1 $f
[ 3:41.898s]   xxhsum.exe -H2 $f
[ 7:46.274s]   b3sum_windows_x64_bin.exe -l 4 $f
[ 7:37.951s]   b3sum_windows_x64_bin.exe -l 8 $f
[ 7:36.662s]   b3sum_windows_x64_bin.exe -l 16 $f

Code: Select all

3 files - 10 GB - Ramdrive

[ 2.919s]      xxhsum.exe -H0 $f
[ 1.575s]      xxhsum.exe -H1 $f
[ 1.26s]       xxhsum.exe -H2 $f
[ 0.888s]      b3sum_windows_x64_bin.exe -l 4 $f
[ 0.883s]      b3sum_windows_x64_bin.exe -l 8 $f
[ 0.877s]      b3sum_windows_x64_bin.exe -l 16 $f
[ 9.423s]      ffmpeg -hide_banner -loglevel error -i $f -map 0:0 -c copy -f md5 -
[ 19.043s]     ffmpeg -hide_banner -loglevel error -i $f -map 0:0 -c copy -f hash - (default hash SHA256)
I did a simple test to get a rough idea of how using one hash over another can bring me a clear advantage.

I examined a medium-sized group of files, large enough to bypass the HDD's cache and Windows's disk cache in a simple but unprofessional way. This simulates a fast HDD, but with defragmented files and about 1% free space. Whether I use xxHash, MD5 or SHA256, I don't get a noticeable performance increase.

The big difference in performance shows up instead on a ramdrive with 10 GB of video files. At top transfer speed, BLAKE3 is about 1.4x faster than the fastest xxHash variant (-H2) and about 3.3x faster than -H0, and as expected it outperforms MD5 and SHA256.

If you are working with fast disks, implementing these 2 hashes might make sense.

I haven't yet managed to match the hash calculated by FLAC with the one calculated by ffmpeg. I will have to study this some more, because the hash calculated by FLAC on the audio stream does not depend on the chosen compression level.

To get a hash of just the video stream, excluding data in the chosen container that can be changed, ffmpeg seems to me the best choice.

Obviously ffmpeg supports a limited number of hashes and to implement new ones you would have to ask for it on their ticket system.

I haven't found a way to pass just the video stream directly to an external hashing program.
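
One possible direction, as a sketch only: if some command can write the stream bytes to standard output, a small script can hash that output as it arrives, without a temporary file. Which ffmpeg arguments would produce exactly the right bytes is left open here; the function name is hypothetical and blake2b is just a standard-library stand-in for a faster hash.

```python
import hashlib
import subprocess


def hash_command_output(cmd, algo="blake2b", chunk=1024 * 1024):
    """Run cmd and hash everything it writes to stdout, chunk by chunk."""
    h = hashlib.new(algo)
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        for block in iter(lambda: proc.stdout.read(chunk), b""):
            h.update(block)
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return h.hexdigest()
```

The same idea works on the command line with a pipe into any hashing tool that reads standard input.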

Now I'm trying to understand whether to add custom properties directly in Everything or add the value of the hash taken with ffmpeg in the tags field of the matroska.

For Example: V_MD5=xxxxxxxxxxx;City=Rome;Location=Centrum

By doing this I can search using "tags:"
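
For the key=value;key=value layout in the example above, a minimal sketch of reading and writing such a tag string. The layout is only the one from my example and not any Matroska or Everything standard; the function names are made up.

```python
def parse_tags(value):
    """Split 'K1=v1;K2=v2' into a dict; items without '=' are ignored."""
    tags = {}
    for item in value.split(";"):
        if not item.strip():
            continue
        key, sep, val = item.partition("=")
        if sep:
            tags[key.strip()] = val.strip()
    return tags


def format_tags(tags):
    """Join a dict back into the 'K1=v1;K2=v2' form."""
    return ";".join(f"{k}={v}" for k, v in tags.items())
```

Keys and values must not contain ";" themselves, which is fine for hex digests and simple place names.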

Obviously it would be more convenient to have a supported matroska meta tag to be able to take advantage of Everything's column-level duplicate search.

An ongoing project of mine is to normalize all my videos to have only matroska containers which actually support almost any file type.

Obviously I won't comment on Matroska's proprietary Tagging system: a unique mental delirium!
Last edited by Roby.One on Wed Jun 12, 2024 2:55 pm, edited 2 times in total.
therube
Posts: 4985
Joined: Thu Sep 03, 2009 6:48 pm

Re: Blake3 (hash function) implementation

Post by therube »

Code: Select all

	T:\WINDOWS>timer b3sum -l 9 T:\WINDOWS\Dell_Win7_professional_64bit_sp1.iso W:\Z\Win10_1809Oct_English_
	x64.iso
	fdfb55611fd6c1d3f8  T:/WINDOWS/Dell_Win7_professional_64bit_sp1.iso
	e3783ea5a9d00b1196  W:/Z/Win10_1809Oct_English_x64.iso
	
	
	Kernel  Time =    17.004 =    3%
	User    Time =    17.175 =    3%
	Process Time =    34.179 =    7%    Virtual  Memory =     12 MB
	Global  Time =   457.089 =  100%    Physical Memory =   5374 MB

Code: Select all

	T:\WINDOWS\ :
	  xxh3 <b3ae76391873db5eba87765c95144f03>: Dell_Win7_professional_64bit_sp1.iso
	W:\Z\ :
	  xxh3 <9a34afbb378239743641e6d8dc31a8d1>: Win10_1809Oct_English_x64.iso
	
	Total: 2files, 10200.1MiB, 173.3sec, 58.9MiB/s
	
	
	Kernel  Time =     0.811 =    0%
	User    Time =     1.435 =    0%
	Process Time =     2.246 =    1%    Virtual  Memory =     10 MB
	Global  Time =   173.343 =  100%    Physical Memory =     12 MB
blake3 (1.5.1) vs xxHash (0.8.2)

457 vs 173 sec -> put into english, that's 7 min 37 sec vs. 2 min 53 sec, that's quite the difference
5300 MB vs 12 MB RAM (Working Set, peak) -> now, I'm not sure just what that means, how "things" are affected, but it's quite the difference

(depending on one's fileset & other factors, difference might not be so large)

T: & W: are both spinners, T: 7200 internal, W: ? in a USB 2.0 enclosure [SLOWWW]
(i5-3570K, 16 GB RAM)


just some topics:
https://github.com/BLAKE3-team/BLAKE3/issues/278
https://github.com/BLAKE3-team/BLAKE3/issues/305
https://github.com/BLAKE3-team/BLAKE3/issues/208


Now on an SSD:
(i7-3770S, 8 GB RAM)
(not the same files as above, but close)

Code: Select all

C:\out\mozregression\small>timer b3sum -l 9 C:\out\mozregression\small\7601.24214.180801-1700.win7sp1_l
dr_escrow_CLIENT_ULTIMATE_x64FRE_en-us.iso C:\out\KKK\K-ORSAIR-0202\amber-OK\Dell_Win7_professional_64b
it_sp1.iso
c1395f4b35ad151961  C:/out/mozregression/small/7601.24214.180801-1700.win7sp1_ldr_escrow_CLIENT_ULTIMAT
E_x64FRE_en-us.iso
fdfb55611fd6c1d3f8  C:/out/KKK/K-ORSAIR-0202/amber-OK/Dell_Win7_professional_64bit_sp1.iso


Kernel  Time =    10.108 =   23%
User    Time =    11.637 =   26%
Process Time =    21.746 =   50%    Virtual  Memory =     13 MB
Global  Time =    43.410 =  100%    Physical Memory =   5373 MB

Code: Select all

C:\out\mozregression\small\ :
  xxh3 <f3d06ac283e42848487d9a7d0d770567>: 7601.24214.180801-1700.win7sp1_ldr_escrow_CLIENT_ULTIMATE_x6
4FRE_en-us.iso
C:\out\KKK\K-ORSAIR-0202\amber-OK\ :
  xxh3 <b3ae76391873db5eba87765c95144f03>: Dell_Win7_professional_64bit_sp1.iso

Total: 2files, 10963.8MiB, 22.2sec, 494.9MiB/s


Kernel  Time =     0.733 =    3%
User    Time =     1.450 =    6%
Process Time =     2.184 =    9%    Virtual  Memory =     10 MB
Global  Time =    22.173 =  100%    Physical Memory =     13 MB
So xxhash is still 2x faster than blake3.

For some reason, that 1st run of blake3 might have been a bit slow.
Later attempts have been faster, though still not as fast as xxhash.
Note that theoretically speaking xxhash -H128 should be a bit faster than the other variants.
(And the implementation in FcHash might even be a bit faster than that of xxhash itself?)

Code: Select all

C:\out\mozregression\small>timer b3sum -l 9 C:\out\mozregression\small\7601.24214.180801-1700.win7sp1_l
dr_escrow_CLIENT_ULTIMATE_x64FRE_en-us.iso C:\out\KKK\K-ORSAIR-0202\amber-OK\Dell_Win7_professional_64b
it_sp1.iso
c1395f4b35ad151961  C:/out/mozregression/small/7601.24214.180801-1700.win7sp1_ldr_escrow_CLIENT_ULTIMAT
E_x64FRE_en-us.iso
fdfb55611fd6c1d3f8  C:/out/KKK/K-ORSAIR-0202/amber-OK/Dell_Win7_professional_64bit_sp1.iso


Kernel  Time =     9.219 =   26%
User    Time =    12.339 =   35%
Process Time =    21.559 =   62%    Virtual  Memory =     14 MB
Global  Time =    34.523 =  100%    Physical Memory =   5620 MB
therube
Posts: 4985
Joined: Thu Sep 03, 2009 6:48 pm

Re: Blake3 (hash function) implementation

Post by therube »

Oh, & you thought we were done ;-).


--no-mmap, makes a HUGE difference on my end (i5-3570K, 16 GB RAM)
totally changing the "dynamics" of the hash computation

Code: Select all

T:\WINDOWS>timer b3sum -l 9 T:\WINDOWS\Dell_Win7_professional_64bit_sp1.iso W:\Z\Win10_1809Oct_English_
x64.iso    --no-mmap
fdfb55611fd6c1d3f8  T:/WINDOWS/Dell_Win7_professional_64bit_sp1.iso
e3783ea5a9d00b1196  W:/Z/Win10_1809Oct_English_x64.iso

Kernel  Time =     5.007 =    3%
User    Time =     9.968 =    6%
Process Time =    14.976 =    9%    Virtual  Memory =      1 MB
Global  Time =   154.813 =  100%    Physical Memory =      3 MB
- why ?

so now blake3 is FASTER than xxhash, 2:35 vs 2:53 in blake3's favor (so about 19 sec quicker) - why ?
(am i not invalidating caches, properly [not that i'm sure just HOW to do that?], OR ... ?)

---

With another dataset:


T:\K-ORSAIR\LIB\WIN7-DELL-HomePremium-ISO\sources/*
163 files, 3.2 GB, mostly small files < 10 MB, boot.wim 168 MB, install.wim 2.9 GB

(from fchash) Total: 163files, 3059.6MiB, 16.2sec, 188.9MiB/s
(so 188 MB/s is, i'm guessing, close to theoretical I/O speed for a 7200 spinner)

Code: Select all

blake-no.bat		--no-mmap	 16.024
xxhash-fc-real.bat	fchash --xxh3	 16.219
blake.bat                    		167.557
so if using
4-cores, & "memory mapping"
vs
1-core & NOT memory-mapping
is slower, for me...

that means, what?
that my memory is "slow", that the usage of multiple cores on my end is not that efficient ?
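
To make the two read strategies concrete, a minimal sketch that hashes the same file once through a memory map and once through plain chunked reads. It is not b3sum's code and it is single-threaded, so it only illustrates the difference in how the bytes reach the hash; SHA-256 from the standard library stands in for BLAKE3, and the function names are made up.

```python
import hashlib
import mmap


def hash_buffered(path, algo="sha256", chunk=1024 * 1024):
    """Sequential chunked reads: one forward pass over the file."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def hash_mmap(path, algo="sha256"):
    """Map the whole file and let the OS page it in on demand."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            return h.hexdigest()  # empty files cannot be mapped
        with mm:
            h.update(mm)
    return h.hexdigest()
```

Both return the same digest. With a mapping, the pages that get touched count toward the process working set, which may be why "Physical Memory" is so large in the runs without --no-mmap. If several threads touch different parts of the mapping at once, a spinning disk has to seek between them, which might explain the slow multi-threaded runs.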

and to top it off, --num-threads 1:

Code: Select all

T:\WINDOWS>timer b3sum -l 9 T:\WINDOWS\Dell_Win7_professional_64bit_sp1.iso W:\Z\Win10_1809Oct_English_
x64.iso    --num-threads 1
fdfb55611fd6c1d3f8  T:/WINDOWS/Dell_Win7_professional_64bit_sp1.iso
e3783ea5a9d00b1196  W:/Z/Win10_1809Oct_English_x64.iso


Kernel  Time =    19.500 =    7%
User    Time =    12.043 =    4%
Process Time =    31.543 =   12%    Virtual  Memory =     12 MB
Global  Time =   251.963 =  100%    Physical Memory =    673 MB
so...

--num-threads 1, is way slower than --no-mmap, but way faster than defaults [i've got to check that]
while at the same time using more "Physical Memory" than --no-mmap, but FAR less than defaults

so... now, i'm really confused (scratches:head)

this was from 2day, & quicker than the same from yesterday ?

Code: Select all

T:\WINDOWS>timer b3sum -l 9 T:\WINDOWS\Dell_Win7_professional_64bit_sp1.is
fdfb55611fd6c1d3f8  T:/WINDOWS/Dell_Win7_professional_64bit_sp1.iso


Kernel  Time =    10.966 =    3%
User    Time =     8.377 =    2%
Process Time =    19.344 =    5%    Virtual  Memory =     12 MB
Global  Time =   329.894 =  100%    Physical Memory =   5374 MB
(do i have the same files on each system, heh?) --- NO, that's T: & T:, where b4 it was T: & W: (& W: is much slower), so *NOT* a valid comparison !

So they already know what I found out: b3sum has poor performance for large files on spinning disks when multi-threading is enabled.