MID-TERM TEST on the Everything 1.5 function Dupe:

Discussion related to "Everything" 1.5 Alpha.
ChrisGreaves
Posts: 688
Joined: Wed Jan 05, 2022 9:29 pm

Post by ChrisGreaves »

Edited: COMMENTS WELCOME!
[Screenshot: Dupe_09.png]
You have on your system a folder tree T:\Music\ with 19,180 MP3 audio tracks. What is your strategy for locating tracks whose sound is identical?[1]

Remember: Find Duplicates
(1) The 1.5 Dupe function supports a maximum of 3 properties.
(2) If no property is specified, Everything will default to the Name property.
(3) To find files with duplicated content, include the following in your search: dupe:size;sha256; Everything will first compare file sizes …
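The size-first idea in (3) can be illustrated outside Everything. Below is a minimal Python sketch, not Everything's actual implementation: it walks a folder, groups files by size, and computes SHA-256 only for files whose size is shared with at least one other file. The function names `sha256_of` and `find_duplicates` and the `.mp3` filter are my own illustrative choices.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, extension=".mp3"):
    """Group files under root by (size, SHA-256); return groups of 2+ paths."""
    # Pass 1: group by size, which needs only file-system metadata.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extension.lower()):
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: read and hash only files whose size is shared with another file.
    by_key = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a byte-identical twin
        for path in paths:
            by_key[(size, sha256_of(path))].append(path)

    return [sorted(paths) for paths in by_key.values() if len(paths) >= 2]
```

For example, `find_duplicates(r"T:\Music")` would return one list of paths per group of byte-identical tracks. Files with equal size but different content are hashed and then correctly kept apart.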

My answer:-

(a) Everything 1.5 Dupe: defaults to Name, but Name is not a perfect indicator of tracks whose sound is identical, so forget Name. (We will need to see Path and Name in the result list, but they need not be part of our filter.)
(b) Since “identical sound” is the criterion, that suggests Content: or SHA256:
(c) To compute SHA256, Everything must read the entire content of each file, but holding the full content takes far more RAM than holding a SHA256 value[2], so stick with SHA256 rather than Content:.
(d) On my system, “T:\Music\ *.mp3 Dupe:Size” returns 2,621 objects.
(e) My fastest means of locating truly identical-sounding tracks would be “T:\Music\ *.mp3 Dupe:Size;SHA256”. My theory is that the Size term reduces the result list to 2,621 objects; although SHA256 might re-examine size, Size: is indexed, so SHA256 inspects the contents of only those 2,621 objects.

This ran for about one minute on my old Win7 DELL laptop, and returned 23 objects.
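The arithmetic behind (d) and (e) can be sketched as well: only files that share a size with at least one other file ever need hashing. The helper `size_candidates` and the four file sizes below are made up for illustration; on my library the equivalent count is the 2,621 objects that Dupe:Size returned.

```python
from collections import Counter


def size_candidates(sizes):
    """Return the paths whose size is shared by at least one other file.

    `sizes` maps path -> size in bytes. Every other file has a unique size,
    so it cannot have a duplicate and never needs its content read.
    """
    counts = Counter(sizes.values())
    return [path for path, size in sizes.items() if counts[size] >= 2]


# Hypothetical library: sizes are invented for the example.
library = {
    r"T:\Music\a.mp3": 4_000_000,
    r"T:\Music\b.mp3": 4_000_000,
    r"T:\Music\c.mp3": 5_250_000,
    r"T:\Music\d.mp3": 6_100_000,
}
candidates = size_candidates(library)
print(f"hash {len(candidates)} of {len(library)} files")  # hash 2 of 4 files
```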
[Screenshot: Dupe_10.png]
For extra marks: do you really think that the three circled tracks are identical in content? (I played them back. All three sound to me like a live performance of a Gustav Mahler symphony: orchestra tune-up, applause, depressing music …)
(1) Everything rarely makes a mistake.
(2) Humans (in my experience) make lots of mistakes.
(3) I have almost certainly erred in saving one track.
(4) Chances are strong that someone who uploaded one of the tracks knew less about classical music than did I.
(5) Adding superfluous terms such as Name: would have caused us to miss many duplicate tracks!

This year's final exam will be practical in nature: devise a strategy and a package to compare the elapsed run-time of various Everything 1.5 Dupe: filters.
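One possible starting point for the exam, sketched in Python: a small harness that runs each strategy several times and reports the best and median elapsed time. The name `time_strategies` is hypothetical, and the two lambdas are stand-in workloads; in a real test each entry would wrap one run of a Dupe filter, however you choose to invoke Everything.

```python
import statistics
import time


def time_strategies(strategies, repeats=5):
    """Run each zero-argument callable `repeats` times; report elapsed seconds.

    Returns {name: (best, median)}. The best run is the one least disturbed
    by other activity; the median shows typical behaviour. The first run may
    be slower than later ones if the disk cache is cold.
    """
    results = {}
    for name, run in strategies.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            timings.append(time.perf_counter() - start)
        results[name] = (min(timings), statistics.median(timings))
    return results


if __name__ == "__main__":
    # Stand-in workloads; replace each with a call that runs one Dupe filter.
    report = time_strategies({
        "workload A": lambda: sum(i * i for i in range(100_000)),
        "workload B": lambda: sum(range(100_000)),
    })
    for name, (best, median) in report.items():
        print(f"{name}: best {best:.4f}s, median {median:.4f}s")
```

Reporting the best and the median, rather than a single run, makes it easier to tell a genuinely faster filter from one that simply ran on a warm cache.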

Cheers, Chris :D
[1] The test shown here is a simple one. I would consider two tracks to be identical even if one has had the applause removed and the other has not.
[2] I ran 35,291 characters of plain text through an online SHA256 generator and received in return the 64-character value "f75c360777e7c253be7ab0201ec03e14ff77dad967e01511df8842c81d152d75".
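Footnote [2] can be checked with Python's standard `hashlib`: whatever the input length, SHA-256 returns a 32-byte value (64 hex characters), which is why holding hashes costs far less memory than holding content. The text below is a stand-in, so its digest will not match the one quoted above.

```python
import hashlib

text = "a" * 35_291               # stand-in for 35,291 characters of plain text
digest = hashlib.sha256(text.encode("utf-8"))

print(len(text.encode("utf-8")))  # 35291 bytes in
print(digest.digest_size)         # 32 bytes out
print(len(digest.hexdigest()))    # 64 hex characters

# Changing a single character gives an unrelated digest, so a track that
# differs by even one byte (e.g. with the applause trimmed) will not match.
edited = hashlib.sha256(("a" * 35_290 + "b").encode("utf-8")).hexdigest()
print(edited == digest.hexdigest())  # False
```

This is also why the simple test of footnote [1] cannot pair a trimmed track with its untrimmed original: SHA256 matches bytes, not sound.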
Last edited by ChrisGreaves on Mon Feb 13, 2023 7:58 pm, edited 1 time in total.
therube
Posts: 5003
Joined: Thu Sep 03, 2009 6:48 pm

Re: MID-TERM TEST on the Everything 1.5 function Dupe:

Post by therube »

If the hashes agree, there's a pretty good chance they're identical.


Better than uploading some text and getting a hash: if you could take that hash and reconstitute the source data, now that would be something :lol:.
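On hashes agreeing: a matching SHA-256 makes identical content overwhelmingly likely, and if you want certainty you can follow up with a byte-for-byte comparison. Here is a minimal Python sketch using the standard `hashlib` and `filecmp` modules; `sha256_file` and `confirm_identical` are hypothetical helper names.

```python
import filecmp
import hashlib


def sha256_file(path):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def confirm_identical(path_a, path_b):
    """Return True only if the hashes agree AND the bytes agree."""
    if sha256_file(path_a) != sha256_file(path_b):
        return False  # different hashes: the contents certainly differ
    # Same hash: almost certainly identical; comparing bytes removes all doubt.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Passing `shallow=False` matters: with the default, `filecmp.cmp` may treat two files as equal based on their size and modification time alone.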
ChrisGreaves
Posts: 688
Joined: Wed Jan 05, 2022 9:29 pm

Re: MID-TERM TEST on the Everything 1.5 function Dupe:

Post by ChrisGreaves »

therube wrote: Mon Feb 13, 2023 7:24 pm
... if you could take that hash & reconstitute the source data, now that would be something :lol:.
As soon as I get this tutorial finished! :lol: :twisted: :roll:

Back in the Word2.0/6.0 days I wrote a function that would lock functions and could not be hacked; it was in effect an irreversible process.
Cheers, Chris