First of all, I would like to note the excellent quality of your program Everything!
First question.
Are there any plans to introduce the ability to set the number of threads for individual folders/disks that will be used to index properties?
A real-life example: there are several network storage devices, each with several network folders accessible via Samba. Some devices work well with about 10-20 simultaneous requests. Others work well only with one thread at a time: with, for example, 10 simultaneous requests the response time increases 50-fold or more. And finally, some devices reach maximum performance with 4-6 simultaneous requests.
In my case, the main time cost is getting SHA-256 file hashes.
It seems that the ability to set the number of threads individually for each device could increase the performance of obtaining hashes.
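To illustrate what I mean by per-device limits, here is a minimal Python sketch of the idea. It is not how Everything works internally; `hash_with_device_limits` is a hypothetical helper, and the device names and limits are made-up tuning values. Each device gets its own worker pool, so its cap is independent of the others.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_file(path, chunk_size=1 << 20):
    """Stream one file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_with_device_limits(files_by_device, limits, default_limit=1):
    """Hash files using a separate worker pool (and so a separate cap) per device.

    files_by_device: {device_name: [path, ...]}
    limits:          {device_name: max simultaneous reads}  (hypothetical values)
    """
    # One pool per device: a slow device with limit 1 never sees parallel reads,
    # while a fast device can still be kept busy with many requests.
    pools = {
        device: ThreadPoolExecutor(max_workers=limits.get(device, default_limit))
        for device in files_by_device
    }
    try:
        futures = {
            path: pools[device].submit(sha256_file, path)
            for device, paths in files_by_device.items()
            for path in paths
        }
        return {path: future.result() for path, future in futures.items()}
    finally:
        for pool in pools.values():
            pool.shutdown()
```

For example, limits of `{"nas-slow": 1, "nas-fast": 16}` would read one file at a time from the first device while keeping up to 16 reads in flight on the second.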
And the second question.
Will it be possible to add a new property for folders: calculate a hash based on existing (previously calculated) hashes of all files inside the folder?
At the moment, the folder hash can be calculated:
* based on file names, including the folder name - this lets you find identical copies by name, but cannot find copies whose base (root, parent) folders have different names
* based on file hashes - but for each folder, all file contents are read anew. For example, if you add a folder containing the following structure to the indexed ones:
my-app/
├─ X/
│ ├─ Y/
│ │ ├─ Z/
│ │ │ ├─ file_1_terabyte
├─ file_1_megabyte
then file_1_terabyte will be read from disk at least 3 times - to get the SHA-256 hash of folders X, X/Y and X/Y/Z. And if you add the SHA-256 property for files, then file_1_terabyte will be read a fourth time.
It seems that it would be great if there was an option to calculate the folder hash based on a list of file hashes that have already been calculated - this way you would have to read the contents of each file only once.
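A rough Python sketch of the kind of calculation I have in mind. This is my own illustration: the combining scheme is an assumption and will not match Everything's "Folder Data SHA-256" or 7-Zip. Folders are visited bottom-up, every file is hashed once, and each folder's hash is built only from the hashes of its children, ignoring names.

```python
import hashlib
import os


def sha256_file(path, chunk_size=1 << 20):
    """Stream one file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def folder_digests(root, hash_file=sha256_file):
    """Return {folder_path: hex digest}, calling hash_file once per file."""
    digests = {}
    # topdown=False yields the deepest folders first, so a subfolder's digest
    # is already known by the time its parent is processed.
    for folder, subfolders, files in os.walk(root, topdown=False):
        parts = [
            b"F" + bytes.fromhex(hash_file(os.path.join(folder, name)))
            for name in files
        ]
        for sub in subfolders:
            child = os.path.join(folder, sub)
            # Symlinked or unreadable folders are not walked, so they are skipped.
            if child in digests:
                parts.append(b"D" + bytes.fromhex(digests[child]))
        combined = hashlib.sha256()
        # Sorting the child digests makes the result independent of file and
        # folder names, so renamed copies still compare equal.
        for part in sorted(parts):
            combined.update(part)
        digests[folder] = combined.hexdigest()
    return digests
```

With the structure above, file_1_terabyte would be read once, and the hashes of X/Y/Z, X/Y, X and my-app would each be derived from 32-byte digests instead of file contents.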
Performance for hashes (SHA-256 and others)
Re: Performance for hashes (SHA-256 and others)
Thank you for your feedback tirael,
> Are there any plans to introduce the ability to set the number of threads for individual folders/disks that will be used to index properties?
Yes.
Everything 1.5 -> Tools -> Options -> NTFS/Folders -> Volume/Folder -> Right click -> Advanced -> Threads -> Multiple threads.
Set the maximum number of indexing threads with Tools -> Options -> Advanced -> index_max_threads
Set the maximum number of property request threads with Tools -> Options -> Advanced -> content_max_threads
Everything will automatically use multiple threads for SSDs.
> Will it be possible to add a new property for folders: calculate a hash based on existing (previously calculated) hashes of all files inside the folder?
Everything 1.5 has folder name and folder data hashes.
Please try one of the following properties:
Folder Data and Names SHA-256
Folder Data SHA-256
Folder Names SHA-256
Hashes will match 7zip.
Thank you for the suggestions.
Re: Performance for hashes (SHA-256 and others)
Thank you for your reply!
> Everything 1.5 -> Tools -> Options -> NTFS/Folders -> Volume/Folder -> Right click -> Advanced -> Threads -> Multiple threads.
I know about this setting; I was asking about extending it.
Will it be possible to set a different number of threads for each folder? For example, for \\server\share1 - 2 threads, for \\server\share2 - 8 threads, for \\super-server\supershare - 20 threads?
> Folder Data and Names SHA-256
> Folder Data SHA-256
> Folder Names SHA-256
Of these properties, only "Folder Data SHA-256" is relevant for comparison strictly by content. But, as I mentioned above, every file will be read as many times as there are folder nesting levels above it.
For example, if a 1 terabyte file sits at the 20th nesting level, it will be read 20 (twenty) times in total, i.e. 20 terabytes of reads, to calculate the hashes for all folders. This is far from optimal and makes the property impractical for large volumes (more than hundreds of gigabytes).
Will it be possible to change the algorithm for calculating folder hashes so that the same file is not read repeatedly? Or to add a new property that is calculated with the new algorithm?
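To put numbers on this, here is a small Python estimate. It assumes the behavior as I described it (every folder hash re-reads all files beneath that folder); `bytes_read` is just an illustrative helper, not anything from Everything.

```python
from pathlib import PurePosixPath

TB = 10**12  # one terabyte, decimal


def bytes_read(files, reread_per_folder=True):
    """Estimate total bytes read from disk to hash every folder.

    files: iterable of (path, size_in_bytes), where each path starts at the
           indexed root folder, e.g. "root/a/b/file.bin".
    reread_per_folder: True  -> each ancestor folder re-reads the file
                       False -> each file is read once and its hash reused
    """
    total = 0
    for rel_path, size in files:
        # Number of folders above the file, i.e. how many folder hashes need it.
        depth = len(PurePosixPath(rel_path).parts) - 1
        total += size * (depth if reread_per_folder else 1)
    return total


# One 1 TB file under 20 nested folders, as in the example above.
deep_file = "/".join(f"d{i}" for i in range(1, 21)) + "/big.bin"
print(bytes_read([(deep_file, TB)]) // TB, "TB with re-reading")
print(bytes_read([(deep_file, TB)], reread_per_folder=False) // TB, "TB with reuse")
```

For this case the estimate is 20 TB with re-reading and 1 TB when each file hash is computed once and reused.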
Re: Performance for hashes (SHA-256 and others)
> Will it be possible to set a different number of threads for each folder?
Currently, no.
I will consider an option to set the number of threads per folder.
> Will it be possible to change the algorithm for calculating folder hashes so as not to make meaningless multiple reads of the same file?
Currently, no.
I will look into caching the sha256 value for each file.
For now, gather the information on the root folder only.
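For readers wondering what caching the SHA-256 value per file could look like, here is a minimal Python sketch. It is only an illustration under stated assumptions, not Everything's implementation: `FileHashCache` is a hypothetical class, and treating an unchanged size and modification time as "file unchanged" is an assumption that can miss edits that preserve both.

```python
import hashlib
import os


class FileHashCache:
    """Remember each file's SHA-256 and reuse it while size and mtime are unchanged."""

    def __init__(self):
        self._entries = {}  # path -> (size, mtime_ns, hex digest)
        self.reads = 0      # how many times file contents were actually read

    def sha256(self, path, chunk_size=1 << 20):
        info = os.stat(path)
        key = (info.st_size, info.st_mtime_ns)
        cached = self._entries.get(path)
        if cached is not None and cached[:2] == key:
            # Cache hit: no file contents are read.
            return cached[2]
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        self.reads += 1
        self._entries[path] = (key[0], key[1], digest.hexdigest())
        return self._entries[path][2]
```

With such a cache, any number of folder hashes that need the same file would trigger one read of its contents, until the file's size or modification time changes.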