MD5 Property indexing for network drives seems to die

Discussion related to "Everything" 1.5 Alpha.
EricB
Posts: 53
Joined: Wed Jun 26, 2013 8:56 am

MD5 Property indexing for network drives seems to die

Post by EricB »

Hi,

I'm using Everything 1.5.0.1361a (x64) on Windows 10 22H2. I have several network drives attached from a Synology NAS, holding lots of files. I know there are quite a few duplicates there, so I defined an MD5 property so that the value is stored in the index permanently (calculating MD5 on demand is quite slow). I'd like to use the MD5 property to find duplicates, since files with the same content may have different names, and file size alone is not distinctive enough.
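For readers who want to see the duplicate-finding idea outside Everything, here is a minimal Python sketch. It is not how Everything computes or stores its MD5 property; the function names `md5_of_file` and `find_duplicates` are made up for illustration. It groups files by size first and only hashes files that share a size, which reflects the point above that size alone is not distinctive enough but is a cheap first filter.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by (size, MD5) and return groups with 2+ files."""
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing
        for path in paths:
            groups[(size, md5_of_file(path))].append(path)
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

On a slow network share the hashing step dominates, which is why keeping the digest in a persistent index is attractive compared with recomputing it on demand.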

The problem with this setup is that a large part of the files on those NAS network drives never get the MD5 property during indexing, and I cannot figure out why. It looks like property indexing halts somewhere. What would be the best way to find out what is happening here?

Regards, EricB
void
Developer
Posts: 16777
Joined: Fri Oct 16, 2009 11:31 pm

Re: MD5 Property indexing for network drives seems to die

Post by void »

Progress is shown in the status bar.
Hover over the progress bar to show the current file being indexed.



Errors are reported in the Debug Logs.

Please try enabling debug logging and refreshing files with missing md5 properties:
  • In Everything, from the Tools menu, under the Debug submenu, check Start Debug Logging.
  • From the Tools menu, under the Debug submenu, check Verbose.
  • Type in the following search:
    !md5:
  • Select all files (Ctrl + A)
  • Press Ctrl + F5.
  • Wait for Everything to stop indexing md5 values.
  • Check your %TEMP%\Everything Debug Log.txt for any errors.
  • Look for the following lines:
    get property <property-id> <filename>
    CreateFileW(): <error-code>: Failed to open file <filename>
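Scanning a large debug log by eye is tedious, so the last step can be sketched in Python. This is only an illustration: the regular expression is based on the line shape quoted above, and the assumptions that real log lines may carry a timestamp prefix and that the error code is a decimal number are mine. The function name `summarize_open_failures` is hypothetical.

```python
import re
from collections import Counter

# Pattern based on the line shape quoted above. The exact layout of real log
# lines (timestamp prefix, decimal error codes) is an assumption.
CREATEFILE_FAILURE = re.compile(
    r"CreateFileW\(\): (?P<code>\d+): Failed to open file (?P<filename>.+)$"
)


def summarize_open_failures(lines):
    """Count CreateFileW failures per error code and collect the affected files."""
    counts = Counter()
    files_by_code = {}
    for line in lines:
        match = CREATEFILE_FAILURE.search(line.rstrip("\r\n"))
        if match is None:
            continue
        code = int(match.group("code"))
        counts[code] += 1
        files_by_code.setdefault(code, []).append(match.group("filename"))
    return counts, files_by_code
```

To run it against a real log, pass an open file object as `lines`. The encoding of the log file is not stated in this thread, so opening it with `errors="replace"` is a cautious choice.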
EricB
Posts: 53
Joined: Wed Jun 26, 2013 8:56 am

Re: MD5 Property indexing for network drives seems to die

Post by EricB »

Hello void,

Thanks for the prompt response. I've done as you described, and again saw indexing stopping intermittently.

Some observations:
  • A lot of Windows system files seem to trigger the CreateFileW error. I excluded drive c: from the !md5 query for now. Not sure if that is one of the reasons for the halting.
  • When not confined to a single network drive, indexing seems to go all over the place, seemingly picking random files for property calculation. This is seen when hovering over the index progress bar. Probably due to the position in the index?
  • Whenever the machine locks itself and is left for a while, indexing also seems to halt, even though the machine does not seem to be in sleep mode.
I've now confined the !md5 search to a single drive, containing about 37K files without the property. It is now slowly munching through those.

Will give an update later on.

Regards, EricB

Update: even when excluding the c: drive from the !md5 query, I see several CreateFileW errors on c: appearing in the log. Does this mean that the indexing has a "backlog" that it wants to keep up with? Even when it broke off?

Update2: when confining to the single drive, I do not see those c: entries anymore. I excluded c: before by "!md5 !c:" but maybe the ! operator is not working properly in this case?
EricB
Posts: 53
Joined: Wed Jun 26, 2013 8:56 am

Re: MD5 Property indexing for network drives seems to die

Post by EricB »

I also found out on another machine that certain entries on a Google Drive volume are not MD5 property indexed:
The extensions are gdoc, gslides, gsheet, gjam, gform and lnk. I guess these are treated like hardlinks here and therefore skipped?
therube
Posts: 4985
Joined: Thu Sep 03, 2009 6:48 pm

Re: MD5 Property indexing for network drives seems to die

Post by therube »

I excluded c: before by "!md5 !c:"
If you want to exclude (from Index rather than display), you do that in:
Tools | Options | Indexes | Properties -> Exclude
.
lots of files
How many is lots?
EricB
Posts: 53
Joined: Wed Jun 26, 2013 8:56 am

Re: MD5 Property indexing for network drives seems to die

Post by EricB »

therube wrote: Mon Dec 04, 2023 3:39 pm If you want to exclude (from Index rather than display)
Well, I wouldn't exclude c: from indexing, but exclude it from the list of files not having an md5 yet. Those are selected and triggered into property indexing by ctrl-F5.
therube wrote: Mon Dec 04, 2023 3:39 pm How many is lots?
About 320K files, which deduplication might reduce quite a bit.
therube
Posts: 4985
Joined: Thu Sep 03, 2009 6:48 pm

Re: MD5 Property indexing for network drives seems to die

Post by therube »

I wouldn't exclude c: from indexing
Correct.
But as you do want to exclude the Property Indexing (of MD5) for C:, you would do that as above.
EricB
Posts: 53
Joined: Wed Jun 26, 2013 8:56 am

Re: MD5 Property indexing for network drives seems to die

Post by EricB »

Another observation:
After the select all and ctrl-F5, the indexing for the single network drive was busy. From the corner of my eye (this machine is next to me on the desk while I work on another one) I suddenly saw the GUI refresh and drop the selection, and the indexing stopped. So could it be that the GUI is interrupting here?

I saw in the log that c: indexing kicked in at the very same time, so following therube's suggestion I have now excluded c: from property indexing. However, this triggered a full property indexing of all network drives. 279K files to go....
void
Developer
Posts: 16777
Joined: Fri Oct 16, 2009 11:31 pm

Re: MD5 Property indexing for network drives seems to die

Post by void »

I recommend using sidecar files to store md5 values.


A lot of files in Windows system seem to render the CreateFileW error.
This is normal.
Everything will only access files with standard user rights to calculate the md5.
Everything will not be able to access system files.


When not confined to a single network drive, indexing seems to go all over the place, seemingly picking random files for property calc. This is seen when hovering over the index progress bar. Probably due to the position in the index?
Everything should be gathering properties in path order.
Have you enabled Tools -> Options -> Advanced -> default_multithreaded? -Please make sure this is set to: (Use default)
Have you enabled multiple threads for your network index under Tools -> Options -> Folders -> Right click Network Drive -> Advanced -> Threads -Please make sure this is set to: (Use default)


Whenever the machine locks itself, and being left a bit longer, also indexing seems to halt. Still it does not seem to be in sleep mode.
I'm checking things my end..



Could you please send your Help -> Troubleshooting information.
EricB
Posts: 53
Joined: Wed Jun 26, 2013 8:56 am

Re: MD5 Property indexing for network drives seems to die

Post by EricB »

Hi void,
void wrote: Tue Dec 05, 2023 3:30 am I recommend using sidecar files to store md5 values.
I understand that sidecar files would lessen the load on Everything itself, but the beauty of internal checksums, once generated, is that they are maintained with every file operation, be it copy, move or delete. Fewer issues with lingering remains, so to say.
void wrote: Tue Dec 05, 2023 3:30 am Everything should be gathering properties in path order.
Have you enabled Tools -> Options -> Advanced -> default_multithreaded? -Please make sure this is set to: (Use default)
Have you enabled multiple threads for your network index under Tools -> Options -> Folders -> Right click Network Drive -> Advanced -> Threads -Please make sure this is set to: (Use default)
I've checked the multithreaded settings, and all are set to default. I was thinking that setting a higher priority might help, but didn't want to tinker.
void wrote: Tue Dec 05, 2023 3:30 am Could you please send your Help -> Troubleshooting information.
Gladly. What is the preferred method? The contact address that is listed on voidtools.com?

Regards, EricB
void
Developer
Posts: 16777
Joined: Fri Oct 16, 2009 11:31 pm

Re: MD5 Property indexing for network drives seems to die

Post by void »

support@voidtools.com
-or-
anonymously with Bug Report (please paste the information in details)

Thank you.
EricB
Posts: 53
Joined: Wed Jun 26, 2013 8:56 am

Re: MD5 Property indexing for network drives seems to die

Post by EricB »

Sent it, might end up in Junk, since I attached the info as a text file.
void
Developer
Posts: 16777
Joined: Fri Oct 16, 2009 11:31 pm

Re: MD5 Property indexing for network drives seems to die

Post by void »

Got it, thanks.

I am currently looking into the issue.



The first thing to catch my eye was "exclude_recall_on_data_access":1 (Tools -> Options -> Properties -> MD5 -> Exclude recall on data access)

I wonder if these network files have the M attribute set?
Please check if M is shown in the attributes column for these files.
EricB
Posts: 53
Joined: Wed Jun 26, 2013 8:56 am

Re: MD5 Property indexing for network drives seems to die

Post by EricB »

void wrote: Tue Dec 05, 2023 9:29 am
The first thing to catch my eye was "exclude_recall_on_data_access":1 (Tools -> Options -> Properties -> MD5 -> Exclude recall on data access)

I wonder if these network files have the M attribute set?
Please check if M is shown in the attributes column for these files.
Checked that, but no. I just enabled this to make really sure that no cloud file would be downloaded on access. Not that it is really necessary, I always keep the files available locally. So overdoing it a bit here....

file: !md5 distinct:attributes comes up with A, HA, HS, HSA, RA attributes only for files on network drives. And it's mostly A.
therube
Posts: 4985
Joined: Thu Sep 03, 2009 6:48 pm

Re: MD5 Property indexing for network drives seems to die

Post by therube »

I wonder if these network files have the M attribute set?
So Attribute 'M' is "OneDrive" specific?
EricB
Posts: 53
Joined: Wed Jun 26, 2013 8:56 am

Re: MD5 Property indexing for network drives seems to die

Post by EricB »

EricB wrote: Tue Dec 05, 2023 8:18 am I understand that sidecar files would lessen the load on Everything itself, but the beauty of internal checksums, once generated, is that they are maintained with every file operation, be it copy, move or delete. Fewer issues with lingering remains, so to say.
At least, this is what I thought. But I moved a bunch of duplicate files to a new folder within the same drive using the Advanced move within Everything, and I see that the MD5 property for those files is emptied and recalculated.

@void can you confirm this is as designed? I'd reckon that when a file is moved, the index entry is updated, but that such a property would remain as is instead of being cleared.
void
Developer
Posts: 16777
Joined: Fri Oct 16, 2009 11:31 pm

Re: MD5 Property indexing for network drives seems to die

Post by void »

@void can you confirm this is as designed?
Yes, unfortunately.

Everything sees the move as removed + added.
The md5 property value is removed and then re-gathered.

This is just a limitation with ReadDirectoryChanges.

If you rename the parent folder or rename the file then the md5 property is not cleared/regathered.
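The difference between the two cases can be pictured with a toy model in Python. It is not Everything's internal design; the class `ToyPropertyIndex`, its method names, and the sample paths are hypothetical. It only illustrates the behaviour described above: a move arrives as removed + added, so the cached value is dropped, while a rename arrives as an old-name/new-name pair, so the value can be carried over.

```python
class ToyPropertyIndex:
    """Hypothetical path -> md5 cache driven by change notifications."""

    def __init__(self):
        self.md5_by_path = {}

    def on_added(self, path):
        # A new entry has no known content yet, so the property starts empty
        # and would have to be gathered again.
        self.md5_by_path[path] = None

    def on_removed(self, path):
        # The entry and its cached property value are dropped together.
        self.md5_by_path.pop(path, None)

    def on_renamed(self, old_path, new_path):
        # Old and new names arrive as a pair, so the value can be carried over.
        self.md5_by_path[new_path] = self.md5_by_path.pop(old_path, None)


index = ToyPropertyIndex()
index.md5_by_path["N:\\photos\\a.jpg"] = "d41d8cd98f00b204e9800998ecf8427e"

# Rename in place: reported as an old-name/new-name pair, value is kept.
index.on_renamed("N:\\photos\\a.jpg", "N:\\photos\\b.jpg")
kept = index.md5_by_path["N:\\photos\\b.jpg"]

# Move to another folder: reported as removed + added, value is lost.
index.on_removed("N:\\photos\\b.jpg")
index.on_added("N:\\dupes\\b.jpg")
lost = index.md5_by_path["N:\\dupes\\b.jpg"]
```

In this model nothing links the removed path to the added one, so there is no safe way to reuse the old digest without extra bookkeeping.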



I have been testing your settings my end and haven't run into the issue yet.
Locking a work station appears fine.

The only way I was able to simulate the issue is when I pulled out the network cable to the PC.
Everything will silently fail when it tries to read the remaining md5 values.
The md5 values appear empty in Everything.

Maybe the network is disconnecting for you?
Could you please check your event viewer and see if there is a system event for a network disconnection.



If md5 values are missing I recommend the following:
Search for:
!md5:
Select all files and press Ctrl + F5.



I have on my TODO list to re-request missing md5 property values when the device comes back online.

I also have on my TODO list to keep retrying for a single property value for up to one minute.
Currently, Everything will abandon all remaining property requests for a volume if it goes offline.
-If your network is down for 1 second, all remaining property requests will fail for that volume.
Hopefully this change will only lead to one md5 value missing instead of all remaining md5 values..
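The retry idea from the TODO list can be sketched in Python as follows. This is not Everything's implementation: the function name `read_with_retry`, the one-second delay between attempts, and the use of `OSError` as the failure signal are assumptions made for the example. The `clock` and `sleep` parameters exist only so the timing can be tested without waiting.

```python
import time


def read_with_retry(read_property, timeout_s=60.0, delay_s=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call read_property() until it succeeds or timeout_s has elapsed.

    Returns the value on success, or None if every attempt raised OSError
    within the time budget (that one property is then left missing).
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return read_property()
        except OSError:
            if clock() + delay_s > deadline:
                return None  # give up on this one value only
            sleep(delay_s)
```

With a per-value budget like this, a one-second network drop costs at most one retry, and a longer outage costs one value per expired budget rather than every pending request for the volume.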
EricB
Posts: 53
Joined: Wed Jun 26, 2013 8:56 am

Re: MD5 Property indexing for network drives seems to die

Post by EricB »

void wrote: Wed Dec 06, 2023 4:51 am
Everything sees the move as removed + added.
The md5 property value is removed and then re-gathered.

This is just a limitation with ReadDirectoryChanges.
Too bad, but I understand the reason.
void wrote: Wed Dec 06, 2023 4:51 am I have been testing your settings my end and haven't run into the issue yet.
Locking a work station appears fine.

The only way I was able to simulate the issue is when I pulled out the network cable to the PC.
My machine is a laptop connected by Wi-Fi. Although it is plugged in, it locks after some 15 minutes of idle time.
void wrote: Wed Dec 06, 2023 4:51 am Maybe the network is disconnecting for you?
Could you please check your event viewer and see if there is a system event for a network disconnection.
I checked, and I see Kernel power events for the system entering the Connected standby due to idle time. [Lock screen]
After that some network events "7026 - Dump after return from D3 before/after cmd".
Only when the timeout is longer I see nhi reporting this: "The driver entered RTD3. All the connected devices will be removed from driver's internal state, so it is expected that DeviceDisconnected events will happen."
Not sure if that is a real disconnect, since the network drives are certainly not offline and readily available when unlocking the laptop.

It seems however that the disruption is not only happening during Lock screen idle time. I saw this happening a few times:
EricB wrote: Mon Dec 04, 2023 4:11 pm Another observation:
After the select all and ctrl-F5, the indexing for the single network drive was busy. From the corner of my eye (this machine is next to me on the desk while I work on another one) I suddenly saw the GUI refresh and drop the selection, and the indexing stopped. So could it be that the GUI is interrupting here?
The current state of affairs is that I have to check regularly whether the indexing has stopped. If so, I have to Ctrl-A and Ctrl-F5 again. It takes some time, but in the end I'm getting there. And the MD5 property is working very well for finding duplicates: I've already identified multiple gigabytes of them.
void wrote: Wed Dec 06, 2023 4:51 am I have on my TODO list to re-request missing md5 property values when the device comes back online.

I also have on my TODO list to keep retrying for a single property value for up to one minute.
Currently, Everything will abandon all remaining property requests for a volume if it goes offline.
-If your network is down for 1 second, all remaining property requests will fail for that volume.
Hopefully this change will only lead to one md5 value missing instead of all remaining md5 values..
Even if these additions would not help my case, they seem very sensible for other cases.

Regards, EricB
void
Developer
Posts: 16777
Joined: Fri Oct 16, 2009 11:31 pm

Re: MD5 Property indexing for network drives seems to die

Post by void »

Current state of affairs is now that I just have to check regularly if the indexing has stopped.
Could you please run Everything in verbose debug mode:
  • In Everything, from the Tools menu, under the Debug submenu, check Verbose.
  • From the Tools menu, under the Debug submenu, check Start Debug Logging.
    ---wait for Everything to stop property indexing---
  • In Everything, from the Tools menu, under the Debug submenu, click Stop Debug Logging.
    ---this will open your %TEMP%\Everything Debug Log.txt in Notepad.
  • Look for the following lines:
    CreateFileW(): <error-code>: Failed to open file <filename>
  • There should be a lot of these messages.
  • What is the error code?
These logs might help catch what is happening and confirm my assumptions (network connection lost and all pending property requests are aborted)
I'm working on adding a retry when gathering properties and the volume goes offline.
EricB
Posts: 53
Joined: Wed Jun 26, 2013 8:56 am

Re: MD5 Property indexing for network drives seems to die

Post by EricB »

Hello void,

Today I followed the procedure as described. It was happily MD5-ing when I left the machine for 10 minutes, and after unlocking it I saw that the indexing had stopped.

Since the debug file has become quite large (even zipped it is 12 MB), and I don't want to strip it, could I send it to you via Dropbox or WeTransfer?

Regards, EricB
void
Developer
Posts: 16777
Joined: Fri Oct 16, 2009 11:31 pm

Re: MD5 Property indexing for network drives seems to die

Post by void »

Yes, please.

Please send the link to support@voidtools.com
void
Developer
Posts: 16777
Joined: Fri Oct 16, 2009 11:31 pm

Re: MD5 Property indexing for network drives seems to die

Post by void »

Thank you for the debug log.

2023-12-09 14:08:52.607: GetOverlappedResult M: <share-name> 64
(there's a lot of these, one for each share, same error code)

Error 64 is ERROR_NETNAME_DELETED

The specified network name is no longer available.
All pending property requests are aborted.

I am working on a solution..
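To pull the same information out of a large debug log, a small Python sketch is shown below. The line shape is inferred from the single quoted log line, so reading `M:` as a drive letter followed by the share name and a decimal error code is an assumption. The function name `failed_volumes` is hypothetical. The error names in the table are standard Win32 system error codes.

```python
import re

# A few Win32 system error codes and their names from winerror.h.
WIN32_ERROR_NAMES = {
    5: "ERROR_ACCESS_DENIED",
    32: "ERROR_SHARING_VIOLATION",
    53: "ERROR_BAD_NETPATH",
    64: "ERROR_NETNAME_DELETED",
}

# Assumed shape, taken from the single quoted log line:
# "<timestamp>: GetOverlappedResult <drive>: <share-name> <error-code>"
OVERLAPPED_FAILURE = re.compile(
    r"GetOverlappedResult (?P<drive>[A-Za-z]:) (?P<share>.+) (?P<code>\d+)$"
)


def failed_volumes(lines):
    """Map each drive letter to the name of the last error logged for it."""
    result = {}
    for line in lines:
        match = OVERLAPPED_FAILURE.search(line.rstrip("\r\n"))
        if match is None:
            continue
        code = int(match.group("code"))
        result[match.group("drive")] = WIN32_ERROR_NAMES.get(
            code, f"unknown ({code})"
        )
    return result
```

If every mapped share reports the same code at the same timestamp, as in the log above, that points to one connection-level event rather than a problem with individual files.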
void
Developer
Posts: 16777
Joined: Fri Oct 16, 2009 11:31 pm

Re: MD5 Property indexing for network drives seems to die

Post by void »

Everything 1.5.0.1362a will now keep trying to read properties on offline volumes.

Everything should now only miss gathering a few properties when your network drops (instead of all remaining pending property requests on the offline volume)
EricB
Posts: 53
Joined: Wed Jun 26, 2013 8:56 am

Re: MD5 Property indexing for network drives seems to die

Post by EricB »

Thank you, David!

I'm going to test this in the coming days. Is Ctrl-F5 on the !MD5: search selection still the best way to continue indexing the properties in this case? Or should I just re-index the network drive?

Just out of curiosity, did you implement some mechanism to mark a network drive "dirty" if it loses connection, so indexing can be picked up again when the connection is restored?

Regards, EricB
void
Developer
Posts: 16777
Joined: Fri Oct 16, 2009 11:31 pm

Re: MD5 Property indexing for network drives seems to die

Post by void »

I'm going to test this in the coming days. Is Ctrl-F5 on the !MD5: search selection still the best way to continue indexing the properties in this case?
Yes.


Just out of curiosity, did you implement some mechanism to mark a network drive "dirty" if it loses connection, so indexing can be picked up again when the connection is restored?
No, not yet.
It's on my TODO list.

I did experiment doing this last week.
Unfortunately, determining whether a property value was gathered successfully is rather expensive in CPU and disk I/O.
EricB
Posts: 53
Joined: Wed Jun 26, 2013 8:56 am

Re: MD5 Property indexing for network drives seems to die

Post by EricB »

Hi void,

I did some experimenting with MD5-ing on network drives with large numbers of files. I see an improvement in that the network disconnect does not seem to happen anymore. However, I got a GUI hang twice, which unfortunately also undid all the MD5 progress, since the database wasn't flushed. The funny thing was that the console was still running stuff, but the GUI did not recover.

I've retrieved the Windows crash reports for both events. Also I've got the debug logs for both runs. Do you want me to send them for analysis?

Regards, EricB
void
Developer
Posts: 16777
Joined: Fri Oct 16, 2009 11:31 pm

Re: MD5 Property indexing for network drives seems to die

Post by void »
