Help with Property Indexing

Discussion related to "Everything" 1.5 Alpha.
sebbiep
Posts: 3
Joined: Fri Jun 09, 2023 5:31 pm

Help with Property Indexing

Post by sebbiep »

I'm using Everything to index mostly media files in my house - it's been a game changer. I perform regular extensive file operations and the first thing I use in any process is Everything. I'm indexing a couple of 6TB ingress arrays on local PCs (NTFS index - instant), and 2 (production) of my 4 large (>100TiB) network arrays (Synology NAS / Windows) with total 405TB RAW / 344TiB formatted (about 51% full). Daily rescans of the 180TiB of network files currently in use (with file size, folder size, date modified, fast size sort & fast date modified sort) take just minutes and are very efficient.

But I've got very bogged down for the past few months with property indexing. I'm trying to index 3 properties below (with the reason why):

Length - to find duplicate videos including with different formats and resolutions.
Width - to find and delete corrupt images - which usually don't have valid dimensions.
Shortcut Target - to find and correct / delete invalid shortcuts that have evolved over my past 20 years of storage changes.
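To illustrate the Length idea, here is a minimal sketch of grouping videos by duration so that copies in different formats or resolutions land in the same group. It assumes the (path, length in seconds) pairs have already been exported from somewhere; the function name, the file names and the one-second tolerance are illustrative assumptions, not anything Everything provides.

```python
from collections import defaultdict


def group_by_length(videos, tolerance_s=1.0):
    """Group (path, length_seconds) pairs that fall into the same duration bucket.

    Returns only the buckets holding more than one file, i.e. duplicate candidates.
    """
    buckets = defaultdict(list)
    for path, length in videos:
        # Bucket key: the length rounded to the nearest multiple of tolerance_s.
        buckets[round(length / tolerance_s)].append(path)
    return [sorted(paths) for paths in buckets.values() if len(paths) > 1]


# Hypothetical sample data: two encodes of one film plus an unrelated clip.
sample = [("a.mkv", 5400.2), ("a_720p.mp4", 5400.4), ("b.avi", 1320.0)]
print(group_by_length(sample))
```

Two lengths that sit either side of a bucket boundary will be split, so matching lengths are only candidates and still need a manual check.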

The numbers for each are:
Shortcuts - 25,000
Images - 26,000,000
Videos - 300,000

Indexing the length and shortcut properties is pretty quick, but the image width is my issue. Everything is indexing around 2.5 million images per day, so depending on NAS workload - around 15 days to complete. However, it has only ever completed once. To complicate matters, over the past few months, I've extensively upgraded / replaced all my NAS hardware and changed the network shares etc. There appear to be a number of circumstances where the property indexing restarts from 0% again, hence I've been trying to get this "up to date / stable" for maybe 4 to 5 months.
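As a rough check on these figures, the arithmetic can be sketched as below. It uses only the numbers quoted in this post; the function name is made up for illustration.

```python
def days_to_index(file_count, files_per_day):
    """Days needed to gather one property for file_count files at a steady rate."""
    return file_count / files_per_day


images = 26_000_000

# At the full quoted rate of 2.5 million images per day:
print(days_to_index(images, 2_500_000))

# Average rate implied by a 15-day run:
print(round(images / 15))
```

At the full rate the pass would take about 10.4 days, so a 15-day run corresponds to an average of roughly 1.7 million images per day once NAS workload is factored in.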

The main PC that I'm running this on has a fast high-core-count Intel Xeon with 256GB of ECC RAM and NVMe drives. Everything is using around 7.5GB of RAM as it gets close to completing the indexing, so a small percentage of the 256GB available.

I have R/W NVMe caches on all 4 network Synology NASes, which I hoped would speed things up a little. But especially on the larger NAS (with 25 million+ files) the property indexing was just thrashing the cache and overflowing it many times over - so no real advantage other than wearing the SSDs out. The NVMe cache read IOPS due to property indexing were not very different from the combined IOPS of the 18 x HDDs in the capacity spinning rust array. I've now disabled this cache (to spare the SSDs), until this round of indexing completes.

I'd expect that when I exit Everything, it would save the entire index database to disk (currently around 1.6GB on disk) and would restart where it left off on reboot etc. This does seem to be happening now - but in the past few months there have been many restarts from 0% - for a variety of reasons.

Questions (to help me understand how property indexing is working):

1. Is there a way to ensure that the property indexing progress doesn't get lost and restart from 0%?
2. Why is the Everything RAM usage of circa 7 to 8GB so different to the database saved on disk of around 1.6GB?
3. What should I avoid doing in Everything etc to prevent starting property indexing from 0%? Obvious things seem to be - don't change anything at all - including any of the indexing options even those unrelated to properties.
4. Any better suggestions to achieve my goals with the 3 properties (as explained above i.e. duplicate videos, corrupt images and invalid shortcuts)?
5. Am I wasting my time and just thrashing the disks? File / Folder indexing gives me 99% of what I want in minutes. This last "nice to have" property indexing is taking weeks - and in fact months due to restarts.
void
Developer
Posts: 16774
Joined: Fri Oct 16, 2009 11:31 pm

Re: Help with Property Indexing

Post by void »

1. Is there a way to ensure that the property indexing progress doesn't get lost and restart from 0%?
The database structure may change during 1.5 development.
A database rebuild will be required if the database structure changes.
A warning will be shown in bold in the change log for these updates.


2. Why is the Everything RAM usage of circa 7 to 8GB so different to the database saved on disk of around 1.6GB?
The Everything database is compressed when saving to disk.


3. What should I avoid doing in Everything etc to prevent starting property indexing from 0%? Obvious things seem to be - don't change anything at all - including any of the indexing options even those unrelated to properties.
Changing property indexing settings can cause Everything to rescan all your properties.
Everything will try to keep your existing property values.


4. Any better suggestions to achieve my goals with the 3 properties (as explained above i.e. duplicate videos, corrupt images and invalid shortcuts)?
Some ideas..

Reduce the number of files indexed.

Instead of indexing these properties, gather them as needed.
Work on small subsets at a time.
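As one way to picture "gather as needed on a small subset", here is a minimal sketch that reads only the first 24 bytes of each PNG in a single folder and reports files whose header has no valid dimensions. This is a standalone script and not how Everything gathers properties; the function names are assumptions, and it covers PNG only. A file with a valid header but damaged pixel data would still pass.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes):
    """Return (width, height) from the first 24 bytes of a PNG, or None if invalid."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    # The IHDR chunk data starts with width and height as big-endian 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    if width == 0 or height == 0:
        return None
    return width, height


def find_invalid_pngs(folder):
    """Yield paths of .png files under folder whose header has no valid dimensions."""
    for path in Path(folder).rglob("*.png"):
        try:
            with open(path, "rb") as f:
                header = f.read(24)
        except OSError:
            # Unreadable files are reported too, since they need attention anyway.
            yield path
            continue
        if png_dimensions(header) is None:
            yield path
```

Running `list(find_invalid_pngs(some_folder))` on one subfolder at a time keeps the read load on the NAS small and bounded.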

If the data is stored on NVMe drives, please try enabling multi-threading:
  • In Everything, from the Tools menu, click Options.
  • Click the NTFS tab on the left.
  • For each NTFS volume on an NVMe drive:
    • Right click the Drive.
    • Under the Advanced menu, under the Threads submenu, check Multiple threads.
  • Click OK.
Using Everything properties should be faster than the Windows Property System.
Everything will use the Windows Property System by default.
Please try disabling the Windows Property System:
  • In Everything 1.5, from the Tools menu, click Options.
  • Click the Advanced tab on the left.
  • To the right of Show settings containing, search for:
    system
  • Select property_system.
  • Set the value to: false
  • Click OK.
Please consider Custom values for properties

Basically, create a file list of the filenames and property values once.
Everything will use these values instead of gathering them from disk.
This might also struggle with 20,000,000+ files.
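A minimal sketch of building such a list as CSV text follows. The column headers ("Filename", "Width"), the function name and the sample path are assumptions for illustration; the exact format Everything expects should be taken from the Custom values for properties documentation.

```python
import csv
import io


def write_file_list(rows, out):
    """Write (filename, width) pairs as CSV; a width of None becomes an empty cell."""
    writer = csv.writer(out, lineterminator="\r\n")
    # Assumed header names; check the Everything documentation for the real ones.
    writer.writerow(["Filename", "Width"])
    for filename, width in rows:
        writer.writerow([filename, "" if width is None else width])


# Hypothetical example paths and values.
buf = io.StringIO()
write_file_list([(r"\\nas\photos\a.png", 640), (r"\\nas\photos\b.png", None)], buf)
print(buf.getvalue())
```

Generating the list once, folder by folder, means each image is read a single time instead of being rescanned whenever the index is rebuilt.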

A Property Server is planned for a future version of Everything.
sebbiep
Posts: 3
Joined: Fri Jun 09, 2023 5:31 pm

Re: Help with Property Indexing

Post by sebbiep »

Thank you very much for the fast, comprehensive and very useful reply.

The property indexing has completed today after around 12+ days. The shortcuts and video lengths were all completed (apart from around 6,000 very old mpg videos which probably need converting). Around 500,000 images (of 26m) did not get their width indexed, but copying those folders in & out seems to resolve this.

Whilst I was doing this I could see the impact of the property indexing on the large folders that I was moving. Although these large network shares are basically media archives, they are quite dynamic - i.e. I am adding and reorganising many 100,000s of files each week. Hence in light of your response and my experience, I think 24x7 property indexing on 26,000,000+ images is a bit too much. It is also probably dragging down performance on the large scale file transfers.

I have now found and deleted all the corrupt images. Hence I will follow your advice, disable the image property indexing, and in future re-check every few months with sub-sets of the data.

Thanks again.
sebbiep
Posts: 3
Joined: Fri Jun 09, 2023 5:31 pm

Re: Help with Property Indexing

Post by sebbiep »

Without the image property indexing (which has now done its job of finding corrupt images), the remaining property indexes only took around 1 hour from scratch, with minimal overhead thereafter.

More importantly, I've re-enabled the NVMe cache on my largest 250TiB NAS and file ops performance is now quite sprightly. I'm now getting close to the underlying raw network / disk array speeds, whereas before it was quite laggy and bottlenecking. Although I don't have direct control of what gets cached, my aim with the NVMe NAS caches was to mostly cache basic file metadata. When I was indexing the width property on 26m images, the 2TB cache on the large NAS was filling to 100% and then being overwritten multiple times - for no real benefit. Now it looks like the cache will be roughly correctly sized for the "small IO" workload of these media archives - which will be mostly metadata.

My initial hope was that once the initial heavy lifting of creating the property index (with fast sort) for the 26m images was complete, subsequent "maintenance" property indexing would be low impact. However because I add 100,000s of images each week, and often re-organise more than this between network devices, the image property indexing was causing too much overhead relative to the small IO size of the images. By contrast, indexing the length of the videos is a small task compared to the IO size of a video.

Anyway - all the file sources are nicely tuned now - thanks for your help and great software.