indexing algorithm

nssang · Post by **nssang** » Wed Nov 04, 2020 4:23 am

I am a student studying programming. I was amazed at the speed of indexing of everything.
Can you tell me an indexin algorithm or some tips or algorithms to study? I am curious about how you can quickly search through that much data.
Please advise.

Post by **void** » Wed Nov 04, 2020 8:26 am

Everything creates its database from the NTFS master file table.
The database is stored in memory and saved to disk when you exit Everything.
The database is restored from disk the next time you open Everything.
Everything uses USN Journaling to keep its database up to date.

The database is basically a list of all the file names (in UTF-8), size, date modified and pointers to parent folders.
Indexes are maintained for name, path, size and date modified.

Everything is written entirely in C.

Searches are compiled into byte code and executed.
Everything uses an optimized multi-threaded strstr search on every single filename in the database.

I wrote the Everything database specifically for filenames to be efficient as possible.

raccoon · Post by **raccoon** » Tue Nov 10, 2020 7:48 pm

I see that exported EFU "Everything File List" files are CSV "Comma Separated Value" files.

Can you tell me what the Everything.db "ESDb" file format is? Is this an "Event Stream Database" per https://github.com/customerio/esdb ? Or did you roll your own database?

Post by **NotNull** » Tue Nov 10, 2020 9:06 pm

Can'[t find the relevant thread at the moment, but it is "roll your own".

Post by **void** » Tue Nov 10, 2020 9:08 pm

Here is a little info on the Everything.db:

https://www.voidtools.com/support/everything/db/

Note: this is not up to date for Everything 1.4. However, the structure is still very similar.

rajhlinux · Post by **rajhlinux** » Sun Jul 03, 2022 3:36 am

void wrote: ↑Wed Nov 04, 2020 8:26 am Everything creates its database from the NTFS master file table.
The database is stored in memory and saved to disk when you exit Everything.
The database is restored from disk the next time you open Everything.
Everything uses USN Journaling to keep its database up to date.

The database is basically a list of all the file names (in UTF-8), size, date modified and pointers to parent folders.
Indexes are maintained for name, path, size and date modified.

Everything is written entirely in C.

Searches are compiled into byte code and executed.
Everything uses an optimized multi-threaded strstr search on every single filename in the database.

I wrote the Everything database specifically for filenames to be efficient as possible.

Interesting, I'm also amazed of the blazing fast performance of this search tool, whats even more cool about this tool is that it literally finds everything that I search for. I wonder how can I implement or make such search tool work for FreeBSD. Would be great if you can provide some details on how to do so. FreeBSD uses the ZFS filesystem.

Thanks.

raccoon · Post by **raccoon** » Sun Jul 03, 2022 4:00 am

rajhlinux wrote: ↑Sun Jul 03, 2022 3:36 am... FreeBSD ... ZFS

NTFS has a real actual file that contains the Master File Table (eg, C:\$MFT) that can be accessed within the live file system, without requiring special hardware access for raw device commands and reads. Everything just grabs the file and parses it directly in one gulp.

Not sure if that can be done with ZFS. Here is a relevant thread that will interest you. It's one of the top search results for "ZFS file table".

-> https://github.com/openzfs/zfs/issues/4739

Worthy of note, Everything is very slow scanning FAT and exFAT file systems.

iseki · Post by **iseki** » Fri Sep 01, 2023 1:07 pm

void wrote: ↑Wed Nov 04, 2020 8:26 am Searches are compiled into byte code and executed.

That means you create your own regexp engine, in C?

Post by **void** » Sat Sep 02, 2023 12:00 am

Basically, yes.

meteorquake · Post by **meteorquake** » Thu Sep 07, 2023 6:22 pm

Just need someone to do the equivalent thing for the registry - "Regything"?

voidtools forum

indexing algorithm

indexing algorithm

Re: indexing algorithm

Re: indexing algorithm

Re: indexing algorithm

Re: indexing algorithm

Re: indexing algorithm

Re: indexing algorithm

Re: indexing algorithm

Re: indexing algorithm

Re: indexing algorithm