Content indexing doesn't find some things

Discussion related to "Everything" 1.5 Alpha.
gregor
Posts: 27
Joined: Sun Jan 08, 2023 7:00 pm

Content indexing doesn't find some things

Post by gregor »

Hi,
1. when I type
ext:docx content:
no files are shown. Why? I have .docx files on the disk and I'm using the standard settings.

2. Also I have some .xlsx files in which certain things are found and some are not. Why is that?
horst.epp
Posts: 1448
Joined: Fri Apr 04, 2014 3:24 pm

Re: Content indexing doesn't find some things

Post by horst.epp »

ext:docx content:
works fine here and shows all documents for which content index is available

Maybe a system-wide DOCX iFilter is not available or is configured incorrectly.

On my system it's
WordpadFilter.dll Wordpad DOCX Filter WordPad Search Filters 10.0.22621.2215 (WinBuild.160101.0800) 10.0.22621.2215 Microsoft Corporation C:\Program Files\Windows NT\Accessories\WordpadFilter.dll {698A4FFC-63A3-4E70-8F00-376AD29363FB} {3037B4CD-A40B-401B-B676-2017EE8FAFF4} 07.05.2022 08:03:17

This info is provided with the NirSoft tool
https://www.nirsoft.net/utils/search_filter_view.html

______________________________________________________
Windows 11 Home x64 Version 22H2 (OS Build 22621.2361)
Everything 1.5.0.1357a (x64), Everything Toolbar 1.2
gregor
Posts: 27
Joined: Sun Jan 08, 2023 7:00 pm

Re: Content indexing doesn't find some things

Post by gregor »

I don't know what this iFilter is. Should I look for some DLL file on my disk? WordpadFilter.dll is not listed in SearchFilterView, but I do have it on the disk in several locations.
horst.epp
Posts: 1448
Joined: Fri Apr 04, 2014 3:24 pm

Re: Content indexing doesn't find some things

Post by horst.epp »

In the NirSoft tool, you can assign an extension to the existing iFilters.
Try one of the OpenOffice filters.
To my knowledge, the Microsoft one doesn't handle DOCX.
void
Developer
Posts: 16776
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content indexing doesn't find some things

Post by void »

Is Everything finding docx files correctly:

Do you have any search options enabled under the Search menu?
Do you have a search filter enabled under the Search menu?

Does the following search find any results:

ext:docx



Please try the following search:

ext:docx regex:content:^(.*)$ addcol:regmatch1

-Is anything shown in the regmatch1 column?



Please try forcing a rebuild:
  • In Everything, from the Tools menu, click Options.
  • Click the Indexes tab on the left.
  • Click Force Rebuild.
  • Click OK.
gregor
Posts: 27
Joined: Sun Jan 08, 2023 7:00 pm

Re: Content indexing doesn't find some things

Post by gregor »

horst.epp wrote: Sun Oct 08, 2023 6:22 pm In the NirSoft tool, you can assign an extension to the existing iFilters.
Try one of the OpenOffice filters.
To my knowledge, the Microsoft one doesn't handle DOCX.
It didn't help.
void wrote: Mon Oct 09, 2023 12:48 am Do you have any search options enabled under the Search menu?
No, only Everything.
Do you have a search filter enabled under the Search menu?
No.
Does the following search find any results:
ext:docx
Yes.
ext:docx regex:content:^(.*)$ addcol:regmatch1
-Is anything shown in the regmatch1 column?
No.

Please try forcing a rebuild:
Rebuilding database helped. Thanks.

Btw, why are there different search results when I type

ext:docx content:"some text
or

ext:docx content:some text
and

ext:docx content:some
Can this be changed/simplified so that

ext:docx content:some text
is the default search for the given phrase?
therube
Posts: 4985
Joined: Thu Sep 03, 2009 6:48 pm

Re: Content indexing doesn't find some things

Post by therube »

Without looking...

Some quoting may only need the opening quote (with everything thereafter treated as if it had an ending quote)
(that said, it's always "safer" if both quotes are explicitly entered)
Ending quote mark not required.


content:some text

is treated as "content:some" AND text (with the word "text" being searched for in the file name).
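To make the two quoting rules above concrete, here is a minimal sketch of a hypothetical, heavily simplified term splitter. It is NOT Everything's real search parser; it only models what is described here: unquoted spaces separate terms (which are then ANDed), and an opening quote without a closing quote runs to the end of the line.

```python
# Hypothetical, simplified model of splitting a search line into terms.
# Not Everything's parser; it only illustrates the quoting rules above.

def split_terms(query: str) -> list[str]:
    terms = []
    current = []
    in_quotes = False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes  # toggle quoting; the quote chars are dropped
        elif ch == " " and not in_quotes:
            if current:  # an unquoted space ends the current term
                terms.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:  # an unterminated quote simply ends at the end of the line
        terms.append("".join(current))
    return terms

print(split_terms('ext:docx content:some text'))   # ['ext:docx', 'content:some', 'text']
print(split_terms('ext:docx content:"some text'))  # ['ext:docx', 'content:some text']
```

In this model the unquoted form yields a separate third term, "text", which (as described above) Everything then matches against file names rather than content.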

default search for the given phrase?
I'd think that could be done with a bookmark/macro.


(I don't know if I've ever done a content: search with Everything, and I don't particularly deal with bookmarks.)
gregor
Posts: 27
Joined: Sun Jan 08, 2023 7:00 pm

Re: Content indexing doesn't find some things

Post by gregor »

I wanted to simplify it for people who don't use this function often and may not know or remember that they have to enter "content:" and the quotation marks.
Anyway, what about the second problem?
gregor wrote: Sun Oct 08, 2023 3:39 pm I have some .xlsx files in which certain things are found and some are not. Why is that?
Rebuilding database didn't help.
vdw1463
Posts: 3
Joined: Fri Oct 13, 2023 8:02 pm

Re: Content indexing doesn't find some things

Post by vdw1463 »

Went on a bit of an adventure dealing with this.

I initially wanted to report that I have also run into this issue. It happened multiple times over the past few months trying to content search PDFs in various folders. I have particularly noticed that sometimes the same query won't work, and sometimes it will. I tried immediately doing a Force Rebuild, and it did not fix the problem.

Then I noticed that if I searched "content:Selenium", I got 0 results (instead of the expected 6), but "content:Sel" gave me one result. I opened this PDF in a plaintext editor and found it contained "SeL". Then I came across this post, which talks about this specific problem being addressed in 1336a.

It was at this moment that I realized the "Check for updates" button had been lying to me, and that my version 1303a was very much out of date. So if anyone else has this issue, make sure you're on 1336a or higher.

Edit: Still getting some weird behaviour. 1357a doesn't seem to fix the issue. Still giving me the "sel" result. Interestingly, explicitly turning on content indexing in Options appears to destroy even that, and I get 0 results.
Last edited by vdw1463 on Fri Oct 13, 2023 8:37 pm, edited 3 times in total.
void
Developer
Posts: 16776
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content indexing doesn't find some things

Post by void »

Everything uses the system iFilter to read PDF content.

Sounds like your PDF iFilter is not working.

Please try forcing Everything to use the stock Windows PDF iFilter:
  • In Everything 1.5, from the Tools menu, click Options.
  • Click the Advanced tab on the left.
  • To the right of Show settings containing, search for:
    ifilter
  • Select content_ifilter_handlers.
  • Set the value to:

    [{"filter":"*.pdf","handler":"{6C337B26-3E38-4F98-813B-FBA18BAB64F5}"}]
    
  • Click OK.
Please disable content indexing to quickly check whether reading the content works (or select the PDF file and press Ctrl + F5).
Content indexing progress is shown in the status bar.
Enabling or disabling content indexing will not change how PDFs are read by Everything.
Content indexing will take a very long time.
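If you prefer to generate the content_ifilter_handlers value rather than type it by hand, a small helper can build it. This is a sketch that assumes the setting is a plain JSON array of {"filter", "handler"} objects, as in the example above; the helper names are hypothetical, and the CLSID is the stock Windows PDF iFilter quoted in this post. The script only prints the text; you still paste it into Options > Advanced.

```python
import json

# CLSID quoted in this thread for the stock Windows PDF iFilter.
WINDOWS_PDF_IFILTER = "{6C337B26-3E38-4F98-813B-FBA18BAB64F5}"

def handler_entry(extensions, clsid):
    """Hypothetical helper: map ;-separated wildcard filters to one iFilter CLSID."""
    return {"filter": ";".join("*." + ext for ext in extensions),
            "handler": clsid}

def handlers_value(entries):
    # Compact separators reproduce the exact format used in this thread.
    return json.dumps(entries, separators=(",", ":"))

print(handlers_value([handler_entry(["pdf"], WINDOWS_PDF_IFILTER)]))
# [{"filter":"*.pdf","handler":"{6C337B26-3E38-4F98-813B-FBA18BAB64F5}"}]
```

Using json.dumps instead of string concatenation keeps the quoting and brackets valid even when several entries are combined.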
gregor
Posts: 27
Joined: Sun Jan 08, 2023 7:00 pm

Re: Content indexing doesn't find some things

Post by gregor »

How can I check whether my xlsx iFilter is working? Is there a similar method for xlsx?

And what do you mean by "Everything stores content in RAM."?
I can see that the content is held in the Everything-1.5a.db file and read from it. Just like all the rest of the indexation data. So, it's ROM memory, not RAM.
horst.epp
Posts: 1448
Joined: Fri Apr 04, 2014 3:24 pm

Re: Content indexing doesn't find some things

Post by horst.epp »

gregor wrote: Sat Oct 14, 2023 11:07 am And what do you mean by "Everything stores content in RAM."?
I can see that the content is held in the Everything-1.5a.db file and read from it. Just like all the rest of the indexation data. So, it's ROM memory, not RAM.
You are wrong.
The whole content index is stored and used from RAM.
Only on exit, Everything writes it into the database.
gregor
Posts: 27
Joined: Sun Jan 08, 2023 7:00 pm

Re: Content indexing doesn't find some things

Post by gregor »

The size of my Everything-1.5a.db file is the same with the program on and off.
horst.epp
Posts: 1448
Joined: Fri Apr 04, 2014 3:24 pm

Re: Content indexing doesn't find some things

Post by horst.epp »

gregor wrote: Sat Oct 14, 2023 2:42 pm The size of my Everything-1.5a.db file is the same with the program on and off.
It is but nevertheless all Everything index data is stored in RAM.
It's still a fact, even if you don't believe it :)
That's the reason why you can't store much content in the x86 version
because RAM size per process is limited to 2GB.
vdw1463
Posts: 3
Joined: Fri Oct 13, 2023 8:02 pm

Re: Content indexing doesn't find some things

Post by vdw1463 »

void wrote: Sat Oct 14, 2023 12:57 am

[{"filter":"*.pdf","handler":"{6C337B26-3E38-4F98-813B-FBA18BAB64F5}"}]
Immediately started working again. Thank you!
Last edited by vdw1463 on Sat Oct 14, 2023 8:25 pm, edited 1 time in total.
gregor
Posts: 27
Joined: Sun Jan 08, 2023 7:00 pm

Re: Content indexing doesn't find some things

Post by gregor »

How can I check whether my xlsx iFilter is working? Is there a similar method for xlsx?
NotNull
Posts: 5461
Joined: Wed May 24, 2017 9:22 pm

Re: Content indexing doesn't find some things

Post by NotNull »

Unfortunately no.
Windows comes with a PDF iFilter to search the content of PDF files.
No such thing for XLSX files; you need to install an iFilter to search those. One is installed when you install MS Office.


If you have MS Office installed, try repairing the MS Office installation:

- run APPWIZ.cpl (Add or Remove Programs)
- Right-click the MS Office entry
- Select Repair
gregor
Posts: 27
Joined: Sun Jan 08, 2023 7:00 pm

Re: Content indexing doesn't find some things

Post by gregor »

It didn't work.
void
Developer
Posts: 16776
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content indexing doesn't find some things

Post by void »

Please make sure you are using the x64 version of Everything (Help -> About Everything -> Version: x64)

Please make sure Everything is installed correctly and running as a standard user:
  • In Everything, from the Tools menu, click Options.
  • Click the General tab on the left.
  • Check Store settings and data in %APPDATA%\Everything.
  • Uncheck Run as administrator.
  • Check Everything Service. (Please make sure this is tick-checked and not square-checked)
  • Click OK.
  • Exit Everything (right click the Everything tray icon and click Exit).
  • Restart Everything.
Does the issue persist?
gregor
Posts: 27
Joined: Sun Jan 08, 2023 7:00 pm

Re: Content indexing doesn't find some things

Post by gregor »

void wrote: Mon Oct 16, 2023 8:46 am Does the issue persist?
Yes, but I tried something else. I replaced my offfiltx.dll file with a newer one from https://www.dll-files.com/offfiltx.dll.html (2006.1200.6420.1000 vs 2006.1200.6605.1000) and did a Force Rebuild, and now ext:xlsx content: shows every file and it looks like all content is found.
I also wanted to check whether this search shows the content: ext:xlsx regex:content:^(.*)$ addcol:regmatch1, but it doesn't even show the xlsx files.
void
Developer
Posts: 16776
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content indexing doesn't find some things

Post by void »

Please try the following search to view content:

ext:xlsx dotall:regex:content:^(.*)$ addcol:regmatch1

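dotall: makes . in the regex also match newline characters, so ^(.*)$ can capture the whole multi-line content instead of failing at the first line break.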
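The effect of dotall: can be reproduced with Python's re module as a stand-in. Everything does not use Python's regex engine, so this is only an analogy for the same rule about . and line breaks. The sample text is made up and assumes the extracted content contains line breaks, which would explain why the plain regex:content:^(.*)$ search showed nothing.

```python
import re

# Made-up stand-in for extracted spreadsheet text containing line breaks.
content = "Sheet1\nName\tTotal\nApples\t42"

pattern = r"^(.*)$"

# Without DOTALL, "." stops at "\n" and "$" (no MULTILINE) only matches at the
# very end, so the whole-content pattern finds no match on multi-line text.
print(re.search(pattern, content))             # None

# With DOTALL, "." also matches "\n", so group 1 captures the entire content.
m = re.search(pattern, content, flags=re.DOTALL)
print(m.group(1) == content)                   # True
```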
gregor
Posts: 27
Joined: Sun Jan 08, 2023 7:00 pm

Re: Content indexing doesn't find some things

Post by gregor »

Now it's ok. Problems with pdf, docx and xlsx content are fixed now, thanks.
And btw, if some files (with content) are deleted, will the program automatically refresh the database, or do I have to rebuild it manually?
void
Developer
Posts: 16776
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content indexing doesn't find some things

Post by void »

The content will be removed from your index automatically when the file is deleted.
void
Developer
Posts: 16776
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content indexing doesn't find some things

Post by void »

Just making a note here.

To force Everything to use the Office iFilter for doc, docx, xls and xlsx files:
  • In Everything 1.5, from the Tools menu, click Options.
  • Click the Advanced tab on the left.
  • To the right of Show settings containing, search for:
    ifilter
  • Select content_ifilter_handlers.
  • Set the value to:

    [{"filter":"*.doc;*.docx;*.xls;*.xlsx","handler":"{f07f3920-7b8c-11cf-9be8-00aa004b9986}"}]
    
  • Click OK.
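The filter field above lists several wildcards separated by semicolons. The sketch below is a hypothetical model of how such a list selects files; it is not Everything's matching code, and the case-insensitive comparison is an assumption based on Windows file name conventions.

```python
from fnmatch import fnmatch

def filter_matches(filter_field: str, filename: str) -> bool:
    """Hypothetical model: True if filename matches any ;-separated wildcard.
    Both sides are lowercased to mimic case-insensitive Windows matching."""
    name = filename.lower()
    return any(fnmatch(name, pat.strip().lower())
               for pat in filter_field.split(";") if pat.strip())

office = "*.doc;*.docx;*.xls;*.xlsx"
print(filter_matches(office, "Report.DOCX"))   # True
print(filter_matches(office, "notes.pdf"))     # False
```

Note that each wildcard must match the whole name, so *.doc alone would not cover .docx files; that is why both are listed.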
gregor
Posts: 27
Joined: Sun Jan 08, 2023 7:00 pm

Re: Content indexing doesn't find some things

Post by gregor »

When I add this

[{"filter":"*.doc;*.docx;*.xls;*.xlsx","handler":"{f07f3920-7b8c-11cf-9be8-00aa004b9986}"}]
ext:xlsx content: stops showing files. Without it, and after a rebuild, it shows them.
And btw, for some reason ext:docx content: also stopped showing files, no matter what is in content_ifilter_handlers :(
Force Rebuild doesn't help like it did before; I don't get it.
EDIT:
After assigning the docx extension to offfiltx.dll using SearchFilterView (the new file version I mentioned) and rebuilding the database in Everything, ext:docx content: started to show files properly :?
Anyway, now I don't have anything in content_ifilter_handlers, and ext:pdf content:, ext:xlsx content: and ext:docx content: show files properly again.
void
Developer
Posts: 16776
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content indexing doesn't find some things

Post by void »

Everything 1.5.0.1360a will now automatically fall back to the built-in Windows PDF iFilter and Office iFilter.

The stock Windows iFilter for *.pdf, *.doc, *.docx, *.xls and *.xlsx is now used when the currently installed third-party iFilter fails.
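The fallback described above can be pictured as "try the installed iFilter, and if it fails for one of these extensions, use the stock one". This is a hypothetical sketch, not Everything's implementation: the loader callables, the read_content name, and the use of OSError to stand for a failed LoadIFilter call are all assumptions made for illustration.

```python
import os
from typing import Callable, Optional

# Hypothetical model of the fallback; not Everything's code.
# A "loader" stands in for "load an iFilter and read the text";
# raising OSError stands in for LoadIFilter returning an error.
Loader = Callable[[str], str]

# Extensions with a stock Windows iFilter to fall back to (per the post above).
STOCK_EXTENSIONS = {".pdf", ".doc", ".docx", ".xls", ".xlsx"}

def read_content(path: str, installed: Optional[Loader], stock: Loader) -> str:
    ext = os.path.splitext(path)[1].lower()
    try:
        if installed is None:
            raise OSError("no third-party iFilter registered")
        return installed(path)
    except OSError:
        if ext in STOCK_EXTENSIONS:
            return stock(path)  # fall back to the built-in Windows iFilter
        raise  # no stock iFilter for this type, so report the failure

def broken(path: str) -> str:  # simulates a third-party iFilter that fails
    raise OSError("LoadIFilter failed")

def stock(path: str) -> str:   # simulates the built-in Windows iFilter
    return "text from stock iFilter"

print(read_content("report.PDF", broken, stock))  # text from stock iFilter
```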
NotNull
Posts: 5461
Joined: Wed May 24, 2017 9:22 pm

Re: Content indexing doesn't find some things

Post by NotNull »

void wrote: Thu Nov 16, 2023 7:04 am when the currently installed third party iFilter fails.
Out of curiosity: how can you check if an iFilter fails?
horst.epp
Posts: 1448
Joined: Fri Apr 04, 2014 3:24 pm

Re: Content indexing doesn't find some things

Post by horst.epp »

void wrote: Fri Oct 27, 2023 11:36 am Just making a note here.

To force Everything to use the Office iFilter for doc, docx, xls and xlsx files:
  • In Everything 1.5, from the Tools menu, click Options.
  • Click the Advanced tab on the left.
  • To the right of Show settings containing, search for:
    ifilter
  • Select content_ifilter_handlers.
  • Set the value to:

    [{"filter":"*.doc;*.docx;*.xls;*.xlsx","handler":"{f07f3920-7b8c-11cf-9be8-00aa004b9986}"}]
    
  • Click OK.
I guess this will only work if MS Office is installed
void
Developer
Posts: 16776
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content indexing doesn't find some things

Post by void »

Out of curiosity: how can you check if an iFilter fails?
LoadIFilter returns an error.

The error code is reported in the Everything debug console in verbose debug mode.

Look for:

LoadIFilter <filename> <error-code>
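If you copy the verbose debug console output into a text file, a short script can list the failing files. This sketch assumes only the line shape quoted above, LoadIFilter <filename> <error-code>; how you save the output, any text around the line, and the exact error-code format (hex is used below) are assumptions. Because file names can contain spaces, the error code is taken as the last token on the line.

```python
import re

# Matches "LoadIFilter <filename> <error-code>" anywhere in a line.
# The filename may contain spaces; the error code is the last token.
LINE = re.compile(r"LoadIFilter\s+(?P<file>.+?)\s+(?P<code>\S+)\s*$")

def failed_ifilters(log_text: str) -> list[tuple[str, str]]:
    """Return (filename, error-code) pairs found in saved debug output."""
    results = []
    for line in log_text.splitlines():
        m = LINE.search(line)
        if m:
            results.append((m.group("file"), m.group("code")))
    return results

sample = "other output\nLoadIFilter C:\\Docs\\My Report.docx 0x80004005\n"
print(failed_ifilters(sample))
```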


I guess this will only work if MS Office is installed
Yes, correct.
NotNull
Posts: 5461
Joined: Wed May 24, 2017 9:22 pm

Re: Content indexing doesn't find some things

Post by NotNull »

void wrote: Thu Nov 16, 2023 9:31 pm LoadIFilter returns an error.

The error code is reported in the Everything debug console in verbose debug mode.
Good to know. Little by little we get insight into all these cryptic debug messages. Thanks!
void
Developer
Posts: 16776
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content indexing doesn't find some things

Post by void »

Just making a note here...

Everything 1.5.0.1361a will now treat empty handlers as the NULL CLSID.

This can be useful for disabling handlers.



For example, to disable the PDF iFilter:
  • In Everything 1.5, from the Tools menu, click Options.
  • Click the Advanced tab on the left.
  • To the right of Show settings containing, search for:
    ifilter
  • Select content_ifilter_handlers.
  • Set the value to: [{"filter":"*.pdf","handler":""}]
  • Click OK.


For example, to disable the Office iFilter for *.doc;*.docx;*.xls;*.xlsx:
  • In Everything 1.5, from the Tools menu, click Options.
  • Click the Advanced tab on the left.
  • To the right of Show settings containing, search for:
    ifilter
  • Select content_ifilter_handlers.
  • Set the value to: [{"filter":"*.doc;*.docx;*.xls;*.xlsx","handler":""}]
  • Click OK.


An empty handler is the same as: {00000000-0000-0000-0000-000000000000}
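The rule above (an empty handler means the NULL CLSID) can be checked with a short sketch that reads a content_ifilter_handlers value and shows the handler each filter effectively gets. The function name is hypothetical, and it assumes the setting is plain JSON as in the examples in this thread; it is not how Everything itself parses the setting.

```python
import json
import uuid

# {00000000-0000-0000-0000-000000000000}, built from an all-zero UUID.
NULL_CLSID = "{" + str(uuid.UUID(int=0)) + "}"

def effective_handlers(value: str) -> dict[str, str]:
    """Hypothetical helper: map each filter to its handler,
    treating "" as the NULL CLSID (Everything 1.5.0.1361a behavior)."""
    return {entry["filter"]: (entry["handler"] or NULL_CLSID)
            for entry in json.loads(value)}

print(effective_handlers('[{"filter":"*.pdf","handler":""}]'))
# {'*.pdf': '{00000000-0000-0000-0000-000000000000}'}
```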