Unable to search PDF contents

Discussion related to "Everything" 1.5 Alpha.
excelsius
Posts: 4
Joined: Fri Sep 01, 2023 1:39 am

Unable to search PDF contents

Post by excelsius »

Version: 1.5.0.1355a (x64)
OS: Windows 11 Education 22H2 22621.2134

I just installed the alpha version of Everything and set the content indexing to include *.doc;*.docx;*.pdf;*.txt;*.xls;*.xlsx;*.sas;*.r;*.py;*.ipynb

The issue is that Everything is unable to search PDF contents properly. I think indexing is complete: the Everything database is about 3GB and RAM usage is 4GB. I have over 40GB of free RAM left and my drive is NVMe.

The searches I tried are:
  • "D:\" <*.pdf> content:perturb. This one brings up no results at all.
  • "D:\" <*.pdf> notindexed:content:perturb. This one brings up only two results, is very slow to return them, and is missing 76 additional results that I have verified with AnyTXT.
Trying to figure out if this is a bug or if I'm doing something wrong. The content in question is located on NVMe, but I have also included NAS locations in the general indexing of Everything. Everything 1.4 is also currently installed in parallel. Hopefully that's not a problem.

Must add, Everything is an amazing tool. Thank you for developing it and making it even way better in v1.5.
void
Developer
Posts: 16776
Joined: Fri Oct 16, 2009 11:31 pm

Re: Unable to search PDF contents

Post by void »

Thank you for your feedback excelsius,

Is Everything still indexing your content?
A progress bar is shown in the status bar if Everything is still indexing content.


(content indexing progress bar shown at the bottom right in the statusbar)

Progress is also shown under Tools -> Options -> Content.



Everything uses the system iFilter to search PDF content.

Could you please check the PDF content Everything retrieves is sane:

Please try the following search:

"c:\my folder\my file.pdf" regex:content:^(.*)$ addcol:regmatch1

where c:\my folder\my file.pdf is a PDF file that contains perturb.

Does the file content shown in the Regular Expression Match 1 column look sane? Does it match the text content in the PDF file?



Do you have any search options checked under the Search menu? (please make sure match case, whole words and regex are all disabled)



Some PDF iFilters do not like running as administrator.

Please make sure Everything is installed correctly and running as a standard user:
  • In Everything, from the Tools menu, click Options.
  • Click the General tab on the left.
  • Check Store settings and data in %APPDATA%\Everything.
  • Uncheck Run as administrator.
  • Check Everything Service. (Please make sure this is tick-checked and not square-checked)
  • Click OK.
  • Exit Everything (right click the Everything tray icon and click Exit).
  • Restart Everything.
excelsius
Posts: 4
Joined: Fri Sep 01, 2023 1:39 am

Re: Unable to search PDF contents

Post by excelsius »

Thank you for the very quick response. Yes, indexing is done. The progress bar appeared in the status bar when I first installed v1.5, but it completed very quickly, probably within a few minutes.

I'm not sure exactly what results I'm looking for with the search you proposed, but here is a screenshot for one of the PDFs that does contain the keyword. The RegEx column is blank:
(screenshot: srch.png)
I checked all the other settings you mentioned in terms of Search and Administrator and they all are as you mentioned, so nothing needed to be changed there.
void
Developer
Posts: 16776
Joined: Fri Oct 16, 2009 11:31 pm

Re: Unable to search PDF contents

Post by void »

Thank you for the information.

Looks like the PDF iFilter is not working at all.
(The regmatch 1 column is blank)

Could you please send some debug output:
  • In Everything, from the Tools menu, under the Debug submenu, check Verbose.
  • In Everything, from the Tools menu, under the Debug submenu, check Start Debug Logging.
    ---
    Select your PDF file and press Ctrl + F5.
    ---
  • In Everything, from the Tools menu, under the Debug submenu, click Stop Debug Logging.
    The Everything Debug Log will open in Notepad.
  • Please save this file to the Desktop and send to support@voidtools.com
excelsius
Posts: 4
Joined: Fri Sep 01, 2023 1:39 am

Re: Unable to search PDF contents

Post by excelsius »

I don't know if I was supposed to respond here too, but I sent the logs to you this morning. Thank you.
void
Developer
Posts: 16776
Joined: Fri Oct 16, 2009 11:31 pm

Re: Unable to search PDF contents

Post by void »

Thank you for the debug logs.

LoadIFilter D:\...\file.pdf 80070057


Loading the contents of the PDF fails.

Error 0x80070057: The function received an invalid parameter.
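
For anyone reading their own debug log, here is a minimal sketch of how an HRESULT such as the one above breaks down. It assumes the standard HRESULT bit layout (failure bit, 11-bit facility, 16-bit code); the names decode_hresult and HResult are only illustrative and are not part of Everything.

```python
from typing import NamedTuple

FACILITY_WIN32 = 7            # facility used for wrapped Win32 error codes
ERROR_INVALID_PARAMETER = 87  # Win32 error code 0x57


class HResult(NamedTuple):
    failed: bool   # bit 31: set when the call failed
    facility: int  # bits 16-26: which subsystem produced the code
    code: int      # bits 0-15: the error code itself


def decode_hresult(value: int) -> HResult:
    """Split a 32-bit HRESULT into its failure bit, facility and code."""
    value &= 0xFFFFFFFF
    return HResult(
        failed=bool(value >> 31),
        facility=(value >> 16) & 0x7FF,
        code=value & 0xFFFF,
    )


# The debug log prints the value as bare hex, so parse it with base 16.
logged = decode_hresult(int("80070057", 16))
print(logged)
print(logged.facility == FACILITY_WIN32, logged.code == ERROR_INVALID_PARAMETER)
```

So 0x80070057 is a failed call carrying the Win32 code 87 (invalid parameter), which matches the description above.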

The parameters passed to the iFilter are correct.
This error is likely generated by the third party iFilter handler.



Reinstalling your PDF viewer might help.

I have put an option to override the default iFilter on my TODO list.
horst.epp
Posts: 1448
Joined: Fri Apr 04, 2014 3:24 pm

Re: Unable to search PDF contents

Post by horst.epp »

No info was provided about which PDF product is installed.
Windows 11 alone doesn't contain an iFilter.
NotNull
Posts: 5461
Joined: Wed May 24, 2017 9:22 pm

Re: Unable to search PDF contents

Post by NotNull »

horst.epp wrote: Sat Sep 02, 2023 6:24 am Windows 11 alone doesn't contain an iFilter.
It does here (Win11 Pro; no PDF software installed (yet)):

Reader Search Handler, using %systemroot%\system32\Windows.Data.Pdf.dll
Registered under CLSID {6C337B26-3E38-4F98-813B-FBA18BAB64F5}

( Maybe that one came with the Edge browser? )
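
To check the same thing without opening the Registry Editor, something like the sketch below might work. It assumes the handler is registered as a normal in-process COM server, so its DLL would be the default value of CLSID\{...}\InprocServer32 under HKEY_CLASSES_ROOT. The function names are made up for illustration, and the lookup returns None on anything that is not Windows.

```python
import sys

# CLSID of the Reader Search Handler quoted above.
PDF_HANDLER_CLSID = "{6C337B26-3E38-4F98-813B-FBA18BAB64F5}"


def inproc_server_subkey(clsid: str) -> str:
    """Build the HKEY_CLASSES_ROOT subkey where an in-process COM server lists its DLL."""
    return "CLSID\\" + clsid + "\\InprocServer32"


def read_handler_dll(clsid: str):
    """Return the registered DLL path, or None if not on Windows or not registered."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module

    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, inproc_server_subkey(clsid)) as key:
            # An empty value name reads the key's default value.
            value, _value_type = winreg.QueryValueEx(key, "")
            return value
    except OSError:
        return None


if __name__ == "__main__":
    print(inproc_server_subkey(PDF_HANDLER_CLSID))
    print(read_handler_dll(PDF_HANDLER_CLSID))
```

If the second line prints a path to Windows.Data.Pdf.dll, the built-in handler is present on that machine.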


(screenshot: 2023-09-02 22_27_29-Registry Editor.png)
horst.epp
Posts: 1448
Joined: Fri Apr 04, 2014 3:24 pm

Re: Unable to search PDF contents

Post by horst.epp »

2NotNull
You are right, it's the reader search handler.
Never used it.
(screenshot: Screenshot - 03.09.2023 , 16_02_51.png)
NotNull
Posts: 5461
Joined: Wed May 24, 2017 9:22 pm

Re: Unable to search PDF contents

Post by NotNull »

OK, not just here then.

Your screenshot .. that program feels familiar, but can't put my finger on it. What is it?
horst.epp
Posts: 1448
Joined: Fri Apr 04, 2014 3:24 pm

Re: Unable to search PDF contents

Post by horst.epp »

NotNull wrote: Sun Sep 03, 2023 3:58 pm OK, not just here then.

Your screenshot .. that program feels familiar, but can't put my finger on it. What is it?
It's the Properties window for an entry in NirSoft SearchFilterView.
(screenshot: Screenshot - 03.09.2023 , 19_43_12.png)
NotNull
Posts: 5461
Joined: Wed May 24, 2017 9:22 pm

Re: Unable to search PDF contents

Post by NotNull »

That's the one (forgot I even had it)

Thanks!
void
Developer
Posts: 16776
Joined: Fri Oct 16, 2009 11:31 pm

Re: Unable to search PDF contents

Post by void »

Everything 1.5.0.1356a adds support for custom iFilter handlers.

To set Everything to use the built-in Windows PDF iFilter:
  • In Everything 1.5, from the Tools menu, click Options.
  • Click the Advanced tab on the left.
  • To the right of Show settings containing, search for:
    ifilter
  • Select content_ifilter_handlers.
  • Set the value to: [{"filter":"*.pdf","handler":"{6C337B26-3E38-4F98-813B-FBA18BAB64F5}"}]
  • Click OK.
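
If you would rather generate the setting value than type the JSON by hand, here is a minimal sketch. Only the JSON shape (a list of objects with "filter" and "handler") comes from the steps above; the helper name build_handlers_value and the strict CLSID check are illustrative assumptions, not something Everything provides.

```python
import json
import re
from typing import Mapping

# A CLSID in registry form: {8-4-4-4-12 hex digits}.
CLSID_PATTERN = re.compile(
    r"\{[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}\}"
)


def build_handlers_value(handlers: Mapping[str, str]) -> str:
    """Turn {filename filter: CLSID} pairs into a compact JSON string."""
    entries = []
    for file_filter, clsid in handlers.items():
        if not CLSID_PATTERN.fullmatch(clsid):
            raise ValueError("not a CLSID in {...} form: " + repr(clsid))
        entries.append({"filter": file_filter, "handler": clsid})
    # Compact separators reproduce the whitespace-free form shown in the steps.
    return json.dumps(entries, separators=(",", ":"))


print(build_handlers_value({"*.pdf": "{6C337B26-3E38-4F98-813B-FBA18BAB64F5}"}))
```

The printed string can be pasted as the value of content_ifilter_handlers. The check catches a mistyped CLSID before it reaches the setting.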



To refresh your indexed PDF content and properties:
  • In Everything 1.5, search for:
    *.pdf
  • Select all results (Ctrl + A)
  • Press Ctrl + F5.
  • Indexing progress is shown in the status bar on the right.
  • Click OK.
excelsius
Posts: 4
Joined: Fri Sep 01, 2023 1:39 am

Re: Unable to search PDF contents

Post by excelsius »

This is excellent news. Thanks for such a quick update. I followed your instructions and it took a couple of hours to index ~40K PDFs, but now I can search the contents. One strange thing: before PDF indexing, Everything used about 3.8GB of RAM. After PDF indexing, RAM usage dropped below 2GB and then jumped back up to about 6.7GB, which would be the expected value. But that was yesterday, right after indexing. Today, RAM usage is down to just 660MB. I'm wondering if maybe not all the indices are in RAM?

Also, a question, will the saved PDF indices in Everything automatically expand and contract as PDFs are added and removed in the file system?

To share the information I had shared with you via email with this forum, I'm including the AnyTXT screenshot below. If you ever have time to expand Everything so that it can populate the actual text results found without having to open the specific document, Everything would become an even more powerful tool.

Thanks again for all your hard work on this software.
(screenshot: anytxt.png)
void
Developer
Posts: 16776
Joined: Fri Oct 16, 2009 11:31 pm

Re: Unable to search PDF contents

Post by void »

Thank you for your feedback excelsius,
excelsius wrote: I'm wondering if maybe not all the indices are in RAM?
All indexes and content are stored in RAM.

Tools -> Debug -> Statistics gives more information about memory usage.
What is shown for the first Database section?


excelsius wrote: Also, a question, will the saved PDF indices in Everything automatically expand and contract as PDFs are added and removed in the file system?
Yes, removing the PDF file from the system will also remove the properties and content indexed by Everything.


excelsius wrote: To share the information I had shared with you via email with this forum, I'm including the AnyTXT screenshot below. If you ever have time to expand Everything so that it can populate the actual text results found without having to open the specific document, Everything would become an even more powerful tool.
My own text preview handler is on my TODO list.
Thank you for the suggestion.
void
Developer
Posts: 16776
Joined: Fri Oct 16, 2009 11:31 pm

Re: Unable to search PDF contents

Post by void »

Just making a note here...

Everything 1.5.0.1361a will now treat empty handlers as the NULL CLSID.

This might be useful to disable handlers.



For example, to disable the PDF iFilter:
  • In Everything 1.5, from the Tools menu, click Options.
  • Click the Advanced tab on the left.
  • To the right of Show settings containing, search for:
    ifilter
  • Select content_ifilter_handlers.
  • Set the value to: [{"filter":"*.pdf","handler":""}]
  • Click OK.


An empty handler is the same as: {00000000-0000-0000-0000-000000000000}
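
As a rough model of that rule (not Everything's actual code), the sketch below maps an empty handler string to the NULL CLSID and lists which filters a given setting value would disable. The function names are hypothetical.

```python
import json

NULL_CLSID = "{00000000-0000-0000-0000-000000000000}"


def effective_handler(handler: str) -> str:
    """Model of the rule above: an empty handler counts as the NULL CLSID."""
    return handler or NULL_CLSID


def disabled_filters(setting_value: str) -> list:
    """List the filename filters whose handler resolves to the NULL CLSID."""
    entries = json.loads(setting_value)
    return [
        entry["filter"]
        for entry in entries
        if effective_handler(entry["handler"]) == NULL_CLSID
    ]


print(disabled_filters('[{"filter":"*.pdf","handler":""}]'))
```

Under this model, writing "" and writing the all-zero CLSID give the same result for *.pdf.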