Can't find known string in PDF file.

Discussion related to "Everything" 1.5 Alpha.
ChrisGreaves
Posts: 688
Joined: Wed Jan 05, 2022 9:29 pm

Post by ChrisGreaves »

Win7 1.5.0.1337a(64)
PDF_Content_01.png
These results are obtained by the filter

Code: Select all

ext:pdf dm:>02/02/2023
I expected this result because I edited my manifesto on Fully-Funded Public Transit earlier this month.
I opened up the file to locate a string, and chose "TTC" because it is easy for me to read.
PDF_Content_02.png
and these results are obtained by the boosted filter

Code: Select all

ext:pdf dm:>02/02/2023 content:TTC
Now I expected fewer results with "Content:TTC", but I expected at least to see Fully-Funded Public Transit in the Result List; after all it was presented with the initial filter.

Too, I am puzzled about most of the results in the second try. Basically, I do not expect the user guides for my incoming HP laptop to be discussing the Toronto Transit Commission (OK, TTC could be a term within HP), nor do I expect to see "TTC" in the graduation ceremony booklets from Uni WA.

I suspect that I have misunderstood Everything's ability to extract text from PDF files.
Thanks and Please for any guidance.
Cheers, Chris
void
Developer
Posts: 16897
Joined: Fri Oct 16, 2009 11:31 pm

Post by void »

Thank you for the issue report ChrisGreaves,

Could you please send the FullFundedPublicTransit.pdf to support@voidtools.com

I'll check out what Everything is reading on my end.



To view the content Everything is reading, search for:
FullFundedPublicTransit.pdf addcolumn:column1 dotall:regex:content:^(.*)$ column1:=$1:

The content is shown in the Column 1 column.
It will be hard to read as it shows as a single line.
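
As a rough analogy for what that search does, here is a small Python sketch. Python's `re` module stands in for Everything's regex engine, which may differ in detail. The idea is that `dotall:` lets `.` match line breaks, so `^(.*)$` captures the whole extracted text as group 1, and `column1:=$1:` then displays that capture. The sample text and the `capture_all` helper are made up for illustration.

```python
import re
from typing import Optional


def capture_all(content: str) -> Optional[str]:
    """Return group 1 of ^(.*)$ matched with DOTALL, or None if there is no match."""
    match = re.search(r"^(.*)$", content, flags=re.DOTALL)
    return match.group(1) if match else None


# Hypothetical extracted text; not taken from any real PDF.
sample = "Fully-Funded Public Transit\nThe TTC should be free."

# With DOTALL, the capture spans every line of the text.
whole = capture_all(sample)

# Without DOTALL, '.' stops at the first line break, so the pattern
# cannot match a multi-line text from start to end.
no_dotall = re.search(r"^(.*)$", sample)
```

If the column comes back empty for a file, that suggests no readable text was extracted from it at all.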
ChrisGreaves

Post by ChrisGreaves »

void wrote: Mon Feb 20, 2023 5:51 am Could you please send the FullFundedPublicTransit.pdf to support@voidtools.com
sample PDFs and a narrative emailed.

Here is my narrative without images, just for the record.
Monday, February 20, 2023
Result list above produced by:-
*.pdf dm:>02/02/2023
I open “50th-Reunion-Booklet-for-Graduates-of-1967.pdf” and see on the In Memoriam page the name HAMBLETON.
The filter
dm:>02/02/2023 content:HAMBLETON
returns no hits.
The PDF is produced by/from the University of Western Australia, so perhaps they use a non-standard generator. I thought that the lists of names might be images, but then the text tool works in my Foxit Reader and lets me copy the name’s text.
I open BMOM_20230210.pdf, a bank statement produced by Bank of Montreal here in Canada. I produced this bank statement with Primo PDF, and the image sometimes goes cloudy (as seen here) and sometimes is clear. But the header text is clear and can be selected and copied.
ext:pdf dm:>02/02/2023 content:"Make a payment"
returns no hits.
I open HDD and SSD c02876562.pdf, and this is a PDF printed by/for Hewlett-Packard, and if anyone knows about printing, HP should!
ext:pdf dm:>02/02/2023 content:Hewlett
This produces four hits, all from HP PDFs.
My conclusion: Content search of PDF works.
The generator I use is PrimoPDF (InternationalPrimoPDF_5.1.0.2.exe, by the looks of it).
This is suspect; perhaps it uses a non-standard format, although Foxit Reader seems to read PDFs quite well, including the ones I print from MS Word documents with Primo.
Everything’s ability to inspect text may depend heavily on the package that generates the PDF.

Cheers, Chris
Last edited by ChrisGreaves on Wed Feb 22, 2023 3:49 pm, edited 1 time in total.
ChrisGreaves

Post by ChrisGreaves »

void wrote: Mon Feb 20, 2023 5:51 am To view the content Everything is reading, search for:
FullFundedPublicTransit.pdf addcolumn:column1 dotall:regex:content:^(.*)$ column1:=$1:
Thanks Void.
Your suggested string produced no output in the Result List. Although, as I respond, I notice that I had copied "FullFunded"; it should be "FullyFunded". I am rapidly gaining a reputation here for not checking what I am typing/pasting.
I went back and used "FullyFunded" and did indeed get results.

But by then I had essayed with a different file:-
PDF_Content_05.png
This file is emitted from dear old Hewlett-Packard.

Right now I would treat this as ultra-low priority.
Clearly I had composed the filter correctly, but I do not expect Everything to cope with every FREE download of a 3rd-party PDF generator.
A quick note in documentation should suffice to take care of any similar problem.
Thanks, Chris
void
Developer

Post by void »

Thanks for sending the PDF files.

Everything is finding the content for me with Adobe reader installed.

It looks like you don't have an iFilter registered for PDF files.

You will need an iFilter installed to read PDF content.
Otherwise, Everything will treat PDF content as binary.
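
One likely reason a raw binary search misses text you can plainly read in a viewer: PDF page content is usually stored in compressed streams (commonly Flate/zlib), so the visible words do not appear as literal bytes in the file. The sketch below only illustrates that effect with Python's `zlib`. The page text is invented, and it does not show how Everything or any iFilter actually works.

```python
import zlib

# Invented page content in PDF text-operator style, repeated so that
# deflate clearly compresses it rather than storing it verbatim.
page_text = b"BT /F1 12 Tf (Fully-Funded Public Transit) Tj ET\n" * 20
needle = b"Public Transit"

# Roughly what a Flate-compressed content stream looks like on disk.
stored_bytes = zlib.compress(page_text)

# A byte-level ("binary") search looks at the stored bytes as they are.
found_in_raw = needle in stored_bytes

# A text extractor (the role an iFilter plays) decodes the stream first.
found_after_decoding = needle in zlib.decompress(stored_bytes)
```

Here `found_in_raw` is false while `found_after_decoding` is true. By the same token, a very short string such as TTC can turn up by chance in unrelated binary data, which may explain some unexpected hits.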



If you have a PDF viewer installed already, please make sure you are using the x64 version of Everything (check Help -> About -> Version) and please make sure Everything is running as a standard user.

PDF iFilters do not like running as administrator.

Please make sure Everything is installed correctly and running as a standard user:
  • In Everything, from the Tools menu, click Options.
  • Click the General tab on the left.
  • Check Store settings and data in %APPDATA%\Everything.
  • Uncheck Run as administrator.
  • Check Everything Service. (Please make sure this is tick-checked and not square-checked)
  • Click OK.
  • Exit Everything (right click the Everything tray icon and click Exit).
  • Restart Everything.
If that doesn't help please try reinstalling your PDF viewer.
ChrisGreaves

Post by ChrisGreaves »

void wrote: Tue Feb 21, 2023 7:34 am It looks like you don't have a iFilter registered for PDF files.
PDF iFilters do not like running as administrator.
Void, thanks for this. I cannot NOT run as administrator (I have just re-run your instructions) so I am parking this issue until next week when (I hope) I will have a decent system (not Win7, not debris left lying around, iFilter installed, sandpit of well-defined data ...)
Cheers, Chris
ChrisGreaves

Post by ChrisGreaves »

void wrote: Tue Feb 21, 2023 7:34 am Thanks for sending the PDF files.
Hello Void. I am back with my new laptop, Win11 (:barf:) and working my way through a pile of unresolved issues. This post is an attempt to close this topic.
Content_10.png
In this post I was puzzled that the document did not turn up. Now with a new laptop and a fresh install of Everything 1339.a the document is found. This PDF file was created from Word2003 with PrimoPDF.

I do not want to explore iFilters at this time; it seems deeper than I ought to be for a User Tutorial for beginners, but I will probably get to it sooner or later.
This topic would not exist were it not for my old system's inability to locate a file.
I suspect that this issue, and many others, will clear up quickly as I work my way through them. A ten-year-old laptop with Win7 and repeated rebuilds is suspect in the first place.

I plan on including several PDF files in the sample data, demonstrating finding Content in PDF where the PDFs arise from different sources.
Thank you for your attention and for your patience.
Cheers, Chris
horst.epp
Posts: 1453
Joined: Fri Apr 04, 2014 3:24 pm

Post by horst.epp »

ChrisGreaves wrote: Sun Mar 19, 2023 6:37 pm I do not want to explore iFilters at this time; it seems deeper than I ought to be for a User Tutorial for beginners, but I will probably get to it sooner or later.

I plan on including several PDF files in the sample data, demonstrating finding Content in PDF where the PDFs arise from different sources.
Thank you for your attention and for your patience.
Cheers, Chris
Searching for PDF content without an iFilter is a waste of time and an almost useless effort.
There are free, simple-to-install iFilters for PDF.
Try the following, for example; it works perfectly with Everything
and is independent of any installed PDF viewer software.
https://www.pdflib.com/download/tet-pdf-ifilter/
ChrisGreaves

Post by ChrisGreaves »

horst.epp wrote: Sun Mar 19, 2023 6:49 pm
ChrisGreaves wrote: Sun Mar 19, 2023 6:37 pm I do not want to explore iFilters at this time; it seems deeper than I ought to be for a User Tutorial for beginners, but I will probably get to it sooner or later
Searching for PDF content without an iFilter is a waste of time and an almost useless effort.
There are free, simple-to-install iFilters for PDF.
Try the following, for example; it works perfectly with Everything
and is independent of any installed PDF viewer software.
https://www.pdflib.com/download/tet-pdf-ifilter/
Thanks Horst; it wasn't so much that I didn't want iFilters; it was that I didn't know about them!
As a beginner, I tried searching for (known) content in a PDF file with Everything, and couldn't find the file.
I suspect that the problem lay in my old system, perhaps a very old PDF reader I was using, perhaps in an old PDF creator I was using.

As I pointed out in my post, had I been using a "clean" and modern system, that content search would have worked straight off, without the need for my topic.

And that I believe is how it should be for naive beginners: Do what it says in the Wiki and enjoy the results.

Back in the second half of February I was having so many failures that I reached the conclusion that I could save everybody's time by pausing my research and investing in a new system. Blame Purolator(1) for two weeks of the delay; they kept driving the laptop up and down the highway claiming that I wasn't at home!

Cheers, Chris
(1)Operative syllable "later" C