Index html and htm files in subfolders?

Discussion related to "Everything" 1.5 Alpha.
Biff
Posts: 1158
Joined: Mon May 25, 2015 7:09 am

Index html and htm files in subfolders?

Post by Biff »

How would I have to adapt the following filter so that Everything also indexes html and htm files in this folder, I:\Eigene Dateien\Notepad - Ansammlungen txt-Dateien\, in its subfolders, and perhaps in other folders as well?

*.doc;*.docx;*.pdf;*.txt;*.xls;*.xlsx;*.ods;*.odt;*.ott;*.scrivx;*.csv;*.ics;*.rtf;*.eml;regex:^I:\\Eigene Dateien\\Notepad - Ansammlungen txt-Dateien\\[^.]*$

[Screenshot]

Can Everything index only the text of an html or htm page that a visitor sees, and not the code?
void
Developer
Posts: 16770
Joined: Fri Oct 16, 2009 11:31 pm

Re: Index html and htm files in subfolders?

Post by void »

How would one have to adapt the code so that Everything also indexes html and htm files that are in this folder, I:\Eigene Dateien\Notepad - Ansammlungen txt-Dateien\, and in its subfolders and perhaps other subfolders?
Add the following to your Include only files filter:

I:\Eigene Dateien\Notepad - Ansammlungen txt-Dateien\**.html;I:\Eigene Dateien\Notepad - Ansammlungen txt-Dateien\**.htm




To include multiple folders, please try:

I:\Eigene Dateien\Notepad - Ansammlungen txt-Dateien\**.html;I:\Eigene Dateien\Notepad - Ansammlungen txt-Dateien\**.htm;C:\Another folder\**.html;C:\Another folder\**.htm
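To make the `**` wildcard more concrete, here is a rough Python sketch of how a semicolon-separated Include only files filter could be checked against a full path. The matching rules are assumptions for illustration, not taken from Everything's code: `**` matches any characters including `\`, `*` and `?` stay within one folder level, entries without a `\` (such as `*.doc`) are compared with the file name only, `regex:` entries are searched in the full path, and everything is case-insensitive. The function names `wildcard_to_regex` and `matches_filter` are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate a wildcard entry into an anchored regex (assumed semantics)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")        # assumed: ** also crosses folder boundaries
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^\\]*")   # assumed: * stays within one path component
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^\\]")    # assumed: ? is one character, never a backslash
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "^" + "".join(out) + "$"


def matches_filter(path: str, filter_str: str) -> bool:
    """Return True if any ;-separated entry of the filter matches the path."""
    name = path.rsplit("\\", 1)[-1]
    for entry in filter(None, filter_str.split(";")):
        if entry.startswith("regex:"):
            if re.search(entry[len("regex:"):], path, re.IGNORECASE):
                return True
        else:
            # Entries containing a backslash are compared with the full path,
            # plain patterns such as *.doc with the file name only.
            target = path if "\\" in entry else name
            if re.match(wildcard_to_regex(entry), target, re.IGNORECASE):
                return True
    return False


FOLDER = r"I:\Eigene Dateien\Notepad - Ansammlungen txt-Dateien"
FILTER = (FOLDER + r"\**.html;" + FOLDER + r"\**.htm;"
          r"C:\Another folder\**.html;C:\Another folder\**.htm")

print(matches_filter(FOLDER + r"\sub\page.html", FILTER))  # True: ** reaches subfolders
print(matches_filter(r"C:\Other\page.html", FILTER))        # False: folder not listed
```

Splitting on `;` is naive: a `regex:` entry that itself contains a semicolon would be cut in two, so treat this only as a mental model of the filter.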



Can Everything only index the text of an html or htm page that a visitor sees, not the code?
There are a couple of ways to do this:

1) Disable Tools -> Options -> Advanced -> content_builtin_text_plain_handler
Then select your html/htm files and press Ctrl + F5 to reindex their content.

-or-

2) Remove html/htm from Everything's built-in list of text/plain extensions:
  • Type in the following search and press ENTER:
    about:config
    This opens your Everything configuration in Notepad. Change the following line:

    text_plain_extensions=a;ans;asc;ascx;asm;asp;aspx;asx;bas;bat;bcp;btm;c;cc;cls;cmd;contact;cpp;cs;csa;csproj;css;csv;cxx;dbs;def;dic;dos;dsp;dsw;efu;ext;faq;fky;h;hhc;hpp;hta;htm;html;htt;htw;htx;hxx;i;ibq;ics;idl;idq;inc;inf;ini;inl;inx;jav;java;js;json;kci;lgn;lst;lua;m3u;mak;mk;odc;odh;odl;php;pl;prc;ps1xml;py;rc;rc2;rct;reg;rgs;rul;s;scc;shtm;shtml;sol;sql;srf;stm;tab;tdl;tlh;tli;trg;txt;udf;udt;user;usr;vbproj;vbs;vcproj;viw;vspscc;vsscc;vssscc;wri;wtx;xml;xsd;xsl;xslt
    to:

    text_plain_extensions=a;ans;asc;ascx;asm;asp;aspx;asx;bas;bat;bcp;btm;c;cc;cls;cmd;contact;cpp;cs;csa;csproj;css;csv;cxx;dbs;def;dic;dos;dsp;dsw;efu;ext;faq;fky;h;hhc;hpp;hta;htt;htw;htx;hxx;i;ibq;ics;idl;idq;inc;inf;ini;inl;inx;jav;java;js;json;kci;lgn;lst;lua;m3u;mak;mk;odc;odh;odl;php;pl;prc;ps1xml;py;rc;rc2;rct;reg;rgs;rul;s;scc;shtm;shtml;sol;sql;srf;stm;tab;tdl;tlh;tli;trg;txt;udf;udt;user;usr;vbproj;vbs;vcproj;viw;vspscc;vsscc;vssscc;wri;wtx;xml;xsd;xsl;xslt
    (that is, the same line with htm;html removed)
  • Save your changes and close Notepad.
  • Accept the prompt in Everything to reload your config.
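If you prefer not to edit the long line by hand, the change amounts to dropping two entries from a semicolon-separated list. A minimal Python sketch that computes the new line from the old one; it only transforms a string and does not touch your Everything configuration file, and the function name `remove_extensions` is hypothetical:

```python
from __future__ import annotations


def remove_extensions(line: str, unwanted: set[str]) -> str:
    """Drop whole entries from a 'key=a;b;c' config line, keeping their order."""
    key, sep, value = line.partition("=")
    kept = [ext for ext in value.split(";") if ext.lower() not in unwanted]
    return key + sep + ";".join(kept)


# Shortened version of the text_plain_extensions line, for illustration only.
old = "text_plain_extensions=a;ans;hta;htm;html;htt;txt;xml"
new = remove_extensions(old, {"htm", "html"})
print(new)  # text_plain_extensions=a;ans;hta;htt;txt;xml
```

Matching is on whole entries, so hta, shtm and shtml stay in the list, just as in the edited line above.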
Biff
Re: Index html and htm files in subfolders?

Post by Biff »

Thank you very much!

It seems that this filter

*.doc;*.docx;*.pdf;*.txt;*.xls;*.xlsx;*.ods;*.odt;*.ott;*.scrivx;*.csv;*.ics;*.rtf;*.eml;regex:^I:\\Eigene Dateien\\Notepad - Ansammlungen txt-Dateien\\[^.]*$;I:\Eigene Dateien\Notepad - Ansammlungen txt-Dateien\**.html;I:\Eigene Dateien\Notepad - Ansammlungen txt-Dateien\**.htm
lets Everything index the content of these files:
*.doc;*.docx;*.pdf;*.txt;*.xls;*.xlsx;*.ods;*.odt;*.ott;*.scrivx;*.csv;*.ics;*.rtf;*.eml
and
html and htm
and the content of files without an extension in the folder
"Notepad - Ansammlungen txt-Dateien" and all of its subfolders.

And an html file in the Recycle Bin:
[Screenshot]

Is this how it should be? And why is the html file in the Recycle Bin shown, that is, why is it indexed or kept in the index (which isn't a bad thing)?

[Screenshot]



So I would not need this part?

I:\Eigene Dateien\Notepad - Ansammlungen txt-Dateien\**.html;I:\Eigene Dateien\Notepad - Ansammlungen txt-Dateien\**.htm;C:\Another folder\**.html;C:\Another folder\**.htm
Or what is this part for?
void
Re: Index html and htm files in subfolders?

Post by void »

Is it like it should be?
Yes.
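As for why extensionless files in all subfolders are picked up: most likely it is the `[^.]*$` at the end of the regex from the first post. The character class excludes only the dot, so it also matches `\` and therefore whole subfolder paths, as long as no folder or file name after the base folder contains a dot. A quick check with Python's `re` module, used here as a stand-in for Everything's regex engine, with made-up sample paths:

```python
import re

# Regex part of the Include only files filter from the first post.
# Assumption: matched case-insensitively against the full path.
PATTERN = re.compile(
    r"^I:\\Eigene Dateien\\Notepad - Ansammlungen txt-Dateien\\[^.]*$",
    re.IGNORECASE,
)
BASE = "I:\\Eigene Dateien\\Notepad - Ansammlungen txt-Dateien\\"

# Made-up paths relative to BASE, with the expected result.
SAMPLES = {
    "Notizen": True,                # extensionless file directly in the folder
    "Archiv\\2023\\Notizen": True,  # nested subfolders: [^.] also matches "\"
    "Archiv.alt\\Notizen": False,   # a dot in a folder name breaks the match
    "liste.txt": False,             # has an extension (covered by *.txt instead)
}

for rel, expected in SAMPLES.items():
    result = bool(PATTERN.match(BASE + rel))
    assert result == expected, rel
    print(rel, result)
```

If subfolders should be excluded, a class such as `[^.\\]*` would stop at the next backslash; that variant is only a suggestion and has not been tried in Everything here.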


Why is the html file in the bin shown / indexed, respectively kept in the index (which isn't bad).
This might be from an old content index.
Please wait until Everything finishes indexing content.
Progress is shown in the status bar on the right.
The content for this file will eventually be removed.


So I would not need this(?):

I:\Eigene Dateien\Notepad - Ansammlungen txt-Dateien\**.html;I:\Eigene Dateien\Notepad - Ansammlungen txt-Dateien\**.htm;C:\Another folder\**.html;C:\Another folder\**.htm
Or what for is this part good for?
It's not needed unless you want to index html/htm content in other folders.
Biff
Re: Index html and htm files in subfolders?

Post by Biff »

This might be from an old content index.
So Everything just keeps files in the content index until the new indexing is finished, even though they are in a folder (here, the Recycle Bin) that should not be indexed?
It's not needed unless you wanted to index html/htm content in other folders.
Ah, so that is just additional code I could use (adapted) for any other folder. It was not meant for this particular folder.
void
Re: Index html and htm files in subfolders?

Post by void »

So Everything just keeps files in the index until (the new) indexing is finished although they are in a folder / in the bin in which they should not be indexed?
Ah, Everything is not clearing the properties when the file is moved (or deleted) and the new location is excluded.
Thank you for bringing the issue to my attention.
I am working on a fix.
Biff
Re: Index html and htm files in subfolders?

Post by Biff »

Thank you very much!
Biff
Re: Index html and htm files in subfolders?

Post by Biff »

It seems as if special characters, such as a colon (":"), are not found in file content.

Could Everything also find these special characters?
void
Re: Index html and htm files in subfolders?

Post by void »

Could you please upload, in a bug report, a file containing a colon (:) that Everything does not find?

Everything should find colons in your file content.
For example:
content::

-or-
content:":"




What type of file are you searching?
- If it's html/htm, the colon could be encoded as
&#58;
or
&#x3A;
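For illustration, Python's standard `html` module shows that both character references stand for a plain colon, while the raw source text contains only the characters of the reference itself (`&`, `#`, digits, `;`). This only illustrates the encoding; it says nothing about how Everything's own content handlers treat html. The sample string is made up:

```python
import html

raw = "Note&#58; first, Tip&#x3A; second"  # made-up html source text

print(":" in raw)                 # False: the source has no literal colon
print(html.unescape("&#58;"))     # :
print(html.unescape("&#x3A;"))    # :
print(html.unescape(raw))         # Note: first, Tip: second
```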
Biff
Re: Index html and htm files in subfolders?

Post by Biff »

For example, in an html file it apparently does not find a string of characters that includes the colon, e.g. "m: Ein"

[Screenshot]
- If it's html/htm, the colon could be encoded as
&#58;
So it is not found if it is not encoded?
void
Re: Index html and htm files in subfolders?

Post by void »

There's an <em> tag in the way:

m: <em>Ein




Please try the following alternative search:

content:<m: ein>


(search for content containing m: AND ein)
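The difference between the two searches can be sketched in a few lines of Python. This is a simplified model, not Everything's matcher: it assumes a phrase search looks for the exact substring, while an AND search only requires every space-separated term to occur somewhere, both ignoring case. The sample line mirrors the `m: <em>Ein` markup shown above; `phrase_match` and `and_match` are hypothetical names.

```python
def phrase_match(text: str, phrase: str) -> bool:
    """Exact substring search, ignoring case (like a quoted phrase)."""
    return phrase.lower() in text.lower()


def and_match(text: str, query: str) -> bool:
    """Every space-separated term must occur somewhere, ignoring case."""
    haystack = text.lower()
    return all(term.lower() in haystack for term in query.split())


source = "<p>Programm: <em>Ein</em> Beispiel</p>"  # hypothetical html source

print(phrase_match(source, "m: ein"))  # False: the <em> tag sits between the words
print(and_match(source, "m: ein"))     # True: both terms occur in the file
```

The trade-off is that the AND search also matches files where the two terms appear far apart.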



If you would like to search the visible text only, please try removing htm and html from your built-in list of text/plain extensions as mentioned above.
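For comparison, this is roughly what "visible text only" means for the example above. A minimal sketch using Python's standard `html.parser`: it keeps text nodes, skips `<script>` and `<style>` content, and decodes character references, so `m: <em>Ein` becomes `m: Ein`. It only illustrates the idea; it is not how Everything extracts html content once htm/html are removed from the text/plain list, and the names `VisibleText` and `visible_text` are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &#58; etc. arrive decoded
        self.parts = []  # collected text fragments
        self._skip = 0   # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def visible_text(source: str) -> str:
    parser = VisibleText()
    parser.feed(source)
    parser.close()  # flush any buffered trailing text
    return " ".join("".join(parser.parts).split())  # collapse whitespace


page = "<style>p{color:red}</style><p>Programm&#58; <em>Ein</em> Beispiel</p>"
print(visible_text(page))  # Programm: Ein Beispiel
```

Block-level tags are not turned into spaces here, so text from adjacent paragraphs can run together; a real extractor would handle that.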
Biff
Re: Index html and htm files in subfolders?

Post by Biff »

(search for content containing m: AND ein)
Yes, that works.
If you would like to search the visible text only, please try removing htm and html from your built-in list of text/plain extensions as mentioned above.
OK.

Thank you very much!