Regex Unicode and Non-Unicode

General discussion related to "Everything".
Debugger
Posts: 630
Joined: Thu Jan 26, 2017 11:56 am

Regex Unicode and Non-Unicode

Post by Debugger »

It does not work in Everything:

Unicode:
(?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])
or
\p{Han} for CJK ideographs

Not Unicode:
(?>\x0D\x0A|[\x0A-\x0D])
void
Developer
Posts: 16745
Joined: Fri Oct 16, 2009 11:31 pm

Re: Regex Unicode and Non-Unicode

Post by void »

Everything uses Perl Compatible Regular Expressions.

Please try:

\b\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}]

Unicode is supported.
\p{...} is not supported.
Debugger
Posts: 630
Joined: Thu Jan 26, 2017 11:56 am

Re: Regex Unicode and Non-Unicode

Post by Debugger »

void wrote: Please try:

\b (?>) Matches a word boundary (the start or end of a word).

Regex enabled:
\b\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}]
It still does not work:
0 objects.
void
Developer
Posts: 16745
Joined: Fri Oct 16, 2009 11:31 pm

Re: Regex Unicode and Non-Unicode

Post by void »

regex:"\b\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}]" is working correctly here.

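\b = word boundary (the start or end of a word)
\x0D = carriage return
\x0A = new line (line feed)
| = OR (all text before this is one search, all text after this is another search)
[] = a set (any single character in the set matches)
\x0A-\x0D = a range inside the set: line feed, vertical tab, form feed and carriage return
\x{85} = next line (NEL)
\x{2028} = line separator
\x{2029} = paragraph separator

Combining them all together you get:
(a carriage return followed by a newline, after a word boundary) OR (a single character that is a line feed, vertical tab, form feed, carriage return, next line U+0085, line separator U+2028 or paragraph separator U+2029)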
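
To see how the pieces behave outside Everything, here is a rough Python sketch of the same pattern. It is an approximation, not Everything's PCRE engine: Python's re module has no \x{...} syntax, so the escapes are rewritten as \x85, \u2028 and \u2029, and the helper name has_line_break is made up for this example.

```python
import re

# Approximation of the forum pattern in Python's re syntax.
# PCRE's \x{85}, \x{2028}, \x{2029} become \x85, \u2028, \u2029 here.
LINE_BREAK = re.compile(r"\b\x0D\x0A|[\x0A-\x0D\x85\u2028\u2029]")


def has_line_break(name: str) -> bool:
    """Return True if the name contains any character the pattern targets."""
    return LINE_BREAK.search(name) is not None


if __name__ == "__main__":
    samples = [
        "report.txt",           # ordinary name, no match expected
        "bad\u2028name.txt",    # contains U+2028 (line separator)
        "a\x0bb",               # contains vertical tab, inside the 0A-0D range
        "tab\tname",            # tab is 0x09, outside the range
    ]
    for sample in samples:
        print(repr(sample), has_line_break(sample))
```

Because the character set already covers both 0x0D and 0x0A on their own, the first alternative with the word boundary does not add any matches; the set alone finds the same names.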

What exactly are you trying to search for?

Please try without the word boundary:
regex:[\x0A-\x0D\x{85}\x{2028}\x{2029}]

Make sure regex is disabled from the Search menu if you use the regex: modifier.
Also if you use the regex: modifier, please make sure you escape | with double quotes.

You can also use the built-in macros to find Unicode characters, which should be faster. With regex disabled, search for:
#x0a:|#x0d:|#x85:|#x2028:|#x2029:
Debugger
Posts: 630
Joined: Thu Jan 26, 2017 11:56 am

Re: Regex Unicode and Non-Unicode

Post by Debugger »

0 objects.
void
Developer
Posts: 16745
Joined: Fri Oct 16, 2009 11:31 pm

Re: Regex Unicode and Non-Unicode

Post by void »

Are you certain you have a filename with one of the above characters?

Does the following search find any results:
#x0a:|#x0d:|#x85:|#x2028:|#x2029:
Debugger
Posts: 630
Joined: Thu Jan 26, 2017 11:56 am

Re: Regex Unicode and Non-Unicode

Post by Debugger »

It does not work for me.
I want a correct regex that shows names containing Unicode characters.
I want a correct regex that shows all names without Unicode characters.
void
Developer
Posts: 16745
Joined: Fri Oct 16, 2009 11:31 pm

Re: Regex Unicode and Non-Unicode

Post by void »

I've tested creating filenames with 0x0a, 0x0d, U+2028 and U+2029 characters and the above searches would find them.

It's not clear what you are searching for.

To search for files with non-ASCII characters, search for:
regex:[^\x{00}-\x{7f}]

To search for files with only non-ASCII characters, search for:
!regex:[\x{00}-\x{7f}]

To search for files with only ASCII characters, search for:
regex:^[\x{00}-\x{7f}]*$
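
As a rough illustration of how these three searches split names, here is a small Python sketch. It uses Python's re module rather than Everything's PCRE engine, so \x{00} and \x{7f} are written as \x00 and \x7f, and the classify function and its labels are invented for this example.

```python
import re

# Python equivalents of the three searches above (an adaptation, not PCRE).
ANY_ASCII = re.compile(r"[\x00-\x7f]")       # at least one ASCII character
ASCII_ONLY = re.compile(r"^[\x00-\x7f]*$")   # every character is ASCII


def classify(name: str) -> str:
    """Label a name as 'ascii-only', 'non-ascii-only' or 'mixed'."""
    if ASCII_ONLY.search(name):
        return "ascii-only"
    if not ANY_ASCII.search(name):
        # Mirrors !regex:[\x{00}-\x{7f}] : no ASCII character anywhere.
        return "non-ascii-only"
    return "mixed"


if __name__ == "__main__":
    for sample in ["readme.txt", "極上スマイル", "極上スマイル(brz_"]:
        print(sample, "->", classify(sample))
```

Note that a dot, a space or a digit is also an ASCII character, so a name with a normal extension such as .txt can never land in the non-ASCII-only group.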
skribb
Posts: 23
Joined: Thu Mar 20, 2014 11:06 am

Re: Regex Unicode and Non-Unicode

Post by skribb »

Debugger wrote:It does not work in Everything:

Unicode:
(?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])
or
\p{Han} for CJK ideographs

Not Unicode:
(?>\x0D\x0A|[\x0A-\x0D])

I don't know anything about regex, but as far as I understand it, I don't see why those strings would find folder and file names containing characters from non-Latin character sets.
Debugger
Posts: 630
Joined: Thu Jan 26, 2017 11:56 am

Re: Regex Unicode and Non-Unicode

Post by Debugger »

regex:[^\x{00}-\x{7f}]

It works, but I do not want to include the Polish alphabet (my OS language is Polish):
https://en.wikipedia.org/wiki/Polish_alphabet
Show only English + Unicode.
void
Developer
Posts: 16745
Joined: Fri Oct 16, 2009 11:31 pm

Re: Regex Unicode and Non-Unicode

Post by void »

Debugger wrote: It works, but I do not want to include the Polish alphabet (native OS Polish)
regex:[^\x{00}-\x{7f}\x{104}\x{106}\x{118}\x{141}\x{143}\x{d3}\x{15a}\x{179}\x{17b}\x{105}\x{107}\x{119}\x{142}\x{144}\x{f3}\x{15b}\x{17a}\x{17c}]

Debugger wrote: Show only English + Unicode.
What do you mean by English? Does this include spaces? Numbers?
What do you mean by Unicode? I assume you mean characters with a code > 7f.

To search for a-z only search for:
regex:^[a-zA-Z]*$
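
The Polish exclusion above can be checked with a short Python sketch. This is only an approximation of what Everything does: it uses Python's re module, builds the character set from the same code points listed in the regex, and the names POLISH_CODE_POINTS and has_other_non_ascii are made up for the example.

```python
import re

# The 18 Polish letters, given as the code points used in the regex above
# (upper case first, then lower case).
POLISH_CODE_POINTS = [
    0x104, 0x106, 0x118, 0x141, 0x143, 0xD3, 0x15A, 0x179, 0x17B,
    0x105, 0x107, 0x119, 0x142, 0x144, 0xF3, 0x15B, 0x17A, 0x17C,
]
POLISH = "".join(chr(cp) for cp in POLISH_CODE_POINTS)

# Any character that is neither ASCII nor one of the Polish letters.
OTHER_NON_ASCII = re.compile("[^\\x00-\\x7f" + POLISH + "]")


def has_other_non_ascii(name: str) -> bool:
    """Return True if the name has a non-ASCII character that is not Polish."""
    return OTHER_NON_ASCII.search(name) is not None


if __name__ == "__main__":
    for sample in ["Accelerator", "Mąka ćwikłowa", "гвинея-спорт", "★ Hozda ★"]:
        print(sample, "->", has_other_non_ascii(sample))
```

One caveat, stated as an assumption about the file names: the set only covers precomposed letters. A name stored in decomposed form (for example a plain "a" followed by the combining ogonek U+0328) would still be reported, because the combining mark is not in the excluded set.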
Debugger
Posts: 630
Joined: Thu Jan 26, 2017 11:56 am

Re: Regex Unicode and Non-Unicode

Post by Debugger »

English
Aa Cc Ee
Accelerator

-----------------------
Polish
AĄaą CĆcć EĘeę
Mąka ćwikłowa

------------------------
Unicode -> Languages other than native Polish + special characters, e.g. ★ Hozda ★

¡ ¦ 
гвинея-спорт_олимпиада_мюнхен-72(1972)
極上スマイル(brz_



regex:^[a-zA-Z]*$
It does not show all the folders
It does not show all the files