soundex

Discussion related to "Everything" 1.5 Alpha.
avi
Posts: 30
Joined: Sat Aug 19, 2023 6:06 pm

soundex

Post by avi »

1. This option is very useful! Could you please add a setting to make it the default?
2. Could you please add support for letters in the Hebrew language?

Thanks!
void
Developer
Posts: 16773
Joined: Fri Oct 16, 2009 11:31 pm

Re: soundex

Post by void »

1. This option is very useful! Could you please add a setting to make it the default?
I will look into an option to make this easier.

For now, please try creating the following soundex filter:
  • In Everything, from the Search menu, click Add to filters....
  • Change the Name to: Soundex
  • Change the Search to: soundex:$param:
  • Click OK.
Filters can be activated from the Search menu, Filter bar (View -> Filters), right clicking the status bar, filter macro or filter keyboard shortcut.

The problem here is you can only have one search term when this filter is active.
(this search modifier was never really designed to be used with multiple terms)

soundex:
https://en.wikipedia.org/wiki/Soundex


2. Could you please add support for letters in the Hebrew language?
This is outside the scope of soundex.
Only A-Z is supported.
I will look into other algorithms that support Hebrew.
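
To illustrate why only A-Z is covered: classic Soundex keeps the first letter and maps the remaining consonants to digits, so anything outside the Latin alphabet has no code. Below is a rough Python sketch of American Soundex as described in the Wikipedia article linked above. It is an illustration only, not necessarily how Everything implements soundex:, and the name soundex() is just for this example.

```python
# Sketch of American Soundex (per the Wikipedia description).
# Assumption: this is NOT taken from Everything's source code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    # Keep only A-Z; any other letter (e.g. Hebrew) is dropped.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]  # pad or truncate to letter + 3 digits


print(soundex("Robert"), soundex("Rupert"))
```

A Hebrew-only word yields an empty code in this sketch, which is why a different algorithm would be needed.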

Thank you for the suggestions.
avi
Posts: 30
Joined: Sat Aug 19, 2023 6:06 pm

Re: soundex

Post by avi »

Thank you for your detailed reply!
I will look into other algorithms that support Hebrew.
If it is possible, with the option to make it the default (with a setting in "Advanced"?), that would be great!
This is especially necessary in Hebrew, because the letters "י" and "ו" are written by some people and omitted by others. For example, some write "ביאור" and others write "באור"; some write "שלחן" and others write "שולחן"; and there are many more such words. (You can search for "ב*אור" and "ש*לחן", but because there are so many such words, a default option would be more helpful.)
I also use a program called "Fluent Search" a lot, with your search engine selected there, so a default setting like this in "Everything" would be useful there too.

I would also like to take this opportunity to thank you personally for your software. I use it a lot!
therube
Posts: 4985
Joined: Thu Sep 03, 2009 6:48 pm

Re: soundex

Post by therube »

I wonder if something on the diacritics side could work?
Or even a filter that filters out vowels?
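
A rough Python sketch of the "filter out vowels" idea, applied to the two letters avi mentioned. The names OPTIONAL_LETTERS, strip_optional() and loosely_equal() are made up for this example, and this is not an existing Everything feature. It is also crude: "י" and "ו" can be consonants too, so it will produce some false matches.

```python
# Assumption: treat "י" and "ו" as optional and drop them before comparing.
OPTIONAL_LETTERS = {"י", "ו"}


def strip_optional(text):
    # Remove every optional letter, keeping the rest in order.
    return "".join(ch for ch in text if ch not in OPTIONAL_LETTERS)


def loosely_equal(a, b):
    # Two spellings match if they agree once the optional letters are gone.
    return strip_optional(a) == strip_optional(b)


print(loosely_equal("ביאור", "באור"))
print(loosely_equal("שלחן", "שולחן"))
```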
meteorquake
Posts: 504
Joined: Thu Dec 15, 2016 9:44 pm

Re: soundex

Post by meteorquake »

The trick is to use a language-independent algorithm that doesn't analyse sound, like the one I mentioned, which in essence simply counts how many of the Find's letter pairs are present in the Target. There may well be better ones, as I've not researched what algorithms exist, but I've found it works very well.
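
A minimal Python sketch of that letter-pair idea, for anyone who wants to try it. The function names are invented here, and dividing by the number of pairs to get a score between 0 and 1 is an assumption of this sketch rather than part of the description above.

```python
def letter_pairs(text):
    # All adjacent two-character pairs, case-insensitive.
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def pair_score(find, target):
    # Fraction of the Find's letter pairs that also occur in the Target.
    pairs = letter_pairs(find)
    if not pairs:
        return 0.0
    target_pairs = set(letter_pairs(target))
    hits = sum(1 for p in pairs if p in target_pairs)
    return hits / len(pairs)


print(pair_score("hello", "help"))
print(pair_score("שלחן", "שולחן"))
```

Nothing here depends on the alphabet, so it treats Hebrew and English the same way.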
d
ChrisGreaves
Posts: 684
Joined: Wed Jan 05, 2022 9:29 pm

Re: soundex

Post by ChrisGreaves »

meteorquake wrote: Wed Jun 26, 2024 9:04 pm
The trick is to use a language-independent algorithm that doesn't analyse sound, like the one I mentioned that in essence simply counts up the Find's letter pairs present in the Target; ...
Hi meteorquake. I agree that a language-independent algorithm would be good.
In particular a mobile algorithm (which a language-independent algorithm satisfies) is good.

I am thinking of a spell-checker using symbol-pairs to match likely suggestions for corrections. A bonus would/should be the ability to include longer strings, for example: phrases built of hyphenated or space-separated strings.
Cheers, Chris
meteorquake
Posts: 504
Joined: Thu Dec 15, 2016 9:44 pm

Re: soundex

Post by meteorquake »

It should work all the same. However, if you do that and are looking for total matching, you could speed it up by only searching words whose length is within a certain % of the Find phrase's length. For example, you'd not expect hello to be a match for a 12-letter sequence. You'd just be looking at character symbols, and you might or might not want to completely ignore anything else (so "It was highly-priced" would become "itwashighlypriced") or treat each sequence of non-characters as a single space.
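
A small Python sketch of both ideas: the two normalisation choices and the length check. The names normalize() and length_compatible() and the default 50% tolerance are assumptions picked for illustration only.

```python
import re


def normalize(text, keep_spaces=False):
    # [\W_]+ matches runs of anything that is not a letter or digit.
    text = text.lower()
    if keep_spaces:
        # Treat each run of non-characters as a single space.
        return re.sub(r"[\W_]+", " ", text).strip()
    # Ignore non-characters completely.
    return re.sub(r"[\W_]+", "", text)


def length_compatible(find, target, tolerance=0.5):
    # True if the target length is within tolerance (a fraction) of the find length.
    return abs(len(target) - len(find)) <= tolerance * len(find)


print(normalize("It was highly-priced"))
print(normalize("It was highly-priced", keep_spaces=True))
print(length_compatible("abcdefghijkl", "hello"))
```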
d
ChrisGreaves
Posts: 684
Joined: Wed Jan 05, 2022 9:29 pm

Re: soundex

Post by ChrisGreaves »

meteorquake wrote: Thu Jun 27, 2024 2:41 pm
... only searching words whose length is within a certain % of the Find phrase.
Quite so! And those values would be part of my parameters of the application.
Thanks, Chris
void
Developer
Posts: 16773
Joined: Fri Oct 16, 2009 11:31 pm

Re: soundex

Post by void »

Sounds like you want Levenshtein distance.

I have plans to add support for this.
However, it will require sorting results by relevance.
There are too many unwanted results without a relevance sort.
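
For illustration, a Python sketch of Levenshtein distance with results ordered by relevance (smallest distance first). This is the textbook dynamic-programming version, not anything from Everything, and rank_by_distance() with its max_distance cutoff is a made-up helper for this example.

```python
def levenshtein(a, b):
    # Classic two-row dynamic programming; cost 1 per insert, delete or substitute.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def rank_by_distance(query, names, max_distance=2):
    # Score every name, drop the far ones, and sort the closest first.
    scored = [(levenshtein(query.lower(), n.lower()), n) for n in names]
    return [n for d, n in sorted(scored) if d <= max_distance]


print(rank_by_distance("report", ["reports", "rapport", "report", "budget"]))
```

Without the sort, a distance cutoff alone returns the near and far matches mixed together, which is the unwanted-results problem mentioned above.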
ChrisGreaves
Posts: 684
Joined: Wed Jan 05, 2022 9:29 pm

Re: soundex

Post by ChrisGreaves »

void wrote: Thu Jun 27, 2024 10:42 pm
Sounds like you want Levenshtein distance.
Thank you Void.
The section "Upper and lower bounds" suggests a means to do a preliminary filter of potential matches. Where the bounds are given as e.g. "It is at most the length of the longer string" one could sort the strings by ascending length, and treat the longest strings last as having a lesser probability of finding a match.
I haven't thought about that in detail; just tucked it away for now.
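
A rough Python sketch of such a preliminary filter, using two bounds from that section: the distance is at least the difference in lengths and at most the length of the longer string. The helper names and the max_distance parameter are assumptions for illustration.

```python
def distance_bounds(a, b):
    # Lower bound: difference in lengths. Upper bound: length of the longer string.
    lower = abs(len(a) - len(b))
    upper = max(len(a), len(b))
    return lower, upper


def worth_checking(query, names, max_distance):
    # Drop names that cannot be within max_distance, then try shortest first.
    keep = [n for n in names if distance_bounds(query, n)[0] <= max_distance]
    return sorted(keep, key=len)


print(worth_checking("hello", ["hello world", "help", "hallo", "hi"], 2))
```

Only the names that survive this cheap check would then need the full distance calculation.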
Cheers, Chris
meteorquake
Posts: 504
Joined: Thu Dec 15, 2016 9:44 pm

Re: soundex

Post by meteorquake »

Looking at that article you would need to solve word rearrangement.
For example, "Edinburgh Tasks" should be viewed as almost matching "Tasks Edinburgh".
One of the reasons I adopted a simple word pair count for my own searching is that rearranged blocks come out with a score close to the original, as do small drops, changes and insertions. The price you pay is that some surprises get included, but I've always thought it's better to have a few false inclusions than to have some good matches not showing.
It may be there are some very optimised processes to tackle block rearrangement, although if a routine is made too intricate it can run the risk of being slower than desirable... possibly not an issue for 500,000 filenames with something written in Assembly.
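
One cheap way to make a comparison insensitive to word order is to sort the words before comparing. The Python sketch below shows that swapped-in technique (it is not the pair count described above, and not a known optimised block-rearrangement routine); token_sort_key() and same_words_any_order() are names invented for this example.

```python
def token_sort_key(text):
    # Lowercase, split on whitespace, and put the words in a fixed order.
    return " ".join(sorted(text.lower().split()))


def same_words_any_order(a, b):
    # True when both strings contain the same words, whatever their order.
    return token_sort_key(a) == token_sort_key(b)


print(token_sort_key("Tasks Edinburgh"))
print(same_words_any_order("Edinburgh Tasks", "Tasks Edinburgh"))
```

The sorted keys could then be passed to an edit distance such as Levenshtein, so that typos within the words are still tolerated.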
d