soundex
Re: soundex
1. This option is very useful! Could you please add a setting to make it the default?
I will look into an option to make this easier.
For now, please try creating the following soundex filter:
- In Everything, from the Search menu, click Add to filters....
- Change the Name to: Soundex
- Change the Search to: soundex:$param:
- Click OK.
The problem here is you can only have one search term when this filter is active.
(this search modifier was never really designed to be used with multiple terms)
soundex:
https://en.wikipedia.org/wiki/Soundex
2. Could you please add support for letters in the Hebrew language?
This is outside the scope of soundex.
Only A-Z is supported.
I will look into other algorithms that support Hebrew.
Thank you for the suggestions.
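To illustrate why only A-Z is covered, here is a minimal sketch of the classic American Soundex coding described in the Wikipedia article linked above. It is not taken from Everything's source; the function name `soundex` and the choice to return an empty string for input with no A-Z letters are assumptions made for illustration.

```python
# Classic American Soundex: the code table only exists for the letters A-Z.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the 4-character Soundex code, or "" if the word has no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""  # assumption: non-Latin input (e.g. Hebrew) simply has no code
    code = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (code + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))  # both names get the same code
```

A Hebrew name contains none of the coded letters, so this sketch has nothing to work with, which is the limitation mentioned above.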
Re: soundex
Thank you for your detailed reply!
I will look into other algorithms that support Hebrew.
If it is possible, with the option to make it the default (with a setting in "Advanced"?), that would be great!
This is especially necessary in Hebrew, because in Hebrew there are letters "י" and "ו" that some write and some omit them, for example there are those who write "ביאור" and there are those who write "באור", there are those who write "שלחן" and there are those who write "שולחן", and many more. (You can do a search for "ב*אור" and "ש*לחן" but because there are so many such words, it would be more helpful if there was such a default option).
I also use a program called "Fluent Search" a lot, and choose there your search engine, and if there was such a default setting in "Everything" it would be useful there too.
On this occasion I would like to personally thank you for your software, I use it a lot!
Re: soundex
Wonder if something in the Diacritics end could work?
Or even a filter that filtered out vowels?
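As a rough illustration of the filter idea, the sketch below strips the letters ו and י from both names before comparing them. The names `skeleton` and `same_skeleton` and the choice of exactly these two letters are assumptions for illustration, not anything Everything provides. It also removes those letters where they act as consonants, so it will over-match.

```python
# Letters that some writers include and others omit (assumption: vav and yod only).
OPTIONAL_LETTERS = "\u05d5\u05d9"  # vav, yod

def skeleton(name):
    """Return the name with the optional letters removed."""
    return "".join(ch for ch in name if ch not in OPTIONAL_LETTERS)

def same_skeleton(a, b):
    """True when both names are identical once the optional letters are dropped."""
    return skeleton(a) == skeleton(b)
```

With this, "ביאור" and "באור" compare equal, as do "שלחן" and "שולחן".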
Re: soundex
The trick is to use a language-independent algorithm that doesn't analyse sound, like the one I mentioned that in essence simply counts up the Find's letter pairs present in the Target; I'm sure there may be better as I've not researched what algorithms there are but I found it works very well.
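A minimal sketch of that letter-pair idea, for anyone who wants to try it. The function names and the division by the number of pairs in the Find string are assumptions; the description above only says the pairs are counted.

```python
def letter_pairs(text):
    """All adjacent two-character pairs of the text, lowercased."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]

def pair_score(find, target):
    """Fraction of the Find string's letter pairs that also occur in the Target."""
    pairs = letter_pairs(find)
    if not pairs:
        return 0.0
    target_pairs = set(letter_pairs(target))
    hits = sum(1 for p in pairs if p in target_pairs)
    return hits / len(pairs)

print(pair_score("serch", "search"))  # 3 of the 4 pairs are present
```

Nothing here depends on the alphabet, so it treats Hebrew names the same way as Latin ones.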
d
Re: soundex
meteorquake wrote: ↑Wed Jun 26, 2024 9:04 pm
The trick is to use a language-independent algorithm that doesn't analyse sound, like the one I mentioned that in essence simply counts up the Find's letter pairs present in the Target; ...
Hi meteorquake. I agree that a language-independent algorithm would be good.
In particular a mobile algorithm (which a language-independent algorithm satisfies) is good.
I am thinking of a spell-checker using symbol-pairs to match likely suggestions for corrections. A bonus would/should be the ability to include longer strings, for example: phrases built of hyphenated or space-separated strings.
Cheers, Chris
Re: soundex
It should work all the same. However if doing that and you're looking for total matching you could speed it up by only searching words whose length is within a certain % of the Find phrase. For example you'd not expect hello to be a match for a 12 letter sequence. You'd just be looking at character symbols and you might or might not want to completely ignore anything else (so "It was highly-priced" would be "itwashighlypriced") or treat all sequences of non-characters as a single space.
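The two normalisations and the length check described above could look something like this. It is only a sketch: the function names and the default `tolerance` of 50% are made up for illustration, and digits are kept alongside letters as an assumption.

```python
import re

def letters_only(text):
    """Drop everything that is not a letter or digit."""
    return "".join(ch for ch in text.lower() if ch.isalnum())

def collapse_separators(text):
    """Treat every run of non-letter, non-digit characters as a single space."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()

def length_compatible(find, target, tolerance=0.5):
    """True if the target's length is within tolerance (a fraction) of the Find length."""
    return abs(len(target) - len(find)) <= tolerance * len(find)

print(letters_only("It was highly-priced"))         # itwashighlypriced
print(collapse_separators("It was highly-priced"))  # it was highly priced
```

Running `length_compatible` first means the more expensive comparison is only done on candidates of a plausible length.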
d
Re: soundex
meteorquake wrote: ↑Thu Jun 27, 2024 2:41 pm
... only searching words whose length is within a certain % of the Find phrase.
Quite so! And those values would be part of my parameters of the application.
Thanks, Chris
Re: soundex
Sounds like you want Levenshtein distance.
I have plans to add support for this.
However, it will require sorting results by relevance.
There are too many unwanted results without a relevance sort.
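For reference, a small sketch of Levenshtein distance with results ordered by relevance (smallest distance first). This is a textbook dynamic-programming version, not how Everything does or will implement it; `rank_by_distance` and its `max_distance` cutoff are hypothetical names for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def rank_by_distance(query, names, max_distance=2):
    """Names within max_distance edits of the query, closest first."""
    scored = [(levenshtein(query.lower(), name.lower()), name) for name in names]
    return [name for distance, name in sorted(scored) if distance <= max_distance]

print(rank_by_distance("shulchan", ["shulhan", "shulchan", "table"]))
```

Without the cutoff and the sort, every name would be a "result" at some distance, which is the problem with unwanted matches.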
Re: soundex
Thank you Void.
The section "Upper and lower bounds" suggests a means to do a preliminary filter of potential matches. Where the bounds are given as e.g. "It is at most the length of the longer string" one could sort the strings by ascending length, and treat the longest strings last as having a lesser probability of finding a match.
I haven't thought about that in detail; just tucked it away for now.
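One way such a preliminary filter might look, using the lower bound from that section (the distance is at least the difference in length of the two strings). This is a sketch only: `length_prefilter` and `max_distance` are hypothetical names, and it sorts by length difference rather than by plain ascending length, which is a variation on the idea above.

```python
def length_prefilter(query, names, max_distance):
    """Keep names that length alone does not rule out, closest lengths first.

    The edit distance can never be smaller than the length difference, so any
    name whose length differs by more than max_distance cannot match.
    """
    survivors = [n for n in names if abs(len(n) - len(query)) <= max_distance]
    return sorted(survivors, key=lambda n: abs(len(n) - len(query)))

print(length_prefilter("helo", ["hello", "a much longer name", "help"], 1))
```

Only the survivors would then need the full distance calculation.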
Cheers, Chris
Re: soundex
Looking at that article you would need to solve word rearrangement.
For example Edinburgh Tasks should be viewed as almost matching Tasks Edinburgh.
One of the reasons I adopted a simple word pair count for my own searching is that rearranged blocks will come out with a close score to the original as will small drops, changes and insertions. The price you pay is you can get some surprises included, but I've always thought it's better to have a few false inclusions than some good ones not showing.
It may be there are some very optimised processes to tackle block rearrangement, although if a routine is made too intricate it can run the risk of being slower than desirable... possibly not an issue for 500,000 filenames with something written in Assembly.
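To show the rearrangement point with the Edinburgh example, here is a sketch that counts shared pairs of adjacent characters, which is an assumption about what the pair count means. `pair_overlap` is a hypothetical name, and this version counts repeated pairs with their multiplicity.

```python
from collections import Counter

def pair_counts(text):
    """Multiset of adjacent two-character pairs, lowercased."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def pair_overlap(find, target):
    """Share of the Find string's pairs that are also available in the Target."""
    find_pairs = pair_counts(find)
    total = sum(find_pairs.values())
    if total == 0:
        return 0.0
    shared = sum((find_pairs & pair_counts(target)).values())
    return shared / total

# Swapping the two words only breaks the pairs that touch the space.
print(pair_overlap("Edinburgh Tasks", "Tasks Edinburgh"))
```

Here 12 of the 14 pairs survive the swap, so the rearranged name still scores close to a full match, while a position-based edit distance would treat it as a very different string.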
d