Pedserve provides two types of similarity searching for text fields: "double metaphone" phonetic searches, and "Levenshtein distance" searches for similarly spelled words.

Double Metaphone

In June 2000 Lawrence Phillips published a highly regarded algorithm called Double Metaphone in the C/C++ Users Journal for determining when two words "sound alike" according to English pronunciation. Pedserve uses this to let you search for fields containing words or phrases that "sound like" the search pattern entered.

Levenshtein Distance

The Levenshtein distance is a measure of how close two strings are, in terms of the number of single-character insertions, deletions and substitutions needed to transform one string into the other. The lower the Levenshtein distance, the more similar the strings are. It is a commonly used technique for finding misspellings, and it will find misspellings that fundamentally alter how a word "sounds", so it is generally better at finding misspellings than a phonetic search. The downside is that it is relatively slow, as it requires computing the Levenshtein distance between the search text and the given search field for every record being considered. With a large database, this can be a lot of work. You can customize this search by altering the maximum Levenshtein distance at which two strings are considered a match. The Levenshtein distance is named after Vladimir Levenshtein, who published it in 1965.

For example, searching the Standfast Data Golden Retriever Database for dogs whose name sounds like "Aarondale Duke" also turns up "Earndell Duke", while a "spelled like" search for "Aarondale Duke" will return "Maundale Duke".

Similarity searching is a feature of the Advanced Edition and requires that you have a dedicated server.
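
To make the Levenshtein distance and the "maximum distance" setting described above more concrete, here is a minimal Python sketch. It is not Pedserve's own code: the function names `levenshtein` and `spelled_like`, the `max_distance` parameter, and the case-insensitive comparison are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the first i-1 characters
    # of a and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


def spelled_like(pattern: str, values: list[str], max_distance: int) -> list[str]:
    """Hypothetical helper: keep the values whose distance from the
    pattern is at most max_distance. Comparing in lower case is an
    assumption, not documented Pedserve behaviour."""
    p = pattern.lower()
    # Every value is compared against the pattern, which is why this
    # kind of search gets slow on a large set of records.
    return [v for v in values if levenshtein(p, v.lower()) <= max_distance]


names = ["Maundale Duke", "Earndell Duke"]
print(spelled_like("Aarondale Duke", names, max_distance=3))
```

Under these assumptions, "Maundale Duke" is three edits away from "Aarondale Duke" and is kept, while "Earndell Duke" is four edits away and is dropped. Raising `max_distance` makes the search more forgiving but returns more loosely related matches.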
©1996-2007
Tenset Tech. Ltd
All Rights Reserved