Fixed to work correctly with Cyrillic symbols. by injDakov · Pull Request #7 · JakeBayer/FuzzySharp · GitHub

Fixed to work correctly with Cyrillic symbols. #7

Open
injDakov wants to merge 1 commit into JakeBayer:master from injDakov:master
Conversation

@injDakov injDakov commented Sep 2, 2019

No description provided.

@JakeBayer

Owner

@injDakov

Author

Hi Jake,
I work with a large number of Cyrillic strings, and that is where I run into issues.

My workaround is similar to the one you suggested.

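method ()
{
    ...
    // Pass a custom processor so Cyrillic letters survive preprocessing.
    var results = Process.ExtractAll(queryName, namesList, input => ProcessorFunction(input), cutoff: cutoffValue).ToList();
    ...
}

private static string ProcessorFunction(string input)
{
    // Replace everything except spaces, Latin letters, digits, and Cyrillic letters with a space.
    // ё and Ё are listed explicitly because they fall outside the а-я and А-Я ranges.
    input = Regex.Replace(input, "[^ a-zA-Z0-9а-яА-ЯёЁ]", " ");
    input = input.ToLower();
    return input.Trim();
}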

But in my opinion, the library itself should support both the Latin and Cyrillic alphabets without a workaround.
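A minimal sketch of why a workaround is needed, assuming the default preprocessing strips everything outside a Latin-only character class such as "[^ a-zA-Z0-9]". The class and method names below (PreprocessDemo, LatinOnly, LatinAndCyrillic) are hypothetical and are not part of FuzzySharp; they only contrast the two patterns.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PreprocessDemo
{
    // Assumed Latin-only pattern: every Cyrillic letter is replaced by a space.
    public static string LatinOnly(string input) =>
        Regex.Replace(input, "[^ a-zA-Z0-9]", " ").ToLowerInvariant().Trim();

    // Extended pattern from the workaround: Cyrillic letters are kept.
    // ё/Ё are listed explicitly because they lie outside the а-я / А-Я ranges.
    public static string LatinAndCyrillic(string input) =>
        Regex.Replace(input, "[^ a-zA-Z0-9а-яА-ЯёЁ]", " ").ToLowerInvariant().Trim();

    public static void Main()
    {
        // With the Latin-only pattern the whole string collapses to "".
        Console.WriteLine($"[{LatinOnly("Иван Петров!")}]");
        // With the extended pattern the name survives, lowercased.
        Console.WriteLine($"[{LatinAndCyrillic("Иван Петров!")}]");
    }
}
```

If the preprocessed string is empty, every Cyrillic candidate looks identical to the scorer, which would be consistent with the issues described above.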

@ahamidou ahamidou Feb 26, 2022


This is an interesting and good initiative.
I think a better way to do this is to update the PreprocessMode enum to accept languages; the current enum is confusing, and Full vs None does not make much sense.
Flags also make sense in case I'm working with more than one language.
I propose the following:

[Flags]
public enum PreprocessMode
{
    NotSet = 0,
    English = 1,
    Russian = 2,
    Gibberish = 4 // flag values must be distinct powers of two; 5 would overlap with English
}

Then here, in this method, use the correct pattern(s):
If PreprocessMode == 1 then pattern = "[^ a-zA-Z0-9]"; // English
If PreprocessMode == 2 then pattern = "[^ а-яА-Я0-9]"; // Russian
If PreprocessMode == 3 then pattern = "[^ a-zA-Z0-9а-яА-Я]"; // Both English & Russian
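The flag-to-pattern mapping described above could be sketched roughly as follows. This is only an illustration under assumptions: LanguageMode, LanguagePatterns, BuildPattern, and Preprocess are hypothetical names, not FuzzySharp APIs, and the Russian range adds ё/Ё explicitly since they fall outside а-я / А-Я.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical flags enum; each language is a distinct power of two
// so that values can be combined with the | operator.
[Flags]
public enum LanguageMode
{
    NotSet = 0,
    English = 1,
    Russian = 2
}

public static class LanguagePatterns
{
    // Builds a negated character class from the selected flags.
    // Spaces and digits are always kept.
    public static string BuildPattern(LanguageMode mode)
    {
        var allowed = new StringBuilder(" 0-9");
        if (mode.HasFlag(LanguageMode.English)) allowed.Append("a-zA-Z");
        if (mode.HasFlag(LanguageMode.Russian)) allowed.Append("а-яА-ЯёЁ");
        return "[^" + allowed + "]";
    }

    // Replaces disallowed characters with spaces, lowercases, and trims.
    public static string Preprocess(string input, LanguageMode mode) =>
        Regex.Replace(input, BuildPattern(mode), " ").ToLowerInvariant().Trim();
}
```

Building the class from flags avoids one hard-coded pattern per combination: `LanguageMode.English | LanguageMode.Russian` keeps both alphabets, and a further language would need only one more `Append` line.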

Finally, even the name PreprocessMode isn't very descriptive; LanguageProcessor or something similar would be a better name.


3 participants