
injDakov · 2019-09-02T19:28:19Z

No description provided.

JakeBayer · 2019-09-04T13:33:44Z

injDakov · 2019-09-12T15:02:54Z

Hi Jake,
I work with a large number of Cyrillic strings, and that is where I run into issues.

My workaround is similar to what you suggested.

method ()
{
    ...
    var results = Process.ExtractAll(queryName, namesList, input => ProcessorFunction(input), cutoff: cutoffValue).ToList();
    ...
}

// Replace anything that is not a space, a digit, or a Latin or Cyrillic letter
// with a space, then lower-case and trim the result.
private static string ProcessorFunction(string input)
{
    input = Regex.Replace(input, "[^ a-zA-Z0-9а-яА-Я]", " ");
    input = input.ToLower();
    return input.Trim();
}

Still, in my opinion the library should support both the Latin and Cyrillic alphabets natively, without a workaround.
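
To illustrate why the workaround is needed, here is a minimal sketch comparing a Latin-only character class with the one used in `ProcessorFunction` above. The class name `PatternDemo` and the `Clean` helper are hypothetical, and the Latin-only pattern is assumed to match the library's default as it appears in the diff on this PR. Only `System.Text.RegularExpressions` is used.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Assumed to mirror the library's Latin-only default pattern.
    public const string LatinOnly = "[^ a-zA-Z0-9]";

    // Pattern from the workaround: also keeps the Cyrillic ranges а-я and А-Я.
    public const string LatinAndCyrillic = "[^ a-zA-Z0-9а-яА-Я]";

    // Replace every character outside the allowed class with a space,
    // then lower-case and trim, as ProcessorFunction does.
    public static string Clean(string input, string pattern)
    {
        return Regex.Replace(input, pattern, " ").ToLower().Trim();
    }

    public static void Main()
    {
        // With the Latin-only class every Cyrillic letter becomes a space,
        // so nothing is left after trimming.
        Console.WriteLine("[" + Clean("Иван Петров", LatinOnly) + "]");        // []
        Console.WriteLine("[" + Clean("Иван Петров", LatinAndCyrillic) + "]"); // [иван петров]
    }
}
```

With the Latin-only class, every purely Cyrillic input collapses to an empty string, so all such strings look identical to the scorer.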

ahamidou · 2022-02-26T07:02:11Z

    internal class StringPreprocessorFactory
    {
-        private static string pattern = "[^ a-zA-Z0-9]";
+        private static string pattern = "[^ a-zA-Z0-9а-зА-З]";
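
As a side check on the character class in the diff above, the sketch below counts how many lowercase Russian letters a given range matches. In Unicode order `а-з` spans U+0430 to U+0437 (8 letters), `а-я` spans U+0430 to U+044F (32 letters), and `ё` (U+0451) lies outside both. `RangeCheck` and `CountMatches` are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RangeCheck
{
    // The 33 lowercase letters of the Russian alphabet.
    public const string Alphabet = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя";

    // Count how many letters of the alphabet the given character class matches.
    public static int CountMatches(string characterClass)
    {
        var regex = new Regex("^" + characterClass + "$");
        return Alphabet.Count(c => regex.IsMatch(c.ToString()));
    }

    public static void Main()
    {
        Console.WriteLine(CountMatches("[а-з]"));  // 8
        Console.WriteLine(CountMatches("[а-я]"));  // 32
        Console.WriteLine(CountMatches("[а-яё]")); // 33
    }
}
```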


This is an interesting and good initiative.
I think a better way to do this is to update the PreprocessMode enum to accept a language. As it stands the enum is confusing, and Full vs None does not make much sense.
Flags also make sense when working with more than one language.
I propose the following:

[Flags] public enum PreprocessMode { NotSet = 0, English = 1, Russian = 2, Gibberish = 4 }

Then, in this method, use the pattern(s) that match the selected flags:
If PreprocessMode == 1, then pattern = "[^ a-zA-Z0-9]"; // English
If PreprocessMode == 2, then pattern = "[^ а-зА-З0-9]"; // Russian
If PreprocessMode == 3, then pattern = "[^ a-zA-Z0-9а-зА-З]"; // Both English and Russian
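
A minimal sketch of how the flag-based selection proposed above could look. `LanguageFlags` and `PatternBuilder` are hypothetical names and are not part of the library. The Russian class is written as `а-яА-ЯёЁ`, following the workaround quoted earlier in this thread plus `ё`/`Ё`; that is an assumption, not the pattern from the diff.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical flags enum: each language is a distinct power of two,
// so values can be combined with the | operator.
[Flags]
public enum LanguageFlags { NotSet = 0, English = 1, Russian = 2 }

public static class PatternBuilder
{
    // Build a negated character class from the selected languages.
    // Spaces and digits are always kept.
    public static string Build(LanguageFlags languages)
    {
        var allowed = new StringBuilder(" 0-9");
        if (languages.HasFlag(LanguageFlags.English)) allowed.Append("a-zA-Z");
        if (languages.HasFlag(LanguageFlags.Russian)) allowed.Append("а-яА-ЯёЁ");
        return "[^" + allowed.ToString() + "]";
    }

    public static void Main()
    {
        var both = LanguageFlags.English | LanguageFlags.Russian;
        string pattern = Build(both);
        // Punctuation is replaced by spaces; letters of both alphabets survive.
        Console.WriteLine(Regex.Replace("Hello, мир!", pattern, " ").Trim());
    }
}
```

Building the class from flags avoids one hard-coded pattern per combination, so adding a language means adding one flag and one range.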

Finally, even the name PreprocessMode isn't very descriptive. Maybe LanguageProcessor or something like that would be a better name.

Fixed to work correctly with Cyrillic symbols.

c10ecd8

ahamidou reviewed Feb 26, 2022


This was referenced Aug 12, 2024

Performance optimization #48 (Open)

Less alloc add raffinert company name Raffinert/FuzzySharp#1 (Merged)


Fixed to work correctly with Cyrillic symbols. #7

injDakov wants to merge 1 commit into JakeBayer:master from injDakov:master
