Make sure we use the `C` locale for `regcomp` on macOS. by arthurschreiber · Pull Request #3953 · libgit2/libgit2 · GitHub
Skip to content

Make sure we use the C locale for regcomp on macOS.#3953

Merged
carlosmn merged 1 commit into
libgit2:masterfrom
arthurschreiber:arthur/fix-regcomp-locale-issues
Oct 6, 2016
Merged

Make sure we use the C locale for regcomp on macOS.#3953
carlosmn merged 1 commit into
libgit2:masterfrom
arthurschreiber:arthur/fix-regcomp-locale-issues

Conversation

@arthurschreiber

@arthurschreiber arthurschreiber commented Oct 5, 2016

Copy link
Copy Markdown
Member

On macOS (and probably on other systems as well), regcomp behaves differently based on the current locale.

On macOS, this was causing some issues when enabling diff drivers: The [\xc0-\xff][\x80-\xbf] part of the regular expression defined at https://github.com/libgit2/libgit2/blob/master/src/userdiff.h#L23 was incorrectly parsed as multibyte if the locale character type was set to UTF-8 (the default on macOS), and made regcomp fail with a REG_ILLSEQ error.

Fortunately, macOS ships with regcomp_l, which allows specifying a fixed locale instead of using whatever locale was set on the system or via environment variables.

/cc @carlosmn @ethomson

Comment thread CMakeLists.txt Outdated

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use CHECK_SYMBOL_EXISTS(regcomp_l "xlocale.h" HAVE_REGCOMP_L) instead?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what it looks like from its documentation. I wonder if CHECK_FUNCTION_EXISTS might work here as well, since it's system-provided.

@arthurschreiber arthurschreiber force-pushed the arthur/fix-regcomp-locale-issues branch 2 times, most recently from c4fedd9 to 74d243e Compare October 5, 2016 13:11
Comment thread src/unix/posix.h
#include <xlocale.h>
GIT_INLINE(int) p_regcomp(regex_t *preg, const char *pattern, int cflags)
{
return regcomp_l(preg, pattern, cflags, (locale_t) 0);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If (locale_t) 0 defined somewhere to be special? I've found LC_GLOBAL_LOCALE in the headers which is defined as (locale_t) -1 but not for 0.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I can't find anything either.

Initially, I created a new locale through newlocale(LC_ALL_MASK, "C", (locale_t) 0);, but I tested using (locale_t) 0 directly and that seems to work as well.

We could change it back to using newlocale and storing that locale somewhere if you think that'd be safer?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From Apple's macOS source code release:

int
tre_regncomp_l(regex_t *preg, const char *regex, size_t n, int cflags, locale_t loc)
{
  ...

  NORMALIZE_LOCALE(loc);

  ...
}
#define NORMALIZE_LOCALE(x) if ((x) == NULL) { \
    (x) = _c_locale; \
} else if ((x) == LC_GLOBAL_LOCALE) { \
    (x) = &__global_locale; \
}

So we should be fine passing (locale_t) 0 (or even just NULL) here. Do you want me to change anything here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newlocale does mention LC_GLOBAL_LOCALE as not being a valid input as the base, but (locale_t) 0 is, so I suppose it's fine to leave it with 0.

@carlosmn

carlosmn commented Oct 5, 2016

Copy link
Copy Markdown
Member

It'd be great to have a test here so we don't break this in the future. It should also let us know whether this kind of thing would fail in systems we're not aware of yet.

@arthurschreiber

Copy link
Copy Markdown
Member Author

It'd be great to have a test here so we don't break this in the future. It should also let us know whether this kind of thing would fail in systems we're not aware of yet.

What git does for testing the userdiff patterns is just looping over every pattern and trying to regcomp each one. Would that be good enough as a test?

@carlosmn

carlosmn commented Oct 5, 2016

Copy link
Copy Markdown
Member

Yes, that should be enough. The main thing would be to test out few patterns which we know should fail if we don't do this fix, to make sure this does do something.

@arthurschreiber arthurschreiber force-pushed the arthur/fix-regcomp-locale-issues branch from dbc9985 to 371acaa Compare October 5, 2016 16:02
@arthurschreiber

Copy link
Copy Markdown
Member Author

I added some tests, I hope they cover the basic functionality here.

@carlosmn

carlosmn commented Oct 5, 2016

Copy link
Copy Markdown
Member

@arthurschreiber arthurschreiber force-pushed the arthur/fix-regcomp-locale-issues branch from 0812acc to ab96ca5 Compare October 6, 2016 11:17
@carlosmn carlosmn merged commit d11fcf8 into libgit2:master Oct 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants