Make sure we use the C locale for regcomp on macOS.#3953
Conversation
There was a problem hiding this comment.
Should this use CHECK_SYMBOL_EXISTS(regcomp_l "xlocale.h" HAVE_REGCOMP_L) instead?
There was a problem hiding this comment.
Yes, that's what it looks like from its documentation. I wonder if CHECK_FUNCTION_EXISTS might work here as well, since it's system-provided.
c4fedd9 to
74d243e
Compare
| #include <xlocale.h> | ||
| GIT_INLINE(int) p_regcomp(regex_t *preg, const char *pattern, int cflags) | ||
| { | ||
| return regcomp_l(preg, pattern, cflags, (locale_t) 0); |
There was a problem hiding this comment.
If (locale_t) 0 defined somewhere to be special? I've found LC_GLOBAL_LOCALE in the headers which is defined as (locale_t) -1 but not for 0.
There was a problem hiding this comment.
Unfortunately, I can't find anything either.
Initially, I created a new locale through newlocale(LC_ALL_MASK, "C", (locale_t) 0);, but I tested using (locale_t) 0 directly and that seems to work as well.
We could change it back to using newlocale and storing that locale somewhere if you think that'd be safer?
There was a problem hiding this comment.
From Apple's macOS source code release:
int
tre_regncomp_l(regex_t *preg, const char *regex, size_t n, int cflags, locale_t loc)
{
...
NORMALIZE_LOCALE(loc);
...
}#define NORMALIZE_LOCALE(x) if ((x) == NULL) { \
(x) = _c_locale; \
} else if ((x) == LC_GLOBAL_LOCALE) { \
(x) = &__global_locale; \
}So we should be fine passing (locale_t) 0 (or even just NULL) here. Do you want me to change anything here?
There was a problem hiding this comment.
newlocale does mention LC_GLOBAL_LOCALE as not being a valid input as the base, but (locale_t) 0 is, so I suppose it's fine to leave it with 0.
|
It'd be great to have a test here so we don't break this in the future. It should also let us know whether this kind of thing would fail in systems we're not aware of yet. |
What |
|
Yes, that should be enough. The main thing would be to test out few patterns which we know should fail if we don't do this fix, to make sure this does do something. |
dbc9985 to
371acaa
Compare
|
I added some tests, I hope they cover the basic functionality here. |
0812acc to
ab96ca5
Compare

On macOS (and probably on other systems as well),
regcompbehaves differently based on the currentlocale.On macOS, this was causing some issues when enabling diff drivers: The
[\xc0-\xff][\x80-\xbf]part of the regular expression defined at https://github.com/libgit2/libgit2/blob/master/src/userdiff.h#L23 was incorrectly parsed as multibyte if the locale character type was set toUTF-8(the default on macOS), and maderegcompfail with aREG_ILLSEQerror.Fortunately, macOS ships with
regcomp_l, which allows specifying a fixed locale instead of using whatever locale was set on the system or via environment variables./cc @carlosmn @ethomson