Fix use-of-uninitialized-value in StringSearcher.h #100225
Merged
LLVM Coverage Report: PR changed-lines coverage: 100.00% (26/26, 0 noise lines excluded)
evillique approved these changes on Mar 31, 2026
This was referenced on Mar 31, 2026
robot-clickhouse added three commits that referenced this pull request on Mar 31, 2026
thevar1able added a commit that referenced this pull request on Apr 14, 2026: Backport #100225 to 26.1: Fix use-of-uninitialized-value in StringSearcher.h
thevar1able added a commit that referenced this pull request on Apr 14, 2026: Backport #100225 to 26.2: Fix use-of-uninitialized-value in StringSearcher.h
thevar1able added a commit that referenced this pull request on Apr 14, 2026: Backport #100225 to 26.3: Fix use-of-uninitialized-value in StringSearcher.h
This was referenced May 18, 2026
leshikus pushed a commit to leshikus/ClickHouse that referenced this pull request on May 19, 2026
…earcher
`StringSearcher<false, false>` (the UTF-8 case-insensitive variant used by
`ilike`, `positionCaseInsensitiveUTF8`, `startsWithCaseInsensitiveUTF8`,
`endsWithCaseInsensitiveUTF8`, `multiSearchAnyCaseInsensitiveUTF8`, and
`VolnitskyCaseInsensitiveUTF8`) populates the `cachel` / `cacheu` SIMD
registers in its constructor from per-character lowercase / uppercase
UTF-8 byte buffers `l_seq` / `u_seq` (each 6 bytes on the stack,
uninitialized at declaration).
The first character is handled by the outer `if (*needle < 0x80u)` /
`else` block, where the `else` branch already copies bytes verbatim
when `convertUTF8ToCodePoint` reports an invalid UTF-8 sequence.
The same care was missing from the per-character cache loop. Whenever
`convertUTF8ToCodePoint` returned an empty optional for an invalid
UTF-8 sequence in the middle of the needle (so `c_u32` was null and the
`if (c_u32) { ... }` body was skipped), the inner loop still went on to
read `l_seq[j]` / `u_seq[j]` for `j = 0..src_len-1` and feed those
bytes into `cachel` / `cacheu` via `_mm_insert_epi8`. When `src_len`
exceeded the number of bytes written by a previous iteration, the
uninitialized portion of `l_seq` / `u_seq` was inserted into the SIMD
cache and surfaced later as a `MemorySanitizer` use-of-uninitialized-value
in `_mm_cmpeq_epi8` against the haystack in both `compare`
(`StringSearcher.h:496`) and `search` (`StringSearcher.h:563`).
The fix adds the missing `else` branch: when `c_u32` is empty, the
needle bytes are copied with `memcpy` into `l_seq` / `u_seq`, exactly
as for the first character.
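
A minimal sketch of that `else` branch, for illustration only. It is
not the actual `StringSearcher.h` code: `decodeAsciiOnly` and
`fillCaseSequences` are hypothetical names, and the decoder is a
deliberately simplified stand-in for `convertUTF8ToCodePoint` that
accepts only single-byte ASCII, so that both branches can be exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical stand-in for the real decoder: yields a code point for a
// single ASCII byte and an empty optional for anything else.
std::optional<uint32_t> decodeAsciiOnly(const uint8_t * pos, size_t len)
{
    if (len == 1 && *pos < 0x80u)
        return *pos;
    return std::nullopt;
}

// Fills l_seq / u_seq (6 bytes each, as in the description above) for one
// needle character that occupies src_len bytes starting at pos.
void fillCaseSequences(const uint8_t * pos, size_t src_len, uint8_t * l_seq, uint8_t * u_seq)
{
    assert(src_len >= 1 && src_len <= 6);

    if (auto c_u32 = decodeAsciiOnly(pos, src_len))
    {
        // Valid character: write its lowercase and uppercase forms.
        const uint8_t c = static_cast<uint8_t>(*c_u32);
        l_seq[0] = (c >= 'A' && c <= 'Z') ? static_cast<uint8_t>(c + 32) : c;
        u_seq[0] = (c >= 'a' && c <= 'z') ? static_cast<uint8_t>(c - 32) : c;
    }
    else
    {
        // Invalid sequence: copy the needle bytes verbatim, so every byte
        // later read from l_seq / u_seq for j < src_len has been written.
        std::memcpy(l_seq, pos, src_len);
        std::memcpy(u_seq, pos, src_len);
    }
}
```

Without the `else` branch, a caller that goes on to read `l_seq[j]` /
`u_seq[j]` for `j < src_len` would see whatever an earlier iteration
left in the buffers, or uninitialized stack bytes.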
This failure belongs to the same chronic family that PR
ClickHouse#100225 ("Fix use-of-uninitialized-value in `StringSearcher.h`")
partially addressed by adding `if (needle == needle_end) return true;`
to `compare` for the empty-needle case. That PR fixed the empty-needle
case; this PR closes the case of invalid UTF-8 in the middle of the
needle.
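
The empty-needle guard can be sketched as follows. This is a scalar,
case-sensitive illustration, not the real `compare`: the name
`compareAt` and its signature are hypothetical, and only the
`needle == needle_end` check is taken from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar stand-in for `compare`: does the needle occur at pos?
bool compareAt(const uint8_t * pos, const uint8_t * haystack_end,
               const uint8_t * needle, const uint8_t * needle_end)
{
    assert(needle <= needle_end);

    // Guard quoted in the text: an empty needle matches at any position,
    // so return before looking at any state derived from needle bytes.
    if (needle == needle_end)
        return true;

    const std::ptrdiff_t needle_size = needle_end - needle;
    if (haystack_end - pos < needle_size)
        return false;

    return std::memcmp(pos, needle, static_cast<size_t>(needle_size)) == 0;
}
```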
Reported by `Unit tests (msan, function_prop_fuzzer)` across many
unrelated master and PR runs (chronic noise under umbrella issue
ClickHouse#104877), with stack:
    StringSearcher.h:563:61 in DB::impl::StringSearcher<false, false>::search
    Volnitsky::Volnitsky
    MatchImpl::vectorVector
    FunctionLike (ilike)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix use-of-uninitialized-value in StringSearcher.h
Resolves #99165
Version info
26.4.1.456, 26.3.10.3, 26.2.15.4, 26.1.10.5