Revert "Revert "CI: run libFuzzers in PRs, w/o corpus upload""#99740
Conversation
| # arguments as usual, without any special measures, but initialization of libFuzer driver then should take | ||
| # arguments from FUZZER_ARGS environment variable. | ||
| use_fuzzer_args = parser.getboolean("CI", "FUZZER_ARGS", fallback=False) | ||
| # FUZZER_ARGS flag is used to make it deliver libFuzzer arguments throught FUZZER_ARGS environment variable |
There was a problem hiding this comment.
Minor typo in this comment block: throught → through, and libFuzer → libFuzzer.
There was a problem hiding this comment.
through is fixed, but libFuzer is still a typo in the same comment block (tests/fuzz/runner.py, line 193 in the current head). Please rename it to libFuzzer.
…x_parser_depth, max_parser_backtracks, max_query_size as options, set max_parser_depth = 100, max_parser_backtracks = 1000
…-workflow-conflicts-7a3b # Conflicts: # .github/workflows/pull_request.yml Co-authored-by: Yakov Olkhovskiy <yakov-olkhovskiy@users.noreply.github.com>
|
|
…-99530-ci-libfuzzer-for-prs
…-workflow-conflicts-7a3b # Conflicts: # .github/workflows/pull_request.yml Co-authored-by: Yakov Olkhovskiy <yakov-olkhovskiy@users.noreply.github.com>
…heriting constructor Resolve merge conflicts from PR #102159 (KQL conformance tests): - KQLStringFunctions.cpp: take master's getConvertedArgument rewrite for parse_json(dynamic(...)) - ParserKQLMakeSeries.cpp: take master's timespan.toSeconds() fix Update 18 sites in new/modified KQL parsers (ParserKQLCount, ParserKQLTop, ParserKQLUnion, ParserKQLJoin, ParserKQLQuery, ParserKQLStatement) to use the Pos(Tokens&, const Pos&) constructor instead of Pos(Tokens&, max_depth, max_backtracks), preserving recursion depth counters across re-tokenized sub-expressions. Regenerate .github/workflows/pull_request.yml. Co-authored-by: Yakov Olkhovskiy <yakov-olkhovskiy@users.noreply.github.com>
| try | ||
| { | ||
| size_t pos = 0; | ||
| uint64_t val = std::stoul(it->second, &pos); |
There was a problem hiding this comment.
std::stoul accepts signed text and wraps it to an unsigned value ("-1" -> 18446744073709551615), so invalid negative limits are silently treated as huge parser/AST limits instead of being rejected.
That breaks the contract of these CLI knobs and can disable the intended safety limits in CI by typo. Please reject signed inputs before conversion (or parse as unsigned with strict validation) and fail fast on invalid values.
| try | ||
| { | ||
| size_t pos = 0; | ||
| uint64_t val = std::stoul(it->second, &pos); |
There was a problem hiding this comment.
Same issue here: std::stoul accepts signed strings and wraps them to very large unsigned values ("-1" parses successfully), so malformed negative limits are silently accepted.
Please validate and reject signed input explicitly before conversion, then fail fast; otherwise parser/AST safety limits can be unintentionally neutralized.
…-workflow-conflicts-7a3b # Conflicts: # .github/workflows/pull_request.yml Co-authored-by: Yakov Olkhovskiy <yakov-olkhovskiy@users.noreply.github.com>
| out.append(*argument); | ||
| } | ||
|
|
||
| if (out.size() > DBMS_DEFAULT_MAX_QUERY_SIZE) |
There was a problem hiding this comment.
These checks use DBMS_DEFAULT_MAX_QUERY_SIZE instead of the active parser limit.
If max_query_size is configured above the default, KQL parsing can still throw here at the default threshold, which makes max_query_size ineffective for this path.
Please compare against the effective query-size limit used by this parse (instead of the hardcoded default), so KQL behavior stays consistent with normal SQL parsing limits.
LLVM Coverage ReportChanged lines: 76.61% (285/372) · Uncovered code |
Avoid calling CLD2 language detection on empty UTF-8 strings in detectLanguage and detectLanguageMixed. This prevents a MemorySanitizer use-of-uninitialized-value in UTF8OneCharLen observed in PR CI run #99740. Co-authored-by: Yakov Olkhovskiy <yakov-olkhovskiy@users.noreply.github.com>
Commit b5a006c ("add parameters for select_parser_fuzzer, set default threshoulds lower", May 11 2026) made the per-input parser limits in `select_parser_fuzzer` configurable via `-ignore_remaining_args=1` plus `-max_parser_backtracks=N`. As a side effect it changed the in-source default for `max_parser_backtracks` from `5000` to `DBMS_DEFAULT_MAX_PARSER_BACKTRACKS` (= 1,000,000), i.e. a 200x increase. For sibling `create_parser_fuzzer`, the same author followed up with commit e9453b7 ("tune up create_parser_fuzzer settings") that restored a fuzzer-safe budget by adding [fuzzer_arguments] max_parser_backtracks = 10000 to `tests/fuzz/create_parser_fuzzer.options`. The matching update to `tests/fuzz/select_parser_fuzzer.options` was missed, so `libFuzzer` runs of `select_parser_fuzzer` parse each input with a 1,000,000-step backtrack budget. On pathological inputs (deeply-nested parentheses with mixed function-call / subquery shapes — exactly the pattern of closed issue ClickHouse#92282) the parser spends 20-26s before bailing, which is above the 20s `libFuzzer` timeout. CIDB confirms the regression: day select_parser_fuzzer FAIL/ERROR distinct PRs 2026-04-19 1 1 (PR ClickHouse#99740, author's branch) ... 2026-05-11 1 1 (PR ClickHouse#99740) 2026-05-13 24 24 (24 unrelated PRs) 2026-05-14 13 13 (13 unrelated PRs) The spike on 2026-05-13 starts exactly the first day master PRs picked up commit b5a006c. Every recent failure is a `timeout-*` artefact of the form `timeout after 20-26 seconds`, matching issue ClickHouse#92282. The fix is the minimum config change to restore the previously known- good per-input backtrack budget for `select_parser_fuzzer`, mirroring the `create_parser_fuzzer.options` fix. Local mechanism reproduction (clickhouse-local, debug build): SELECT te(((...((SELECT tuple(... (~470 bytes, malformed) --max_parser_depth=300 --max_parser_backtracks=10000 -> 0.16s (TOO_SLOW_PARSING) --max_parser_depth=300 --max_parser_backtracks=1000000 -> 0.92s (~6x slower for the same input) Pre-PR validation gate: - Deterministic repro: yes (see above). - Root cause explained: yes (regression date matches commit b5a006c). - Fix matches root cause: yes (restore 10k backtrack budget that was intentionally set in PR ClickHouse#92325 to fix the timeout class in ClickHouse#92282). - Test intent preserved: yes — fuzzer still exercises the SELECT parser with the same depth (300/150), AST element / depth limits, and corpus. Only the per-input backtrack budget changes. - Both directions demonstrated: yes — local clickhouse-local repro shows the 6x time difference; CIDB shows the regression spike exactly aligned with the commit. Refs: - Tracker issue ClickHouse#93438 ("libFuzzer tests are flaky"). - Originating closed issue ClickHouse#92282 (`select_parser_fuzzer` timeout on deeply nested parens). - Sibling fix: commit e9453b7 (`create_parser_fuzzer.options`). - Regression: commit b5a006c (`select_parser_fuzzer.cpp`). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`clickhouse_fuzzer` runs the full `clickhouse-local` binary (see
`programs/local/fuzzers/clickhouse_fuzzer.cpp`) which spins up a complete
clickhouse-server-grade runtime: JeMalloc arenas, OpenSSL initializer,
static initializers, and full query execution. Baseline RSS is close to
the libFuzzer default `rss_limit_mb=2048`, so fuzzed inputs that trigger
even moderate query-time allocations push the process past the limit.
CIDB shows 7 unrelated PRs in 7 days all hitting OOM at 2049-2056 Mb —
just barely over the 2048 Mb default. Failure mode is uniform:
libFuzzer out-of-memory (used: 2049-2056Mb; limit: 2048Mb) oom-<hash>
Bumping to 4096 Mb follows existing precedent for fuzzers with similar
workloads:
- `execute_query_fuzzer.options`: rss_limit_mb = 4096 (also runs query
execution; same author bumped this in commit bafb7d7)
- `data_type_deserialization_fuzzer.options`: rss_limit_mb = 4096
- `create_parser_fuzzer.options`: rss_limit_mb = 6144
The limit still bounds the target (it does not disable the check via
`rss_limit_mb=0`), so genuine memory regressions above 4 GB will still
be caught.
CIDB evidence (last 30 days, all 10 hits — 0 on master because libFuzzer
does not run on master regularly):
PR ClickHouse#99740 (4 hits, Apr 19-21)
PR ClickHouse#104231, ClickHouse#104849, ClickHouse#104956, ClickHouse#96844, ClickHouse#104510, ClickHouse#104492 (1 hit each,
May 13-15)
libfuzzer arms `setitimer(ITIMER_REAL)` with an interval of `UnitTimeoutSec / 2 + 1` seconds (11 s for our 20 s timeout), so SIGALRM fires periodically — not only at the actual timeout. The wrapper handler in `clickhouse_fuzzer.cpp` had three issues that together produced spurious "timeout after 21 seconds" failures (seen on PRs ClickHouse#99740 and ClickHouse#104869): 1. SIGALRM has no per-thread routing. The handler is installed process-wide via `sigaction`, so the kernel can deliver it to either the libfuzzer main thread or the runner thread. When it landed on the runner, `fuzzerSigalrmHandler` `pthread_kill`'d SIGUSR1 to itself and raced with a concurrent invocation on the main thread, with two competing `sigaction` restores corrupting handler state. 2. The slow runner-stack dump (shells out to llvm-symbolizer) ran on every periodic SIGALRM, even when the iteration was nowhere near its timeout. The dump alone takes many seconds; combined with the 3-second wait-for-stack window it could push an otherwise-fast iteration past the 20 s limit and trigger a real timeout. 3. The wrapper permanently restored libfuzzer's original handler before `raise(SIGALRM)`, making itself one-shot. A periodic SIGALRM at second ~11 consumed the wrapper, so when the genuine timeout fired at second 20 there was no longer any wrapper to dump the runner stack. Fix all three: - Block SIGALRM on the runner thread at the top of `runLibFuzzer` so the wrapper only ever runs on the libfuzzer main thread. - Track per-iteration start time via `clock_gettime(CLOCK_MONOTONIC)` and skip the SIGUSR1 stack-dump unless elapsed ≥ 15 s, so the 11 s periodic alarms pass through without disturbing the runner. Guard with an atomic to keep the dump single-shot per process. - Call libfuzzer's original handler directly (via the saved `sigaction.sa_sigaction` / `sa_handler`) instead of permanently restoring it. The wrapper stays installed and keeps intercepting later alarms. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104869&sha=4bb6825eb6220feaca31f3d2608dae09d5de7927&name_0=PR&name_1=libFuzzer%20tests ClickHouse#104869 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reverts #99696
Version info
26.5.1.611