Revert "Revert "CI: run libFuzzers in PRs, w/o corpus upload"" by yakov-olkhovskiy · Pull Request #99740 · ClickHouse/ClickHouse · GitHub
Skip to content

Revert "Revert "CI: run libFuzzers in PRs, w/o corpus upload""#99740

Merged
alexey-milovidov merged 54 commits into
masterfrom
revert-99696-revert-99530-ci-libfuzzer-for-prs
May 13, 2026
Merged

Revert "Revert "CI: run libFuzzers in PRs, w/o corpus upload""#99740
alexey-milovidov merged 54 commits into
masterfrom
revert-99696-revert-99530-ci-libfuzzer-for-prs

Conversation

@yakov-olkhovskiy

@yakov-olkhovskiy yakov-olkhovskiy commented Mar 17, 2026

Copy link
Copy Markdown
Member

Reverts #99696

Version info

  • Merged into: 26.5.1.611

@clickhouse-gh

clickhouse-gh Bot commented Mar 17, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Mar 17, 2026
Comment thread tests/fuzz/runner.py Outdated
# arguments as usual, without any special measures, but initialization of libFuzer driver then should take
# arguments from FUZZER_ARGS environment variable.
use_fuzzer_args = parser.getboolean("CI", "FUZZER_ARGS", fallback=False)
# FUZZER_ARGS flag is used to make it deliver libFuzzer arguments throught FUZZER_ARGS environment variable

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typo in this comment block: throughtthrough, and libFuzerlibFuzzer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

through is fixed, but libFuzer is still a typo in the same comment block (tests/fuzz/runner.py, line 193 in the current head). Please rename it to libFuzzer.

Comment thread tests/fuzz/runner.py
Comment thread ci/jobs/libfuzzer_test_check.py
alexey-milovidov and others added 2 commits March 19, 2026 05:20
…x_parser_depth, max_parser_backtracks, max_query_size as options, set max_parser_depth = 100, max_parser_backtracks = 1000
yakov-olkhovskiy and others added 4 commits March 19, 2026 17:17
…-workflow-conflicts-7a3b

# Conflicts:
#	.github/workflows/pull_request.yml

Co-authored-by: Yakov Olkhovskiy <yakov-olkhovskiy@users.noreply.github.com>
@CLAassistant

CLAassistant commented Mar 20, 2026

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ yakov-olkhovskiy
✅ alexey-milovidov
❌ cursoragent
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment thread src/Interpreters/fuzzers/execute_query_fuzzer.cpp
Comment thread tests/fuzz/runner.py
cursoragent and others added 3 commits April 8, 2026 20:22
…-workflow-conflicts-7a3b

# Conflicts:
#	.github/workflows/pull_request.yml

Co-authored-by: Yakov Olkhovskiy <yakov-olkhovskiy@users.noreply.github.com>
cursoragent and others added 4 commits May 11, 2026 02:03
…heriting constructor

Resolve merge conflicts from PR #102159 (KQL conformance tests):
- KQLStringFunctions.cpp: take master's getConvertedArgument rewrite for parse_json(dynamic(...))
- ParserKQLMakeSeries.cpp: take master's timespan.toSeconds() fix

Update 18 sites in new/modified KQL parsers (ParserKQLCount, ParserKQLTop,
ParserKQLUnion, ParserKQLJoin, ParserKQLQuery, ParserKQLStatement) to use
the Pos(Tokens&, const Pos&) constructor instead of Pos(Tokens&, max_depth, max_backtracks),
preserving recursion depth counters across re-tokenized sub-expressions.

Regenerate .github/workflows/pull_request.yml.

Co-authored-by: Yakov Olkhovskiy <yakov-olkhovskiy@users.noreply.github.com>
try
{
size_t pos = 0;
uint64_t val = std::stoul(it->second, &pos);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::stoul accepts signed text and wraps it to an unsigned value ("-1" -> 18446744073709551615), so invalid negative limits are silently treated as huge parser/AST limits instead of being rejected.

That breaks the contract of these CLI knobs and can disable the intended safety limits in CI by typo. Please reject signed inputs before conversion (or parse as unsigned with strict validation) and fail fast on invalid values.

try
{
size_t pos = 0;
uint64_t val = std::stoul(it->second, &pos);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here: std::stoul accepts signed strings and wraps them to very large unsigned values ("-1" parses successfully), so malformed negative limits are silently accepted.

Please validate and reject signed input explicitly before conversion, then fail fast; otherwise parser/AST safety limits can be unintentionally neutralized.

…-workflow-conflicts-7a3b

# Conflicts:
#	.github/workflows/pull_request.yml

Co-authored-by: Yakov Olkhovskiy <yakov-olkhovskiy@users.noreply.github.com>
out.append(*argument);
}

if (out.size() > DBMS_DEFAULT_MAX_QUERY_SIZE)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks use DBMS_DEFAULT_MAX_QUERY_SIZE instead of the active parser limit.
If max_query_size is configured above the default, KQL parsing can still throw here at the default threshold, which makes max_query_size ineffective for this path.

Please compare against the effective query-size limit used by this parse (instead of the hardcoded default), so KQL behavior stays consistent with normal SQL parsing limits.

@clickhouse-gh

clickhouse-gh Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 91.10% 91.10% +0.00%
Branches 76.50% 76.50% +0.00%

Changed lines: 76.61% (285/372) · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov self-assigned this May 13, 2026
@alexey-milovidov alexey-milovidov added this pull request to the merge queue May 13, 2026
Merged via the queue into master with commit 3bbd581 May 13, 2026
167 checks passed
@alexey-milovidov alexey-milovidov deleted the revert-99696-revert-99530-ci-libfuzzer-for-prs branch May 13, 2026 12:41
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 13, 2026
github-merge-queue Bot pushed a commit that referenced this pull request May 13, 2026
Avoid calling CLD2 language detection on empty UTF-8 strings in detectLanguage and detectLanguageMixed. This prevents a MemorySanitizer use-of-uninitialized-value in UTF8OneCharLen observed in PR CI run #99740.

Co-authored-by: Yakov Olkhovskiy <yakov-olkhovskiy@users.noreply.github.com>
pull Bot pushed a commit to mikeyhodl/ClickHouse that referenced this pull request May 14, 2026
Commit b5a006c ("add parameters for select_parser_fuzzer, set default
threshoulds lower", May 11 2026) made the per-input parser limits in
`select_parser_fuzzer` configurable via `-ignore_remaining_args=1` plus
`-max_parser_backtracks=N`. As a side effect it changed the in-source
default for `max_parser_backtracks` from `5000` to
`DBMS_DEFAULT_MAX_PARSER_BACKTRACKS` (= 1,000,000), i.e. a 200x increase.

For sibling `create_parser_fuzzer`, the same author followed up with
commit e9453b7 ("tune up create_parser_fuzzer settings") that
restored a fuzzer-safe budget by adding

    [fuzzer_arguments]
    max_parser_backtracks = 10000

to `tests/fuzz/create_parser_fuzzer.options`. The matching update to
`tests/fuzz/select_parser_fuzzer.options` was missed, so `libFuzzer`
runs of `select_parser_fuzzer` parse each input with a 1,000,000-step
backtrack budget. On pathological inputs (deeply-nested parentheses
with mixed function-call / subquery shapes — exactly the pattern of
closed issue ClickHouse#92282) the parser spends 20-26s before bailing, which is
above the 20s `libFuzzer` timeout.

CIDB confirms the regression:

    day         select_parser_fuzzer FAIL/ERROR     distinct PRs
    2026-04-19  1                                   1 (PR ClickHouse#99740, author's branch)
    ...
    2026-05-11  1                                   1 (PR ClickHouse#99740)
    2026-05-13  24                                  24 (24 unrelated PRs)
    2026-05-14  13                                  13 (13 unrelated PRs)

The spike on 2026-05-13 starts exactly the first day master PRs picked
up commit b5a006c. Every recent failure is a `timeout-*` artefact
of the form `timeout after 20-26 seconds`, matching issue ClickHouse#92282.

The fix is the minimum config change to restore the previously known-
good per-input backtrack budget for `select_parser_fuzzer`, mirroring
the `create_parser_fuzzer.options` fix.

Local mechanism reproduction (clickhouse-local, debug build):

    SELECT te(((...((SELECT tuple(... (~470 bytes, malformed)
      --max_parser_depth=300 --max_parser_backtracks=10000   -> 0.16s (TOO_SLOW_PARSING)
      --max_parser_depth=300 --max_parser_backtracks=1000000 -> 0.92s (~6x slower for the same input)

Pre-PR validation gate:
- Deterministic repro: yes (see above).
- Root cause explained: yes (regression date matches commit b5a006c).
- Fix matches root cause: yes (restore 10k backtrack budget that was
  intentionally set in PR ClickHouse#92325 to fix the timeout class in ClickHouse#92282).
- Test intent preserved: yes — fuzzer still exercises the SELECT parser
  with the same depth (300/150), AST element / depth limits, and corpus.
  Only the per-input backtrack budget changes.
- Both directions demonstrated: yes — local clickhouse-local repro shows
  the 6x time difference; CIDB shows the regression spike exactly aligned
  with the commit.

Refs:
- Tracker issue ClickHouse#93438 ("libFuzzer tests are flaky").
- Originating closed issue ClickHouse#92282 (`select_parser_fuzzer` timeout on
  deeply nested parens).
- Sibling fix: commit e9453b7 (`create_parser_fuzzer.options`).
- Regression: commit b5a006c (`select_parser_fuzzer.cpp`).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
BadLiveware pushed a commit to BadLiveware/ClickHouse that referenced this pull request May 16, 2026
`clickhouse_fuzzer` runs the full `clickhouse-local` binary (see
`programs/local/fuzzers/clickhouse_fuzzer.cpp`) which spins up a complete
clickhouse-server-grade runtime: JeMalloc arenas, OpenSSL initializer,
static initializers, and full query execution. Baseline RSS is close to
the libFuzzer default `rss_limit_mb=2048`, so fuzzed inputs that trigger
even moderate query-time allocations push the process past the limit.

CIDB shows 7 unrelated PRs in 7 days all hitting OOM at 2049-2056 Mb —
just barely over the 2048 Mb default. Failure mode is uniform:

    libFuzzer  out-of-memory (used: 2049-2056Mb; limit: 2048Mb)  oom-<hash>

Bumping to 4096 Mb follows existing precedent for fuzzers with similar
workloads:
  - `execute_query_fuzzer.options`: rss_limit_mb = 4096 (also runs query
    execution; same author bumped this in commit bafb7d7)
  - `data_type_deserialization_fuzzer.options`: rss_limit_mb = 4096
  - `create_parser_fuzzer.options`: rss_limit_mb = 6144

The limit still bounds the target (it does not disable the check via
`rss_limit_mb=0`), so genuine memory regressions above 4 GB will still
be caught.

CIDB evidence (last 30 days, all 10 hits — 0 on master because libFuzzer
does not run on master regularly):
    PR ClickHouse#99740  (4 hits, Apr 19-21)
    PR ClickHouse#104231, ClickHouse#104849, ClickHouse#104956, ClickHouse#96844, ClickHouse#104510, ClickHouse#104492  (1 hit each,
    May 13-15)
alexey-milovidov added a commit to BoloniniD/ClickHouse that referenced this pull request May 20, 2026
libfuzzer arms `setitimer(ITIMER_REAL)` with an interval of
`UnitTimeoutSec / 2 + 1` seconds (11 s for our 20 s timeout), so SIGALRM
fires periodically — not only at the actual timeout. The wrapper handler
in `clickhouse_fuzzer.cpp` had three issues that together produced
spurious "timeout after 21 seconds" failures (seen on PRs ClickHouse#99740 and
ClickHouse#104869):

1. SIGALRM has no per-thread routing. The handler is installed
   process-wide via `sigaction`, so the kernel can deliver it to either
   the libfuzzer main thread or the runner thread. When it landed on the
   runner, `fuzzerSigalrmHandler` `pthread_kill`'d SIGUSR1 to itself and
   raced with a concurrent invocation on the main thread, with two
   competing `sigaction` restores corrupting handler state.
2. The slow runner-stack dump (shells out to llvm-symbolizer) ran on
   every periodic SIGALRM, even when the iteration was nowhere near its
   timeout. The dump alone takes many seconds; combined with the 3-second
   wait-for-stack window it could push an otherwise-fast iteration past
   the 20 s limit and trigger a real timeout.
3. The wrapper permanently restored libfuzzer's original handler before
   `raise(SIGALRM)`, making itself one-shot. A periodic SIGALRM at second
   ~11 consumed the wrapper, so when the genuine timeout fired at
   second 20 there was no longer any wrapper to dump the runner stack.

Fix all three:

- Block SIGALRM on the runner thread at the top of `runLibFuzzer` so the
  wrapper only ever runs on the libfuzzer main thread.
- Track per-iteration start time via `clock_gettime(CLOCK_MONOTONIC)` and
  skip the SIGUSR1 stack-dump unless elapsed ≥ 15 s, so the 11 s
  periodic alarms pass through without disturbing the runner. Guard with
  an atomic to keep the dump single-shot per process.
- Call libfuzzer's original handler directly (via the saved
  `sigaction.sa_sigaction` / `sa_handler`) instead of permanently
  restoring it. The wrapper stays installed and keeps intercepting
  later alarms.

CI report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104869&sha=4bb6825eb6220feaca31f3d2608dae09d5de7927&name_0=PR&name_1=libFuzzer%20tests
ClickHouse#104869

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants