Cap BuzzHouse max_depth in CI to prevent ASan allocation-size-too-big #103736
BuzzHouse value generators for compound types `Map` and `Array` recurse on
their child types when producing SQL string values, multiplying the per-level
row count at each nesting level. With the CI fuzzer config — `max_nested_rows`
up to ~100 and `max_depth` up to 5 — that gives up to 100^5 ≈ 10^10 entries
in a single value, which trips ASan's `allocation-size-too-big` limit and
aborts the fuzzer mid-run.
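To make the multiplicative growth concrete, here is a minimal Python sketch. It is a model of the arithmetic only, not BuzzHouse code, and the function name `worst_case_entries` is hypothetical. It counts the leaf entries of a value nested `depth` levels deep when every level samples the maximum number of rows.

```python
def worst_case_entries(rows: int, depth: int) -> int:
    """Leaf entries in a value nested `depth` levels deep when each level
    holds `rows` children (hypothetical model, not BuzzHouse code)."""
    if depth == 0:
        return 1  # a scalar leaf
    # Each of the `rows` children is itself a full nested value.
    return rows * worst_case_entries(rows, depth - 1)


for depth in range(1, 6):
    print(depth, worst_case_entries(100, depth))
```

Each extra nesting level multiplies the count by `rows`, so the worst case is `rows ** depth`.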
Observed in 4 unrelated PRs in the last 30 days on `BuzzHouse (arm_asan_ubsan)`,
always with the same stack through `BuzzHouse::MapType::appendRandomRawValue`
(`SQLTypes.cpp:1280/:1282`):
SUMMARY: AddressSanitizer: allocation-size-too-big
std::__1::basic_string::__grow_by_and_replace
BuzzHouse::MapType::appendRandomRawValue
BuzzHouse::MapType::appendRandomRawValue
BuzzHouse::StatementGenerator::strAppendAnyValue
BuzzHouse::StatementGenerator::generateUpdateSets
Add a public `value_gen_depth` counter on `StatementGenerator` (kept distinct
from the existing private `depth` field, which tracks type-generation
recursion in `randomNextType`/`randomAggregateType`) and a small helper
`depthCappedNestedRows` that bit-shifts the configured `max_nested_rows`
cap by `2 * depth`. The cap is applied in `MapType::appendRandomRawValue`,
`MapType::insertNumberEntry`, `ArrayType::appendRandomRawValue`, and
`ArrayType::insertNumberEntry` — the four sites that loop multiplicatively
over a randomly-sampled row count.
The total entry count is then bounded by a small polynomial in
`max_nested_rows` instead of exponential. With the CI worst case
(`max_nested_rows = 105`), the cap sequence is 105, 26, 6, 1, 1, ..., so the
product across 5 levels stays under ~16k entries instead of ~13 billion.
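The cap sequence can be checked with a short Python sketch. The actual helper is C++ and its body is not shown here, so this assumes it right-shifts by `2 * depth` and clamps the result to at least 1 (the clamp is inferred from the trailing 1s in the sequence); the Python name `depth_capped_nested_rows` is illustrative.

```python
from math import prod


def depth_capped_nested_rows(max_nested_rows: int, depth: int) -> int:
    # Assumption: shift right by 2 * depth and never drop below one row.
    return max(1, max_nested_rows >> (2 * depth))


caps = [depth_capped_nested_rows(105, d) for d in range(7)]
print(caps)            # [105, 26, 6, 1, 1, 1, 1]
print(prod(caps[:5]))  # 16380
```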
Symmetric subsystems checked:
- `TupleType` value gen iterates a fixed `subtypes` vector — no
per-level row sampling, no multiplicative blowup.
- `VariantType` value gen picks a single subtype — no loop.
- `QBitType` iterates the type-fixed `dimension` — bounded by type.
- `AggregateFunctionType` already uses the private `depth` counter for
its own recursion guard.
Validation:
- Built with `-DENABLE_BUZZHOUSE=1` (clang-21).
- Smoke-tested: `clickhouse-client --buzz-house-config=...` ran for 90 s
against a local server with `max_depth=5`, `max_nested_rows=100` — no
ASan abort, no `allocation-size-too-big`, exit on timeout as expected.
- Math validated standalone: `depthCappedNestedRows(105, 0..6)` =
`{105, 26, 6, 1, 1, 1, 1}`; product across depth 0..4 is 16380.
Pre-PR validation gate (per

cc @PedroTadim @alexey-milovidov: could you review this? @PedroTadim wrote ~1500 of the BuzzHouse commits and is the natural owner of

@groeneai maybe you can update the range of values used in buzzhouse_job.py instead?

@PedroTadim, thanks for the suggestion. Looking at the current bounds:

    allow_hardcoded_inserts = random.choice([True, False])
    min_nested_rows = random.randint(0, 5)
    max_nested_rows = min_nested_rows + (5 if allow_hardcoded_inserts else 100)  # up to ~105
    ...
    "max_depth": random.randint(2, 5),  # up to 5

Worst case (
Current PR's C++ fix:
Config-only alternatives in
I can take any of:

The C++ fix is geometrically tighter (~750× lower per nested level vs. config option 1), so anyone running BuzzHouse outside CI at the documented limits stays safe. The config approach is simpler but leaves the generator unsafe at its advertised ranges. Your call as BuzzHouse owner; happy to revise either way.

@groeneai go for A, not many people run BuzzHouse.

Per @PedroTadim (BuzzHouse owner) on ClickHouse#103736:
> "@groeneai go for A, not many people run BuzzHouse."

Reverts the generator-side cap in `MapType::appendRandomRawValue` and `ArrayType::appendRandomRawValue`, along with the `value_gen_depth` counter on `StatementGenerator`. Replaces it with a single-line tightening in `ci/jobs/buzzhouse_job.py`: `max_depth = random.randint(2, 3)` (was `2..5`).

Math: with `max_nested_rows` up to 105 and depth 3, the worst-case nested-value product is `105^3 ≈ 1.2 * 10^6` entries instead of `105^5 ≈ 1.3 * 10^10`, which keeps it well below ASan's `allocation-size-too-big` guard.

Trade-off: the BuzzHouse generators remain capable of producing ~10^10-entry values at the documented limits. That is a latent bug in the generator, but BuzzHouse is essentially CI-only, so tightening CI config is sufficient and avoids touching generator code that @PedroTadim owns.

Triggered by directive on ClickHouse#99537.

Thanks @PedroTadim, pushed Option A as
The C++ generator-side cap (

    "max_depth": random.randint(2, 3),  # was random.randint(2, 5)

Net PR diff is now 8 lines, single file, only
Trade-off accepted as discussed: the BuzzHouse generators stay capable of producing 10^10-entry values at the documented config limits, but since BuzzHouse is essentially CI-only, tightening CI config is sufficient. Branch history kept (commit-then-revert) per repo policy of not rebasing/amending shared branches. PR title updated to reflect the new scope.

Pre-PR validation gate (rewritten approach, commit
(a) Deterministic repro? No. The original ASan abort needs CI's BuzzHouse random seed at the moment of failure. The math is the deterministic argument:
(b) Root cause explained? Yes.
(c) Fix matches root cause? Explicitly a config-only band-aid, not a root-cause fix. The generator code remains capable of producing
(d) Test intent preserved? / New tests added? BuzzHouse fuzzer behavior at depths 2 and 3 is preserved (these were already 50% of the prior
(e) Demonstrated in both directions? With prior
(f) Fix is general, not a narrow patch? N/A: config change, not a C++ fix. The wider scope (the generator continues to produce blowing-up values at the documented type-system limits) is acknowledged as out-of-scope per the BuzzHouse owner's directive.

@groeneai can you keep the longer max depth if the max nested number of rows is very small (like 0 or 1 nested rows)? We also need high max depth for larger queries.
Replace the flat `max_depth = randint(2, 3)` cap with a conditional ladder driven by `max_nested_rows`:

    max_nested_rows <= 5   -> max_depth up to 5
    max_nested_rows <= 20  -> max_depth up to 4
    max_nested_rows > 20   -> max_depth up to 3

This preserves deep type-shape exploration (depth 5) for runs that sample a tiny number of nested rows (where `rows^depth` stays trivially small) while keeping the row-heavy non-hardcoded branch (`max_nested_rows in [100, 105]`) under the ASan allocation cap by limiting depth to 3.

Worst-case nested-value product `max_nested_rows ^ max_depth_high`:

    5^5   = 3125
    20^4  = 160000
    105^3 = ~1.2M

All bounded below ~2e6, well under ASan's ~1e9 `allocation-size-too-big` guard.

Implements @PedroTadim's refinement on ClickHouse#103736: "keep the longer max depth if the max nested number of rows is very small (like 0 or 1 nested rows). We also need high max depth for larger queries."

Session: cron:clickhouse-ci-task-worker:20260430-034500

@PedroTadim done, pushed

    if max_nested_rows <= 5:
        max_depth_high = 5
    elif max_nested_rows <= 20:
        max_depth_high = 4
    else:
        max_depth_high = 3

    "max_depth": random.randint(2, max_depth_high),

Worst-case nested-value product
All branches are below ~2×10^6 entries, well under ASan's ~10^9
Distribution under the existing CI roll
Driven by
So the deep-nesting coverage is preserved on a non-trivial fraction of runs (the tiny-rows path you wanted to keep), while the row-heavy path is bounded.
Pre-PR validation gate
a) Deterministic repro? Worst-case bound is now a deterministic function of
b) Root cause explained? BuzzHouse value generators for
c) Fix matches root cause? Yes. The geometric product is bounded directly by capping the depth as a function of the row count. The product never exceeds ~2×10^6 in any branch.
d) Test intent preserved? Yes. Depth-5 coverage is still reachable on the tiny-rows path (where it's safe), and the depth-4 branch covers a meaningful range of
e) Both directions demonstrated? Pre-fix:
Net diff is now 1 file, +23 / -7 lines, all in
Session:
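The depth ladder and the CI roll quoted earlier in this thread can be enumerated exhaustively. The sketch below is a standalone check, not the code in the CI job. It assumes the roll for `allow_hardcoded_inserts` and `min_nested_rows` is unchanged and that each outcome is equally likely; the function wrapper `max_depth_high` is an illustrative packaging of the ladder. It reports how often each depth ceiling is reached and the largest worst-case product. Note that it gives the distribution of the ceiling, not of `max_depth` itself, which is then drawn from 2 up to that ceiling.

```python
from collections import Counter
from fractions import Fraction


def max_depth_high(max_nested_rows: int) -> int:
    """Depth ceiling chosen by the ladder (illustrative wrapper)."""
    if max_nested_rows <= 5:
        return 5
    if max_nested_rows <= 20:
        return 4
    return 3


# Every (allow_hardcoded_inserts, min_nested_rows) outcome of the quoted roll:
#   max_nested_rows = min_nested_rows + (5 if allow_hardcoded_inserts else 100)
outcomes = [
    min_rows + (5 if hardcoded else 100)
    for hardcoded in (True, False)
    for min_rows in range(0, 6)
]

ceiling_counts = Counter(max_depth_high(rows) for rows in outcomes)
distribution = {
    depth: Fraction(count, len(outcomes))
    for depth, count in ceiling_counts.items()
}
worst_product = max(rows ** max_depth_high(rows) for rows in outcomes)

print(distribution)
print(worst_product)
```

Under these assumptions the depth-5 ceiling is reached only when `max_nested_rows` is exactly 5, the depth-4 ceiling covers 6 through 10, and the row-heavy branch (100 through 105) is held to depth 3.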

Looks good now, we can merge.

Workflow [PR], commit [7cca6d1]
Summary: ✅
AI Review
Summary
This PR reduces BuzzHouse
Findings
ClickHouse Rules
Final Verdict
Status:
Minimum required actions:

49ac76c

BuzzHouse value generators for compound types `Map` and `Array` recurse on
their child types when producing SQL string values, multiplying the per-level
row count at each nesting level. With the CI fuzzer config (`max_nested_rows`
up to ~100 and `max_depth` up to 5) that gives up to 100^5 ≈ 10^10 entries
in a single value, which trips ASan's `allocation-size-too-big` limit and
aborts the fuzzer mid-run.
Observed in 4 unrelated PRs in the last 30 days on `BuzzHouse (arm_asan_ubsan)`,
always with the same stack trace through `BuzzHouse::MapType::appendRandomRawValue`
(`SQLTypes.cpp:1280/:1282`):
CIDB cross-PR cross-check (last 30 days, all unrelated PRs):
- `INDEX_LENGTH` column alias to `information_schema.tables`"
- `use_top_k_dynamic_filtering` and `use_skip_indexes_for_top_k` by default"
These four PRs share no code or domain; the bug is in the BuzzHouse client
fuzzer itself, not in any of them.
Approach
Add a public `value_gen_depth` counter on `StatementGenerator` (kept distinct
from the existing private `depth` field, which tracks type-generation
recursion in `randomNextType`/`randomAggregateType`) and a small helper
`depthCappedNestedRows` that bit-shifts the configured `max_nested_rows`
cap by `2 * depth`. The cap is applied in `MapType::appendRandomRawValue`,
`MapType::insertNumberEntry`, `ArrayType::appendRandomRawValue`, and
`ArrayType::insertNumberEntry`, the four sites that loop multiplicatively
over a randomly-sampled row count.
The total entry count is then bounded by a small polynomial in
`max_nested_rows` instead of exponential. With the CI worst case
(`max_nested_rows = 105`), the cap sequence is 105, 26, 6, 1, 1, ..., so the
product across 5 nesting levels stays under ~16k entries instead of ~13 billion.
Symmetric subsystems checked
- `TupleType` value gen iterates a fixed `subtypes` vector: no
per-level row sampling, no multiplicative blowup.
- `VariantType` value gen picks a single subtype, so there is no loop.
- `QBitType` iterates the type-fixed `dimension`, so it is bounded by type.
- `AggregateFunctionType` already uses the private `depth` counter for
its own recursion guard.
Validation
- `-DENABLE_BUZZHOUSE=1` (clang-21).
- `clickhouse-client --buzz-house-config=...` for 90 s against a local server with
`max_depth=5`, `max_nested_rows=100`, `max_string_length=50`: no ASan abort, no
`allocation-size-too-big`, exit on timeout as expected.
- `depthCappedNestedRows(105, 0..6)` = `{105, 26, 6, 1, 1, 1, 1}`; product across depth 0..4 is `16380`.
Reproducing the exact ASan trip locally would require running the BuzzHouse
fuzzer with the same random seed CI used at the moment of the failure;
instead, this PR is bounded by analyzing the recursion math and confirming
the cap reduces the worst-case entry count by ~6 orders of magnitude.
Triggered by directive on #99537.
Session: cron:clickhouse-ci-task-worker:20260429-174500
Version info
26.5.1.190