Fix Bad cast for Const-wrapped Variant with LowCardinality member in set index analysis by groeneai · Pull Request #107373 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix Bad cast for Const-wrapped Variant with LowCardinality member in set index analysis#107373

Closed
groeneai wants to merge 2 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-convertToFullIfNeeded-variant-lowcardinality-set-index
Closed

Fix Bad cast for Const-wrapped Variant with LowCardinality member in set index analysis#107373
groeneai wants to merge 2 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-convertToFullIfNeeded-variant-lowcardinality-set-index

Conversation

@groeneai

Copy link
Copy Markdown
Contributor

Related: #97854
Related: #107111

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix a Bad cast from type DB::ColumnString to DB::ColumnLowCardinality logical error (server crash in debug builds) when a constant of a Variant type that has a LowCardinality member is used as a set element for an IN predicate over a MergeTree skip index.

Description

Found by the server-side AST fuzzer on master (Stress test (experimental, serverfuzz, arm_ubsan), commit c20a6032517588c99638787af78a476c8c40b3d0, 2026-06-12; first seen on amd_tsan 2026-05-24). STID 4054-600e. Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=c20a6032517588c99638787af78a476c8c40b3d0&name_0=MasterCI&name_1=Stress%20test%20%28experimental%2C%20serverfuzz%2C%20arm_ubsan%29

Minimal reproducer (debug build, aborts the server):

SET allow_suspicious_low_cardinality_types = 1, allow_experimental_variant_type = 1, allow_suspicious_variant_types = 1;
CREATE TABLE t (key Int, s String, INDEX i sipHash64(s) TYPE minmax GRANULARITY 1) ENGINE = MergeTree ORDER BY key;
INSERT INTO t SELECT number, toString(number) FROM numbers(16);
SELECT count() FROM t WHERE s IN (SELECT _CAST('5', 'Variant(Int256, LowCardinality(String))'));

Root cause: the set element column is materialized with IColumn::convertToFullIfNeeded. For a Const(Variant(..., LowCardinality(String))) the default implementation first strips the outer Const, then iterates the unwrapped ColumnVariant's members via forEachSubcolumn and converts each to a full column, stripping the inner LowCardinality. That bypasses ColumnVariant::convertToFullIfNeeded, which is overridden to a no-op (issue #97854) precisely to keep the column consistent with DataTypeVariant (whose members are sorted by name and cannot follow recursive stripping). The set element type, processed by recursiveRemoveLowCardinality, keeps the inner LowCardinality. The resulting column/type mismatch makes the Variant -> String cast wrapper expect a ColumnLowCardinality member but receive a ColumnString, aborting in KeyCondition::applyDeterministicDagToColumn (via tryPrepareSetIndexForIn -> tryPrepareSetColumnsForIndex).

This is the prepared-set sibling of the path fixed in #107111 (which handled the equals/tuple-literal path in KeyCondition but could not fix an input that is already column/type-asymmetric on arrival). It also completes the fix for issue #97854, which added the ColumnVariant/ColumnDynamic override but only protected a top-level ColumnVariant, not one reached after unwrapping an outer Const (or Tuple/Array/etc.).

Fix: in the default IColumn::convertToFullIfNeeded, when an outer wrapper (Const/Replicated/Sparse/LowCardinality) is stripped, delegate to the unwrapped column's own convertToFullIfNeeded so its override is honored, instead of recursing into its subcolumns here. For columns without an override the delegate's default implementation performs the same subcolumn recursion, so behavior is unchanged.

Regression test 06673_set_index_variant_low_cardinality added: crashes the server on the pre-fix binary, passes with the fix. Verified the existing Variant/LowCardinality/set-index test families (including 04329, 03991, 03987, 04141) and a broader sweep of convertToFullIfNeeded consumers still pass with no behavioral change.

@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @Algunenano @Avogar — could you review this? It completes the ColumnVariant/ColumnDynamic convertToFullIfNeeded fix from #97854: the no-op override was bypassed when the Variant is reached through the default IColumn::convertToFullIfNeeded recursion after unwrapping an outer Const (e.g. a Variant(..., LowCardinality(String)) constant used as an IN-set element over a skip index), stripping the inner LowCardinality and causing a Bad cast in KeyCondition. Fix delegates to the unwrapped column's own convertToFullIfNeeded so the override is honored.

@nikitamikhaylov nikitamikhaylov added the can be tested Allows running workflows for external contributors label Jun 12, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [333de7f]

Summary:


AI Review

Summary

This PR fixes IColumn::convertToFullIfNeeded so an outer ColumnConst, ColumnReplicated, ColumnSparse, or ColumnLowCardinality delegates to the unwrapped column's own convertToFullIfNeeded, preserving the ColumnVariant and ColumnDynamic invariants while keeping container columns such as ColumnArray on the normal subcolumn-recursion path. The current code addresses the prior review thread and the regression test covers both the Variant/LowCardinality mismatch and the ColumnArray recursion case; I did not find unresolved issues that need inline comments.

Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 12, 2026
Comment thread src/Columns/IColumn.h
@groeneai groeneai force-pushed the groeneai/fix-convertToFullIfNeeded-variant-lowcardinality-set-index branch from d37269f to 41bc5b5 Compare June 12, 2026 23:38
@groeneai

Copy link
Copy Markdown
Contributor Author

Re-pushed 41bc5b5 to fix the Style check (test_numbers_check) failure on the prior commit: the regression test had been numbered 06673, which the gap check rejected (Gap (4336, 6673) > 100). Renamed it to the next free number, 04337_set_index_variant_low_cardinality, and rebased onto current master (absorbs #107364, which had poisoned arm_tidy on the old base).

The C++ fix in src/Columns/IColumn.h is unchanged. Re-validated against the post-fix binary: the IN-set crash query no longer aborts (clean CANNOT_CONVERT_TYPE), the valid-Variant cases return correct counts, and 04337 matches its reference (0/0/2) with the server alive throughout.

…ant with LowCardinality member

A constant of a Variant type that has a LowCardinality member, used as a
prepared-set element for an IN predicate over a MergeTree skip index, crashed
the server with "Bad cast from type DB::ColumnString to DB::ColumnLowCardinality"
in KeyCondition::applyDeterministicDagToColumn (via tryPrepareSetIndexForIn ->
tryPrepareSetColumnsForIndex).

The set element column is materialized with IColumn::convertToFullIfNeeded.
For a Const(Variant(..., LowCardinality(String))) it first strips the outer
Const, then iterates the unwrapped ColumnVariant's members via forEachSubcolumn
and converts each to full, stripping the inner LowCardinality. That bypasses
ColumnVariant::convertToFullIfNeeded, which is overridden to a no-op precisely
to keep the column consistent with DataTypeVariant (whose members are sorted by
name and cannot follow recursive stripping). The set element type keeps the
inner LowCardinality, so the Variant->String cast wrapper expected a
ColumnLowCardinality member but received a ColumnString and aborted.

Fix: in the default IColumn::convertToFullIfNeeded, when an outer wrapper is
stripped, delegate to the unwrapped column's own convertToFullIfNeeded so its
override is honored, instead of recursing into its subcolumns here. For columns
without an override the delegate's default implementation performs the same
subcolumn recursion, so behavior is unchanged. This completes the fix from
issue ClickHouse#97854, which only protected a top-level ColumnVariant.

The delegation is gated on the column actually being an outer wrapper
(isConst/isReplicated/isSparse/lowCardinality), not on a pointer difference
after the strip chain. ColumnArray::convertToFullColumnIfConst always allocates
a fresh ColumnArray even for a plain array, so a pointer-difference guard made
any array column delegate to an equivalent fresh array forever (stack overflow
on array IN-sets via Set::insertFromColumns). Container columns now fall through
to the normal subcolumn recursion as before.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai groeneai force-pushed the groeneai/fix-convertToFullIfNeeded-variant-lowcardinality-set-index branch from 41bc5b5 to 4e46276 Compare June 12, 2026 23:59
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (follow-up: ColumnArray recursion fix, commit 4e46276)

# Question Answer
a Deterministic repro? Yes. SELECT [1,2] IN (SELECT [number, number + 1] FROM numbers(3)) segfaults (stack overflow) every time on the pre-fix binary; passes with the fix.
b Root cause explained? The guard used converted.get() != this to mean "an outer wrapper was stripped". ColumnArray::convertToFullColumnIfConst unconditionally returns a freshly created ColumnArray, so for any array the pointer always differs, the code delegates to an equivalent fresh array, and recurses forever. Array IN-sets reach this via Set::insertFromColumns -> convertToFullIfNeeded.
c Fix matches root cause? Yes. Delegation is now gated on the column actually being a wrapper: isConst() || isReplicated() || isSparse() || lowCardinality(). The four predicates map 1:1 to the four strip calls. ColumnArray overrides none, so it falls through to the normal subcolumn recursion (pre-PR behavior). && converted.get() != this is kept for guaranteed termination.
d Test intent preserved / new tests added? Preserved. Added three array IN-set cases plus a liveness check to the regression test; the original Variant/LowCardinality assertions are unchanged.
e Both directions demonstrated? Yes. Pre-fix: segfault on the array IN-set, server dies. Post-fix: returns 1, server alive, 30/30 in a loop; clickhouse-test 10/10 under full randomization.
f Fix is general, not a narrow patch? Yes. The fix targets the wrapper-detection logic itself, so it covers all container columns, not just ColumnArray. Checked Tuple/Map inherit the no-op convertToFullColumnIfConst (no fresh alloc) and Nullable strips only its nested column; none take the delegation branch. Sibling test 04329 and the variant/low_cardinality/set_index families pass with no behavior change.

Session id: cron:clickhouse-worker-slot-3:20260612-224000

@groeneai

Copy link
Copy Markdown
Contributor Author

CI finished on 4e46276. All red checks are pre-existing failures unrelated to this change (this PR only edits src/Columns/IColumn.h convertToFullIfNeeded plus the 04337 regression test):

The convertToFullIfNeeded fix itself is green across all stateless and sanitizer jobs and AI-review approved.

@clickhouse-gh

clickhouse-gh Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.70% 84.70% +0.00%
Functions 92.30% 92.40% +0.10%
Branches 77.30% 77.30% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 9/9 (100.00%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@groeneai

Copy link
Copy Markdown
Contributor Author

CI re-validated on HEAD 333de7f05f47 after the automatic master-merge into the branch: green. 148 checks pass, Finish Workflow and Mergeable Check both SUCCESS, zero test-level failures in CIDB. The single red mark is CH Inc sync (internal ClickHouse-Inc fork-sync bot, always red for external contributors, not a test).

The convertToFullIfNeeded wrapper-type-guard fix (Variant+LC Bad cast STID 4054-600e plus the ColumnArray infinite-recursion bot finding) holds across all stateless + sanitizer flaky-check jobs. Ready for review.

@Avogar

Avogar commented Jun 17, 2026

Copy link
Copy Markdown
Member

I don't like this solution. And current implementation of convertToFullColumnIfNeeded, so I decided to change it here - #107773. It will also fix attached issue.

@Avogar Avogar closed this Jun 17, 2026
@groeneai

Copy link
Copy Markdown
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants