Fix QBit column structureEquals (logical error in writeSlice) by groeneai · Pull Request #103084 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix QBit column structureEquals (logical error in writeSlice)#103084

Merged
rienath merged 4 commits into
ClickHouse:masterfrom
groeneai:fix-qbit-structure-equals
Apr 22, 2026
Merged

Fix QBit column structureEquals (logical error in writeSlice)#103084
rienath merged 4 commits into
ClickHouse:masterfrom
groeneai:fix-qbit-structure-equals

Conversation

@groeneai

@groeneai groeneai commented Apr 19, 2026

Copy link
Copy Markdown
Contributor

Motivation

Master BuzzHouse (experimental, serverfuzz, amd_msan) hit a logical error in writeSlice on 2026-04-17 (STID 2670-0e56, commit 4c74bee46e8b):

Logical error: Function writeSlice expects same column types for
GenericArraySlice and GenericArraySink.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=4c74bee46e8b3d257df5320ac90f735593ef61bc&name_0=MasterCI&name_1=BuzzHouse%20%28experimental%2C%20serverfuzz%2C%20amd_msan%29

Root cause

ColumnQBit::structureEquals ignored the right-hand side ColumnQBit wrapper:

bool structureEquals(const IColumn & rhs) const override { return tuple->structureEquals(rhs); }

this->tuple is a ColumnTuple; rhs is another ColumnQBit. ColumnTuple::structureEquals starts with typeid_cast<const ColumnTuple *>(&rhs) and returns false immediately when the argument is any other column type. As a result, QBit columns compared against themselves always reported "different structure", breaking every consumer that uses structureEquals to validate column compatibility.

The visible symptom on master was a fuzzed INSERT pattern:

CREATE TABLE t (c0 Tuple(Map(String, Nullable(QBit(Float64, 4))))) ENGINE = Memory;
INSERT INTO t SELECT NULL;

ConvertingTransform rewrites this as ifNull(NULL::Nullable(Nothing), default_tuple). The dispatch descends through FunctionIfNull::executeImpl -> if -> FunctionIf::executeTuple -> executeImpl -> FunctionIf::executeMap -> executeImpl -> FunctionIf::executeGenericArray -> GatherUtils::conditional -> writeSlice. writeSlice(GenericArraySlice, GenericArraySink) at src/Functions/GatherUtils/Algorithms.h:75-84 checks slice.elements->structureEquals(sink.elements) and throws on false. The slice elements and sink elements are both ColumnNullable(ColumnQBit), structurally identical, but the broken ColumnQBit::structureEquals made the check fail.

Fix

Follow the standard wrapper-column pattern used by ColumnSparse, ColumnReplicated, ColumnLowCardinality, and ColumnNullable: typeid_cast the rhs to ColumnQBit first, then delegate the inner comparison to the unwrapped tuple on both sides.

Also require dimension to match: two QBit columns with different dimensions can pad to the same FixedString byte length (e.g. dimension 1 and dimension 8 both pad to 1 byte), so the inner tuple structure alone does not distinguish them.

Verification

Before fix — server crashes on the minimal reproducer:

Logical error: 'Function writeSlice expects same column types for GenericArraySlice and GenericArraySink.'

After fix — same queries succeed and the server stays alive:

({})                          -- Tuple(Map(String, Nullable(QBit(Float64, 4))))
(NULL,'a',{})                 -- fuzzer-shaped Tuple(Nullable(Int8), Enum16(...), Map(MultiLineString, Nullable(QBit(...))))
({})                          -- ifNull(NULL::Nullable(Tuple(...)), tuple(map()))

Existing QBit stateless tests (03363...03379) all pass (9/9).

Regression test added: tests/queries/0_stateless/04104_qbit_writeslice_structure_equals.{sql,reference}.

Scope

The change is localized to ColumnQBit::structureEquals. No behavior change for existing QBit workloads that already worked; only operations that previously failed with structureEquals-based validation will now proceed correctly.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix a logical error Function writeSlice expects same column types for GenericArraySlice and GenericArraySink raised when evaluating if/ifNull over tuples, maps, or arrays containing a QBit element. ColumnQBit::structureEquals incorrectly compared the inner tuple of one QBit column against the outer wrapper of the other, so structurally identical QBit columns were reported as different.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Version info

  • Merged into: 26.5.1.8

ColumnQBit::structureEquals previously delegated the comparison to the
inner tuple by calling tuple->structureEquals(rhs). But the rhs is a
ColumnQBit, not a ColumnTuple, so ColumnTuple::structureEquals failed
the typeid_cast up-front and returned false for any pair of ColumnQBit
columns.

This broke any operation using structureEquals to validate two
structurally equivalent QBit columns. The most visible case was
writeSlice(GenericArraySlice, GenericArraySink) (src/Functions/
GatherUtils/Algorithms.h:75-84) throwing a logical error:

    Function writeSlice expects same column types for GenericArraySlice
    and GenericArraySink.

A fuzzed workload that crashed on master via BuzzHouse (STID 2670-0e56,
experimental serverfuzz amd_msan, 2026-04-17) distills to:

    CREATE TABLE t (c0 Tuple(Map(String, Nullable(QBit(Float64, 4)))))
      ENGINE = Memory;
    INSERT INTO t SELECT NULL;

The INSERT goes through ConvertingTransform -> ifNull -> if ->
executeTuple -> executeMap -> executeGenericArray -> writeSlice. The
executeGenericArray path builds a GenericArraySink from the declared
result type and a GenericArraySource from the argument column; both
carry structurally identical ColumnNullable(ColumnQBit) elements, but
the broken structureEquals made writeSlice reject them.

Fix ColumnQBit::structureEquals to follow the standard wrapper-column
pattern (cf. ColumnSparse, ColumnReplicated, ColumnLowCardinality,
ColumnNullable): typeid_cast the rhs to ColumnQBit first, then delegate
the inner comparison to the unwrapped tuples on both sides. Also
require dimension to match, because two QBit columns with different
dimensions that pad to the same FixedString size (e.g. dimension 1 and
8 both pad to 1 byte) would otherwise be reported as structurally
equal.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=4c74bee46e8b3d257df5320ac90f735593ef61bc&name_0=MasterCI&name_1=BuzzHouse%20%28experimental%2C%20serverfuzz%2C%20amd_msan%29

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @rienath @Avogar @alexey-milovidov — could you review this? Fixes a logical error in writeSlice reaching master via BuzzHouse (STID 2670-0e56) by restoring the standard typeid_cast-first pattern in ColumnQBit::structureEquals (it was delegating to the inner tuple's structureEquals with the outer ColumnQBit as argument, which always returned false for matching QBit columns).

@rienath rienath self-assigned this Apr 20, 2026
@@ -0,0 +1,40 @@
-- Tags: no-fasttest
-- Tag `no-fasttest`: `QBit` depends on non-fasttest libraries.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the library QBit depends on that is not available in fasttest?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@groeneai please take a look at the review

Comment thread tests/queries/0_stateless/04104_qbit_writeslice_structure_equals.sql Outdated
Comment thread tests/queries/0_stateless/04104_qbit_writeslice_structure_equals.sql Outdated
Comment thread tests/queries/0_stateless/04104_qbit_writeslice_structure_equals.sql Outdated
@rienath rienath added the can be tested Allows running workflows for external contributors label Apr 20, 2026
@clickhouse-gh

clickhouse-gh Bot commented Apr 20, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [7bbb4a8]

Summary:

job_name test_name status info comment
Performance Comparison (arm_release, master_head, 5/6) FAIL
Check Results FAIL
Fast test (arm_darwin) ERROR

AI Review

Summary

This PR fixes ColumnQBit::structureEquals so it compares two ColumnQBit wrappers correctly (including dimension) instead of comparing one side's inner tuple to the other side's outer wrapper, which could trigger a logical error in writeSlice for structurally equal columns. The change is small, targeted, and backed by a new stateless regression test covering the previously failing query shape.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Apr 20, 2026
…e tests

Feedback from @rienath on PR ClickHouse#103084:
- Remove the 'no-fasttest' tag; QBit has no fasttest-excluded dependencies
  (sibling tests 03363-03374 do not use the tag either).
- Collapse the long description into a three-line regression-test blurb.
- Drop the inline comments before the BuzzHouse-shape and ifNull queries.
- Add negative tests that CAST between QBit columns with non-matching
  dimensions. Dimensions 1 and 8 are the interesting boundary because
  both pad to a single byte inside the tuple's FixedString storage, so
  the dimension field is what structureEquals relies on to tell them
  apart.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

@rienath thanks for the review, pushed 319fe24 addressing all five comments:

  • Dropped the no-fasttest tag (QBit has no fasttest-excluded deps, matching 03363–03374).
  • Collapsed the long regression-test description into three lines.
  • Deleted the inline comments above the full-shape and ifNull queries.
  • Added three negative cases at the end that CAST between QBit columns with non-matching dimensions. Dimensions 1 and 8 are the specific boundary the dimension check protects: both pad to a single byte in the underlying FixedString storage, so without comparing dimension the two columns would look structurally identical. All three CASTs fail cleanly with TYPE_MISMATCH.

Session: cron:clickhouse-ci-task-worker:20260420-121500

Comment on lines +23 to +29
-- Negative cases: QBits with non-matching dimensions are different types and must
-- not be conflated. Dimensions 1 and 8 are particularly interesting because both
-- pad to a single byte in the underlying `FixedString` storage, so the dimension
-- field is what `structureEquals` relies on to tell them apart.
SELECT CAST(CAST([1.0] AS QBit(Float64, 1)) AS QBit(Float64, 8)); -- { serverError TYPE_MISMATCH }
SELECT CAST(CAST([1., 2., 3., 4., 5., 6., 7., 8.] AS QBit(Float64, 8)) AS QBit(Float64, 1)); -- { serverError TYPE_MISMATCH }
SELECT CAST(CAST([1., 2., 3., 4.] AS QBit(Float64, 4)) AS QBit(Float64, 8)); -- { serverError TYPE_MISMATCH }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't test structureEquals path, does it? @groeneai

@rienath pointed out that the previous CAST TYPE_MISMATCH cases
did not exercise ColumnQBit::structureEquals — they are rejected
at FunctionCast type resolution (DataTypeQBit::equals) before any
column-level structureEquals call. The positive regression cases
in the SQL test already walk through structureEquals on same-
dimension QBits (via writeSlice in the ifNull/INSERT paths), so
the negative dimension-mismatch case needs a direct unit test
rather than a SQL test.

New gtest_column_qbit.cpp covers:

  - StructureEqualsSameDimensionIsTrue: fails without the
    typeid_cast<ColumnQBit> rewrite (the original bug).
  - StructureEqualsDifferentDimensionIsFalse: exercises the
    `dimension ==` guard with the 1-vs-8 pair where both padded
    FixedStrings are 1 byte wide, so the inner ColumnTuple would
    otherwise claim the columns match. Also covers 4-vs-8 as a
    routine mismatch.
  - StructureEqualsDifferentElementTypeIsFalse: defensive
    coverage — different element types give different tuple
    widths.
  - StructureEqualsNonQBitRhsIsFalse: typeid_cast guard for any
    non-QBit rhs.

Verified both directions locally:

  - Without the PR fix (return tuple->structureEquals(rhs)): the
    SameDimension test fails.
  - With the PR fix but the `dimension ==` guard removed:
    DifferentDimension fails with d1->structureEquals(d8) == true
    and d4->structureEquals(d8) == true.
  - With the full fix: all four pass.

The SQL regression test drops the CAST TYPE_MISMATCH block and
points at the gtest instead; the positive ifNull/INSERT scenarios
that exercised the original writeSlice crash are kept unchanged.

50x runs of tests/queries/0_stateless/04104_qbit_writeslice_structure_equals
and all 16 tests/queries/0_stateless/*qbit* still pass.
@groeneai

Copy link
Copy Markdown
Contributor Author

@rienath you're right — the CAST(QBit(d1) AS QBit(d2)) cases never reach ColumnQBit::structureEquals; FunctionCast bails out at DataTypeQBit::equals before any column-level comparison runs. Pushed cdbff1e replacing them with a unit test that exercises the method directly.

New src/Columns/tests/gtest_column_qbit.cpp (four cases):

  • StructureEqualsSameDimensionIsTrue — regression test for the original bug; fails against the old return tuple->structureEquals(rhs) because that typeid-casts the ColumnTuple against the outer ColumnQBit and always returns false.
  • StructureEqualsDifferentDimensionIsFalse — the negative case you asked for. Uses QBit(Float64, 1) vs QBit(Float64, 8) specifically because both pad to a 1-byte FixedString, so the inner ColumnTuple::structureEquals would happily claim they match if the dimension == guard were missing. Also covers 4-vs-8 as a routine mismatch.
  • StructureEqualsDifferentElementTypeIsFalseFloat32 vs Float64 give different tuple widths; defensive coverage.
  • StructureEqualsNonQBitRhsIsFalse — compares against a ColumnString, proving the typeid_cast guard rejects non-QBit rhs.

I verified both directions locally with build/src/unit_tests_dbms --gtest_filter='ColumnQBit.*':

State of ColumnQBit::structureEquals Failing tests
old buggy return tuple->structureEquals(rhs) StructureEqualsSameDimensionIsTrue fails (actual: false, expected: true)
typeid-cast only, no dimension == guard StructureEqualsDifferentDimensionIsFalse fails — d1->structureEquals(d8) and d4->structureEquals(d8) return true
full fix all 4 pass

tests/queries/0_stateless/04104_qbit_writeslice_structure_equals.sql keeps the positive ifNull/INSERT scenarios that exercised the original writeSlice crash and points at the gtest for direct structureEquals coverage; the CAST block is dropped.

Session: cron:clickhouse-ci-task-worker:20260420-141500

@rienath

rienath commented Apr 21, 2026

Copy link
Copy Markdown
Member

@groeneai delete the gtest, it is artificial. It is not clear how to test for size mismatch through SQL. Let's leave the dimension check as is for safety. Do not modify other files. Just delete the gtest

Per @rienath's request on PR ClickHouse#103084: the gtest is artificial and it is
not clear how to test for dimension mismatch through SQL, so drop the
unit test. The dimension check in `ColumnQBit::structureEquals` is left
in place for safety, and the positive SQL regression test in
`tests/queries/0_stateless/04104_qbit_writeslice_structure_equals.sql`
remains as coverage for the original `writeSlice` logical error.
@groeneai

Copy link
Copy Markdown
Contributor Author

@rienath done in 7bbb4a8src/Columns/tests/gtest_column_qbit.cpp deleted, nothing else changed. ColumnQBit::structureEquals (typeid_cast + dimension == check) is untouched; tests/queries/0_stateless/04104_qbit_writeslice_structure_equals.{sql,reference} remain as the regression test for the original writeSlice logical error.

Verified locally (Clang 21 debug, Linux):

  • ninja -C build unit_tests_dbms — links cleanly after deletion
  • ninja -C build clickhouse — rebuilt
  • 04104_qbit_writeslice_structure_equals — 25/25 OK with randomization
  • full QBit suite (16 tests: 03363–03380, 03730, 03926, 04104) — 16/16 OK

Ready for re-review.

@clickhouse-gh

clickhouse-gh Bot commented Apr 21, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.00% 84.00% +0.00%
Functions 91.10% 91.10% +0.00%
Branches 76.50% 76.50% +0.00%

Changed lines: 60.00% (3/5) · Uncovered code

Full report · Diff report

@rienath rienath added this pull request to the merge queue Apr 22, 2026
Merged via the queue into ClickHouse:master with commit 6bfa37d Apr 22, 2026
158 of 161 checks passed
@clickgapai

Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants