Fix arrayRemove logical error when comparing incompatible Variant types by groeneai · Pull Request #105248 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix arrayRemove logical error when comparing incompatible Variant types#105248

Merged
alexey-milovidov merged 5 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-2508-1fb5-arrayremove-variant
Jun 13, 2026
Merged

Fix arrayRemove logical error when comparing incompatible Variant types#105248
alexey-milovidov merged 5 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-2508-1fb5-arrayremove-variant

Conversation

@groeneai

@groeneai groeneai commented May 18, 2026

Copy link
Copy Markdown
Contributor

Related: #95839

When arrayRemove's first argument is an Array(Variant(...)) whose Variant alternatives are all incompatible with the type of the second argument and variant_throw_on_type_mismatch = 0, equals returns Nullable(Nothing) instead of throwing. The previous code stripped the Nullable and then declared the resulting ColumnNothing as having type UInt8 when handing it to FunctionIf. The UInt8/UInt8 typed paths inside if did not match the actual column class, dispatch fell into executeGeneric, and result_column->insertFrom(*col_else, i) then tripped the assertTypeEquality chassert (typeid(ColumnUInt8) != typeid(ColumnNothing)) inside IColumn::insertFrom.

Detect the type mismatch right after removeNullable: when the column's underlying type is not UInt8, no real equality comparison was possible, so substitute a constant-zero UInt8 column. The downstream if then yields the correct filter (no element of arr is considered equal to elem) and arrayRemove returns the array unchanged.

Reproducer (STID 2508-1fb5, hit on PR #94517 since 2026-05-04):

SET variant_throw_on_type_mismatch = 0;
SELECT arrayRemove([NULL, [257, 65537]], [['hello']]);
-- before: Logical error 'assertTypeEquality' in IColumn::insertFrom
-- after : [NULL,[257,65537]]

This also covers the multi-alternative Variant query reported on issue #95839 (verified locally: the query aborts with the STID 2508-1fb5 assertion on master and returns a clean result with this fix):

SET variant_throw_on_type_mismatch = 0;
SELECT arrayRemove([[9223372036854775807, -9223372036854775808], [0.9999, 7]], [['hello']]);
-- before: Logical error '(isConst() || ...) ? getDataType() == rhs.getDataType() : typeid(*this) == typeid(rhs)'
-- after : [[9223372036854775807,-9223372036854775808],[0.9999,7]]

Follow-up to #101105 (which widened assertTypeEquality to handle wrapper columns on rhs); this PR closes the remaining call-site where arrayRemove was producing a column whose runtime type did not match its declared type.

Discovered via @alexey-milovidov directive on #94517 (2026-05-18T15:08:20Z), STID 2508-1fb5; re-confirmed on #94517 (2026-06-13) and tracked by issue #95839.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix a logical error in arrayRemove when the first argument is an array of Variant whose alternatives are all incompatible with the type of the second argument and variant_throw_on_type_mismatch is disabled. The function now treats the comparison as "never equal" and returns the array unchanged instead of triggering a server-side assertTypeEquality failure.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

Version info

  • Merged into: 26.6.1.768

When `arrayRemove`'s first argument is an `Array(Variant(...))` whose Variant
alternatives are all incompatible with the type of the second argument AND
`variant_throw_on_type_mismatch = 0`, `equals` returns `Nullable(Nothing)`
instead of throwing. After `removeNullable` we therefore had a `ColumnNothing`
that `arrayRemove` then wrapped under a declared type of `UInt8` and passed to
the `if` resolver. `FunctionIf::executeImpl` dispatched the `UInt8/UInt8` typed
paths, failed to match the actual `ColumnNothing` column, fell into
`executeGeneric`, and tripped the `assertTypeEquality` chassert inside
`IColumn::insertFrom` (`typeid(ColumnUInt8) != typeid(ColumnNothing)`).

Detect this mismatch right after the `removeNullable`: when the column's
underlying type is not `UInt8`, no real equality comparison was possible, so
substitute a constant-zero `UInt8` column. The downstream `if` then computes
the correct filter (no element of `arr` is considered equal to `elem`) and
`arrayRemove` returns the array unchanged.

Reproducer (STID 2508-1fb5, hit on PR ClickHouse#94517 since 2026-05-04):

```
SET variant_throw_on_type_mismatch = 0;
SELECT arrayRemove([NULL, [257, 65537]], [['hello']]);
-- before: server logical error
-- after : [NULL,[257,65537]]
```

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @rschu1ze @Avogar — could you review this? Fix for STID 2508-1fb5 (an arrayRemove/FunctionIf/assertTypeEquality chassert that fires when an Array(Variant(...)) is compared with an incompatible second argument while variant_throw_on_type_mismatch = 0). It's the follow-up to #101105 that @alexey-milovidov asked for on #94517. Touches src/Functions/array/arrayRemove.cpp and adds a new 04256_* stateless test.

@nikitamikhaylov nikitamikhaylov added the can be tested Allows running workflows for external contributors label May 18, 2026
@clickhouse-gh

clickhouse-gh Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [1fcf1c2]

Summary:


AI Review

Summary

This PR fixes arrayRemove for impossible Variant comparisons when variant_throw_on_type_mismatch is disabled by translating the expected Nullable(Nothing)/ColumnNothing comparison result into a false UInt8 filter, while preserving a LOGICAL_ERROR for any other unexpected non-UInt8 result. The added stateless test covers the original [NULL, X] trigger, the multi-alternative Variant case from #95839, and the default throwing behavior. I found no remaining correctness issues.

Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label May 18, 2026
@groeneai

groeneai commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

CI ledger — covered SHA: 2006f54cbb3c41b600ae7e1b9faf259ec095c157

CI fully finished (Mergeable Check completed; 0 pending checks).

Check Test Disposition
Unit tests (asan_ubsan) (check-level) infra/chronic UBSan UntrackedMemoryCounter — task 2026-06-03-ci-p1-chronic-ubsan-in-untrackedmemoryc
Stateless tests (amd_tsan, parallel, 1/2) (check-level, no test_name in CIDB) likely the obfuscator chronic flaky pattern; no specific PR-caused failures captured.

No PR-caused failures captured. Awaiting human review.

Session: cron:our-pr-ci-monitor:20260604-213000

@antaljanosbenjamin antaljanosbenjamin self-assigned this Jun 5, 2026

@antaljanosbenjamin antaljanosbenjamin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good bot.

…House#95839

Extend the 04256 regression test with the exact query reported on issue
ClickHouse#95839 (STID 2508-1fb5): arrayRemove over a multi-alternative
Array(Variant(Array(Int64), Array(Float64))) with no NULL element, paired
with Array(Array(String)). This exercises the same assertTypeEquality
chassert path as the existing [NULL, X] cases but via a distinct
Variant-construction trigger, guarding against the regression the
arrayRemove fix addresses.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (commit c8f5c4e — added #95839 regression query)

# Question Answer
a Deterministic repro? Yes. SET variant_throw_on_type_mismatch=0; SELECT arrayRemove([[9223372036854775807, -9223372036854775808], [0.9999, 7]], [['hello']]); aborts deterministically on a debug binary without the fix (built locally from master arrayRemove.cpp).
b Root cause explained? [[...Int64...], [...Float64...]] builds Array(Variant(Array(Int64), Array(Float64))); none of the Variant alternatives is compatible with Array(Array(String)). Under variant_throw_on_type_mismatch=0, equals returns Nullable(Nothing) instead of throwing; the old code stripped Nullable to a ColumnNothing but declared it UInt8, so FunctionIf::executeGeneric -> IColumn::insertFrom tripped the assertTypeEquality chassert (STID 2508-1fb5, issue #95839). Identical mechanism to the existing [NULL, X] cases, different Variant-construction trigger.
c Fix matches root cause? Unchanged from this PR's existing fix (substitute a constant-zero UInt8 column when the post-removeNullable column is not UInt8). This commit only adds a regression query that exercises the same code path.
d Test intent preserved / new tests added? Added alexey-milovidov's #95839 query to 04256_arrayRemove_variant_no_throw_on_mismatch.sql (+1 reference line). It is a distinct multi-alternative-Variant, no-NULL trigger not previously covered by the [NULL, X] cases. Existing assertions untouched.
e Both directions demonstrated? Yes. Without fix (master arrayRemove.cpp, debug build): SIGABRT with '(isConst() || ...) ? getDataType() == rhs.getDataType() : typeid(*this) == typeid(rhs)' via assertTypeEquality -> insertFrom -> FunctionIf -> arrayRemove.cpp:25. With fix: returns [[9223372036854775807,-9223372036854775808],[0.9999,7]]. Test runner: 30/30 OK with randomization.
f Fix is general, not a narrow patch? The fix already guards the single arrayRemove call-site that produced a column whose runtime type disagreed with its declared type. This commit widens test coverage (two distinct Variant-construction triggers now exercised); no new sibling code path identified for arrayRemove.

Session id: cron:clickhouse-worker-slot-0:20260613-050800

@groeneai

Copy link
Copy Markdown
Contributor Author

CI fully finished on HEAD c8f5c4e0660ad27d0509192338a1e38e4dcad792 (commit added the issue #95839 regression query + em-dash cleanup). All checks complete (Finish Workflow passed; only CH Inc sync / Mergeable Check reflect the items below). Triaged every failure; none is caused by this PR (diff is src/Functions/array/arrayRemove.cpp + the 04256 stateless test only):

  • Fast test (arm_darwin)04336_parallel_blocks_marshalling_low_cardinality_native_format + 04327_rewrite_aggregate_function_with_if_distributed. Chronic arm_darwin infra flake (connect-timeout / ConnectionPool warning, return code 23); 60 and 21 unrelated PRs in 30d, 0 master, both tests unrelated to arrayRemove. Tracked separately.
  • Stress test (arm_tsan)future_error "The associated promise has been destructed" (STID 2508-38c6). A pre-existing trunk teardown race in threadPoolCallbackRunnerUnsafe (a dropped-on-shutdown packaged_task), independent of this PR (also fired on master via serverfuzz). A root-cause fix is already in flight in Fix broken-promise server abort in threadPoolCallbackRunnerUnsafe on shutdown #107383.
  • CH Inc sync / Mergeable Check — aggregators reflecting the above; no Fast (real test) / Bugfix / Style failure attributable to this PR.

The arrayRemove fix and both-directions validation are unchanged from the prior clean CI runs.

Comment thread src/Functions/array/arrayRemove.cpp Outdated
The recovery added for the Variant no-throw-on-mismatch case caught any
non-UInt8 result from equals() and silently turned it into an all-zero
filter, which would make arrayRemove quietly keep every element if equals()
ever returned an unexpected type, masking a broken comparison contract.

Restrict the constant-zero substitution to TypeIndex::Nothing (the
documented Nullable(Nothing) result of an impossible Variant comparison
with variant_throw_on_type_mismatch=0) and throw LOGICAL_ERROR for any
other non-UInt8 result so a genuine contract violation surfaces instead of
being hidden.

Addresses clickhouse-gh[bot] review on PR ClickHouse#105248.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (commit 1fcf1c2b163)

# Question Answer
a Deterministic repro? Yes. With the previous broad branch removed, the bot's concern (silent all-zero filter for any non-UInt8 equals result) is reproduced by reasoning about the contract; the two known Nothing-producing repros (SET variant_throw_on_type_mismatch=0; SELECT arrayRemove([NULL,[257,65537]],[['hello']]) and SELECT arrayRemove([[9223372036854775807,-9223372036854775808],[0.9999,7]],[['hello']])) deterministically take the TypeIndex::Nothing branch.
b Root cause explained? The recovery branch was too broad: if (else_col->getDataType() != TypeIndex::UInt8) converted any non-UInt8 equals result into a constant-zero filter. Only Nullable(Nothing) (impossible Variant comparison with variant_throw_on_type_mismatch=0, stripped to ColumnNothing by removeNullable) is a legitimate recovery case; any other non-UInt8 type would be a broken equals contract that the code silently swallowed (keeping every element).
c Fix matches root cause? Yes. The substitution is now gated on else_col->getDataType() == TypeIndex::Nothing; any other non-UInt8 result throws LOGICAL_ERROR instead of being masked. This is exactly the bot's request.
d Test intent preserved / new tests added? Preserved. Existing test 04256_arrayRemove_variant_no_throw_on_mismatch (incl. the #95839 multi-alternative-Variant case) continues to exercise the Nothing recovery path; no assertions weakened. The new LOGICAL_ERROR branch is unreachable under any normal equals result, so no new test is added for it (it is a defensive contract guard, not a user-facing behavior).
e Both directions demonstrated? Yes. Built debug binary (Build ID e983a5c...). clickhouse local: the two Nothing-path repros return the array unchanged; default variant_throw_on_type_mismatch=1 throws clean ILLEGAL_TYPE_OF_ARGUMENT (no abort); ordinary arrayRemove cases unaffected. Test runner: 04256 30/30 OK with randomization; arrayRemove/assertTypeEquality/variant_throw suites 15/15 OK.
f Fix is general, not a narrow patch? Yes. This narrows a recovery branch rather than guarding a symptom: the legitimate recovery (Nothing) is handled and every other deviation now surfaces as LOGICAL_ERROR at the single point where arrayRemove consumes the equals result. No sibling arrayRemove path builds an if-filter this way.

Session id: cron:clickhouse-worker-slot-3:20260613-131600

@clickhouse-gh

clickhouse-gh Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.70% 84.70% +0.00%
Functions 92.40% 92.30% -0.10%
Branches 77.30% 77.30% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 6/8 (75.00%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jun 13, 2026
Merged via the queue into ClickHouse:master with commit 8427ed3 Jun 13, 2026
166 checks passed
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants