Fix LOGICAL_ERROR in MergeTreeIndexConditionSet under MV layer type change by groeneai · Pull Request #105552 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix LOGICAL_ERROR in MergeTreeIndexConditionSet under MV layer type change#105552

Open
groeneai wants to merge 3 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-89802-merge-tree-index-set-storage-type
Open

Fix LOGICAL_ERROR in MergeTreeIndexConditionSet under MV layer type change#105552
groeneai wants to merge 3 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-89802-merge-tree-index-set-storage-type

Conversation

@groeneai

Copy link
Copy Markdown
Contributor

Closes #89802.

A set skip index on a non-Nullable storage column raised:

Logical error: 'Unexpected return type from equals.
Expected Nullable(UInt8). Got UInt8. Action:
FUNCTION equals(c0 :: 0, 0 :: 1) -> equals(c0, 0) Nullable(UInt8) : 2,
input block structure: c0 Int32 ...'

when read through a MaterializedView declaring the column as Nullable, with
query_plan_merge_expressions = 0 keeping the MV's _CAST step separate
from the WHERE filter. Reproduces from version 25.5 onward
(https://fiddle.clickhouse.com/6dc1980e-30b2-4082-b570-5ef45287afc9).

MergeTreeIndexConditionSet cloned the predicate verbatim. The equals
function's result_type was resolved against the MV-layer Nullable(Int32)
input. The granule block, however, carries the storage column at the storage
type (Int32), so ExpressionActions::execute ran equals on
non-Nullable inputs (returning UInt8) and rejected the result against
the saved Nullable(UInt8).

The fix: in MergeTreeIndexConditionSet::atomFromDAG, look up the granule
(storage) type for each key column from index_description.sample_block,
add an INPUT of that type, and CAST it back to the predicate's expected
type. The downstream pre-resolved functions then see the type they were
resolved for, keeping the function result_type invariant intact while the
block fed into the actions remains the storage column unchanged.

Added tests/queries/0_stateless/04272_89802_set_index_mv_nullable_column_type.sql,
which reproduces the failure of the original report plus matching, AND/OR/NOT,
IN (...), and UInt8-vs-Nullable(Int32) variants. Confirmed:

  • without the fix the new test crashes the debug server on the first query;
  • with the fix it passes 30 consecutive randomized runs;
  • the 10 existing 0_stateless set-index tests
    (00907_*, 00931_*, 00965_*, 00979_*, 00997_*, 01583_*, 02112_*, 02499_*)
    continue to pass.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix a logical error Unexpected return type from equals. Expected Nullable(UInt8). Got UInt8. thrown by MergeTreeIndexConditionSet::getPossibleGranules when a set skip index sits on a non-Nullable storage column read through a MaterializedView that declares the column as Nullable, with query_plan_merge_expressions = 0.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

…hange

Closes ClickHouse#89802.

When a `set` skip index is built on a non-`Nullable` storage column and the
column is read through a MaterializedView declaring it `Nullable`, with
`query_plan_merge_expressions = 0` keeping the MV's `_CAST` step separate
from the WHERE filter, the index condition exception:

    Logical error: 'Unexpected return type from equals.
    Expected Nullable(UInt8). Got UInt8...'

`MergeTreeIndexConditionSet` cloned the predicate verbatim, including
`equals`' pre-resolved `result_type` resolved against the MV-layer
`Nullable(Int32)` input. The granule block carries the storage column at the
storage type (`Int32`), so `ExpressionActions::execute` runs `equals` on
non-`Nullable` inputs (returning `UInt8`) but checks against the saved
`Nullable(UInt8)` and throws.

Fix: in `MergeTreeIndexConditionSet::atomFromDAG`, look up the granule
(storage) type for each key column from `index_description.sample_block`,
add an `INPUT` of that type, and `CAST` it back to the predicate's expected
type. The downstream pre-resolved functions then see the type they were
resolved for, keeping the function `result_type` invariant intact while the
block fed into the actions remains the storage column unchanged.

Added `tests/queries/0_stateless/04272_89802_set_index_mv_nullable_column_type.sql`.

Verified the new test crashes the server without the fix and passes 30/30
runs with it; the existing 10 set-index stateless tests continue to pass.
@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @rschu1ze @alexey-milovidov — could you review this? The fix is in MergeTreeIndexConditionSet::atomFromDAG: when a set skip index sits on a non-Nullable storage column but the predicate sees it through an MV layer at the declared Nullable type, look up the granule (storage) type from index_description.sample_block, add an INPUT of that type, and CAST it back to the predicate's expected type so the pre-resolved function's result_type invariant stays intact.

@PedroTadim PedroTadim added the can be tested Allows running workflows for external contributors label May 21, 2026
@clickhouse-gh

clickhouse-gh Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [c5346e6]

Summary:

job_name test_name status info comment
Bugfix validation (functional tests) FAIL
04272_89802_set_index_mv_nullable_column_type FAIL cidb
Stateless tests (amd_tsan, parallel, 2/2) FAIL
Server died FAIL cidb, issue
04064_recursive_cte_multiple_subqueries UNKNOWN cidb
04033_tpc_ds_q10 UNKNOWN cidb
00975_move_partition_merge_tree UNKNOWN cidb
02040_clickhouse_benchmark_query_id_pass_through UNKNOWN cidb
03701_replicated_column_short_circuit_filter UNKNOWN cidb
03040_dynamic_type_alters_2_wide_merge_tree UNKNOWN cidb
01169_old_alter_partition_isolation_stress UNKNOWN cidb
00933_ttl_with_default UNKNOWN cidb
04222_hypothetical_index_statistical UNKNOWN cidb
3 more test cases not shown
Fast test (arm_darwin) DROPPED
Build (amd_release) DROPPED
Build (arm_release) DROPPED
Build (amd_darwin) DROPPED
Build (arm_darwin) DROPPED
Build (arm_v80compat) DROPPED
Build (amd_freebsd) DROPPED
Build (ppc64le) DROPPED

AI Review

Summary
  • This PR fixes MergeTreeIndexConditionSet so a set skip index evaluates granule values through the predicate's expected column type when a MaterializedView exposes the same storage column with a different declared type. The change is narrow, covered by a focused regression test for the reported Nullable mismatch and related predicate shapes, and I did not find a correctness issue that needs an inline review comment.
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label May 21, 2026
@groeneai

Copy link
Copy Markdown
Contributor Author

Bugfix validation FAIL update — investigated.

Bugfix validation (functional tests) reports Failed to reproduce the bug for 04272_89802_set_index_mv_nullable_column_type on master @ 3dadc439d799. Looking at the job log:

[15:41:52] Run test 04272_89802_set_index_mv_nullable_column_type
[15:42:57] Hung check failed: server is not responding
[15:43:06] Unable to locate any ClickHouse server process. It must have crashed or exited prematurely!
...
Server died? ... core.TCPHandler.1594-8848.zst (encrypted)

On master the test triggers a server crash (core dump captured), not a clean exception. clickhouse-test records this as SERVER_DIED, but bugfix_validate_check.py only treats Result.Status.FAIL as bug-reproduced, so the framework reports "not reproduced". The bug is real and user-visible (fiddle: https://fiddle.clickhouse.com/6dc1980e-30b2-4082-b570-5ef45287afc9), the framework just cannot see this shape.

Two options:

  1. Keep Bug Fix category and override the failing check (the bug IS user-visible from 25.5 onward, so changelog entry is appropriate).
  2. Re-categorise to CI Fix or Improvement (changelog entry is not required) following the PR Fix inconsistent AST formatting for CREATE TABLE with UNIQUE KEY and TTL #104968 precedent of "Bugfix validation cannot reproduce" → CI Fix.

@PedroTadim @rschu1ze — which do you prefer? I'll push the PR-body edit either way.

@PedroTadim

Copy link
Copy Markdown
Member

@groeneai I go with option 1

@clickhouse-gh

clickhouse-gh Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.40% 84.40% +0.00%
Functions 92.40% 92.30% -0.10%
Branches 77.00% 77.00% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 19/22 (86.36%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@groeneai

groeneai commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

CI ledger — covered SHA: 2e098198d4009b2f0cb8b5fbdbd4d8bb6f36c6e9

CI fully finished (Mergeable Check completed; 0 pending checks).

Check Test Disposition
Stateless tests (amd_tsan, parallel, 1/2) 00175_obfuscator_schema_inference flaky chronic trunk — task 2026-06-02-ci-00175obfuscatorschemainference-ab
Unit tests (asan_ubsan) (check-level) infra/chronic UBSan UntrackedMemoryCounter — task 2026-06-03-ci-p1-chronic-ubsan-in-untrackedmemoryc
Bugfix validation (functional tests) 04272_89802_set_index_mv_nullable_column_type this PR's OWN regression test — Bugfix validation framework limitation when MV layer type-change bug isn't reproducible on master HEAD without the right ordering. Not a real regression.

No PR-caused failures. Awaiting human review.

Session: cron:our-pr-ci-monitor:20260604-213000

@groeneai

groeneai commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Going with option 1: category stays Bug Fix.

Bugfix validation FAIL is the framework limitation: server crashed (SERVER_DIED, captured core) instead of Result.Status.FAIL, which the framework does not count as "bug reproduced". User-visible from 25.5+, so the changelog entry is retained.

CI ledger for current HEAD 2e098198: #105552 (comment). The other two failures are chronic trunk issues, tracked separately. Awaiting human review.

@groeneai

Copy link
Copy Markdown
Contributor Author

CI finish ledger — c5346e6 (post master-merge)

CI has fully finished (Finish Workflow passed; last checks 2026-06-24 13:14). This PR fixes a set-index LOGICAL_ERROR (MergeTreeIndexSet); the two TSan reds are an unrelated chronic trunk race. Every red check has an owner; only CH Inc sync is exempt.

Check / test Reason Owner / fixing PR
Stateless tests (amd_tsan, parallel, 2/2) / ThreadSanitizer: data race (STID 4071-3348) data race in QueryStatus::releaseWorkloadResources vs MemoryReservation release — chronic trunk (41 hits / 35 PRs / 6 master in 30d), unrelated to this PR's set-index change #108391 (external, open — serxa "Fix data race on MemoryReservation release at query finish")
Stateless tests (amd_tsan, parallel, 2/2) / Server died the TSan halt-on-error abort from STID 4071-3348 above (same job) #108391 (external, open)
Bugfix validation (functional tests) / 04272_89802_set_index_mv_nullable_column_type framework limitation: the bug DOES reproduce on the master release binary (server dies), but the harness only counts Result.Status.FAIL as "reproduced" and records SERVER_DIED as "failed to reproduce" @ PedroTadim chose "option 1" (2026-05-21) — keep Bug Fix category; accepted, not actionable
Mergeable Check / PR rollups of the reds above
CH Inc sync CH Inc sync (private, not actionable by us)

The fix + regression test were AI-Review-approved; the only "failures" are the unrelated chronic TSan race (external fix #108391) and the maintainer-accepted Bugfix-validation framework artifact. Ready for review.

Session id: cron:our-pr-ci-monitor:20260625-170000

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Another Unexpected return type from equals. Expected Nullable(UInt8). Got UInt8

3 participants