iframe-proxy

vdimir · 2026-06-10T18:40:36Z

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix incorrect row order in ORDER BY queries over UNION ALL with optimize_read_in_order and read_in_order_use_virtual_row enabled, close Virtual row with union produces wrong result #106879

clickhouse-gh · 2026-06-10T18:41:16Z

Workflow [PR], commit [6f86ef8]

Summary: ✅

Performance Comparison: Performance dashboard

AI Review

Summary

This PR fixes wrong row ordering for ORDER BY over UNION ALL when optimize_read_in_order and virtual rows are enabled by preventing union narrowing and by allowing FinishSorting to consume virtual rows when all branches can preserve them. The main fix direction is reasonable, but two existing major issues remain open in the current head: a debug-only boundary check still fires in a valid disabled-conversion mode, and the union-wide virtual-row reset can invalidate the INNER JOIN read-in-order-through-join contract.

PR Metadata

The Bug Fix category is correct, and a changelog entry is required and present. The entry is specific enough, but the issue reference should be a separate full Closes line instead of text inside the changelog entry.

Suggested PR body adjustment:

Closes: https://github.com/ClickHouse/ClickHouse/issues/106879

* Fixes incorrect row ordering for `ORDER BY` queries over `UNION ALL` when `optimize_read_in_order` and `read_in_order_use_virtual_row` are enabled.

Findings

⚠️ Majors

[src/Processors/Merges/Algorithms/MergingSortedAlgorithm.cpp:179] The virtual-row boundary check still runs when apply_virtual_row_conversions is false. In that valid preliminary-merge mode, setVirtualRow intentionally does not execute the conversion DAG, so ORDER BY expression columns in the virtual chunk can contain default values and the next real chunk can throw LOGICAL_ERROR in debug builds for a valid plan. Keep the apply_virtual_row_conversions guard around both remembering and checking the boundary, or guarantee that virtual-row chunks cannot reach this algorithm when conversions are disabled.
[src/Processors/QueryPlan/Optimizations/optimizeReadInOrder.cpp:1713] The union-wide reset can clear virtual-row conversions after a child branch has already been accepted through INNER JOIN and had requestReadingInOrder plus keepLeftPipelineInOrder applied. buildInputOrderInfo only allows non-LEFT joins when virtual rows remain available, so clearing them later leaves the join-through-order optimization without the boundary signal it depends on and can make ORDER BY ... LIMIT read far more left-table rows. Defer mutating the reader until the union decision is known, or mark that child ineligible when the union cannot keep virtual rows.

Tests

⚠️ The new stateless test covers the direct UNION ALL row-order cases, including a branch that needs extra sorting, but it does not cover the INNER JOIN branch case above. Please add a focused regression that combines UNION ALL, query_plan_read_in_order_through_join = 1, an INNER JOIN child, and a sibling that forces the union to disable virtual rows, then verify the plan either falls back cleanly or preserves the bounded-read behavior expected by 03711_read_in_order_through_join.

Final Verdict

Not ready as-is. The row-order regression coverage is useful, but the two major virtual-row state issues should be fixed before merge.

clickhouse-gh · 2026-06-17T09:32:24Z

-        else
-            checkVirtualRowBoundary(cursors[source_num], virtual_row_boundary[source_num], description, source_num);
-    }
+    if (is_virtual_row && !has_collation)


This debug check is now running even when apply_virtual_row_conversions is false, which is a valid mode for existing read-in-order preliminary merges. In that mode setVirtualRow deliberately does not execute the conversion DAG, so for ORDER BY expressions the virtual chunk can contain default values for the computed sort columns instead of the announced boundary. The next real chunk then trips checkVirtualRowBoundary in debug builds for a valid plan.

For example, ReadFromMergeTree can still emit VirtualRowTransform chunks before the preliminary MergingSortedTransform created with apply_virtual_row_conversions=false when read_in_order_two_level_merge_threshold is exceeded. Please keep the apply_virtual_row_conversions guard around both remembering and checking the boundary, or otherwise guarantee that no virtual-row chunks can reach this algorithm when conversions are disabled.

clickhouse-gh · 2026-06-17T09:32:31Z

+        if (!apply_virtual_row)
+        {
+            for (auto * reader : virtual_row_readers)
+                reader->resetVirtualRowConversions();


This reset can leave a child optimized through an INNER JOIN in a state that buildInputOrderInfo explicitly rejects. The helper only allows non-LEFT joins when uses_virtual_row is true; it then calls requestReadingInOrder and keepLeftPipelineInOrder. If a later union-wide decision sets apply_virtual_row=false, this loop clears the reader conversion, so that same join branch keeps the read-in-order plan but no longer emits virtual rows.

That loses the boundary signal needed for the join-through-order optimization and can make ORDER BY ... LIMIT read far more left-table rows, the case covered by 03711_read_in_order_through_join. Please either defer mutating the reader until the union decision is known, or mark such a child as not eligible when the union cannot keep virtual rows, and add a union + INNER JOIN regression case.

vdimir · 2026-06-18T12:28:54Z

@groeneai Find appropriate reviewer for this PR

groeneai · 2026-06-18T12:40:43Z

cc @KochetovNicolai @nihalzp - could you review this? It fixes wrong ORDER BY results over UNION ALL by restricting UnionStep pipeline narrowing and adjusting virtual-row boundary handling (optimizeReadInOrder.cpp, MergingSortedAlgorithm.cpp). Picking you two as the recent authors of the read-in-order / buildInputOrderFromUnorderedKeys path and the ReadInOrder merge-table fix. The clickhouse-gh bot also flagged a possible INNER JOIN read-in-order-through-join eligibility regression (03711_read_in_order_through_join) worth a look.

vdimir · 2026-06-25T14:54:09Z

Integration tests (amd_asan_ubsan, db disk, old analyzer, 2/6) FAIL
test_quorum_inserts_parallel/test.py::test_parallel_quorum_actually_quorum

Some analysis:

Failure: test_parallel_quorum_actually_quorum failed at test.py:178 — node2.query("SYSTEM SYNC REPLICA q", timeout=10) → client timeout. Server actually finished the SYNC in 12.24s, just past the 10s client deadline.

Root cause chain:

SYNC waits for node2's whole replication queue. The blocking entry was GET_PART all_0_0_0 — the part from the first, deliberately-failed quorum insert (INSERT (1,'Hello'), quorum=3).
Node2's fetch of all_0_0_0 reused a pooled keep-alive interserver connection opened before the partition. A reused connection skips the 1s connect timeout and instead blocks on the 30s http_receive_timeout while the partition swallows the request — it hung ~26s ("No message received") before failing, holding the queue entry hostage. (Fresh connects fail in 1s and recover fast — that's why all_1_1_0/all_2_2_0 synced fine.)
As a side effect, node2 then took the quorum-failed fallback and abandoned the Hello row (would also fail the next count==3 assert).

Structural gap: the pool's staleness check isStale() = socket().poll(SELECT_READ) can't detect a DROP-blackholed peer (socket looks healthy locally), so it always reuses such a connection.

Why now (from your CI data): regression is PR #103786 "Try the next resolved address if HTTP connect fails" (merged 2026-06-25). Evidence: its own pre-merge CI failed this exact test on 06-13; a cross-PR burst (14 then 11 failures across ~unrelated PRs) brackets its merge-queue/merge on 06-24/25. Before that, only rare failures on slow (coverage/tsan) builds.

Honest caveat: the one log I have fails via the reuse-hang, which is a path PR #103786 doesn't touch — so the correlation nails the culprit, but the exact causal mechanism isn't proven by static analysis.

So, root-cause not yet clear, we can try increasing timeout form SYNC query, but it's 10s for quite a while so not sure.

groeneai · 2026-06-25T15:24:56Z

@vdimir Thanks for the deep analysis. This test_parallel_quorum_actually_quorum FAIL is not from your virtual-row diff: it is a pre-existing flaky on the slow sanitizer integration shards (36 hits / 28 distinct PRs / 5 on master over the last 30 days, all on amd_asan_ubsan, db disk, old analyzer, e.g. master report). It is already fixed and up for review in #108433.

On your open question (just bump the 10s SYNC timeout): we found that alone is not enough. The deterministic part of the root cause is the ReplicatedMergeTree fetch backoff, not only the client deadline. While node2 is partitioned off node1/node3 on 9009, its GET_PART entries fail repeatedly; each failure bumps num_tries, and ReplicatedMergeTreeQueue::getPostponeTimeMsForEntry schedules the next try 1 << num_tries ms out (capped at max_postpone_time_for_failed_replicated_fetches_ms, default 60000). After the partition heals, SYSTEM SYNC REPLICA has to wait out that accumulated backoff, which on the slow shards crosses 10s. So #108433 does both:

SETTINGS max_postpone_time_for_failed_replicated_fetches_ms = 0 on table q (removes the deterministic backoff stall in the fault window), and
raises the final SYSTEM SYNC REPLICA q barrier 10s -> 60s (the value 13 other integration tests already use) for residual parallel-sanitizer slowness.

I confirmed (1) alone is insufficient: with only backoff=0 the Integration tests (..., flaky) check still failed under --dist=each ASAN load, which is what drove adding (2). Neither change weakens the gate: the assert_eq_with_retry(node2, "SELECT COUNT() FROM q", "3") that follows still fails a genuine "node2 never syncs" regression.

Your #103786 attribution matches our data: the burst starts 2026-06-13 (its pre-merge CI date) and clusters around the 06-24/25 merge window, and we have it on record as a cause of connection/fetch test flakiness. Your reused-interserver-connection hang (~26s on http_receive_timeout) and our fetch-backoff are complementary paths to the same >10s SYNC: the hung fetch is itself a failed GET_PART that feeds num_tries, and either way the 10->60 barrier in #108433 absorbs your observed 12-26s case.

The structural gap you raise (isStale() = socket().poll(SELECT_READ) cannot detect a DROP-blackholed peer, so the pool keeps reusing a dead interserver connection) is a real but larger change in the HTTPConnectionPool / #103786 area. I would leave the approach to you and the connection-pool owners rather than patch it speculatively; happy to help reproduce or test if you want it pursued separately.

vdimir · 2026-06-26T10:08:08Z

@groeneai all CI failures have linked issue, but also check that that issue have any fixes in flight and if not prepare a fix

groeneai · 2026-06-26T10:38:48Z

@vdimir Checked each current failure on the head build (982a1a03). Every one has a fix in flight; nothing new to prepare.

Failure (check)	Linked issue	Fix in flight
`Block structure mismatch in UnionStep` STID 0993-27f0 (AST fuzzer amd_debug)	#108142	#107719 (ours, open). Plan-time `UnionStep::updateOutputHeader` rejects a const-folded branch vs a full sibling; this PR makes the header check tolerant the same way `updatePipeline` already is. Closes #106956; sparse facet was #107041 (merged).
TSan data race STID 4071-3348 + `Server died` (amd_tsan, s3)	#107487	#108391 by serxa, merged 2026-06-25 23:49Z. `BlockIO::onFinish` released the `MemoryReservation` before pipeline finalize, racing executor threads. Verified: 0 master hits after the merge (7 in the prior 2 days). `982a1a03` predates it, so a rebase clears it.
`Hung check ... possible deadlock` (Stress arm_tsan)	#107941	Chronic stress hung-check bucket (1042 hits / 464 PRs / 330 master in 30d), unrelated to the virtual-row diff. Tracked under our #105905 and #101680.
`test_parallel_quorum_actually_quorum` (Integration, asan db disk old analyzer)	(per your 06-25 note)	#108433 (ours, open) as discussed above.

Note #107487/#107941 are generic auto-match buckets keyed by the test_name string, so the precise owner is the STID/signature, not the issue text. Net: all linked, all covered. The only one in this set blocking a clean run on rebase is the UnionStep const-fold mismatch (#107719); the rest are either already merged (#108391) or pre-existing flaky unrelated to your change.

vdimir · 2026-06-30T14:31:39Z

@groeneai what is the issue with 02354_vector_search_rescoring in last CI run?

groeneai · 2026-06-30T15:47:14Z

@vdimir Not related to this PR. 02354_vector_search_rescoring was a master regression, already reverted.

Base-skew between two PRs merged the same day, neither seeing the other in CI:

Throw ILLEGAL_COLUMN when _distance is selected directly in vector search queries #108423 (merged 06-28) added a hard ILLEGAL_COLUMN guard making _distance internal-only (cannot be referenced directly).
Apply exact row-positioning for vector search rescoring queries #105591 (merged 06-29 11:27) added a rescoring fast-path whose test deliberately does SELECT id, round(_distance, 2) ... ORDER BY L2Distance(...), expecting success.

The guard was already on master when #105591 merged, so every config failed deterministically with Code 44 ... The _distance column is an internal virtual column ... cannot be referenced directly at useVectorSearch.cpp:237 (267/267 reruns, CI diagnosis "not a randomization issue").

Fix: revert #108812, merged 06-29 14:13 UTC. Master clean since (06-29: 46 fails; 06-30: 0/163 over the day). This PR head 83e5e7f2 last ran CI ~12:00 UTC 06-29, so it carries the #105591 merge but predates the revert. A rebase onto master clears it.

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=107053&sha=83e5e7f2177d4db99ccfb15b3ccb011aa046b98f&name_0=PR&name_1=Stateless%20tests%20%28amd_debug%2C%20parallel%29

clickhouse-gh · 2026-07-01T12:22:42Z

LLVM Coverage Report

Changed lines: Changed C/C++ lines covered: 39/39 (100.00%) · Uncovered code

Full report · Diff report

Fix Virtual row with union produces wrong result

a359897

clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 10, 2026

Merge branch 'master' into vdimir/virtual-row-union-bug

b50dda1

clickgapai mentioned this pull request Jun 12, 2026

Read in order with UNION and max_streams setting return out of order results #106880

Closed

vdimir mentioned this pull request Jun 16, 2026

Mark caught LOGICAL_ERROR exception as logged in BackupDataFileNameGenerator gtest #106668

Merged

Merge branch 'master' into vdimir/virtual-row-union-bug

eca6643

clickhouse-gh Bot reviewed Jun 17, 2026

View reviewed changes

vdimir requested a review from nihalzp June 18, 2026 14:42

nihalzp self-assigned this Jun 18, 2026

nihalzp approved these changes Jun 24, 2026

View reviewed changes

Merge branch 'master' into vdimir/virtual-row-union-bug

d1309d3

Merge branch 'master' into vdimir/virtual-row-union-bug

982a1a0

vdimir added 2 commits June 26, 2026 13:38

Merge branch 'master' into vdimir/virtual-row-union-bug

e7dae3b

Merge branch 'master' into vdimir/virtual-row-union-bug

83e5e7f

Merge branch 'master' into vdimir/virtual-row-union-bug

6f86ef8

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

vdimir commented Jun 10, 2026

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Uh oh!

clickhouse-gh Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

PR Metadata

Findings

Tests

Final Verdict

Uh oh!

clickhouse-gh Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

vdimir commented Jun 18, 2026

Uh oh!

groeneai commented Jun 18, 2026

Uh oh!

vdimir commented Jun 25, 2026

Uh oh!

groeneai commented Jun 25, 2026

Uh oh!

vdimir commented Jun 26, 2026

Uh oh!

groeneai commented Jun 26, 2026

Uh oh!

vdimir commented Jun 30, 2026

Uh oh!

groeneai commented Jun 30, 2026

Uh oh!

clickhouse-gh Bot commented Jul 1, 2026

LLVM Coverage Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

clickhouse-gh Bot commented Jun 10, 2026 •

edited

Loading