Reject make_distributed_plan over a direct read from a text index #108818
groeneai wants to merge 1 commit into
A direct read from a text index adds ephemeral virtual columns (__text_index_idx_...) to the MergeTree read plus a separate index-read step. A distributed worker fragment cannot reproduce these: the virtual column is not in the table and ReadFromDistributedPlanSource cannot materialize it.

buildQueryPipeline runs the second optimization pass once via optimize() and, for make_distributed_plan, again via convertToDistributed(). The text index optimization is not idempotent (it appends the virtual column to all_column_names), so the single-stage re-optimization re-adds the same column and aborts with the LOGICAL_ERROR "Column ... already added for reading" (server abort in debug/sanitizer builds). The multi-stage path instead fails at execution with NOT_FOUND_COLUMN_IN_BLOCK.

Reject the unsupported combination cleanly at planning time in checkDistributedReadSupported, next to the existing rejections for pinned block-number boundaries and part-order virtual columns.

Found by the AST fuzzer (STID 4792-6322).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
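The failure mode and the fix described in the commit message can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyReadStep`, `addColumnForReading`, `applyTextIndexDirectRead`, `checkDistributedReadSupportedToy`, and the column name `__text_index_idx_example` are hypothetical stand-ins, not ClickHouse's actual classes or signatures. The sketch shows why a pass that appends a column on every run fails when it runs twice, and how a planning-time check could reject the combination before that happens.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for a MergeTree read step (hypothetical, not the real ReadFromMergeTree).
struct ToyReadStep
{
    std::vector<std::string> all_column_names;
    bool direct_read_from_text_index = false;

    // Mimics the "already added for reading" check: duplicates are a logical error.
    void addColumnForReading(const std::string & name)
    {
        if (std::find(all_column_names.begin(), all_column_names.end(), name) != all_column_names.end())
            throw std::logic_error("Column " + name + " already added for reading");
        all_column_names.push_back(name);
    }
};

// Hypothetical optimization pass. It is not idempotent: every run appends the virtual column.
void applyTextIndexDirectRead(ToyReadStep & step)
{
    step.direct_read_from_text_index = true;
    step.addColumnForReading("__text_index_idx_example");
}

// Hypothetical planning-time guard, in the spirit of checkDistributedReadSupported.
void checkDistributedReadSupportedToy(const ToyReadStep & step)
{
    if (step.direct_read_from_text_index)
        throw std::runtime_error("make_distributed_plan is not supported over a direct read from a text index");
}

// Returns true if running the pass a second time hits the duplicate-column error.
bool secondPassFails()
{
    ToyReadStep step;
    step.all_column_names = {"id", "text"};
    applyTextIndexDirectRead(step);          // first pass, as via optimize()
    assert(step.all_column_names.size() == 3);
    try
    {
        applyTextIndexDirectRead(step);      // second pass, as via convertToDistributed()
    }
    catch (const std::logic_error &)
    {
        return true;
    }
    return false;
}

// Returns true if the guard rejects the plan before any re-optimization runs.
bool guardRejects()
{
    ToyReadStep step;
    step.all_column_names = {"id", "text"};
    applyTextIndexDirectRead(step);
    try
    {
        checkDistributedReadSupportedToy(step);
    }
    catch (const std::runtime_error &)
    {
        return true;
    }
    return false;
}
```

In this model the guard turns an internal invariant violation into an ordinary, user-facing rejection, which is the design choice the commit message describes; making the pass itself idempotent would not help the multi-stage path, where the worker cannot materialize the column at all.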
cc @kssenii @CurtizJ, could you review this? It rejects
Workflow [PR], commit [867d48c] Summary: ❌
AI Review
Summary
This PR rejects
Final Verdict
Status: ✅ Approve
No required actions.
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 9/9 (100.00%) | Lost baseline coverage: none
CI finish ledger: 867d48c
Every failure below has an owner: a fixing PR (ours or external), or a full-effort fix task whose fixing-PR link will be posted here when it opens.
This PR's own fix (the make_distributed_plan + text-index direct-read LOGICAL_ERROR, STID 4792-6322) and its regression test 04417 are green. The lone red check is the pre-existing chronic UnionStep block-structure-mismatch (STID 0993-27f0), owned by open fixing PR #107719.
Session id: cron:our-pr-ci-monitor:20260629-183000

Related: #108256
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix a `LOGICAL_ERROR` (`Column ... already added for reading`) when running a query with `make_distributed_plan = 1` over a `MergeTree` table that does a direct read from a text index (`hasToken`, `hasAnyTokens`, `hasAllTokens`). Such reads are now rejected at planning time instead of aborting the server.

Description
Found by the AST fuzzer: AST fuzzer (amd_debug, targeted), STID 4792-6322.
CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=108806&sha=70025015949d86cc3d68f7f9f0440ae1a0b6bf18&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%2C%20targeted%29

The `LOGICAL_ERROR` is raised at `src/Processors/QueryPlan/ReadFromMergeTree.cpp:4554`, on the `buildQueryPipeline -> convertToDistributed -> optimizeTreeSecondPass` path.

A direct read from a text index adds ephemeral virtual columns (`__text_index_idx_...`) to the read and a separate index-read step. A distributed worker fragment cannot reproduce these: the column is not in the table and `ReadFromDistributedPlanSource` cannot materialize it.

`buildQueryPipeline` runs the second optimization pass once via `optimize()` and, for `make_distributed_plan`, again via `convertToDistributed()`. The text index optimization is not idempotent (it appends the virtual column to `all_column_names`), so the single-stage re-optimization re-adds the same column and aborts with the `LOGICAL_ERROR` above (a server abort in debug/sanitizer builds). The multi-stage path instead fails at execution with `NOT_FOUND_COLUMN_IN_BLOCK`.

The fix rejects the unsupported combination cleanly at planning time in `checkDistributedReadSupported`, next to the existing rejections for pinned block-number boundaries and the part-order virtual columns. This mirrors the recently merged #108256, which rejects projections for distributed reads for the same reason.

Reproducer: