Do not cache query condition results for sampled reads #108488
cc @rschu1ze @KochetovNicolai, could you review? The query condition cache key encodes only the WHERE/PREWHERE predicate, not the SAMPLE clause. A sampled read therefore writes a sampling-narrowed mark mask, and a later non-sampled query with the same predicate reuses it and silently returns too few rows. This PR disables the cache (both write paths) for sampled reads, mirroring the existing FINAL / unique-key / bucket_id guards.
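A toy Python model of the bug and the guard, offered as an illustration rather than the actual ClickHouse C++ change. It assumes the cache maps predicate text to a per-mark "may match" mask. `read`, `Row`, `Mark`, `Cache`, and the `skip_write_when_sampled` switch are hypothetical names, and SAMPLE is modeled as a simple row filter on a sampling key.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy model only; none of these names come from ClickHouse.
Row = Tuple[int, int]          # (sampling_key, value)
Mark = List[Row]               # a "mark" is a small block of rows
Cache = Dict[str, List[bool]]  # predicate text -> per-mark "may match" mask


def read(
    marks: List[Mark],
    predicate: Callable[[Row], bool],
    predicate_key: str,  # the cache key: predicate only, SAMPLE is not part of it
    cache: Cache,
    sample: Optional[Callable[[Row], bool]] = None,
    skip_write_when_sampled: bool = True,  # hypothetical stand-in for the fix
) -> List[Row]:
    cached = cache.get(predicate_key)
    mask: List[bool] = []
    result: List[Row] = []
    for i, mark in enumerate(marks):
        if cached is not None and not cached[i]:
            # Consult side: the cache says no row in this mark matches.
            mask.append(False)
            continue
        rows = [r for r in mark if sample is None or sample(r)]
        hits = [r for r in rows if predicate(r)]
        # Under SAMPLE a mark can look like "no match" only because its rows
        # were outside the sample: this is the under-counted verdict.
        mask.append(bool(hits))
        result.extend(hits)
    sampled = sample is not None
    # Write side: without the guard, a sampled read stores its narrowed mask.
    if cached is None and not (sampled and skip_write_when_sampled):
        cache[predicate_key] = mask
    return result


def is_five(r: Row) -> bool:
    return r[1] == 5


def in_sample(r: Row) -> bool:
    return r[0] == 0  # the sample only covers mark 0


marks: List[Mark] = [[(0, 5)], [(1, 5)]]  # value 5 appears in both marks

# Before the fix: the sampled read poisons the cache entry for "v = 5".
old: Cache = {}
read(marks, is_five, "v = 5", old, sample=in_sample, skip_write_when_sampled=False)
print(len(read(marks, is_five, "v = 5", old)))  # 1 row: mark 1 was skipped

# After the fix: the sampled read leaves the cache empty.
new: Cache = {}
read(marks, is_five, "v = 5", new, sample=in_sample)
print(len(read(marks, is_five, "v = 5", new)))  # 2 rows
```

In this model the only difference between the two runs is whether the sampled read is allowed to write; the consult path is identical in both.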
Workflow [PR], commit [cfd4be6] Summary: ✅
AI Review
Summary: This PR closes the query-condition-cache poisoning hole for sampled [...]
Final Verdict
CI finish ledger: f9c5177. Every failure below has an owner: a fixing PR (ours or external). Only [...]
Session id: cron:our-pr-ci-monitor:20260625-180000
The query condition cache key encodes only the WHERE/PREWHERE predicate, not the SAMPLE clause. A `SELECT ... SAMPLE x WHERE cond` query reads only the marks the sampling key selects, then records that sampling-narrowed mark mask under the predicate hash. A later non-sampled query with the same predicate reuses the under-counted mask, skips the marks SAMPLE excluded, and silently returns too few rows.

Disable the query condition cache (both the index-analysis write and the runtime PREWHERE/WHERE write) whenever the read uses sampling, mirroring the existing FINAL / bucket_id guards in the disable-cascade.

The consult side is left unchanged: once writes are blocked during sampling the cache only ever holds full-predicate masks, and a predicate-false mark stays predicate-false under any sample subset, so sampled reads still benefit from previously cached full-scan verdicts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
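The consult-side argument can be checked on a toy model. This is a Python sketch under stated assumptions, not ClickHouse code: `full_scan_mask`, `sampled_read`, and `consult_is_safe` are hypothetical helpers, a mark is a list of `(sampling_key, value)` rows, and SAMPLE is modeled as a row filter. It compares a sampled read that consults a full-scan mask against one that ignores the cache.

```python
import random
from typing import Callable, List, Optional, Tuple

# Toy model only; none of these names come from ClickHouse.
Row = Tuple[int, int]  # (sampling_key, value)
Marks = List[List[Row]]


def full_scan_mask(marks: Marks, predicate: Callable[[Row], bool]) -> List[bool]:
    # What a non-sampled read writes: True iff some row in the mark matches.
    return [any(predicate(r) for r in mark) for mark in marks]


def sampled_read(
    marks: Marks,
    predicate: Callable[[Row], bool],
    sample: Callable[[Row], bool],
    mask: Optional[List[bool]] = None,
) -> List[Row]:
    out: List[Row] = []
    for i, mark in enumerate(marks):
        if mask is not None and not mask[i]:
            # Predicate-false for every row, hence for any sampled subset too.
            continue
        out.extend(r for r in mark if sample(r) and predicate(r))
    return out


def consult_is_safe(trials: int = 1000, seed: int = 0) -> bool:
    # Randomized check: consulting a full-scan mask never changes a sampled result.
    rng = random.Random(seed)
    for _ in range(trials):
        marks = [
            [(rng.randrange(4), rng.randrange(3)) for _ in range(rng.randrange(4))]
            for _ in range(5)
        ]
        target, keep = rng.randrange(3), rng.randrange(4)

        def predicate(r: Row) -> bool:
            return r[1] == target

        def sample(r: Row) -> bool:
            return r[0] <= keep

        mask = full_scan_mask(marks, predicate)
        if sampled_read(marks, predicate, sample, mask) != sampled_read(marks, predicate, sample):
            return False
    return True


print(consult_is_safe())  # True
```

The check only covers the direction the PR keeps enabled (full-scan masks consulted by sampled reads); the reverse direction, a sampled mask consulted by a full read, is the case the write guard blocks.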
f9c5177 to cfd4be6
LLVM Coverage Report
Changed C/C++ lines covered: 9/9 (100.00%)
Backport #108488 to 26.6: Do not cache query condition results for sampled reads
Cherry pick #108488 to 26.3: Do not cache query condition results for sampled reads
Cherry pick #108488 to 26.4: Do not cache query condition results for sampled reads
Cherry pick #108488 to 26.5: Do not cache query condition results for sampled reads

Closes: #104203
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixed wrong results when using `SELECT [...] SAMPLE [...]` together with the query condition cache (setting `use_query_condition_cache = 1`, which is also the default).
Version info
26.7.1.421, 26.6.2.21