Do not cache query condition results for sampled reads by groeneai · Pull Request #108488 · ClickHouse/ClickHouse · GitHub
Skip to content

Do not cache query condition results for sampled reads#108488

Merged
rschu1ze merged 2 commits into
ClickHouse:masterfrom
groeneai:qcc-sample-poisoning-fix
Jul 2, 2026
Merged

Do not cache query condition results for sampled reads#108488
rschu1ze merged 2 commits into
ClickHouse:masterfrom
groeneai:qcc-sample-poisoning-fix

Conversation

@groeneai

@groeneai groeneai commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Closes: #104203

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed wrong results when using SELECT [...] SAMPLE [...] together with the query condition cache (setting use_query_condition_cache = 1 which is also the default).

Version info

  • Merged into: 26.7.1.421
  • Backported to: 26.6.2.21

@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @rschu1ze @KochetovNicolai — could you review? The query condition cache key encodes only the WHERE/PREWHERE predicate, not the SAMPLE clause, so a sampled read writes a sampling-narrowed mark mask that a later non-sampled query with the same predicate reuses, silently returning too few rows. This disables the cache (both write paths) for sampled reads, mirroring the existing FINAL / unique-key / bucket_id guards.

@nikitamikhaylov nikitamikhaylov added the can be tested Allows running workflows for external contributors label Jun 25, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [cfd4be6]

Summary:


AI Review

Summary

This PR closes the query-condition-cache poisoning hole for sampled MergeTree reads by blocking both the index-analysis write path and the runtime PREWHERE/WHERE write path whenever result.sampling.use_sampling is true. I did not find a remaining correctness issue in the current code, and there are no prior review threads or replies that still require follow-up.

Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 25, 2026
@groeneai

Copy link
Copy Markdown
Contributor Author

CI finish ledger — f9c5177

Every failure below has an owner: a fixing PR (ours or external). Only CH Inc sync (private sync) is exempt. None of these are caused by this PR's ReadFromMergeTree sampling-guard change (this PR touches only QCC sampling + a new stateless test); all are pre-existing chronic-trunk / flaky signatures (CIDB 30d cross-PR counts shown).

Check / test Reason Owner / fixing PR
Integration tests (amd_asan_ubsan, db disk, old analyzer, 2/6) / test_quorum_inserts_parallel::test_parallel_quorum_actually_quorum flaky (30 PRs, 7 master / 30d) #108433 (ours, open)
Stress test (arm_asan_ubsan) / Block structure mismatch in UnionStep stream (STID 0993-27f0) chronic trunk logical error (62 PRs, 9 master / 30d) #107719 (ours, open)
CH Inc sync private sync CH Inc sync (private, not actionable)
Mergeable Check / PR aggregator roll-up of the rows above, no independent failure

Session id: cron:our-pr-ci-monitor:20260625-180000

groeneai and others added 2 commits July 1, 2026 21:21
The query condition cache key encodes only the WHERE/PREWHERE predicate,
not the SAMPLE clause. A `SELECT ... SAMPLE x WHERE cond` query reads only
the marks the sampling key selects, then records that sampling-narrowed
mark mask under the predicate hash. A later non-sampled query with the same
predicate reuses the under-counted mask, skips the marks SAMPLE excluded,
and silently returns too few rows.

Disable the query condition cache (both the index-analysis write and the
runtime PREWHERE/WHERE write) whenever the read uses sampling, mirroring
the existing FINAL / bucket_id guards in the disable-cascade.

The consult side is left unchanged: once writes are blocked during
sampling the cache only ever holds full-predicate masks, and a
predicate-false mark stays predicate-false under any sample subset, so
sampled reads still benefit from previously cached full-scan verdicts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@rschu1ze rschu1ze force-pushed the qcc-sample-poisoning-fix branch from f9c5177 to cfd4be6 Compare July 1, 2026 21:48
@rschu1ze rschu1ze self-assigned this Jul 1, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.50% 85.50% +0.00%
Functions 92.60% 92.70% +0.10%
Branches 77.70% 77.70% +0.00%

Changed lines: Changed C/C++ lines covered: 9/9 (100.00%) · Uncovered code

Full report · Diff report

@rschu1ze rschu1ze added this pull request to the merge queue Jul 2, 2026
@clickgapai

Copy link
Copy Markdown
Contributor

@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Jul 2, 2026
clickhouse-gh Bot pushed a commit that referenced this pull request Jul 2, 2026
clickhouse-gh Bot added a commit that referenced this pull request Jul 2, 2026
Backport #108488 to 26.6: Do not cache query condition results for sampled reads
robot-ch-test-poll3 added a commit that referenced this pull request Jul 3, 2026
Cherry pick #108488 to 26.3: Do not cache query condition results for sampled reads
robot-clickhouse added a commit that referenced this pull request Jul 3, 2026
robot-ch-test-poll3 added a commit that referenced this pull request Jul 3, 2026
Cherry pick #108488 to 26.4: Do not cache query condition results for sampled reads
robot-clickhouse added a commit that referenced this pull request Jul 3, 2026
robot-ch-test-poll3 added a commit that referenced this pull request Jul 3, 2026
Cherry pick #108488 to 26.5: Do not cache query condition results for sampled reads
robot-clickhouse added a commit that referenced this pull request Jul 3, 2026
@robot-clickhouse robot-clickhouse added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Jul 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-bugfix Pull request with bugfix, not backported by default pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

QueryConditionCache is poisoned by AST fuzzer queries — subsequent normal WHERE queries return zero rows

7 participants