Fix flaky test 02354_vector_search_rescoring_and_prewhere by groeneai · Pull Request #101573 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix flaky test 02354_vector_search_rescoring_and_prewhere#101573

Merged
rschu1ze merged 1 commit into
ClickHouse:masterfrom
groeneai:fix/02354-vector-search-merge-expressions-flaky
Apr 2, 2026
Merged

Fix flaky test 02354_vector_search_rescoring_and_prewhere#101573
rschu1ze merged 1 commit into
ClickHouse:masterfrom
groeneai:fix/02354-vector-search-merge-expressions-flaky

Conversation

@groeneai

@groeneai groeneai commented Apr 2, 2026

Copy link
Copy Markdown
Contributor

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Not required

What this PR does

Pin query_plan_merge_expressions = 1 to fix a chronic flaky test.

Root cause: The vector search query plan optimizer (tryUseVectorSearch in useVectorSearch.cpp) requires a specific plan structure: LimitStep → SortingStep → ExpressionStep → [FilterStep] → ReadFromMergeTree. When query_plan_merge_expressions is randomized to 0, extra unmerged ExpressionSteps appear between SortingStep and ReadFromMergeTree, breaking the optimizer's pattern matching. It silently falls back to brute-force scan, producing different (L2Distance-sorted full scan) results instead of HNSW index-based results.

Reproduction: SET query_plan_merge_expressions = 0 deterministically triggers the failure — all queries in the test return brute-force results (e.g., 16, 19, 12, 11 instead of index-filtered 16, 19).

CIDB evidence: 34 failures in 30 days — 30 from PR #100874 flaky check (which adds query_plan_merge_expressions to the SettingsRandomizer), plus historical hits in PRs #96130, #99181, and 1 master failure.

Verification: 50/50 passes with full randomization after the fix.

Version info

  • Merged into: 26.4.1.520

Pin query_plan_merge_expressions=1 because the vector search query plan
optimizer (tryUseVectorSearch) requires a specific plan structure:
LimitStep → SortingStep → ExpressionStep → [FilterStep] → ReadFromMergeTree.

When query_plan_merge_expressions is randomized to 0, extra unmerged
ExpressionSteps appear between SortingStep and ReadFromMergeTree, causing
the optimizer to fail pattern matching and silently fall back to brute-force
scan. This produces different (brute-force) results instead of index-based
results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@groeneai

groeneai commented Apr 2, 2026

Copy link
Copy Markdown
Contributor Author

@groeneai

groeneai commented Apr 2, 2026

Copy link
Copy Markdown
Contributor Author

cc @rschu1ze — could you review this? It pins query_plan_merge_expressions=1 in the vector search rescoring test because tryUseVectorSearch() needs merged expression steps to recognize the LimitStep→SortingStep→ExpressionStep→ReadFromMergeTree plan pattern. When expressions aren't merged, the optimizer silently falls back to brute-force scan, producing wrong results.

@shankar-iyer shankar-iyer added the can be tested Allows running workflows for external contributors label Apr 2, 2026
@clickhouse-gh

clickhouse-gh Bot commented Apr 2, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [3107817]

Summary:

job_name test_name status info comment
Stress test (arm_msan) failure
Server died FAIL cidb IGNORED
MemorySanitizer: use-of-uninitialized-value (STID: 1478-2063) FAIL cidb IGNORED

AI Review

Summary

This PR stabilizes tests/queries/0_stateless/02354_vector_search_rescoring_and_prewhere.sql by pinning query_plan_merge_expressions = 1, preventing random-plan variation from switching vector-search optimization off and changing expected results. The change is minimal, targeted, and appropriate for a flaky-test fix. I did not find correctness, safety, performance, or compliance issues in the diff.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-ci label Apr 2, 2026
@rschu1ze rschu1ze added this pull request to the merge queue Apr 2, 2026
Merged via the queue into ClickHouse:master with commit a185cdc Apr 2, 2026
161 of 163 checks passed
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 2, 2026
groeneai added a commit to groeneai/ClickHouse that referenced this pull request Apr 2, 2026
The vector search optimizer (tryUseVectorSearch in useVectorSearch.cpp)
requires a specific merged expression step sequence in the query plan:
LimitStep→SortingStep→ExpressionStep→ReadFromMergeTree. When
query_plan_merge_expressions is randomized to 0, expressions are not
merged and the optimizer fails to detect the pattern, causing fallback
to brute-force scan with different results.

Pin query_plan_merge_expressions=1 in 14 vector search tests that
depend on index usage. This follows the same fix pattern as PR ClickHouse#101573
which addressed the identical issue in rescoring_and_prewhere.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-ci pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants