iframe-proxy

alexey-milovidov · 2026-03-16T02:59:41Z

Summary

Enable use_top_k_dynamic_filtering and use_skip_indexes_for_top_k settings by default
Add corresponding entries in settings changes history for version 26.3

Changelog category (leave one):

Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Enable use_top_k_dynamic_filtering and use_skip_indexes_for_top_k settings by default to improve performance of ORDER BY ... LIMIT N queries.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

No documentation changes needed — the settings and their documentation already exist; only the default values changed.

Version info

Merged into: 26.5.1.436

… by default Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-03-16T03:00:20Z

shankar-iyer · 2026-03-16T08:02:59Z

I will check the failures.

alexey-milovidov · 2026-03-16T08:26:13Z

@shankar-iyer FYI, it is #91782

shankar-iyer · 2026-03-17T10:07:47Z

@alexey-milovidov Let's enable only use_skip_indexes_for_top_k now? That aligns with our overall plan of skip indexes by default. use_top_k_dynamic_filtering is good when skip index is not present on the ORDER BY column, and provides only marginal improvement if skip index was present and used by use_skip_indexes_for_top_k.

alexey-milovidov · 2026-03-18T11:45:46Z

use_top_k_dynamic_filtering is good when skip index is not present

I see it as a solid use-case. And indices by default will not always be enabled; there will be plenty of use cases without them. Why to afraid?

shankar-iyer · 2026-03-18T12:22:52Z

Why to afraid?

@alexey-milovidov

Best case -

SELECT URL, EventTime FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime DESC LIMIT 10
  SETTINGS use_top_k_dynamic_filtering=0

10 rows in set. Elapsed: 3.374 sec. Processed 100.00 million rows, 10.02 GB (29.64 million rows/s., 2.97 GB/s.)


SELECT URL, EventTime FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime DESC LIMIT 10
  SETTINGS use_top_k_dynamic_filtering=1

10 rows in set. Elapsed: 0.526 sec. Processed 100.00 million rows, 1.20 GB (189.97 million rows/s., 2.28 GB/s.)

But this one -


SELECT URL, EventTime FROM hits WHERE URL LIKE '%shankar%' ORDER BY EventTime DESC LIMIT 10
  SETTINGS use_top_k_dynamic_filtering=0

0 rows in set. Elapsed: 3.154 sec. Processed 100.00 million rows, 9.90 GB (31.70 million rows/s., 3.14 GB/s.)


SELECT URL, EventTime FROM hits WHERE URL LIKE '%shankar%' ORDER BY EventTime DESC LIMIT 10
  SETTINGS use_top_k_dynamic_filtering=0

0 rows in set. Elapsed: 3.507 sec. Processed 100.00 million rows, 10.30 GB (28.52 million rows/s., 2.94 GB/s.)

If there is not many rows to sort after filters, we have additional overhead of reading a 4-byte/8-byte column (Numeric/Date). What do you say - enable it and real world workloads should balance out?

All the test failures are not due to any issue in the optimization, but due to change in rows_read profile event, which I can address.

alexey-milovidov · 2026-03-18T12:52:23Z

What do you say - enable it and real world workloads should balance out?

Yes, I think it will be beneficial on average!

alexey-milovidov · 2026-03-18T12:53:00Z

Working on that in #91782

Disable `use_top_k_dynamic_filtering` and `use_skip_indexes_for_top_k` in the test that checks read_rows for ORDER BY LIMIT, since these optimizations change the reading strategy and are not what this test validates. Restore the accidentally removed empty 26.4 version placeholders in SettingsChangesHistory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…o enable-top-k-optimizations-by-default

…ickHouse/ClickHouse into enable-top-k-optimizations-by-default

…zations-by-default

CLAassistant · 2026-03-26T23:18:31Z

All committers have signed the CLA.

alexey-milovidov · 2026-05-05T10:39:27Z

As #103553 is merged, let's update the branch here.

…zations-by-default

alexey-milovidov · 2026-05-05T21:26:15Z

@groeneai, fix Logical error: AggregateFunctionArray and send a separate PR with the fix.

groeneai · 2026-05-06T02:48:35Z

@alexey-milovidov — Logical error: AggregateFunctionArray on this PR is the chronic trunk bug STID 4870 family (parameters mismatch between Array wrapper and nested function for timeSeries*ToGridArray). It is not caused by this PR.

CIDB cross-PR (30 days, STID 4870 family): 43 master hits + 30+ unrelated PRs. The hit on this PR is STID 4870-4f21 (AST fuzzer (amd_debug) 2026-05-05 20:14 UTC), failing query:

SELECT timeSeriesResampleToGridWithStalenessArray(2, 120, toNullable(10), 10)(timestamps, `values`) FROM (...)

Error:

AggregateFunctionArray: parameters mismatch between Array wrapper 'timeSeriesResampleToGridWithStalenessArray' and nested function 'timeSeriesResampleToGridWithStaleness'.
Wrapper has 4 parameter(s): [UInt64_2, UInt64_120, UInt64_10, UInt64_10]
Nested function has 4 parameter(s): [Decimal64_'2', Decimal64_'120', Decimal64_'10', Decimal64_'10']

Root cause: AggregateFunctionTimeseriesBase normalizes its 4 timestamp parameters to Decimal64, but the Array combinator copies the user-typed UInt64 params. The chassert wrapper.params == nested.getParameters() was promoted to LOGICAL_ERROR in #102868, exposing the family.

Fix in flight: PR #103428 by @vitlibar — "Resubmit: Preserve parameter types in timeseries aggregate functions". OPEN + mergeable, awaiting review from @tavplubix since 2026-04-23.

My earlier PR #103079 (fixing it from the Array-combinator side: have the wrapper inherit the nested function's normalized params) was closed by @vitlibar on 2026-04-23 in favor of his approach (preserve original parameter types in TimeSeries instead of normalizing). Not filing a duplicate — would conflict with the active work.

Action: This failure on #99537 will resolve once #103428 merges. The bottleneck is review on #103428. Tracking under our internal P0 task since 2026-04-17.

Session: cron:clickhouse-ci-task-worker:20260506-024500

…zations-by-default

alexey-milovidov · 2026-05-06T10:52:05Z

  Genuine regressions caused by the PR (changed, never seen on master in last 30d)                                                                                                                             
                                                                  
  ┌──────────┬───────────────────┬──────────────────────────────────────────────────┬─────────────────────────────┐                                                                                            
  │ Slowdown │       Test        │                      Query                       │ Master 30d (slower / total) │
  ├──────────┼───────────────────┼──────────────────────────────────────────────────┼─────────────────────────────┤                                                                                            
  │ 1.63x    │ string_sort #10   │ ORDER BY MobilePhoneModel LIMIT 300              │ 0 / 35 — new                │
  ├──────────┼───────────────────┼──────────────────────────────────────────────────┼─────────────────────────────┤                                                                                            
  │ 1.38x    │ sparse_column #52 │ test_sparse_1000 ORDER BY u8, u64 DESC LIMIT 100 │ 0 / 49 — new                │                                                                                            
  ├──────────┼───────────────────┼──────────────────────────────────────────────────┼─────────────────────────────┤                                                                                            
  │ 1.35x    │ sparse_column #47 │ test_full_1000 ORDER BY u8, u64 DESC LIMIT 100   │ 0 / 45 — new                │                                                                                            
  ├──────────┼───────────────────┼──────────────────────────────────────────────────┼─────────────────────────────┤                                                                                            
  │ 1.34x    │ sparse_column #49 │ test_full_10 ORDER BY u8, u64 DESC LIMIT 100     │ 0 / 58 — new                │
  ├──────────┼───────────────────┼──────────────────────────────────────────────────┼─────────────────────────────┤                                                                                            
  │ 1.20x    │ string_sort #117  │ ORDER BY MobilePhoneModel, SearchPhrase LIMIT 10 │ 0 / 30 — new                │
  ├──────────┼───────────────────┼──────────────────────────────────────────────────┼─────────────────────────────┤                                                                                            
  │ 1.16x    │ string_sort #108  │ ORDER BY SearchPhrase, URL LIMIT 10              │ 0 / 24 — new                │
  └──────────┴───────────────────┴──────────────────────────────────────────────────┴─────────────────────────────┘

shankar-iyer · 2026-05-06T11:22:54Z

  Genuine regressions caused by the PR (changed, never seen on master in last 30d)                                                                                                                             
                                                                  
  ┌──────────┬───────────────────┬──────────────────────────────────────────────────┬─────────────────────────────┐                                                                                            
  │ Slowdown │       Test        │                      Query                       │ Master 30d (slower / total) │
  ├──────────┼───────────────────┼──────────────────────────────────────────────────┼─────────────────────────────┤                                                                                            
  │ 1.63x    │ string_sort #10   │ ORDER BY MobilePhoneModel LIMIT 300              │ 0 / 35 — new                │
  ├──────────┼───────────────────┼──────────────────────────────────────────────────┼─────────────────────────────┤                                                                                            
  │ 1.38x    │ sparse_column #52 │ test_sparse_1000 ORDER BY u8, u64 DESC LIMIT 100 │ 0 / 49 — new                │                                                                                            
  ├──────────┼───────────────────┼──────────────────────────────────────────────────┼─────────────────────────────┤                                                                                            
  │ 1.35x    │ sparse_column #47 │ test_full_1000 ORDER BY u8, u64 DESC LIMIT 100   │ 0 / 45 — new                │                                                                                            
  ├──────────┼───────────────────┼──────────────────────────────────────────────────┼─────────────────────────────┤                                                                                            
  │ 1.34x    │ sparse_column #49 │ test_full_10 ORDER BY u8, u64 DESC LIMIT 100     │ 0 / 58 — new                │
  ├──────────┼───────────────────┼──────────────────────────────────────────────────┼─────────────────────────────┤                                                                                            
  │ 1.20x    │ string_sort #117  │ ORDER BY MobilePhoneModel, SearchPhrase LIMIT 10 │ 0 / 30 — new                │
  ├──────────┼───────────────────┼──────────────────────────────────────────────────┼─────────────────────────────┤                                                                                            
  │ 1.16x    │ string_sort #108  │ ORDER BY SearchPhrase, URL LIMIT 10              │ 0 / 24 — new                │
  └──────────┴───────────────────┴──────────────────────────────────────────────────┴─────────────────────────────┘

Let me check.

…zations-by-default

The `SettingsChangesHistory` entry for `use_top_k_dynamic_filtering_for_variable_length_types` previously recorded `{false, false}`, which means `compatibility` for older versions would still leave variable-length types out of the dynamic filter. In older versions, when a user opted into `use_top_k_dynamic_filtering = 1` (or once PR #99537 turns it on by default), variable-length sort columns received the dynamic filter unconditionally — i.e. the effective value of the new gate was `true`. Change the entry to `{true, false}` so `compatibility` correctly restores the previous behavior. Flagged by AI Review on PR #104216. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…zations-by-default

PR ClickHouse#99537 turns on use_top_k_dynamic_filtering by default. For sort columns of fixed-length types (numbers, dates, etc.) this is a clear win: the per-row __topKFilter prewhere lets minmax/threshold-based mark skipping kick in early. For variable-length types (String, Array, Map, Tuple containing such), the per-row threshold comparison runs the generic less-or-equals over the column with no granule pruning, so when the lex-min value of the column dominates (mostly empty strings, e.g. MobilePhoneModel where 94/100M rows are "") the prewhere costs CPU without saving any I/O. Locally reproducing the regression flagged in the perf comparison for PR ClickHouse#99537, batch 3/6 amd_release: SELECT MobilePhoneModel FROM hits_100m_single ORDER BY MobilePhoneModel LIMIT 300 master default (both off): 0.013-0.015s PR ClickHouse#99537 default (both on): 0.022-0.026s 1.47-1.63x slower Profile events show identical SelectedRows (99,997,497) between the two — the dynamic filter skips zero granules but still pays a per-block lessOrEquals(String, String) cost, increasing UserTime by ~1.8x. The fix introduces use_top_k_dynamic_filtering_for_variable_length_types (off by default). When use_top_k_dynamic_filtering is enabled, the optimization is now restricted to columns whose values have a fixed maximum size in memory — i.e. haveMaximumSizeOfValue() returns true. That predicate forwards through Nullable and Tuple, so Nullable of a fixed-length type and Tuples of fixed-length types remain eligible, matching the user's request. Variable-length types only get the optimization when the user explicitly opts in. Test queries follow MobilePhoneModel's distribution and check the EXPLAIN output for the __topKFilter prewhere with each gate combination. End-to-end timing on the same data confirms the regression is gone with the new default settings: PR ClickHouse#99537 settings on, new gate off (default): 0.017s All on (old broken behavior): 0.026s master baseline (top-k off): 0.013s CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=99537&sha=9d86b8cf27ca90c5223f949656f0141685ea3f6f&name_0=PR&name_1=Performance%20Comparison%20%28amd_release%2C%20master_head%2C%203%2F6%29 Original PR: ClickHouse#99537 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-05-07T13:58:20Z

Remove unrelated changes in vectorsearchstress.yml.

Update the branch to include the new changes that should fix the performance.

Restore the file to its master state — the runner change to `arm-large` was unrelated to enabling top-k optimizations and should not be in this PR. Per review comment: #99537 (comment)

…zations-by-default

clickhouse-gh · 2026-05-07T18:01:49Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	84.10%	84.20%	+0.10%
Functions	91.10%	91.10%	+0.00%
Branches	76.60%	76.70%	+0.10%

Changed lines: 100.00% (27/27) · Uncovered code

Full report · Diff report

alexey-milovidov · 2026-05-08T11:45:07Z

Two failures showed up only on the internal CI flavours: 1. `test_distributed_stateless::test_simple_query` failed with `Unknown function __topKFilter` once `use_top_k_dynamic_filtering` became the default. The dynamic-filtering path injects an internal `__topKFilter` that is created on demand with a runtime threshold tracker and is not registered in `FunctionFactory`; the skip-index-on-data-read path likewise relies on a `TopKThresholdTracker` shared between `SortingStep` and `ReadFromMergeTree`. Neither survives serialization to remote workers, so when `make_distributed_plan = 1` the optimizer must skip the rewrite. Thread `make_distributed_plan` through `Optimization::ExtraSettings` and bail out early in `tryOptimizeTopK`. 2. `01926_order_by_desc_limit` timed out in the `amd_asan_ubsan, flaky check, s3 storage, meta in keeper` flavour: the flaky check runs the test many times in parallel and each run takes >240 s because of `INSERT` + `OPTIMIZE FINAL` over remote storage under sanitizers. The only modification from the previous revision is two `SET` lines that are no-ops for this query (sort column is the primary key with `optimize_read_in_order`, no skip index defined), so re-running it adds no signal. Add the `no-flaky-check` tag with a comment explaining why. CI: Upstream PR: ClickHouse#99537 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Enable use_top_k_dynamic_filtering and use_skip_indexes_for_top_k…

496512e

… by default Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

clickhouse-gh Bot added the pr-performance Pull request with some performance improvements label Mar 16, 2026

shankar-iyer self-assigned this Mar 16, 2026

shankar-iyer mentioned this pull request Mar 18, 2026

Support non-numeric & nullable types for FunctionTopKFilter #99033

Merged

1 task

shankar-iyer approved these changes Mar 19, 2026

View reviewed changes

Merge branch 'master' into enable-top-k-optimizations-by-default

163d73c

alexey-milovidov added the v26.3-must-backport label Mar 19, 2026

clickhouse-gh Bot reviewed Mar 19, 2026

View reviewed changes

Comment thread src/Core/SettingsChangesHistory.cpp Outdated

Merge master into enable-top-k-optimizations-by-default

0e5b34d

alexey-milovidov removed the v26.3-must-backport label Mar 21, 2026

alexey-milovidov added 7 commits March 21, 2026 18:32

Update SettingsChangesHistory.cpp

e3e7780

Merge branch 'master' into enable-top-k-optimizations-by-default

c919427

Merge remote-tracking branch 'origin/fix-topk-read-in-order-test' int…

2b30b36

…o enable-top-k-optimizations-by-default

Merge branch 'enable-top-k-optimizations-by-default' of github.com:Cl…

6327c4c

…ickHouse/ClickHouse into enable-top-k-optimizations-by-default

Merge branch 'master' into enable-top-k-optimizations-by-default

f71f16d

Merge master into enable-top-k-optimizations-by-default

2ae329a

Merge branch 'master' into enable-top-k-optimizations-by-default

a51e547

groeneai mentioned this pull request Mar 25, 2026

Fix topK dynamic filtering crash with Dynamic/Variant columns #100742

Merged

4 tasks

Merge remote-tracking branch 'origin/master' into enable-top-k-optimi…

7b20276

…zations-by-default

alexey-milovidov force-pushed the enable-top-k-optimizations-by-default branch from 1c18b6e to 7b20276 Compare March 26, 2026 23:47

Merge remote-tracking branch 'origin/master' into enable-top-k-optimi…

888ad9c

…zations-by-default

Merge remote-tracking branch 'origin/master' into enable-top-k-optimi…

9d86b8c

…zations-by-default

alexey-milovidov mentioned this pull request May 6, 2026

ASan leak in TemporaryTableHolder from EXPLAIN analyzer path (cross-test, cross-PR) #104176

Closed

alexey-milovidov mentioned this pull request May 6, 2026

Avoid TopK dynamic filter regression on variable-length sort columns #104216

Merged

1 task

alexey-milovidov added 2 commits May 6, 2026 13:12

Merge remote-tracking branch 'origin/master' into enable-top-k-optimi…

b7b4a7f

…zations-by-default

Keep workflow files at previous form to avoid push rejection

b2fd5d5

Merge remote-tracking branch 'origin/master' into enable-top-k-optimi…

3344c62

…zations-by-default

alexey-milovidov added 3 commits May 7, 2026 14:01

Remove unrelated change in vectorsearchstress.yml

7a38d01

Restore the file to its master state — the runner change to `arm-large` was unrelated to enabling top-k optimizations and should not be in this PR. Per review comment: #99537 (comment)

Merge remote-tracking branch 'origin/master' into enable-top-k-optimi…

dad6d06

…zations-by-default

Merge remote-tracking branch 'origin/master' into enable-top-k-optimi…

64600d9

…zations-by-default

alexey-milovidov added this pull request to the merge queue May 8, 2026

Merged via the queue into master with commit b1037f3 May 8, 2026
165 checks passed

alexey-milovidov deleted the enable-top-k-optimizations-by-default branch May 8, 2026 11:56

robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label May 8, 2026

groeneai mentioned this pull request May 22, 2026

Logical error: 'parameters == nested_func->getParameters()' (STID: 2508-495a) #102993

Closed

gtarpenning mentioned this pull request Jun 10, 2026

chore(weave): bump CI clickhouse to 26.4, disable query condition cache wandb/weave#7117

Merged

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

alexey-milovidov commented Mar 16, 2026 • edited by robot-clickhouse Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Version info

Uh oh!

clickhouse-gh Bot commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

ClickHouse Rules

Final Verdict

Uh oh!

shankar-iyer commented Mar 16, 2026

Uh oh!

alexey-milovidov commented Mar 16, 2026

Uh oh!

shankar-iyer commented Mar 17, 2026

Uh oh!

alexey-milovidov commented Mar 18, 2026

Uh oh!

shankar-iyer commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alexey-milovidov commented Mar 18, 2026

Uh oh!

alexey-milovidov commented Mar 18, 2026

Uh oh!

Uh oh!

CLAassistant commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alexey-milovidov commented May 5, 2026

Uh oh!

alexey-milovidov commented May 5, 2026

Uh oh!

groeneai commented May 6, 2026

Uh oh!

alexey-milovidov commented May 6, 2026

Uh oh!

shankar-iyer commented May 6, 2026

Uh oh!

alexey-milovidov commented May 7, 2026

Uh oh!

clickhouse-gh Bot commented May 7, 2026

LLVM Coverage Report

Uh oh!

alexey-milovidov commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

alexey-milovidov commented Mar 16, 2026 •

edited by robot-clickhouse

Loading

clickhouse-gh Bot commented Mar 16, 2026 •

edited

Loading

shankar-iyer commented Mar 18, 2026 •

edited

Loading

CLAassistant commented Mar 26, 2026 •

edited

Loading