Add functions arrayTopK and arrayBottomK#104563
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces new higher-order array functions arrayTopK and arrayBottomK to return the top/bottom K elements of an array (with optional lambda sort keys and NULL skipping). It also updates the Prometheus Query-to-SQL converter to leverage these functions for topk/bottomk/limitk, enabling support for non-constant k aligned to the time grid, and adds test coverage for the new behavior.
Changes:
- Add
arrayTopK/arrayBottomKimplementations and function registration. - Refactor Prometheus
applyLimitAggregationOperatorto use the new functions and support scalar-gridk, enabling an integration test previously disabled. - Add stateless SQL tests + references, and update aspell dictionary for the new function names.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/queries/0_stateless/04105_array_top_k_bottom_k.sql | Adds stateless queries covering top/bottom K behavior, NULL handling, lambdas, and errors. |
| tests/queries/0_stateless/04105_array_top_k_bottom_k.reference | Expected outputs for the new stateless tests. |
| tests/integration/test_prometheus_protocols/test_evaluation.py | Re-enables a PromQL topk test with time-varying k now that it’s supported. |
| src/Storages/TimeSeries/PrometheusQueryToSQL/applyLimitAggregationOperator.cpp | Uses arrayTopK/arrayBottomK for index selection and implements scalar-grid k handling. |
| src/Functions/array/arrayTopK.h | Declares arrayTopK/arrayBottomK via FunctionArrayMapped and defines return type / argument checks. |
| src/Functions/array/arrayTopK.cpp | Implements the core top/bottom K selection logic and registers both functions + docs. |
| ci/jobs/scripts/check_style/aspell-ignore/en/aspell-dict.txt | Adds arrayTopK and arrayBottomK to spelling ignore list. |
|
The closest existing alternative is |
d61b373 to
3afdcfe
Compare
3afdcfe to
aaf640d
Compare
|
@groeneai Investigate CI failures |
|
@vitlibar — investigated CI failures on 1.
|
aaf640d to
cfbaa0d
Compare
cfbaa0d to
7c9b2ab
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
src/Functions/array/arrayTopK.cpp:276
- The implementation and tests define signed negative K values as clamped to 0, but the public function documentation for
arrayBottomKdoes not mention that behavior. Please document this edge case alongside the K argument description.
FunctionDocumentation::Syntax syntax = "arrayBottomK([f,] K, arr [, arr1, ... ,arrN])";
FunctionDocumentation::Arguments arguments = {
{"f(arr[, arr1, ... ,arrN])", "Optional. A lambda function to compute the sort key for each element.", {"Lambda function"}},
{"K", "The number of smallest elements to return.", {"(U)Int*"}},
{"arr", "An array.", {"Array(T)"}},
7c9b2ab to
c4d82eb
Compare
c4d82eb to
06d75cf
Compare
c192177 to
5728e8f
Compare
5728e8f to
8ddb9dd
Compare
|
@groeneai Investigate CI failures |
|
@vitlibar Triage on head
Failing test: Root cause family: the Recommendation: rebase + retry on the amd_msan WasmEdge job. If it persists, treat as infra and proceed — this PR has the assignee approval and is functionally clean. |
Co-authored-by: Nihal Z. Miaji <81457724+nihalzp@users.noreply.github.com>
cbb8e0e to
afa1c99
Compare
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered by tests: 244/255 (95.69%) | Lost baseline coverage: none · Uncovered code |
`CoalescingMergeTree` delegates to `SummingSortedAlgorithm` with `aggregate_all_columns = true`. The dedicated branch for that case in `defineColumns` (`src/Processors/Merges/Algorithms/SummingSortedAlgorithm.cpp:267`) wrapped its work in `if (map_name == column.name || !endsWith(map_name, "Map"))` and then fell through to an unconditional `continue`. For any nested column whose parent table name ends with `Map` (e.g. `testMap.key`, `legacy_features_Map.id`), the inner branch was skipped while the `continue` still fired, so the column was added to neither `columns_to_aggregate` nor `column_numbers_not_to_aggregate`. `postprocessChunk` then constructed `res_columns` of size `num_result_columns` and left those slots `nullptr`. `Chunk::setColumns` -> `Chunk::checkNumRowsIsConsistent` dereferenced the null `ColumnPtr`, producing a SIGSEGV in release builds, a `px != 0` assertion in debug builds, and an UBSan `member call on null pointer` report in `arm_asan_ubsan` (STID 1003-326e on the AST fuzzer for PR ClickHouse#104281 and PR ClickHouse#104563). CoalescingMergeTree performs per-column `last_value` coalescing, so nested `Map` arrays must be coalesced element-wise like any other array column. The legacy `nestedName.suffix` routing into `discovered_maps` for `sumMap`-by-key merging is a SummingMergeTree concept that does not apply here, so the Map-suffix exclusion is dropped from the `aggregate_all_columns = true` branch. SummingMergeTree code paths are untouched. Closes ClickHouse#101937 CI report: https://s3.amazonaws.com/clickhouse-test-reports/PRs/104563/c2e0f0fe3b92d48b5a47fa90555f7d4aca54c553/ast_fuzzer_arm_asan_ubsan/ Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Add functions arrayTopK and arrayBottomK:
arrayTopK(k, array)returns the K largest elements in descending orderarrayBottomK(k, array)returns the K smallest elements in ascending orderFor example
arrayBottomK(3, [1, 5, 2, 7, 3])returns[1,2,3]Likewise
arraySort()an optional lambda is supported, soarrayBottomK((x, y) -> y, 2, ['a', 'b', 'c'], [3, 1, 2])returns['b', 'c']Version info
26.6.1.320