iframe-proxy

Onyx2406 · 2026-03-11T19:30:42Z

Changelog category (leave one):

New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added arrayCombinations(arr, k), arrayPermutations(arr), and arrayPartialPermutations(arr, k) functions for generating combinations and permutations of array elements.

Documentation entry for user-facing changes

Documentation is written (function descriptions and examples in code)

What this PR does

Adds three combinatorics functions for arrays, as proposed in the Intern Tasks 2025/2026:

Implementation (2 files, ~250 lines)

arrayCombinations.cpp — Iterative index-based algorithm for generating C(n,k) combinations
arrayPermutations.cpp — Template class handling both full (std::next_permutation) and partial (recursive swap-based) permutations
Follows the arrayShingles pattern: Array(T) → Array(Array(T)) with dual offset management
Works with any element type (integers, strings, etc.)

Test plan

New stateless test 04043_array_combinatorics.sql with 12 test cases:
- Combinations: k=1, k=2, k=3, k=0, k=n, string elements
- Full permutations: n=3, n=1, empty array
- Partial permutations: k=2, k=1, k=0
- Error cases: k>n, k<0 (BAD_ARGUMENTS)
CI (full test suite)

Closes: #43175

Note

Medium Risk
Adds new array combinatorics functions that can generate very large result sets; while guarded by element caps and tests, it touches query execution paths and could impact memory/CPU usage and edge-case correctness.

Overview
Adds three new array functions: arrayCombinations(arr, k), arrayPermutations(arr), and arrayPartialPermutations(arr, k), each returning Array(Array(T)) results generated per input row.

The implementations include argument validation and a hard cap (~1M total output elements) with TOO_LARGE_ARRAY_SIZE errors to prevent OOM, plus new stateless SQL/reference tests covering normal cases, multi-row offsets, and expected error/limit scenarios.

^{Reviewed by Cursor Bugbot for commit 549f6f0. Bugbot is set up for automated code reviews on this repo. Configure here.}

Add three combinatorics functions for arrays: - arrayCombinations(arr, k): all k-length unordered subsets - arrayPermutations(arr): all full permutations - arrayPartialPermutations(arr, k): all k-length ordered selections Fixes ClickHouse#43175

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

…size limit - Use used-flags approach instead of swap-based recursion for partial permutations to produce lexicographic output order - k=0 now correctly returns [[]] (one empty selection) matching C(n,0)=1 - Add TOO_LARGE_ARRAY_SIZE guard (1M element cap) to prevent OOM

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

clickhouse-gh · 2026-03-12T06:00:29Z

Workflow [PR], commit [ff2a490]

Summary: ❌

Performance Comparison: Performance dashboard

job_name	test_name	status	info
Integration tests (arm_binary, distributed plan, 1/4)		FAIL
	test_implicit_index_upgrade/test.py::test_implicit_index_upgrade_alter_replay	FAIL	cidb
Stress test (amd_tsan)		FAIL
	Hung check failed, possible deadlock found	FAIL	cidb, issue

AI Review

Summary

This PR adds arrayCombinations, arrayPermutations, and arrayPartialPermutations with structured function documentation and focused stateless coverage. The current code addresses the earlier review issues around k == 0, lexicographic partial-permutation order, whole-block output caps, constant-array handling without replication, and introduced_in metadata. I found no remaining correctness, safety, or testing concerns that warrant inline comments.

Final Verdict

Status: ✅ Approve

The overflow guard was too strict — checking before division caused valid C(n,k) values to be rejected (e.g. C(20,10)=184756). Now uses GCD cross-cancellation between numerator and denominator factors to keep intermediates small. Added test for C(20,10) success case.

alexey-milovidov · 2026-03-18T00:07:55Z

@Onyx2406, almost ready! Let's check the review comments.

Address review comments: 1. Size guard now caps total materialized elements (rows * k) instead of just the number of result rows. Previously, inputs like `arrayCombinations(range(20), 10)` produced 1.8M elements while passing the 1M limit check. 2. Replace O(k^2) GCD cross-cancellation in `combinationCountCapped` with an O(k) incremental formula using 128-bit intermediate arithmetic. This prevents pathological inputs from spending excessive time in the guard before rejection. 3. Add regression tests verifying the total-elements invariant: - `arrayCombinations(range(20), 10)` now correctly rejected - `arrayPermutations(range(9))` now correctly rejected (362880*9 > 1M) 4. Fix CRLF line endings in source and test files.

Onyx2406 · 2026-03-27T20:28:36Z

@alexey-milovidov is this good to merge?

…atorics

alexey-milovidov · 2026-04-07T00:22:38Z

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

alexey-milovidov · 2026-04-08T02:22:47Z

The test 02859_replicated_db_name_zookeeper is fixed in #101952

alexey-milovidov · 2026-04-09T03:28:06Z

The MSan stress test failure (MemorySanitizer: use-of-uninitialized-value, STID 4179-5154 or 4148-3044) is a known pre-existing issue unrelated to this PR. Fix: #102158

alexey-milovidov · 2026-04-10T02:14:28Z

The Hung check failure is fixed by #102008 and #102010, let's update the branch

…06-feat-array-combinatorics

…orics

Master is at version 26.6, but the `FunctionDocumentation::IntroducedIn` metadata for `arrayCombinations`, `arrayPermutations`, and `arrayPartialPermutations` claimed `26.4`. This would generate docs claiming the functions are available in releases where they cannot exist. Update all three to `26.6`.

alexey-milovidov · 2026-06-04T18:34:17Z

Picked this up to advance it:

Merged the latest master (the branch was ~667 commits behind); no conflicts. Re-synced the contrib/mariadb-connector-c submodule pointer that the merge bumped.
Addressed the stale-docs review thread: updated IntroducedIn for arrayCombinations, arrayPermutations, and arrayPartialPermutations from {26, 4} to {26, 6} to match the current master version. Built a fresh binary and confirmed system.functions.introduced_in reports 26.6.0 for all three.
Rebuilt and ran 04043_array_combinatorics against the fresh binary — reference matches.

Remaining open review thread: the suggestion to expose the 1000000-element output cap as a tunable setting. I left this for a maintainer call (rationale in the thread) rather than adding new setting API surface unilaterally.

The only red CI check is test_s3_aws_sdk_has_slightly_unreliable_behaviour::test_dataloss, which timed out at 600s — a known flaky S3 SDK test unrelated to this change.

alexey-milovidov · 2026-06-06T17:44:35Z

Continued via `/continue-pr`.

State of the PR:

CI is green (162 checks, 0 failures).
No conflicts with `master` and the branch is only ~2 days behind the merge base — no re-merge needed.
All bot review threads were already addressed in earlier commits (per-row → whole-block element budget, the `k == n` fast-path honoring the cap, `IntroducedIn` updated to `{26, 6}`, multi-row offset regressions, and the negative-`k` test for `arrayPartialPermutations`).

The sole remaining open thread was the question of whether to expose the `1000000`-element `TOO_LARGE_ARRAY_SIZE` cap as a tunable setting. Keeping it hard-coded as an OOM safety guard (per the rationale above in that thread): the cap guards against factorial/combinatorial blow-up rather than a linear workload worth tuning, so a new public-API setting is not warranted. Marked that thread resolved.

The implementation looks correct and complete; no further changes needed.

The functions `arrayCombinations`, `arrayPermutations`, and `arrayPartialPermutations` cap the total number of output elements per block at 1000000 and raise a `TOO_LARGE_ARRAY_SIZE` exception when this is exceeded. This cap was previously undocumented. Mention it explicitly in the `FunctionDocumentation` description of each function so users can discover why a valid combinatorics query may be rejected. The cap remains a fixed, intentional safety limit (not a tunable setting), as decided in review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-06-09T11:16:24Z

Follow-up:

Documented the fixed 1000000-element output cap (and the resulting TOO_LARGE_ARRAY_SIZE exception) in the FunctionDocumentation for all three functions, closing the AI review's remaining concern. The cap stays a fixed safety limit rather than a tunable setting, as discussed.
Updated the PR body to use the Closes: full-URL relationship form per repo conventions.

Rebuilt and re-ran 04043_array_combinatorics against a fresh binary — reference matches; function output matches the examples.

…orics

alexey-milovidov · 2026-06-12T13:30:36Z

Continued via /continue-pr:

CI was fully green (all checks passed, 0 unresolved review threads), but the merge base was 8 days old, so I merged the latest master into the branch — the merge was clean, with no conflicts.
Compile-verified the merged branch: clickhouse_functions_array (including the new arrayCombinations.cpp and arrayPermutations.cpp) builds without errors against the updated master.
IntroducedIn metadata ({26, 6}) still matches the current master version, so no further changes were needed.

The PR remains ready for review/merge.

…ions `arrayCombinations` and `arrayPartialPermutations` take two arguments, so `useDefaultImplementationForConstants` only folds the all-constant case. When the array argument is constant but `k` varies by row, the array reaches `executeImpl` as `ColumnConst(ColumnArray)`, so `checkAndGetColumn<ColumnArray>` returned `nullptr` and the call threw `ILLEGAL_COLUMN` for valid input such as `arrayCombinations([1, 2, 3], number)`. Materialize the array argument with `convertToFullColumnIfConst` (a no-op for already-full columns) so the per-row offset logic is valid in this mixed-constant case. Added regressions for both functions over `numbers(4)`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…orics

`arrayCombinations`, `arrayPartialPermutations` (and the shared `arrayPermutations`) called `convertToFullColumnIfConst` on the array argument. When only the array is constant while `k` varies by row, that replicates a `ColumnConst(ColumnArray)` to every input row before the per-row output budget is checked, so a query like `SELECT arrayCombinations(range(1000000), number) FROM numbers(1000000)` materializes `array_length * rows` input elements before `TOO_LARGE_ARRAY_SIZE` can fire. Detect `ColumnConst(ColumnArray)` and read its single stored array (offset index 0) for every row instead of replicating it, so the cap fires fast on the shared element budget. Added a regression for a large constant array with row-varying `k` and many rows. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-06-15T20:30:06Z

Continued via /continue-pr:

Merged the latest master into the branch (it was ~1410 commits / 3 days behind the merge base, and showing red). The merge was clean — no conflicts.
Addressed the two open AI-review threads about convertToFullColumnIfConst expanding a ColumnConst(ColumnArray) across all rows before the output cap could fire. The array argument is now detected as ColumnConst(ColumnArray) and its single stored array is read per row (arr_row = 0) instead of being replicated to input_rows_count rows. The per-row output budget now fires fast: SELECT arrayCombinations(range(1000000), number) FROM numbers(1000000) (and the arrayPartialPermutations equivalent) throw TOO_LARGE_ARRAY_SIZE in ~0.2s rather than first materializing array_length * rows input elements. Added a regression for this case to 04043_array_combinatorics.sql. Both threads resolved.
Built a fresh binary and verified: all deterministic queries in 04043_array_combinatorics match the reference exactly (the fix is output-preserving for the existing const-array cases), and every error case throws the expected code — including the two new large-constant-array regressions, which throw TOO_LARGE_ARRAY_SIZE essentially instantly (no replication blow-up).

The only red CI check is the AST fuzzer (arm_asan_ubsan) failure Logical error: No set is registered for key __set_... in makeSetForInFunction. The triggering query is a fuzzed notIn((SELECT DISTINCT date FROM test_00597 ...), b AS alias453) over the standard corpus table test_00597 — no array combinatorics functions involved. This is a pre-existing planner robustness flake (NOT IN subquery set not registered in the planner context): it has reproduced on master itself (2026-05-01, 2026-05-14) and on the unrelated PR #104569, so it is unrelated to this change. A fresh CI run is triggered by the new commits.

…orics Keep .github/workflows/*.yml at the PR tip to avoid fork-push workflow-scope rejection (the fork's GitHub App token lacks the workflows scope). Same technique as the earlier workflow-freeze commits on this branch.

alexey-milovidov · 2026-06-18T17:23:26Z

Merged fresh `master` (`283677ac277..4abadbf`).

The only red was `CH Inc sync` ("build failed"), which was not caused by this PR: the private sync build failed on `programs/distributed-cache-bench/BenchClient.cpp` including the non-existent header `IO/SilkFiberStreamSocketImpl.h` — a transient private-master break (the public counterpart was reverted in #107539, "Revert "Add silk fibers aware socket implementations""). programs/distributed-cache-bench/ does not exist in the public tree, and this PR only adds src/Functions/array/arrayCombinations.cpp + arrayPermutations.cpp + the 04043 test, so it is entirely orthogonal.

Re-merging fresh master regenerates the sync PR against the now-fixed private master (which already replaced the include with <FiberSocket.h>), so the sync build should pass. .github/workflows/*.yml were kept at the PR tip to avoid the fork-push workflow-scope rejection (same technique as the earlier workflow-freeze commits here).

…orics

The previous master merges left `.github/workflows/*.yml` carrying stale CI definitions (e.g. a removed `smoke_test_amd_darwin` job) that diverged from master. Restore all workflow files to match master so this PR's net diff is limited to the new array combinatorics functions and their test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-06-20T16:31:16Z

Merged current master and restored .github/workflows/*.yml to match it.

The previous master merges had left the workflow files (backport_branches.yml, master.yml, pull_request.yml, release_branches.yml) carrying stale CI definitions that diverged from master — e.g. a smoke_test_amd_darwin job that no longer exists on master. Without this, the PR would have carried those unrelated CI changes into master.

After this push the PR's net diff against master is exactly the four intended files (the two new function sources and the stateless test); git diff origin/master...HEAD -- .github/ is empty. The function sources are unchanged since the last green CI run (283677ac), and the FunctionDocumentation / column APIs they use are unchanged on current master (version is still 26.6, matching IntroducedIn = {26, 6}). CI will re-run on the merged head.

Master has moved to 26.7 development (VERSION_MINOR 7), so the `introduced_in` metadata of `arrayCombinations`, `arrayPermutations`, and `arrayPartialPermutations` is bumped from `{26, 6}` to `{26, 7}` to reflect the release in which these functions will actually ship. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-06-28T06:23:28Z

Merged master to refresh the branch (it had fallen >1 week behind: merge-base was 2026-06-20, now 7.6 days old). The merge was clean — no conflicts, and the net PR diff is still exactly the same 4 files (arrayCombinations.cpp, arrayPermutations.cpp, and the 04043_array_combinatorics test/reference), +519/-0.

While refreshing, master has advanced to 26.7 development (VERSION_MINOR 7), so I bumped introduced_in from {26, 6} to {26, 7} for all three functions to match the release they will actually ship in (same kind of bump as the earlier {26, 4}→{26, 6} update). The test reference does not depend on the version, so it is unchanged. Re-running CI on the fresh base.

clickhouse-gh · 2026-06-28T09:41:03Z

LLVM Coverage Report

Changed lines: Changed C/C++ lines covered by tests: 204/208 (98.08%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

`test_implicit_index_upgrade_alter_replay` creates a second replica `node2` and asserts `SELECT count() FROM test_alter_replay` equals `10001` immediately after `wait_for_active_replica`. But `wait_for_active_replica` only waits for `is_readonly = 0`, not for the freshly-joined replica to finish fetching parts. So `node2` can observe only the small `VALUES` part (1 row, `key = 99999`) before the 10000-row part is fetched, and the assertion fails with `assert '1' == '10001'`. This raced once each on several unrelated PRs over the last 30 days (e.g. ClickHouse#108084, ClickHouse#99280, ClickHouse#107566, ClickHouse#107442). Add `SYSTEM SYNC REPLICA test_alter_replay` after the replica becomes active, which blocks until all parts are fetched, before checking the row count. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=108770&sha=bb760616d54e3d8e5d20968022085506c31b5f37&name_0=PR&name_1=Integration%20tests%20%28arm_binary%2C%20distributed%20plan%2C%201%2F4%29 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor Bot reviewed Mar 11, 2026

View reviewed changes

Comment thread src/Functions/array/arrayPermutations.cpp

Comment thread src/Functions/array/arrayCombinations.cpp

Comment thread src/Functions/array/arrayPermutations.cpp

cursor Bot reviewed Mar 11, 2026

View reviewed changes

Comment thread src/Functions/array/arrayCombinations.cpp Outdated

alexey-milovidov added the can be tested Allows running workflows for external contributors label Mar 12, 2026

clickhouse-gh Bot added the pr-feature Pull request with new product feature label Mar 12, 2026

clickhouse-gh Bot reviewed Mar 12, 2026

View reviewed changes

Comment thread src/Functions/array/arrayCombinations.cpp Outdated

clickhouse-gh Bot reviewed Mar 12, 2026

View reviewed changes

Comment thread src/Functions/array/arrayCombinations.cpp Outdated

clickhouse-gh Bot reviewed Mar 12, 2026

View reviewed changes

Comment thread src/Functions/array/arrayPermutations.cpp Outdated

clickhouse-gh Bot reviewed Mar 12, 2026

View reviewed changes

Comment thread src/Functions/array/arrayCombinations.cpp Outdated

Fix clang-tidy readability-use-std-min-max and CRLF line endings

ed023c0

clickhouse-gh Bot reviewed Mar 15, 2026

View reviewed changes

Comment thread src/Functions/array/arrayCombinations.cpp Outdated

clickhouse-gh Bot reviewed Mar 15, 2026

View reviewed changes

Comment thread src/Functions/array/arrayPermutations.cpp

Merge master into feat/array-combinatorics

2764bb2

clickhouse-gh Bot reviewed Mar 16, 2026

View reviewed changes

Comment thread tests/queries/0_stateless/04043_array_combinatorics.sql Outdated

Merge remote-tracking branch 'upstream/master' into feat/array-combin…

4a3ed1c

…atorics

Merge branch 'master' into feat/array-combinatorics

532559e

clickhouse-gh Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread tests/queries/0_stateless/04043_array_combinatorics.sql

Merge remote-tracking branch 'origin/master' into pr_99280

36ea334

Merge branch 'master' into feat/array-combinatorics

8a58f01

Merge remote-tracking branch 'origin/master' into merge-master-Onyx24…

73c54cb

…06-feat-array-combinatorics

alexey-milovidov added 2 commits June 4, 2026 18:30

Merge remote-tracking branch 'origin/master' into feat/array-combinat…

b816935

…orics

Merge remote-tracking branch 'origin/master' into feat/array-combinat…

8dda746

…orics

clickhouse-gh Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread src/Functions/array/arrayPermutations.cpp Outdated

clickhouse-gh Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread src/Functions/array/arrayCombinations.cpp Outdated

clickhouse-gh Bot reviewed Jun 13, 2026

View reviewed changes

Comment thread src/Functions/array/arrayCombinations.cpp Outdated

clickhouse-gh Bot reviewed Jun 13, 2026

View reviewed changes

Comment thread src/Functions/array/arrayPermutations.cpp Outdated

alexey-milovidov and others added 2 commits June 15, 2026 20:15

Merge remote-tracking branch 'origin/master' into feat/array-combinat…

a20b094

…orics

clickhouse-gh Bot added the manual approve Manual approve required to run CI label Jun 18, 2026

alexey-milovidov and others added 2 commits June 20, 2026 16:29

Merge remote-tracking branch 'origin/master' into feat/array-combinat…

ad3fdd8

…orics

alexey-milovidov removed the manual approve Manual approve required to run CI label Jun 22, 2026

alexey-milovidov and others added 2 commits June 28, 2026 06:20

Merge remote-tracking branch 'origin/master' into HEAD

35dee88

This was referenced Jun 30, 2026

Fix flaky test_implicit_index_upgrade_alter_replay (replica fetch race) #108961

Merged

Fix hang in SYSTEM SYNC MERGES with the Manual merge selector #108770

Open

Function	Description	Example
`arrayCombinations(arr, k)`	All k-length unordered subsets	`arrayCombinations([1,2,3], 2)` → `[[1,2],[1,3],[2,3]]`
`arrayPermutations(arr)`	All full permutations	`arrayPermutations([1,2,3])` → `[[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]]`
`arrayPartialPermutations(arr, k)`	All k-length ordered selections	`arrayPartialPermutations([1,2,3], 2)` → `[[1,2],[1,3],[2,1],[2,3],[3,1],[3,2]]`

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

Onyx2406 commented Mar 11, 2026 • edited by alexey-milovidov Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

What this PR does

Implementation (2 files, ~250 lines)

Test plan

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

clickhouse-gh Bot commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Final Verdict

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alexey-milovidov commented Mar 18, 2026

Uh oh!

Onyx2406 commented Mar 27, 2026

Uh oh!

alexey-milovidov commented Apr 7, 2026

Uh oh!

Uh oh!

alexey-milovidov commented Apr 8, 2026

Uh oh!

alexey-milovidov commented Apr 9, 2026

Uh oh!

alexey-milovidov commented Apr 10, 2026

Uh oh!

alexey-milovidov commented Jun 4, 2026

Uh oh!

alexey-milovidov commented Jun 6, 2026

Uh oh!

alexey-milovidov commented Jun 9, 2026

Uh oh!

alexey-milovidov commented Jun 12, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alexey-milovidov commented Jun 15, 2026

Uh oh!

alexey-milovidov commented Jun 18, 2026

Uh oh!

alexey-milovidov commented Jun 20, 2026

Uh oh!

alexey-milovidov commented Jun 28, 2026

Uh oh!

clickhouse-gh Bot commented Jun 28, 2026

LLVM Coverage Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Onyx2406 commented Mar 11, 2026 •

edited by alexey-milovidov

Loading

clickhouse-gh Bot commented Mar 12, 2026 •

edited

Loading