Preserve existing parameters in groupConcat two-argument form by yariks5s · Pull Request #104882 · ClickHouse/ClickHouse · GitHub
Skip to content

Preserve existing parameters in groupConcat two-argument form#104882

Merged
yariks5s merged 7 commits into
masterfrom
yarik/preserve-existing-parameters-groupConcat
Jul 3, 2026
Merged

Preserve existing parameters in groupConcat two-argument form#104882
yariks5s merged 7 commits into
masterfrom
yarik/preserve-existing-parameters-groupConcat

Conversation

@yariks5s

@yariks5s yariks5s commented May 13, 2026

Copy link
Copy Markdown
Member

Reference: #72540 (comment).
Before, a query that mixes both spellings — e.g. groupConcat(',', 2)(x, '/') — silently dropped the 2 and returned every row instead of the requested two

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Preserves the original parameters and lets the second-argument delimiter
override position 0 (the two-argument spelling is the more specific one
and already documented to "override current parameter" on the test
above this case)

Version info

  • Merged into: 26.7.1.483

@clickhouse-gh

clickhouse-gh Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label May 13, 2026
@scanhex12 scanhex12 self-assigned this May 13, 2026

@scanhex12 scanhex12 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, but there is one more test to fix

yariks5s and others added 4 commits May 13, 2026 23:48
The flaky check randomized `max_threads`/`aggregation_in_order`, which
split the 5-row scan across threads and let rows arrive at the aggregator
in a different order than the table's `ORDER BY x`. The limit then kept
the wrong rows.

CI report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104882&sha=70e845a033a99309f793439af79cfeddc43b5312&name_0=PR

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous fix only set `max_threads = 1` per `SELECT`, so the random
flaky-check setting `max_insert_threads = 2` could still split the 5-row
`INSERT` into two parts. When the part holding `[1,2,3,4]` was written
before the one holding `[0]`, the read order became `1,2,3,4,0` and the
limited outputs picked the wrong rows.

Apply session-level `SET max_insert_threads = 1, max_threads = 1,
min_insert_block_size_rows = 0, min_insert_block_size_bytes = 0` and
`OPTIMIZE TABLE ... FINAL` to guarantee a single ordered part, matching
the pattern in `03156_group_concat.sql`.

CI report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104882&sha=2edb3a55709250d1e767650200b6542cb15ddc30

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The flaky check also randomizes `enable_parallel_replicas = 1` and
`parallel_replicas_for_non_replicated_merge_tree = 1`, which splits the
5-row scan across replicas and reorders rows when results are merged —
yielding the `1,2,3,4,0` pattern even with `max_threads = 1` and a
single part. Pin both off via session-level `SET`.

CI report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104882&sha=a80143bd52713b67cd92f957f8372710adb3cd1a

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The flaky check applies its random settings as URL parameters on every
HTTP request, which take per-query precedence over session-level `SET`
and over per-query `SETTINGS` from prior queries. So even with the
previous defensive `SET max_insert_threads = 1` etc., the `INSERT`
still ran with `max_insert_threads >= 2`, created multiple parts, and
the resulting read order was non-deterministic.

Drop the order-dependent reference and insert identical rows instead.
The test still checks both that the limit (`2`/`3`/`100`) and the
overridden delimiter (`/`, `|`) are honoured.

CI report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104882&sha=ae4473b6606021924841c341bc1f5a08fa8bcfa9

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

📊 Cloud Performance Report

🔴 AI verdict: degradation2 query(s) regressed out of 38 analysed

This PR makes two narrow changes: fixing groupConcat to preserve its limit parameter when a second delimiter argument is supplied (analyzer/query-tree path), and carrying a parent column's DEFAULT into projection metadata. Neither touches the join or aggregation execution that the flagged TPC-H queries run. Q4 (+185%) and Q7 (+779%) are large enough to clear the automatic trust threshold so their verdicts stand, but their baseline runs were extremely noisy and the query paths are unrelated to the diff -- a re-run is recommended before treating them as real. Q8, Q16, and ClickBench Q33 are off-path and consistent with run-to-run variance, so they were downgraded to not-sure.

clickbench

⚠️ 1 inconclusive

Flagged queries (1 of 43)
Query Verdict Baseline median (ms) PR median (ms) Change q-value Hint
⚠️ 33 not_sure 1380 1452 +5.2% 0.0004 PR only changes groupConcat analyzer path + projection column defaults; Q33 exercises neither. 5% is off-path variance.

q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on.

tpch_adapted_1_official

🔴 2 regressed · ⚠️ 2 inconclusive

Flagged queries (4 of 22)
Query Verdict Baseline median (ms) PR median (ms) Change q-value Hint
🔴 4 regression 327 933 +185.3% <0.0001 join_operator: Large delta clears the hard trust threshold so verdict stands; base samples were very noisy, re-run to confirm.
🔴 7 regression 71 624 +778.9% <0.0001 join_operator: 778% clears hard trust gate; but base was extremely noisy (71ms median highly variable). Re-run advised.
⚠️ 8 not_sure 168 208 +23.8% <0.0001 join_operator: Join query untouched by a groupConcat parser + projection-default diff; 23.8% delta on a noisy base is off-path.
⚠️ 16 not_sure 99 50 -49.5% <0.0001 aggregation: Flagged improvement implausible from a parser-only diff; base was noisy, so -49% is variance not a PR effect.

q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on.

Debug info
  • StressHouse run: f44c1557-3b60-407a-8500-2dff2d15623d
  • MIRAI run: d6f9d5d8-3f0c-4596-b5d7-e02cf66da744
  • PR check IDs:
    • clickbench_654875_1782935449
    • clickbench_654881_1782935449
    • clickbench_654889_1782935449
    • tpch_adapted_1_official_654899_1782935449
    • tpch_adapted_1_official_654909_1782935449
    • tpch_adapted_1_official_654924_1782935449

@clickhouse-gh

clickhouse-gh Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Dear @scanhex12, you haven't been active on this PR for 30 days. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

@clickhouse-gh clickhouse-gh Bot assigned scanhex12 and unassigned scanhex12 Jun 15, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.50% 85.40% -0.10%
Functions 92.60% 92.60% +0.00%
Branches 77.70% 77.60% -0.10%

Changed lines: Changed C/C++ lines covered: 13/13 (100.00%) · Uncovered code

Full report · Diff report

@yariks5s yariks5s added this pull request to the merge queue Jul 3, 2026
Merged via the queue into master with commit 4a30bc5 Jul 3, 2026
346 of 348 checks passed
@yariks5s yariks5s deleted the yarik/preserve-existing-parameters-groupConcat branch July 3, 2026 12:44
@clickgapai

Copy link
Copy Markdown
Contributor

Found via ClickGap automated review. Please close or comment if this is incorrect or needs adjustment.

TL;DR

Preserve existing parameters in groupConcat two-argument form

ClickGap verified: reproduced it with the script below on 26.5; tested 6 version(s) — 5 affected, 1 clean (see Affects); present since the feature was introduced in v24.8 (not a recent regression).

Reproduction

-- Requires the two-argument groupConcat overload (#72540) and enable_analyzer=1.
-- The '2' is a limit (max rows to concat); '/' is the two-arg-form delimiter.
SELECT groupConcat(',', 2)(number, '/') FROM numbers(5) SETTINGS enable_analyzer = 1;
-- Buggy builds drop the limit and return 0/1/2/3/4; the fix returns 0/1.

▶ Run on ClickHouse Fiddle

Bisect

  • The reproducer requires enable_analyzer, introduced in v24.8 — so this is not pre-existing: it was introduced no earlier than v24.8, and the introducing change is in the 24.8master range (most likely the change that added the feature). Every release line down to there reproduces.

Affects

  • Fix is on master — this PR's change fixes the bug on master. Branches below still have the bug and need the backport.
  • Reproduces on: 26.5, 26.4, 26.3, 26.2, 26.1
  • Does NOT reproduce on: master
  • Backport needed: 26.5, 26.4, 26.3, 26.2, 26.1

Suggested next step

The fix has landed on master; the affected release branches above still need the backport.

Component: comp-query-analyzer
Severity: P2


ClickGap · Finding: phase_d_pr104882

Triage metadata

  • Primary component: comp-query-analyzer
  • Secondary components: comp-testing
  • Component owners (comp-query-analyzer): @KochetovNicolai @novikd

@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Jul 3, 2026
@clickgapai

Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants