nihalzp · 2026-03-16T10:56:37Z

Changelog category (leave one):

Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

New GROUP BY optimization for high cardinality keys that distributes rows across threads by hashing the grouping key, so each thread aggregates a disjoint subset of keys without a merge phase. Set optimize_aggregation_by_sharding = 1 to enable it.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

clickhouse-gh · 2026-03-16T10:57:20Z

Workflow [PR], commit [dfb13f4]

Summary: ❌

job_name	test_name	status	info
Stateless tests (amd_llvm_coverage, ParallelReplicas, s3 storage, parallel)		FAIL
	03928_sharded_aggregation_negative_explain	ERROR	cidb
	Server died	FAIL	cidb, issue
	00116_storage_set	NOT_FAILED	cidb
	01583_const_column_in_set_index	NOT_FAILED	cidb
	01739_index_hint	NOT_FAILED	cidb
	02751_query_log_test_partitions	NOT_FAILED	cidb
	03038_recursive_cte_postgres_4	NOT_FAILED	cidb
	03032_redundant_equals	NOT_FAILED	cidb
	02154_default_keyword_insert	NOT_FAILED	cidb
	03325_sqlite_join_wrong_answer	NOT_FAILED	cidb
	15 more test cases not shown
Unit tests (msan, function_prop_fuzzer)		FAIL
Performance Comparison (amd_release, master_head, 1/6)		FAIL
	aggregate_functions_of_group_by_keys #0::old	FAIL	query history
	aggregate_functions_of_group_by_keys #0::new	FAIL	query history
	aggregation_by_partitions #2::old	FAIL	query history
	aggregation_by_partitions #2::new	FAIL	query history
	aggregation_by_partitions #6::old	FAIL	query history
	aggregation_by_partitions #6::new	FAIL	query history
	aggregation_by_partitions #8::old	FAIL	query history
	aggregation_by_partitions #8::new	FAIL	query history
	fixed_hash_table_parallel_merge #1::old	FAIL	query history
	fixed_hash_table_parallel_merge #1::new	FAIL	query history
	24 more test cases not shown
Performance Comparison (amd_release, master_head, 2/6)		FAIL
	agg_functions_min_max_any #14::old	FAIL	query history
	agg_functions_min_max_any #14::new	FAIL	query history
	cpu_synthetic #22::old	FAIL	query history
	cpu_synthetic #22::new	FAIL	query history
	group_by_multiple_strings #26::old	FAIL	query history
	group_by_multiple_strings #26::new	FAIL	query history
	if_transform_strings_to_enum #0::old	FAIL	query history
	if_transform_strings_to_enum #0::new	FAIL	query history
	if_transform_strings_to_enum #2::old	FAIL	query history
	if_transform_strings_to_enum #2::new	FAIL	query history
	16 more test cases not shown
Performance Comparison (amd_release, master_head, 3/6)		FAIL
	agg_functions_argmin_argmax #1::old	FAIL	query history
	agg_functions_argmin_argmax #1::new	FAIL	query history
	agg_functions_argmin_argmax #4::old	FAIL	query history
	agg_functions_argmin_argmax #4::new	FAIL	query history
	aggregate_functions_deserialization #1::old	FAIL	query history
	aggregate_functions_deserialization #1::new	FAIL	query history
	aggregate_functions_deserialization #3::old	FAIL	query history
	aggregate_functions_deserialization #3::new	FAIL	query history
	has_all #7::old	FAIL	query history
	has_all #7::new	FAIL	query history
	10 more test cases not shown
Performance Comparison (amd_release, master_head, 4/6)		FAIL
	file_table_function #9::old	FAIL	query history
	file_table_function #9::new	FAIL	query history
	file_table_function #12::old	FAIL	query history
	file_table_function #12::new	FAIL	query history
	file_table_function #18::old	FAIL	query history
	file_table_function #18::new	FAIL	query history
	group_by_sundy_li #0::old	FAIL	query history
	group_by_sundy_li #0::new	FAIL	query history
	group_by_sundy_li #1::old	FAIL	query history
	group_by_sundy_li #1::new	FAIL	query history
	34 more test cases not shown
Performance Comparison (amd_release, master_head, 5/6)		FAIL
	clickbench #8::old	FAIL	query history
	clickbench #8::new	FAIL	query history
	clickbench #9::old	FAIL	query history
	clickbench #9::new	FAIL	query history
	clickbench #14::old	FAIL	query history
	clickbench #14::new	FAIL	query history
	clickbench #15::old	FAIL	query history
	clickbench #15::new	FAIL	query history
	clickbench #16::old	FAIL	query history
	clickbench #16::new	FAIL	query history
	66 more test cases not shown
Performance Comparison (amd_release, master_head, 6/6)		FAIL
	formats_columns_nullable #24::old	FAIL	query history
	formats_columns_nullable #24::new	FAIL	query history
	group_by_consecutive_keys #0::old	FAIL	query history
	group_by_consecutive_keys #0::new	FAIL	query history
	group_by_consecutive_keys #1::old	FAIL	query history
	group_by_consecutive_keys #1::new	FAIL	query history
	groupby_onekey_nullable #0::old	FAIL	query history
	groupby_onekey_nullable #0::new	FAIL	query history
	groupby_onekey_nullable #1::old	FAIL	query history
	groupby_onekey_nullable #1::new	FAIL	query history
	30 more test cases not shown
Performance Comparison (arm_release, master_head, 1/6)		FAIL
	ColumnMap #1::old	FAIL	query history
	ColumnMap #1::new	FAIL	query history
	concat_hits #0::old	FAIL	query history
	concat_hits #0::new	FAIL	query history
	conditional #5::old	FAIL	query history
	conditional #5::new	FAIL	query history
	destroy_aggregate_states #0::old	FAIL	query history
	destroy_aggregate_states #0::new	FAIL	query history
	file_table_function #23::old	FAIL	query history
	file_table_function #23::new	FAIL	query history
	24 more test cases not shown
Performance Comparison (arm_release, master_head, 2/6)		FAIL
	clickbench #8::old	FAIL	query history
	clickbench #8::new	FAIL	query history
	clickbench #9::old	FAIL	query history
	clickbench #9::new	FAIL	query history
	clickbench #14::old	FAIL	query history
	clickbench #14::new	FAIL	query history
	clickbench #17::old	FAIL	query history
	clickbench #17::new	FAIL	query history
	clickbench #27::old	FAIL	query history
	clickbench #27::new	FAIL	query history
	60 more test cases not shown

AI Review

Summary

This PR adds sharded GROUP BY execution by scattering rows across shard-local aggregators and skipping the merge phase. The implementation is substantial and well-tested, but there are rollout/docs concerns that should be addressed before merge.

PR Metadata

Changelog category: ✅ Correct (Performance Improvement matches the change).
Changelog entry: ✅ Present and user-readable.
Documentation requirement: ⚠️ Not satisfied yet (the "Documentation is written" checkbox is still unchecked for a new user-facing feature/setting).

Findings

⚠️ Majors
- [src/Core/Settings.cpp:3945] New user-facing setting optimize_aggregation_by_sharding is introduced, but PR metadata still indicates docs are not written. Please add docs for enablement guidance, trade-offs, and current limitations, then mark the template checkbox.
💡 Nits
- [src/Core/Settings.cpp:3945] Rollout risk: enabling this optimization by default in the same PR increases blast radius for a new execution strategy; consider shipping it behind an experimental gate or disabled by default first.

ClickHouse Rules

Item	Status	Notes
Deletion logging	➖
Serialization versioning	➖
Core-area scrutiny	✅
No test removal	✅
Experimental gate	⚠️	New execution strategy is enabled by default in the same PR.
No magic constants	✅
Backward compatibility	⚠️	New default behavior can change query execution characteristics immediately.
`SettingsChangesHistory.cpp`	✅
PR metadata quality	⚠️	Documentation checkbox for a new feature is still unchecked.
Safe rollout	⚠️	Safer staged rollout (experimental/disabled-by-default first) is preferable.
Compilation time	✅
No large/binary files	✅

Final Verdict

Status: ⚠️ Request changes
Minimum required actions:
1. Add user documentation for optimize_aggregation_by_sharding and update the PR template checkbox.
2. Revisit rollout strategy (experimental gate or disabled-by-default first) for safer production adoption.

clickhouse-gh · 2026-04-15T17:06:05Z

+        for (size_t i = 0; i < num_rows; ++i)
+        {
+            size_t row = row_indices[i];
+            size_t current_offset = offsets[static_cast<ssize_t>(row) - 1];


❌ addBatchArrayForRows reads offsets[row - 1] without handling row == 0.

row_indices can include the first row of the chunk, so for row = 0 this becomes offsets[-1], which is an out-of-bounds read (undefined behavior).

Please guard the first-row case explicitly, e.g.:

size_t current_offset = (row == 0) ? 0 : offsets[row - 1];
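The suggested guard can be sketched in isolation. This is a minimal illustration using `std::vector`, not the PR's code: `arrayBegin` and `sumArraysForRows` are hypothetical names, and whether the real offsets container tolerates index -1 is not something this thread establishes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// offsets[i] holds the exclusive end position of array i in the flattened data,
// so array i spans [offsets[i - 1], offsets[i]) and array 0 starts at 0.
inline size_t arrayBegin(const std::vector<uint64_t> & offsets, size_t row)
{
    assert(row < offsets.size());
    // Explicit first-row guard: std::vector has no element at index -1.
    return row == 0 ? 0 : static_cast<size_t>(offsets[row - 1]);
}

// Hypothetical stand-in for a *ForRows-style loop: sums the elements of the
// arrays selected by row_indices.
int64_t sumArraysForRows(
    const std::vector<int64_t> & data,
    const std::vector<uint64_t> & offsets,
    const std::vector<size_t> & row_indices)
{
    int64_t sum = 0;
    for (size_t row : row_indices)
    {
        const size_t begin = arrayBegin(offsets, row);
        const size_t end = static_cast<size_t>(offsets[row]);
        assert(begin <= end && end <= data.size());
        for (size_t j = begin; j < end; ++j)
            sum += data[j];
    }
    return sum;
}
```

With offsets `{2, 3, 6}`, row 0 covers positions 0 and 1, so selecting the first row of a chunk never touches memory before the offsets array.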

clickhouse-gh · 2026-04-16T01:20:31Z

+            throw Exception(
+                ErrorCodes::LOGICAL_ERROR,
+                "Sharded aggregation requires prealloc serialized methods, but row_sizes is empty. "
+                "Sharded aggregation should be avoided for non-prealloc methods should be in AggregatingStep");


💡 The exception text has a grammar typo and reads awkwardly:

Sharded aggregation should be avoided for non-prealloc methods should be in AggregatingStep

Please reword it to something clearer, for example:

Sharded aggregation should be avoided for non-prealloc methods in AggregatingStep.

nickitat · 2026-05-05T20:44:16Z

Our MVP grew into a MegaVP. Let's break it into a few steps that will be easier to review and discuss in detail. Here is the plan we came up with:

● What gets removed if scatter materializes contiguous per-shard columns

  ┌──────────────────────────────────────────────────────────────────────────────────────────────────┬──────────────┐
  │                                            Component                                             │  LoC saved   │
  ├──────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
  │ *ForRows virtuals + default impls in IAggregateFunction.h                                        │ ~78          │
  ├──────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
  │ 8 function overrides (Sum, Avg, MinMax, Any, Bitwise, ArgMinMax, Quantile, -If)                  │ ~286         │
  ├──────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
  │ findExtreme*ForRows (h+cpp)                                                                      │ ~55          │
  ├──────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
  │ SingleValueData*ForRows (h+cpp)                                                                  │ ~164         │
  ├──────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
  │ Aggregator::executeForRows + executeImpl*ForRows family                                          │ ~200         │
  ├──────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
  │ ShardedChunkInfo::row_indices + selector-passing through processors                              │ ~10          │
  ├──────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
  │ ScatterByHashTransform simplification (no shared-payload trick, no per-row Selector bookkeeping) │ ~30-50       │
  ├──────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
  │ ShardedAggregatingTransform simplification (gets normal Chunks, can call executeOnBlock)         │ ~30-50       │
  ├──────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
  │ Subtotal                                                                                         │ ~850-900 LoC │
  └──────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────┘

  If you also drop the precomputed-hash and pre-serialized-keys plumbing (separate optimizations, can be deferred):

  ┌────────────────────────────────────────────────────────────────────────────────────────────┬───────────┐
  │                                         Additional                                         │ LoC saved │
  ├────────────────────────────────────────────────────────────────────────────────────────────┼───────────┤
  │ StringHashTable.h dispatchImpl<Tag> refactor + PrecomputedHashTag                          │ ~210      │
  ├────────────────────────────────────────────────────────────────────────────────────────────┼───────────┤
  │ TwoLevelStringHashTable.h matching plumbing                                                │ ~7        │
  ├────────────────────────────────────────────────────────────────────────────────────────────┼───────────┤
  │ ColumnsHashingImpl.h emplaceKeyWithHash + LastElementCache::saved_hash                     │ ~114      │
  ├────────────────────────────────────────────────────────────────────────────────────────────┼───────────┤
  │ Aggregator.cpp consecutive-keys-with-hash + equal-keys range opt for sharding path         │ ~80       │
  ├────────────────────────────────────────────────────────────────────────────────────────────┼───────────┤
  │ SerializedKeyBuffer.h + HashMethodSerialized::external_serialized_keys hook                │ ~55       │
  ├────────────────────────────────────────────────────────────────────────────────────────────┼───────────┤
  │ Aggregator::prepareHashesAndKeysForSharding and serialization in prepareColumnsForSharding │ ~50       │
  ├────────────────────────────────────────────────────────────────────────────────────────────┼───────────┤
  │ Additional subtotal                                                                        │ ~516 LoC  │
  └────────────────────────────────────────────────────────────────────────────────────────────┴───────────┘

  Total removable: ~1350-1400 LoC out of ~2350 src LoC (≈ 60% of src). With tests untouched (~1444 LoC), the patch goes from ~3790 → ~2400 total, and src code from ~2350 → ~950-1000.

  What's left in the MVP

  ┌───────────────────────────────────────────────────────────────────────────────┬──────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
  │                                   Component                                   │   LoC    │                                                                           What it is                                                                           │
  ├───────────────────────────────────────────────────────────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ ScatterByHashTransform.{h,cpp} (simplified, uses IColumn::scatter)            │ ~150     │ The 1→N processor: hash each row, partition with Fibonacci mixing, call IColumn::scatter on each column. Per-shard FIFO queues for back-pressure still needed. │
  ├───────────────────────────────────────────────────────────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ ShardedAggregatingTransform.{h,cpp} (thin wrapper)                            │ ~80-100  │ One per shard. Receives normal Chunks, calls Aggregator::executeOnBlock (existing), emits aggregated chunks.                                                   │
  ├───────────────────────────────────────────────────────────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ AggregatingStep.cpp integration                                               │ ~150     │ Eligibility checks, the Scatter → Resize(N→1) → ShardedAgg × N wiring.                                                                                         │
  ├───────────────────────────────────────────────────────────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Aggregator::chooseAggregationMethod static refactor + getShardedPayloadHeader │ ~25      │ Planner needs to probe method type to reject incompatible keys.                                                                                                │
  ├───────────────────────────────────────────────────────────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ HashTablesStatistics.h ShardedStatsCollector + ShardedAggregationEntry        │ ~90      │ Per-shard prealloc hints survive the simplification — they work on key counts, not column layout.                                                              │
  ├───────────────────────────────────────────────────────────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Settings.cpp + serialization plumbing                                         │ ~25      │ The optimize_aggregation_by_sharding setting end-to-end.                                                                                                       │
  ├───────────────────────────────────────────────────────────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Planner.cpp / PlannerCorrelatedSubqueries.cpp / InterpreterSelectQuery.cpp    │ ~12      │ Pass setting into step.                                                                                                                                        │
  ├───────────────────────────────────────────────────────────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ 8 test suites                                                                 │ ~1444    │ Unchanged — same correctness + EXPLAIN PIPELINE + negative-case + Distributed coverage.                                                                        │
  ├───────────────────────────────────────────────────────────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ MVP src                                                                       │ ~530-550 │                                                                                                                                                                │
  ├───────────────────────────────────────────────────────────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ MVP total                                                                     │ ~2000    │                                                                                                                                                                │
  └───────────────────────────────────────────────────────────────────────────────┴──────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
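To make the scatter step in the table concrete, here is a rough sketch of "hash each row, partition with Fibonacci mixing, scatter each column". It is an illustration under assumptions, not the PR's ScatterByHashTransform: `shardOf`, `buildSelector`, and `scatterColumn` are hypothetical names, a column is modeled as a plain `std::vector<int64_t>`, and the multiply-shift range reduction is one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fibonacci mixing: multiply by 2^64 / golden ratio, then map the high 32 bits
// of the product onto [0, num_shards) with a multiply-shift instead of a modulo.
inline size_t shardOf(uint64_t hash, size_t num_shards)
{
    assert(num_shards > 0 && static_cast<uint64_t>(num_shards) <= (1ull << 32));
    const uint64_t mixed = hash * 11400714819323198485ull;
    return static_cast<size_t>(((mixed >> 32) * static_cast<uint64_t>(num_shards)) >> 32);
}

// One selector entry per row: the shard that row is routed to.
std::vector<size_t> buildSelector(const std::vector<uint64_t> & hashes, size_t num_shards)
{
    std::vector<size_t> selector;
    selector.reserve(hashes.size());
    for (uint64_t hash : hashes)
        selector.push_back(shardOf(hash, num_shards));
    return selector;
}

// Materializes one contiguous sub-column per shard, preserving row order.
std::vector<std::vector<int64_t>> scatterColumn(
    const std::vector<int64_t> & column,
    const std::vector<size_t> & selector,
    size_t num_shards)
{
    assert(column.size() == selector.size());
    std::vector<std::vector<int64_t>> parts(num_shards);
    for (size_t i = 0; i < column.size(); ++i)
    {
        assert(selector[i] < num_shards);
        parts[selector[i]].push_back(column[i]);
    }
    return parts;
}
```

The selector is computed once per chunk and reused for every column, which is why each shard ends up with ordinary contiguous columns and needs no row-index bookkeeping.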

  Is this still substantial enough to be a meaningful MVP?

  Yes — and arguably it's a better MVP. Here's why:

  The architectural payoff is fully preserved. The whole point of sharded aggregation is "no merge phase" — each shard owns a disjoint key range. That's still there. The *ForRows machinery, the precomputed hashes, the pre-serialized keys are all
  micro-optimizations on top of that core idea. Stripping them doesn't reduce the qualitative win; it reduces the perf ceiling.
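The "no merge phase" property can be shown with a toy COUNT per key. This is a single-threaded sketch under assumptions, not the Aggregator code path: `ShardState`, `pickShard`, `aggregateSharded`, and `concatenate` are hypothetical names, and keys are plain 64-bit integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using ShardState = std::unordered_map<uint64_t, uint64_t>; // key -> row count

// Hypothetical routing function; any deterministic key -> shard mapping works.
inline size_t pickShard(uint64_t key, size_t num_shards)
{
    assert(num_shards > 0);
    return static_cast<size_t>((key * 11400714819323198485ull) >> 32) % num_shards;
}

// Each shard aggregates only the keys routed to it. In a real pipeline every
// shard would run on its own thread; here they are filled in one loop.
std::vector<ShardState> aggregateSharded(const std::vector<uint64_t> & keys, size_t num_shards)
{
    std::vector<ShardState> shards(num_shards);
    for (uint64_t key : keys)
        ++shards[pickShard(key, num_shards)][key];
    return shards;
}

// Because no key lives in two shards, the final result is a plain
// concatenation of per-shard results, with no merge of partial states.
ShardState concatenate(const std::vector<ShardState> & shards)
{
    ShardState result;
    for (const auto & shard : shards)
        for (const auto & entry : shard)
        {
            const bool inserted = result.emplace(entry.first, entry.second).second;
            assert(inserted); // would fail if two shards shared a key
            (void)inserted;
        }
    return result;
}
```

The assertion in `concatenate` captures the invariant the design relies on: a key is routed by its hash alone, so it can only ever appear in one shard's state.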

  What you're trading away (perf-wise):

  ┌──────────────────────────────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
  │             Optimization removed             │                                                             Expected perf cost on shard hot path                                                             │
  ├──────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Precomputed-hash reuse during emplace        │ ~5-15% throughput loss (hash computed twice instead of once)                                                                                                 │
  ├──────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ IColumn::scatter materialization cost        │ One extra column copy per chunk per shard. For numeric columns: cheap memcpy. For strings: O(total_bytes) gather.                                            │
  ├──────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ *ForRows SIMD-aware MinMax                   │ Falls back to existing range-based vectorized findExtreme (which is fast — actually possibly faster than the scalar findExtreme*ForRows added in the branch) │
  ├──────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ *ForRows ArgMinMax single-pass index search  │ Bigger loss — falls back to N×add calls with potential repeated arena allocations                                                                            │
  ├──────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ *ForRows Any/AnyLast O(1) shortcut           │ Bigger loss — O(N) in fallback                                                                                                                               │
  ├──────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Pre-serialized keys for HashMethodSerialized │ Re-serializes during emplace. For long string keys, this is real (~10-20%).                                                                                  │
  └──────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

  So expect roughly 10-25% throughput regression on perf benchmarks vs. the full branch, depending on the query — but still a substantial speedup vs. master (which has full merge-phase overhead). The point of MVP is to ship the architecture; you reclaim those %
  via follow-up PRs.

  What you gain (review-wise):

  - The reviewer reads ~550 LoC of new src code instead of ~2350. Two days of review becomes two hours.
  - The StringHashTable.h dispatch refactor — the trickiest piece — is not in the MVP. It can land as its own focused PR with a unit test.
  - Each follow-up perf PR (ForRows overrides, precomputed hashes, pre-serialized keys) becomes individually reviewable: "here's the change, here's the perf number, OK or not OK". Decoupled benchmarks.
  - Bisect bandwidth: if a regression shows up after ship, you can identify which optimization caused it. With the current branch, anything wrong with sharding requires reverting the whole thing.

  What stays substantial:

  - A new query-plan path with non-trivial pipeline construction (Scatter → Resize → per-shard agg).
  - Two new processors (ScatterByHashTransform, ShardedAggregatingTransform) including a non-trivial back-pressure scheme.
  - A new statistics infrastructure for per-shard preallocation.
  - Eligibility logic that interacts with all the existing aggregation paths (in-order, two-level, external, grouping sets, overflow guards).
  - Eight test suites covering correctness + EXPLAIN + Distributed + edge cases.
  - A user-facing setting with serialization through query-plan IO.

  That's a complete, defensible feature. Not a "draft" or "stub". Ships real value, leaves a clean follow-up plan.

  Recommendation

  I'd go further than my earlier 3-stage proposal:

  PR-1   Sharded aggregation MVP via IColumn::scatter, default OFF    ~550 src + 1444 tests
  PR-2a  ForRows API in IAggregateFunction (just the framework)        ~78
  PR-2b  ForRows fast paths for must-keep (MinMax/Any/ArgMinMax)       ~250 (with perf #s)
  PR-2c  ForRows fast paths for nice-to-have (Sum/Avg/Bitwise)         ~100 (with perf #s)
  PR-3   Precomputed-hash plumbing + StringHashTable refactor          ~330 (with unit test + perf #s)
  PR-4   Pre-serialized keys                                            ~85  (with perf #s)
  PR-5   Per-shard prealloc stats integration                           ~90  (already in MVP, can be split)
  PR-6   Flip default to ON                                             ~1   (with perf-test PR link)

  Six small, focused PRs over one ~600-LoC monolith. Each one reviewable in <1 hour with a clear "what does this change buy us?" answer. The MVP carries the architectural complexity; each follow-up carries one optimization with a benchmark number attached.

  This is honestly how I'd ship a feature this size if I wanted it merged quickly with high reviewer confidence.

I'd perhaps limit ourselves to 4 or even 3 stages instead of 6, but it is up to you. My only big ask is to move iteratively.

clickhouse-gh · 2026-05-06T10:28:54Z


 - [GROUP BY optimization](/sql-reference/statements/select/group-by#group-by-optimization-depending-on-table-sorting-key)
+)", 0) \
+    DECLARE(Bool, optimize_aggregation_by_sharding, true, R"(


⚠️ This introduces a new user-facing setting (optimize_aggregation_by_sharding) with new behavior, but the PR still has the "Documentation is written" checkbox unchecked.

Please add user docs for this setting (when to enable it, limitations such as lack of external aggregation spill support, and expected trade-offs), then mark the template checkbox accordingly.

clickhouse-gh · 2026-05-06T13:24:28Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	84.10%	84.00%	-0.10%
Functions	91.10%	90.70%	-0.40%
Branches	76.60%	76.40%	-0.20%

Changed lines: 80.38% (762/948) | lost baseline coverage: 15 line(s)

clickhouse-gh · 2026-06-05T14:15:48Z

nihalzp added 27 commits March 15, 2026 15:50

Add test

b66f085

Add negative tests

a7b59aa

Add more negative tests

376f8f7

Add cases for not supported yet cases

7ab1d30

Remove unnecessary long tag

5b96302

Add test for WITH TOTALS

a7f3998

Update tests

9b3e4be

Update AggregatingStep for sharded aggregation

eaac98a

Add ScatterByHashTransform

1e741c2

Integrate sharded aggregation into planner

c7af22d

Add optimize_aggregation_by_sharding setting

a190ccd

Temporarily enable optimize_aggregation_by_sharding by default

9c096e4

Improve setting description

fd29a5e

Improve documentation

5517f20

Add ShardedAggregatingTransform

d8816c2

Fix getHash not respecting Nullable wrapper

99f4a74

Add test for the bug fix

302a927

Implement hash computation for sharding

07adf3c

Prepare columns for sharding

07fd79d

Increase test coverage

193b23a

Prepare sharded header

1bee19d

Prepare instruction for sharding aggregation

f1b53ff

Add test for prefetching path

b8d2a93

Implement sharded aggregation

24f6059

Implement generic addBatchForRows

493fb53

Add with hash emplace method

b1f9ea7

Implement hash table emplace using known hash

661c3fe

clickhouse-gh Bot added the pr-performance label (Pull request with some performance improvements) Mar 16, 2026

clickhouse-gh Bot reviewed Mar 16, 2026

Comment thread src/Core/Settings.cpp

nihalzp added 15 commits April 11, 2026 12:05

Fix nan handling for ForRows

3810c6f

Add test for the fix

dd56846

Make test deterministic

24c3229

Temporarily disable sharded aggregation for some tests

155d631

Only support numeric like and string like columns

75031ee

Add test for not supported case

6dbb4e9

Add quantile addBatchForRows override

a990188

Merge branch 'master' into sharded-aggregation

2a2a608

Remove flakiness of 00178_quantile_ddsketch

6e08024

Remove flakiness from 03307_parallel_hash_max_joined_rows

e75baa9

Make sure ShardedAggregationEntry is reflect in `HashTablesCacheStatistics`

0340fd4

Use a better way to skip non-prealloc serialized methods

847a0fe

Add missing case to the test

451a78d

Fix typo

8c7f686

Fix typo

dabbe7a

clickhouse-gh Bot reviewed Apr 15, 2026

groeneai mentioned this pull request Apr 15, 2026

Replace assert with chassert in AggregateFunctionArray constructor #102856

Merged

1 task

Only run a query for new analyzer

d1cf2c3

clickhouse-gh Bot reviewed Apr 16, 2026

nickitat self-assigned this May 5, 2026

nihalzp added 3 commits May 6, 2026 09:17

Merge branch 'master' into sharded-aggregation

fadd37f

Fix build

1fc508a

Move setting to 26.5

dfb13f4

clickhouse-gh Bot reviewed May 6, 2026

nihalzp mentioned this pull request May 6, 2026

Naive Sharded Aggregation for high cardinality data #104233

Merged

1 task

clickhouse-gh Bot unassigned nickitat Jun 5, 2026

nickitat self-assigned this Jun 5, 2026

Sharded Aggregation for high cardinality data #99581
nihalzp wants to merge 108 commits into ClickHouse:master from nihalzp:sharded-aggregation

4 participants