Docs: clarify description of `parallel_distributed_insert_select` by Blargian · Pull Request #104607 · ClickHouse/ClickHouse · GitHub
Skip to content

Docs: clarify description of parallel_distributed_insert_select#104607

Merged
Blargian merged 6 commits into
masterfrom
Blargian-patch-18
May 18, 2026
Merged

Docs: clarify description of parallel_distributed_insert_select#104607
Blargian merged 6 commits into
masterfrom
Blargian-patch-18

Conversation

@Blargian

@Blargian Blargian commented May 11, 2026

Copy link
Copy Markdown
Member

Clarifies an inaccurate docs description. Description was previously updated to say:

Setting `enable_parallel_replicas = 1` is needed when using this setting.

However this is an over-simplification.

Source: https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/ClusterProxy/executeQuery.cpp (isSuitableForInsertSelectWithParallelReplicas function). introduced in #78041.

As I see it:

Source Behavior
Distributed table Each shard executes the query locally. No additional settings required.
s3Cluster, gcsCluster, hdfsCluster Files are distributed across replicas. No additional settings required.
ReplicatedMergeTree / SharedMergeTree Uses parallel replicas to distribute the read. Requires enable_parallel_replicas = 1, allow_experimental_analyzer = 1, and cluster_for_parallel_replicas to be configured. Available from v25.4.

Changelog category (leave one):

  • Documentation (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Version info

  • Merged into: 26.5.1.773

@clickhouse-gh

clickhouse-gh Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [347af01]

Summary:


AI Review

Summary

This PR updates the description of parallel_distributed_insert_select and adds keywords metadata to one docs file. One blocking documentation-correctness issue remains: the prerequisites for the ReplicatedMergeTree / SharedMergeTree path are still incomplete for OSS.

Findings

❌ Blockers

  • [src/Core/Settings.cpp:1317] The new "To enable it" list omits cluster_for_parallel_replicas as a required prerequisite for OSS. The parallel-replica path still reaches Context::getClusterForParallelReplicas; with the default empty cluster_for_parallel_replicas, it throws CLUSTER_DOESNT_EXIST. This means the current docs can still guide users to a failing configuration.
    Suggested fix: add cluster_for_parallel_replicas as an explicit requirement (or explicitly scope this statement to environments where it is preconfigured).
Final Verdict

Status: ⚠️ Request changes
Minimum required action:

  1. Update the ReplicatedMergeTree / SharedMergeTree prerequisite list so it includes cluster_for_parallel_replicas (or clearly scopes the statement to preconfigured environments).

@clickhouse-gh clickhouse-gh Bot added the pr-documentation Documentation PRs for the specific code PR label May 11, 2026
@Blargian Blargian marked this pull request as ready for review May 11, 2026 16:25
@Blargian Blargian requested review from azat and devcrafter May 11, 2026 16:34
Comment thread src/Core/Settings.cpp Outdated
@azat azat self-assigned this May 13, 2026
Comment thread src/Core/Settings.cpp
Setting `enable_parallel_replicas = 1` is needed when using this setting.
Since v25.4, `INSERT ... SELECT` from a `ReplicatedMergeTree` or `SharedMergeTree` source can also be parallelized across replicas. To enable it:
- `parallel_distributed_insert_select = 2`
- `enable_parallel_replicas = 1`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "To enable it" prerequisites are still incomplete for OSS users. Setting enable_parallel_replicas = 1 is not sufficient by itself here: this path eventually relies on Context::getClusterForParallelReplicas, and with the default empty cluster_for_parallel_replicas it throws CLUSTER_DOESNT_EXIST.

Could we either list cluster_for_parallel_replicas as an explicit requirement, or scope this sentence to environments where it is preconfigured (for example Cloud)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-checked this on current master-based code and the requirement is still present for the ReplicatedMergeTree / SharedMergeTree case.

isSuitableForInsertSelectWithParallelReplicas still plans the query via InterpreterSelectQueryAnalyzer, and the parallel-replica path still goes through Context::getClusterForParallelReplicas (see src/Interpreters/ClusterProxy/executeQuery.cpp and src/Interpreters/Context.cpp). With empty cluster_for_parallel_replicas, it throws CLUSTER_DOESNT_EXIST.

So the "To enable it" list is still incomplete for OSS unless it also mentions cluster_for_parallel_replicas (or explicitly scopes the statement to environments where it is preconfigured).

@alexey-milovidov

Copy link
Copy Markdown
Member

This was fixed by #105146. Let's update the branch.

@clickhouse-gh

clickhouse-gh Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.20% +0.10%
Functions 91.40% 91.40% +0.00%
Branches 76.60% 76.60% +0.00%

Changed lines: 100.00% (8/8) · Uncovered code

Full report · Diff report

@Blargian Blargian added this pull request to the merge queue May 18, 2026
Merged via the queue into master with commit dcb2b5d May 18, 2026
166 checks passed
@Blargian Blargian deleted the Blargian-patch-18 branch May 18, 2026 20:26
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-documentation Documentation PRs for the specific code PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants