Make clickhouse-local temporary storage limit configurable by alexey-milovidov · Pull Request #106689 · ClickHouse/ClickHouse · GitHub

Make clickhouse-local temporary storage limit configurable #106689

Merged: alexey-milovidov merged 5 commits into master from fix-local-tmp-storage-limit-101697 on Jun 16, 2026

Conversation

@alexey-milovidov

@alexey-milovidov alexey-milovidov commented Jun 7, 2026

Member

Closes: #101697

clickhouse-local hard-coded the temporary storage limit to 1 GiB, which is limiting when spilling to disk, e.g. during an external merge sort with a low max_bytes_before_external_sort.

This change makes clickhouse-local honor the existing max_temporary_data_on_disk_size server setting (the same one the regular server uses). It can be raised, or lifted entirely with a value of 0, via the configuration file (-C config.xml). The default is raised from 1 GiB to 1 TiB, which keeps a safety net against runaway queries filling up the disk while being generous enough for typical local workloads.
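
The semantics described above can be illustrated with a minimal C++ sketch. This is not the code from `programs/local/LocalServer.cpp`; the names `DEFAULT_LOCAL_TMP_LIMIT`, `resolveTemporaryDataLimit`, and `canReserve` are hypothetical. The sketch only assumes what the description states: an unspecified setting falls back to the 1 TiB default, and a value of 0 lifts the limit.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical constant: the new default described in this PR (1 TiB = 2^40 bytes).
constexpr std::uint64_t DEFAULT_LOCAL_TMP_LIMIT = 1ULL << 40;

// Hypothetical helper: use the configured max_temporary_data_on_disk_size
// if the configuration file provides one, otherwise fall back to the default.
constexpr std::uint64_t resolveTemporaryDataLimit(std::optional<std::uint64_t> configured)
{
    return configured.value_or(DEFAULT_LOCAL_TMP_LIMIT);
}

// Hypothetical helper: a limit of 0 means "unlimited"; otherwise the requested
// reservation must fit into what is left under the limit.
// Written as a subtraction to avoid overflow in used + requested.
constexpr bool canReserve(std::uint64_t limit, std::uint64_t used, std::uint64_t requested)
{
    if (limit == 0)
        return true;
    return used <= limit && requested <= limit - used;
}
```

Under these assumptions, a query spilling 2 GiB to disk would have been rejected under the old 1 GiB limit and is accepted under the new default, while a configured value of 0 accepts any reservation.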

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

The temporary storage size limit of clickhouse-local is no longer hard-coded to 1 GiB. The default is raised to 1 TiB and can be configured with the max_temporary_data_on_disk_size server setting.

@clickhouse-gh

clickhouse-gh Bot commented Jun 7, 2026

Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-improvement (Pull request with some product improvements) and submodule changed (At least one submodule changed in this PR.) labels Jun 7, 2026
Comment thread contrib/ipnsort/ipnsort.h Outdated
Comment thread base/base/sort.h Outdated
alexey-milovidov and others added 2 commits June 10, 2026 00:41
`clickhouse-local` hard-coded the temporary storage limit to `1 GiB`,
which is limiting when spilling to disk, e.g. during an external merge
sort with a low `max_bytes_before_external_sort`.

Honor the existing `max_temporary_data_on_disk_size` server setting
instead. Unlike the regular server, `clickhouse-local` stores temporary
data in the system temporary directory by default, which can be small
(and is sometimes backed by RAM), so the conservative `1 GiB` default is
kept when the setting is not specified. It can now be raised, or lifted
entirely with a value of `0`, via the configuration file.

Closes: #101697

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous `1 GiB` default was still fairly limiting for external
operations. Raise it to `1 TiB`, which keeps a safety net against
runaway queries filling up the disk while being generous enough for
typical local workloads. It remains configurable (including unlimited
with `0`) via the `max_temporary_data_on_disk_size` server setting.

Add the `TiB` constant and the `_TiB` literal to `base/base/unit.h`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
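
For readers unfamiliar with unit literals, the `TiB` constant and `_TiB` literal mentioned in the commit message above could look roughly like the following sketch. It is an assumption about the general shape, not a copy of `base/base/unit.h`; the exact types and the set of neighboring constants in that header may differ.

```cpp
#include <cstdint>

// Binary unit constants, each 1024 times the previous one (assumed layout).
constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t MiB = 1024 * KiB;
constexpr std::uint64_t GiB = 1024 * MiB;
constexpr std::uint64_t TiB = 1024 * GiB;

// User-defined literals so that sizes can be written as 1_GiB or 1_TiB.
constexpr std::uint64_t operator""_GiB(unsigned long long value)
{
    return value * GiB;
}

constexpr std::uint64_t operator""_TiB(unsigned long long value)
{
    return value * TiB;
}
```

With such literals, a default limit can be written as `1_TiB` instead of a raw byte count.
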
@alexey-milovidov alexey-milovidov force-pushed the fix-local-tmp-storage-limit-101697 branch from e0c9277 to b157a6e Compare June 10, 2026 00:41
@alexey-milovidov

Member Author

Recreated this branch off the current `master` so it contains only the two commits relevant to this PR (the `clickhouse-local` temporary-storage limit). It had accidentally been based on the still-open #106650 (ipnsort/driftsort), which polluted the diff, reviews, and CI. The two remaining review threads (on `contrib/ipnsort/ipnsort.h` and `base/base/sort.h`) belong to #106650 and no longer apply here; resolving them.

@alexey-milovidov

Member Author

The only failing check is AST fuzzer (amd_tsan): Logical error: Column identifier A is already registered (STID 4697-4326) in the analyzer's CollectSourceColumnsVisitor during a mutation with an IN subquery — unrelated to this PR, which only changes the clickhouse-local temporary storage limit.

This is a known issue: #106649. @groeneai already has a fix in progress for exactly this STID: #106025 (see also #106414).

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106689&sha=b157a6e4a79a9b8df99f92afaf62f47be529d46a&name_0=PR&name_1=AST%20fuzzer%20%28amd_tsan%29

@groeneai

Contributor

Confirmed. The AST fuzzer (amd_tsan) failure on this PR (Column identifier A is already registered, STID 4697-4326) is unrelated to the clickhouse-local temporary storage change. It is the chronic analyzer bug on trunk in the mutation re-analysis path (MutationsInterpreter::validate -> buildSubqueryPlansForSetsAndAdd -> CollectSourceColumnsVisitor), tracked in #106649.

Fix status: our #106025 (you approved it; currently held by KochetovNicolai's "temporarily block" pending his review) and the alternative #106414 in the mutations path. Either one closes this STID. This PR is unaffected and safe to merge independent of that fix.

session: cron:clickhouse-worker-slot-3:20260612-001500

Comment thread programs/local/LocalServer.cpp Outdated

@alexey-milovidov alexey-milovidov left a comment

Member Author


Good.

@alexey-milovidov alexey-milovidov self-assigned this Jun 13, 2026
@alexey-milovidov

Member Author

@groeneai, fix this failure:
Fast test (arm_darwin) FAIL
02947_parallel_replicas_remote

@groeneai

Contributor

@alexey-milovidov The 02947_parallel_replicas_remote failure is a pre-existing macOS-only issue, unrelated to this PR.

The test runs remote('127.0.0.1|127.0.0.2|127.0.0.3|127.0.0.4', ...) with prefer_localhost_replica = 0. macOS assigns only 127.0.0.1 to the loopback interface and does not automatically route the rest of 127.0.0.0/8 (Linux does), so connections to the other addresses, such as 127.0.0.3 and 127.0.0.4, time out with SOCKET_TIMEOUT. The query result is still correct (100 4950 on stdout, via failover to 127.0.0.1), but the connection warnings are written to stderr, and clickhouse-test fails on any stderr output.

The same limitation is already handled for the sibling test 02875_parallel_replicas_remote in ci/defs/darwin.skip. The fix is in a separate PR: #107420

alexey-milovidov and others added 2 commits June 14, 2026 12:09
The "Apply suggestion" commit removed `as a safety net against` from the
end of the first comment line but left the trailing `runaway queries
filling up the disk` on the next line, producing a broken sentence.
Restore a connecting phrase so the comment reads correctly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Jun 14, 2026

Contributor

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
|---|---|---|---|
| Lines | 84.80% | 84.80% | +0.00% |
| Functions | 92.40% | 92.40% | +0.00% |
| Branches | 77.30% | 77.40% | +0.10% |

Changed lines: Changed C/C++ lines covered by tests: 10/10 (100.00%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@alexey-milovidov

Member Author

@groeneai, two more tests have failed on macOS - fix them.

@groeneai

Contributor

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jun 16, 2026
Merged via the queue into master with commit 47f68be Jun 16, 2026
327 of 328 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-local-tmp-storage-limit-101697 branch June 16, 2026 15:18
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud (The PR is synced to the cloud repo) label Jun 16, 2026

Labels

  • pr-improvement: Pull request with some product improvements
  • pr-synced-to-cloud: The PR is synced to the cloud repo
  • submodule changed: At least one submodule changed in this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

local server mode hard-codes temporary storage limit to 1gb

3 participants