iframe-proxy

alexey-milovidov · 2026-05-14T16:04:57Z

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Make the filesystem table function honor max_memory_usage / max_server_memory_usage when loading file content. Previously, large or numerous parallel content reads could push past the limit without raising MEMORY_LIMIT_EXCEEDED and end up OOM-killed instead.

Motivation

Running clickhouse-local in a cgroup with a small memory limit (e.g. 8 GiB) and issuing SELECT * FROM filesystem() against a directory of large files reliably ends in an OOM-kill rather than a clean MEMORY_LIMIT_EXCEEDED. The hard limit is correctly detected at startup (getMemoryAmount reads cgroup v2 memory.max and the limit is installed on total_memory_tracker), but the path that loads file content does not enforce it.

StorageFilesystem was staging each file's bytes through a temporary std::string before moving them into the output column:

String content;
ReadBufferFromFile in(file.path().string());
readStringUntilEOF(content, in);
columns_map[\"content\"]->insert(std::move(content));

std::string allocations go through the global operator new, which is intercepted in src/Common/AllocationInterceptors.cpp via Memory::trackMemory -> CurrentMemoryTracker::allocNoThrow -> allocImpl(size, throw_if_memory_exceeded=false). When the limit is exceeded on that path, MemoryTracker::allocImpl takes the no-throw branch (incrementAllocationWithoutCheck) instead of raising MEMORY_LIMIT_EXCEEDED. The amount is counted, but no exception fires - the process keeps allocating until the kernel / cgroup OOM-killer steps in. With num_streams parallel FilesystemSources each growing their own multi-GiB std::string, an 8 GiB limit is easy to blow past.

In contrast, Allocator-backed buffers (PODArray, ColumnString::chars, ...) go through CurrentMemoryTracker::alloc, which does throw on the limit.

Change

Read file content directly into the destination ColumnString's chars buffer:

auto & nullable = assert_cast<ColumnNullable &>(*columns_map[\"content\"]);
auto & col_str = assert_cast<ColumnString &>(nullable.getNestedColumn());

ReadBufferFromFile in(file.path().string());
readStringUntilEOFInto(col_str.getChars(), in);
col_str.getOffsets().push_back(col_str.getChars().size());
nullable.getNullMapData().push_back(UInt8{0});

readStringUntilEOFInto is already specialized for PaddedPODArray<UInt8>, so the bytes land in ColumnString::Chars, which uses Allocator and trips MEMORY_LIMIT_EXCEEDED on the limit. The schema (Nullable(String) for content) is unchanged.

Test plan

New test 04238_filesystem_content_memory_limit.sh writes a 32 MiB file under user_files and asserts that SELECT length(content) FROM filesystem('...') with --max_memory_usage=1MiB produces MEMORY_LIMIT_EXCEEDED, then re-runs without the limit and checks length(content) = 33554432.
Manual verification with build/programs/clickhouse local:
- With --max_memory_usage=1MiB -> Code: 241. MEMORY_LIMIT_EXCEEDED (clean throw, no kill).
- Without the limit -> reads back the right bytes; cityHash64(content) matches the hash produced by file(..., RawBLOB, 'content String').
- Existing 04041 scenarios (5-6 byte files, content IS NULL for directories) still produce the expected output.

Documentation entry for user-facing changes

I have considered changes to the docs - no behavioral change for users beyond the new error class on overflow.

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

The content-reading loop in `StorageFilesystem` staged each file's bytes through a temporary `std::string` before moving them into the output column. Allocations made by `std::string` go through the global `operator new`, which is intercepted via `CurrentMemoryTracker::allocNoThrow`: the amount is counted but the hard limit is *never* enforced. With several `filesystem()` streams reading large files in parallel, the per-thread buffers could grow well past `max_server_memory_usage` (or `max_memory_usage`) and the cgroup OOM-killer fired before any check could raise `MEMORY_LIMIT_EXCEEDED`. Read the file directly into the `ColumnString::chars` `PaddedPODArray`, which goes through `CurrentMemoryTracker::alloc` and throws on the hard limit. Push one offset and one null-map byte after each file, so the column ends up in the same state as before. The `else` branch and the existing schema (`Nullable(String)`) are unchanged. Adds 04238_filesystem_content_memory_limit which writes a 32 MiB file and asserts that selecting its content with `--max_memory_usage=1MiB` raises `MEMORY_LIMIT_EXCEEDED` rather than OOM-killing the process. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-05-14T16:05:57Z

The test asserted `grep -c -F MEMORY_LIMIT_EXCEEDED` returned exactly `1`, but `shell_config.sh` passes `--send_logs_level=warning`, so a server-side warning line that mirrors the exception code can appear alongside the client's own error message. The Fast test report observed `2` matches: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104956&sha=69d278e3024ea27db74ca0afc6e43ebf1edd6758&name_0=PR&name_1=Fast%20test The behavior we're asserting on is presence, not multiplicity, so switch to `grep -F -q ... && echo 1 || echo 0`. The reference file is unchanged. Original PR: #104956

The reviewer (@clickhouse-gh) pointed out that the single-file `max_memory_usage=1 MiB` scenario could still raise `MEMORY_LIMIT_EXCEEDED` on master, because the final `ColumnString::Chars` allocation went through the throwing `Allocator` path regardless of the intermediate `std::string`. So both pre-fix and post-fix code peaked at ~97 MiB and the test passed either way -- Bugfix validation correctly flagged it as not reproducing the bug. Switch to a three-file scenario sized so that: * the column's chars buffer crosses a power-of-two capacity boundary while reading the third file (128 MiB -> 256 MiB), and * with `max_threads=1` the work is pinned to a single stream so the peaks are deterministic. Empirical peaks on aarch64: * post-fix (stream directly into chars): ~385 MiB * pre-fix (stage through `std::string`): ~449 MiB A 440 MB (~419 MiB) limit lies between them: the pre-fix path raises `MEMORY_LIMIT_EXCEEDED` (now caught by `grep` and reported as `1`), while the post-fix path completes (reported as `0`, then the sanity query prints the total size `201326592`). Bugfix validation will see the test pass on this branch and fail on master, which is what the check expects. Addresses review on PR #104956

…ontent-into-column

`clickhouse_fuzzer` runs the full `clickhouse-local` binary (see `programs/local/fuzzers/clickhouse_fuzzer.cpp`) which spins up a complete clickhouse-server-grade runtime: JeMalloc arenas, OpenSSL initializer, static initializers, and full query execution. Baseline RSS is close to the libFuzzer default `rss_limit_mb=2048`, so fuzzed inputs that trigger even moderate query-time allocations push the process past the limit. CIDB shows 7 unrelated PRs in 7 days all hitting OOM at 2049-2056 Mb — just barely over the 2048 Mb default. Failure mode is uniform: libFuzzer out-of-memory (used: 2049-2056Mb; limit: 2048Mb) oom-<hash> Bumping to 4096 Mb follows existing precedent for fuzzers with similar workloads: - `execute_query_fuzzer.options`: rss_limit_mb = 4096 (also runs query execution; same author bumped this in commit bafb7d7) - `data_type_deserialization_fuzzer.options`: rss_limit_mb = 4096 - `create_parser_fuzzer.options`: rss_limit_mb = 6144 The limit still bounds the target (it does not disable the check via `rss_limit_mb=0`), so genuine memory regressions above 4 GB will still be caught. CIDB evidence (last 30 days, all 10 hits — 0 on master because libFuzzer does not run on master regularly): PR ClickHouse#99740 (4 hits, Apr 19-21) PR ClickHouse#104231, ClickHouse#104849, ClickHouse#104956, ClickHouse#96844, ClickHouse#104510, ClickHouse#104492 (1 hit each, May 13-15)

…ontent-into-column

alexey-milovidov · 2026-05-16T21:21:32Z

@groeneai, the CI failures on this PR are all unrelated to the changes (which only touch StorageFilesystem.cpp and one stateless test):

Unit tests (msan, function_prop_fuzzer) / FunctionsStress.stress + AllTests: known recurring failure on master (FunctionsStress.stress #104877, AllTests #105040), substringUTF8 non-determinism with negative offsets. ~67–106 failures/day on master per CIDB.
BuzzHouse (amd_debug) Segmentation fault (STID: 2779-2d69) in RemoteQueryExecutor: flaky CI, also seen on unrelated PR Enable query condition cache for ORDER BY ... LIMIT n queries #104478. No filesystem-table-function code path in the stack trace.

Master has been merged into the branch (commit ed96e0e). Could you retry the failing jobs, or investigate these as separate issues if they persist? Reports:

groeneai · 2026-05-16T21:47:24Z

@alexey-milovidov confirmed — both failures are chronic and unrelated to this PR. CIDB evidence below.

1. `FunctionsStress.stress` / `AllTests` (substringUTF8 non-determinism)

Chronic master + PR failure family tracked in #104877 (umbrella) and #105040 (specific shape). Last 6 days of master+PR hits:

Day	Master	PR
2026-05-16	5	62
2026-05-15	17	89
2026-05-14	4	74
2026-05-13	7	69
2026-05-12	1	14
2026-05-11	6	78

That matches your "~67–106/day on master" attribution — chronic master pattern, not introduced by this PR. Fix workstreams in flight:

PR Fix more non-deterministic function results found by the functions stress test #104858 (@Algunenano, open) — stringVectorConstantFixedString in FunctionsComparison.h covers the isDistinctFrom/isNotDistinctFrom/if/nullIf String↔FixedString shapes
PR Fix transform default-branch conversion bug (FixedString / Enum) #104804 (mine, open) — wrapper-layer castColumn fix in src/Functions/transform.cpp for the caseWithExpression/transform family

The substringUTF8 negative-offset shape is part of the umbrella; nothing actionable on this PR. StorageFilesystem.cpp is untouched by either fix path.

2. `BuzzHouse (amd_debug)` Segfault (STID 2779-2d69)

Confirmed chronic across unrelated PRs over 17 days. 3 distinct PRs hit in the last 30 days, zero master hits:

PR	Hit time	Build	Scope (sanity check)
#104956	2026-05-15 16:47:54	`BuzzHouse (amd_debug)`	`StorageFilesystem.cpp` (this PR)
#104478	2026-05-10 10:52:31	`BuzzHouse (amd_msan)`	(unrelated, as you flagged)
#101428	2026-04-29 14:58:55	`BuzzHouse (arm_asan_ubsan)`	(unrelated)

Three different scopes, three different sanitizer builds, and the crash is in RemoteQueryExecutor.cpp:538 (null IConnections in the distributed path) — nothing in any of those PRs touches the remote-query stack. We have a P0 trunk task tracking this STID separately (2026-04-29-ci-buzzhouse-ubsan-stid-2779-2d69).

Master merge (`ed96e0e`)

Worth noting that the master merge does not clear these — they're chronic on master itself for #104877, and STID 2779-2d69 is a rare distributed-path nullptr that can happen on any BuzzHouse run regardless of base.

Retrying the failing jobs as you suggested is the right call here; both are flaky CI failures, not regressions from this PR.

alexey-milovidov · 2026-05-17T18:42:56Z

This was fixed by #105146. Let's update the branch.

alexey-milovidov · 2026-05-18T05:32:59Z

@groeneai, what is with 01710_projection_additional_filters?

groeneai · 2026-05-18T05:59:28Z

01710_projection_additional_filters is a chronic master flake — unrelated to this PR's filesystem() changes.

Root cause: a trivial count() optimization interaction with additional_table_filters under parallel_replicas + analyzer. With parallel replicas each replica independently runs the optimized count and the coordinator sums the partial results, so the test sees count() = 13 instead of 3 on Stateless tests (amd_llvm_coverage, ParallelReplicas, s3 storage, parallel). Surfaced after #102921 was merged 2026-05-04.

Tracking issue: #104219
Fix PR in flight: #104296 by @azat — "Fix trivial count optimization with additional_table_filters with parallel replicas and analyzer" — OPEN, awaiting review. Once that lands, this should drop off the PR CI noise.

On this PR the failure is just the chronic surfacing; it doesn't block your filesystem() memory-limit work.

…ontent-into-column

clickhouse-gh · 2026-05-18T20:23:48Z

LLVM Coverage Report

Changed lines: 100.00% (8/8) | lost baseline coverage: 1 line(s) · Uncovered code

Full report · Diff report

alexey-milovidov

Good.

clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label May 14, 2026

alexey-milovidov changed the title ~~Make filesystem() honor the memory limit when loading file content~~ Make filesystem() honor the memory limit when loading file content May 14, 2026

alexey-milovidov added the memory When memory usage is higher than expected label May 14, 2026

clickhouse-gh Bot reviewed May 14, 2026

View reviewed changes

Comment thread tests/queries/0_stateless/04238_filesystem_content_memory_limit.sh

alexey-milovidov added 2 commits May 15, 2026 07:27

Merge remote-tracking branch 'origin/master' into stream-filesystem-c…

7423a7b

…ontent-into-column

groeneai mentioned this pull request May 16, 2026

CI: bump rss_limit_mb to 4096 for clickhouse_fuzzer to stop chronic OOMs at 2048 Mb default #105134

Merged

1 task

Merge remote-tracking branch 'origin/master' into stream-filesystem-c…

ed96e0e

…ontent-into-column

alexey-milovidov mentioned this pull request May 17, 2026

Stop the bleeding in function_prop_fuzzer #105146

Merged

1 task

Merge branch 'master' into stream-filesystem-content-into-column

23d55ba

Merge remote-tracking branch 'origin/master' into stream-filesystem-c…

3f0105f

…ontent-into-column

alexey-milovidov self-assigned this May 18, 2026

alexey-milovidov commented May 18, 2026

View reviewed changes

alexey-milovidov added this pull request to the merge queue May 18, 2026

Merged via the queue into master with commit 6405020 May 19, 2026
166 checks passed

alexey-milovidov deleted the stream-filesystem-content-into-column branch May 19, 2026 00:00

robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 19, 2026

groeneai mentioned this pull request Jun 10, 2026

Fix null connections dereference in RemoteQueryExecutor::read() on cancel #106950

Merged

Sunbelt Computer Software

PL/B Language Development and Support

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make `filesystem()` honor the memory limit when loading file content#104956

Make `filesystem()` honor the memory limit when loading file content#104956
alexey-milovidov merged 7 commits into
masterfrom
stream-filesystem-content-into-column

alexey-milovidov commented May 14, 2026

Uh oh!

clickhouse-gh Bot commented May 14, 2026 •

edited

Loading

Uh oh!

Uh oh!

alexey-milovidov commented May 16, 2026

Uh oh!

groeneai commented May 16, 2026

Uh oh!

alexey-milovidov commented May 17, 2026

Uh oh!

alexey-milovidov commented May 18, 2026

Uh oh!

groeneai commented May 18, 2026

Uh oh!

clickhouse-gh Bot commented May 18, 2026

Uh oh!

alexey-milovidov left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

alexey-milovidov commented May 14, 2026

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Motivation

Change

Test plan

Documentation entry for user-facing changes

Uh oh!

clickhouse-gh Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Final Verdict

Uh oh!

Uh oh!

alexey-milovidov commented May 16, 2026

Uh oh!

groeneai commented May 16, 2026

1. FunctionsStress.stress / AllTests (substringUTF8 non-determinism)

2. BuzzHouse (amd_debug) Segfault (STID 2779-2d69)

Master merge (ed96e0e)

Uh oh!

alexey-milovidov commented May 17, 2026

Uh oh!

alexey-milovidov commented May 18, 2026

Uh oh!

groeneai commented May 18, 2026

Uh oh!

clickhouse-gh Bot commented May 18, 2026

LLVM Coverage Report

Uh oh!

alexey-milovidov left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

clickhouse-gh Bot commented May 14, 2026 •

edited

Loading

1. `FunctionsStress.stress` / `AllTests` (substringUTF8 non-determinism)

2. `BuzzHouse (amd_debug)` Segfault (STID 2779-2d69)

Master merge (`ed96e0e`)