Support Remote and RemoteSecure storage engines by alexey-milovidov · Pull Request #106189 · ClickHouse/ClickHouse · GitHub
Skip to content

Support Remote and RemoteSecure storage engines#106189

Open
alexey-milovidov wants to merge 31 commits into
masterfrom
remote-storage-engine
Open

Support Remote and RemoteSecure storage engines#106189
alexey-milovidov wants to merge 31 commits into
masterfrom
remote-storage-engine

Conversation

@alexey-milovidov

Copy link
Copy Markdown
Member

Implements #93138.

Previously, a table over a remote server built from an ad-hoc set of addresses could only be created via CREATE TABLE ... AS remote(...), reusing the remote/remoteSecure table functions. This PR adds the persistent counterparts as storage engines:

CREATE TABLE t ENGINE = Remote('addresses', db, table[, user[, password], sharding_key]);
CREATE TABLE t ENGINE = RemoteSecure(...);

This restores the symmetry between the table functions and the storage engines: Distributed uses a configured cluster, while Remote/RemoteSecure build a cluster on the fly from the supplied addresses, exactly like the remote/remoteSecure table functions. The table structure may be omitted, in which case it is inferred from the remote table.

The complex argument parsing previously living inside TableFunctionRemote::parseArguments is extracted into a reusable free function parseRemoteFunctionArguments, shared between the table functions and the new engines, so they accept exactly the same signatures with no duplicated logic. The engines construct a StorageDistributed over the resulting owned cluster.

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added the Remote and RemoteSecure table engines, the persistent counterparts of the remote and remoteSecure table functions. CREATE TABLE ... ENGINE = Remote('addresses', db, table, ...) now works in addition to CREATE TABLE ... AS remote(...).

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

🤖 Generated with Claude Code

Implements #93138.

Previously, a table over a remote server built from an ad-hoc set of
addresses could only be created via `CREATE TABLE ... AS remote(...)`,
i.e. by reusing the `remote` and `remoteSecure` table functions. This
adds the persistent counterparts as storage engines, so that

    CREATE TABLE t ENGINE = Remote('addresses', db, table[, user[, password], sharding_key])
    CREATE TABLE t ENGINE = RemoteSecure(...)

work as well. This restores the symmetry between the table functions and
the storage engines: `Distributed` uses a configured cluster, while
`Remote`/`RemoteSecure` build a cluster on the fly from the addresses,
exactly like the `remote`/`remoteSecure` table functions.

The complex argument parsing previously living inside
`TableFunctionRemote::parseArguments` is extracted into a reusable free
function `parseRemoteFunctionArguments`, which is shared between the
table functions and the new engines. The engines construct a
`StorageDistributed` over the resulting owned cluster, so they accept
exactly the same signatures as the table functions, and the table
structure may be omitted (it is then inferred from the remote table).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented May 31, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-feature Pull request with new product feature label May 31, 2026
Comment thread src/Storages/StorageDistributed.cpp
alexey-milovidov and others added 3 commits June 1, 2026 09:01
`registerStorageRemote` in `StorageDistributed.cpp` (part of the `dbms`
library) called `parseRemoteFunctionArguments`, which was defined in
`TableFunctionRemote.cpp` in `clickhouse_table_functions`. That library is
not linked into every target that uses `dbms` (e.g. `unit_tests_dbms`),
so the build failed with an undefined symbol:

    ld.lld-21: error: undefined symbol: DB::parseRemoteFunctionArguments(...)

Move `parseRemoteFunctionArguments` and the `ParsedRemoteFunctionArguments`
struct into a new file `Storages/Distributed/parseRemoteFunctionArguments.{h,cpp}`,
which is part of `dbms`, so the symbol is available wherever `dbms` is.
`TableFunctionRemote` keeps calling it through the new header (table
functions already depend on storages).

Also add the missing local forward declaration of `registerStorageRemote`
required by `-Wmissing-prototypes`.

CI: #106189

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When a `Remote`/`RemoteSecure` table is created without an explicit
structure, the structure is inferred from the remote table. Previously the
inference was left to the `StorageDistributed` constructor, which uses the
storage's global context. For a local target such as
`Remote('127.0.0.1', db, table)` that bypassed the user-context
`SHOW_COLUMNS` access check in `getStructureOfRemoteTableInShard`, so a user
allowed to create a `Remote` table could infer the schema of a local table
they are not allowed to describe.

Infer the columns in `registerStorageRemote` under `args.getLocalContext()`
(the user's context) and pass them to `StorageDistributed`, so the access
check is enforced.

CI: #106189

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/Storages/StorageDistributed.cpp
Comment thread src/Storages/StorageDistributed.cpp
Comment thread src/Storages/Distributed/parseRemoteFunctionArguments.cpp Outdated
alexey-milovidov and others added 5 commits June 3, 2026 00:39
The new `Remote` and `RemoteSecure` storage engines were not present in the
`engine_to_type` map used by the ArrowFlight SQL server, so
`test_arrowflight_interface/test_sql_server.py::test_get_table_types` failed
with `Some engine(s) in system.table_engines are not mapped`. Map them to
`REMOTE TABLE`, like `Distributed`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gines

`Remote` and `RemoteSecure` accept a positional `password`, but were not
registered in `FunctionSecretArgumentsFinder::findTableEngineSecretArguments`,
so `SHOW CREATE TABLE` / `system.tables.engine_full` printed the password
instead of `[HIDDEN]`. Reuse `findRemoteFunctionSecretArguments`, which handles
both the positional and the named-collection forms, since the engines accept
exactly the same arguments as the `remote`/`remoteSecure` table functions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Setting `is_remote_function_` only adjusts the distributed client info; it does
not reproduce the local-shard access checks that `TableFunctionRemote::executeImpl`
performs. A user allowed to create `ENGINE = Remote('127.0.0.1:9000', secret_db,
secret_table, 'default')` could otherwise force the remote path (e.g. with
`prefer_localhost_replica = 0`), sending the query back to this server under the
stored credentials without checking the caller's access on the local target.

Mirror the table-function check in `registerStorageRemote`: when the built
cluster contains a local shard, require the creating user to hold both `SELECT`
and `INSERT` on the local target table (a persistent table can be used for both),
under the user's context.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When a persistent `Remote`/`RemoteSecure` table is created from a named
collection (`ENGINE = Remote(my_collection, ...)`), the shared parser never
passed the table's `StorageID` to `tryGetNamedCollectionWithOverrides`, so the
table was not registered as a dependent of the collection. As a result,
`DROP NAMED COLLECTION` was not blocked while the table existed and the
dependency was not rebuilt on reload, which could leave a stored `Remote` table
broken after reload or next use.

Thread an optional `dependent_table_id` through `parseRemoteFunctionArguments`
and pass `&args.table_id` from `registerStorageRemote`. Table-function callers
leave it null, since their lifetime is bound to the query.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread tests/queries/0_stateless/04308_remote_storage_engine.sql
@alexey-milovidov

Copy link
Copy Markdown
Member Author

The failures of 00175_obfuscator_schema_inference and 00096_obfuscator_save_load in Stateless tests (amd_tsan, parallel) are NOT caused by this PR.

They were introduced by #104690 ("Add UntrackedMemory asynchronous metric"), which made clickhouse-obfuscator abort (SIGABRT) on process teardown: the query results are correct, but the Aborted line on stderr fails the test. #104690 has now been reverted (#106365).

#104690 was merged in violation of the ClickHouse team rules: its own CI already showed these two tests failing (10 times between May 12 and June 1) before it was merged.

Please update your branch to pick up the revert; the tests should pass again.

Comment thread src/Storages/StorageDistributed.cpp Outdated
alexey-milovidov and others added 2 commits June 5, 2026 10:15
For the persistent `Remote`/`RemoteSecure` engines, the local-shard access
check applied `SELECT`/`INSERT` to `parsed.remote_table_id`. When the target
is a table function (e.g. `numbers(...)`, `merge(...)`), that field is left at
the parser default `system.one`, so the check validated the wrong object: it
both spuriously required privileges on `system.one` and, with explicit columns,
never analyzed the table function under the creating user's context. A query
against such a persisted table could then be routed back to this server under
the engine credentials, reading local data the creator was never checked for.

Now, when the cluster contains a local shard and the target is a table
function, the function is analyzed under the user's context even when the
columns are given explicitly (`getStructureOfRemoteTableInShard` runs it through
`getActualTableStructureWithAccess`, which performs the access check). The
`SELECT`/`INSERT` check on `remote_table_id` is applied only for a plain table
target.

Added `04318_remote_storage_engine_access`, covering the persistent-engine
security and lifecycle invariants: the local-shard `SELECT`/`INSERT` check with
explicit columns, schema inference under the creating user's context, and the
named-collection dependency that blocks `DROP NAMED COLLECTION` while the table
exists.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Updated in c3e41a8 (pushed onto the existing branch, no rebase):

  • Merged current master (the branch was ~543 commits behind), which picks up the revert of Add UntrackedMemory asynchronous metric #104690 — the 00175_obfuscator_schema_inference / 00096_obfuscator_save_load failures were caused by that PR and are unrelated here. Latest CI for the previous commit was already green.
  • Fixed the remaining AI-review blocker: for a table-function target on a local shard, the access check no longer validates the parser default system.one; the function is now analyzed under the creating user's context even when columns are explicit (getActualTableStructureWithAccess), and the SELECT/INSERT check on remote_table_id applies only to plain table targets. Harmless targets like numbers(...) are no longer rejected spuriously. (The view(SELECT ...) example specifically is unreachable via the engine — a nested SELECT in a table-function argument is a SYNTAX_ERROR in the storage-definition grammar.)
  • Added 04318_remote_storage_engine_access covering the persistent-engine security/lifecycle invariants: the local-shard SELECT/INSERT check with explicit columns, schema inference under the creating user's context, and the named-collection dependency blocking DROP NAMED COLLECTION.

Built and verified locally: both 04308_remote_storage_engine and the new 04318_remote_storage_engine_access pass, and the merge(db, '^secret$') table-function path is correctly rejected at CREATE without SELECT on the underlying table.

Comment thread src/Storages/StorageDistributed.cpp Outdated
@alexey-milovidov

Copy link
Copy Markdown
Member Author

The Fast test (arm_darwin) failure is the known macOS shard-test infrastructure issue, not caused by this PR.

04308_remote_storage_engine failed only on the case

CREATE TABLE t_remote_engine (dummy UInt8) ENGINE = Remote('127.0.0.{1,2}', system, one);
SELECT count() FROM t_remote_engine;

with Code: 209. DB::NetException: Timeout: connect timed out: 127.0.0.2:9000 ... (ALL_CONNECTION_TRIES_FAILED).

This is the standard shard-tag mechanism in clickhouse-test: shard tests are enabled only when the server's listen_host contains 127.0.0.2, but on the arm_darwin runner 127.0.0.2 is not actually reachable, so any shard test that connects to the second shard times out. Over the last 14 days, shard-tagged tests have 0 passes on arm_darwin — they are either skipped (when listen_host lacks 127.0.0.2) or fail (when it is configured but unroutable). Affected on dozens of unrelated PRs, e.g. 04336_parallel_blocks_marshalling_low_cardinality_native_format (86 fails across ~85 PRs) and 04327_rewrite_aggregate_function_with_if_distributed (45 fails). 04308 uses the identical 127.0.0.{1,2} pattern and shard tag as those accepted master tests, so no test change is warranted here.

I re-ran the failed job (workflow run 27483260206) — it passes when the runner skips the shard tests.

@groeneai, could you investigate / track the arm_darwin shard-test infrastructure failure (the runner advertises 127.0.0.2 in listen_host but 127.0.0.2:9000 is unroutable, so all shard-tagged tests fail when they actually connect to the second shard)? I could not find a dedicated tracking issue — #103235 covers a different arm_darwin problem (server crashes/hangs for 01404/04068). The fix likely belongs on the runner side (set up the 127.0.0.2/127.0.0.3 loopback aliases, or keep listen_host to 127.0.0.1/::1 so these tests skip consistently).

@groeneai

Copy link
Copy Markdown
Contributor

@alexey-milovidov Confirmed: same root cause as the wider arm_darwin class. macOS binds only 127.0.0.1 on lo0 and does not auto-route the rest of 127.0.0.0/8, so 127.0.0.2:9000 is unroutable. The shard tag enables 04308's Remote('127.0.0.{1,2}', ...) case only when listen_host advertises 127.0.0.2, but that address is then unreachable, hence the Code 209 ALL_CONNECTION_TRIES_FAILED.

Scope over the last 7 days (Fast test arm_darwin, 127.0.0.2:9000 / ALL_CONNECTION_TRIES_FAILED): 15 distinct tests, 90 PRs, 165 failures, 0 on master.

Fix is in flight: PR #107435 adds a macOS Fast-test pre-hook aliasing 127.0.0.2..127.0.0.16 on lo0 (the server already listens on 0.0.0.0), so these tests run instead of skip. It also un-skips 5 parallel-replica remote() tests as proof. The same aliasing makes the shard-tag tests both satisfied and reachable, so 04308 is covered by it too. Open and mergeable, awaiting review.

On the tracking issue: a dedicated tracker is not strictly needed since #107435 fixes the whole class at once. If you would prefer one, I can open an issue enumerating the remaining darwin.skip / shard-tag tail to un-skip incrementally after the aliases land. 04327/04336 are already skipped on master via #107376.

`04308_remote_storage_engine` exercises the multi-shard case
`Remote('127.0.0.{1,2}', system, one)`, which connects to the second
loopback shard `127.0.0.2:9000`. On the `arm_darwin` Fast test runner,
macOS binds only `127.0.0.1` on `lo0` and does not auto-route the rest
of `127.0.0.0/8`, so `127.0.0.2` is unroutable and the query fails with
`ALL_CONNECTION_TRIES_FAILED`.

This is the same macOS shard-test infrastructure issue already handled
for the sibling tests `04327_rewrite_aggregate_function_with_if_distributed`
and `04336_parallel_blocks_marshalling_low_cardinality_native_format`,
which were added to `ci/defs/darwin.skip` in #107376. `04308` is a new
test introduced by this pull request, so it was not covered there. Add
it to `ci/defs/darwin.skip` for consistency until the runner-side fix
(#107435) aliases the additional loopback addresses and these tests can
run on macOS again.

The test itself is unchanged and keeps full coverage on Linux.

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106189&sha=dbc0fc28753abffe3096e1064be57a6c440263b6&name_0=PR&name_1=Fast%20test%20%28arm_darwin%29
Related: #107376
Related: #107435

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread src/Storages/StorageDistributed.cpp Outdated
alexey-milovidov and others added 2 commits June 16, 2026 08:31
…rom metadata

The `Remote`/`RemoteSecure` storage-engine creator analyzes a local-shard
table-function target under the user's context to validate the creator's
access to the function's underlying tables, even when the columns are given
explicitly. This is a `CREATE`-time check, but the creator also runs when the
table is loaded from already-validated metadata on server startup
(`FORCE_ATTACH`) or `RESTORE` (`FORCE_RESTORE`). In that case re-running it is
unnecessary and harmful: analyzing the target can throw if its underlying
tables have changed since creation (e.g. the table matched by `merge(...)` was
dropped), making a valid persisted table impossible to load.

Guard the table-function inference and the local-shard `SELECT`/`INSERT`
checks with `isLoadingFromExistingMetadata`, so they run for fresh definitions
(`CREATE` and user `ATTACH`) but are skipped when loading from existing
metadata. The structure is still inferred unconditionally when it was omitted,
because then it is the table's only source of columns.

Add an integration test that creates a `Remote` table over a local `view(...)`
target, drops the table the view reads from, restarts the server, and checks
that the `Remote` table still attaches.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Every integration test directory must contain an `__init__.py`; its absence
made the `Style check` fail with:

    ./tests/integration/test_remote_storage_engine_attach/__init__.py should exist for every integration test

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/Storages/StorageDistributed.cpp
Comment thread src/Storages/StorageDistributed.cpp Outdated
alexey-milovidov and others added 2 commits June 20, 2026 17:53
A backup `RESTORE` of a `Remote`/`RemoteSecure` table that resolves to a
local shard must run the same local-shard `SELECT`/`INSERT` access check
that a direct `CREATE` does. Previously `args.is_restore_from_backup` was
folded into the `loading_from_metadata` guard together with
`isLoadingFromExistingMetadata`, so a normal `RESTORE` (which arrives with
`SECONDARY_CREATE` and `is_restore_from_backup`) skipped the check. A user
who could restore a table could therefore smuggle in
`Remote('127.0.0.1', protected_db, protected_table, 'default')` and reach a
local target they cannot access directly, even though a direct `CREATE`
would be rejected.

Split the guard: `loading_from_existing_metadata` (server startup only)
gates the plain local-shard `SELECT`/`INSERT` check, while
`skip_table_function_analysis` (startup or restore) still skips
re-analyzing a table-function target whose underlying tables may be absent
in the restore environment. Server startup keeps skipping both checks; a
backup `RESTORE` now runs the plain check again.

Addresses the AI-review blocker on the PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A persistent `Remote`/`RemoteSecure` table whose target is a table
function (e.g. `numbers(...)`, `view(...)`) can be read, but cannot be
written to: `remote_storage` is an empty `StorageID` whenever
`remote_table_function_ptr` is set, so `DistributedSink` would build an
`INSERT` into an empty table id instead of a real target. Such targets are
read-only by nature.

Make `StorageDistributed::write` throw `NOT_IMPLEMENTED` when
`remote_table_function_ptr` is set, instead of producing a malformed
query. This covers both the `remote`/`cluster` table functions and the
`Remote`/`RemoteSecure` storage engines with a table-function target; no
existing test inserts into such a target, so reading those targets is
unaffected.

Addresses the AI-review blocker on the PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Addressed the two remaining AI-review blockers (pushed onto the branch, no rebase): d3a8d8083b5..b909fc39534.

  • RESTORE local-target access check (0cafbd2bf4b): a backup RESTORE of Remote('127.0.0.1', db, table, ...) over a local shard now runs the same local-shard SELECT/INSERT check that a direct CREATE does. Previously args.is_restore_from_backup was folded into loading_from_metadata alongside isLoadingFromExistingMetadata, so a RESTORE (which arrives as SECONDARY_CREATE + is_restore_from_backup) skipped the check, letting a user who can restore reach a local target they cannot access directly even though a direct CREATE would be rejected. The guard is now split: loading_from_existing_metadata (server startup only) gates the plain SELECT/INSERT check, while skip_table_function_analysis (startup or restore) still skips re-analyzing a table-function target whose underlying tables may legitimately be absent in the restore environment — so the 04341 table-function-restore case keeps working.

  • INSERT into a table-function-target Remote (b909fc39534): StorageDistributed::write now throws NOT_IMPLEMENTED when remote_table_function_ptr is set, instead of letting DistributedSink build an INSERT into the empty StorageID. Reading such targets (e.g. Remote('127.0.0.1', numbers(10))) is unaffected, and no existing test inserts into a table-function target, so the read path is preserved.

Added the two requested regressions:

  • 04401_remote_storage_engine_restore_access: a restricted user restoring Remote('127.0.0.1', protected_db, protected_target, 'default') with explicit columns and no SELECT/INSERT on the local target is rejected like CREATE, then succeeds once granted.
  • 04402_remote_storage_engine_insert_table_function: reading a table-function-target Remote works; INSERT is rejected with NOT_IMPLEMENTED.

Built and verified locally: 04308, 04318, 04341, 04342, and the new 04401/04402 all pass.

Comment thread src/Storages/StorageDistributed.cpp
…t queue

A persistent `Remote`/`RemoteSecure` table builds an ad-hoc cluster stored in
`owned_cluster`, exactly like the `remote()`/`cluster()` table functions. But
`StorageDistributed::write` forced synchronous insertion whenever `owned_cluster`
was set, so a persistent `Remote` table could never use the asynchronous
`Distributed` queue even with `distributed_foreground_insert = 0`. As a result
`fsync_after_insert`, `background_insert_*` and the validated
`bytes_to_delay_insert` / `bytes_to_throw_insert` surface were silently
ineffective for it.

`owned_cluster` was only ever a proxy for "this is a `remote()` table function",
which has no persistent storage to spool to. That condition is already captured
precisely by `relative_data_path.empty()`: every table-function storage is
temporary and has an empty data path, while a persistent `Remote` engine has a
real data path. Key the forced-sync fallback on `relative_data_path.empty()`
instead of `owned_cluster`, so the persistent engine inserts asynchronously like
a regular `Distributed` table, while the table functions keep inserting
synchronously (they still match `relative_data_path.empty()`).

The asynchronous queue works for an owned cluster: `parseAddresses` resolves both
the full and compact directory-name formats against `getCluster()` (which returns
`owned_cluster`), and `createPool` honors `address.secure`, so `RemoteSecure`
inserts stay secure.

Addresses the AI-review blocker on the PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/Storages/StorageDistributed.cpp
Resolve the conflict in `src/TableFunctions/TableFunctionRemote.cpp`: this branch
moved the argument parsing out of `TableFunctionRemote::parseArguments` into the
shared `parseRemoteFunctionArguments`, while master's #b09d93f9920 ("Fix null AST
in remote() named-collection database override") changed the now-extracted code.
Keep this branch's call site and port master's fix into
`parseRemoteFunctionArguments`, so the `database =`/`db =` named-collection branch
treats a non-table-function argument as a database name instead of assigning a null
`remote_table_function_ptr`. Without the port the merge would have silently
reverted that master bug fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

@groeneai, investigate and fix

Received signal 4 (signal sent by pid 692190695 from user 21854) (STID: 1781-3414)

This is surprising to me, didn't see it before.

@groeneai

Copy link
Copy Markdown
Contributor

@alexey-milovidov signal 4 is SIGILL from the harmful.c trap on getservbyname (a non-reentrant libc function, replaced by __builtin_trap in debug/sanitizer builds). It is not in the Remote/RemoteSecure path.

Full stack (Stress test (amd_asan_ubsan), STID 1781-3414):

3.  getservbyname                              base/harmful/harmful.c:85
4.  Poco::Net::SocketAddress::resolveService   SocketAddress.cpp:394
6.  Poco::Net::SocketAddress::SocketAddress    SocketAddress.cpp:132
9.  DB::ClientInfo::read                        src/Interpreters/ClientInfo.cpp:238
10. DB::TCPHandler::processUnexpectedQuery      src/Server/TCPHandler.cpp:2560
11. DB::TCPHandler::receivePacketsExpectData    src/Server/TCPHandler.cpp:1230
12. DB::TCPHandler::skipData                     src/Server/TCPHandler.cpp:1301

Root cause: ClientInfo::read builds Poco::Net::SocketAddress(initial_address_string) directly from the wire string (ClientInfo.cpp:238). When the port part is non-numeric (or > 65535), Poco's resolveService falls back to getservbyname, which is trapped. initial_address is always serialized as numeric host:port via SocketAddress::toString(), so valid data never reaches this branch; a desynced or corrupted native-protocol stream does (here processUnexpectedQuery skipping an unexpected packet under stress + ast_fuzzer_any_query + fault injection). Since the trap is not a catchable exception, untrusted wire input is a hard server crash.

This is pre-existing and unrelated to this PR: it touches none of ClientInfo/SocketAddress/TCPHandler, the line dates to Feb 2025, and the same STID fires on the unrelated #97302.

Fix: in ClientInfo::read, validate the port is numeric before constructing the address and throw a catchable error otherwise, instead of letting Poco call getservbyname. I will open a separate fix PR referencing this one.

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106189&sha=ef34fc53dcda266cca006edea8332247405ff46c&name_0=PR&name_1=Stress%20test%20%28amd_asan_ubsan%29&name_1=Stress%20test%20%28amd_asan_ubsan%29

@groeneai

Copy link
Copy Markdown
Contributor

Fix PR: #108410 (validates the wire address in ClientInfo::read so a non-numeric port can no longer reach the trapped getservbyname). Pre-existing on master, not caused by this PR.

@alexey-milovidov

Copy link
Copy Markdown
Member Author

Merged current master (the branch was 260 commits behind) and pushed onto the branch with no rebase: 1c70303ea57..cc55263b2b2.

  • The only conflict was in ci/defs/darwin.skip. master removed the whole block of distributed/remote shard/distributed-tagged tests from the macOS skip list — Raúl's isLocalAddress fix (f627c8478cc, keep 127.0.0.{2..255} non-local by value even when aliased to lo0) plus the macOS 127.0.0.{2..16} lo0 aliases that make those addresses reachable, batch-pruned in darwin-skip-prune-batch8. That block included 04327_rewrite_aggregate_function_with_if_distributed, which uses the identical remote('127.0.0.{1,2}', system.one) pattern as 04308's shard case (the one that timed out connecting to 127.0.0.2:9000 on arm_darwin before). I resolved the conflict by also dropping the 04308_remote_storage_engine entry, so it now runs on macOS like its siblings instead of staying skipped. The net diff against master is otherwise unchanged — exactly the engine's files.
  • Build-verified the merge: recompiled StorageDistributed.cpp, Storages/Distributed/parseRemoteFunctionArguments.cpp, registerStorages.cpp, TableFunctionRemote.cpp, and Server/ArrowFlight/commandSelector.cpp against current masterninja exit 0, no API drift over the 260 commits.

The only remaining CI red on the previous commit was the chronic, master-wide stress hung-check (Stress test (amd_tsan): Cannot drop databases after stress tests. Probably server consumed too much memory and cannot execute simple queriesServer is not responding), tracked by #107941 — not on the Remote/RemoteSecure path. The fresh master merge re-runs it.

The AI-review Block is unchanged on purpose: it is the RESTORE-time table-function re-analysis skip in StorageDistributed.cpp, which you deliberately kept (documented in-code and in 0cafbd2bf4b) as a restorability-vs-strictness trade-off — the plain-table local-shard SELECT/INSERT check still runs on restore.

@alexey-milovidov

Copy link
Copy Markdown
Member Author

CI re-ran on the same commit (cc55263b2b2). The stress hung-check I mentioned above (#107941) is no longer red; the only remaining failure is now a different, unrelated master-wide flake:

So the PR is still green apart from an unrelated flake. The sole substantive blocker remains the AI-review Block — the deliberate RESTORE-time table-function re-analysis skip you documented in 0cafbd2bf4b (restorability-vs-strictness trade-off; the plain-table local-shard SELECT/INSERT check still runs on restore, covered by 04401_remote_storage_engine_restore_access.sh). No code change pushed for it — that trade-off is your call.

@alexey-milovidov

Copy link
Copy Markdown
Member Author

CI has fully settled on the current commit (cc55263b2b2): the Stress test (arm_ubsan) flake #107598 I noted on 06-30 has re-run green, so the PR is now 100% green — 156 checks SUCCESS / 18 SKIPPED / 0 failures, with both required gates (Mergeable Check, CH Inc sync) passing. MERGEABLE/CLEAN. No re-merge is warranted: merge-base is <1 day old and master has touched none of this PR's files.

The sole remaining item is the AI-review ❌ Block (and the one open clickhouse-gh inline thread at StorageDistributed.cpp:2552), both about the RESTORE-time table-function re-analysis skip. I re-verified the current code against your 0cafbd2bf4b documentation: skip_table_function_analysis = loading_from_existing_metadata || is_restore_from_backup skips only the table-function re-analysis on restore, while the plain-table local-shard SELECT/INSERT check is gated on !loading_from_existing_metadata and still runs on RESTORE (covered by 04401_remote_storage_engine_restore_access.sh). That matches the restorability-vs-strictness trade-off you documented, so I left it as your design call and pushed no code change.

Ready to merge modulo that decision.

…ESTORE

A backup `RESTORE` of a `Remote`/`RemoteSecure` table with a table-function
target that resolves to a local shard (e.g.
`Remote('127.0.0.1', merge(db, '^protected$'), 'default')`) previously skipped
the table-function analysis unconditionally, because `args.is_restore_from_backup`
was folded into `skip_table_function_analysis`. A restricted user who may restore
could therefore restore such a table with explicit columns and reach a local
target they cannot access directly, even though a direct `CREATE` runs the
analysis under the user's context (via `getStructureOfRemoteTable` ->
`getActualTableStructureWithAccess`) and would be rejected. The plain-target
`SELECT`/`INSERT` check was already enforced on restore; this closes the same
hole for table-function targets.

Run the analysis on restore as well, but fail closed only on an access-control
error (`ACCESS_DENIED`): the target's underlying tables may legitimately be
absent in the restore environment (e.g. the table matched by `merge(...)` was
dropped since the backup was taken), and a valid persisted table must still be
restorable in that case (`04341_remote_storage_engine_restore`). An access
failure is the exact case a direct `CREATE` rejects and the only one that could
leak a local target; any other failure means the target cannot be analyzed (and
therefore cannot be read either), so the restore proceeds with the columns
carried in the backup metadata.

Add `04489_remote_storage_engine_restore_tf_access` covering both: a restricted
user's restore of `Remote(..., merge('^protected$'), ...)` is rejected without
target access and allowed with it, while a restore over a genuinely absent
target still succeeds.

Addresses the AI-review blocker on the PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Addressed the remaining AI-review blocker (pushed onto the branch, no rebase): cc55263b2b2..2b912bb2a79.

RESTORE access check for table-function targets (StorageDistributed.cpp): a backup RESTORE of a Remote/RemoteSecure table whose table-function target resolves to a local shard now runs the same analysis a direct CREATE does, closing the last gap the plain-target fix (0cafbd2bf4b) had left open. Previously args.is_restore_from_backup was folded into skip_table_function_analysis, so a restricted user who may restore could smuggle in Remote('127.0.0.1', merge(db, '^protected$'), 'default') (or view(SELECT ... FROM secret)) with explicit columns and reach a local target they cannot access directly.

The analysis now runs on restore too, failing closed only on ACCESS_DENIED — the exact case a direct CREATE rejects and the only one that could leak a local target. Any other failure is tolerated, because the target's underlying tables may legitimately be absent in the restore environment (e.g. the table matched by merge(...) was dropped since the backup was taken) and a valid persisted table must remain restorable; in that case there is nothing to leak. This keeps 04341_remote_storage_engine_restore green.

Regression added: 04489_remote_storage_engine_restore_tf_access — a restricted user's restore of Remote(..., merge('^protected$'), ...) is rejected without target access, allowed with it, and a restore over a genuinely absent target still succeeds.

Verified locally against a Release build (cc55263+ this commit): 04489, 04341, 04318, 04401, 04308 all pass (5/5).

The inline AI-review thread on StorageDistributed.cpp is replied to and resolved.

@alexey-milovidov

Copy link
Copy Markdown
Member Author

To be explicit about the earlier "that trade-off is your design call" framing: this commit is offered as a concrete proposal, not an override — if you prefer to keep the previous unconditional skip on restore, 2b912bb2a79 reverts cleanly on its own (it is a separate commit, no rebase/amend).

The key point is that it does not reintroduce the un-restorable-valid-table regression that motivated the skip: the restore still tolerates a genuinely absent table-function target (any non-ACCESS_DENIED analysis failure is swallowed), so 04341_remote_storage_engine_restore stays green. It only fails closed when the analysis reports ACCESS_DENIED — the exact case a direct CREATE rejects. So it satisfies both the restorability constraint you documented and the access-control invariant the AI review is asking for.

/// When the structure was carried in the definition, the analysis runs purely for its access
/// side effect, so on restore a non-access failure (absent target) may be tolerated. When the
/// structure was omitted, the analysis is the only source of columns and must always succeed.
const bool tolerate_absent_target = !columns.empty() && args.is_restore_from_backup;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this RESTORE path hits any local table-function-target failure other than ACCESS_DENIED, it silently trusts the persisted columns and keeps the definition. That is not fail-closed for a fresh user-driven restore: if Remote('127.0.0.1', merge(db, '^protected$'), 'default') is restored while db.protected is temporarily absent, the restoring user never proves access to that local target, yet the stored engine credentials will start reading it as soon as a matching table is recreated later. Please make the default RESTORE path reject local table-function targets whose access cannot be established, or gate this bypass behind an explicitly trusted/internal restore mode instead of swallowing every non-access exception here.

@groeneai

groeneai commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Recommendation: keep 2b912bb2a79. Revert is strictly worse (it reopens the present-target leak this commit closes), and the residual the bot describes is not fully closable for a table-function target without breaking restorability (04341).

The two branches fail closed differently, and that is the crux:

  • Plain-table branch (checkAccess(SELECT/INSERT, remote_table_id)) is grant-based, so it fails closed even for an absent target: a grant exists independently of whether the table currently exists. It already runs unconditionally on restore.
  • Table-function branch fails closed only when it can enumerate what the function resolves to. getStructureOfRemoteTable -> getActualTableStructureWithAccess checks access on the tables merge('^protected$') matches. When those tables are absent at restore, there is nothing to enumerate and no grant set to check. You cannot prove or deny access to a target you cannot resolve.

That is why 04341 (valid table, target legitimately gone, must restore) and the bot's "absent-now, present-later" vector are indistinguishable at restore time from the definition alone: the only discriminator is a future event plus the restoring user's trust level, neither knowable when the analysis runs.

On the bot's vector: it is real but lower severity than the one you closed. It needs (1) the restoring user to already hold a backup with privileged engine credentials baked in, which only someone who already had access to the target could have produced and handed over, (2) the target absent at restore, and (3) the target reappearing later. That is the same trust surface as "Remote persists credentials and reads under them," not a new escalation primitive. The commit closes the immediate, no-future-event case (present target, ACCESS_DENIED fatal), which is the higher-value fix.

If you want the clickhouse-gh thread cleared rather than documented as an accepted structural limit, the only principled closure is the bot's second option: gate the non-ACCESS_DENIED swallow behind a trusted/internal restore path, so a user-driven restore fails closed on an unresolved local table-function target while a system/admin restore keeps 04341's tolerance (04341 restores as the default user, so it stays green; 04489's absent-target case would move to restricted-user-rejected). The bot's first option (reject all unresolved targets on the default path) would break 04341. That gate is a separable change; I would not fold it into this PR unless you want it here, since the current commit is a strict improvement on its own.

Net: keep it. Either document the table-function residual as a known restore limitation, or follow up with the trusted-restore gate if the block must clear.

@clickhouse-gh

clickhouse-gh Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.50% 85.50% +0.00%
Functions 92.60% 92.60% +0.00%
Branches 77.70% 77.70% +0.00%

Changed lines: Changed C/C++ lines covered: 292/297 (98.32%) · Uncovered code

Full report · Diff report

@alexey-milovidov

alexey-milovidov commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

The sole remaining CI red on 2b912bb2a79 is a stateless-test failure unrelated to this PR:

  • Stateless tests (amd_debug, distributed plan, s3 storage, parallel)03040_dynamic_type_alters_2_compact_merge_tree ("result differs with reference": Dynamic subcolumns read back as \N instead of their values).

This is a chronic, master-wide flake, not on the Remote/RemoteSecure path. Over the last 30 days it has failed 9 times — once on master itself (pull_request_number = 0, 2026-07-01) and on 7 unrelated PRs (#107934, #101841, #106734, #102402, #107300, #103158). This PR touches only the Remote/Distributed argument parsing and access-check code — none of the Dynamic-type serialization or compact-part merge logic the test exercises.

Its CI auto-tracking issue #105306 was closed on 2026-05-19, but the test has since regressed (9 failures in the last 30 days), so the earlier fix no longer holds.

@groeneai, please investigate 03040_dynamic_type_alters_2_compact_merge_tree (reopen or supersede #105306) and provide a fix in a separate PR; if a fix is already in progress, link it here.

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106189&sha=2b912bb2a792c787bbbd776bd7252686d47ee875&name_0=PR&name_1=Stateless%20tests%20%28amd_debug%2C%20distributed%20plan%2C%20s3%20storage%2C%20parallel%29

@groeneai

groeneai commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

@alexey-milovidov thanks, agreed this is unrelated to the Remote/RemoteSecure work here.

I investigated 03040_dynamic_type_alters_2_compact_merge_tree. It is the same class as #80648 (data loss on ALTER RENAME COLUMN racing a merge). A fix is already in progress: #104822 (open, mergeable). Confirmed root cause by local reproduction: a background merge (OPTIMIZE ... FINAL) selected concurrently with the async rename observes the new column name in metadata but no matching rename AlterConversions, reads old parts by the new name, finds no file, and writes the merged part without the Dynamic column. Reads then return all \N (the exact 15 None from the report).

Reproducer (compact part, Dynamic column with no default, 5 parts, RENAME COLUMN d TO d1 SETTINGS mutations_sync=0 concurrent with OPTIMIZE TABLE t FINAL) fails on ~8% of iterations on master.

Important finding: #104822 in its current form does not fully close the Dynamic case. With #104822 applied the failure rate drops ~8x (from ~8% to ~1%, 4/400 iterations) but is still nonzero. #104822 makes the alter() side atomic (metadata publish + mutation registration under one lock), which fixes the primary window. The residual is on the merge side: the merge captures its metadata_snapshot at selection time (StorageMergeTree::merge) and its mutations_snapshot later in MergeTask prepare, so the two can still straddle the rename. Observed the inconsistent pair directly: storage_columns = {x, y, d} (metadata still shows the old name d) while the per-part AlterConversions maps d -> d1 (rename already in the mutations snapshot). d1 is absent from the merge output schema, so the Dynamic column is dropped from the merged part.

Fix direction: capture the merge's metadata and mutations snapshots as a consistent pair (e.g. under currently_processing_in_background_mutex at selection, threaded into the MergeTask), so the merge cannot combine old-name metadata with a rename conversion. I will extend #104822 with this and re-verify the reproducer at 0 failures. Will link the update here.

@groeneai

groeneai commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Fix pushed. 03040_dynamic_type_alters_2_compact_merge_tree is the ALTER RENAME COLUMN vs merge race (#80648); the residual for the Dynamic (no-default) case is now closed in #104822 (commit 1057c91), which I extended rather than opening a separate PR since it is the same race and that PR is open.

Root cause of the residual: a merge applies the pending rename to its source parts on-fly, but the merged part kept the source mutation version, so the rename mutation ran again on the already-renamed compact part, re-read the column by its old name (gone), and wrote it empty. The fix stamps a merged part with the metadata-mutation version it applies on-fly, so the rename is not re-scheduled. 0 data loss over 2366 iterations of the reproducer (was ~8% on master); details on #104822.

@groeneai

groeneai commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants