iframe-proxy

alexey-milovidov · 2026-05-31T07:32:51Z

Implements #93138.

Previously, a table over a remote server built from an ad-hoc set of addresses could only be created via CREATE TABLE ... AS remote(...), reusing the remote/remoteSecure table functions. This PR adds the persistent counterparts as storage engines:

CREATE TABLE t ENGINE = Remote('addresses', db, table[, user[, password], sharding_key]);
CREATE TABLE t ENGINE = RemoteSecure(...);

This restores the symmetry between the table functions and the storage engines: Distributed uses a configured cluster, while Remote/RemoteSecure build a cluster on the fly from the supplied addresses, exactly like the remote/remoteSecure table functions. The table structure may be omitted, in which case it is inferred from the remote table.

The complex argument parsing previously living inside TableFunctionRemote::parseArguments is extracted into a reusable free function parseRemoteFunctionArguments, shared between the table functions and the new engines, so they accept exactly the same signatures with no duplicated logic. The engines construct a StorageDistributed over the resulting owned cluster.

Changelog category (leave one):

New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added the Remote and RemoteSecure table engines, the persistent counterparts of the remote and remoteSecure table functions. CREATE TABLE ... ENGINE = Remote('addresses', db, table, ...) now works in addition to CREATE TABLE ... AS remote(...).

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

🤖 Generated with Claude Code

Implements #93138. Previously, a table over a remote server built from an ad-hoc set of addresses could only be created via `CREATE TABLE ... AS remote(...)`, i.e. by reusing the `remote` and `remoteSecure` table functions. This adds the persistent counterparts as storage engines, so that CREATE TABLE t ENGINE = Remote('addresses', db, table[, user[, password], sharding_key]) CREATE TABLE t ENGINE = RemoteSecure(...) work as well. This restores the symmetry between the table functions and the storage engines: `Distributed` uses a configured cluster, while `Remote`/`RemoteSecure` build a cluster on the fly from the addresses, exactly like the `remote`/`remoteSecure` table functions. The complex argument parsing previously living inside `TableFunctionRemote::parseArguments` is extracted into a reusable free function `parseRemoteFunctionArguments`, which is shared between the table functions and the new engines. The engines construct a `StorageDistributed` over the resulting owned cluster, so they accept exactly the same signatures as the table functions, and the table structure may be omitted (it is then inferred from the remote table). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

clickhouse-gh · 2026-05-31T07:33:25Z

Workflow [PR], commit [2b912bb]

Summary: ❌

Performance Comparison: Performance dashboard

job_name	test_name	status	info	comment
Stateless tests (amd_debug, distributed plan, s3 storage, parallel)		FAIL
	03040_dynamic_type_alters_2_compact_merge_tree	FAIL	cidb

AI Review

Summary

This PR adds persistent Remote and RemoteSecure storage engines by reusing StorageDistributed and the shared remote argument parser, and it fixes most of the earlier issues around password masking, named-collection dependencies, settings validation, async inserts, and restore-time checks. I still do not think it is merge-ready: the latest restore fallback now accepts a privileged local table-function target when access cannot be proven if the target is temporarily absent, and the older local-merge(...) privilege gap remains in the shared table-function semantics.

Findings

❌ Blockers

[src/Storages/StorageDistributed.cpp:2571-2586] The new restore fallback is not fail-close for local table-function targets. Swallowing every non-ACCESS_DENIED failure means a restricted user can restore Remote('127.0.0.1', merge(db, '^protected$'), 'default') while the matching local table is absent, and the stored engine credentials will start reading it later as soon as that table is recreated. Suggested fix: do not silently accept fresh user-facing RESTORE when access to a local table-function target cannot be established; if historical-backup restorability needs a bypass, gate it behind a trusted/internal restore mode or setting instead of the default path.
[dismissed by author -- https://github.com/Support Remote and RemoteSecure storage engines #106189#discussion_r3362203807] [src/TableFunctions/TableFunctionRemote.cpp:56-59, src/Storages/StorageMerge.cpp:230-233, src/Storages/StorageMerge.cpp:1467-1472] Local table-function targets still do not mirror the plain-table SELECT/INSERT validation. For merge(...), schema resolution checks only SHOW_TABLES/SHOW_COLUMNS, while the real SELECT check is deferred until execution under the stored remote credentials. That still lets a creator with schema visibility but without SELECT install a Remote(..., merge(...), 'default') wrapper over local data.

Final Verdict

❌ Changes requested.

`registerStorageRemote` in `StorageDistributed.cpp` (part of the `dbms` library) called `parseRemoteFunctionArguments`, which was defined in `TableFunctionRemote.cpp` in `clickhouse_table_functions`. That library is not linked into every target that uses `dbms` (e.g. `unit_tests_dbms`), so the build failed with an undefined symbol: ld.lld-21: error: undefined symbol: DB::parseRemoteFunctionArguments(...) Move `parseRemoteFunctionArguments` and the `ParsedRemoteFunctionArguments` struct into a new file `Storages/Distributed/parseRemoteFunctionArguments.{h,cpp}`, which is part of `dbms`, so the symbol is available wherever `dbms` is. `TableFunctionRemote` keeps calling it through the new header (table functions already depend on storages). Also add the missing local forward declaration of `registerStorageRemote` required by `-Wmissing-prototypes`. CI: #106189 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

When a `Remote`/`RemoteSecure` table is created without an explicit structure, the structure is inferred from the remote table. Previously the inference was left to the `StorageDistributed` constructor, which uses the storage's global context. For a local target such as `Remote('127.0.0.1', db, table)` that bypassed the user-context `SHOW_COLUMNS` access check in `getStructureOfRemoteTableInShard`, so a user allowed to create a `Remote` table could infer the schema of a local table they are not allowed to describe. Infer the columns in `registerStorageRemote` under `args.getLocalContext()` (the user's context) and pass them to `StorageDistributed`, so the access check is enforced. CI: #106189 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The new `Remote` and `RemoteSecure` storage engines were not present in the `engine_to_type` map used by the ArrowFlight SQL server, so `test_arrowflight_interface/test_sql_server.py::test_get_table_types` failed with `Some engine(s) in system.table_engines are not mapped`. Map them to `REMOTE TABLE`, like `Distributed`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…gines `Remote` and `RemoteSecure` accept a positional `password`, but were not registered in `FunctionSecretArgumentsFinder::findTableEngineSecretArguments`, so `SHOW CREATE TABLE` / `system.tables.engine_full` printed the password instead of `[HIDDEN]`. Reuse `findRemoteFunctionSecretArguments`, which handles both the positional and the named-collection forms, since the engines accept exactly the same arguments as the `remote`/`remoteSecure` table functions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Setting `is_remote_function_` only adjusts the distributed client info; it does not reproduce the local-shard access checks that `TableFunctionRemote::executeImpl` performs. A user allowed to create `ENGINE = Remote('127.0.0.1:9000', secret_db, secret_table, 'default')` could otherwise force the remote path (e.g. with `prefer_localhost_replica = 0`), sending the query back to this server under the stored credentials without checking the caller's access on the local target. Mirror the table-function check in `registerStorageRemote`: when the built cluster contains a local shard, require the creating user to hold both `SELECT` and `INSERT` on the local target table (a persistent table can be used for both), under the user's context. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

When a persistent `Remote`/`RemoteSecure` table is created from a named collection (`ENGINE = Remote(my_collection, ...)`), the shared parser never passed the table's `StorageID` to `tryGetNamedCollectionWithOverrides`, so the table was not registered as a dependent of the collection. As a result, `DROP NAMED COLLECTION` was not blocked while the table existed and the dependency was not rebuilt on reload, which could leave a stored `Remote` table broken after reload or next use. Thread an optional `dependent_table_id` through `parseRemoteFunctionArguments` and pass `&args.table_id` from `registerStorageRemote`. Table-function callers leave it null, since their lifetime is bound to the query. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-06-03T07:39:28Z

The failures of 00175_obfuscator_schema_inference and 00096_obfuscator_save_load in Stateless tests (amd_tsan, parallel) are NOT caused by this PR.

They were introduced by #104690 ("Add UntrackedMemory asynchronous metric"), which made clickhouse-obfuscator abort (SIGABRT) on process teardown: the query results are correct, but the Aborted line on stderr fails the test. #104690 has now been reverted (#106365).

#104690 was merged in violation of the ClickHouse team rules: its own CI already showed these two tests failing (10 times between May 12 and June 1) before it was merged.

Please update your branch to pick up the revert; the tests should pass again.

For the persistent `Remote`/`RemoteSecure` engines, the local-shard access check applied `SELECT`/`INSERT` to `parsed.remote_table_id`. When the target is a table function (e.g. `numbers(...)`, `merge(...)`), that field is left at the parser default `system.one`, so the check validated the wrong object: it both spuriously required privileges on `system.one` and, with explicit columns, never analyzed the table function under the creating user's context. A query against such a persisted table could then be routed back to this server under the engine credentials, reading local data the creator was never checked for. Now, when the cluster contains a local shard and the target is a table function, the function is analyzed under the user's context even when the columns are given explicitly (`getStructureOfRemoteTableInShard` runs it through `getActualTableStructureWithAccess`, which performs the access check). The `SELECT`/`INSERT` check on `remote_table_id` is applied only for a plain table target. Added `04318_remote_storage_engine_access`, covering the persistent-engine security and lifecycle invariants: the local-shard `SELECT`/`INSERT` check with explicit columns, schema inference under the creating user's context, and the named-collection dependency that blocks `DROP NAMED COLLECTION` while the table exists. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-06-05T10:59:13Z

Updated in c3e41a8 (pushed onto the existing branch, no rebase):

Merged current master (the branch was ~543 commits behind), which picks up the revert of Add UntrackedMemory asynchronous metric #104690 — the 00175_obfuscator_schema_inference / 00096_obfuscator_save_load failures were caused by that PR and are unrelated here. Latest CI for the previous commit was already green.
Fixed the remaining AI-review blocker: for a table-function target on a local shard, the access check no longer validates the parser default system.one; the function is now analyzed under the creating user's context even when columns are explicit (getActualTableStructureWithAccess), and the SELECT/INSERT check on remote_table_id applies only to plain table targets. Harmless targets like numbers(...) are no longer rejected spuriously. (The view(SELECT ...) example specifically is unreachable via the engine — a nested SELECT in a table-function argument is a SYNTAX_ERROR in the storage-definition grammar.)
Added 04318_remote_storage_engine_access covering the persistent-engine security/lifecycle invariants: the local-shard SELECT/INSERT check with explicit columns, schema inference under the creating user's context, and the named-collection dependency blocking DROP NAMED COLLECTION.

Built and verified locally: both 04308_remote_storage_engine and the new 04318_remote_storage_engine_access pass, and the merge(db, '^secret$') table-function path is correctly rejected at CREATE without SELECT on the underlying table.

alexey-milovidov · 2026-06-14T15:56:22Z

The Fast test (arm_darwin) failure is the known macOS shard-test infrastructure issue, not caused by this PR.

04308_remote_storage_engine failed only on the case

CREATE TABLE t_remote_engine (dummy UInt8) ENGINE = Remote('127.0.0.{1,2}', system, one);
SELECT count() FROM t_remote_engine;

with Code: 209. DB::NetException: Timeout: connect timed out: 127.0.0.2:9000 ... (ALL_CONNECTION_TRIES_FAILED).

This is the standard shard-tag mechanism in clickhouse-test: shard tests are enabled only when the server's listen_host contains 127.0.0.2, but on the arm_darwin runner 127.0.0.2 is not actually reachable, so any shard test that connects to the second shard times out. Over the last 14 days, shard-tagged tests have 0 passes on arm_darwin — they are either skipped (when listen_host lacks 127.0.0.2) or fail (when it is configured but unroutable). Affected on dozens of unrelated PRs, e.g. 04336_parallel_blocks_marshalling_low_cardinality_native_format (86 fails across ~85 PRs) and 04327_rewrite_aggregate_function_with_if_distributed (45 fails). 04308 uses the identical 127.0.0.{1,2} pattern and shard tag as those accepted master tests, so no test change is warranted here.

I re-ran the failed job (workflow run 27483260206) — it passes when the runner skips the shard tests.

@groeneai, could you investigate / track the arm_darwin shard-test infrastructure failure (the runner advertises 127.0.0.2 in listen_host but 127.0.0.2:9000 is unroutable, so all shard-tagged tests fail when they actually connect to the second shard)? I could not find a dedicated tracking issue — #103235 covers a different arm_darwin problem (server crashes/hangs for 01404/04068). The fix likely belongs on the runner side (set up the 127.0.0.2/127.0.0.3 loopback aliases, or keep listen_host to 127.0.0.1/::1 so these tests skip consistently).

groeneai · 2026-06-14T16:20:27Z

@alexey-milovidov Confirmed: same root cause as the wider arm_darwin class. macOS binds only 127.0.0.1 on lo0 and does not auto-route the rest of 127.0.0.0/8, so 127.0.0.2:9000 is unroutable. The shard tag enables 04308's Remote('127.0.0.{1,2}', ...) case only when listen_host advertises 127.0.0.2, but that address is then unreachable, hence the Code 209 ALL_CONNECTION_TRIES_FAILED.

Scope over the last 7 days (Fast test arm_darwin, 127.0.0.2:9000 / ALL_CONNECTION_TRIES_FAILED): 15 distinct tests, 90 PRs, 165 failures, 0 on master.

Fix is in flight: PR #107435 adds a macOS Fast-test pre-hook aliasing 127.0.0.2..127.0.0.16 on lo0 (the server already listens on 0.0.0.0), so these tests run instead of skip. It also un-skips 5 parallel-replica remote() tests as proof. The same aliasing makes the shard-tag tests both satisfied and reachable, so 04308 is covered by it too. Open and mergeable, awaiting review.

On the tracking issue: a dedicated tracker is not strictly needed since #107435 fixes the whole class at once. If you would prefer one, I can open an issue enumerating the remaining darwin.skip / shard-tag tail to un-skip incrementally after the aliases land. 04327/04336 are already skipped on master via #107376.

`04308_remote_storage_engine` exercises the multi-shard case `Remote('127.0.0.{1,2}', system, one)`, which connects to the second loopback shard `127.0.0.2:9000`. On the `arm_darwin` Fast test runner, macOS binds only `127.0.0.1` on `lo0` and does not auto-route the rest of `127.0.0.0/8`, so `127.0.0.2` is unroutable and the query fails with `ALL_CONNECTION_TRIES_FAILED`. This is the same macOS shard-test infrastructure issue already handled for the sibling tests `04327_rewrite_aggregate_function_with_if_distributed` and `04336_parallel_blocks_marshalling_low_cardinality_native_format`, which were added to `ci/defs/darwin.skip` in #107376. `04308` is a new test introduced by this pull request, so it was not covered there. Add it to `ci/defs/darwin.skip` for consistency until the runner-side fix (#107435) aliases the additional loopback addresses and these tests can run on macOS again. The test itself is unchanged and keeps full coverage on Linux. Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106189&sha=dbc0fc28753abffe3096e1064be57a6c440263b6&name_0=PR&name_1=Fast%20test%20%28arm_darwin%29 Related: #107376 Related: #107435 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…rom metadata The `Remote`/`RemoteSecure` storage-engine creator analyzes a local-shard table-function target under the user's context to validate the creator's access to the function's underlying tables, even when the columns are given explicitly. This is a `CREATE`-time check, but the creator also runs when the table is loaded from already-validated metadata on server startup (`FORCE_ATTACH`) or `RESTORE` (`FORCE_RESTORE`). In that case re-running it is unnecessary and harmful: analyzing the target can throw if its underlying tables have changed since creation (e.g. the table matched by `merge(...)` was dropped), making a valid persisted table impossible to load. Guard the table-function inference and the local-shard `SELECT`/`INSERT` checks with `isLoadingFromExistingMetadata`, so they run for fresh definitions (`CREATE` and user `ATTACH`) but are skipped when loading from existing metadata. The structure is still inferred unconditionally when it was omitted, because then it is the table's only source of columns. Add an integration test that creates a `Remote` table over a local `view(...)` target, drops the table the view reads from, restarts the server, and checks that the `Remote` table still attaches. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Every integration test directory must contain an `__init__.py`; its absence made the `Style check` fail with: ./tests/integration/test_remote_storage_engine_attach/__init__.py should exist for every integration test Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A backup `RESTORE` of a `Remote`/`RemoteSecure` table that resolves to a local shard must run the same local-shard `SELECT`/`INSERT` access check that a direct `CREATE` does. Previously `args.is_restore_from_backup` was folded into the `loading_from_metadata` guard together with `isLoadingFromExistingMetadata`, so a normal `RESTORE` (which arrives with `SECONDARY_CREATE` and `is_restore_from_backup`) skipped the check. A user who could restore a table could therefore smuggle in `Remote('127.0.0.1', protected_db, protected_table, 'default')` and reach a local target they cannot access directly, even though a direct `CREATE` would be rejected. Split the guard: `loading_from_existing_metadata` (server startup only) gates the plain local-shard `SELECT`/`INSERT` check, while `skip_table_function_analysis` (startup or restore) still skips re-analyzing a table-function target whose underlying tables may be absent in the restore environment. Server startup keeps skipping both checks; a backup `RESTORE` now runs the plain check again. Addresses the AI-review blocker on the PR. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A persistent `Remote`/`RemoteSecure` table whose target is a table function (e.g. `numbers(...)`, `view(...)`) can be read, but cannot be written to: `remote_storage` is an empty `StorageID` whenever `remote_table_function_ptr` is set, so `DistributedSink` would build an `INSERT` into an empty table id instead of a real target. Such targets are read-only by nature. Make `StorageDistributed::write` throw `NOT_IMPLEMENTED` when `remote_table_function_ptr` is set, instead of producing a malformed query. This covers both the `remote`/`cluster` table functions and the `Remote`/`RemoteSecure` storage engines with a table-function target; no existing test inserts into such a target, so reading those targets is unaffected. Addresses the AI-review blocker on the PR. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

alexey-milovidov · 2026-06-20T17:55:36Z

Addressed the two remaining AI-review blockers (pushed onto the branch, no rebase): d3a8d8083b5..b909fc39534.

RESTORE local-target access check (0cafbd2bf4b): a backup RESTORE of Remote('127.0.0.1', db, table, ...) over a local shard now runs the same local-shard SELECT/INSERT check that a direct CREATE does. Previously args.is_restore_from_backup was folded into loading_from_metadata alongside isLoadingFromExistingMetadata, so a RESTORE (which arrives as SECONDARY_CREATE + is_restore_from_backup) skipped the check, letting a user who can restore reach a local target they cannot access directly even though a direct CREATE would be rejected. The guard is now split: loading_from_existing_metadata (server startup only) gates the plain SELECT/INSERT check, while skip_table_function_analysis (startup or restore) still skips re-analyzing a table-function target whose underlying tables may legitimately be absent in the restore environment — so the 04341 table-function-restore case keeps working.
INSERT into a table-function-target Remote (b909fc39534): StorageDistributed::write now throws NOT_IMPLEMENTED when remote_table_function_ptr is set, instead of letting DistributedSink build an INSERT into the empty StorageID. Reading such targets (e.g. Remote('127.0.0.1', numbers(10))) is unaffected, and no existing test inserts into a table-function target, so the read path is preserved.

Added the two requested regressions:

04401_remote_storage_engine_restore_access: a restricted user restoring Remote('127.0.0.1', protected_db, protected_target, 'default') with explicit columns and no SELECT/INSERT on the local target is rejected like CREATE, then succeeds once granted.
04402_remote_storage_engine_insert_table_function: reading a table-function-target Remote works; INSERT is rejected with NOT_IMPLEMENTED.

Built and verified locally: 04308, 04318, 04341, 04342, and the new 04401/04402 all pass.

…t queue A persistent `Remote`/`RemoteSecure` table builds an ad-hoc cluster stored in `owned_cluster`, exactly like the `remote()`/`cluster()` table functions. But `StorageDistributed::write` forced synchronous insertion whenever `owned_cluster` was set, so a persistent `Remote` table could never use the asynchronous `Distributed` queue even with `distributed_foreground_insert = 0`. As a result `fsync_after_insert`, `background_insert_*` and the validated `bytes_to_delay_insert` / `bytes_to_throw_insert` surface were silently ineffective for it. `owned_cluster` was only ever a proxy for "this is a `remote()` table function", which has no persistent storage to spool to. That condition is already captured precisely by `relative_data_path.empty()`: every table-function storage is temporary and has an empty data path, while a persistent `Remote` engine has a real data path. Key the forced-sync fallback on `relative_data_path.empty()` instead of `owned_cluster`, so the persistent engine inserts asynchronously like a regular `Distributed` table, while the table functions keep inserting synchronously (they still match `relative_data_path.empty()`). The asynchronous queue works for an owned cluster: `parseAddresses` resolves both the full and compact directory-name formats against `getCluster()` (which returns `owned_cluster`), and `createPool` honors `address.secure`, so `RemoteSecure` inserts stay secure. Addresses the AI-review blocker on the PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Resolve the conflict in `src/TableFunctions/TableFunctionRemote.cpp`: this branch moved the argument parsing out of `TableFunctionRemote::parseArguments` into the shared `parseRemoteFunctionArguments`, while master's #b09d93f9920 ("Fix null AST in remote() named-collection database override") changed the now-extracted code. Keep this branch's call site and port master's fix into `parseRemoteFunctionArguments`, so the `database =`/`db =` named-collection branch treats a non-table-function argument as a database name instead of assigning a null `remote_table_function_ptr`. Without the port the merge would have silently reverted that master bug fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-06-24T16:52:16Z

@groeneai, investigate and fix

Received signal 4 (signal sent by pid 692190695 from user 21854) (STID: 1781-3414)

This is surprising to me, didn't see it before.

groeneai · 2026-06-24T17:44:56Z

@alexey-milovidov signal 4 is SIGILL from the harmful.c trap on getservbyname (a non-reentrant libc function, replaced by __builtin_trap in debug/sanitizer builds). It is not in the Remote/RemoteSecure path.

Full stack (Stress test (amd_asan_ubsan), STID 1781-3414):

3.  getservbyname                              base/harmful/harmful.c:85
4.  Poco::Net::SocketAddress::resolveService   SocketAddress.cpp:394
6.  Poco::Net::SocketAddress::SocketAddress    SocketAddress.cpp:132
9.  DB::ClientInfo::read                        src/Interpreters/ClientInfo.cpp:238
10. DB::TCPHandler::processUnexpectedQuery      src/Server/TCPHandler.cpp:2560
11. DB::TCPHandler::receivePacketsExpectData    src/Server/TCPHandler.cpp:1230
12. DB::TCPHandler::skipData                     src/Server/TCPHandler.cpp:1301

Root cause: ClientInfo::read builds Poco::Net::SocketAddress(initial_address_string) directly from the wire string (ClientInfo.cpp:238). When the port part is non-numeric (or > 65535), Poco's resolveService falls back to getservbyname, which is trapped. initial_address is always serialized as numeric host:port via SocketAddress::toString(), so valid data never reaches this branch; a desynced or corrupted native-protocol stream does (here processUnexpectedQuery skipping an unexpected packet under stress + ast_fuzzer_any_query + fault injection). Since the trap is not a catchable exception, untrusted wire input is a hard server crash.

This is pre-existing and unrelated to this PR: it touches none of ClientInfo/SocketAddress/TCPHandler, the line dates to Feb 2025, and the same STID fires on the unrelated #97302.

Fix: in ClientInfo::read, validate the port is numeric before constructing the address and throw a catchable error otherwise, instead of letting Poco call getservbyname. I will open a separate fix PR referencing this one.

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106189&sha=ef34fc53dcda266cca006edea8332247405ff46c&name_0=PR&name_1=Stress%20test%20%28amd_asan_ubsan%29&name_1=Stress%20test%20%28amd_asan_ubsan%29

groeneai · 2026-06-24T18:50:34Z

Fix PR: #108410 (validates the wire address in ClientInfo::read so a non-numeric port can no longer reach the trapped getservbyname). Pre-existing on master, not caused by this PR.

# Conflicts: # ci/defs/darwin.skip

alexey-milovidov · 2026-06-30T01:16:55Z

Merged current master (the branch was 260 commits behind) and pushed onto the branch with no rebase: 1c70303ea57..cc55263b2b2.

The only conflict was in ci/defs/darwin.skip. master removed the whole block of distributed/remote shard/distributed-tagged tests from the macOS skip list — Raúl's isLocalAddress fix (f627c8478cc, keep 127.0.0.{2..255} non-local by value even when aliased to lo0) plus the macOS 127.0.0.{2..16} lo0 aliases that make those addresses reachable, batch-pruned in darwin-skip-prune-batch8. That block included 04327_rewrite_aggregate_function_with_if_distributed, which uses the identical remote('127.0.0.{1,2}', system.one) pattern as 04308's shard case (the one that timed out connecting to 127.0.0.2:9000 on arm_darwin before). I resolved the conflict by also dropping the 04308_remote_storage_engine entry, so it now runs on macOS like its siblings instead of staying skipped. The net diff against master is otherwise unchanged — exactly the engine's files.
Build-verified the merge: recompiled StorageDistributed.cpp, Storages/Distributed/parseRemoteFunctionArguments.cpp, registerStorages.cpp, TableFunctionRemote.cpp, and Server/ArrowFlight/commandSelector.cpp against current master — ninja exit 0, no API drift over the 260 commits.

The only remaining CI red on the previous commit was the chronic, master-wide stress hung-check (Stress test (amd_tsan): Cannot drop databases after stress tests. Probably server consumed too much memory and cannot execute simple queries → Server is not responding), tracked by #107941 — not on the Remote/RemoteSecure path. The fresh master merge re-runs it.

The AI-review Block is unchanged on purpose: it is the RESTORE-time table-function re-analysis skip in StorageDistributed.cpp, which you deliberately kept (documented in-code and in 0cafbd2bf4b) as a restorability-vs-strictness trade-off — the plain-table local-shard SELECT/INSERT check still runs on restore.

alexey-milovidov · 2026-06-30T15:21:16Z

CI re-ran on the same commit (cc55263b2b2). The stress hung-check I mentioned above (#107941) is no longer red; the only remaining failure is now a different, unrelated master-wide flake:

Stress test (arm_ubsan) → Logical error: Bad cast from type A to B (STID: 4054-600e), tracked by Logical error: Bad cast from type A to B (STID: 4054-600e) #107598 (open since 2026-06-16, first seen on Add cancellation check to ObjectStorage read operation #103016). Over the last 30 days it has failed on ~25 unrelated PRs and on master itself; its stack is entirely in MergeTreeIndexConditionMinMax / KeyCondition / RPNBuilder, none of which this PR touches.

So the PR is still green apart from an unrelated flake. The sole substantive blocker remains the AI-review Block — the deliberate RESTORE-time table-function re-analysis skip you documented in 0cafbd2bf4b (restorability-vs-strictness trade-off; the plain-table local-shard SELECT/INSERT check still runs on restore, covered by 04401_remote_storage_engine_restore_access.sh). No code change pushed for it — that trade-off is your call.

alexey-milovidov · 2026-07-01T07:40:44Z

CI has fully settled on the current commit (cc55263b2b2): the Stress test (arm_ubsan) flake #107598 I noted on 06-30 has re-run green, so the PR is now 100% green — 156 checks SUCCESS / 18 SKIPPED / 0 failures, with both required gates (Mergeable Check, CH Inc sync) passing. MERGEABLE/CLEAN. No re-merge is warranted: merge-base is <1 day old and master has touched none of this PR's files.

The sole remaining item is the AI-review ❌ Block (and the one open clickhouse-gh inline thread at StorageDistributed.cpp:2552), both about the RESTORE-time table-function re-analysis skip. I re-verified the current code against your 0cafbd2bf4b documentation: skip_table_function_analysis = loading_from_existing_metadata || is_restore_from_backup skips only the table-function re-analysis on restore, while the plain-table local-shard SELECT/INSERT check is gated on !loading_from_existing_metadata and still runs on RESTORE (covered by 04401_remote_storage_engine_restore_access.sh). That matches the restorability-vs-strictness trade-off you documented, so I left it as your design call and pushed no code change.

Ready to merge modulo that decision.

…ESTORE A backup `RESTORE` of a `Remote`/`RemoteSecure` table with a table-function target that resolves to a local shard (e.g. `Remote('127.0.0.1', merge(db, '^protected$'), 'default')`) previously skipped the table-function analysis unconditionally, because `args.is_restore_from_backup` was folded into `skip_table_function_analysis`. A restricted user who may restore could therefore restore such a table with explicit columns and reach a local target they cannot access directly, even though a direct `CREATE` runs the analysis under the user's context (via `getStructureOfRemoteTable` -> `getActualTableStructureWithAccess`) and would be rejected. The plain-target `SELECT`/`INSERT` check was already enforced on restore; this closes the same hole for table-function targets. Run the analysis on restore as well, but fail closed only on an access-control error (`ACCESS_DENIED`): the target's underlying tables may legitimately be absent in the restore environment (e.g. the table matched by `merge(...)` was dropped since the backup was taken), and a valid persisted table must still be restorable in that case (`04341_remote_storage_engine_restore`). An access failure is the exact case a direct `CREATE` rejects and the only one that could leak a local target; any other failure means the target cannot be analyzed (and therefore cannot be read either), so the restore proceeds with the columns carried in the backup metadata. Add `04489_remote_storage_engine_restore_tf_access` covering both: a restricted user's restore of `Remote(..., merge('^protected$'), ...)` is rejected without target access and allowed with it, while a restore over a genuinely absent target still succeeds. Addresses the AI-review blocker on the PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-07-02T06:16:52Z

Addressed the remaining AI-review blocker (pushed onto the branch, no rebase): cc55263b2b2..2b912bb2a79.

RESTORE access check for table-function targets (StorageDistributed.cpp): a backup RESTORE of a Remote/RemoteSecure table whose table-function target resolves to a local shard now runs the same analysis a direct CREATE does, closing the last gap the plain-target fix (0cafbd2bf4b) had left open. Previously args.is_restore_from_backup was folded into skip_table_function_analysis, so a restricted user who may restore could smuggle in Remote('127.0.0.1', merge(db, '^protected$'), 'default') (or view(SELECT ... FROM secret)) with explicit columns and reach a local target they cannot access directly.

The analysis now runs on restore too, failing closed only on ACCESS_DENIED — the exact case a direct CREATE rejects and the only one that could leak a local target. Any other failure is tolerated, because the target's underlying tables may legitimately be absent in the restore environment (e.g. the table matched by merge(...) was dropped since the backup was taken) and a valid persisted table must remain restorable; in that case there is nothing to leak. This keeps 04341_remote_storage_engine_restore green.

Regression added: 04489_remote_storage_engine_restore_tf_access — a restricted user's restore of Remote(..., merge('^protected$'), ...) is rejected without target access, allowed with it, and a restore over a genuinely absent target still succeeds.

Verified locally against a Release build (cc55263+ this commit): 04489, 04341, 04318, 04401, 04308 all pass (5/5).

The inline AI-review thread on StorageDistributed.cpp is replied to and resolved.

alexey-milovidov · 2026-07-02T06:20:23Z

To be explicit about the earlier "that trade-off is your design call" framing: this commit is offered as a concrete proposal, not an override — if you prefer to keep the previous unconditional skip on restore, 2b912bb2a79 reverts cleanly on its own (it is a separate commit, no rebase/amend).

The key point is that it does not reintroduce the un-restorable-valid-table regression that motivated the skip: the restore still tolerates a genuinely absent table-function target (any non-ACCESS_DENIED analysis failure is swallowed), so 04341_remote_storage_engine_restore stays green. It only fails closed when the analysis reports ACCESS_DENIED — the exact case a direct CREATE rejects. So it satisfies both the restorability constraint you documented and the access-control invariant the AI review is asking for.

clickhouse-gh · 2026-07-02T06:36:13Z

+            /// When the structure was carried in the definition, the analysis runs purely for its access
+            /// side effect, so on restore a non-access failure (absent target) may be tolerated. When the
+            /// structure was omitted, the analysis is the only source of columns and must always succeed.
+            const bool tolerate_absent_target = !columns.empty() && args.is_restore_from_backup;


When this RESTORE path hits any local table-function-target failure other than ACCESS_DENIED, it silently trusts the persisted columns and keeps the definition. That is not fail-closed for a fresh user-driven restore: if Remote('127.0.0.1', merge(db, '^protected$'), 'default') is restored while db.protected is temporarily absent, the restoring user never proves access to that local target, yet the stored engine credentials will start reading it as soon as a matching table is recreated later. Please make the default RESTORE path reject local table-function targets whose access cannot be established, or gate this bypass behind an explicitly trusted/internal restore mode instead of swallowing every non-access exception here.

groeneai · 2026-07-02T07:01:06Z

Recommendation: keep 2b912bb2a79. Revert is strictly worse (it reopens the present-target leak this commit closes), and the residual the bot describes is not fully closable for a table-function target without breaking restorability (04341).

The two branches fail closed differently, and that is the crux:

Plain-table branch (checkAccess(SELECT/INSERT, remote_table_id)) is grant-based, so it fails closed even for an absent target: a grant exists independently of whether the table currently exists. It already runs unconditionally on restore.
Table-function branch fails closed only when it can enumerate what the function resolves to. getStructureOfRemoteTable -> getActualTableStructureWithAccess checks access on the tables merge('^protected$') matches. When those tables are absent at restore, there is nothing to enumerate and no grant set to check. You cannot prove or deny access to a target you cannot resolve.

That is why 04341 (valid table, target legitimately gone, must restore) and the bot's "absent-now, present-later" vector are indistinguishable at restore time from the definition alone: the only discriminator is a future event plus the restoring user's trust level, neither knowable when the analysis runs.

On the bot's vector: it is real but lower severity than the one you closed. It needs (1) the restoring user to already hold a backup with privileged engine credentials baked in, which only someone who already had access to the target could have produced and handed over, (2) the target absent at restore, and (3) the target reappearing later. That is the same trust surface as "Remote persists credentials and reads under them," not a new escalation primitive. The commit closes the immediate, no-future-event case (present target, ACCESS_DENIED fatal), which is the higher-value fix.

If you want the clickhouse-gh thread cleared rather than documented as an accepted structural limit, the only principled closure is the bot's second option: gate the non-ACCESS_DENIED swallow behind a trusted/internal restore path, so a user-driven restore fails closed on an unresolved local table-function target while a system/admin restore keeps 04341's tolerance (04341 restores as the default user, so it stays green; 04489's absent-target case would move to restricted-user-rejected). The bot's first option (reject all unresolved targets on the default path) would break 04341. That gate is a separable change; I would not fold it into this PR unless you want it here, since the current commit is a strict improvement on its own.

Net: keep it. Either document the table-function residual as a known restore limitation, or follow up with the trusted-restore gate if the block must clear.

clickhouse-gh · 2026-07-02T09:57:16Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	85.50%	85.50%	+0.00%
Functions	92.60%	92.60%	+0.00%
Branches	77.70%	77.70%	+0.00%

Changed lines: Changed C/C++ lines covered: 292/297 (98.32%) · Uncovered code

Full report · Diff report

alexey-milovidov · 2026-07-02T11:51:35Z

The sole remaining CI red on 2b912bb2a79 is a stateless-test failure unrelated to this PR:

Stateless tests (amd_debug, distributed plan, s3 storage, parallel) → 03040_dynamic_type_alters_2_compact_merge_tree ("result differs with reference": Dynamic subcolumns read back as \N instead of their values).

This is a chronic, master-wide flake, not on the Remote/RemoteSecure path. Over the last 30 days it has failed 9 times — once on master itself (pull_request_number = 0, 2026-07-01) and on 7 unrelated PRs (#107934, #101841, #106734, #102402, #107300, #103158). This PR touches only the Remote/Distributed argument parsing and access-check code — none of the Dynamic-type serialization or compact-part merge logic the test exercises.

Its CI auto-tracking issue #105306 was closed on 2026-05-19, but the test has since regressed (9 failures in the last 30 days), so the earlier fix no longer holds.

@groeneai, please investigate 03040_dynamic_type_alters_2_compact_merge_tree (reopen or supersede #105306) and provide a fix in a separate PR; if a fix is already in progress, link it here.

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106189&sha=2b912bb2a792c787bbbd776bd7252686d47ee875&name_0=PR&name_1=Stateless%20tests%20%28amd_debug%2C%20distributed%20plan%2C%20s3%20storage%2C%20parallel%29

groeneai · 2026-07-02T14:58:01Z

@alexey-milovidov thanks, agreed this is unrelated to the Remote/RemoteSecure work here.

I investigated 03040_dynamic_type_alters_2_compact_merge_tree. It is the same class as #80648 (data loss on ALTER RENAME COLUMN racing a merge). A fix is already in progress: #104822 (open, mergeable). Confirmed root cause by local reproduction: a background merge (OPTIMIZE ... FINAL) selected concurrently with the async rename observes the new column name in metadata but no matching rename AlterConversions, reads old parts by the new name, finds no file, and writes the merged part without the Dynamic column. Reads then return all \N (the exact 15 None from the report).

Reproducer (compact part, Dynamic column with no default, 5 parts, RENAME COLUMN d TO d1 SETTINGS mutations_sync=0 concurrent with OPTIMIZE TABLE t FINAL) fails on ~8% of iterations on master.

Important finding: #104822 in its current form does not fully close the Dynamic case. With #104822 applied the failure rate drops ~8x (from ~8% to ~1%, 4/400 iterations) but is still nonzero. #104822 makes the alter() side atomic (metadata publish + mutation registration under one lock), which fixes the primary window. The residual is on the merge side: the merge captures its metadata_snapshot at selection time (StorageMergeTree::merge) and its mutations_snapshot later in MergeTask prepare, so the two can still straddle the rename. Observed the inconsistent pair directly: storage_columns = {x, y, d} (metadata still shows the old name d) while the per-part AlterConversions maps d -> d1 (rename already in the mutations snapshot). d1 is absent from the merge output schema, so the Dynamic column is dropped from the merged part.

Fix direction: capture the merge's metadata and mutations snapshots as a consistent pair (e.g. under currently_processing_in_background_mutex at selection, threaded into the MergeTask), so the merge cannot combine old-name metadata with a rename conversion. I will extend #104822 with this and re-verify the reproducer at 0 failures. Will link the update here.

groeneai · 2026-07-02T18:33:44Z

Fix pushed. 03040_dynamic_type_alters_2_compact_merge_tree is the ALTER RENAME COLUMN vs merge race (#80648); the residual for the Dynamic (no-default) case is now closed in #104822 (commit 1057c91), which I extended rather than opening a separate PR since it is the same race and that PR is open.

Root cause of the residual: a merge applies the pending rename to its source parts on-fly, but the merged part kept the source mutation version, so the rename mutation ran again on the already-renamed compact part, re-read the column by its old name (gone), and wrote it empty. The fix stamps a merged part with the metadata-mutation version it applies on-fly, so the rename is not re-scheduled. 0 data loss over 2366 iterations of the reproducer (was ~8% on master); details on #104822.

groeneai · 2026-07-03T01:25:52Z

clickhouse-gh Bot added the pr-feature Pull request with new product feature label May 31, 2026

clickhouse-gh Bot reviewed May 31, 2026

View reviewed changes

Comment thread src/Storages/StorageDistributed.cpp

alexey-milovidov and others added 3 commits June 1, 2026 09:01

Merge remote-tracking branch 'origin/master' into remote-storage-engine

1516512

clickhouse-gh Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread src/Storages/StorageDistributed.cpp

clickhouse-gh Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread src/Storages/StorageDistributed.cpp

clickhouse-gh Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread src/Storages/Distributed/parseRemoteFunctionArguments.cpp Outdated

alexey-milovidov and others added 5 commits June 3, 2026 00:39

Merge remote-tracking branch 'origin/master' into remote-storage-engine

4e9f5a0

clickhouse-gh Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread tests/queries/0_stateless/04308_remote_storage_engine.sql

Merge branch 'master' into remote-storage-engine

41f364b

clickhouse-gh Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread src/Storages/StorageDistributed.cpp Outdated

alexey-milovidov and others added 2 commits June 5, 2026 10:15

Merge remote-tracking branch 'origin/master' into remote-storage-engine

93ca5f6

clickhouse-gh Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread src/Storages/StorageDistributed.cpp Outdated

Merge remote-tracking branch 'origin/master' into remote-storage-engine

dbc0fc2

clickhouse-gh Bot reviewed Jun 15, 2026

View reviewed changes

Comment thread src/Storages/StorageDistributed.cpp Outdated

alexey-milovidov and others added 2 commits June 16, 2026 08:31

clickhouse-gh Bot reviewed Jun 19, 2026

View reviewed changes

Comment thread src/Storages/StorageDistributed.cpp

clickhouse-gh Bot reviewed Jun 19, 2026

View reviewed changes

Comment thread src/Storages/StorageDistributed.cpp Outdated

alexey-milovidov mentioned this pull request Jun 20, 2026

Use text index for LIKE/ILIKE with ESCAPE #105848

Open

1 task

alexey-milovidov and others added 2 commits June 20, 2026 17:53

clickhouse-gh Bot reviewed Jun 20, 2026

View reviewed changes

Comment thread src/Storages/StorageDistributed.cpp

clickhouse-gh Bot reviewed Jun 21, 2026

View reviewed changes

Comment thread src/Storages/StorageDistributed.cpp

groeneai mentioned this pull request Jun 24, 2026

Validate the wire address in ClientInfo::read to avoid a SIGILL #108410

Merged

alexey-milovidov added 2 commits June 29, 2026 16:31

Merge remote-tracking branch 'origin/master' into remote-storage-engine

1c70303

Merge remote-tracking branch 'origin/master' into remote-storage-engine

cc55263

# Conflicts: # ci/defs/darwin.skip

clickhouse-gh Bot reviewed Jul 2, 2026

View reviewed changes

groeneai mentioned this pull request Jul 2, 2026

Fix race between ALTER RENAME COLUMN and merge causing data loss (#80648) #104822

Merged

1 task

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

alexey-milovidov commented May 31, 2026

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Uh oh!

clickhouse-gh Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Findings

Final Verdict

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alexey-milovidov commented Jun 3, 2026

Uh oh!

Uh oh!

alexey-milovidov commented Jun 5, 2026

Uh oh!

Uh oh!

alexey-milovidov commented Jun 14, 2026

Uh oh!

groeneai commented Jun 14, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alexey-milovidov commented Jun 20, 2026

Uh oh!

Uh oh!

Uh oh!

alexey-milovidov commented Jun 24, 2026

Uh oh!

groeneai commented Jun 24, 2026

Uh oh!

groeneai commented Jun 24, 2026

Uh oh!

alexey-milovidov commented Jun 30, 2026

Uh oh!

alexey-milovidov commented Jun 30, 2026

Uh oh!

alexey-milovidov commented Jul 1, 2026

Uh oh!

alexey-milovidov commented Jul 2, 2026

Uh oh!

alexey-milovidov commented Jul 2, 2026

Uh oh!

clickhouse-gh Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

groeneai commented Jul 2, 2026

Uh oh!

clickhouse-gh Bot commented Jul 2, 2026

LLVM Coverage Report

Uh oh!

alexey-milovidov commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

groeneai commented Jul 2, 2026

Uh oh!

groeneai commented Jul 2, 2026

Uh oh!

groeneai commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

clickhouse-gh Bot commented May 31, 2026 •

edited

Loading

alexey-milovidov commented Jul 2, 2026 •

edited

Loading