add option skip_unavailable_shards_mode#79091
Conversation
42bdba9 to
13d4a16
Compare
a12437d to
c6b23f3
Compare
05d6eb2 to
dab4bbd
Compare
dab4bbd to
122e82d
Compare
ecac654 to
e20a9e8
Compare
e20a9e8 to
3ecd33a
Compare
|
Merged |
…y block
The `02995_new_settings_history` Fast test failed with:
PLEASE ADD THE NEW SETTING TO SettingsChangesHistory.cpp: skip_unavailable_shards_mode WAS ADDED
`master` has bumped to version `26.7.1` and the test's baseline dump is now
`02995_settings_26_6_1.tsv`. The test requires every setting that is present in
the current binary but absent from the `26.6.1` release to be recorded in
`SettingsChangesHistory.cpp` with a version greater than `26.6`. The
`skip_unavailable_shards_mode` entry was in the `26.6` block, which no longer
satisfies the check, so move it to the `26.7` block.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Fixed the |
|
Merged The This is a fresh master-wide regression: STID The merge was clean (0 conflicts); the feature diff (16 core files) and the |
`skip_unavailable_shards_mode` was only consulted where a server `Exception` packet is received (`RemoteQueryExecutor::shouldIgnoreShardException` on `SELECT`, `DistributedSink` on `INSERT`). But a missing table on a `SELECT` is detected before any query is sent: `ConnectionEstablisher` drops a replica whose table is absent from `getTablesStatus`, `getManyChecked` is allowed to return zero connections once `skip_unavailable_shards` is set, and `SelectStreamFactory` skips a missing local table on `skip_unavailable_shards` alone. All three skipped the shard without consulting the mode, so `skip_unavailable_shards_mode = 'unavailable'` still silently dropped a shard with a missing table, contradicting its contract (only connection failures should be ignored; a missing table is ignored only by `unavailable_or_table_missing` or `unavailable_or_exception_before_processing`). Record the missing-table reason on `PoolWithFailoverBase::TryResult` at the single detection point (`ConnectionEstablisher`), then enforce the contract at the shard level: `ConnectionPoolWithFailover::getManyChecked` and `HedgedConnectionsFactory` throw `UNKNOWN_TABLE` when the shard ends up with no usable connections solely because of a missing table and the mode is `unavailable`. Doing this at the shard level (only when the result is empty) preserves intra-shard failover to a replica that does have the table. `SelectStreamFactory`'s local-table skip is gated the same way and otherwise falls through to the local stream so the query fails the usual way. Add a `SELECT` + missing-table regression test for all three modes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… skips" This reverts commit 0e1c576.
…e_shards_mode # Conflicts: # ci/defs/darwin.skip
Update (continuing the PR)Reverted the The test's premise does not hold for Rather than ship an unverified change to the connection-pool failover path ( Merged Pushed Remaining CI reds are all unrelated, known flakes:
The standing AI |
| /// Throws `TOO_MANY_UNAVAILABLE_SHARDS` if the configured `max_skip_unavailable_shards_num` / | ||
| /// `max_skip_unavailable_shards_ratio` limits are exceeded, so the safety bounds apply to every | ||
| /// silently skipped shard regardless of why it was skipped (no connections or an ignored exception). | ||
| if (unavailable_shard_tracker) |
There was a problem hiding this comment.
reportShardSkipped only enforces max_skip_unavailable_shards_num / max_skip_unavailable_shards_ratio when a tracker was attached, but the new exception-skip logic is used by RemoteQueryExecutor instances outside the SELECT path too. The default optimized INSERT INTO distributed_dst SELECT ... FROM distributed_src path (parallel_distributed_insert_select = 2) constructs a RemoteQueryExecutor in StorageDistributed::distributedWriteBetweenDistributedTables without calling setUnavailableShardTracker. If two destination shards return UNKNOWN_TABLE or another before-data exception with skip_unavailable_shards = 1, both can be converted to empty results and the INSERT succeeds even with max_skip_unavailable_shards_num = 1, silently dropping rows. Please attach a shard-level tracker in that path (and the analogous distributed INSERT SELECT remote-executor paths), or disable exception-based skipping there until the limits can be enforced.
Continue-PR check on
|
Continue-PR check on
|
…e_shards_mode # Conflicts: # ci/defs/darwin.skip # src/QueryPipeline/RemoteQueryExecutor.cpp
Continue-PR: re-merged
|
| | `fsync_directories` | Do the `fsync` for directories. Guarantees that the OS refreshed directory metadata after operations related to background inserts on Distributed table (after insert, after sending the data to shard, etc.). | `false` | | ||
| | `skip_unavailable_shards` | If true, ClickHouse silently skips unavailable shards. Shard is marked as unavailable when: 1) The shard cannot be reached due to a connection failure. 2) Shard is unresolvable through DNS. 3) Table does not exist on the shard. | `false` | | ||
| | `skip_unavailable_shards` | If true, ClickHouse silently skips unavailable shards. The behavior of this setting is controlled by the `skip_unavailable_shards_mode` parameter. | `false` | | ||
| | `skip_unavailable_shards_mode` | Controls which exceptions from a remote shard are ignored when `skip_unavailable_shards` is enabled: `unavailable` ignores only connection errors; `unavailable_or_table_missing` also ignores a missing table or database; `unavailable_or_exception_before_processing` also ignores any exception received before the shard returned data. | `unavailable_or_table_missing` | |
There was a problem hiding this comment.
skip_unavailable_shards_mode is documented here as a Distributed engine setting, but the optimized destination INSERT ... SELECT paths still bypass that contract. StorageDistributed::distributedWriteBetweenDistributedTables and distributedWriteFromClusterStorage both build query_context = Context::createCopy(local_context) and never merge the destination table's DistributedSetting::skip_unavailable_shards / skip_unavailable_shards_mode, so INSERT INTO dist_dst SELECT ... ignores the table-level setting unless the query repeats it.
distributedWriteFromClusterStorage is worse: it calls getMany(..., skip_unavailable_endpoints = true) unconditionally, so an unavailable destination replica can be silently dropped even when skip_unavailable_shards = 0. That makes the new setting path-dependent depending on whether the optimizer takes the shard-local distributed INSERT ... SELECT fast path.
|
Continue-PR: the only failing CI check on the current head (
|
Continue-PR check on
|
`04318_skip_unavailable_shards_mode` and `04319_skip_unavailable_shards_mode_table_missing` were added to `ci/defs/darwin.skip` because they rely on `127.0.0.2+` being reachable, which macOS does not auto-route. Master now aliases `127.0.0.2-21` on `lo0` for the macOS Fast test (`ci/jobs/scripts/fast_test_darwin.sh`), so these tests run there like the other distributed tests (e.g. `04320_skip_unavailable_shards_mode_distributed_engine`, which was never skipped). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per review, drop the `log`-may-be-null note that just restated the `if (log)` guard, remove the duplicated inline note about `reportShardSkipped` enforcing the skip limits (the function and its declaration already document that), and shorten the remaining explanations. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`shouldIgnoreShardException` in `unavailable_or_exception_before_processing` mode ignored any exception that arrived before the shard returned data, including `LOGICAL_ERROR`. That denotes a programming error rather than an expected shard failure, so silencing it would hide real bugs. Rethrow it in every mode, matching the foreground INSERT path (`shouldSkipShardOnInsert`). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered: 164/181 (90.61%) · Uncovered code |

Closes: #79014
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Added the
skip_unavailable_shards_modesetting (also available as aDistributedengine setting) to control which exceptions from a remote shard are silently ignored whenskip_unavailable_shardsis enabled. This provides finer control over query behavior in distributed environments.Documentation entry for user-facing changes
Motivation
In distributed environments, shards can become unavailable due to network issues, DNS resolution failures, or other transient errors. The
skip_unavailable_shardssetting allows ClickHouse to silently skip such shards, and it historically also skipped a shard whose table does not exist. The newskip_unavailable_shards_modesetting makes this behavior explicit and provides more granular control over how ClickHouse handles exceptions from unavailable shards.Parameters:
skip_unavailable_shards_modeControls which exceptions from a remote shard are silently ignored when
skip_unavailable_shardsis enabled. It has no effect whenskip_unavailable_shards = 0.Possible values:
unavailable: Only connection-related errors are ignored (the shard cannot be reached, or a replica's hostname cannot be resolved through DNS).unavailable_or_table_missing(default): In addition tounavailable, errors caused by a missing table or database on the shard are ignored. This matches the historical behavior ofskip_unavailable_shards.unavailable_or_exception_before_processing: In addition tounavailable, any exception received from a shard before it returned any data is ignored. An exception that arrives after the shard already returned some data is always rethrown, so partial results are never silently accepted.The number of silently skipped shards is bounded by
max_skip_unavailable_shards_num/max_skip_unavailable_shards_ratioand is reported through theDistributedShardsSkippedprofile event.Example use:
In this example, if any shard throws an exception before returning data, ClickHouse will ignore it and return results based on the remaining shards.
The setting can also be applied at the
Distributedengine level during table creation:Version info
26.7.1.463