add option `skip_unavailable_shards_mode` by byte-sourcerer · Pull Request #79091 · ClickHouse/ClickHouse · GitHub
Skip to content

add option skip_unavailable_shards_mode#79091

Merged
alexey-milovidov merged 55 commits into
ClickHouse:masterfrom
byte-sourcerer:cjw/skip_unavailable_shards_mode
Jul 3, 2026
Merged

add option skip_unavailable_shards_mode#79091
alexey-milovidov merged 55 commits into
ClickHouse:masterfrom
byte-sourcerer:cjw/skip_unavailable_shards_mode

Conversation

@byte-sourcerer

@byte-sourcerer byte-sourcerer commented Apr 13, 2025

Copy link
Copy Markdown
Contributor

Closes: #79014

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Added the skip_unavailable_shards_mode setting (also available as a Distributed engine setting) to control which exceptions from a remote shard are silently ignored when skip_unavailable_shards is enabled. This provides finer control over query behavior in distributed environments.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Motivation

In distributed environments, shards can become unavailable due to network issues, DNS resolution failures, or other transient errors. The skip_unavailable_shards setting allows ClickHouse to silently skip such shards, and it historically also skipped a shard whose table does not exist. The new skip_unavailable_shards_mode setting makes this behavior explicit and provides more granular control over how ClickHouse handles exceptions from unavailable shards.

Parameters:

  • skip_unavailable_shards_mode
    Controls which exceptions from a remote shard are silently ignored when skip_unavailable_shards is enabled. It has no effect when skip_unavailable_shards = 0.

Possible values:

  • unavailable: Only connection-related errors are ignored (the shard cannot be reached, or a replica's hostname cannot be resolved through DNS).
  • unavailable_or_table_missing (default): In addition to unavailable, errors caused by a missing table or database on the shard are ignored. This matches the historical behavior of skip_unavailable_shards.
  • unavailable_or_exception_before_processing: In addition to unavailable, any exception received from a shard before it returned any data is ignored. An exception that arrives after the shard already returned some data is always rethrown, so partial results are never silently accepted.

The number of silently skipped shards is bounded by max_skip_unavailable_shards_num / max_skip_unavailable_shards_ratio and is reported through the DistributedShardsSkipped profile event.

Example use:

SELECT * FROM distributed_table
SETTINGS skip_unavailable_shards = 1, skip_unavailable_shards_mode = 'unavailable_or_exception_before_processing';

In this example, if any shard throws an exception before returning data, ClickHouse will ignore it and return results based on the remaining shards.

The setting can also be applied at the Distributed engine level during table creation:

CREATE TABLE distributed_table
(
    `ID` UInt32
)
ENGINE = Distributed(test_cluster, default, local_table, rand())
SETTINGS skip_unavailable_shards = 1, skip_unavailable_shards_mode = 'unavailable_or_exception_before_processing';

Version info

  • Merged into: 26.7.1.463

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Apr 13, 2025
@clickhouse-gh

clickhouse-gh Bot commented Apr 13, 2025

Copy link
Copy Markdown
Contributor

@CLAassistant

CLAassistant commented Apr 16, 2025

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@byte-sourcerer byte-sourcerer force-pushed the cjw/skip_unavailable_shards_mode branch from 42bdba9 to 13d4a16 Compare April 16, 2025 14:32
@clickhouse-gh clickhouse-gh Bot added the pr-feature Pull request with new product feature label Apr 16, 2025
@byte-sourcerer byte-sourcerer force-pushed the cjw/skip_unavailable_shards_mode branch from a12437d to c6b23f3 Compare April 17, 2025 04:27
@byte-sourcerer byte-sourcerer marked this pull request as ready for review April 18, 2025 06:11
@byte-sourcerer byte-sourcerer marked this pull request as draft April 18, 2025 09:48
@byte-sourcerer byte-sourcerer marked this pull request as ready for review April 18, 2025 14:21
@byte-sourcerer byte-sourcerer force-pushed the cjw/skip_unavailable_shards_mode branch 2 times, most recently from 05d6eb2 to dab4bbd Compare April 21, 2025 09:17
@byte-sourcerer byte-sourcerer force-pushed the cjw/skip_unavailable_shards_mode branch from dab4bbd to 122e82d Compare April 29, 2025 07:35
@byte-sourcerer byte-sourcerer force-pushed the cjw/skip_unavailable_shards_mode branch from ecac654 to e20a9e8 Compare May 19, 2025 13:48
@byte-sourcerer byte-sourcerer force-pushed the cjw/skip_unavailable_shards_mode branch from e20a9e8 to 3ecd33a Compare June 6, 2025 09:29
@alexey-milovidov

Copy link
Copy Markdown
Member

Merged master into the branch (4d21119b9dc..70819c9a2af), per @groeneai's diagnosis: the 6h hang in Stateless tests (amd_tsan, parallel, 2/2) was the stale branch taking the old inline-lldb teardown path and leaking orphans instead of failing fast — it was missing the harness teardown fixes already on master (#104771, #106183, #107602, kill_process_group, check_server_liveness retries). The merge was conflict-free; the feature diff is intact and the SettingsChangesHistory entry is unchanged. CI is re-running on the merged tip, so a future TSAN stall will now fail fast (or report a real Server died) rather than burning the full 6h.

…y block

The `02995_new_settings_history` Fast test failed with:

    PLEASE ADD THE NEW SETTING TO SettingsChangesHistory.cpp: skip_unavailable_shards_mode WAS ADDED

`master` has bumped to version `26.7.1` and the test's baseline dump is now
`02995_settings_26_6_1.tsv`. The test requires every setting that is present in
the current binary but absent from the `26.6.1` release to be recorded in
`SettingsChangesHistory.cpp` with a version greater than `26.6`. The
`skip_unavailable_shards_mode` entry was in the `26.6` block, which no longer
satisfies the check, so move it to the `26.7` block.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member

Fixed the 02995_new_settings_history Fast test failure (commit 5be1cc3). master has bumped to 26.7.1 and the test's baseline dump is now 02995_settings_26_6_1.tsv, which requires any setting absent from the 26.6.1 release to be recorded in SettingsChangesHistory.cpp with a version greater than 26.6. The skip_unavailable_shards_mode entry was in the 26.6 block, so it was moved to the 26.7 block. No behavior change. Approved the gated CI run to validate.

@alexey-milovidov

Copy link
Copy Markdown
Member

Merged master into the branch (5be1cc3449c792ae5a60f2) to clear the sole CI failure.

The Stateless tests (amd_tsan, parallel, 2/2) red (Server died + ThreadSanitizer: data race (STID: 4071-3348)) was not caused by this PR. The race is in std::unique_ptr<MemoryReservation>::reset along the JOIN-spilling teardown path (SpillingHashJoin / BlockIO), with no relation to skip_unavailable_shards_mode (which touches RemoteQueryExecutor/DistributedSink/executeQuery).

This is a fresh master-wide regression: STID 4071-3348 first appeared 2026-06-24 05:08 and hit 49 PRs plus master itself (pull_request_number = 0) through 2026-06-25 17:00. It is fixed on master by #108391 (fix-memory-reservation-release-race, merged 2026-06-25 23:24), which the branch's previous merge-base (4963a1b95e6, 2026-06-24 09:22) predated. The merge brings #108391 in; no master occurrences of the race after the fix landed.

The merge was clean (0 conflicts); the feature diff (16 core files) and the SettingsChangesHistory 26.7 entry are intact. No functional change in this update.

alexey-milovidov and others added 4 commits June 28, 2026 18:16
`skip_unavailable_shards_mode` was only consulted where a server `Exception`
packet is received (`RemoteQueryExecutor::shouldIgnoreShardException` on
`SELECT`, `DistributedSink` on `INSERT`). But a missing table on a `SELECT`
is detected before any query is sent: `ConnectionEstablisher` drops a replica
whose table is absent from `getTablesStatus`, `getManyChecked` is allowed to
return zero connections once `skip_unavailable_shards` is set, and
`SelectStreamFactory` skips a missing local table on `skip_unavailable_shards`
alone. All three skipped the shard without consulting the mode, so
`skip_unavailable_shards_mode = 'unavailable'` still silently dropped a shard
with a missing table, contradicting its contract (only connection failures
should be ignored; a missing table is ignored only by
`unavailable_or_table_missing` or `unavailable_or_exception_before_processing`).

Record the missing-table reason on `PoolWithFailoverBase::TryResult` at the
single detection point (`ConnectionEstablisher`), then enforce the contract at
the shard level: `ConnectionPoolWithFailover::getManyChecked` and
`HedgedConnectionsFactory` throw `UNKNOWN_TABLE` when the shard ends up with no
usable connections solely because of a missing table and the mode is
`unavailable`. Doing this at the shard level (only when the result is empty)
preserves intra-shard failover to a replica that does have the table.
`SelectStreamFactory`'s local-table skip is gated the same way and otherwise
falls through to the local stream so the query fails the usual way.

Add a `SELECT` + missing-table regression test for all three modes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e_shards_mode

# Conflicts:
#	ci/defs/darwin.skip
@alexey-milovidov

Copy link
Copy Markdown
Member

Update (continuing the PR)

Reverted the 04415 pre-query missing-table change. The previous commit (0e1c576a, "Honor skip_unavailable_shards_mode in pre-query missing-table skips") added a SELECT + missing-table regression test (04415_skip_unavailable_shards_mode_select_table_missing) that fails deterministically in CI — 35/35 and 63/63 reruns failed, with and without randomized settings, so it was the only PR-caused red:

Code: 60. DB::Exception: Received from localhost:9000. DB::Exception: Unknown table expression identifier 't_sus' ...
(query: SELECT x FROM dist_sus ORDER BY x
 SETTINGS prefer_localhost_replica = 0, skip_unavailable_shards = 1, skip_unavailable_shards_mode = 'unavailable_or_table_missing')

The test's premise does not hold for test_cluster_two_shards_different_databases: a missing table on a SELECT over that cluster is not caught by the pre-query getTablesStatus preflight — the query still reaches the remote, where the analyzer raises UNKNOWN_TABLE — and it is not skipped by the exception path either, so even the default unavailable_or_table_missing mode rethrows instead of skipping the shard (the first query in the test, not the strict-mode one). The commit's source change only altered strict unavailable-mode behavior at the connection layer, so it could not make the test pass; the failing case is exactly the pre-query missing-table SELECT gap I had tracked as a deferred follow-up item, not something settled in this PR.

Rather than ship an unverified change to the connection-pool failover path (ConnectionPoolWithFailover / HedgedConnectionsFactory, which affects every distributed query) with a red test, I reverted 0e1c576a in full (93d070259fd). The feature is back to its prior state; the strict unavailable-vs-missing-table preflight gap stays on the deferred list together with the foreground-INSERT items.

Merged master (the branch was ~570 commits behind and CONFLICTING). The only conflict was ci/defs/darwin.skip: master is actively pruning that list (a campaign to unskip distributed/remote tests fixed by isLocalAddress) and removed the trailing block the branch's 04318/04319 entries sat inside. I kept 04318/04319 skipped on darwin — they were added because they failed the arm_darwin Fast test, and a redundant skip is harmless while an unverified unskip risks a red — and honored master's removal of the neighbors. SettingsChangesHistory is unchanged: the skip_unavailable_shards_mode entry is already in the 26.7 block, which is what 02995_new_settings_history (baseline 26.6.1) requires.

Pushed 0e1c576ac19..e889ceb5fa6.

Remaining CI reds are all unrelated, known flakes:

The standing AI ❌ Block is about the foreground-INSERT semantics (connection-error parity with SELECT, skip-limit enforcement on INSERT, Distributed-engine settings propagation to DistributedSink, the internal_replication missing-table preflight, and shard-level vs replica-level skip). Those remain design questions for @byte-sourcerer and @azat, as before — left untouched.

/// Throws `TOO_MANY_UNAVAILABLE_SHARDS` if the configured `max_skip_unavailable_shards_num` /
/// `max_skip_unavailable_shards_ratio` limits are exceeded, so the safety bounds apply to every
/// silently skipped shard regardless of why it was skipped (no connections or an ignored exception).
if (unavailable_shard_tracker)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reportShardSkipped only enforces max_skip_unavailable_shards_num / max_skip_unavailable_shards_ratio when a tracker was attached, but the new exception-skip logic is used by RemoteQueryExecutor instances outside the SELECT path too. The default optimized INSERT INTO distributed_dst SELECT ... FROM distributed_src path (parallel_distributed_insert_select = 2) constructs a RemoteQueryExecutor in StorageDistributed::distributedWriteBetweenDistributedTables without calling setUnavailableShardTracker. If two destination shards return UNKNOWN_TABLE or another before-data exception with skip_unavailable_shards = 1, both can be converted to empty results and the INSERT succeeds even with max_skip_unavailable_shards_num = 1, silently dropping rows. Please attach a shard-level tracker in that path (and the analogous distributed INSERT SELECT remote-executor paths), or disable exception-based skipping there until the limits can be enforced.

@alexey-milovidov

Copy link
Copy Markdown
Member

Continue-PR check on e889ceb5

No code change or re-merge this pass — recording the current CI/review state.

Sole CI red is an unrelated, known flake (different from the one noted on the prior commit).
On e889ceb5 the only failure is Upgrade check (amd_release)"Error message in clickhouse-server.log", but the offending line is now FUNCTION_THROW_IF_VALUE_IS_NON_ZERO raised from MutatePlainMergeTreeTask:

<Error> ... DB::Exception: Value passed to 'throwIf' function is non-zero: while executing
'FUNCTION if(1, _CAST(plus(x, throwIf(1)), 'UInt64'), x) -> ...' (FUNCTION_THROW_IF_VALUE_IS_NON_ZERO)

That is the benign <Error> noise from 04341_broken_mutation_part_log_flush (ALTER TABLE ... UPDATE x = x + throwIf(1) WHERE 1 -- { serverError UNFINISHED }), an intentionally-failing mutation. A fix is already in progress: #108856 (Allowlist throwIf upgrade-check errors from 04341_broken_mutation_part_log_flush), with #108651 class-matching the same mutation-error family. This is unrelated to skip_unavailable_shards_mode (which only touches RemoteQueryExecutor / DistributedSink / ClusterProxy::executeQuery — no mutation code). Note this is a different error than the MySQL/PostgreSQL 192.0.2.1 remote-DB noise (#108560) seen on the previous commit's Upgrade check. The remaining in-progress jobs (coverage sequential, perf comparison, SQLLogic) are routine.

AI ❌ Block re-fired on e889ceb5 (09:34Z) but adds nothing actionable. Its blockers/majors are exactly the foreground-INSERT semantics (connection-error parity with SELECT, max_skip_unavailable_shards_num/ratio enforcement on INSERT, Distributed-engine settings propagation into DistributedSink, the internal_replication missing-table preflight, shard- vs replica-level skip, the optimized distributed INSERT SELECT tracker) plus the strict unavailable pre-query missing-table SELECT gap — i.e. the design items already deferred to @byte-sourcerer / @azat in the 06:19 update, unchanged.

No re-merge: 6 commits / <1 day behind master (merge-base today 07:03Z), MERGEABLE, no conflicts; the sole red is the unrelated flake above.

@azat azat left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for the comments looks good, please take a look
Also I think we can squash this changes

Comment thread ci/defs/darwin.skip Outdated
Comment thread src/QueryPipeline/RemoteQueryExecutor.cpp Outdated
Comment thread src/QueryPipeline/RemoteQueryExecutor.cpp
@alexey-milovidov

Copy link
Copy Markdown
Member

Continue-PR check on e889ceb5

Recording one review-state change since the last check: @azat approved (12:53 UTC), superseding the stale CHANGES_REQUESTED ("rebase on top of upstream", 2025-11-07 — satisfied by the repeated master merges). reviewDecision is now cleared, so no outstanding human change-request remains.

Everything else is unchanged from the previous update:

  • Sole CI red is the unrelated Upgrade check (amd_release) flake. A benign <Error> from MutatePlainMergeTreeTask running 04341_broken_mutation_part_log_flush's intentional ALTER ... UPDATE x = x + throwIf(1) ... -- {serverError UNFINISHED} (FUNCTION_THROW_IF_VALUE_IS_NON_ZERO). The PR diff is distributed/remote-only (no mutation/MergeTree code), so it cannot be the cause. Fix PRs Allowlist throwIf upgrade-check errors from 04341_broken_mutation_part_log_flush #108856 / Match benign mutation parse errors by class in the Upgrade check #108651 are still open and not yet in master, so re-merging would not clear it — no re-merge this pass (branch is 18 commits behind, but the merge-base is from today).
  • The foreground-INSERT / pre-query-SELECT mode-consistency items (the standing AI ❌ Block and the open clickhouse-gh threads) remain the deferred design questions left for @byte-sourcerer and @azat.

No code change or re-merge this pass. Blocked only on the unrelated flake clearing and the final merge.

…e_shards_mode

# Conflicts:
#	ci/defs/darwin.skip
#	src/QueryPipeline/RemoteQueryExecutor.cpp
@alexey-milovidov

Copy link
Copy Markdown
Member

Continue-PR: re-merged master (e889ceb570c1df28437)

The branch had drifted to CONFLICTING / DIRTY against master (1193 commits behind), which failed the Mergeable Check and PR meta-gates. Merged master to clear it and re-trigger CI. Two conflicts, both resolved additively — net diff is still feature-only (24 files, distributed/remote/settings + tests + doc; the reverted 04415 pre-query plumbing and table_is_missing stay out):

  • ci/defs/darwin.skip: kept this branch's 04318/04319 skips and master's new 04402_float_parsing_corner_cases.
  • src/QueryPipeline/RemoteQueryExecutor.cpp: kept this branch's UNKNOWN_TABLE/UNKNOWN_DATABASE and master's new BAD_ARGUMENTS in the ErrorCodes block (all three are used).

@azat's approval-time review comments (2026-06-30) are noted and left to the author/maintainer, as they are style/design judgment calls rather than mechanical fixes:

  • ci/defs/darwin.skip:192 — whether the 04318/04319 skips are still needed now that interface aliases exist (removing them is unverified without a darwin run, so they were kept for now).
  • RemoteQueryExecutor.cpp (exception-skip logging) — trim the explanatory comment and audit comment verbosity elsewhere.
  • RemoteQueryExecutor.cpp — whether to always throw LOGICAL_ERROR (parity with the INSERT path).

The standing clickhouse-gh ⚠️ Request changes threads (DistributedSink INSERT parity/limits/engine-settings, executeQuery.cpp:193 pre-query SELECT skip) remain the previously-adjudicated deferred design items — unchanged this pass.

| `fsync_directories` | Do the `fsync` for directories. Guarantees that the OS refreshed directory metadata after operations related to background inserts on Distributed table (after insert, after sending the data to shard, etc.). | `false` |
| `skip_unavailable_shards` | If true, ClickHouse silently skips unavailable shards. Shard is marked as unavailable when: 1) The shard cannot be reached due to a connection failure. 2) Shard is unresolvable through DNS. 3) Table does not exist on the shard. | `false` |
| `skip_unavailable_shards` | If true, ClickHouse silently skips unavailable shards. The behavior of this setting is controlled by the `skip_unavailable_shards_mode` parameter. | `false` |
| `skip_unavailable_shards_mode` | Controls which exceptions from a remote shard are ignored when `skip_unavailable_shards` is enabled: `unavailable` ignores only connection errors; `unavailable_or_table_missing` also ignores a missing table or database; `unavailable_or_exception_before_processing` also ignores any exception received before the shard returned data. | `unavailable_or_table_missing` |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip_unavailable_shards_mode is documented here as a Distributed engine setting, but the optimized destination INSERT ... SELECT paths still bypass that contract. StorageDistributed::distributedWriteBetweenDistributedTables and distributedWriteFromClusterStorage both build query_context = Context::createCopy(local_context) and never merge the destination table's DistributedSetting::skip_unavailable_shards / skip_unavailable_shards_mode, so INSERT INTO dist_dst SELECT ... ignores the table-level setting unless the query repeats it.

distributedWriteFromClusterStorage is worse: it calls getMany(..., skip_unavailable_endpoints = true) unconditionally, so an unavailable destination replica can be silently dropped even when skip_unavailable_shards = 0. That makes the new setting path-dependent depending on whether the optimizer takes the shard-local distributed INSERT ... SELECT fast path.

@alexey-milovidov

Copy link
Copy Markdown
Member

Continue-PR: the only failing CI check on the current head (70c1df28437) was AST fuzzer (amd_debug, targeted, old_compatibility)Logical error: 'Inconsistent AST formatting' (STID class 1941-*). This is the tracked flake #108372 (still open): a random fuzzer-generated ALTER … MODIFY COLUMN query, and this PR touches no parser/AST/ALTER files (24 files, all distributed/remote/settings + tests + doc), so it cannot be PR-caused. Re-ran just that job to clear the red.

CH Inc sync remains pending on the private mirror (clickhouse-private#59402) — a private-side workflow-file-prefix transition unrelated to this PR (which touches 0 .github/ files); it self-clears on the private side. Everything else is green (155 success / 18 skipped). azat has approved.

@alexey-milovidov

Copy link
Copy Markdown
Member

Continue-PR check on 70c1df28437

No code change or re-merge this pass — closing the loop on the previous update.

CI is now fully green (0 failures). The AST fuzzer (amd_debug, targeted, old_compatibility) red I re-ran (tracked flake #108372Inconsistent AST formatting on a random ALTER … MODIFY COLUMN; this PR touches no parser/AST/ALTER files, so unrelated) came back green, and CH Inc sync cleared to SUCCESS. All meta-gates (Mergeable Check, Style check, CH Inc sync) and all six AST-fuzzer variants are green — 158 success / 18 skipped. azat has approved (on e889ceb5), and the only commit since is the clean, feature-only master merge (70c1df28437). Merge-base f54adf1f5bc is from today and the branch is MERGEABLE/CLEAN, so no re-merge or re-run is needed.

The AI ❌ Changes requested verdict re-fired on 70c1df28437 (04:51Z) with the same findings as before, all in the foreground-INSERT / optimized INSERT … SELECT / pre-query missing-table SELECT scope:

  • foreground INSERT skips a single JobReplica rather than a whole shard, is not counted by UnavailableShardTracker (max_skip_unavailable_shards_num / _ratio), and does not apply the destination Distributed engine settings;
  • optimized destination INSERT … SELECT (distributedWriteBetweenDistributedTables / distributedWriteFromClusterStorage) bypasses the documented engine contract;
  • strict unavailable mode does not cover pre-query missing-table skips on SELECT (the ConnectionEstablisher / hedged-connection preflight can drop a shard before RemoteQueryExecutor sees a server Exception).

I verified the cited code: the LOGICAL_ERROR fail-close guard and the mode split at DistributedSink.cpp:117-129 are implemented correctly. The findings are the deferred design/scope decisions for @byte-sourcerer and @azat, not mechanical fixes — extending the skip semantics into the write paths risks silently diverging replicas or dropping rows if done wrong, and the one pre-query missing-table SELECT extension previously attempted (0e1c576a) had to be reverted after deterministic CI failure. azat's remaining comments (darwin.skip cleanup, comment verbosity at RemoteQueryExecutor.cpp:685, always-LOGICAL_ERROR at :1080) are style/design judgment calls left to the author.

Blocked only on those design decisions and the final merge.

alexey-milovidov and others added 3 commits July 2, 2026 07:50
`04318_skip_unavailable_shards_mode` and `04319_skip_unavailable_shards_mode_table_missing`
were added to `ci/defs/darwin.skip` because they rely on `127.0.0.2+` being reachable, which
macOS does not auto-route. Master now aliases `127.0.0.2-21` on `lo0` for the macOS Fast test
(`ci/jobs/scripts/fast_test_darwin.sh`), so these tests run there like the other distributed
tests (e.g. `04320_skip_unavailable_shards_mode_distributed_engine`, which was never skipped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per review, drop the `log`-may-be-null note that just restated the `if (log)` guard, remove the
duplicated inline note about `reportShardSkipped` enforcing the skip limits (the function and its
declaration already document that), and shorten the remaining explanations.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`shouldIgnoreShardException` in `unavailable_or_exception_before_processing` mode ignored any
exception that arrived before the shard returned data, including `LOGICAL_ERROR`. That denotes a
programming error rather than an expected shard failure, so silencing it would hide real bugs.
Rethrow it in every mode, matching the foreground INSERT path (`shouldSkipShardOnInsert`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.50% 85.50% +0.00%
Functions 92.60% 92.60% +0.00%
Branches 77.70% 77.70% +0.00%

Changed lines: Changed C/C++ lines covered: 164/181 (90.61%) · Uncovered code

Full report · Diff report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors manual approve Manual approve required to run CI pr-feature Pull request with new product feature pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow partial response

8 participants