Skip macOS-incompatible distributed tests 04327 and 04336 #107376
…on macOS
The test runs remote('127.0.0.{1,2}', ...), connecting to 127.0.0.2. Linux
auto-routes all of 127.0.0.0/8 to loopback, but macOS only binds 127.0.0.1, so
the connection to 127.0.0.2:9000 times out on the Fast test (arm_darwin) runner.
CIDB shows the failure is macOS-only (0 OK / 8 FAIL on arm_darwin, 100% pass on
every Linux job including the 200-run flaky check). The shard tag does not skip
it because Fast test forces clickhouse-test --shard. Add it to ci/defs/darwin.skip,
matching its sibling distributed tests (04277, 04278, 04303, 04151) already there.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
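One way to see the Linux/macOS difference described above is to try binding a socket to each loopback address. The sketch below is an illustrative probe, not part of the CI. `can_bind` is a hypothetical helper name, and the sketch assumes that "bindable locally" is a fair proxy for "configured on the loopback interface". On Linux both addresses should bind; on a stock macOS host, binding `127.0.0.2` is expected to fail with `EADDRNOTAVAIL` until an alias is added.

```python
import errno
import socket


def can_bind(address: str) -> bool:
    """Return True if a TCP socket can be bound to `address` on an ephemeral port.

    Hypothetical probe: on Linux every address in 127.0.0.0/8 is local, so
    127.0.0.2 binds; on stock macOS only 127.0.0.1 is configured on lo0, so
    binding 127.0.0.2 fails with EADDRNOTAVAIL.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((address, 0))  # port 0: let the OS pick any free port
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRNOTAVAIL:
            return False  # address is not configured on any local interface
        raise  # any other error is unexpected for this probe
    finally:
        sock.close()


if __name__ == "__main__":
    for addr in ("127.0.0.1", "127.0.0.2"):
        print(addr, "bindable" if can_bind(addr) else "not configured locally")
```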
cc @leshikus: adds the new test 04336_parallel_blocks_marshalling_low_cardinality_native_format (from #107319) to ci/defs/darwin.skip. It uses remote('127.0.0.{1,2}', ...) and times out on macOS (127.0.0.2 isn't loopback there); 0/8 on Fast test (arm_darwin), 100% pass on Linux. Same signature as its siblings 04277/04278/04303/04151 already in the list.
Same macOS loopback issue as 04336 in this PR. The test runs
remote('127.0.0.{1,2}', system.one), connecting to 127.0.0.2. Linux
auto-routes all of 127.0.0.0/8 to loopback, but macOS only binds 127.0.0.1, so
the connection to 127.0.0.2:9000 times out on the Fast test (arm_darwin) runner.
CIDB shows the failure is macOS-only (0 OK / 3 FAIL on arm_darwin across 3
unrelated PRs, 100% pass on every Linux job including the flaky check 50/50 on
4 sanitizers). The shard tag does not skip it because Fast test forces
clickhouse-test --shard. Add it to ci/defs/darwin.skip, next to its sibling
distributed tests. The test was added by ClickHouse#106908.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Scope update: this PR now also skips Pre-PR validation gate (04327 addition)
Session id: cron:clickhouse-worker-slot-0:20260613-061800
Workflow [PR], commit [0e20968]
Summary: ✅
AI Review Summary: This PR adds … Missing context / blind spots …
Final Verdict: ✅ Approve from code review. No inline findings were posted.
rienath left a comment:
I think it's better to bind the loopback aliases (127.0.0.1/8) at the runner level rather than remove tests from CI. This would fix the root cause and prevent the whole class of failures in the future.
@rienath agreed, binding 127.0.0.0/8 on the runner is the better root-cause fix and would retire the whole class (darwin.skip already has ~1982 entries, a large share of them distributed/shard tests skipped for exactly this reason). The catch is where it lives. The macOS runners are self-hosted EC2 Dedicated Hosts (ci/praktika/infrastructure), so the alias has to be set at runner provisioning, not in ci/defs: either …

Proposal: merge this 1-line skip now to stop the bleeding (it unblocks the red arm_darwin Mergeable Check on unrelated PRs), and track the loopback-alias change as the durable follow-up that retires these and future skips. Do you own the macOS runner config, or who does? Happy to pursue the runner-level fix if you point me at it; I just did not want to take on a runner-image change unilaterally.
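The runner-level fix discussed above amounts to adding loopback aliases on `lo0` while provisioning the macOS runner. A minimal sketch, assuming the provisioning step can run commands as root: `loopback_alias_commands` is a hypothetical helper that only builds argument lists for the standard macOS `ifconfig lo0 alias <addr> up` command. How many extra addresses the test suite actually needs is an assumption left to the `count` parameter.

```python
import ipaddress
from typing import List

_LOOPBACK = ipaddress.IPv4Network("127.0.0.0/8")


def loopback_alias_commands(first: str = "127.0.0.2", count: int = 1) -> List[List[str]]:
    """Build `ifconfig lo0 alias <addr> up` argv lists for `count` consecutive addresses.

    Hypothetical provisioning helper: it only builds the commands; running
    them on macOS requires root.
    """
    start = ipaddress.IPv4Address(first)
    commands: List[List[str]] = []
    for offset in range(count):
        addr = start + offset  # IPv4Address supports integer addition
        if addr not in _LOOPBACK:
            raise ValueError(f"{addr} is outside 127.0.0.0/8")
        commands.append(["ifconfig", "lo0", "alias", str(addr), "up"])
    return commands
```

On the runner, each list could then be executed, for example with `subprocess.run(["sudo", *argv], check=True)`. Aliases added this way do not survive a reboot, so a provisioning script would need to re-apply them at boot.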
`04308_remote_storage_engine` exercises the multi-shard case
`Remote('127.0.0.{1,2}', system, one)`, which connects to the second
loopback shard `127.0.0.2:9000`. On the `arm_darwin` Fast test runner,
macOS binds only `127.0.0.1` on `lo0` and does not auto-route the rest
of `127.0.0.0/8`, so `127.0.0.2` is unroutable and the query fails with
`ALL_CONNECTION_TRIES_FAILED`.
This is the same macOS shard-test infrastructure issue already handled
for the sibling tests `04327_rewrite_aggregate_function_with_if_distributed`
and `04336_parallel_blocks_marshalling_low_cardinality_native_format`,
which were added to `ci/defs/darwin.skip` in #107376. `04308` is a new
test introduced by this pull request, so it was not covered there. Add
it to `ci/defs/darwin.skip` for consistency until the runner-side fix
(#107435) aliases the additional loopback addresses and these tests can
run on macOS again.
The test itself is unchanged and keeps full coverage on Linux.
Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106189&sha=dbc0fc28753abffe3096e1064be57a6c440263b6&name_0=PR&name_1=Fast%20test%20%28arm_darwin%29
Related: #107376
Related: #107435
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
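To make the shard fan-out concrete: the address pattern `127.0.0.{1,2}` expands to two shard hosts, and only the second one depends on an address that stock macOS lacks. The sketch below is a simplified, hypothetical expander (`expand_shards`, `needs_macos_alias`) that handles only comma alternatives in braces. It is not ClickHouse's own address-pattern parser, which supports more syntax.

```python
import ipaddress
import re
from typing import List

_BRACE_GROUP = re.compile(r"\{([^{}]*)\}")
_LOOPBACK = ipaddress.IPv4Network("127.0.0.0/8")
_DEFAULT_LOOPBACK = ipaddress.IPv4Address("127.0.0.1")


def expand_shards(pattern: str) -> List[str]:
    """Expand '{a,b,...}' groups left to right, e.g. '127.0.0.{1,2}' -> two hosts.

    Simplified sketch: no nesting, no numeric ranges, no replica separators.
    """
    match = _BRACE_GROUP.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    hosts: List[str] = []
    for alternative in match.group(1).split(","):
        hosts.extend(expand_shards(head + alternative + tail))  # expand remaining groups
    return hosts


def needs_macos_alias(host: str) -> bool:
    """True for 127.0.0.0/8 addresses other than 127.0.0.1 (not on stock macOS lo0)."""
    try:
        addr = ipaddress.IPv4Address(host)
    except ValueError:
        return False  # hostnames such as 'localhost' are out of scope for this sketch
    return addr in _LOOPBACK and addr != _DEFAULT_LOOPBACK


if __name__ == "__main__":
    hosts = expand_shards("127.0.0.{1,2}")
    print(hosts)                                       # ['127.0.0.1', '127.0.0.2']
    print([h for h in hosts if needs_macos_alias(h)])  # ['127.0.0.2']
```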

Related: #107319
Related: #106908
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
...
Description
Two new distributed tests fail deterministically on the `Fast test (arm_darwin)` (macOS) job:

- `04336_parallel_blocks_marshalling_low_cardinality_native_format` (added in #107319, "Fix `low_cardinality_allow_in_native_format=0` handling in parallel blocks marshalling")
- `04327_rewrite_aggregate_function_with_if_distributed` (added in #106908, "Fix THERE_IS_NO_COLUMN for `optimize_rewrite_aggregate_function_with_if` with distributed queries")

Both run `remote('127.0.0.{1,2}', ...)`, which connects to `127.0.0.2`. Linux auto-routes all of `127.0.0.0/8` to loopback, but macOS only binds `127.0.0.1`, so the connection to `127.0.0.2:9000` times out.

CIDB confirms the failures are macOS-only. `Fast test (arm_darwin)` is 100% red for both (04336: 0 OK / 8 FAIL; 04327: 0 OK / 3 FAIL across 3 unrelated PRs), while every Linux job passes 100% (including `Fast test` on amd and the flaky check across 4 sanitizers).

The `shard` tag does not skip the tests on this runner because the Fast test invokes `clickhouse-test --shard` unconditionally (`ci/jobs/fast_test.py`), forcing shard mode regardless of OS. The established mechanism for macOS-incompatible tests is the explicit `ci/defs/darwin.skip` list, which `fast_test.py` loads only on Darwin. Sibling distributed tests with the identical connect-timeout signature (`04277_distr_in_negative_array`, `04278_distributed_filter_pushdown_table_function`, `04303_optimize_skip_unused_shards_in_merge`, `04151_remote_query_executor_cancel_null_connections`) are already in the list; this PR adds the two new tests to it.

Version info
26.6.1.736