Skip macOS-incompatible distributed tests 04327 and 04336 by groeneai · Pull Request #107376 · ClickHouse/ClickHouse · GitHub

Skip macOS-incompatible distributed tests 04327 and 04336 #107376

Merged
alexey-milovidov merged 2 commits into ClickHouse:master from groeneai:ci-darwin-skip-04336-parallel-blocks-marshalling on Jun 13, 2026

Conversation

@groeneai groeneai commented Jun 13, 2026

Contributor

Related: #107319
Related: #106908

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Description

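Two new distributed tests fail deterministically on the Fast test (arm_darwin) (macOS) job:

  • 04327_rewrite_aggregate_function_with_if_distributed (added in #106908)
  • 04336_parallel_blocks_marshalling_low_cardinality_native_format (added in #107319)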

Both run remote('127.0.0.{1,2}', ...), which connects to 127.0.0.2. Linux auto-routes all of 127.0.0.0/8 to loopback, but macOS only binds 127.0.0.1, so the connection to 127.0.0.2:9000 times out:

Code: 209. DB::NetException: Timeout: connect timed out: 127.0.0.2:9000
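The loopback difference described above can be checked on any host with a small probe. This is a minimal sketch, not part of the PR: the helper name `can_bind` is hypothetical, and it tests only whether the address is assigned locally, not whether a ClickHouse server listens on port 9000.

```python
import socket


def can_bind(address):
    """Return True if a TCP socket can be bound to `address` on an ephemeral port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Port 0 asks the OS for any free port; only the address matters here.
            sock.bind((address, 0))
        except OSError:
            # Typically EADDRNOTAVAIL when the address is not assigned to an interface.
            return False
        return True


if __name__ == "__main__":
    for address in ("127.0.0.1", "127.0.0.2"):
        print(address, "bindable" if can_bind(address) else "not bindable")
```

Per the description, a Linux host should report both addresses as bindable, while a macOS host without extra lo0 aliases is expected to reject 127.0.0.2.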

CIDB confirms the failures are macOS-only. Fast test (arm_darwin) is 100% red for both (04336: 0 OK / 8 FAIL; 04327: 0 OK / 3 FAIL across 3 unrelated PRs), while every Linux job passes 100% (including Fast test amd and the flaky check across 4 sanitizers).

The shard tag does not skip the tests on this runner because the Fast test invokes clickhouse-test --shard unconditionally (ci/jobs/fast_test.py), forcing shard mode regardless of OS. The established mechanism for macOS-incompatible tests is the explicit ci/defs/darwin.skip list, which fast_test.py loads only on Darwin. Sibling distributed tests with the identical connect-timeout signature (04277_distr_in_negative_array, 04278_distributed_filter_pushdown_table_function, 04303_optimize_skip_unused_shards_in_merge, 04151_remote_query_executor_cancel_null_connections) are already in the list; this PR adds the two new tests to it.
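The skip mechanism described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in ci/jobs/fast_test.py: the function names are hypothetical, and the file format (one test name per line, with blank lines and `#` comments ignored) is assumed for illustration.

```python
import platform


def parse_skip_list(text):
    """Parse skip-list text into a set of test names (assumed format: one name per line)."""
    names = set()
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()  # drop assumed '#' comments
        if name:
            names.add(name)
    return names


def skipped_tests(text, system=None):
    """Return the skip set only on Darwin; every other OS gets an empty set."""
    system = system or platform.system()
    if system != "Darwin":
        return set()
    return parse_skip_list(text)


def should_skip(test_name, skip_set):
    """Exact-name match, mirroring the exact-match behaviour described in the thread."""
    return test_name in skip_set
```

With this shape, adding a test name to the list changes behaviour only on macOS; Linux jobs still see an empty skip set and run the test.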

Version info

  • Merged into: 26.6.1.736

…on macOS

The test runs remote('127.0.0.{1,2}', ...), connecting to 127.0.0.2. Linux
auto-routes all of 127.0.0.0/8 to loopback, but macOS only binds 127.0.0.1, so
the connection to 127.0.0.2:9000 times out on the Fast test (arm_darwin) runner.

CIDB shows the failure is macOS-only (0 OK / 8 FAIL on arm_darwin, 100% pass on
every Linux job including the 200-run flaky check). The shard tag does not skip
it because Fast test forces clickhouse-test --shard. Add it to ci/defs/darwin.skip,
matching its sibling distributed tests (04277, 04278, 04303, 04151) already there.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@groeneai

Contributor Author

cc @leshikus — adds the new test 04336_parallel_blocks_marshalling_low_cardinality_native_format (from #107319) to ci/defs/darwin.skip. It uses remote('127.0.0.{1,2}', ...) and times out on macOS (127.0.0.2 isn't loopback there); 0/8 on Fast test (arm_darwin), 100% pass on Linux. Same signature as its siblings 04277/04278/04303/04151 already in the list.

Same macOS loopback issue as 04336 in this PR. The test runs
remote('127.0.0.{1,2}', system.one), connecting to 127.0.0.2. Linux
auto-routes all of 127.0.0.0/8 to loopback, but macOS only binds 127.0.0.1, so
the connection to 127.0.0.2:9000 times out on the Fast test (arm_darwin) runner.

CIDB shows the failure is macOS-only (0 OK / 3 FAIL on arm_darwin across 3
unrelated PRs, 100% pass on every Linux job including the flaky check 50/50 on
4 sanitizers). The shard tag does not skip it because Fast test forces
clickhouse-test --shard. Add it to ci/defs/darwin.skip, next to its sibling
distributed tests. Added by ClickHouse#106908.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai changed the title from "Skip 04336_parallel_blocks_marshalling_low_cardinality_native_format on macOS" to "Skip macOS-incompatible distributed tests 04327 and 04336" on Jun 13, 2026
@groeneai

Contributor Author

Scope update: this PR now also skips 04327_rewrite_aggregate_function_with_if_distributed (added in #106908), which has the identical macOS loopback root cause as 04336 and is also 100% red on Fast test (arm_darwin) (0 OK / 3 FAIL across 3 unrelated PRs, 100% green on every Linux job). One-line addition to ci/defs/darwin.skip, next to the same sibling distributed tests. cc @leshikus — both tests, same fix.

Pre-PR validation gate (04327 addition)

| # | Question | Answer |
| --- | --- | --- |
| a | Deterministic repro? | Yes, deterministic (not a randomization flake). 04327 runs SELECT ... FROM remote('127.0.0.{1,2}', system.one) 5x; 127.0.0.2 is not bound on macOS, so the connection to 127.0.0.2:9000 times out and the test is 100% red on Fast test (arm_darwin) (CIDB 7d: 0 OK / 3 FAIL across 3 unrelated PRs #97540/#107368/#100332). Cannot run macOS locally, but determinism is established by the 100% arm_darwin fail / 100% Linux pass split and the remote('127.0.0.2') test body. |
| b | Root cause explained? | Linux auto-routes all of 127.0.0.0/8 to loopback; macOS binds only 127.0.0.1. remote('127.0.0.{1,2}') -> connect 127.0.0.2:9000 -> Code: 209 ... connect timed out (SOCKET_TIMEOUT). The -- Tags: distributed tag does not skip it because ci/jobs/fast_test.py invokes clickhouse-test --shard unconditionally; the real macOS skip mechanism is ci/defs/darwin.skip, loaded only on Darwin. |
| c | Fix matches root cause? | Yes. Adds 04327 to ci/defs/darwin.skip, the canonical list for macOS-incompatible tests, alongside its 5 sibling distributed tests (04277, 04278, 04303, 04151, and 04336 in this PR) that share the exact same connect-timeout signature and are already listed. |
| d | Test intent preserved / new tests added? | Yes. The test still runs on every Linux job (full coverage of the optimize_rewrite_aggregate_function_with_if distributed/analyzer path). Only the macOS Fast test, which cannot reach 127.0.0.2, skips it. No assertion weakened, no setting pinned. |
| e | Both directions demonstrated? | Exercised fast_test.py:_load_darwin_skip_tests() against the edited file: 04327 is now in the loaded skip set (entries 1982 -> 1983); clickhouse-test --skip does exact-name match -> returns SKIP on Darwin. Both-directions evidence: the sibling tests with the identical signature flipped FAIL -> SKIPPED the moment they were added to this same list. |
| f | Fix is general, not a narrow patch? | Yes. Swept recently-failing arm_darwin tests: 04327 and 04336 were the only two new master tests with the remote('127.0.0.2') signature missing from darwin.skip; both are now covered here. (Sibling 04095_future_set_inplace_build_restore_global_in also fails this way, but only in PR #102308, which adds that test; it is not on master yet, so it is out of scope.) |
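To make the root cause in row b concrete, the address pattern '127.0.0.{1,2}' names two shard addresses, and the second one is the address macOS cannot reach. The sketch below is a simplified illustration with a hypothetical helper name; it handles only comma-separated brace lists and is not ClickHouse's own pattern parser.

```python
import re

# Matches one innermost {...} group with no nested braces.
_BRACES = re.compile(r"\{([^{}]*)\}")


def expand_addresses(pattern):
    """Expand comma-separated brace groups, e.g. 'a{1,2}' -> ['a1', 'a2']."""
    match = _BRACES.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    results = []
    for option in match.group(1).split(","):
        # Recurse so that any further brace groups in the tail are expanded too.
        results.extend(expand_addresses(head + option + tail))
    return results


if __name__ == "__main__":
    print(expand_addresses("127.0.0.{1,2}"))
```

Any test whose pattern expands to an address other than 127.0.0.1 shares the same macOS failure signature.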

Session id: cron:clickhouse-worker-slot-0:20260613-061800

@nikitamikhaylov added the "can be tested" label (allows running workflows for external contributors) on Jun 13, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jun 13, 2026

Contributor

Workflow [PR], commit [0e20968]

Summary:


AI Review

Summary

This PR adds 04327_rewrite_aggregate_function_with_if_distributed and 04336_parallel_blocks_marshalling_low_cardinality_native_format to ci/defs/darwin.skip so the Darwin Fast test job skips two stateless distributed tests that use remote('127.0.0.{1,2}', ...). The change is narrowly scoped, matches the existing Darwin skip mechanism, and I did not find any blocker or major issue requiring inline review comments.

Missing context / blind spots

  • ⚠️ The current PR CI report has no failing jobs, but most checks are skipped and I did not see a fresh Darwin Fast test execution for commit 0e209684853319df902e4d042cb3b0af34bcfac9; a full Darwin Fast test run would close this validation gap.

Final Verdict

✅ Approve from code review. No inline findings were posted.

@rienath rienath left a comment

Member

I think it's better to bind the loopback aliases (127.0.0.0/8) at the runner level rather than remove tests from CI. This would fix the root cause and prevent the whole class of failures in the future.

@groeneai

Contributor Author

@rienath agreed, binding 127.0.0.0/8 on the runner is the better root-cause fix and would retire the whole class (darwin.skip is already ~1982 entries, a large share of them distributed/shard tests skipped for exactly this reason).

The catch is where it lives. The macOS runners are self-hosted EC2 Dedicated Hosts (ci/praktika/infrastructure), so the alias has to be set at runner provisioning, not in ci/defs: either run sudo ifconfig lo0 alias 127.0.0.2 up (and likewise for 127.0.0.3, one invocation per address) in the runner bootstrap/AMI, or add a pre-test step in ci/jobs/fast_test.py if the runner has passwordless sudo. Both are a bigger, runner-owned change than this PR.
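The runner-level option amounts to one ifconfig alias call per extra loopback address. The following sketch only builds and prints the commands rather than running them; the helper name and the choice of addresses are assumptions for illustration, and whether the runner allows passwordless sudo is an open question in the thread.

```python
import shlex


def loopback_alias_commands(addresses, interface="lo0"):
    """Build one `ifconfig <interface> alias <address> up` command per address."""
    return [
        ["sudo", "ifconfig", interface, "alias", address, "up"]
        for address in addresses
    ]


if __name__ == "__main__":
    # Print the commands for review; a bootstrap script could pass each argv
    # list to subprocess.run(..., check=True) instead.
    for argv in loopback_alias_commands(["127.0.0.2", "127.0.0.3"]):
        print(shlex.join(argv))
```

Building argument lists instead of a shell string sidesteps brace expansion, which would otherwise pass several addresses to a single ifconfig invocation.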

Proposal: merge this two-line skip now to stop the bleeding (it unblocks the red arm_darwin Mergeable Check on unrelated PRs), and track the loopback-alias change as the durable follow-up that retires these and future skips. Do you own the macOS runner config, or who does? Happy to pursue the runner-level fix if you point me at it; I just did not want to take on a runner-image change unilaterally.

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jun 13, 2026
Merged via the queue into ClickHouse:master with commit 3659e56 Jun 13, 2026
166 checks passed
@alexey-milovidov self-assigned this on Jun 13, 2026
@robot-clickhouse added the "pr-synced-to-cloud" label (the PR is synced to the cloud repo) on Jun 13, 2026
alexey-milovidov added a commit that referenced this pull request Jun 15, 2026
`04308_remote_storage_engine` exercises the multi-shard case
`Remote('127.0.0.{1,2}', system, one)`, which connects to the second
loopback shard `127.0.0.2:9000`. On the `arm_darwin` Fast test runner,
macOS binds only `127.0.0.1` on `lo0` and does not auto-route the rest
of `127.0.0.0/8`, so `127.0.0.2` is unroutable and the query fails with
`ALL_CONNECTION_TRIES_FAILED`.

This is the same macOS shard-test infrastructure issue already handled
for the sibling tests `04327_rewrite_aggregate_function_with_if_distributed`
and `04336_parallel_blocks_marshalling_low_cardinality_native_format`,
which were added to `ci/defs/darwin.skip` in #107376. `04308` is a new
test introduced by this pull request, so it was not covered there. Add
it to `ci/defs/darwin.skip` for consistency until the runner-side fix
(#107435) aliases the additional loopback addresses and these tests can
run on macOS again.

The test itself is unchanged and keeps full coverage on Linux.

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106189&sha=dbc0fc28753abffe3096e1064be57a6c440263b6&name_0=PR&name_1=Fast%20test%20%28arm_darwin%29
Related: #107376
Related: #107435

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Labels: can be tested, pr-ci, pr-synced-to-cloud