Disable 03357_join_pk_sharding on TSan #105902
The test creates three 1M-row MergeTree tables and runs ten multi-table `JOIN` queries with `EXPLAIN ACTIONS`. On TSan the test runs ~15-30x slower than on debug or release builds (CIDB shows TSan p95 = 120s, p99 = 178s, max = 469s) and very occasionally exceeds the 600s per-test timeout budget for `long`-tagged tests under unfavorable randomized settings, producing chronic flaky CI failures across unrelated PRs. CIDB (60 days): 5 timeout failures (all at exactly 600.06s), all five on `Stateless tests (amd_tsan, parallel, ...)` jobs. Zero timeouts on any non-TSan build over the same period. The test already excludes ASan and MSan with `no-asan, no-msan` for the same reason; TSan is the slowest sanitizer and belongs in the same list. Functional coverage of `query_plan_join_shard_by_pk_ranges` is preserved on debug, release, ARM, and ASan+UBSan (distributed plan) builds. Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=103112&sha=3cfc7fde0c115e6368c6f235eddb2cd112767c94&name_0=PR&name_1=Stateless%20tests%20%28amd_tsan%2C%20parallel%2C%202%2F2%29
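As a rough sketch of the change described above: ClickHouse stateless `.sql` tests declare their tags in a leading comment line, so the edit presumably amounts to appending `no-tsan` to that line. The tag list below is assembled only from the tags this description names (`long`, `no-asan`, `no-msan`) plus the new one; the actual header of `03357_join_pk_sharding.sql` may carry other tags or a different order.

```sql
-- Tags: long, no-asan, no-msan, no-tsan
-- no-tsan is the tag this PR adds; the other three are the ones the description names.
-- Assumption: the real header may list further tags or use a different order.
```

With the tag present, the test runner skips the test on TSan jobs, in the same way it already skips it on ASan and MSan.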
cc @vdimir, could you review this? Test-only change adding the `no-tsan` tag.
Workflow [PR], commit [f8bd40c]
AI Review
Summary
This PR changes one line in
Final Verdict
Status: ✅ Approve
LLVM Coverage Report
Changed lines: No C/C++ source files changed; skipping uncovered code analysis.
Newly covered by added/modified tests: 856 line(s), 67 function(s) across 117 file(s).
f74c9de
The 2026-05-27 master merge into this PR branch accidentally undid the
master-side changes to three files unrelated to this PR's StorageMergeTree
ALTER fix:
- src/Functions/tests/gtest_functions_stress.cpp: re-introduced the
per-iteration max_execution_time / CancellationChecker /
ProcessList wiring (originally landed in ClickHouse#105146, reverted by
ClickHouse#105163, re-landed on master). Without this, function_prop_fuzzer
iterations cannot be interrupted and the job regresses to long/stuck
behavior.
- src/Interpreters/CancellationChecker.cpp: re-introduced the
'stop_thread = false' reset at the end of workerFunction so the
singleton can be restarted in tests.
- tests/queries/0_stateless/03357_join_pk_sharding.sql: re-added the
'no-tsan' tag landed by ClickHouse#105902.
Reverting these hunks restores the master versions so this PR only ships
the intended StorageMergeTree / StorageReplicatedMergeTree areNonReplicatedAlterCommands
branch and the new 04248 regression test.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Disable `03357_join_pk_sharding` on TSan to stop chronic timeout flakes.

The test creates three 1M-row `MergeTree` tables and runs ten multi-table `JOIN` queries with `EXPLAIN ACTIONS`. On TSan it runs ~15-30x slower than on debug or release builds, occasionally exceeding the 600s per-test timeout budget for `long`-tagged tests under unfavorable randomized settings.

CIDB over the last 60 days for `03357_join_pk_sharding`:
- 5 timeout failures, all at exactly `600.06s`.
- All five on `Stateless tests (amd_tsan, parallel, ...)` jobs (4 amd_tsan parallel + 1 amd_tsan s3).

5 unrelated PRs were affected: #103112 (vector_spann), #98939 (text index), #102444 (hash join prefetch), #103602, #102001.

The test already excludes `ASan` and `MSan` with `no-asan, no-msan` for the same reason. `TSan` is typically the slowest sanitizer and belongs in the same list. Functional coverage of `query_plan_join_shard_by_pk_ranges` is preserved on debug, release, ARM, and `ASan+UBSan` (distributed plan) builds.

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=103112&sha=3cfc7fde0c115e6368c6f235eddb2cd112767c94&name_0=PR&name_1=Stateless%20tests%20%28amd_tsan%2C%20parallel%2C%202%2F2%29
Version info
26.6.1.189