Do not ignore DROP for refreshable materialized views in stress tests by alexey-milovidov · Pull Request #102008 · ClickHouse/ClickHouse · GitHub
Do not ignore DROP for refreshable materialized views in stress tests #102008

Merged: alexey-milovidov merged 1 commit into master from fix-stress-test-refreshable-mv-drop on Apr 8, 2026

alexey-milovidov (Member) commented Apr 8, 2026

The ignore_drop_queries_probability setting (used in stress tests) converts DROP TABLE to TRUNCATE TABLE, but TRUNCATE does not stop the periodic refresh task of a refreshable materialized view. This causes the orphaned view to keep refreshing indefinitely, consuming background pool threads and progressively overwhelming the server — especially under TSan where each operation is 5–15x slower.

Root cause analysis of the "Hung check failed, possible deadlock found" flake in Stress test (arm_tsan):

  1. Test 03221_refreshable_matview_progress.sql creates an MV with REFRESH AFTER 10 SECOND
  2. The stress test's ignore_drop_queries_probability=0.2 converts the final DROP TABLE to TRUNCATE TABLE
  3. The view survives and keeps refreshing every 10 seconds for the rest of the stress test
  4. 48 refreshes observed, each getting progressively slower under TSan (from sub-second to 59 minutes)
  5. The server becomes completely unresponsive, failing the hung check
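
The failure sequence above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyServer`, `RefreshableView`, and `tick` are hypothetical names invented for illustration and do not correspond to ClickHouse internals. It shows one thing: a truncated view stays registered and keeps refreshing on every scheduler period, while a dropped view does not.

```python
from dataclasses import dataclass, field


@dataclass
class RefreshableView:
    """Toy stand-in for a refreshable materialized view (hypothetical model)."""
    name: str
    rows: list = field(default_factory=list)
    refresh_count: int = 0

    def refresh(self):
        # A refresh recomputes the view contents; here it just writes one row.
        self.refresh_count += 1
        self.rows = [f"snapshot-{self.refresh_count}"]


class ToyServer:
    """Holds views and a periodic scheduler that refreshes every registered view."""

    def __init__(self):
        self.views = {}

    def create_view(self, name):
        view = RefreshableView(name)
        self.views[name] = view
        return view

    def truncate(self, name):
        # Clears the data only; the view stays registered with the scheduler.
        self.views[name].rows.clear()

    def drop(self, name):
        # Removes the view, so the scheduler no longer sees it.
        del self.views[name]

    def tick(self):
        # One scheduler period: every surviving view refreshes again.
        for view in self.views.values():
            view.refresh()


server = ToyServer()
kept = server.create_view("truncated_mv")
gone = server.create_view("dropped_mv")

server.tick()                    # both views refresh once
server.truncate("truncated_mv")  # what the stress test effectively ran
server.drop("dropped_mv")        # what the test author intended

for _ in range(3):
    server.tick()                # only the truncated view is still refreshing
```

In this model `kept.refresh_count` keeps growing after the truncate, which mirrors the orphaned view described in step 3; the slowdown under TSan is not modeled.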

Fix: skip the DROP-to-TRUNCATE conversion for refreshable materialized views, since TRUNCATE doesn't stop their periodic refresh task.
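
The decision described in the fix can be sketched as follows. This is an illustrative model, not the actual patch: the real change lives in ClickHouse's C++ code, which this page does not show, and the function name, parameters, and the use of `random.Random` are assumptions made for the example.

```python
import random


def should_convert_drop_to_truncate(ignore_drop_probability, is_refreshable_mv, rng):
    """Decide whether a DROP TABLE is downgraded to TRUNCATE TABLE.

    Hypothetical helper: models the behaviour described in the PR text only.
    """
    if is_refreshable_mv:
        # TRUNCATE would leave the periodic refresh task running,
        # so a refreshable materialized view is always really dropped.
        return False
    # Any other table: ignore the DROP with the configured probability.
    return rng.random() < ignore_drop_probability


rng = random.Random(0)
# With probability 1.0 an ordinary table is always truncated instead of dropped,
# while a refreshable materialized view is still dropped.
ordinary = should_convert_drop_to_truncate(1.0, False, rng)
refreshable = should_convert_drop_to_truncate(1.0, True, rng)
```

Checking the view type before drawing the random number keeps the stress-test behaviour unchanged for every other kind of table.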

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=99997&sha=355f90c5ba8982dc501d353f211c43350cfbc20e&name_0=PR&name_1=Stress%20test%20%28arm_tsan%29
#99997

Relates to #101383

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Version info

  • Merged into: 26.4.1.676

The `ignore_drop_queries_probability` setting converts DROP TABLE to
TRUNCATE TABLE, but TRUNCATE does not stop the periodic refresh task
of a refreshable materialized view. This causes the orphaned view to
keep refreshing indefinitely, consuming background pool threads and
progressively overwhelming the server (especially under TSan).

Root cause of the "Hung check failed, possible deadlock found" flake
in Stress test (arm_tsan). Test `03221_refreshable_matview_progress.sql`
creates an MV with `REFRESH AFTER 10 SECOND`. When the stress test
converts its final `DROP TABLE` to `TRUNCATE`, the view survives and
keeps refreshing for the rest of the stress test (48 refreshes observed),
with each refresh getting slower under TSan (from sub-second to 59
minutes), until the server becomes completely unresponsive.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=99997&sha=355f90c5ba8982dc501d353f211c43350cfbc20e&name_0=PR&name_1=Stress%20test%20%28arm_tsan%29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
clickhouse-gh Bot added the pr-ci label Apr 8, 2026

alexey-milovidov (Member, Author) left a comment:

A good fix.

clickhouse-gh Bot (Contributor) commented Apr 8, 2026

LLVM Coverage Report

| Metric    | Baseline | Current | Δ      |
|-----------|----------|---------|--------|
| Lines     | 83.90%   | 84.00%  | +0.10% |
| Functions | 90.90%   | 90.90%  | +0.00% |
| Branches  | 76.40%   | 76.40%  | +0.00% |

Changed lines: 100.00% (9/9) | lost baseline coverage: 1 line(s)

alexey-milovidov self-assigned this Apr 8, 2026
alexey-milovidov merged commit a77cd4c into master Apr 8, 2026 (162 of 163 checks passed)
alexey-milovidov deleted the fix-stress-test-refreshable-mv-drop branch April 8, 2026 19:13
robot-clickhouse-ci-2 added the pr-synced-to-cloud label (The PR is synced to the cloud repo) Apr 8, 2026
This was referenced Apr 10, 2026
Onyx2406 added a commit to Onyx2406/ClickHouse that referenced this pull request Apr 17, 2026
Labels

pr-ci, pr-synced-to-cloud (The PR is synced to the cloud repo)