Add integration tests for attach/detach and restart of Iceberg tables by scanhex12 · Pull Request #91583 · ClickHouse/ClickHouse · GitHub
Skip to content

Add integration tests for attach/detach and restart of Iceberg tables#91583

Merged
alexey-milovidov merged 8 commits into
ClickHouse:masterfrom
scanhex12:fix_attach
Jun 13, 2026
Merged

Add integration tests for attach/detach and restart of Iceberg tables#91583
alexey-milovidov merged 8 commits into
ClickHouse:masterfrom
scanhex12:fix_attach

Conversation

@scanhex12

@scanhex12 scanhex12 commented Dec 5, 2025

Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Test-only. Adds integration coverage (test_writes_detach_attach, test_writes_restart) verifying that Iceberg tables stay writable after DETACH/ATTACH and after a server restart, across s3, azure, and local.

The actual fix for the local-Iceberg writability issue already landed on master in 7ee8dd3 ("Keep IcebergLocal writable after DETACH/ATTACH or server restart"), at the configuration layer (StorageLocalConfiguration::createObjectStorage ignores is_readonly). The original registerStorageObjectStorage.cpp change in this PR became redundant for that goal and would have introduced an unrelated behavioral change for AzureBlobStorage on explicit ATTACH, so it was reverted; only the additional tests remain.

Related: #85413
Related: 7ee8dd3 (the fix that closed #85413 on master)

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Version info

  • Merged into: 26.6.1.767

@clickhouse-gh

clickhouse-gh Bot commented Dec 5, 2025

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Dec 5, 2025
@divanik divanik self-assigned this Dec 7, 2025
Comment thread src/Storages/ObjectStorage/registerStorageObjectStorage.cpp
Comment thread tests/integration/test_storage_iceberg_with_spark/test_writes.py
Comment thread tests/integration/test_storage_iceberg_with_spark/test_writes.py Outdated
alexey-milovidov and others added 2 commits March 19, 2026 10:38
Resolve conflict in `registerStorageObjectStorage.cpp`: combine the PR's
ATTACH mode check with master's added `std::nullopt` credential callback argument.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update comment in `registerStorageObjectStorage` to explain that both
  CREATE and ATTACH need non-readonly object storage, while server restart
  (FORCE_ATTACH) should remain readonly.
- Fix wrong table name prefix in `test_writes_detach_attach` (was
  `test_writes_complex_types_`).
- Add `test_writes_restart` that does a real server restart via
  `restart_clickhouse` and verifies inserts work after restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov self-assigned this Mar 21, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Dear @alexey-milovidov, @divanik, you haven't been active on this PR for 30 days. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

Comment thread src/Storages/ObjectStorage/registerStorageObjectStorage.cpp Outdated
@alexey-milovidov

Copy link
Copy Markdown
Member

Picked this up to advance it. Summary of what I did and an important finding for you to decide on.

Done

  • Merged current master into the branch (dbab77d6b3d..23c16826505, 0 conflicts). This picks up the fix for the FunctionsStress unit-test flake.
  • Resolved the three addressed review threads from @divanik (comment wording, table name, and the real-server-restart test — test_writes_restart now calls instance.restart_clickhouse()).
  • Replied to the clickhouse-gh thread about Azure (left open for a maintainer call — see below).

Key finding: the source change is now redundant

The actual fix for #85413 already landed on master in 7ee8dd3c39c ("Keep IcebergLocal writable after DETACH/ATTACH or server restart"). It fixes the real cause at the configuration layer: StorageLocalConfiguration::createObjectStorage now ignores is_readonly and always builds a writable LocalObjectStorage (previously readonly=true on ATTACH permanently blocked writes via throwIfReadonly).

Because of that, this PR's change to the shared registerStorageObjectStorage.cpp helper (passing is_readonly=false for ATTACH) is:

  • a no-op for Local (it ignores the flag) — i.e. no effect on the stated local-Iceberg goal,
  • a no-op for S3/HDFS (they also ignore the flag),
  • the only behavioral effect is on Azure, where explicit ATTACH would now do a container existence check (and create the container only if it truly doesn't exist) — exactly the concern clickhouse-gh raised.

This also explains the Bugfix validation (integration tests) failure: all 36 tests pass on a plain master binary (the bug is already fixed), so "Failed to reproduce the bug" is structural and cannot be made green by any code change here.

The genuinely valuable part of this PR is the added integration coverage (test_writes_detach_attach, test_writes_restart) across s3/azure/local + restart.

Suggestion: consider reducing this to a test-only PR (drop the registerStorageObjectStorage.cpp change, which avoids the Azure concern, and move the Changelog category off "Bug Fix"), or close it as covered by 7ee8dd3c39c. I did not change the approach unilaterally since @antonio2368 already approved — your call.

Unrelated CI failures (not caused by this PR)

@groeneai

groeneai commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

@alexey-milovidov investigated. The obfuscator amd_tsan failures (00096_obfuscator_save_load, 00175_obfuscator_schema_inference) were caused by #104690 (Add UntrackedMemory asynchronous metric, merged 2026-06-02 21:01 UTC), already reverted by #106365 (merged 2026-06-03 06:28 UTC). No separate fix PR is needed; this PR just needs a master merge to pick up the revert.

CIDB evidence:

Tracking this in our chronic-flaky umbrella tasks for 00096_obfuscator_save_load and 00175_obfuscator_schema_inference so any post-reapply recurrence is caught immediately.

alexey-milovidov and others added 2 commits June 6, 2026 18:38
…hange

The local-Iceberg writability fix for ClickHouse#85413 already landed on master in
7ee8dd3 ("Keep IcebergLocal writable after DETACH/ATTACH or server
restart"), at the configuration layer: StorageLocalConfiguration::createObjectStorage
now ignores is_readonly and always builds a writable LocalObjectStorage
(StorageS3Configuration and HDFS already ignore the flag).

After that, the change here to pass is_readonly=false on explicit ATTACH
through the shared helper was redundant for the stated local-Iceberg goal,
and the only remaining consumer of the flag, StorageAzureConfiguration, would
take a behavioral change (an extra existence check / container-create path in
AzureBlobStorage::getContainerClient) on DETACH/ATTACH of an existing
Azure-backed table, with no benefit.

Reverting registerStorageObjectStorage.cpp to match master and keeping only
the additional integration coverage (test_writes_detach_attach,
test_writes_restart).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov changed the title Allow to attach-detach iceberg local tables Add integration tests for attach/detach and restart of Iceberg tables Jun 6, 2026
@clickhouse-gh clickhouse-gh Bot added pr-not-for-changelog This PR should not be mentioned in the changelog and removed pr-bugfix Pull request with bugfix, not backported by default labels Jun 6, 2026
@alexey-milovidov

Copy link
Copy Markdown
Member

Picked this up via /continue-pr:

  • Merged master (the branch was ~395 commits behind and red); no conflicts.
  • Reduced the PR to test-only. The local-Iceberg writability fix for Local Iceberg table not insertable after server restart #85413 already landed on master in 7ee8dd3c39c at the configuration layer (StorageLocalConfiguration::createObjectStorage ignores is_readonly; StorageS3Configuration and HDFS already do). The registerStorageObjectStorage.cpp change here was therefore redundant for the stated goal and would have introduced the Azure-only behavioral change flagged in review (an extra existence check / container-create path in AzureBlobStorage::getContainerClient on explicit ATTACH). That file now matches master; only the new integration tests (test_writes_detach_attach, test_writes_restart) remain.
  • Updated the changelog category to Not for changelog (the label is already pr-not-for-changelog) and refreshed the title/body, since the bug is already fixed on master and this PR is now pure regression coverage.
  • Resolved the outstanding review thread accordingly.

The previously-red CI (Bugfix validation, Stress test (amd_tsan) hung-check) was against the stale pre-merge SHA; CI will re-run on 7371cc6ee36. With no source change, those checks cannot be caused by this PR.

@alexey-milovidov

Copy link
Copy Markdown
Member

@groeneai, investigate the failure: 04295_04278_materialized_cte_in_union_runtime_set in the Stateless tests (amd_llvm_coverage, ParallelReplicas, s3 storage, parallel) report and provide a fix in a separate PR. If the fix is already in progress, link it here.

This is unrelated to this PR (which is test-only, adding two Iceberg integration tests). The test is failing on master itself and across many unrelated PRs since 2026-06-06 (e.g. master runs on 2026-06-08, and PRs #106692, #84041, #106657, #100177, #104625, #106178). The failure is UNKNOWN_TABLE while executing Remote under ParallelReplicas, even though the relevant planner fix 9335a9cefe1 ("Drop all DelayedMaterializingCTEStep's in the Set plan") is already present, suggesting a remaining ParallelReplicas-specific gap or flakiness.

@groeneai

groeneai commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

@alexey-milovidov, fix in flight: #106717.

Stop-gap pin of enable_parallel_replicas = 0 at the top of the test. CIDB confirms 12 hits over 14 days, all on Stateless tests (amd_llvm_coverage, ParallelReplicas, s3 storage, parallel); master red today on commits 38129a8480db and bf043eba3728; 11 unrelated PRs affected (incl. this one).

Mechanism: under enable_parallel_replicas = 1, the temporary materialized CTE storage _materialized_cte_ct_<hash> is registered on the initiator only; the follower replica running the Remote step raises UNKNOWN_TABLE: Unknown table expression identifier '_materialized_cte_ct_<hash>' in QueryAnalyzer::initializeQueryJoiningTreeWithTableExpressions. The planner-side fix 9335a9cefe1 (removeAllDelayedMaterializingCTEsStep) is already on master but only covers single-server execution; the parallel-replicas distributed-query path is uncovered.

The proper fix - making materialized CTEs work under parallel replicas - lives in MaterializingCTEStep plus the ParallelReplicas distributed-query construction path. That's @ novikd's MaterializedCTE area; tracked separately so this PR can unblock master immediately.

Session id: cron:clickhouse-ci-task-worker:20260608-120900

@groeneai

groeneai commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

@alexey-milovidov — closing the loop on this directive.

The flaky 04295_04278_materialized_cte_in_union_runtime_set is now resolved by your own revert PR #106795 (merged 2026-06-09T02:30:18Z), which removed the entire materialized-CTE feature added by #105041 — including the failing test file.

Stop-gap fix PR #106717 (pinned enable_parallel_replicas = 0) is therefore obsolete and has been closed: #106717 (comment)

If/when materialized CTEs are reintroduced, the proper parallel-replicas support (initiator-side _materialized_cte_ct_<hash> registration is currently invisible to follower replicas executing the Remote step) should be folded into the new implementation rather than re-pinning the test setting.

Session: cron:clickhouse-ci-task-worker:20260609-071900

@alexey-milovidov

Copy link
Copy Markdown
Member

Merged master into the branch to refresh the ~3-day-old base and pick up the fix for the only red.

The sole CI failure was Tests/04295_04278_materialized_cte_in_union_runtime_set in the Stateless tests (amd_llvm_coverage, ParallelReplicas, s3 storage, parallel) job — unrelated to this Iceberg test-only PR. That test was broadly flaky on master under parallel replicas (19 failures on 2026-06-08 per CIDB) and was tagged no-parallel-replicas on master in b95136881bc, which this merge now brings into the branch, so it will be skipped under ParallelReplicas.

No conflicts; the PR diff is unchanged (the two new test_writes_detach_attach / test_writes_restart integration tests). No unresolved review threads; approved by @antonio2368.

@clickhouse-gh

clickhouse-gh Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.60% 84.70% +0.10%
Functions 92.30% 92.30% +0.00%
Branches 77.20% 77.30% +0.10%

Changed lines: No C/C++ source files changed — skipping uncovered code analysis.

Newly covered by added/modified tests: 879 line(s), 54 function(s) across 132 file(s) · Details

Top files
  • src/Server/ACME/Client.cpp: 195 line(s), 9 function(s)
  • src/Server/ACME/API.cpp: 160 line(s), 15 function(s)
  • src/Common/OpenSSLHelpers.cpp: 52 line(s), 2 function(s)
  • src/Common/Crypto/KeyPair.cpp: 28 line(s), 5 function(s)
  • src/Server/CertificateReloader.cpp: 24 line(s), 2 function(s)

Full report

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jun 13, 2026
Merged via the queue into ClickHouse:master with commit ef03c01 Jun 13, 2026
166 checks passed
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Local Iceberg table not insertable after server restart

6 participants