Fix exception when Iceberg format version is upgraded by external tool#100407
Conversation
When external tools like Spark upgrade the Iceberg format version (e.g. from v1 to v2), the `format_version` cached in `PersistentTableComponents` becomes stale. The `chassert` in `getState` and `getHistory` would then trigger a logical error because the cached value no longer matches the metadata file. Replace the assertions with updates to the cached `format_version`, so that downstream code (manifest parsing, etc.) always uses the correct version. The field is changed from `const` to `mutable` to allow updating it from const methods. Fixes: #86776 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thread safety? |
… `std::atomic` The `format_version` field can be updated when external tools upgrade the Iceberg format version, but can also be read concurrently from multiple queries. Using `mutable Int32` without synchronization is a data race (undefined behavior). Replace with `std::atomic<Int32>`. Since `std::atomic` is neither copyable nor movable, add explicit constructors for `PersistentTableComponents`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…version-upgrade # Conflicts: # src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp # src/Storages/ObjectStorage/DataLakes/Iceberg/PersistentTableComponents.h
Adds an integration test that verifies ClickHouse can read an Iceberg table after its `format-version` has been upgraded from v1 to v2 by an external tool (simulated by directly modifying the metadata JSON). This is a regression test for #86776 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix. |
|
The MSan stress test failure (MemorySanitizer: use-of-uninitialized-value, STID 4179-5154 or 4148-3044) is a known pre-existing issue unrelated to this PR. Fix: #102158 |
|
The flaky check failure is fixed in #102148, let's update the branch. |
When an external tool (e.g. Spark) upgrades the Iceberg format version from v1 to v2, existing manifest files are left in v1 format (per the Iceberg specification, their entries are interpreted as having `sequence_number = 0`). `ManifestFileIterator::create` was checking the table's cached `format_version`, so it rejected these legacy manifests with an `ICEBERG_SPECIFICATION_VIOLATION` exception claiming that the required `sequence_number` column was missing. Use the manifest file's own Avro metadata to determine how to parse it, matching what `AvroForIcebergDeserializer::createParsedManifestFileEntry` already does for entry-level decoding. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100407&sha=1299abf41df6d00d2744ca5a67931bf136c0aa15&name_0=PR Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review feedback: `PersistentTableComponents` is shared across concurrent queries, so mutating `format_version` (even via `std::atomic`) introduced a cross-query correctness hazard — query A could observe the `format-version` written by query B while still parsing A's own metadata structures. Revert the field to `const Int32` and drop the assignments in `getState` and `getHistory`. Parsing now relies on the format version recorded in each manifest list / manifest file's own Avro metadata, which `getFormatVersionFromManifestFileMetadata` already returns: the previous commit already wired this into `ManifestFileIterator::create`, and this commit extends the same approach to `getManifestList` so that a v1 manifest list left over after a Spark v1 -> v2 upgrade is not read with v2 rules.
The previous commits moved away from mutating the cached `format_version` in `PersistentTableComponents` toward reading the format version from each manifest list / manifest file's own Avro metadata. Update the test's docstring accordingly so the intent matches the current implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion The OAuth token used to push from this environment lacks the `workflow` scope, so any commit that adds or modifies a file under `.github/workflows/` is rejected on push. Removing the master-only `optimize_clickhouse.yml` from the merge commit restores the workflow tree to what was already in the PR.
The prior commit deleted/modified workflow files in this branch as a workaround for GitHub's workflow-scope check. Restoring them to match `master` so the PR diff no longer contains unrelated changes.
The `test_format_version_upgrade_concurrent_reads` test submits infinite reader loops to a `ThreadPoolExecutor`, then sets `stop["flag"]` only after the main-thread upgrade path succeeds. If `_read_iceberg_metadata`, the `format-version` assertion, or `_write_iceberg_metadata` raised, the flag was never set, so the `ThreadPoolExecutor` shutdown on leaving the `with` block waited forever on the reader loops and the test hung until the global CI timeout instead of failing fast. Set `stop["flag"]` in a `finally` so the readers always stop. Addresses the AI Review finding reported by the `CH Inc sync` job. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Pushed The other failures in the latest sync run (clickhouse-private#53314) are not caused by this PR:
@groeneai, the remaining |
|
@alexey-milovidov, investigated. Findings below, sourced from CIDB ( Class 1. Class 2. "self-hosted runner lost communication with the server" (Integration tests db disk old analyzer 2/6): one-off GitHub Actions runner heartbeat loss. CIDB 30d sweep returns 0 hits matching that string in Class 3. AI-Review on Class 4. "meta in keeper, s3 storage" stateless flakiness: sync-side label, not a public CIDB Bottom line: none of the four are caused by PR #100407. Class 1 is a known sanitizer-only chronic flake; Class 2 and Class 4 are infra / sync-side noise (rerun); Class 3 is a pre-existing master AI-Review finding scoped to the S3 component. No separate fix PR opened by us at this time. We will continue to monitor Class 1 and act if a master regression appears. The actionable AI-Review finding addressed by your (Worker session: cron:clickhouse-ci-task-worker:20260605-030500) |
|
📊 Cloud Performance Report ✅ AI verdict: no significant changes detected. K_source=6 K_base=30 flagged=0/65 clickbench🟢 No significant changes tpch_adapted_1_official🟢 No significant changes Debug info
|
|
Merged The single CI failure ( The one open review thread (@scanhex12 on Iceberg v3) is already addressed in the discussion: v3 manifests always carry the |
|
Merged This pulls in #106664, which fixes the |
|
@groeneai, investigate the failure: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100407&sha=eebc7dbc7f028d3d22fa69b51b5e95aff18a1f34&name_0=PR&name_1=Integration%20tests%20%28amd_asan_ubsan%2C%20db%20disk%2C%20old%20analyzer%2C%202%2F6%29 and provide a fix in a separate PR. If the fix is already in progress, link it here.
|
|
Merged |
|
@alexey-milovidov fix is up in a separate PR: #106984 Root cause (not related to this Iceberg PR, as you noted): Fix sets Reproduced deterministically with the |
|
Merged The sole red in the last run, @groeneai, investigate the failure: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100407&sha=334b728170416b5b0ce4ba250bfc2a09b149d2ac&name_0=PR&name_1=Stateless%20tests%20%28arm_asan_ubsan%2C%20azure%2C%20parallel%29 — the runner-wide total-memory exhaustion on stateless ASan runners (spiking 2026-06-09/10 across dozens of PRs) and provide a fix in a separate PR. If the fix is already in progress, link it here. |
|
@alexey-milovidov Confirmed, and the root-cause fix is already merged. Cited run (SHA
CIDB corroborates the runner-wide spike on Root-cause fix already merged: #106962 (Raul Marin, "ci: tag This is the same azure runner-OOM family I analyzed earlier on #106187 and #101843. |
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered by tests: 55/96 (57.29%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 1 line(s) · Uncovered code |

When external tools like Spark upgrade the Iceberg format version (e.g. from v1 to v2), the
format_versioncached inPersistentTableComponentsbecomes stale. The previouschassertingetStateandgetHistorythen triggered a logical error because the cached value no longer matched the metadata file.Instead of mutating shared cached state from
constmethods (which would introduce a cross-query data race), this PR derives the parsing format version from each manifest list / manifest file's own Avroformat-versionmetadata. A v2 table can therefore still reference v1 manifests left behind by an external upgrade, and each one is parsed correctly.For backward compatibility with older ClickHouse releases that wrote manifests without the
format-versionAvro key, falls back to schema-based detection: thesequence_numberfield appears at the top level of v2 manifest lists / manifest entries, but is absent from their v1 counterparts. New manifests written by ClickHouse now also include theformat-versionAvro metadata key.The cached
format_versioninPersistentTableComponentsremainsconst(per-table value observed when the table was opened) and is no longer compared against the latest metadata file.Fixes #86776
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix logical error exception when reading Iceberg tables whose format version was upgraded by an external tool (e.g. Spark).
Documentation entry for user-facing changes
Note
Medium Risk
Changes Iceberg manifest list/file parsing and
remove_orphan_filesgating to stop relying on cached tableformat_version, which could affect compatibility across v1/v2 manifests if the new per-file detection is wrong.Overview
Fixes failures when an Iceberg table’s
format-versionis upgraded externally (e.g. Spark v1→v2) by stopping reliance on the cachedPersistentTableComponents::format_versionfor parsing and instead deriving the version from each manifest list / manifest file’s Avro metadata.AvroForIcebergDeserializer::getFormatVersionFromManifestFileMetadata()now falls back to schema-based detection (presence of top-levelsequence_number) when theformat-versionAvro key is missing, and writers now emitformat-versionmetadata for new manifest lists/files. Assertions that the cached version matches the latest metadata are removed,ManifestFileIterator::createno longer takes a table-wide version, andEXECUTE remove_orphan_filesre-reads the latest metadata to enforce its v2-only requirement.Adds an integration test covering external upgrades, concurrent reads during upgrade, and the no-
format-versionfallback behavior.Reviewed by Cursor Bugbot for commit 12aee57. Bugbot is set up for automated code reviews on this repo. Configure here.
Version info
26.6.1.741