Resolve problems with paths and compatibility problems with Spark in Azure (v2)#100420
Merged
LLVM Coverage Report: PR changed-lines coverage: 44.54% (657/1475, 0 noise lines excluded)
scanhex12 approved these changes on Mar 24, 2026.
alexey-milovidov added a commit that referenced this pull request on Apr 30, 2026:
- Replace TODO authoring notes with user-facing summaries (or drop entries that are not user-visible).
- Drop changelog item for `#101634` since the PR only adds a test.
- Drop changelog item for `#100315` since the PR only adds an internal `getLastWrittenObjectPath` method on `ObjectStorage`.
- Rewrite the duplicated "This PR addresses several issues..." entry for `#99163` / `#100420` as a single coherent sentence.
- Fix the `chdig` link target: the text said `v26.4.3` but the URL pointed to the `v26.4.1` tag. Both versions exist; the PR title is "Bump chdig to v26.4.3", so align the URL with the version.
PR: #103729
zvonand pushed a commit to Altinity/ClickHouse that referenced this pull request on May 19, 2026:
…solution in next commit) --- Original cherry-pick message follows:
Merge pull request ClickHouse#100420 from ClickHouse/divanik/rerevert_spark_azure_fixes
Resolve problems with paths and compatibility problems with Spark in Azure (v2)
Conflicts:
- src/Interpreters/IcebergMetadataLog.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergWrites.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFileIterator.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/MultipleFileWriter.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/MultipleFileWriter.h
- src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/PersistentTableComponents.h
- src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.h
zvonand added a commit to Altinity/ClickHouse that referenced this pull request on May 19, 2026.
zvonand added a commit to Altinity/ClickHouse that referenced this pull request on May 20, 2026:
Adapted PR ClickHouse#90740 (Read iceberg from various paths) to antalya-26.3 without applying the prerequisite upstream PR ClickHouse#100420 (IcebergPath / path_resolver refactor). The refactor is dropped; raw `String` paths are used instead.
Adaptations from PR 90740 to antalya-26.3:
- `IcebergPathFromMetadata` references → plain `String` (no `.serialize()`, no `IcebergPathFromMetadata::deserialize` wrapping).
- `IcebergPathResolver & path_resolver` parameters → `const String & table_location`. Calls like `path_resolver.resolve(x)` become `x`.
- `SecondaryStorages` infrastructure kept: a thread-safe map of secondary object storages plus a `resolveObjectStorageForPath` helper that maps a metadata path to a (storage, key) pair. The IcebergPath-aware overload of `resolveObjectStorageForPath` was removed.
- New protocol version `DBMS_CLUSTER_PROCESSING_PROTOCOL_VERSION_WITH_ICEBERG_ABSOLUTE_PATH = 7` is used in `IcebergObjectSerializableInfo::{serializeForClusterFunctionProtocol, deserializeForClusterFunctionProtocol}` to gate the new `data_object_file_metadata_path` field and the `requires_external_storage` check; on older protocols, `_path` for delete files goes through `SchemeAuthorityKey`.
Dropped (depend on upstream commits not on antalya-26.3):
- `ExpireSnapshotsExecute.{cpp,h}`, `RemoveOrphanFilesExecute.{cpp,h}`, `SnapshotFilesTraversal.{cpp,h}`: EXECUTE handlers extracted by the upstream PR that introduced the per-command refactor. PR 90740 only threads `secondary_storages` into these; the underlying refactor is a separate dependency. The antalya-26.3 `Iceberg::expireSnapshots` path is kept unchanged in `IcebergMetadata::executeCommand`.
- `executeExpireSnapshots` / `executeRemoveOrphanFiles` dispatch in `IcebergMetadata::executeCommand`: depends on the dropped files.
References:
- Upstream PR: ClickHouse#90740
- antalya-26.1 backport (used as a structural reference for the no-IcebergPath adaptation): 0520e2e ("Allow to read iceberg table data from any location")
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
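The commit message above gates a new field behind a cluster-processing protocol version so that replicas running older builds can still exchange object info. A minimal sketch of that general pattern follows, assuming a simplified struct and an illustrative length-prefixed encoding. Only the version number 7 and the field name `data_object_file_metadata_path` come from the commit message; `ObjectInfoSketch`, `writeString`, `readString`, `serialize`, `deserialize`, and `PROTOCOL_VERSION_WITH_ICEBERG_ABSOLUTE_PATH` are hypothetical names, and the `requires_external_storage` check is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical constant: the value 7 comes from the commit message, the name is shortened.
constexpr std::uint64_t PROTOCOL_VERSION_WITH_ICEBERG_ABSOLUTE_PATH = 7;

// Hypothetical, simplified stand-in for the per-object info sent to cluster replicas.
struct ObjectInfoSketch
{
    std::string data_file_path;                 // understood by every protocol version
    std::string data_object_file_metadata_path; // only exchanged from version 7 on
};

// Illustrative length-prefixed encoding ("<size> <bytes>"), not ClickHouse's wire format.
void writeString(std::ostream & out, const std::string & s)
{
    out << s.size() << ' ' << s;
}

std::string readString(std::istream & in)
{
    std::size_t size = 0;
    in >> size;
    in.get(); // consume the separating space
    std::string s(size, '\0');
    in.read(s.data(), static_cast<std::streamsize>(size));
    if (!in)
        throw std::runtime_error("truncated input");
    return s;
}

void serialize(const ObjectInfoSketch & info, std::ostream & out, std::uint64_t protocol_version)
{
    assert(!info.data_file_path.empty() && "every object has a data file path");
    writeString(out, info.data_file_path);
    // A replica on an older protocol would not expect this field, so it is written
    // only when the negotiated version is new enough.
    if (protocol_version >= PROTOCOL_VERSION_WITH_ICEBERG_ABSOLUTE_PATH)
        writeString(out, info.data_object_file_metadata_path);
}

ObjectInfoSketch deserialize(std::istream & in, std::uint64_t protocol_version)
{
    ObjectInfoSketch info;
    info.data_file_path = readString(in);
    // The reader applies the same gate as the writer; otherwise the streams desynchronize.
    if (protocol_version >= PROTOCOL_VERSION_WITH_ICEBERG_ABSOLUTE_PATH)
        info.data_object_file_metadata_path = readString(in);
    return info;
}
```

The design relies on both sides agreeing on the negotiated version: if the writer used version 7 while the reader assumed version 6, the extra field would be misread as the start of the next object.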
il9ue pushed a commit to Altinity/ClickHouse that referenced this pull request on May 26, 2026:
The path resolution refactor in ClickHouse#100420/ClickHouse#100295 changed `FileNamesGenerator`'s constructor signature (5 args → 4 args, with `is_transactional` mapped to `use_uuid_in_metadata` and `config_path` removed) and removed `generateMetadataName()` in favour of `generateMetadataPathWithInfo()`. The `generateManifestList` signature also changed: the first argument is now an `IcebergPathResolver` instead of a `FileNamesGenerator`, and the manifest entry sizes argument is a `std::vector<Int64>` instead of a scalar.
Adapt the `TRUNCATE TABLE` code path (from #1655) to the new API:
- Two-branch `FileNamesGenerator` construction (transactional vs. non-transactional, reading the location from JSON) → a single construction using `persistent_components.path_resolver.getTableLocation()`, with `is_transactional` as `use_uuid_in_metadata`.
- `generateMetadataName()` → `generateMetadataPathWithInfo()`, returning `GeneratedMetadataFileWithInfo { path, version, compression_method }`.
- Storage paths are derived via `path_resolver.resolve(path)` where raw strings were previously returned.
- The catalog filename is obtained via `path_resolver.resolveForCatalog(path)`, replacing the manual `location + metadata_name` concatenation.
- `generateManifestList(filename_generator, ...)` → `generateManifestList(path_resolver, ...)`.
Truncate behavior is preserved: it clears data files by writing a new snapshot with empty manifests and updates the catalog pointer. The new `IcebergPathResolver` abstraction handles both transactional (full URI) and non-transactional (bare path) cases transparently.
Refs: #1785, #1655
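The commit message above says that `IcebergPathResolver` hides the difference between full-URI and bare-path locations, and that callers use `resolve` for storage paths and `resolveForCatalog` instead of concatenating `location + metadata_name`. The sketch below illustrates that idea only: `PathResolverSketch` is a hypothetical class, not the real `IcebergPathResolver`, and its rules (a metadata path is either a full location under the table location or a path relative to it; anything else is rejected) are assumptions made for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical sketch of a path resolver; not ClickHouse's IcebergPathResolver.
// Assumption: metadata records paths either as full locations under the table
// location, or as paths relative to it.
class PathResolverSketch
{
public:
    // table_location: location recorded in Iceberg metadata, e.g. "s3://bucket/warehouse/t"
    //                 or an absolute local path such as "/var/lib/t".
    // storage_prefix: the same directory as a key inside the object storage, e.g. "warehouse/t".
    PathResolverSketch(std::string table_location_, std::string storage_prefix_)
        : table_location(stripTrailingSlashes(std::move(table_location_)))
        , storage_prefix(stripTrailingSlashes(std::move(storage_prefix_)))
    {
        assert(!table_location.empty() && "table location must not be empty");
    }

    const std::string & getTableLocation() const { return table_location; }

    // Metadata path -> key for the object storage client.
    std::string resolve(const std::string & metadata_path) const
    {
        return join(storage_prefix, relativeToTable(metadata_path));
    }

    // Metadata path -> full location to hand to a catalog.
    std::string resolveForCatalog(const std::string & metadata_path) const
    {
        return join(table_location, relativeToTable(metadata_path));
    }

private:
    static std::string stripTrailingSlashes(std::string s)
    {
        while (!s.empty() && s.back() == '/')
            s.pop_back();
        return s;
    }

    static std::string join(const std::string & base, const std::string & relative)
    {
        return base.empty() ? relative : base + "/" + relative;
    }

    // Accept either "<table_location>/<rest>" or a bare relative path; reject anything else.
    std::string relativeToTable(const std::string & path) const
    {
        const std::string prefix = table_location + "/";
        if (path.compare(0, prefix.size(), prefix) == 0)
            return path.substr(prefix.size());
        if (path.find("://") != std::string::npos || (!path.empty() && path.front() == '/'))
            throw std::invalid_argument("path is outside the table location: " + path);
        return path;
    }

    std::string table_location;
    std::string storage_prefix;
};
```

For example, with table location `s3://bucket/warehouse/t` and storage prefix `warehouse/t`, both `s3://bucket/warehouse/t/metadata/v2.metadata.json` and `metadata/v2.metadata.json` resolve to the storage key `warehouse/t/metadata/v2.metadata.json`. Centralizing this in one object is what lets a caller such as the `TRUNCATE TABLE` path stop building paths by hand.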
zvonand added a commit to Altinity/ClickHouse that referenced this pull request on Jun 1, 2026:
…100420 Backport of ClickHouse#100420/ClickHouse#100295 - Resolve problems with paths and compatibility problems with Spark in Azure (v2)
il9ue pushed a commit to il9ue/ClickHouse that referenced this pull request on Jun 1, 2026:
…_spark_azure_fixes Resolve problems with paths and compatibility problems with Spark in Azure (v2)

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
This PR addresses several issues:
- Fixes inconsistent path handling in Iceberg caused by mixing storage paths and metadata paths.
- Ensures that Iceberg tables record a table location that is either a URL or an absolute path.
- Adds a fallback for computing file sizes in Azure, because some ClickHouse readers do not support counting bytes after traversal.
- Handles version-hint.txt in a way that is compatible with Spark.
- Introduces type-level abstractions that make it harder to mix up path types in the future.
- Adds tests for Azure and Local that verify cross-engine interoperability without intermediate uploading or downloading.
- Fixes position deletes, which previously relied on path-inference heuristics in cases where that approach is inappropriate.
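One item in the entry is type-level abstractions that make path types harder to mix up. A minimal sketch of that technique (distinct wrapper types with explicit constructors, plus one resolver that converts between them) follows. It assumes exactly two path kinds; `MetadataPath`, `StoragePath`, `TypedResolverSketch`, and `readObjectSketch` are hypothetical names, not ClickHouse's actual types.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical strong types. A metadata path is what Iceberg metadata records;
// a storage path is the key the object storage client understands.
class MetadataPath
{
public:
    explicit MetadataPath(std::string value_) : value(std::move(value_)) {}
    const std::string & str() const { return value; }
private:
    std::string value;
};

class StoragePath
{
public:
    explicit StoragePath(std::string value_) : value(std::move(value_)) {}
    const std::string & str() const { return value; }
private:
    std::string value;
};

// Explicit constructors mean neither raw strings nor the other path kind convert implicitly.
static_assert(!std::is_convertible_v<std::string, MetadataPath>);
static_assert(!std::is_convertible_v<std::string, StoragePath>);
static_assert(!std::is_convertible_v<MetadataPath, StoragePath>);

// Hypothetical resolver: the one sanctioned way to turn a MetadataPath into a StoragePath.
class TypedResolverSketch
{
public:
    // Assumes table_location has no trailing slash.
    TypedResolverSketch(std::string table_location_, std::string storage_prefix_)
        : table_location(std::move(table_location_)), storage_prefix(std::move(storage_prefix_))
    {
        assert(!table_location.empty() && table_location.back() != '/');
    }

    StoragePath toStorage(const MetadataPath & path) const
    {
        const std::string & p = path.str();
        // Match whole path components, so ".../t" does not match ".../tt/...".
        const bool under_table = p.compare(0, table_location.size(), table_location) == 0
            && (p.size() == table_location.size() || p[table_location.size()] == '/');
        if (!under_table)
            throw std::invalid_argument("metadata path is outside the table location: " + p);
        return StoragePath(storage_prefix + p.substr(table_location.size()));
    }

private:
    std::string table_location;
    std::string storage_prefix;
};

// A reader that accepts only storage paths; passing a MetadataPath does not compile.
std::string readObjectSketch(const StoragePath & path)
{
    return "read:" + path.str();
}
```

With these types, a call such as `readObjectSketch(MetadataPath("..."))` is rejected by the compiler, so a metadata path can only reach the reader after passing through the resolver. The mix-ups the entry describes then become compile errors instead of wrong object keys at runtime.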
Version info
26.4.1.203