Fix object slicing in ObjectIteratorSplitByBuckets::next #100819
divanik wants to merge 12 commits
`ObjectIteratorSplitByBuckets::next` copied `ObjectInfo` by value and wrapped it in `std::make_shared<ObjectInfo>`, which sliced any subclass (e.g. `IcebergDataObjectInfo`) down to the base `ObjectInfo`. This caused Iceberg cluster table functions with `cluster_table_function_split_granularity = 'bucket'` to lose Iceberg-specific metadata (manifest entry, schema ID, delete files), leading to a failed assertion in `IcebergSource::createReader`.

Fix by adding a virtual `clone` method to `ObjectInfo` and overriding it in `IcebergDataObjectInfo`. Use `clone` in `ObjectIteratorSplitByBuckets::next` instead of copy-constructing into a base-class `shared_ptr`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
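The slicing problem and the `clone` fix described above can be illustrated with a minimal sketch. The types `BaseInfo` and `DerivedInfo` and the helpers `copySliced`, `copyPolymorphic`, and `keepsDerived` are hypothetical stand-ins invented for this example; they are not the real `ObjectInfo` / `IcebergDataObjectInfo` classes, and the actual signatures in the PR may differ.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; NOT the real ClickHouse classes.
struct BaseInfo
{
    std::string path;

    explicit BaseInfo(std::string path_) : path(std::move(path_)) {}
    BaseInfo(const BaseInfo &) = default;
    virtual ~BaseInfo() = default;

    // Polymorphic copy: each subclass returns a copy of its own dynamic type.
    virtual std::shared_ptr<BaseInfo> clone() const { return std::make_shared<BaseInfo>(*this); }
};

struct DerivedInfo : BaseInfo
{
    int schema_id; // Subclass-only metadata that slicing would drop.

    DerivedInfo(std::string path_, int schema_id_) : BaseInfo(std::move(path_)), schema_id(schema_id_) {}

    std::shared_ptr<BaseInfo> clone() const override { return std::make_shared<DerivedInfo>(*this); }
};

// Buggy pattern: copy-constructs only the base subobject, so the result is a plain BaseInfo.
std::shared_ptr<BaseInfo> copySliced(const std::shared_ptr<BaseInfo> & src)
{
    return std::make_shared<BaseInfo>(*src);
}

// Fixed pattern: the virtual call dispatches on the dynamic type of *src.
std::shared_ptr<BaseInfo> copyPolymorphic(const std::shared_ptr<BaseInfo> & src)
{
    return src->clone();
}

// Returns true if the object still has the derived dynamic type.
bool keepsDerived(const std::shared_ptr<BaseInfo> & obj)
{
    return std::dynamic_pointer_cast<DerivedInfo>(obj) != nullptr;
}
```

Given a `std::shared_ptr<BaseInfo>` that actually points to a `DerivedInfo`, `copySliced` yields an object for which `keepsDerived` is false, while `copyPolymorphic` preserves both the dynamic type and `schema_id`. A downstream cast that expects the derived type, analogous to the failed assertion mentioned above, would only succeed on the cloned copy.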
Workflow [PR], commit [0173cf5]
AI Review: Summary, PR Metadata, Findings, Tests, Final Verdict
…ject-iterator-slicing
…-slicing
# Conflicts:
#	src/Storages/ObjectStorage/IObjectIterator.cpp
…rg position deletes

The bug fixed in this PR sliced `IcebergDataObjectInfo` down to the base `ObjectInfo` in `ObjectIteratorSplitByBuckets::next`, losing the Iceberg delete-file metadata. The new integration test creates a format-version 2 Iceberg table with merge-on-read position deletes and small parquet row groups so a data file is split into multiple buckets, then verifies that a cluster query with `cluster_table_function_split_granularity='bucket'` still applies the deletes and matches the non-split result.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Resolved the merge conflict with `master` (the new `has_cache_entry` / `filterByMatchingRowGroups` logic in `ObjectIteratorSplitByBuckets::next` is now combined with the polymorphic `clone`), fixed the changelog entry, and added a regression test. On the review feedback:
…-slicing
# Conflicts:
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergDataObjectInfo.h
#	src/Storages/ObjectStorage/IObjectIterator.h
Merged Verified the four affected translation units compile cleanly after the merge:
…-slicing

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Merged The This is fixed on The merge was clean:
…-slicing

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LLVM Coverage Report
Changed lines: 3/19 changed C/C++ lines covered by tests (15.79%). Lost baseline coverage: none.
…o fix-object-iterator-slicing

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix object slicing in bucket splitting of cluster table functions (`cluster_table_function_split_granularity = 'bucket'`). Previously the per-bucket object was copy-constructed into the base `ObjectInfo`, slicing away subtype metadata. For `Iceberg` this lost the manifest entry, schema id, and delete files (causing wrong results or an exception in `IcebergSource`); the same path also affects archive-backed reads and `ObjectStorageQueue` objects.
Documentation entry for user-facing changes