Query condition cache for iceberg tables#102115
Conversation
|
Workflow [PR], commit [e80e78d] Summary: ❌
AI ReviewSummaryThis PR adds query-condition-cache integration for Iceberg/object-storage reads by propagating table identity ( PR Metadata
Findings❌ Blockers
💡 Nits
Tests
ClickHouse Rules
Final Verdict
|
| format_filter_info->filter_actions_dag->dumpNames(), | ||
| unmatched_ranges, | ||
| unmatched_ranges.size(), | ||
| false |
There was a problem hiding this comment.
getMatchedBuckets is called for every format when condition_hash is set, but the default implementation in IInputFormat throws NOT_IMPLEMENTED.
This means enabling use_query_condition_cache can throw on non-Parquet object-storage formats after EOF (getMatchedBuckets is not implemented for this format).
Please guard this path by format/capability (or make default behavior non-throwing) so unsupported formats keep working.
| *format_filter_info->condition_hash, | ||
| format_filter_info->filter_actions_dag->dumpNames(), | ||
| unmatched_ranges, | ||
| unmatched_ranges.size(), |
There was a problem hiding this comment.
QueryConditionCache::write expects marks_count to be the total number of marks/row-groups in the file, but here unmatched_ranges.size() is passed instead.
This can break correctness and memory safety. Concrete trace: if only row group 7 matches, then unmatched_ranges = [{7,8}] and unmatched_ranges.size() == 1; inside write, std::fill(entry->matching_marks.begin() + 7, ... + 8, false) is applied to a vector of size 1, which is out-of-bounds.
Please pass the full row-group count (and ensure ranges are in the expected coordinate space).
| unmatched_ranges.push_back({UInt64(rg), UInt64(rg + 1)}); | ||
| } | ||
| auto query_condition_cache = Context::getGlobalContextInstance()->getQueryConditionCache(); | ||
| std::cerr << "write to cache " << storage_id.getNameForLogs() << ' ' << object_info->getFileName() << '\n'; |
There was a problem hiding this comment.
Looks like accidental debug output left in server path:
std::cerr << "write to cache " << ...Please replace with LOG_TRACE/LOG_DEBUG (or remove). Writing directly to stderr bypasses ClickHouse logging controls and can pollute service logs.
|
Now (I mean after fixes) is much better: |
|
|
||
| if (object_info->file_bucket_info) | ||
| { | ||
| auto filtered = object_info->file_bucket_info->filterByMatchingRowGroups(matching_row_groups); |
There was a problem hiding this comment.
Using object_info->getFileName() as the cache key can cause false cache hits between different objects that share the same basename (e.g. .../part=1/data.parquet and .../part=2/data.parquet).
QueryConditionCache::read/write then mixes predicates for different files under the same storage_id.uuid, and matching_row_groups.empty() can skip an unrelated file entirely. That is a correctness issue (rows can be missed).
Please key by a path-unique identifier (for example object_info->getPathOrPathToArchiveIfArchive() or another stable full object path), and use the same key consistently in both read and write paths.
|
Results on my dev machine |
| #include <Disks/DiskObjectStorage/ObjectStorages/IObjectStorage.h> | ||
| #include <Processors/ISimpleTransform.h> | ||
| #include <Storages/ObjectStorage/StorageObjectStorageConfiguration.h> | ||
| #include <Interpreters/Cache/QueryConditionCache.h> |
There was a problem hiding this comment.
This header now pulls in QueryConditionCache and FormatFilterInfo types, but this file is included transitively in many object-storage and interpreter paths.
That increases recompilation fan-out and violates the compilation-time guideline for high-fan-out headers. Please forward-declare these types in the header (e.g. class QueryConditionCache; struct FormatFilterInfo;) and keep the heavy includes in IObjectIterator.cpp.
| ObjectStoragePtr object_storage_, | ||
| const ContextPtr & context_); | ||
| const ContextPtr & context_, | ||
| const StorageID & storage_id_ = StorageID::createEmpty(), |
There was a problem hiding this comment.
ObjectIteratorSplitByBuckets now accepts storage_id and format_filter_info specifically for query-condition-cache pruning, but both parameters default to empty values.
That makes this change easy to miss at call sites: StorageObjectStorageCluster::getTaskIteratorExtension still calls this ctor with only 4 args, so cache-aware pruning is silently disabled for cluster table-function reads when cluster_table_function_split_granularity='bucket'. Because createReader skips cache lookup when file_bucket_info is already set, those tasks never get pruned later either.
Please pass storage_id and a real FormatFilterInfo here (same as non-cluster object-storage reads), or remove defaults to force all call sites to wire these parameters explicitly.
There was a problem hiding this comment.
This seems legitimate, but the downside doesn't seem risky. In the case where we have empty values, we just won't engage the cache, but it is not like we would be doing anything incorrect. Honestly, I think it is fine as-is
|
The |
|
The The bug was reverted by PR #102247 (merged at ~14:48 UTC). All nessie/lakekeeper test failures between these two merges are unrelated to this PR. |
|
The flaky check failure is fixed in #102148, let's update the branch. |
|
|
||
|
|
||
| @pytest.mark.parametrize("storage_type", ["s3", "azure", "local"]) | ||
| def test_query_condition_cache(started_cluster_iceberg_with_spark, storage_type): |
There was a problem hiding this comment.
QueryConditionCache correctness here depends on object identity. Since this PR uses file-name-based cache keys in object-storage paths, please add a regression case that creates two files in different directories with the same basename (for example .../a/data.parquet and .../b/data.parquet) and validates that cache hits for one file do not affect the other.
Without this, we can miss key-collision regressions while still seeing green cache-hit counters.
|
| , prewhere_info(std::move(prewhere_info_)) | ||
| , column_mapper(column_mapper_) | ||
| { | ||
| bool use_query_condition_cache = context_->getSettingsRef()[Setting::use_query_condition_cache]; |
There was a problem hiding this comment.
Changelog category is New Feature, but the PR template still has the documentation checkbox unchecked. This change is user-facing (use_query_condition_cache now affects Iceberg/object-storage reads), so please add docs or explicitly explain why no documentation update is needed.
LLVM Coverage Report
Changed lines: 61.25% (147/240) | lost baseline coverage: 8 line(s) · Uncovered code |
69167ef
|
Hi @scanhex12 @grantholly-clickhouse — while reviewing this PR I found the following:
Happy to discuss — close anything that's wrong or already addressed. |
|
Update on the earlier review: CI on the proof PR #104278 confirmed the suspected bug via
(Automated follow-up. The proof PR has been closed; please track the bug on the Issue.) |
|
Correction — false-positive confirmation. My earlier comment at 11:01 UTC on this PR claimed the QCC bucket-poisoning bug was confirmed by CI on proof PR #104278. That was wrong — both the previous Issue #104288 (closed as not_planned) and this confirmation were the same false-positive shape: a Python Two distinct bot bugs combined: (1) the proof test had a real schema mismatch (int32 schema vs int64 dataframe), and (2) when re-evaluating the reopened proof PR, the bot picked up the stale CI run's log instead of the latest run, locking in the same wrong classification. Dropping this finding. The bug claim about |
…terInfo Bot review on PR ClickHouse#104225 surfaced that the `isQueryConditionCacheWritable` predicate added in commits 0eb3d0e9 / 1860c81 / bfa2364 was wired into the three `MergeTree` write paths but missed a fourth, very recently added one: the object-storage QCC write in `StorageObjectStorageSource::generate` (introduced by PR ClickHouse#102115 -- "Query condition cache for iceberg tables", merged 2026-05-07). That site populates the cache from `Iceberg`/`S3`/`Azure` queries when the format reader produces matched-bucket information, keyed by `FormatFilterInfo::condition_hash` -- which until now was created based only on `use_query_condition_cache`, with no writability check. So a query against an object-storage table running with `allow_suspicious_low_cardinality_types = 1` or any non-`Production` `allow_experimental_*` setting could still poison the cache for a subsequent default-settings query, leaving issue ClickHouse#104203 only partially closed for the engine as a whole. Mirror the existing `MergeTreeReaderSettings::query_condition_cache_writable` flag on `FormatFilterInfo`. Set it from the same predicate (`Settings::isQueryConditionCacheWritable`) at the same site where `condition_hash` is computed. Gate the `query_condition_cache->write` call in `StorageObjectStorageSource::generate` on the new flag. Reads through `condition_hash` stay open -- the lookup-side path remains valid because entries written from "writable" contexts are correct, and the same context that wrote nothing can still benefit from cache hits produced by ordinary queries. Coverage: extended `tests/integration/test_storage_iceberg_with_spark/test_query_condition_cache.py` with `test_query_condition_cache_not_writable_with_relaxed_settings`, parametrized across (`s3`, `azure`, `local`) and three triggering settings (`allow_suspicious_low_cardinality_types`, `allow_experimental_funnel_functions` [`Experimental` tier], `allow_experimental_database_iceberg` [`Beta` tier]). The test asserts that `system.query_condition_cache` stays empty after a filtered query in any of those contexts. Stateless test `04208_query_condition_cache_writable_104203` continues to exercise the `MergeTree` write paths and is unaffected.
…solution in next commit) --- Original cherry-pick message follows: Merge pull request ClickHouse#102115 from scanhex12/iceberg_qcc Query condition cache for iceberg tables # Conflicts: # src/Databases/DataLake/DatabaseDataLake.cpp # src/Processors/Formats/Impl/ParquetV3BlockInputFormat.cpp # src/Processors/Formats/Impl/ParquetV3BlockInputFormat.h # src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFileIterator.cpp # src/Storages/ObjectStorage/StorageObjectStorageSource.cpp
…solution in next commit) --- Original cherry-pick message follows: Merge pull request ClickHouse#102115 from scanhex12/iceberg_qcc Query condition cache for iceberg tables # Conflicts: # src/Databases/DataLake/DatabaseDataLake.cpp # src/Processors/Formats/Impl/ParquetV3BlockInputFormat.cpp # src/Processors/Formats/Impl/ParquetV3BlockInputFormat.h # src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFileIterator.cpp # src/Storages/ObjectStorage/StorageObjectStorageSource.cpp

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Query condition cache for iceberg tables
Documentation entry for user-facing changes
Version info
26.5.1.367