Time zone for Iceberg partitioning#95659
Conversation
| assert "HIDDEN" in show_result | ||
|
|
||
|
|
||
| def test_partition_timezone(started_cluster): |
There was a problem hiding this comment.
test_partition_timezone validates only the non-empty iceberg_partition_timezone path (UTC set via profile config).
This leaves the backward-compatibility path untested: default empty setting (current server/session timezone behavior). A later refactor could silently change the empty-default semantics without test coverage.
Please add a second assertion path (or a separate test) that runs the same pruning check with iceberg_partition_timezone unset/empty and verifies the legacy behavior explicitly.
There was a problem hiding this comment.
Added test_partition_default_timezone.py with the same test, but without timezone parameters.
There was a problem hiding this comment.
The added default-path test still does not exercise the compatibility branch this thread asked for. In current code the default Iceberg test starts node1 with only configs/cluster.xml (tests/integration/test_database_iceberg/test_partition_default_timezone.py:45), so the server timezone stays UTC; the Glue default test similarly omits configs/timezone.xml (tests/integration/test_database_glue/test_partition_default_timezone.py:48). That would still pass if empty iceberg_partition_timezone were accidentally changed to mean UTC, because the server/session timezone and the expected partition timezone are identical.
To prove the empty-setting behavior, please keep iceberg_partition_timezone unset but run the server/session in Asia/Istanbul and assert the legacy server-timezone pruning result.
| ) | ||
|
|
||
|
|
||
| def create_clickhouse_iceberg_database( |
There was a problem hiding this comment.
Can we add the same test for glue?
|
|
||
|
|
||
| std::optional<TransformAndArgument> parseTransformAndArgument(const String & transform_name_src) | ||
| std::optional<TransformAndArgument> parseTransformAndArgument(const String & transform_name_src, const String & time_zone) |
There was a problem hiding this comment.
Let's make this argument optional everywhere
|
Dear @scanhex12, you haven't been active on this PR for 30 days. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself. |
| auto & factory = FunctionFactory::instance(); | ||
|
|
||
| auto transform_and_argument = Iceberg::parseTransformAndArgument(transform_name); | ||
| auto transform_and_argument = Iceberg::parseTransformAndArgument(transform_name, context->getSettingsRef()[Setting::iceberg_partition_timezone]); |
There was a problem hiding this comment.
iceberg_partition_timezone is now used when ChunkPartitioner computes write partition keys, but the added tests only append data through PyIceberg and then read it with ClickHouse. That proves the pruning path for externally written data, not the ClickHouse write path that this line changes.
A regression here would still pass the new tests while producing wrong partition paths for rows inserted, mutated, or compacted by ClickHouse. Please add a focused write-path test: with server timezone Asia/Istanbul and iceberg_partition_timezone=UTC, insert rows through ClickHouse around a UTC day boundary into a DayTransform-partitioned Iceberg table and verify the resulting partitions/reads match UTC partitioning.
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered by tests: 69/90 (76.67%) | Lost baseline coverage: none · Uncovered code |

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Setting
iceberg_partition_timezoneto set time zone by which partitioning of Iceberg tables was performedDocumentation entry for user-facing changes
When Iceberg table created with third-party tools, data are split on partitions with specific time zone, UTC in most cases.
But ClickHouse tries to make partition pruning based on server time zone, and when server time zone is not UTC, some data can be incorrectly pruned during select queries.
Example: Iceberg table is partitioned by days by column
time. It has value2025-12-31 22:00:00 UTC. and placed in partition20251231. ClickHouse has server time zoneUTC+3, in ClickHouse time zone this value is2026-01-01 01:00:00 UTC+3. If query has conditionSELECT ... WHERE time >= '2026-01-01 00:00:00', partition pruning makes decision that value can be only in partitions20260101and later, and prunes partition20251231with this value.This PR introduces new setting
iceberg_partition_timezone, which contains time zone used for partitions. This time zone can be different from server time zone, session time zone or column time zone.Default value is empty for backward compatibility. Empty value means 'use current time zone' as before PR.