Time zone for Iceberg partitioning by ianton-ru · Pull Request #95659 · ClickHouse/ClickHouse · GitHub
Skip to content

Time zone for Iceberg partitioning#95659

Open
ianton-ru wants to merge 21 commits into
ClickHouse:masterfrom
ianton-ru:timezone_for_partitioning
Open

Time zone for Iceberg partitioning#95659
ianton-ru wants to merge 21 commits into
ClickHouse:masterfrom
ianton-ru:timezone_for_partitioning

Conversation

@ianton-ru

Copy link
Copy Markdown
Contributor

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Setting iceberg_partition_timezone to set time zone by which partitioning of Iceberg tables was performed

Documentation entry for user-facing changes

When Iceberg table created with third-party tools, data are split on partitions with specific time zone, UTC in most cases.
But ClickHouse tries to make partition pruning based on server time zone, and when server time zone is not UTC, some data can be incorrectly pruned during select queries.

Example: Iceberg table is partitioned by days by column time. It has value 2025-12-31 22:00:00 UTC. and placed in partition 20251231. ClickHouse has server time zone UTC+3, in ClickHouse time zone this value is 2026-01-01 01:00:00 UTC+3. If query has condition SELECT ... WHERE time >= '2026-01-01 00:00:00', partition pruning makes decision that value can be only in partitions 20260101 and later, and prunes partition 20251231 with this value.

This PR introduces new setting iceberg_partition_timezone, which contains time zone used for partitions. This time zone can be different from server time zone, session time zone or column time zone.

Default value is empty for backward compatibility. Empty value means 'use current time zone' as before PR.

  • Documentation is written (mandatory for new features)

@ianton-ru ianton-ru marked this pull request as ready for review January 30, 2026 15:17
@scanhex12 scanhex12 self-assigned this Jan 30, 2026
@scanhex12 scanhex12 added the can be tested Allows running workflows for external contributors label Jan 30, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jan 30, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-feature Pull request with new product feature label Jan 30, 2026
assert "HIDDEN" in show_result


def test_partition_timezone(started_cluster):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_partition_timezone validates only the non-empty iceberg_partition_timezone path (UTC set via profile config).

This leaves the backward-compatibility path untested: default empty setting (current server/session timezone behavior). A later refactor could silently change the empty-default semantics without test coverage.

Please add a second assertion path (or a separate test) that runs the same pruning check with iceberg_partition_timezone unset/empty and verifies the legacy behavior explicitly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test_partition_default_timezone.py with the same test, but without timezone parameters.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added default-path test still does not exercise the compatibility branch this thread asked for. In current code the default Iceberg test starts node1 with only configs/cluster.xml (tests/integration/test_database_iceberg/test_partition_default_timezone.py:45), so the server timezone stays UTC; the Glue default test similarly omits configs/timezone.xml (tests/integration/test_database_glue/test_partition_default_timezone.py:48). That would still pass if empty iceberg_partition_timezone were accidentally changed to mean UTC, because the server/session timezone and the expected partition timezone are identical.

To prove the empty-setting behavior, please keep iceberg_partition_timezone unset but run the server/session in Asia/Istanbul and assert the legacy server-timezone pruning result.

)


def create_clickhouse_iceberg_database(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add the same test for glue?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done



std::optional<TransformAndArgument> parseTransformAndArgument(const String & transform_name_src)
std::optional<TransformAndArgument> parseTransformAndArgument(const String & transform_name_src, const String & time_zone)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this argument optional everywhere

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread tests/integration/test_database_iceberg/test_partition_timezone.py
Comment thread tests/integration/test_database_glue/test_partition_timezone.py Outdated
Comment thread tests/integration/test_database_glue/test_partition_default_timezone.py Outdated
@clickhouse-gh

clickhouse-gh Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Dear @scanhex12, you haven't been active on this PR for 30 days. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

auto & factory = FunctionFactory::instance();

auto transform_and_argument = Iceberg::parseTransformAndArgument(transform_name);
auto transform_and_argument = Iceberg::parseTransformAndArgument(transform_name, context->getSettingsRef()[Setting::iceberg_partition_timezone]);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iceberg_partition_timezone is now used when ChunkPartitioner computes write partition keys, but the added tests only append data through PyIceberg and then read it with ClickHouse. That proves the pruning path for externally written data, not the ClickHouse write path that this line changes.

A regression here would still pass the new tests while producing wrong partition paths for rows inserted, mutated, or compacted by ClickHouse. Please add a focused write-path test: with server timezone Asia/Istanbul and iceberg_partition_timezone=UTC, insert rows through ClickHouse around a UTC day boundary into a DayTransform-partitioned Iceberg table and verify the resulting partitions/reads match UTC partitioning.

@clickhouse-gh

clickhouse-gh Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.40% 85.40% +0.00%
Functions 92.60% 92.60% +0.00%
Branches 77.60% 77.60% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 69/90 (76.67%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants