iframe-proxy

zvonand · 2026-03-03T22:40:24Z

Changelog category (leave one):

Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Supports CREATE TABLE, CREATE TABLE … AS source, and DROP TABLE for DataLakeCatalog; DROP TABLE can request catalog-side data purge via the new data_lake_delete_data_on_drop setting.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

clickhouse-gh · 2026-03-26T13:25:01Z

Workflow [PR], commit [4dfe2af]

Summary: ❌

Performance Comparison: Performance dashboard

job_name	test_name	status	info
Integration tests (amd_asan_ubsan, flaky)		FAIL
	test_database_iceberg/test.py::test_create_table_as	FAIL	cidb
Integration tests (amd_asan_ubsan, db disk, old analyzer, 1/6)		FAIL
	test_database_iceberg/test.py::test_create_table_as	FAIL	cidb
Integration tests (arm_binary, distributed plan, 2/4)		FAIL
	test_database_iceberg/test.py::test_create_table_as	FAIL	cidb
Integration tests (amd_tsan, 1/6)		FAIL
	test_database_iceberg/test.py::test_create_table_as	FAIL	cidb
Integration tests (amd_msan, 4/8)		FAIL
	test_database_iceberg/test.py::test_create_table_as	FAIL	cidb
Integration tests (amd_llvm_coverage, 1/8)		FAIL
	test_database_iceberg/test.py::test_create_table_as	FAIL	cidb

AI Review

Summary

This PR adds engine-less CREATE TABLE / DROP TABLE support for DataLakeCatalog, wires the new data_lake_delete_data_on_drop setting through the catalog-backed drop paths, and adjusts Iceberg metadata handling for catalog-backed writes. The REST/Glue happy path is close, but there are still a few contract-level gaps: shared-catalog mutations can still fan out under ON CLUSTER, CREATE TABLE ... AS source still drops some unsupported source-column clauses before validation, CREATE TABLE does not honor the documented virtual_hosted S3 endpoint style, and the advertised catalog surface is broader than the implementations in this PR.

Findings

❌ Blockers

[src/Interpreters/InterpreterCreateQuery.cpp:1919-1923, src/Interpreters/InterpreterAlterQuery.cpp:419-429, src/Interpreters/InterpreterDropQuery.cpp:101-108] DataLakeCatalog mutations still are not reliably single-writer under ON CLUSTER. The new checkDatabaseSupportsOnClusterDDL guard only fires when the initiator can look up the database locally, so a query issued from a node that does not have the catalog configured still gets enqueued to every worker. Each worker then mutates the same shared catalog, which reintroduces duplicate CREATE / ALTER / DROP, racy IF NOT EXISTS / IF EXISTS, and duplicate purge requests. Suggested fix: reject by database name before distributed dispatch, or re-check on the worker before executing the catalog-backed mutation.
[src/Databases/DataLake/Common.cpp:176-184] constructTableLocation builds S3 locations from uri.getPath() only, so CREATE TABLE does not honor the documented storage_uri_style = 'virtual_hosted' configuration. With an endpoint like https://warehouse-rest.minio.example.com/prefix, the bucket lives in the host, so this code either creates s3://prefix/... or throws that the endpoint has no bucket even though the setting is supported elsewhere in DataLakeCatalog. Suggested fix: derive the bucket from the host when storage_uri_style is virtual_hosted, or explicitly reject that endpoint style before advertising CREATE TABLE.
[src/Interpreters/InterpreterCreateQuery.cpp:2185-2197] CREATE TABLE ... AS source still silently weakens source-table schema clauses. properties.columns at this point still carries source-column COMMENT, CODEC, column TTL, SETTINGS, and STATISTICS, but the synthesized ASTColumnDeclaration copies only name, type, and defaults. That means DatabaseDataLake::createTable never sees those unsupported modifiers, and the CTAS succeeds while creating an Iceberg table without them. Suggested fix: reuse formatColumns(properties.columns) or otherwise preserve these fields so the existing rejection path fires for CTAS too.

⚠️ Majors

[src/Databases/DataLake/DatabaseDataLake.cpp:900, src/Databases/DataLake/ICatalog.cpp:332-355, docs/en/engines/database-engines/datalake.md:65-123] The new engine-less mutation path is enabled for every DataLakeCatalog, but only RestCatalog and GlueCatalog implement ICatalog::createTable / dropTable. hive, unity, onelake, biglake, and paimon_rest still fall back to NOT_IMPLEMENTED, while the new docs and changelog entry describe this as generic DataLakeCatalog functionality. Suggested fix: gate these statements to the catalog types that actually implement mutations, or narrow the user-facing contract to REST / Glue.

Final Verdict

Status: ❌ Block
Minimum required actions:

Make DataLakeCatalog CREATE / ALTER / DROP reject or single-writer under ON CLUSTER even when the initiator does not have the database locally.
Preserve or reject all unsupported source-column clauses in the engine-less CREATE TABLE ... AS source path.
Fix constructTableLocation for virtual_hosted S3 endpoints, or reject that configuration explicitly for CREATE TABLE.
Either gate engine-less catalog mutations to RestCatalog / GlueCatalog or narrow the docs/changelog contract to that supported surface.

alexey-milovidov · 2026-04-07T00:22:37Z

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

- Port \`IcebergPathResolver::reverseResolve\` from upstream (predates the PR upstream but was missing on this branch). - Handle the Altinity-only \`S3_TABLES\` catalog type in \`getLocationSchemeForTableCreation\` (it is always S3-backed). - Drop the \`unique_key\` check in \`DatabaseDataLake::createTable\`: \`ASTStorage\` has no \`UNIQUE KEY\` clause on this branch. - \`getInMemoryMetadataPtr\` does not take a context argument on this branch. ClickHouse#98670 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

zvonand · 2026-06-12T14:31:36Z

Findings
❌ Blockers
[src/Interpreters/InterpreterCreateQuery.cpp:2111] CREATE TABLE ... AS SELECT reaches the engine-less DataLakeCatalog branch, but the early return true skips the normal fillTableIfNeeded path. A query such as CREATE TABLE catalog.\ns.t AS SELECT 1 SETTINGS allow_database_iceberg=1can succeed while creating an empty Iceberg table instead of inserting the selected rows. Suggested fix: either rejectcreate.select && !create.is_create_emptyfor this path, or run the same insert-fill step afterdatabase->createTable succeeds.

This is not a valid concern; see #98670 (comment)

…erg-table

`getPartitionField` serialized `icebergBucket(0, id)`, `icebergBucket(-1, id)`, and `icebergTruncate(0, id)` into `bucket[0]`, `bucket[-1]`, and `truncate[0]`. The Iceberg spec requires a positive number of buckets and a positive truncate width; an invalid transform produces metadata that some catalogs (e.g. Glue) register, leaving an unreadable Iceberg table. Validate the argument before metadata generation so such a `CREATE TABLE ... PARTITION BY ...` is rejected. Addresses an AI Review finding on ClickHouse#98670 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

In `writeMetadataFileAndVersionHint`, when another writer already committed an equal-or-newer version, we remove the just-written `vN-<uuid>.metadata.json` so the version-hint resolver does not later pick this uncommitted file. Previously a failure to remove it was logged and swallowed, and the function returned `false` - the same signal a clean loss returns - so the caller retried as if the file were gone while the uncommitted metadata could still be resolved as the latest version. Let the removal exception propagate so the operation fails closed. Addresses an AI Review finding on ClickHouse#98670 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`RestCatalog::createTable` passed the table location (base/namespace/table) to `createNamespaceIfNotExists`, which stored it as the namespace's `properties.location`. That made the namespace default warehouse path point at the first table's directory, so another client creating a table in the same namespace without an explicit location could be placed under the first table's path. Strip the trailing table-name segment so the namespace location is base/namespace. Addresses an AI Review finding on ClickHouse#98670 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A catalog-side `DROP TABLE` removes remote catalog metadata and, when `data_lake_delete_data_on_drop` is set, can request deletion of the underlying data, but the only success log was `TRACE`. Raise it to `INFO` so accidental drops or catalog-side data purges leave an operationally visible audit trail. Addresses an AI Review finding on ClickHouse#98670 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`test_create_table_invalid_partition_transforms` checks that `icebergBucket(0, id)`, `icebergBucket(-1, id)`, and `icebergTruncate(0, id)` are rejected and leave no table registered in the catalog, while a valid `icebergBucket(8, id)` succeeds. `test_create_table_namespace_location` checks that the REST namespace `location` property points at the namespace base rather than the first table's directory. Tests for AI Review findings on ClickHouse#98670 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…erg-table # Conflicts: # src/Databases/DataLake/DatabaseDataLake.h # src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp # src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp

Both this branch and master independently added a `virtual bool isDatalakeCatalog() const` method to `IDatabase`, at different locations, so the merge left two identical declarations and `class member cannot be redeclared`. Keep the one master placed next to `isExternal`/`isRemoteDatabase` and drop the duplicate.

The setting was recorded in the 26.6 bucket, but 26.6 is already released and does not contain it; the current development version is 26.7. Keeping it in 26.6 makes the settings-history consistency check (`02995_new_settings_history`) disagree with the released 26.6 binary.

alexey-milovidov · 2026-07-03T01:49:30Z

Pushed updates addressing the outstanding AI-review findings, plus a fresh master merge (the branch was ~2 weeks behind and master has since heavily refactored the Iceberg mutation write path).

AI-review blockers/majors (all four addressed):

Fail-closed on stale metadata cleanup (Utils.cpp, writeMetadataFileAndVersionHint): the removeObjectIfExists on the loser-cleanup path no longer swallows the exception — it propagates so the write is reported as failed instead of silently leaving an uncommitted vN-<uuid>.metadata.json that the version-hint resolver could later pick. (7b1903e)
Operational audit log for drops/purges (DatabaseDataLake::dropTable): successful catalog drops are now logged at INFO with namespace, table and the purge flag. (70970a4)
REST namespace default location (RestCatalog::createTable): the namespace properties.location is now the namespace base (<base>/<namespace>), not the first table's directory. (18b55df)
Invalid partition transforms (getPartitionField): bucket[N] / truncate[N] now reject non-positive N with BAD_ARGUMENTS before any metadata is written, so icebergBucket(0, …), icebergBucket(-1, …) and icebergTruncate(0, …) can no longer register an unreadable Iceberg table. (d9f3f2c)
Tests for the last two (invalid transforms + namespace location) added to test_database_iceberg. (a2f4232)

The changelog entry already uses the correct data_lake_delete_data_on_drop name.

Merge with master: resolved three conflicts —

Mutations.cpp: master collapsed the two-snapshot (POSITION_DELETE + DATA) write into a single writeMetadataFiles call. I re-applied this branch's fix on top of that structure: the single generateNextMetadata now receives the previous committed metadata path (via a new previous_metadata_file_path parameter threaded from mutate), instead of the newly-generated metadata_info.path, so the metadata-log entry references the actual predecessor.
IcebergMetadata.cpp: kept the data_lake_delete_data_on_drop declaration and master's new compaction-setting declarations; dropped the now-obsolete iceberg_delete_data_on_drop extern (it is only an alias after the rename).
DatabaseDataLake.h: kept both isRemoteDatabase and isDatalakeCatalog overrides.

Follow-ups from the merge: removed a duplicate IDatabase::isDatalakeCatalog declaration (both this branch and master added it, at different places → class member cannot be redeclared), and moved the new setting from the 26.6 to the 26.7 settings-history bucket (26.6 is released and does not contain it).

All 16 touched translation units verified with -fsyntax-only. No self-merge (unreviewed feature + non-trivial merge of the mutation write path).

`test_drop_table_purge` constructs a `Minio` client but the `minio` package was never imported, so `ruff check` failed with `F821 Undefined name Minio`. Add `from minio import Minio` to match the import used by `tests/integration/helpers/cluster.py`. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98670&sha=60c079ba48738ecaf5eb567865aea45cb048a4e6&name_0=PR&name_1=Style%20check Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-07-03T06:17:23Z

+        return fmt::format("abfss://{}@{}/{}/{}/{}", container, account_host, extra_path, namespace_name, table_name);
+    }
+
+    if (location_scheme == "s3")


constructTableLocation now derives the s3://... table location from uri.getPath() only. That breaks configurations using storage_uri_style = 'virtual_hosted' (which DatabaseDataLakeSettings already documents), for example https://warehouse-rest.minio.example.com/prefix: here the bucket lives in the host, so path is just prefix (or empty) and this code either creates s3://prefix/ns/table or throws does not contain a bucket.

Existing DataLakeCatalog reads work with that endpoint style, so CREATE TABLE would now be unavailable or generate the wrong table location for a supported configuration. Please derive the bucket from the host when virtual-hosted style is configured, or reject that endpoint style explicitly before advertising CREATE TABLE support.

clickhouse-gh · 2026-07-03T06:17:24Z

            throw Exception(ErrorCodes::BAD_ARGUMENTS, "Mutations with ON CLUSTER are not allowed for KeeperMap tables");

+        if (table_id)
+            checkDatabaseSupportsOnClusterDDL(DatabaseCatalog::instance().tryGetDatabase(table_id.database_name));


checkDatabaseSupportsOnClusterDDL only runs here when the initiator resolved the table locally (if (table_id)). If ALTER TABLE dl.\ns.t` ... ON CLUSTERis issued from a node that does not have thedl DataLakeCatalogdatabase/table locally,table_id` stays empty, this guard is skipped, and the query is still fanned out to every worker.

Each worker then commits the same shared-catalog mutation, which is the same duplicate-mutation race the new ON CLUSTER ban is supposed to prevent for catalog-backed tables. Please reject DataLakeCatalog ON CLUSTER by database name before the local lookup, or re-check on the worker before executing the mutation.

clickhouse-gh · 2026-07-03T06:17:24Z

+        order_by,
+        context_);
+
+    catalog->createTable(namespace_name, table_name, /* metadata_path */ "", metadata_content);


This path is now generic for every DataLakeCatalog, but only RestCatalog and GlueCatalog currently override ICatalog::createTable / dropTable. hive, unity, onelake, biglake, and paimon_rest still hit the base ICatalog implementation and throw NOT_IMPLEMENTED, even though the new docs and changelog entry describe CREATE TABLE / DROP TABLE as DataLakeCatalog functionality in general.

Please either gate these statements to the catalog types that actually implement catalog mutations, or narrow the user-facing contract to REST / Glue only.

clickhouse-gh · 2026-07-03T15:27:03Z

+            columns_declare_list->set(columns_declare_list->columns, columns_expression_list);
+            create_query.set(create_query.columns_list, columns_declare_list);
+
+            for (const auto & column : properties.columns)


CREATE TABLE ... AS source can still silently drop source-column COMMENT, CODEC, column TTL, SETTINGS, and STATISTICS. properties.columns already carries those fields here, but this AST rebuild only copies name, type, and defaults, so DatabaseDataLake::createTable never sees the unsupported modifiers and the Iceberg table is created with weaker semantics than the source.

Could we reuse formatColumns(properties.columns) (or otherwise copy these fields into the synthesized ASTColumnDeclaration) so the existing rejection path fires for CTAS too?

clickhouse-gh · 2026-07-03T18:51:04Z

LLVM Coverage Report

Changed lines: Changed C/C++ lines covered: 337/598 (56.35%) · Uncovered code

Full report · Diff report

zvonand force-pushed the zvonand-create-iceberg-table branch from 5133bc3 to a81e99e Compare March 4, 2026 17:59

zvonand changed the title ~~Allow to create Iceberg table in a DataLakeCatalog and allow to prune such tables on DROP.~~ Support Iceberg table creation in DataLakeCatalog; optionally prune them on DROP Mar 26, 2026

alesapin added the can be tested Allows running workflows for external contributors label Mar 26, 2026

alesapin self-assigned this Mar 26, 2026

clickhouse-gh Bot added the pr-improvement Pull request with some product improvements label Mar 26, 2026

clickhouse-gh Bot reviewed Mar 26, 2026

View reviewed changes

Comment thread src/Databases/DataLake/DatabaseDataLake.cpp Outdated

clickhouse-gh Bot reviewed Mar 26, 2026

View reviewed changes

Comment thread docs/en/engines/database-engines/datalake.md

zvonand force-pushed the zvonand-create-iceberg-table branch from b3846c8 to b381484 Compare March 26, 2026 15:10

clickhouse-gh Bot reviewed Mar 26, 2026

View reviewed changes

Comment thread src/Databases/DataLake/DatabaseDataLake.cpp

zvonand force-pushed the zvonand-create-iceberg-table branch from b381484 to 97a60e7 Compare March 31, 2026 12:22

clickhouse-gh Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread src/Databases/DataLake/DatabaseDataLake.cpp Outdated

clickhouse-gh Bot reviewed Apr 17, 2026

View reviewed changes

Comment thread src/Databases/DataLake/DatabaseDataLake.cpp Outdated

clickhouse-gh Bot reviewed Apr 28, 2026

View reviewed changes

Comment thread src/Databases/DataLake/DatabaseDataLake.cpp

clickhouse-gh Bot reviewed May 1, 2026

View reviewed changes

Comment thread src/Databases/DataLake/RestCatalog.cpp Outdated

clickhouse-gh Bot reviewed May 1, 2026

View reviewed changes

Comment thread src/Databases/DataLake/DatabaseDataLake.cpp Outdated

clickhouse-gh Bot reviewed May 1, 2026

View reviewed changes

Comment thread src/Interpreters/InterpreterCreateQuery.cpp Outdated

groeneai mentioned this pull request May 1, 2026

Add web terminal interface to clickhouse-server #100277

Merged

1 task

clickhouse-gh Bot reviewed May 7, 2026

View reviewed changes

Comment thread src/Databases/DataLake/DatabaseDataLake.cpp Outdated

clickhouse-gh Bot reviewed May 14, 2026

View reviewed changes

Comment thread src/Interpreters/InterpreterCreateQuery.cpp Outdated

clickhouse-gh Bot reviewed May 14, 2026

View reviewed changes

Comment thread tests/integration/test_database_iceberg/test.py Outdated

clickhouse-gh Bot reviewed May 15, 2026

View reviewed changes

Comment thread src/Databases/DataLake/DatabaseDataLake.cpp Outdated

clickhouse-gh Bot reviewed May 15, 2026

View reviewed changes

Comment thread src/Databases/DataLake/RestCatalog.cpp Outdated

clickhouse-gh Bot reviewed May 19, 2026

View reviewed changes

Comment thread tests/integration/test_database_iceberg/test.py Outdated

clickhouse-gh Bot reviewed May 19, 2026

View reviewed changes

Comment thread src/Databases/DataLake/RestCatalog.cpp Outdated

clickhouse-gh Bot reviewed May 19, 2026

View reviewed changes

Comment thread src/Interpreters/InterpreterCreateQuery.cpp Outdated

clickhouse-gh Bot reviewed May 19, 2026

View reviewed changes

Comment thread src/Databases/DataLake/DatabaseDataLake.cpp

zvonand changed the title ~~Support Iceberg table creation in DataLakeCatalog; optionally prune them on DROP~~ Native CREATE TABLE / DROP TABLE for DataLakeCatalog; optionally prune on DROP May 19, 2026

clickhouse-gh Bot reviewed May 19, 2026

View reviewed changes

Comment thread src/Databases/DataLake/DatabaseDataLake.cpp

clickhouse-gh Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread src/Databases/DataLake/DatabaseDataLake.cpp

zvonand force-pushed the zvonand-create-iceberg-table branch from a0bf1f2 to b3cb15b Compare June 11, 2026 23:15

squash: native create, drop table

ce5a19c

zvonand force-pushed the zvonand-create-iceberg-table branch from b3cb15b to ce5a19c Compare June 11, 2026 23:15

clickhouse-gh Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread src/Interpreters/InterpreterCreateQuery.cpp

zvonand added 2 commits June 17, 2026 16:37

cleanup on failure

94d5354

Merge branch 'master' into zvonand-create-iceberg-table

4949468

clickhouse-gh Bot reviewed Jun 17, 2026

View reviewed changes

Comment thread src/Databases/DataLake/RestCatalog.cpp Outdated

clickhouse-gh Bot reviewed Jun 17, 2026

View reviewed changes

Comment thread src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp

upd

8ba09f6

clickhouse-gh Bot reviewed Jun 17, 2026

View reviewed changes

Comment thread src/Databases/DataLake/DatabaseDataLake.cpp

clickhouse-gh Bot reviewed Jun 17, 2026

View reviewed changes

Comment thread src/Databases/DataLake/DatabaseDataLake.cpp Outdated

DimensionWieldr mentioned this pull request Jun 25, 2026

Iceberg S3 path isn't cleaned up after DROP TABLE #108524

Open

alexey-milovidov and others added 9 commits June 29, 2026 11:17

Merge remote-tracking branch 'origin/master' into zvonand-create-iceb…

9640cfc

…erg-table

Merge remote-tracking branch 'origin/master' into zvonand-create-iceb…

37ff2e8

…erg-table # Conflicts: # src/Databases/DataLake/DatabaseDataLake.h # src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp # src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp

clickhouse-gh Bot reviewed Jul 3, 2026

View reviewed changes

Merge branch 'master' into zvonand-create-iceberg-table

4dfe2af

clickhouse-gh Bot reviewed Jul 3, 2026

View reviewed changes

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

zvonand commented Mar 3, 2026 • edited by alexey-milovidov Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Uh oh!

clickhouse-gh Bot commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Findings

Final Verdict

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alexey-milovidov commented Apr 7, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zvonand commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alexey-milovidov commented Jul 3, 2026

Uh oh!

clickhouse-gh Bot Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot commented Jul 3, 2026

LLVM Coverage Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

zvonand commented Mar 3, 2026 •

edited by alexey-milovidov

Loading

clickhouse-gh Bot commented Mar 26, 2026 •

edited

Loading

zvonand commented Jun 12, 2026 •

edited

Loading