Fix server abort on bad argument count for object-storage table engines (STID 4283-5f31) by groeneai · Pull Request #103544 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix server abort on bad argument count for object-storage table engines (STID 4283-5f31)#103544

Merged
kssenii merged 4 commits into
ClickHouse:masterfrom
groeneai:fix-azure-object-storage-bad-arg-count-4283-5f31
Jun 9, 2026
Merged

Fix server abort on bad argument count for object-storage table engines (STID 4283-5f31)#103544
kssenii merged 4 commits into
ClickHouse:masterfrom
groeneai:fix-azure-object-storage-bad-arg-count-4283-5f31

Conversation

@groeneai

@groeneai groeneai commented Apr 26, 2026

Copy link
Copy Markdown
Contributor

BuzzHouse hit a server abort when CREATE TABLE ... ENGINE = DeltaLakeAzure(<single-arg>) reached addStructureAndFormatToArgsIfNeededAzure with one argument. The defensive arg-count check there (and in three sibling helpers) used LOGICAL_ERROR, which abortOnFailedAssertions in debug/sanitizer builds.

Logical error: Expected 3 to 10 arguments in table function azureBlobStorage, got 1 (STID: 4283-5f31)

Stack:

DB::addStructureAndFormatToArgsIfNeededAzure                  Configuration.cpp:681
DB::StorageAzureConfiguration::addStructureAndFormatToArgsIfNeeded   Configuration.cpp:852
DB::StorageObjectStorage::addInferredEngineArgsToCreateQuery   StorageObjectStorage.cpp:748
DB::InterpreterCreateQuery::doCreateTable                      InterpreterCreateQuery.cpp:2082

The path is reachable because AzureStorageParsedArguments::fromAST accepts a single argument as a "lightweight loading" case, while the structure/format helper assumes at least 3 arguments. The same anti-pattern (using LOGICAL_ERROR for an arg-count check) lived in four places:

  • src/Storages/ObjectStorage/Azure/Configuration.cpp -> addStructureAndFormatToArgsIfNeededAzure
  • src/Storages/ObjectStorage/HDFS/Configuration.cpp -> addStructureAndFormatToArgsIfNeededHDFS
  • src/Storages/ObjectStorage/S3/Configuration.cpp -> addStructureAndFormatToArgsIfNeededS3
  • src/TableFunctions/ITableFunctionFileLike.h -> updateStructureAndFormatArgumentsIfNeeded

All four now throw NUMBER_OF_ARGUMENTS_DOESNT_MATCH, which is what the upstream *StorageParsedArguments::fromAST validation already uses for the same kind of check. Users get a graceful, well-typed error instead of a server abort.

Added stateless test 04119_object_storage_bad_arg_count_no_logical_error that documents the expected error code for file, url, s3, hdfs, and azureBlobStorage table functions with bad argument counts.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98377&sha=402e066f308294fe00a31ac763f9a72d38f3e96e&name_0=PR&name_1=BuzzHouse%20%28amd_debug%29

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed a server abort when creating a table with an object-storage engine (AzureBlobStorage, DeltaLakeAzure, S3, HDFS) using an unsupported number of arguments. The server now returns a clean NUMBER_OF_ARGUMENTS_DOESNT_MATCH error instead of aborting in debug/sanitizer builds.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Version info

  • Merged into: 26.6.1.548

The defensive arg-count checks inside `addStructureAndFormatToArgsIfNeeded*`
(Azure, HDFS, S3) and `ITableFunctionFileLike::updateStructureAndFormatArgumentsIfNeeded`
threw `LOGICAL_ERROR`. That code path is reachable from user input via
`StorageObjectStorage::addInferredEngineArgsToCreateQuery`, after the
storage is constructed - notably for `DeltaLakeAzure(<single-arg>)` where
`AzureStorageParsedArguments::fromAST` accepts 1 argument as a "lightweight
loading" case but the structure/format helper expects at least 3.

`LOGICAL_ERROR` triggers `abortOnFailedAssertion` in debug/sanitizer
builds, so a malformed `CREATE TABLE` aborts the server, which is what
BuzzHouse hit (STID 4283-5f31):

  Logical error: Expected 3 to 10 arguments in table function
  azureBlobStorage, got 1

Stack: addStructureAndFormatToArgsIfNeededAzure -> addInferredEngineArgsToCreateQuery
       -> InterpreterCreateQuery::doCreateTable

Replaced `LOGICAL_ERROR` with `NUMBER_OF_ARGUMENTS_DOESNT_MATCH` in all
four call sites - matching the upstream `*StorageParsedArguments::fromAST`
validation, which already uses `NUMBER_OF_ARGUMENTS_DOESNT_MATCH` for the
same kind of check. Users now get a graceful, well-typed error instead of
a server abort.

Added stateless test `04119_object_storage_bad_arg_count_no_logical_error`
documenting the expected error code for `file`, `url`, `s3`, `hdfs`, and
`azureBlobStorage` table functions with bad argument counts.

Failure source:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98377&sha=402e066f308294fe00a31ac763f9a72d38f3e96e&name_0=PR&name_1=BuzzHouse%20%28amd_debug%29
@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation (per task TASK.md Phase 4 Step 9):

(a) Deterministic repro? Yes via stack trace from CIDB STID 4283-5f31. The user-facing path azureBlobStorage(0_args), s3(>10_args), etc. is reproducible on demand and was used to validate the new test (30/30 passes locally with full randomization). The exact BuzzHouse trigger (CREATE TABLE ... ENGINE = DeltaLakeAzure(reinterpretAsString(unhex('...'))) SETTINGS iceberg_recent_metadata_file_by_last_updated_ms_field = 1) requires USE_DELTA_KERNEL_RS=ON build, but the same code path (addStructureAndFormatToArgsIfNeededAzure with args.size() < 3) is what trips.

(b) Root cause explained? addStructureAndFormatToArgsIfNeededAzure (and 3 sibling helpers) used LOGICAL_ERROR for an arg-count check that IS reachable from user input via StorageObjectStorage::addInferredEngineArgsToCreateQuery after StorageFactory::get() succeeds (e.g. for DeltaLakeAzure with the "lightweight loading" 1-arg case in AzureStorageParsedArguments::fromAST). LOGICAL_ERROR triggers abortOnFailedAssertion in debug/sanitizer builds → server abort. The correct error class for an arg-count mismatch is NUMBER_OF_ARGUMENTS_DOESNT_MATCH, which the upstream *StorageParsedArguments::fromAST already uses for the same kind of check.

(c) Fix matches root cause? Yes — direct fix: change error code from LOGICAL_ERROR to NUMBER_OF_ARGUMENTS_DOESNT_MATCH in all four sites with this anti-pattern. No defensive band-aids; the validation logic itself is unchanged and the user gets a graceful, well-typed error.

(d) Test intent preserved? / New tests added? New stateless test 04119_object_storage_bad_arg_count_no_logical_error.sql exercises file(), url(), s3(), hdfs(), azureBlobStorage() with bad argument counts and asserts NUMBER_OF_ARGUMENTS_DOESNT_MATCH. This covers the user-facing API; the inner BuzzHouse path needs USE_DELTA_KERNEL_RS so verifying the inner fix happens via CIDB monitoring of STID 4283 family after merge.

(e) Both directions demonstrated? Test passes 30/30 with the fix applied. Without the fix, the test would still pass for the user-facing table-function paths (because *StorageParsedArguments::fromAST already returns NUMBER_OF_ARGUMENTS_DOESNT_MATCH upstream). The defense-in-depth aspect — preventing the inner LOGICAL_ERROR from aborting the server — is verified by code review and CIDB monitoring after merge.

(f) Fix is general, not a narrow patch? Yes. Sibling check done across the entire ObjectStorage / TableFunctions surface: all 4 places with the LOGICAL_ERROR arg-count anti-pattern are fixed in the same commit (Azure, HDFS, S3, ITableFunctionFileLike). The fix is at the root cause (wrong error class), not a guard against the symptom.

Session: cron:clickhouse-ci-task-worker:20260425-234500

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Apr 26, 2026
@clickhouse-gh

clickhouse-gh Bot commented Apr 26, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [704a360]

Summary:

job_name test_name status info comment
Bugfix validation (functional tests) FAIL
04119_object_storage_bad_arg_count_no_logical_error FAIL cidb IGNORED

AI Review

Summary

This PR changes user-reachable object-storage argument-count checks from LOGICAL_ERROR to NUMBER_OF_ARGUMENTS_DOESNT_MATCH for Azure, HDFS, S3, and file-like table-function helper paths, and adds regression coverage for both SQL-facing bad argument counts and the Azure helper path that originally regressed. The current code addresses the prior helper-coverage concern; I did not find any remaining correctness, safety, or compatibility issues worth an inline review comment.

Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Apr 26, 2026
Address review feedback on PR ClickHouse#103544: the original test exercised only the
table function paths (`file`, `url`, `s3`, `hdfs`, `azureBlobStorage`); the
BuzzHouse failure surfaced through the engine path
(`InterpreterCreateQuery::doCreateTable` ->
`StorageObjectStorage::addInferredEngineArgsToCreateQuery` ->
`addStructureAndFormatToArgsIfNeededAzure`).

Added engine-path tests using `CREATE TABLE ... ENGINE = X(<bad arg count>)`
for `S3`, `HDFS`, `AzureBlobStorage`, and `DeltaLakeAzure`. Each asserts
`NUMBER_OF_ARGUMENTS_DOESNT_MATCH`, ensuring the CREATE TABLE codepath stays
covered for future regressions in either
`*StorageParsedArguments::fromAST` (the upstream validation) or
`addStructureAndFormatToArgsIfNeeded*` (the helper that previously aborted
with `LOGICAL_ERROR`).

Added `no-msan` tag because `delta-kernel-rs` is disabled under MSan, which
makes `DeltaLakeAzure` unavailable.

Reviewer: clickhouse-gh[bot]
PR: ClickHouse#103544
@groeneai

Copy link
Copy Markdown
Contributor Author

Addressed the engine-path coverage feedback in commit 9ae6712.

What's added:

Direct engine-path tests via CREATE TABLE ... ENGINE = X(<bad arg count>) for
all four touched configurations:

CREATE TABLE t_engine_s3_zero (x UInt32) ENGINE = S3(); -- { serverError NUMBER_OF_ARGUMENTS_DOESNT_MATCH }
CREATE TABLE t_engine_s3_many (x UInt32) ENGINE = S3('a','b','c','d','e','f','g','h','i','j','k','l');

CREATE TABLE t_engine_hdfs_zero (x UInt32) ENGINE = HDFS();
CREATE TABLE t_engine_hdfs_many (x UInt32) ENGINE = HDFS('a','b','c','d','e','f');

CREATE TABLE t_engine_azure_zero (x UInt32) ENGINE = AzureBlobStorage();
CREATE TABLE t_engine_azure_many (x UInt32) ENGINE = AzureBlobStorage('a','b','c','d','e','f','g','h','i','j','k','l');

CREATE TABLE t_engine_delta_azure_zero (x UInt32) ENGINE = DeltaLakeAzure();
CREATE TABLE t_engine_delta_azure_many (x UInt32) ENGINE = DeltaLakeAzure('a','b','c','d','e','f','g','h','i','j','k','l');

Each asserts NUMBER_OF_ARGUMENTS_DOESNT_MATCH, exercising the CREATE TABLE
codepath (InterpreterCreateQuery::doCreateTable -> StorageFactory::get ->
StorageObjectStorage::addInferredEngineArgsToCreateQuery). Future
regressions to LOGICAL_ERROR in either *StorageParsedArguments::fromAST
(the upstream validation) or addStructureAndFormatToArgsIfNeeded* (the
helper this PR fixes) would be caught.

DeltaLakeAzure is the engine that surfaced the original BuzzHouse failure,
so I added explicit zero-arg and too-many-args coverage for it. The 1-arg
BuzzHouse trigger that reaches addStructureAndFormatToArgsIfNeededAzure
itself (AzureStorageParsedArguments::fromAST accepts 1 arg as the
"lightweight loading" case, then DataLake forces format = Parquet and
execution falls through to the helper) requires a real Azure backend for
local reproduction, so it is documented in the test comment and continues to
be exercised post-merge by the BuzzHouse fuzzer rather than added directly
here. The zero/many-args engine-path tests give us deterministic regression
coverage.

Added no-msan tag because delta-kernel-rs is disabled under MSan, which
makes DeltaLakeAzure unavailable.

Local verification: All non-DeltaLake test cases pass via clickhouse client
multi-query with the rebuilt binary (Code 42 NUMBER_OF_ARGUMENTS_DOESNT_MATCH
returned for every assertion). DeltaLakeAzure requires USE_DELTA_KERNEL_RS=ON
which is the default in CI debug/asan/tsan/release builds; locally it returns
UNKNOWN_STORAGE (expected) on builds without the Rust feature flag.

…nt check

These tests call addStructureAndFormatToArgsIfNeededAzure directly via the
single-argument lightweight-loading path and an over-the-maximum arg list, the
only shapes that reach the helper's arg-count guard. They assert
NUMBER_OF_ARGUMENTS_DOESNT_MATCH; reverting the guard to LOGICAL_ERROR aborts
the test in debug/ASan builds. The stateless test cannot reach this branch
because fromAST rejects those argument counts earlier.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.40% 84.40% +0.00%
Functions 92.40% 92.40% +0.00%
Branches 77.00% 77.00% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 29/42 (69.05%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@kssenii kssenii self-assigned this Jun 8, 2026
@kssenii kssenii added this pull request to the merge queue Jun 9, 2026
Merged via the queue into ClickHouse:master with commit 3db3ad2 Jun 9, 2026
164 of 166 checks passed
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 9, 2026
@clickgapai

clickgapai commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

pull Bot pushed a commit to stnxo2023/ClickHouse that referenced this pull request Jun 11, 2026
04119_object_storage_bad_arg_count_no_logical_error is a regression test for
PR ClickHouse#103544: every query in it throws on purpose (expected serverError
NUMBER_OF_ARGUMENTS_DOESNT_MATCH). On a thrown query, executeQuery logs the
full query text -- via toOneLineQuery, which keeps comment tokens verbatim --
at Error level, so the file's leading comment ends up in clickhouse-server.err.log.

That comment quoted the original pre-fix crash message word for word
("Logical error: Expected 3 to 10 arguments in table function azureBlobStorage,
got 1"). The CI server-log scanner (ci/jobs/scripts/log_parser.py) greps the
server log with `rg --text -A 10 -o 'Logical error.*'`, matched the comment,
used its first line as the failure title and a nearby unrelated stack for the
STID, and reported a bogus blocking failure. After ClickHouse#103544 merged on 2026-06-09
the server can no longer emit that message (it now throws
NUMBER_OF_ARGUMENTS_DOESNT_MATCH), so the only remaining source of the string
was this comment -- which is why the failure appeared on master and across
dozens of unrelated PRs starting exactly that day (STID 3574-4812 / 2508-4994).
log_parser.py already documents this SQL-comment false-match class.

Reword the comment so it no longer contains the verbatim runtime message
(refer to the error by its code name LOGICAL_ERROR), and add a note telling
future editors to keep the file free of scanner trigger strings. Test-only
change; the queries and their expected errors are unchanged and the test still
passes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants