Throw on unknown configuration option in server config by alexey-milovidov · Pull Request #100332 · ClickHouse/ClickHouse · GitHub
Skip to content

Throw on unknown configuration option in server config#100332

Open
alexey-milovidov wants to merge 124 commits into
masterfrom
throw-on-unknown-config-option
Open

Throw on unknown configuration option in server config#100332
alexey-milovidov wants to merge 124 commits into
masterfrom
throw-on-unknown-config-option

Conversation

@alexey-milovidov

@alexey-milovidov alexey-milovidov commented Mar 22, 2026

Copy link
Copy Markdown
Member

Previously, wrong or misspelled configuration entries in the server config were silently ignored, making it hard to diagnose configuration issues (e.g., <max_thred_pool_size> would be silently ignored instead of alerting the user).

Added ServerSettings::checkUnknownSettings that validates all top-level config keys against known ServerSettings entries and a comprehensive list of known complex config sections. Throws UNKNOWN_ELEMENT_IN_CONFIG if an unrecognized key is found.

The check runs on both server startup and config reload. It respects the existing <skip_check_for_incorrect_settings> escape hatch.

Known keys come from three sources:

  • ServerSettings scalar setting names (settings without paths)
  • Top-level prefixes from ServerSettings path-based settings (e.g., logger, openSSL, distributed_ddl)
  • A static list of ~170 known complex config sections (ports, clusters, storage, access control, system log tables, engine-specific sections, etc.)
  • Prefix-based matching for user-defined sections (e.g., graphite_rollup_*, http_handlers_*)

Sections referenced from elsewhere in the config (incl substitutions, config:// references in HTTP handlers, custom protocol handler sections, and top-level tags of include_from source files kept under config.d/) are exempt from the check.

Known limitation (ZooKeeper-backed substitutions): resolving a from_zk substitution would require a ZooKeeper connection at config-validation time, before the <zookeeper> section itself has been validated, so:

  • If <include_from> is specified with a from_zk substitution inside a users-config fragment and its source file is kept under config.d/, the source's top-level tags are not exempted. Such setups can either move the substitution source out of config.d/ or set <skip_check_for_incorrect_settings>.
  • If the server config contains a top-level <include from_zk=.../> (which imports the referenced ZooKeeper node's children as top-level keys), the whole check is skipped for that config, with a warning, so a valid config is never rejected — but a genuine typo alongside such an include is not caught.

Literal paths, from_env (including a top-level <include from_env=.../>), the default /etc/metrika.xml substitution source, and incl in the main server config are all covered.

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

The server now throws an exception on unknown configuration options in the server config, helping catch typos and misconfigured options early. The check can be disabled with <skip_check_for_incorrect_settings>.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

🤖 Generated with Claude Code

Previously, wrong or misspelled configuration entries in the server
config were silently ignored, making it hard to diagnose configuration
issues (e.g., `<max_thred_pool_size>` would be silently ignored).

Added `ServerSettings::checkUnknownSettings` that validates all
top-level config keys against known ServerSettings entries and a
comprehensive list of known complex config sections. Throws
`UNKNOWN_ELEMENT_IN_CONFIG` if an unrecognized key is found.

The check is called on both server startup and config reload, and
respects the existing `skip_check_for_incorrect_settings` escape hatch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Mar 22, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-improvement Pull request with some product improvements label Mar 22, 2026
Comment thread src/Core/ServerSettings.cpp
Comment thread src/Core/ServerSettings.cpp
alexey-milovidov and others added 2 commits March 28, 2026 17:45
…gration tests

Address review feedback and fix CI failures:

- Add `instrumentation_trace_log` to known system log sections (present in
  default `programs/server/config.xml`, caused Fast Test server startup failure)
- Add `allow_experimental_cluster_discovery` to known cluster sections
- Add `lemmatizers`, `synonyms_extensions`, `catboost_lib_path`,
  `path_to_regions_hierarchy_file`, `path_to_regions_names_files` to known
  dictionaries/functions sections
- Add `transaction_log` to known miscellaneous sections

Fix integration tests that use custom top-level config keys:
- `test_http_handlers_config`: uses arbitrary config key referenced via
  `config://` in HTTP handler static responses
- `test_database_hms`: uses non-standard `experimental_features` section

Both tests now set `skip_check_for_incorrect_settings` to bypass the check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread src/Core/ServerSettings.cpp
alexey-milovidov and others added 2 commits March 29, 2026 00:49
…ndling

- Add command-line option names (`pid-file`, `config-file`, `log-file`,
  `errorlog-file`) to the known config sections allowlist. These keys are
  injected into the config by `argsToConfig` which processes all argv
  entries, including Poco options that were already handled via bindings.

- Dynamically extract top-level keys referenced via `config://` in
  `http_handlers` `response_content`/`response_expression` fields and
  whitelist them. This preserves backward compatibility for deployments
  using custom config-backed static HTTP responses via `StaticRequestHandler`.

- Remove the `skip_check_for_incorrect_settings` workaround from the
  `test_static_handler` integration test config, since the dynamic
  extraction now properly handles it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread src/Core/ServerSettings.cpp Outdated
alexey-milovidov and others added 2 commits March 30, 2026 06:39
…ig allowlist

The `pid-file` Poco option has `.binding("pid")`, so Poco stores the
value under the key `pid` in the config — causing `UNKNOWN_ELEMENT_IN_CONFIG`
on server startup.

Also add hyphen-less variants (`pidfile`, `configfile`, `logfile`,
`errorlogfile`) since `argsToConfig` stores raw CLI key names without
normalization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread src/Core/ServerSettings.cpp Outdated
alexey-milovidov and others added 4 commits March 30, 2026 11:08
Poco's `SystemConfiguration` injects a `system` top-level key into the
layered configuration. When `checkUnknownSettings` enumerates all
top-level keys via `config.keys("")`, it sees this Poco-internal key
and rejects it with `UNKNOWN_ELEMENT_IN_CONFIG`, preventing server
startup (as seen in the Fast test CI failure).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a test that verifies the server rejects unknown top-level config
keys with `UNKNOWN_ELEMENT_IN_CONFIG`. This ensures future allowlist
edits don't silently reintroduce startup exceptions or allow invalid
keys through.

The existing `test_http_handlers_config/test_static_handler` test
already validates that custom top-level keys referenced via `config://`
pass validation without `skip_check_for_incorrect_settings`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread src/Core/ServerSettings.cpp
alexey-milovidov and others added 3 commits April 6, 2026 23:13
… allowlist, fix integration test

- Add `core_path` to `known_complex_sections` — this key is read by
  `BaseDaemon` for the core dump directory and is used in stress test
  configs. Missing it caused all stress tests and install tests to fail
  with `UNKNOWN_ELEMENT_IN_CONFIG`.

- Add `experimental_features` to `known_complex_sections` and remove
  the `skip_check_for_incorrect_settings` workaround from
  `test_database_hms/config.xml` (per review feedback).

- Fix `test_unknown_config_option` integration test: when the server
  fails to start, the error message goes to the error log file (via
  `--errorlog-file`), not to container stdout. Read the error log from
  the host-mounted logs directory so the assertion can find the
  `UNKNOWN_ELEMENT_IN_CONFIG` message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread src/Core/ServerSettings.cpp Outdated
@alexey-milovidov

Copy link
Copy Markdown
Member Author

The failures of "Flaky check" in "functions_bad_arguments" will be fixed by #101994.

Comment thread src/Core/ServerSettings.cpp Outdated
alexey-milovidov and others added 3 commits April 8, 2026 01:37
- Add `daemon`, `umask`, `help`, `version` to the known config sections
  allowlist. These are injected by Poco `ServerApplication` and
  `argsToConfig` when the server is started with `--daemon --umask` etc.
  Without this, those startup flows throw `UNKNOWN_ELEMENT_IN_CONFIG`.

- Tighten prefix-based matching for dynamic config sections
  (`graphite_rollup`, `http_handlers`, `users`) to require an exact match
  or prefix followed by `_` separator. Previously, bare `starts_with`
  would silently accept typos like `graphite_rollupTypo`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This key is used in the AST fuzzer config (`fuzz-server-settings.xml`)
for shared catalog support. Without it, the server fails to start
with `UNKNOWN_ELEMENT_IN_CONFIG` during fuzzer CI runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

The MSan stress test failure (MemorySanitizer: use-of-uninitialized-value, STID 4179-5154 or 4148-3044) is a known pre-existing issue unrelated to this PR. Fix: #102158

@alexey-milovidov

Copy link
Copy Markdown
Member Author

Pushed f8708ec114c addressing the AI-Review Request changes verdict on 49f75420b50 — the two remaining checkUnknownSettings false negatives, both of which exempted a top-level section the runtime consumer falls back past and never reads:

  1. <protocols>...<handlers>NAME</handlers> exemption. The top-level component of the referenced handler prefix was exempted unconditionally. createHTTPHandlerFactory consults the prefix only when config.has(NAME); otherwise it falls back to the default http_handlers and never reads the named section. A misspelled prefix (<handlers>custom.handlers</handlers> while the config only has <custom><typo>...</typo></custom>, so custom.handlers does not exist) therefore whitelisted the typo'd top-level <custom>. The exemption (and the config:// scan) is now gated on config.has(handler_section), mirroring the consumer.

  2. Static handler config:// exemption. config://payload.suffix exempted the top-level <payload> even when the exact path payload.suffix was absent. StaticRequestHandler::writeResponse reads it with getRawString("payload.suffix", "Ok.\n"), so a missing path serves the default Ok.\n and never consumes <payload>. The top-level component is now exempted only when the full referenced path exists.

New negative integration cases (fail to start with UNKNOWN_ELEMENT_IN_CONFIG before the fix, pass after):

  • test_protocols_missing_handler_prefix_not_exempted
  • test_config_ref_missing_path_does_not_exempt_unknown_key

ServerSettings.cpp compiled clean locally (RelWithDebInfo, zero diagnostics). Ran the focused integration subset (both new cases plus the positive protocols_custom_handlers / protocols_nested / config:// / bracket-index guards): 7 passed, 0 failed, confirming the tightened exemptions still accept every legitimate handler/config:// reference. Both AI-Review threads replied to and resolved.

/// `retention*` block (`startsWith(key, "retention")`), throwing on anything else. It
/// reads `<retention>` only as `.age`/`.precision`, so its grandchildren are not checked.
const bool ok = rule_child == "regexp" || rule_child == "function"
|| rule_child == "rule_type" || rule_child.starts_with("retention");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ This checks only the presence of allowed child names, but appendGraphitePattern rejects more than unknown child names: a rule must contain at least one function or retention*, rule_type must parse to a known value, and a default rule must keep rule_type as all. As written, a typo section like <some_typo><pattern><regexp>.*</regexp></pattern></some_typo> passes looks_like_graphite_rollup and is exempted from the new unknown-key check, even though the real GraphiteMergeTree parser would throw if the section were used. Please mirror those semantic checks (or delegate the shape validation to the existing parser) and add a negative test for an allowed-key-but-invalid rule.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1b10585feab. Tightened looks_like_graphite_rollup (renamed to graphite_rule_valid, now taking an is_default flag) to mirror the rest of appendGraphitePattern, not just the child names:

  • a <pattern>/<default> rule must carry at least one <function> or <retention*> (otherwise the parser throws NO_ELEMENTS_IN_CONFIG), so <some_typo><pattern><regexp>.*</regexp></pattern></some_typo> is now rejected;
  • every <rule_type> value must parse to all/plain/tagged/tag_list (otherwise ruleType throws BAD_ARGUMENTS);
  • the <default> rule keeps rule_type as all (otherwise BAD_ARGUMENTS: "Default must have rule_type all").

The tightening only ever rejects sections the parser also rejects, so it never turns a valid rollup config into a startup failure (verified against the existing positive graphite_arbitrary_name.xml/graphite_columns_only.xml). Added three negative tests — graphite_rule_no_aggregation.xml, graphite_default_rule_type_not_all.xml, graphite_invalid_rule_type.xml — one per added check.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still not fully aligned with appendGraphitePattern. The current code at src/Core/ServerSettings.cpp:2737 treats any <function> child as valid and only flips has_function_or_retention. The real parser resolves that value with getAggregateFunctionNameAndParametersArray and AggregateFunctionFactory, then also rejects aggregate functions that allocate memory in an arena.

So a typo section such as <some_typo><pattern><function>not_a_function</function></pattern></some_typo> is still exempted as a GraphiteMergeTree section even though appendGraphitePattern would throw if the section were used, masking an unknown top-level key. Please validate the function value or delegate the shape check to the actual parser, and add a negative case with an invalid aggregate function.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a real false-negative, but fully mirroring appendGraphitePattern's function handling is not possible here: checkUnknownSettings is a static method with no ContextPtr, whereas the parser resolves the value with getAggregateFunctionNameAndParametersArray(..., context) and then constructs the function via AggregateFunctionFactory::instance().get(name, action, {Float64}, params_row, properties) before rejecting one that allocatesMemoryInArena.

Reproducing only the name check (e.g. isAggregateFunctionName on a split-off name) would still miss the arena case and, worse, risks a new false positive — rejecting a valid <function>quantile(0.5)</function> if the name/params are not split exactly as the parser does — which turns a recognition miss into a server-won't-start regression (unverifiable in this change without a full integration run).

Since this is best-effort recognition of an arbitrary-named section — the <skip_check_for_incorrect_settings> escape hatch remains, and a genuinely bogus function is still caught by appendGraphitePattern the moment a GraphiteMergeTree table references the section — leaving this narrow false-negative in place is the fail-open choice. Leaving this thread open as a design decision (accept the best-effort limitation as-is, or thread a ContextPtr into checkUnknownSettings so the graphite recognition can delegate to the real parser).

Comment thread tests/integration/test_unknown_config_option/test.py
alexey-milovidov and others added 2 commits June 30, 2026 22:44
…ig_option`

The tests `test_protocols_missing_handler_prefix_not_exempted` and
`test_config_ref_missing_path_does_not_exempt_unknown_key` reference
`configs/config.d/protocols_missing_handler_prefix.xml` and
`configs/config.d/static_handler_config_ref_missing_path.xml`, but those files
were never committed. `ClickHouseInstance` copies `main_configs` with
`shutil.copy` in `create_dir` *before* `self.logs_dir` is assigned, so the
missing files raised `FileNotFoundError` at cluster start, and the fixtures'
`except` blocks then failed with `AttributeError: 'ClickHouseInstance' object
has no attribute 'logs_dir'` instead of reaching the
`UNKNOWN_ELEMENT_IN_CONFIG` assertions.

Add both configs:
- `protocols_missing_handler_prefix.xml`: a `<protocols>` HTTP endpoint whose
  `<handlers>custom_missing.handlers</handlers>` names a prefix that is absent
  (the `<custom_missing>` section has a mistyped `<handlerss>` child), so
  `config.has("custom_missing.handlers")` is false and the top-level
  `<custom_missing>` must be rejected as a genuine unknown key.
- `static_handler_config_ref_missing_path.xml`: a `static` handler that
  references `config://missing_payload.suffix` whose path does not exist
  (`<missing_payload>` has no `<suffix>` child), so the top-level
  `<missing_payload>` must be rejected.

CI report: #100332

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dGraphitePattern`

`looks_like_graphite_rollup` exempted arbitrary-named rollup sections from the
unknown-config-key check by their structure, but it only validated the child
*names* of each `<pattern>`/`<default>` rule. `appendGraphitePattern` enforces
more: a rule must carry at least one `<function>` or `<retention*>` block
(`NO_ELEMENTS_IN_CONFIG`), every `<rule_type>` value must parse to a known type
(`all`/`plain`/`tagged`/`tag_list`, otherwise `BAD_ARGUMENTS`), and the
`<default>` rule's `rule_type` must be `all`. A typo'd section such as
`<some_typo><pattern><regexp>.*</regexp></pattern></some_typo>` therefore passed
the exemption even though the real parser would reject it if the section were
used — a false negative that let an unknown top-level key slip through.

Mirror those semantic checks in the renamed `graphite_rule_valid` helper (now
taking an `is_default` flag). The tightening only ever rejects sections the
parser also rejects, so it never turns a valid rollup config into a startup
failure.

Add three negative tests, one per added check:
- `graphite_rule_no_aggregation.xml`: a `<pattern>` with only `<regexp>`.
- `graphite_default_rule_type_not_all.xml`: a `<default>` with `rule_type` not `all`.
- `graphite_invalid_rule_type.xml`: a `<pattern>` with an unrecognized `rule_type`.

CI report: #100332

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Pushed 9dff5e9ffbb and 1b10585feab addressing the AI-Review Request changes verdict on f8708ec114c — both Majors, which were also the two failing test_unknown_config_option cases in CI.

  1. Missing negative-case config files (9dff5e9ffbb). test_protocols_missing_handler_prefix_not_exempted and test_config_ref_missing_path_does_not_exempt_unknown_key referenced protocols_missing_handler_prefix.xml and static_handler_config_ref_missing_path.xml, which were never committed. ClickHouseInstance.create_dir copies main_configs with shutil.copy before assigning self.logs_dir, so the missing files raised FileNotFoundError at cluster start and the fixtures' except blocks then died with AttributeError: 'ClickHouseInstance' object has no attribute 'logs_dir' (the actual CI error) before reaching the assertions. Added both configs; custom_missing (absent handler prefix) and missing_payload (absent config:// path) are now correctly rejected with UNKNOWN_ELEMENT_IN_CONFIG.

  2. Graphite-shape exemption tightened (1b10585feab). looks_like_graphite_rollup only validated the child names of each rollup rule, so a typo such as <some_typo><pattern><regexp>.*</regexp></pattern></some_typo> was exempted even though appendGraphitePattern would reject it. The helper (renamed graphite_rule_valid, taking an is_default flag) now also mirrors: at least one <function>/<retention*> per rule (NO_ELEMENTS_IN_CONFIG), a valid <rule_type> value (BAD_ARGUMENTS), and rule_type == all for the <default> rule (BAD_ARGUMENTS). It only ever rejects sections the parser also rejects, so no valid rollup config breaks. Added one negative test per check: graphite_rule_no_aggregation.xml, graphite_default_rule_type_not_all.xml, graphite_invalid_rule_type.xml.

The remaining red, Unit tests (asan_ubsan, function_prop_fuzzer) (FunctionsStress.stress/AllTests), is unrelated: the function-property fuzzer found a reinterpret/Decimal scale mismatch (Decimal64 returned for a requested Decimal(18, 18)), which a server-config-validation change cannot affect. It is tracked by the open issue #107242 and fails intermittently on master as well.

Did not re-merge master: the branch is MERGEABLE with no conflicts and the merge-base is from earlier today (< 1 day old).

Comment thread src/Core/ServerSettings.cpp
…on` in the config-key exemption

The unknown-config-key check exempts arbitrary-named `GraphiteMergeTree`
rollup sections by recognizing their shape via `graphite_rule_valid`, which is
meant to mirror `appendGraphitePattern` exactly. It only checked that a rollup
rule's child was named `retention*`, but `appendGraphitePattern` materializes
each `retention*` block by reading both `age` and `precision` with `getUInt`.
A block missing either field (or with a non-numeric value) makes the parser
throw, so a typo'd section such as
`<some_typo><pattern><retention><precision>60</precision></retention></pattern></some_typo>`
was wrongly exempted from the unknown-key check even though it could never load.

Mirror the retention validation: require both `age` and `precision` and that
they parse as unsigned integers, otherwise the section is not a valid rollup
and must be rejected as an unknown top-level key.

Added a negative integration test (`graphite_retention_missing_age.xml` /
`test_graphite_retention_missing_age_rejected`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Pushed 256d06be630 addressing the sole remaining unresolved review thread on graphite_rule_valid (ServerSettings.cpp).

graphite_rule_valid now mirrors appendGraphitePattern's retention handling: for each retention* block it requires both <age> and <precision> to be present and to parse as unsigned integers (via config.getUInt, exactly as the parser reads them), and only then counts it toward has_function_or_retention. A block missing or with a non-numeric <age>/<precision> now fails the shape check, so a typo like <some_typo><pattern><retention><precision>60</precision></retention></pattern></some_typo> is rejected with UNKNOWN_ELEMENT_IN_CONFIG instead of being exempted. Added the negative integration test test_graphite_retention_missing_age_rejected (config graphite_retention_missing_age.xml).

Triage of the CI reds on the previous commit 1b10585feab (all pre-existing, unrelated to this config-validation change; a fresh run will re-evaluate on 256d06be630)

@groeneai, the last two have no tracking issue and are systemic. Could you investigate and provide fixes in separate PRs (or link existing work)?

Comment thread src/Core/ServerSettings.cpp Outdated
/// solely to resolve `incl` references — so exempting its tags would let a genuinely
/// unknown top-level key pass merely because the lookup table happens to define a tag
/// of the same name, masking exactly the typo/misplaced-section class this check catches.
if (!merge_files.contains(to_canonical(include_from_path)))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gate is correct for a plain lookup-table incl, but it is too broad for an external source consumed by a top-level <include>. ConfigProcessor::doIncludesRecursive treats an element named include specially: it inserts the children of the referenced node into the parent. For example, with <include_from>/etc/metrika.xml</include_from> and a top-level <include incl="extra_root"/>, an external source containing <extra_root><my_custom_section>...</my_custom_section></extra_root> produces a real top-level <my_custom_section> in the loaded server config. Since /etc/metrika.xml is not in merge_files, this branch skips the source, my_custom_section is never marked referenced, and startup now throws UNKNOWN_ELEMENT_IN_CONFIG for a configuration that was valid before. Please handle top-level include substitutions from external include_from sources and exempt only the child names they actually import, while keeping the negative lookup-table case covered.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — confirmed and fixed in d6b625a8ae1.

checkUnknownSettings treated every external <include_from> source as a pure incl lookup table and skipped it entirely, so a top-level <include incl="X"/> that imports <X>'s children into the root (via ConfigProcessor::doIncludesRecursive) had those imported top-level keys rejected with UNKNOWN_ELEMENT_IN_CONFIG — a regression for a previously valid substitution.

The fix collects top-level <include incl="X"/> reference names from the server config only (the main config and merged config.d/conf.d fragments, never the users config, whose <include> targets a separate tree), resolves <X> in each <include_from> source exactly as the processor does (getRootNode(include_from)->getNodeByPath(name)), and exempts only the imported children — nothing else, so an unrelated typo elsewhere stays rejected. The negative lookup-table case (a plain incl attribute on a non-include element) is unchanged and still rejected, covered by test_external_include_from_source_does_not_exempt_unknown_key.

Added the positive regression test test_top_level_include_from_external_source_accepted.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still not fully fixed for recursive top-level includes. The current code at src/Core/ServerSettings.cpp:2672 only inserts the immediate children of the referenced include_from node. But ConfigProcessor::doIncludesRecursive recurses after expanding an <include>, so a valid source like <root><include incl="nested"/></root> can import <nested>'s children into the server root. The merged config then contains those child top-level keys, while referenced_top_level_keys only contains include, so the validator still rejects a previously valid config with UNKNOWN_ELEMENT_IN_CONFIG.

Please recurse through imported top-level <include> nodes, or reuse the already processed config/contributing include information, when collecting the keys imported by top-level <include incl="..."/>.

@groeneai

groeneai commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@alexey-milovidov Both already have open fixes; linking them:

1. Upgrade-check mysqlxx/DatabaseMySQL (and PostgreSQLConnectionPool) 192.0.2.1 log-noise flake

Fix: #108560 (open, you approved 2026-06-28; currently only UNSTABLE from CI in progress, no red review). It allowlists the benign background/startup connection failures both remote engines emit against the RFC5737 TEST-NET-1 fixture host 192.0.2.1 created by 04210_show_remote_databases_in_system_tables, while still surfacing any non-connection DatabaseMySQL/DatabasePostgreSQL error. It covers the MySQL variant (mysqlxx::Pool: Failed to connect to MySQL, DatabaseMySQL ... Connections to mysql failed) as well as the Postgres one, so this signature is handled.

2. 03230_keeper_cp_mv_commands on Fast test (arm_darwin)

Fix: #107438 (open, approved by alesapin 2026-06-19). Same keeper-client teardown flake as 03135_keeper_client_find_commands / 03161: on macOS, socket.shutdown() inside Coordination::ZooKeeper::finalize() throws ENOTCONN (errno 57) at connection teardown, and the Fast test harness treats any stderr output as a failure. 03230's stderr is exactly that path:

Net Exception: Socket is not connected
3. Poco::Net::SocketImpl::shutdown()
4. Coordination::ZooKeeper::finalize(...)
5. Coordination::ZooKeeper::~ZooKeeper()

#107438 catches POCO_ENOTCONN in both the constructor and finalize() and downgrades it to LOG_TRACE, silencing the benign teardown. CIDB confirms it is macOS-only: 5 hits / 5 PRs in 30d, all Fast test (arm_darwin), 0 Linux, 0 master.

Both are open and ready for merge; no new fix needed.

alexey-milovidov and others added 2 commits July 1, 2026 05:33
…nal `include_from` sources

`checkUnknownSettings` treated every external `<include_from>` source as a
pure `incl` lookup table that contributes no top-level key, so it skipped
such sources entirely. But `ConfigProcessor::doIncludesRecursive` expands a
top-level `<include incl="X"/>` element by inserting the *children* of node
`X` (resolved from the `<include_from>` source) into the root, so those child
tags become real top-level keys of the merged config. A previously valid
configuration using a top-level `<include>` against an external source (e.g.
`/etc/metrika.xml`) therefore started failing with `UNKNOWN_ELEMENT_IN_CONFIG`.

Collect top-level `<include incl="X"/>` reference names from the server config
(the main config and merged `config.d`/`conf.d` fragments only, never the
users config, whose `<include>` targets a separate tree), resolve `<X>` in
each `<include_from>` source exactly as the processor does
(`getRootNode(include_from)->getNodeByPath(name)`), and exempt only the
imported children — nothing else, so an unrelated typo stays rejected. The
negative lookup-table case (a plain `incl` attribute on a non-`include`
element) is unchanged and still rejected.

Add positive integration test `test_top_level_include_from_external_source_accepted`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Pushed d6b625a8ae1 (fix) and merged master (ec66067e82f), addressing the AI-Review Request changes verdict on 256d06be630.

Fix — top-level <include> from external <include_from> sources (ServerSettings.cpp)

checkUnknownSettings skipped every external <include_from> source, treating it as a pure incl lookup table with no top-level contribution. But ConfigProcessor::doIncludesRecursive expands a top-level <include incl="X"/> by inserting the children of node X into the root, so those child tags become real top-level keys of the merged config — and were wrongly rejected with UNKNOWN_ELEMENT_IN_CONFIG, a regression for a previously valid substitution.

The fix collects top-level <include incl="X"/> reference names from the server config only (the main config and merged config.d/conf.d fragments, never the users config), resolves <X> in each <include_from> source exactly as the processor does (getRootNode(include_from)->getNodeByPath(name)), and exempts only the imported children — nothing else, so an unrelated typo stays rejected. Added the positive regression test test_top_level_include_from_external_source_accepted; the negative lookup-table case (test_external_include_from_source_does_not_exempt_unknown_key) is unchanged.

Re-merge

Merged master (was 664 commits behind, merge-base 2026-06-30): known_keys is auto-derived from ServerSettingsImpl().all(), so being far behind risks the validator false-positive-rejecting master's newer default configs across the full test suite. master had also modified ServerSettings.cpp; the merge was conflict-free and the file still compiles.

Triage of the CI reds on 256d06be630 (a fresh run re-evaluates on ec66067e82f)

  • Stress test (arm_msan)Hung check failed, possible deadlock found. Master-wide flake tracked in Hung check failed, possible deadlock found #107941 (32 distinct PRs today, 183 on 2026-06-30). The stack is a server-side AST-fuzzer query (executeASTFuzzerQueriesTCPHandler::run) with no config-validation frames; a config-key validator cannot cause a hung fuzzer query.
  • CH Inc sync — internal sync; re-evaluates on the fresh head.

Comment thread src/Core/ServerSettings.cpp
Comment thread src/Core/ServerSettings.cpp
…v/from_zk and the `/etc/metrika.xml` default

`ServerSettings::checkUnknownSettings` only mirrored the `incl` attribute of a
top-level `<include>` and only picked up explicit `<include_from>` sources. Two
previously valid configurations were therefore wrongly rejected with
`UNKNOWN_ELEMENT_IN_CONFIG` (AI-Review "Request changes" on `ec66067e`):

- A top-level `<include from_env="X"/>` or `<include from_zk="X"/>` is expanded by
  `doIncludesRecursive` by importing the *children* of the referenced node as
  top-level keys, but only `incl` was handled. `from_env` is now resolved
  deterministically (the environment value is wrapped as `<from_env>...</from_env>`
  and its children are exempted, exactly as `ConfigProcessor` does). `from_zk`
  cannot be resolved without a ZooKeeper connection at config-validation time, so
  the check is skipped (fail open, with a warning) when such a top-level element
  is present, rather than falsely rejecting the imported keys.

- When the server config declares no `<include_from>`, `ConfigProcessor` falls
  back to the default substitution source `/etc/metrika.xml`. A top-level
  `<include incl="X"/>` then imports `<X>`'s children from it. The validator now
  seeds the same default so those imported keys are exempted.

Added integration regression tests `test_top_level_include_from_env_accepted`,
`test_top_level_include_from_zk_accepted`, and
`test_metrika_default_include_source_top_level_include_accepted`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Pushed fbf709911c7 addressing the AI-Review Request changes verdict on ec66067e — both Majors, which are compatibility regressions where a previously valid <include>-based config would now be wrongly rejected with UNKNOWN_ELEMENT_IN_CONFIG. The fix is confined to the top-level-<include> handling in ServerSettings::checkUnknownSettings.

1. Top-level <include> with from_env / from_zk (ServerSettings.cpp:2465)

The scan only recorded the incl attribute, but doIncludesRecursive expands a top-level <include> for any of the three SUBSTITUTION_ATTRS (incl, from_env, from_zk), importing the referenced node's children as top-level keys. It now mirrors all three:

  • from_env is resolved deterministically: the environment value is wrapped as <from_env>VALUE</from_env> and parsed exactly as the processor does, and its imported children are added to referenced_top_level_keys.
  • from_zk cannot be resolved without a ZooKeeper connection at config-validation time (it runs before <zookeeper> is validated). So when a top-level <include from_zk=.../> is present the whole check is skipped (fail open, with a LOG_WARNING) rather than falsely rejecting the imported keys — a valid config is never refused; a genuine typo alongside such an include is not caught. This extends the existing documented from_zk limitation and is noted in the PR description.

2. Implicit /etc/metrika.xml default include source (ServerSettings.cpp:2504)

When the server config declares no <include_from>, ConfigProcessor::processConfig still uses /etc/metrika.xml if it exists. The validator now seeds the same default (only when it exists and there is a top-level <include incl=X/> to resolve against it), so <X>'s imported children are exempted.

Tests (integration, all reload/start-based)

  • test_top_level_include_from_env_accepted — dedicated instance whose env_variables supply <my_env_imported_section>...; the node must start.
  • test_top_level_include_from_zk_accepted — creates the ZooKeeper node, reloads a config with a top-level <include from_zk=.../>; reload must succeed.
  • test_metrika_default_include_source_top_level_include_accepted — no <include_from>, injects /etc/metrika.xml and a top-level <include incl=.../>; reload must succeed.

ServerSettings.cpp was verified to compile clean (-fsyntax-only, -Werror, this branch's headers, 0 diagnostics); test.py and the new configs parse clean. Both AI-Review threads replied to and resolved. No master re-merge: the branch is MERGEABLE with no conflicts and the merge-base is from earlier today.

alexey-milovidov and others added 2 commits July 1, 2026 10:27
…n config section

The `Fast test` job's `Start ClickHouse Server` step failed with
`Code: 137. DB::Exception: Unknown element
'user_defined_executable_function_drivers_config' found in config`
(`UNKNOWN_ELEMENT_IN_CONFIG`), so the server did not start.

That top-level key was added to the default `programs/server/config.xml`
in `b767a27c206` ("Load executable UDF drivers during server startup").
It is read directly from the config by
`Context::loadUserDefinedExecutableFunctionDrivers` via
`getMultipleValuesFromConfig`, not declared in `ServerSettings`, so the
new unknown-config-option check did not recognize it and rejected a
valid config.

Add it to `known_complex_sections`, next to its sibling
`user_defined_executable_functions_config`. The two accompanying
`ServerSettings` (`allow_experimental_executable_udf_drivers` and
`dynamic_user_defined_executable_functions_path`) are `DECLARE`d
settings and are already covered by `known_keys`.

Also add the key to the `existing_keys.xml` regression fixture, whose
purpose is to guarantee that top-level keys consumed outside
`ServerSettings` keep the server starting.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100332&sha=fbf709911c71a573640c84af8aaa6e071bfd370e&name_0=PR&name_1=Fast%20test

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/Core/ServerSettings.cpp
…nknown-key check

`ServerSettings::checkUnknownSettings` recognizes arbitrary-named
`GraphiteMergeTree` rollup sections by their shape via the
`looks_like_graphite_rollup` helper, which returned `false` for a section with
zero children — so such a section was rejected with `UNKNOWN_ELEMENT_IN_CONFIG`
and the server did not start.

But `setGraphitePatternsFromConfig` only throws `NO_ELEMENTS_IN_CONFIG` when the
section is *missing*: a present-but-empty section (e.g. `<retention_cfg/>`) is
accepted, keeping the default column names and an empty pattern list. So the new
check rejected a previously valid config — a regression.

Accept the empty case, mirroring the parser exactly. An empty section takes no
effect, so exempting it does not weaken typo detection in any meaningful way.

Added the integration regression test `test_graphite_empty_section_accepted`
(a present-but-empty `<retention_empty/>` section; the node must start).

Addresses the AI-Review finding at `src/Core/ServerSettings.cpp:2794`.
#100332

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Pushed 76bacb45a5e addressing the AI-Review Request changes verdict on 85a141b0 (the two GraphiteMergeTree findings).

Fixed — empty arbitrary-named Graphite section (ServerSettings.cpp:2794, compatibility regression)

looks_like_graphite_rollup returned false for a present-but-empty section, so a config like <retention_cfg/> was rejected with UNKNOWN_ELEMENT_IN_CONFIG and the server did not start. But setGraphitePatternsFromConfig only throws NO_ELEMENTS_IN_CONFIG when the section is missing; an existing empty section is accepted (default column names + empty pattern list). The helper now returns true for the empty case, mirroring the parser. Added the regression test test_graphite_empty_section_accepted.

Verified -fsyntax-only -Werror clean on ServerSettings.cpp (0 diagnostics); test.py py_compiles and the new config XML is well-formed. A full integration run was not feasible this session (the branch is 85 commits from master, no incremental build cache), so CI's integration job is authoritative for the new test. The fix is a loosening (accept → exempt), so it cannot convert a valid config into a rejected one — it cannot cause the server to fail to start.

Deferred as a design decision — function-name validation (ServerSettings.cpp:2737, false-negative)

Fully mirroring appendGraphitePattern's function handling is not possible in checkUnknownSettings: it is a static method with no ContextPtr, whereas the parser resolves the value with getAggregateFunctionNameAndParametersArray(..., context) and constructs the function via AggregateFunctionFactory::instance().get(name, action, {Float64}, params_row, properties) before rejecting one that allocatesMemoryInArena. A partial name-only check would still miss the arena case and risks a new false positive (e.g. rejecting a valid <function>quantile(0.5)</function>), turning a recognition miss into a server-won't-start regression — which I can't verify safely here without a build. This is best-effort recognition (the <skip_check_for_incorrect_settings> escape hatch remains, and a bogus function is still caught by appendGraphitePattern when a table references the section), so I left that thread open for a call: accept the limitation as-is, or thread a ContextPtr in so the recognition can delegate to the real parser.

No master re-merge: MERGEABLE, no conflicts, merge-base is from earlier today (2026-07-01 10:16Z).

bool has_function_or_retention = false;
for (const auto & rule_child : rule_children)
{
const bool ok = rule_child == "regexp" || rule_child == "function"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

appendGraphitePattern also validates the regexp payload semantically: after tag_list rewriting it constructs OptimizedRegularExpression and throws CANNOT_COMPILE_REGEXP on malformed patterns. graphite_rule_valid currently accepts any <regexp> child by name alone, so a typo section like <some_typo><pattern><regexp>(</regexp><function>any</function></pattern></some_typo> is still exempted from the new unknown-key check even though the real GraphiteMergeTree parser would reject it.

Please compile the regex here (mirroring the tag_list path too) or delegate this rule check to appendGraphitePattern, and add a negative case for an invalid Graphite regexp.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same best-effort-recognition limitation as the <function> thread above, and it can't be cleanly delegated to the parser either.

In appendGraphitePattern the regexp is compiled at the very end, after the key loop, and for a tag_list rule the raw <regexp> value is first rewritten by buildTaggedRegex and only then passed to OptimizedRegularExpression. Mirroring that here would mean either:

  • calling appendGraphitePattern — but the same loop resolves <function> via getAggregateFunctionNameAndParametersArray(..., context) and AggregateFunctionFactory, so it needs the ContextPtr this static method doesn't have (the blocker from the <function> thread); or
  • reproducing the compile path standalone — a second copy of buildTaggedRegex (which itself indexes tags[0] with no empty-guard) plus the OptimizedRegularExpression construction, inside a static config check.

The standalone copy's dangerous failure mode is a false positive: for a tag_list rule the raw value is not what the parser actually compiles (the buildTaggedRegex output is), so any divergence rejects a valid Graphite section and the server fails to start. Today such an arbitrarily-named section (with no table referencing it yet) lets the server boot and only fails at table creation with CANNOT_COMPILE_REGEXP; converting that into a startup rejection with UNKNOWN_ELEMENT_IN_CONFIG is a compatibility change I don't want to ship without an integration run proving no valid config is newly rejected.

So this stays best-effort recognition — the <skip_check_for_incorrect_settings> escape hatch remains, and a genuinely malformed regexp is still caught by appendGraphitePattern when a GraphiteMergeTree table references the section. Leaving it open for the same call as the <function> thread: accept the limitation, or thread a ContextPtr through so the recognition can delegate to the real parser (which would then cover the function resolution, the regexp compile, and the tag_list rewrite in one place).

@alexey-milovidov

Copy link
Copy Markdown
Member Author

Status on 76bacb45a5e:

  • AI-Review "Request changes" — both findings are the GraphiteMergeTree semantic false-negatives. The <function>-value one is already dismissed as best-effort by design (above). The new <regexp>-compile one (ServerSettings.cpp:2749) is the same class: it can't be delegated to appendGraphitePattern without a ContextPtr (which the static check lacks and which the parser needs to resolve <function> before it ever compiles the regexp), and a standalone copy of buildTaggedRegex + OptimizedRegularExpression risks false-positive-rejecting a valid tag_list config and stopping the server from starting. Full reasoning and the two options in the inline thread; left open as a design call — no unverifiable tightening pushed.
  • Public CI — fresh run on this head, 0 failed (coverage / a couple of flaky shards still in progress).
  • CH Inc sync — the private mirror countSubstrings may hang #53245 is MERGEABLE (no conflict) but still at the stale head e2da1d1045a; its reds are the usual cloud-only flakes (db disk / distributed cache / meta in keeper / SharedCatalog, scattered single shards, not a uniform config rejection). The scheduled IncrementalSync is re-running and re-evaluates on the current head.
  • No master re-mergeMERGEABLE, no conflicts, merge-base 6ff851961b5 from 10:16Z today; the only red is CH Inc sync, which a re-merge would not change.

The previous commit `76bacb45a5e` changed `looks_like_graphite_rollup` to
return `true` for a childless section, so that a present-but-empty
`GraphiteMergeTree` rollup section (e.g. `<retention_cfg/>`) keeps starting.
But `Poco::AbstractConfiguration::keys` reports only child *elements*, so a
scalar leaf key such as `<max_thred_pool_size>16</max_thred_pool_size>` (a
typo'd setting — precisely what this check exists to catch) also has zero
children and was therefore exempted from `UNKNOWN_ELEMENT_IN_CONFIG`. This
regressed the core of the feature: every unknown top-level key that carries a
scalar value silently passed, so all the negative `test_unknown_config_option`
cases (`test_unknown_config_option_rejected`, `test_users_prefix_typo_rejected`,
the `..._not_exempted` / reload cases, ...) failed uniformly across every
integration-test shard.

A `GraphiteMergeTree` section is always a container and never carries a scalar
value of its own, so distinguish the two: exempt a childless section only when
its (whitespace-trimmed) value is empty — a genuine empty container — and let a
childless section with a non-empty value fall through to be rejected. This
keeps `test_graphite_empty_section_accepted` (`<retention_empty/>`) passing
while restoring rejection of valued leaf typos.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100332&sha=76bacb45a5ea5b683284560152c406c674e9706d
Related: #100332

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Pushed 6b91d23e469 fixing a real, uniform integration-test failure on 76bacb45a5e. After that head's shards finished, all 55 failures were test_unknown_config_option (every negative case, across every shard: asan/arm/tsan/msan/coverage) — CI report. The server was starting successfully with a clearly unknown top-level key (caught_exception empty, reloads succeeding), i.e. the check's core was silently disabled.

Root cause

test_unknown_config_option_rejected last passed on 85a141b01de and failed starting exactly at 76bacb45a5e. The only functional change in that commit was, in looks_like_graphite_rollup:

    if (children.empty())
-       return false;
+       return true;

The intent was to accept a present-but-empty GraphiteMergeTree section (<retention_cfg/>). But Poco::AbstractConfiguration::keys reports only child elements, so a scalar leaf key such as <some_completely_unknown_option>1</...> (a typo'd setting — precisely what this check exists to catch) also has zero children and was therefore exempted from UNKNOWN_ELEMENT_IN_CONFIG. Every unknown top-level key carrying a scalar value silently passed, defeating the feature's primary purpose.

Fix

A GraphiteMergeTree section is always a container and never carries a scalar value of its own, so the two cases are distinguishable: exempt a childless section only when its (whitespace-trimmed) value is empty — a genuine empty container — and let a childless section with a non-empty value fall through to be rejected.

    if (children.empty())
        return Poco::trim(config.getString(section, "")).empty();

Verified against every failing negative case (all inject valued leaf keys: <...>1</...> / text → now rejected) and against test_graphite_empty_section_accepted (<retention_empty/>, valueless → still accepted).

This is orthogonal to the two open graphite_rule_valid threads (the <function>/<regexp> semantic-validation design calls at lines 2749/2750) — those are untouched.

Comment thread src/Core/ServerSettings.cpp
Comment thread src/Core/ServerSettings.cpp Outdated
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Status on 6b91d23e469:

  • Public CI — fresh run on this head, 0 failed (16 jobs, incl. coverage / integration, still in progress).

  • CH Inc sync (the sole red) — the private mirror countSubstrings may hang #53245 is MERGEABLE (no conflict) but still at the stale head e2da1d1045a, not yet re-mirrored to this head. Its 7 reds are the usual Cloud-only flakes on scattered single shards (db disk 6/7 ×2, distributed cache / meta in keeper / s3 storage 1/2, amd_tsan 5/6, amd_msan 5/6, SharedCatalog + meta in keeper stress ×2 — everything else green), not a uniform config rejection: if this PR's check were wrongly rejecting a Cloud config key, the server would fail to start on every shard (as in the earlier enable_http_stacktrace / format_alter_operations_with_parentheses cases), which is not what we see here. The scheduled IncrementalSync re-evaluates on the current head.

  • AI-Review "Request changes" (16:07Z on this head) — the two non-dismissed Majors are the same design-call class as the already-dismissed <function> / <regexp> graphite findings, left open for a decision rather than tightened blind:

    • graphite_rule_valid retention ordering (ServerSettings.cpp:2760). Mirroring appendGraphitePattern's compareRetentions — a throw-if-not-monotonic comparator invoked inside ::sort (Graphite.cpp:173), sensitive to ties / equal ages, the single-retention case, and the TypeRetention-only guard — in a standalone static check is subtle. More importantly it is the same compatibility trade-off as the other graphite findings: the rollup section is not parsed at server start (only at GraphiteMergeTree table creation), so an out-of-order-retention section currently boots; rejecting it at startup with UNKNOWN_ELEMENT_IN_CONFIG is a behavior change whose downside is a false positive that stops the server from starting.
    • top-level <include incl="X"/> vs users-config include_from (ServerSettings.cpp:2639). Narrowing which include_from sources a server <include> resolves against re-implements ConfigProcessor's server-vs-users scoping; getting it wrong wrongly rejects a legitimately imported top-level key — again a false positive that stops the server from starting.

    Both would need a full build plus the two suggested regression tests to verify safely, which is why they are not tightened in this pass. Leaving them as a design call.

  • No master re-mergeMERGEABLE, merge-base 6ff851961b5 from 10:16Z today (behind only by velocity); the only red is CH Inc sync, which a re-merge would not change.

Addresses the two remaining (non-dismissed) findings of the AI review on this
PR, both false negatives in `ServerSettings::checkUnknownSettings` that let a
genuinely unknown top-level key escape validation:

1. `GraphiteMergeTree` retention ordering. `graphite_rule_valid` validated that
   each `retention*` block has an unsigned `age`/`precision`, but did not mirror
   `appendGraphitePattern`'s final `::sort(..., compareRetentions)`, which throws
   `BAD_ARGUMENTS` ("Age and precision should only grow up") unless every compared
   pair's `age` and `precision` grow together. An arbitrary top-level section whose
   retentions are not co-monotonic therefore passed `looks_like_graphite_rollup`
   even though the real parser rejects the rule. Fix: collect the `(age, precision)`
   pairs in config order and replicate that exact `::sort` (same comparator logic,
   same input), treating a throw as "not a valid rollup rule". Because a valid
   (co-monotonic) rule never triggers a throwing comparison regardless of the sort's
   internal order, this only rejects sections the parser also rejects — it never
   turns a valid rollup config into a startup failure. Unlike the deliberately
   deferred `<function>`/`<regexp>` checks, this needs no `ContextPtr`.

2. `include_from` source separation. Top-level server `<include incl="X"/>` refs
   were resolved against every discovered `include_from` source, including the
   active users config's source. `ConfigProcessor` resolves a server `<include>`
   only against the server config's own `<include_from>`, never the users config's,
   so a users source that happens to define `<X>` could whitelist a top-level server
   key the server config never imports. Fix: track the server-side `include_from`
   sources (server-config files, the merged server config's `include_from`, and the
   default `/etc/metrika.xml`) separately and resolve top-level include refs only
   against them; a users-config source is still consulted for the merged-fragment
   exemption but never for include-ref resolution.

Adds two focused negative integration tests in `test_unknown_config_option`:
`test_graphite_retention_bad_ordering_rejected` (two `retention*` blocks whose
`age` ascends while `precision` descends) and
`test_users_config_include_from_source_does_not_exempt_server_key` (server include
ref plus a users-config `include_from` source defining the referenced node).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
/// scalar value of its own, so a childless section with a non-empty value is not a rollup
/// section: exempt the empty container, but let the valued leaf fall through and be rejected.
if (children.empty())
return Poco::trim(config.getString(section, "")).empty();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ This empty-section carveout is broader than the GraphiteMergeTree compatibility case it is trying to preserve. Any unknown top-level tag with no child elements now reaches this branch, so a config like <my_typo/> or <my_unknown></my_unknown> is silently accepted as “Graphite-like” just because config.getString(section, "") trims to empty.

That breaks the PR’s core contract for one of the simplest typo shapes: empty unknown options are ignored again instead of throwing UNKNOWN_ELEMENT_IN_CONFIG. Unless we have some independent signal that the section is actually referenced by GraphiteMergeTree, this branch needs to reject the empty case rather than treating every childless empty tag as a valid rollup section.

@clickhouse-gh

clickhouse-gh Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.50% 85.50% +0.00%
Functions 92.70% 92.60% -0.10%
Branches 77.70% 77.70% +0.00%

Changed lines: Changed C/C++ lines covered: 493/515 (95.73%) · Uncovered code

Full report · Diff report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants