Add embedded documentation and a system table for dictionary sources #106184
There was no system table exposing the available dictionary sources. This adds `system.dictionary_sources`, and attaches the shared `Documentation` struct (introduced for table engines) to dictionary sources.

- `DictionarySourceFactory::registerSource` now takes a final `Documentation` argument, stored in a per-source map, with a `getDocumentation` accessor.
- The new `system.dictionary_sources` table exposes `name` and the embedded documentation columns `description`, `syntax`, `examples`, `introduced_in`, and `related`.
- All 16 sources (`clickhouse`, `mysql`, `postgresql`, `mongodb`, `redis`, `cassandra`, `file`, `executable`, `executable_pool`, `http`, `library`, `odbc`, `jdbc`, `ytsaurus`, `null`, `yamlregexptree`) get a description and syntax. Sources share a single combined documentation page, so concise descriptions are used.

This is a follow-up to the embedded-documentation changes for table engines, database engines, data types, formats, and dictionary layouts, and reuses the `Documentation` struct from `src/Common/Documentation.h`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
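The registration flow described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `SourceFactorySketch` is a hypothetical class, the `Documentation` struct here is a simplified stand-in whose field names are taken from the system table columns (the real field types are not shown in this conversation), and the creator callback that a real `registerSource` would also accept is omitted.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified stand-in for the shared struct in src/Common/Documentation.h.
// Field names mirror the system table columns; the types are assumptions.
struct Documentation
{
    std::string description;
    std::string syntax;
    std::string examples;
    std::string introduced_in;
    std::string related;
};

// Hypothetical miniature of a source factory. The creator callback that a
// real factory would store alongside the documentation is left out.
class SourceFactorySketch
{
public:
    // Documentation is the final argument and is kept in a per-source map.
    void registerSource(const std::string & name, Documentation doc)
    {
        assert(!name.empty());
        if (!docs.emplace(name, std::move(doc)).second)
            throw std::logic_error("source '" + name + "' is already registered");
    }

    // Accessor used when filling the documentation columns for one source.
    const Documentation & getDocumentation(const std::string & name) const
    {
        auto it = docs.find(name);
        if (it == docs.end())
            throw std::out_of_range("unknown source '" + name + "'");
        return it->second;
    }

    // std::map iterates in key order, so rows come out sorted by name.
    const std::map<std::string, Documentation> & all() const { return docs; }

private:
    std::map<std::string, Documentation> docs;
};
```

A system table built on such a factory would iterate `all()` and emit one row per entry, with `name` followed by the five documentation fields.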
The new `system.dictionary_sources` documentation page linked to the non-existent `/sql-reference/dictionaries` path, breaking the `Build docusaurus` job. Repoint it to the existing combined dictionary sources reference page `/sql-reference/statements/create/dictionary/sources`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The embedded `syntax` for the `ytsaurus` dictionary source used `http_proxy_url`, but the source reads only `http_proxy_urls` from the configuration, so a user copying the documented `SOURCE(YTSAURUS(...))` would get a missing-key exception. Use the real key `http_proxy_urls` and add a regression check in `04304_dictionary_sources_documentation` that pins the documented key to the one the source actually reads. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The `jdbc` dictionary source creator unconditionally throws `SUPPORT_IS_DISABLED`, so listing it in `system.dictionary_sources` with a description that reads as a usable source is misleading. State in the embedded description that the source is currently disabled, pending consistent support for nullable fields. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sources whose support is compiled out (`cassandra`, `mongodb`, `ytsaurus`, `yamlregexptree`, `mysql`, `postgresql`) are still registered so that creating such a dictionary throws a helpful `SUPPORT_IS_DISABLED` exception rather than `UNKNOWN_ELEMENT_IN_CONFIG`. Before this change, `system.dictionary_sources` still showed a working-looking description for them, so a user could copy a `SOURCE(...)` clause that the current build cannot use. Append a build-time note to the `description` (guarded by the same `USE_*` macros) so the embedded documentation reflects the disabled state of the current build, mirroring the existing treatment of the permanently disabled `jdbc` source. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
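A rough sketch of how such a build-time note could be appended is shown below. The macro name `USE_MYSQL_SKETCH`, the function name, and both description strings are hypothetical and chosen only for illustration; the actual macros and wording in the PR are not quoted in this conversation.

```cpp
#include <cassert>
#include <string>

// Hypothetical build flag. In a real build a macro like this would come from
// a generated configuration header; defaulting to 0 models a build in which
// support for the source is compiled out.
#ifndef USE_MYSQL_SKETCH
#define USE_MYSQL_SKETCH 0
#endif

// Returns the description for one source, with a note appended when the
// current build cannot actually create the source.
std::string mysqlSourceDescription()
{
    std::string description = "Reads dictionary data from a MySQL server.";
#if !USE_MYSQL_SKETCH
    description += " Note: this build was compiled without MySQL support,"
                   " so creating a dictionary with this source throws SUPPORT_IS_DISABLED.";
#endif
    assert(!description.empty());
    return description;
}
```

Guarding the note with the same macro that guards the source implementation keeps the two in step: a build that can create the source never shows the note, and a build that cannot always does.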
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The failures of They were introduced by #104690 ("Add #104690 was merged in violation of the ClickHouse team rules: its own CI already showed these two tests failing (10 times between May 12 and June 1) before it was merged. Please update your branch to pick up the revert; the tests should pass again. |
…ocumentation # Conflicts: # src/Storages/System/attachSystemTables.cpp
The `syntax` column of `system.dictionary_sources` shows the structure of the `SOURCE` clause, but some sources are subject to access control when a dictionary is created from a DDL query rather than from a server configuration file. Copying such a row into a `CREATE DICTIONARY` query could fail at runtime.

Reword the `syntax` column description to clarify it documents the clause structure (not that it is always permitted as DDL), and add per-source notes to the affected descriptions:

- `executable` and `executable_pool` cannot be created from DDL at all (only from a server configuration file), for security reasons.
- `file`, `library`, and `yamlregexptree` accept DDL only when the path is inside the configured safe directory (`user_files` or the dictionaries library directory).

Keep the `docs/en/operations/system-tables/dictionary_sources.md` page in sync with the new `syntax` column description.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
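The two DDL restrictions above can be illustrated with a small sketch. Both function names are hypothetical, the containment check is purely lexical (it does not resolve symlinks), and treating every other source as allowed is an assumption made only to keep the example short. It is not the access-control code ClickHouse actually runs.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Lexical check that `path` lies inside `dir`. Symlinks are not resolved,
// which a production check would need to consider.
bool isInsideDirectory(const fs::path & path, const fs::path & dir)
{
    const fs::path p = path.lexically_normal();
    fs::path d = dir.lexically_normal();
    if (!d.has_filename())          // "/a/b/" -> "/a/b", drops the empty last component
        d = d.parent_path();
    assert(d.is_absolute());
    // `dir` contains `path` when every component of `dir` prefixes `path`.
    return std::mismatch(d.begin(), d.end(), p.begin(), p.end()).first == d.end();
}

// Hypothetical policy mirroring the two notes added to the descriptions.
bool isAllowedFromDDL(const std::string & source, const fs::path & path, const fs::path & safe_dir)
{
    if (source == "executable" || source == "executable_pool")
        return false;                               // configuration file only
    if (source == "file" || source == "library" || source == "yamlregexptree")
        return isInsideDirectory(path, safe_dir);   // must stay in the safe directory
    return true;                                    // other sources are not modelled here
}
```

Comparing path components rather than raw string prefixes avoids accepting a sibling directory such as `user_files_evil`, and normalizing first keeps `..` segments from escaping the safe directory.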
Picked up the PR:
Built clean and verified locally: CI was green before the merge. |
Merged The only red was @groeneai, please take a look at the flakiness of |
@alexey-milovidov, fix posted as #106680. Investigation summary. PR #103586 (merged 2026-06-05 by @tiandiwonder) added Mechanism. Fix in #106680. Lowers the throttle limit and dataset 5x (1 MB/s -> 200 KB/s, 1e6 -> 2e5 rows). Wall-clock stays ~8s, with a 4.5x safety margin against the worst observed CI natural rate. Local 10/10 iterations against S3 produced |
The test asserts that `ProfileEvents['UserThrottlerSleepMicroseconds']` is greater than half of the target query duration. The throttler only sleeps when the natural read rate exceeds the configured throttle limit; when the natural rate drops close to the limit the token bucket never goes negative and the throttler skips its sleep path (see `Throttler::throttle` in `src/Common/Throttler.cpp`).

PR ClickHouse#103586 added `no-random-settings` to remove S3-prefetch settings as one source of slow natural rate, but contention on the `Stateless tests (amd_tsan, parallel, 2/2)` runner still occasionally drives the natural S3 read rate close to or below the previous 1 MB/s limit.

Post-merge CIDB: `pull_request_number != 0 AND check_start_time > 2026-06-05` shows 4 sightings (PRs ClickHouse#106222, ClickHouse#102039, ClickHouse#49966, ClickHouse#106184), all on amd_tsan parallel 2/2 with the `read 1 1 0` shape (duration ok, bytes ok, sleep below threshold). All four PRs already include the `no-random-settings` tag, so the remaining flakiness is contention-driven, not random-settings driven.

Drop the throttle limit and dataset 5x so the throttle is well below any plausible natural rate (200 KB/s vs ~0.9 MB/s observed worst case = 4.5x safety margin). Test wall-clock stays at ~8s. Local 10/10 runs against an S3 disk produced sleep_us between 16.8s and 17.3s, comfortably above the 3.5s threshold.

Closes: ClickHouse#103422
Related: ClickHouse#103586
Related: ClickHouse#106184

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
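The reasoning in this commit message can be checked with a deliberately idealized model. The sketch below is not the logic of `Throttler::throttle`: it assumes a single reader, a constant natural rate, and a throttler that adds exactly enough sleep to stretch the read to `bytes / limit` seconds. Both function names are hypothetical, and the byte counts used in the usage note are illustrative values picked to give an 8 s target, not figures from the test.

```cpp
#include <algorithm>
#include <cassert>

// Idealized total sleep: the time the throttler has to add so that reading
// `bytes` takes at least bytes / limit seconds. Rates are in bytes per second.
double expectedSleepSeconds(double bytes, double natural_rate, double limit)
{
    assert(bytes >= 0 && natural_rate > 0 && limit > 0);
    const double throttled_duration = bytes / limit;
    const double natural_duration = bytes / natural_rate;
    // No sleep at all when the natural rate is already at or below the limit.
    return std::max(0.0, throttled_duration - natural_duration);
}

// The condition the test relies on: sleep exceeds half of the target duration.
bool sleepAssertionHolds(double bytes, double natural_rate, double limit)
{
    return expectedSleepSeconds(bytes, natural_rate, limit) > 0.5 * (bytes / limit);
}
```

In this model the assertion holds exactly when the natural rate is more than twice the limit. With a 1 MB/s limit and a 0.9 MB/s natural rate, `sleepAssertionHolds(8e6, 0.9e6, 1e6)` is false because no sleep is needed; with the limit lowered to 200 KB/s, `sleepAssertionHolds(1.6e6, 0.9e6, 2e5)` is true, since 0.9 MB/s is well above the 400 KB/s the model requires.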
…ry source

The `ytsaurus` dictionary source is gated by the `allow_experimental_ytsaurus_dictionary_source` setting: creating a dictionary with `SOURCE(YTSAURUS(...))` throws `UNKNOWN_STORAGE` unless that setting is enabled. The `system.dictionary_sources` description did not mention this, so the row looked as readily usable as the other remote sources and a user copying the syntax would get an exception. Mention the experimental setting in the description, and pin that text in the stateless test alongside the existing `http_proxy_urls` check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LLVM Coverage Report

Changed C/C++ lines covered by tests: 113/116 (97.41%). Lost baseline coverage (was covered on master, now uncovered in this PR): 1 line(s).

Sixth PR in the series that adds embedded, runtime-introspectable documentation to ClickHouse component registries (after table engines #106177, database engines #106178, data types #106180, formats #106181, and dictionary layouts #106182). This one covers dictionary sources — and, since there was no system table for them, adds one.
What it does:
- Adds a new `system.dictionary_sources` table listing every dictionary source with embedded documentation columns `description`, `syntax`, `examples`, `introduced_in`, and `related`.
- Passes the shared `Documentation` struct (`src/Common/Documentation.h`) through `DictionarySourceFactory::registerSource`, stores it in a per-source map, and adds a `getDocumentation` accessor.
- Documents all 16 sources: `clickhouse`, `mysql`, `postgresql`, `mongodb`, `redis`, `cassandra`, `file`, `executable`, `executable_pool`, `http`, `library`, `odbc`, `jdbc`, `ytsaurus`, `null`, `yamlregexptree`.

Note on doc embedding: like dictionary layouts, dictionary sources are documented on a single combined reference page rather than one page per source, so this PR uses accurate concise descriptions (with `syntax` and `related`) rather than embedding a full per-source markdown page.

Note: this PR shares `src/Common/Documentation.h`/`.cpp` with the earlier PRs in the series (trivial add/add on merge).

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added a new `system.dictionary_sources` table that lists the available dictionary sources together with embedded documentation (`description`, `syntax`, `examples`, `introduced_in`, `related`).

Documentation entry for user-facing changes

`system.dictionary_sources` and a new `system.dictionary_sources` system-table page).

Version info

26.6.1.528