Add embedded documentation and a system table for dictionary layouts by alexey-milovidov · Pull Request #106182 · ClickHouse/ClickHouse · GitHub
Add embedded documentation and a system table for dictionary layouts #106182

Merged: alexey-milovidov merged 12 commits into master from dictionary-layout-documentation on Jun 13, 2026.

@alexey-milovidov alexey-milovidov commented May 31, 2026


Fifth PR in the series that adds embedded, runtime-introspectable documentation to ClickHouse component registries (after table engines #106177, database engines #106178, data types #106180, and formats #106181). This one covers dictionary layouts — and, since there was no system table for them, adds one.

What it does:

  • Adds a new system.dictionary_layouts table listing every dictionary layout with its is_complex flag and embedded documentation columns description, syntax, examples, introduced_in, and related.
  • Threads the shared Documentation struct (src/Common/Documentation.h) through DictionaryFactory::registerLayout, stores it in RegisteredLayout, and adds a getDocumentation accessor.
  • Populates all 21 layouts (flat, hashed, sparse_hashed, hashed_array, cache, ssd_cache, direct, range_hashed, ip_trie, regexp_tree, polygon*, and their complex_key_* variants).
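
As a rough illustration of the second bullet, the sketch below shows one way a registry could store a `Documentation` value next to each registered layout and hand it back through an accessor. It is a simplified stand-in, not the ClickHouse code: `LayoutRegistry`, `makeExampleRegistry`, the field types, and the sample descriptions are assumptions made for this example. Only the names `Documentation`, `RegisteredLayout`, `registerLayout`, `getDocumentation`, `is_complex`, and the documentation column names come from the PR.

```cpp
// Sketch only: simplified stand-ins, not the actual ClickHouse declarations.
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reduction of the shared Documentation struct. The field names
// mirror the documentation columns of system.dictionary_layouts; the field
// types are assumptions.
struct Documentation
{
    std::string description;
    std::string syntax;
    std::string examples;
    std::string introduced_in;
    std::vector<std::string> related;
};

// What the registry keeps per layout: the is_complex flag plus its documentation.
struct RegisteredLayout
{
    bool is_complex = false;
    Documentation documentation;
};

// Hypothetical stand-in for the layout part of DictionaryFactory.
class LayoutRegistry
{
public:
    // Documentation is the final argument, as the PR describes for registerLayout.
    void registerLayout(const std::string & name, bool is_complex, Documentation documentation)
    {
        bool inserted = layouts.emplace(name, RegisteredLayout{is_complex, std::move(documentation)}).second;
        if (!inserted)
            throw std::logic_error("Layout '" + name + "' is already registered");
    }

    const Documentation & getDocumentation(const std::string & name) const
    {
        auto it = layouts.find(name);
        if (it == layouts.end())
            throw std::out_of_range("Unknown layout '" + name + "'");
        return it->second.documentation;
    }

    // A system table could iterate this map to emit one row per layout.
    const std::map<std::string, RegisteredLayout> & all() const { return layouts; }

private:
    std::map<std::string, RegisteredLayout> layouts;
};

// Illustrative registrations; the description strings are written for this
// example and are not the texts embedded by the PR.
inline LayoutRegistry makeExampleRegistry()
{
    LayoutRegistry registry;
    registry.registerLayout("flat", false,
        {"Stores the dictionary in flat arrays indexed by key.", "LAYOUT(FLAT())", "", "", {"hashed"}});
    registry.registerLayout("complex_key_hashed", true,
        {"Hash table layout for composite keys.", "LAYOUT(COMPLEX_KEY_HASHED())", "", "", {"hashed"}});
    assert(registry.all().size() == 2); // sanity check: both registrations were accepted
    return registry;
}
```

In this sketch, `makeExampleRegistry().getDocumentation("flat").syntax` yields the `LAYOUT(FLAT())` string, and iterating `all()` gives the name, `is_complex` flag, and documentation fields that a table like `system.dictionary_layouts` would need per row.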

Note on doc embedding: unlike engines/types/formats, dictionary layouts are all documented on a single combined reference page rather than one page per layout, so this PR uses accurate concise descriptions (with syntax and related) rather than embedding a full per-layout markdown page. The combined page therefore stays for now.

Note: this PR shares src/Common/Documentation.h/.cpp with the earlier PRs in the series (trivial add/add on merge).

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added a new system.dictionary_layouts table that lists the available dictionary layouts together with embedded documentation (description, syntax, examples, introduced_in, related).

Documentation entry for user-facing changes

  • Documentation is provided as part of this change (embedded documentation in system.dictionary_layouts and a new system.dictionary_layouts system-table page).

Version info

  • Merged into: 26.6.1.762

There was no system table exposing the available dictionary layouts. This adds
`system.dictionary_layouts`, and attaches the shared `Documentation` struct
(introduced for table engines) to dictionary layouts.

- `DictionaryFactory::registerLayout` now takes a final `Documentation`
  argument, stored in `RegisteredLayout`, with a `getDocumentation` accessor.
- The new `system.dictionary_layouts` table exposes `name`, `is_complex`, and
  the embedded documentation columns `description`, `syntax`, `examples`,
  `introduced_in`, and `related`.
- All 21 layouts (`flat`, `hashed`, `cache`, `ip_trie`, `polygon`, …) get a
  description and syntax. Layouts share a single combined documentation page, so
  concise descriptions are used rather than embedding a full per-layout page.

This is a follow-up to the embedded-documentation changes for table engines,
database engines, data types, and formats, and reuses the `Documentation`
struct from `src/Common/Documentation.h`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
clickhouse-gh Bot commented May 31, 2026


@clickhouse-gh clickhouse-gh Bot added the pr-improvement Pull request with some product improvements label May 31, 2026
Comment thread docs/en/operations/system-tables/dictionary_layouts.md
alexey-milovidov and others added 3 commits June 1, 2026 09:20
The `Docs check` job failed because the `See also` section linked to
`/sql-reference/dictionaries`, which is not a valid slug, so the Docusaurus
build reported a broken link. Point the link to the combined dictionary
layouts reference page (`/sql-reference/statements/create/dictionary/layouts`),
which is the page that documents all layouts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Master added three 04303_* stateless tests, so the dictionary layouts
documentation test is renumbered to the next free prefix 04305. Verified
it passes against a freshly built binary; system.dictionary_layouts
populates all 21 layouts with non-empty description and syntax.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov


The failures of 00175_obfuscator_schema_inference and 00096_obfuscator_save_load in Stateless tests (amd_tsan, parallel) are NOT caused by this PR.

They were introduced by #104690 ("Add UntrackedMemory asynchronous metric"), which made clickhouse-obfuscator abort (SIGABRT) on process teardown: the query results are correct, but the Aborted line on stderr fails the test. #104690 has now been reverted (#106365).

#104690 was merged in violation of the ClickHouse team rules: its own CI already showed these two tests failing (10 times between May 12 and June 1) before it was merged.

Please update your branch to pick up the revert; the tests should pass again.

@alexey-milovidov


Merged master to resolve an add/add conflict in src/Storages/System/attachSystemTables.cpp (the new system.data_skipping_index_types registration landed adjacent to system.dictionary_layouts — kept both). After the merge the shared Common/Documentation.h/.cpp from the earlier series PRs are now fully on master, so the diff is cleanly scoped to the dictionary-layout changes and the new system table.

Built clean and empirically verified: system.dictionary_layouts exposes all 21 layouts, the 04305_dictionary_layouts_documentation test output matches the reference exactly, and the "See also" link in the new docs page resolves to /sql-reference/statements/create/dictionary/layouts. CI was already green.

Comment thread src/Storages/System/attachSystemTables.cpp
alexey-milovidov and others added 2 commits June 7, 2026 09:59
Address review: the new `system.dictionary_layouts` table makes its
columns, engine name, and table comment part of the system-table schema
contract. Add a `SHOW CREATE TABLE` of it to the
`02117_show_create_table_system` regression so accidental schema changes
are caught, and update the reference accordingly.

Verified the engine name (`SystemDictionaryLayouts`), columns, and
comment against a freshly built binary.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/Storages/System/attachSystemTables.cpp

@alexey-milovidov alexey-milovidov left a comment


Ok.

@alexey-milovidov alexey-milovidov self-assigned this Jun 9, 2026
@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jun 9, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch Jun 9, 2026
alexey-milovidov and others added 3 commits June 8, 2026 19:39
…ocumentation

# Conflicts:
#	src/Storages/System/attachSystemTables.cpp
Add `dictionary_layouts` to the `always_accessible_tables` list in
`ContextAccess::addImplicitAccessRights`, alongside the other registry
introspection tables (`system.formats`, `system.data_type_families`,
`system.disk_types`, etc.). Without this, users without explicit grants
keep implicit `SELECT` on the comparable documentation tables but not on
the new `system.dictionary_layouts`, making it unavailable when
`access_control_improvements.select_from_system_db_requires_grant=true`.

Update `03247_show_grants_with_implicit` accordingly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
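
The effect of the `always_accessible_tables` change above can be sketched as a small lookup. This is a minimal illustration under stated assumptions, not the logic of `ContextAccess::addImplicitAccessRights`: the function `hasImplicitSelectOnSystemTable` is hypothetical, the list holds only the four tables named in this PR, and the "setting off means everything is readable" branch is a simplification.

```cpp
// Sketch only: a hypothetical model of implicit SELECT on system tables.
#include <array>
#include <string_view>

// Assumed subset of the list; the real one contains more tables.
constexpr std::array<std::string_view, 4> always_accessible_tables
    = {"formats", "data_type_families", "disk_types", "dictionary_layouts"};

// Returns whether a user with no explicit grants may SELECT from system.<table>.
constexpr bool hasImplicitSelectOnSystemTable(std::string_view table, bool select_from_system_db_requires_grant)
{
    // Simplification: when the setting is off, no grant is required at all.
    if (!select_from_system_db_requires_grant)
        return true;

    // When the setting is on, only the always-accessible tables stay readable.
    for (std::string_view name : always_accessible_tables)
        if (name == table)
            return true;
    return false;
}

// With the table in the list, it stays readable even when grants are required.
static_assert(hasImplicitSelectOnSystemTable("dictionary_layouts", true));
```

Removing `"dictionary_layouts"` from the array in this sketch makes the final `static_assert` fail, which mirrors the situation the commit describes: the table becomes unavailable to users without explicit grants when `select_from_system_db_requires_grant` is `true`.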
@alexey-milovidov


The only CI failure here is unrelated to this PR. 04033_tpc_ds_q04 failed with Code: 159 ... Timeout exceeded: elapsed 60021 ms, maximum: 60000 ms in Stateless tests (arm_asan_ubsan, azure, parallel) — a heavy TPC-DS query timing out under the combined ASan+UBSan build. CIDB confirms it fails on master (pull_request_number = 0) and across many unrelated PRs (#95659, #106650, #106918, #106865, #106020, #105847), so it is a master-wide timeout flake, not caused by the dictionary-layouts changes.

@groeneai, could you investigate this failure: https://github.com/ClickHouse/ClickHouse/actions/runs/27180382496/job/80245166901 and provide a fix in a separate PR? If a fix is already in progress, please link it here.

@groeneai


@alexey-milovidov Investigated. This is not a q04 timeout or slowness bug; it is the runner-wide RSS-contention OOM on the arm_asan_ubsan, azure, parallel shard, and the Code: 159 timeout is the driver-side symptom of it.

Evidence from CIDB for the exact run you linked (q04 on this PR, 2026-06-09 12:09:37):

  • The server log for that failure is Code: 241 ... (total) memory limit exceeded ... current RSS: 53.78 GiB, maximum: 53.69 GiB. OvercommitTracker decision: Query was selected to stop, while reading tpcds.customer. q04's own footprint at kill time was 3.41 GiB, so it was selected as a victim, not the cause.
  • In the peak window (2026-06-09 19:00 UTC), 199 distinct tests on this shard were OvercommitTracker-killed at the identical 53.69 GiB cap. q04 is one of ~200 victims of the same host-level event (same family as q09/q23/q61).
  • q04 is not slow on this shard: 5673 OK runs averaging 11.5s (max 40.8s), well under the 60s budget. Raising its per-test timeout would not prevent the 241 kill, and capping its memory would only make this 3.4 GiB query fail sooner while the real hogs keep running.

So no per-test fix is warranted. Disabling it would just shift the OOM to the next query, which is why I closed the equivalent q61 disable (#106797) after rienath's pushback. q35 stays disabled because it OOMs standalone with reasonable memory; q04 and q61 only die under shard-wide contention.

No separate fix PR is in progress to link. The contention has self-resolved: zero q04 failures since 2026-06-11 00:00 UTC, coinciding with the revert #107122 of the multistage-distributed-queries change (#106020) that drove the heap-corruption and thread-teardown storm on the asan shards from 06-08 to 06-10.

The only real fix is infra-level (lower per-shard test parallelism or raise the server memory budget on this runner), which is a shard-wide CI decision rather than a q04 change. Want me to open that, or is the self-resolution acceptable for now?

clickhouse-gh Bot commented Jun 13, 2026


LLVM Coverage Report

| Metric | Baseline | Current | Δ |
|---|---|---|---|
| Lines | 84.70% | 84.70% | +0.00% |
| Functions | 92.30% | 92.40% | +0.10% |
| Branches | 77.30% | 77.30% | +0.00% |

Changed C/C++ lines covered by tests: 153/154 (99.35%). Lost baseline coverage: none.

@alexey-milovidov alexey-milovidov left a comment


LGTM

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jun 13, 2026
Merged via the queue into master with commit df91f51 Jun 13, 2026
166 checks passed
@alexey-milovidov alexey-milovidov deleted the dictionary-layout-documentation branch June 13, 2026 20:43
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 13, 2026