Add support for complex types in dictionary by ylw510 · Pull Request #98627 · ClickHouse/ClickHouse · GitHub

Add support for complex types in dictionary#98627

Merged
Avogar merged 21 commits into ClickHouse:master from ylw510:feat-support_complex_types_for_dictionary
Apr 13, 2026

Conversation

@ylw510 ylw510 commented Mar 3, 2026

Contributor

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Add support for Map and JSON/Object types as dictionary attributes. Now dictionaries can store and retrieve complex types including Map(String, String), Map(String, Array(String)), JSON, and Nullable(JSON) types in both FLAT and HASHED layouts.

Closes #86829

Detailed description

This PR adds support for complex types (Map and JSON/Object) as dictionary attributes, extending the existing dictionary functionality to handle nested and structured data types.

Supported dictionary layouts:

  • FLAT
  • HASHED
  • HashedArray
  • RangeHashed
  • IPAddress
  • Polygon
  • Cache

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Documentation should include:

Motivation:
This PR addresses Issue #86829 which requested support for JSON type in dictionary attributes. Previously, attempting to create a dictionary with JSON attributes resulted in DB::Exception: Unknown type JSON for dictionary attribute. (UNKNOWN_TYPE).

Dictionaries are commonly used for lookups and data enrichment. Supporting Map and JSON types allows users to store and retrieve complex, nested data structures directly from dictionaries, enabling more flexible data modeling and reducing the need for multiple dictionary lookups or post-processing. This feature enables use cases such as:

  • Storing unstructured or semi-structured data in dictionaries
  • Accessing nested JSON fields directly via dictionary lookups (e.g., dictGet('dict', 'json_field', key).nested.field)
  • Using Map types for key-value lookups within dictionary attributes

Parameters:

  • Dictionary attribute types can now be declared as:
    • Map(KeyType, ValueType) where KeyType is typically String and ValueType can be String, Array, or numeric types
    • JSON or Nullable(JSON) for JSON/Object data
  • All existing dictionary layouts (FLAT, HASHED, etc.) support these types
  • Dictionary operations (dictGet, dictGetOrDefault, etc.) work seamlessly with Map and JSON types
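
The attribute declarations listed above can be sketched as follows. This is a minimal illustration, not taken from the PR: the table `product_source` and the dictionary `product_dict` are hypothetical names, and it assumes a server build that includes this change (so that `Map` and `Nullable(JSON)` attributes are accepted).

```sql
-- Hypothetical source table; names are for illustration only.
CREATE TABLE product_source
(
    product_id UInt64,
    tags Map(String, Array(String)),
    details Nullable(JSON)
)
ENGINE = Memory;

INSERT INTO product_source VALUES
(1, {'colors': ['red', 'blue']}, '{"brand": "Acme"}');

-- Dictionary attributes declared with the newly supported types.
CREATE DICTIONARY product_dict
(
    product_id UInt64,
    tags Map(String, Array(String)),
    details Nullable(JSON)
)
PRIMARY KEY product_id
SOURCE(CLICKHOUSE(TABLE 'product_source'))
LAYOUT(HASHED())
LIFETIME(MIN 0 MAX 0);

-- Look up a Map attribute, then index into it by key.
SELECT dictGet('product_dict', 'tags', 1)['colors'] AS colors;
```

The `SOURCE(CLICKHOUSE(TABLE ...))` form omits host, port, and database, so it reads from the current database on the local server.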

Example use:

-- Step 1: Create source table with Map and JSON columns
CREATE TABLE users
(
    user_id UInt64,
    name String,
    metadata Map(String, String),
    profile JSON
)
ENGINE = Memory;

-- Step 2: Insert sample data
INSERT INTO users VALUES
(1, 'Alice', {'age': '30', 'city': 'New York'}, '{"age": 30, "city": "New York", "country": "USA", "address": {"street": "123 Main St", "zip": "10001"}}'),
(2, 'Bob', {'age': '25', 'city': 'London'}, '{"age": 25, "city": "London", "country": "UK", "address": {"street": "456 High St", "zip": "SW1A 1AA"}}');

-- Step 3: Create dictionary with Map and JSON attributes
CREATE DICTIONARY user_profiles
(
    user_id UInt64,
    name String,
    metadata Map(String, String),
    profile JSON
)
PRIMARY KEY user_id
SOURCE(CLICKHOUSE(HOST 'localhost' PORT tcpPort() DB 'default' TABLE 'users'))
LAYOUT(FLAT())
LIFETIME(MIN 0 MAX 0);

-- Step 4: Query Map values using dictionary
SELECT dictGet('user_profiles', 'metadata', 1)['city'] as city;
-- Result: New York

-- Step 5: Query JSON values using dictionary
SELECT dictGet('user_profiles', 'profile', 1).age as age;
-- Result: 30

SELECT dictGet('user_profiles', 'profile', 1).city as city;
-- Result: New York

SELECT dictGet('user_profiles', 'profile', 1).country as country;
-- Result: USA

-- Step 6: Access nested JSON fields
SELECT dictGet('user_profiles', 'profile', 1).address.street as address_street;
-- Result: 123 Main St

Backward compatibility:
This change is fully backward compatible. Existing dictionaries continue to work as before, and the new complex types are additive features.


Note

Medium Risk
Expands multiple dictionary storage implementations and attribute handling to support Map and Object/JSON, which touches core lookup and column materialization paths and could surface type/nullable edge cases or memory regressions.

Overview
Adds first-class support for complex dictionary attribute types Map and Object (including JSON/Nullable(JSON)) across major dictionary layouts (e.g. Flat/Hashed/HashedArray/RangeHashed/IPAddress/Polygon/Cache), including proper column creation/insertion paths and container storage selection.

Extends dictionary type plumbing by mapping Map/Object in TypeId/AttributeUnderlyingType, updating default-value extraction for ColumnMap/ColumnObject, and making StorageDictionary return true from supportsDynamicSubcolumns() so that JSON subcolumn access works via dictionary-backed tables.

Adds explicit error handling for unsupported binary-encoded types and introduces new gtests plus a stateless SQL/reference test covering Map lookups, JSON field/subcolumn access, and Nullable(JSON) behavior.
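
The dictionary-backed table access mentioned in this overview might look like the sketch below. It assumes the `users` table and `user_profiles` dictionary from the example in the PR description already exist, and that the server build includes this change; the column aliases are made up for illustration.

```sql
-- Read the dictionary as a table and select JSON subcolumns by path.
-- Assumes the user_profiles dictionary from the PR description.
SELECT
    user_id,
    profile.city AS city,
    profile.address.zip AS zip
FROM user_profiles
ORDER BY user_id;
```

Here the JSON paths are resolved as subcolumns of the `profile` attribute rather than through `dictGet`, which is the path the `supportsDynamicSubcolumns()` change is described as enabling.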

Written by Cursor Bugbot for commit 7a2b61f.

Version info

  • Merged into: 26.4.1.895

ylw510 commented Mar 3, 2026

Contributor Author

@Avogar Avogar self-assigned this Mar 3, 2026

Avogar commented Mar 3, 2026

Member

Sure, I will take a look when I have time

@Avogar Avogar added the "can be tested" label (Allows running workflows for external contributors) Mar 9, 2026

clickhouse-gh Bot commented Mar 9, 2026

Contributor

Workflow [PR], commit [ebeb4fd]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (amd_binary, flaky check) | | failure | | |
| | 03291_json_big_structure_deserialization | FAIL | cidb | IGNORED |
| | 01037_polygon_dicts_correctness_fast | FAIL | cidb | IGNORED |
| | 01037_polygon_dicts_correctness_fast | FAIL | cidb | IGNORED |

AI Review

Summary

This PR adds Map and JSON/Object dictionary attribute support across multiple dictionary layouts, updates dictionary/type plumbing, and extends stateless/gtest coverage (including Nullable(JSON) cases). I reviewed changed C++ paths and tests, and did not find any additional high-confidence correctness, safety, or performance issues beyond comments already posted by clickhouse-gh[bot].

Missing context

  • ⚠️ No CI logs/artifacts were provided in this review request, so runtime/perf validation is based on static analysis and test coverage in the diff.

ClickHouse Rules

| Item | Status | Notes |
|---|---|---|
| Deletion logging | | |
| Serialization versioning | | |
| Core-area scrutiny | | |
| No test removal | | |
| Experimental gate | | |
| No magic constants | | |
| Backward compatibility | | |
| SettingsChangesHistory.cpp | | |
| PR metadata quality | | |
| Safe rollout | | |
| Compilation time | | |
| No large/binary files | | |

Final Verdict

  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the "pr-improvement" label (Pull request with some product improvements) Mar 9, 2026

clickhouse-gh Bot commented Mar 9, 2026

Contributor

@ClickHouse/integrations team, please, take a look

Comment thread src/Dictionaries/DictionaryHelpers.h
Comment thread src/Storages/StorageDictionary.h Outdated

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread src/Dictionaries/DictionaryHelpers.h Outdated
Comment thread src/Dictionaries/FlatDictionary.cpp
@@ -262,6 +262,24 @@ class CacheDictionaryStorage final : public ICacheDictionaryStorage
[&](Array & value) { fetched_column.insert(value); },

Contributor

This PR adds Map/Object handling across several layouts (Cache, RangeHashed, IPAddress, Polygon, HashedArray), but the new stateless test only validates FLAT and HASHED. Please add at least one integration/stateless coverage case for the newly touched non-FLAT/non-HASHED layouts to catch regressions in their distinct fetch paths.
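
A coverage case for one of the non-FLAT/non-HASHED layouts could be sketched as below. This is an illustration only, not the test that was added to the PR: it reuses the `users` table from the example in the PR description, and the dictionary name `user_profiles_hashed_array` is hypothetical.

```sql
-- Same attributes as the PR example, but with the HASHED_ARRAY layout.
-- Assumes the users table from the PR description exists.
CREATE DICTIONARY user_profiles_hashed_array
(
    user_id UInt64,
    metadata Map(String, String),
    profile JSON
)
PRIMARY KEY user_id
SOURCE(CLICKHOUSE(TABLE 'users'))
LAYOUT(HASHED_ARRAY())
LIFETIME(MIN 0 MAX 0);

-- Exercise the Map fetch path of this layout.
SELECT dictGet('user_profiles_hashed_array', 'metadata', 1)['city'] AS city;

-- Exercise the JSON fetch path of this layout.
SELECT dictGet('user_profiles_hashed_array', 'profile', 1).country AS country;
```

The same pair of lookups can be repeated against the other layouts by changing only the `LAYOUT(...)` clause, adding whatever extra clauses a given layout requires.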

ylw510 commented Mar 12, 2026

Contributor Author

@Avogar Avogar left a comment

Member

In general, looks good. But we need more tests. The current tests only cover FLAT and HASHED, but the PR modifies 8 layouts. Please add coverage for at least HASHED_ARRAY, RANGE_HASHED, and CACHE.

Even for the tested layouts, these cases are missing:

  • dictGetOrDefault with a Map/JSON default value
  • Lookup of a missing key (default value materialization)
  • JSON attribute with explicitly typed paths, e.g. JSON(city String, age UInt32)
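
The missing cases listed in this review could be exercised roughly as follows. This is a sketch under stated assumptions, not the test file from the PR: `typed_json_source` and `typed_json_dict` are hypothetical names, only a Map default is shown for dictGetOrDefault, and a server build that includes this change is assumed.

```sql
-- Hypothetical source table with a JSON column that declares typed paths.
CREATE TABLE typed_json_source
(
    id UInt64,
    attrs Map(String, String),
    profile JSON(city String, age UInt32)
)
ENGINE = Memory;

INSERT INTO typed_json_source VALUES
(1, {'tier': 'gold'}, '{"city": "Paris", "age": 41}');

CREATE DICTIONARY typed_json_dict
(
    id UInt64,
    attrs Map(String, String),
    profile JSON(city String, age UInt32)
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'typed_json_source'))
LAYOUT(HASHED())
LIFETIME(MIN 0 MAX 0);

-- Missing key: the attribute's default value has to be materialized.
SELECT dictGet('typed_json_dict', 'attrs', 999) AS missing_attrs;

-- Missing key with an explicit Map default value.
SELECT dictGetOrDefault('typed_json_dict', 'attrs', 999, map('tier', 'none')) AS attrs_or_default;

-- Typed JSON path on an existing key.
SELECT dictGet('typed_json_dict', 'profile', 1).city AS city;
```

Key 999 is deliberately absent from the source table, so the first two queries go through the default-value path rather than the stored-value path.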

Comment thread src/Dictionaries/DictionaryHelpers.h Outdated
Comment thread src/Dictionaries/DictionaryHelpers.h Outdated
Comment thread src/Dictionaries/DictionaryHelpers.h Outdated
Comment thread src/Dictionaries/DictionaryHelpers.h Outdated
Comment thread src/Dictionaries/FlatDictionary.cpp Outdated
@ylw510 ylw510 requested a review from Avogar March 20, 2026 14:49
Comment thread src/Dictionaries/FlatDictionary.cpp Outdated
Comment thread src/Dictionaries/HashedDictionary.h
Comment thread src/Dictionaries/DictionaryHelpers.h Outdated
Comment thread src/Dictionaries/DictionaryHelpers.h Outdated
Comment thread tests/queries/0_stateless/03822_map_json_dictionary.sql
Comment thread src/Dictionaries/DictionaryHelpers.h Outdated
Comment thread src/Dictionaries/DictionaryHelpers.h
@ylw510 ylw510 requested a review from Avogar March 27, 2026 11:36
ylw510 and others added 2 commits April 1, 2026 22:12
Co-authored-by: Pavel Kruglov <48961922+Avogar@users.noreply.github.com>
Co-authored-by: Pavel Kruglov <48961922+Avogar@users.noreply.github.com>
Comment thread src/Dictionaries/RangeHashedDictionary.h Outdated
Comment thread src/Dictionaries/DictionaryHelpers.h
@alexey-milovidov

Member

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

@alexey-milovidov

Member

The test 02859_replicated_db_name_zookeeper is fixed in #101952

@alexey-milovidov

Member

The Hung check failure is fixed by #102008 and #102010, let's update the branch

@clickhouse-gh

clickhouse-gh Bot commented Apr 10, 2026

Contributor

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
|---|---|---|---|
| Lines | 84.00% | 84.00% | +0.00% |
| Functions | 90.90% | 90.90% | +0.00% |
| Branches | 76.50% | 76.50% | +0.00% |

Changed lines: 62.75% (320/510); lost baseline coverage: 4 line(s)

Avogar commented Apr 13, 2026

Member

@Avogar Avogar added this pull request to the merge queue Apr 13, 2026
Merged via the queue into ClickHouse:master with commit ca31dea Apr 13, 2026
162 of 164 checks passed
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the "pr-synced-to-cloud" label (The PR is synced to the cloud repo) Apr 13, 2026
Labels

  • can be tested: Allows running workflows for external contributors
  • pr-improvement: Pull request with some product improvements
  • pr-synced-to-cloud: The PR is synced to the cloud repo

Development

Successfully merging this pull request may close these issues.

Support JSON type for dictionaries

4 participants