Add support for complex types in dictionary #98627
Sure, I will take a look when I have time.
Workflow [PR], commit [ebeb4fd] Summary: ❌
AI ReviewSummaryThis PR adds Missing context
ClickHouse Rules
Final Verdict
@ClickHouse/integrations team, please take a look.
@@ -262,6 +262,24 @@ class CacheDictionaryStorage final : public ICacheDictionaryStorage
[&](Array & value) { fetched_column.insert(value); },
This PR adds Map/Object handling across several layouts (Cache, RangeHashed, IPAddress, Polygon, HashedArray), but the new stateless test only validates FLAT and HASHED. Please add at least one integration/stateless coverage case for the newly touched non-FLAT/non-HASHED layouts to catch regressions in their distinct fetch paths.
Avogar
left a comment
In general, looks good, but we need more tests. The current tests only cover FLAT and HASHED, but the PR modifies 8 layouts. Please add coverage for at least HASHED_ARRAY, RANGE_HASHED, and CACHE.
Even for the tested layouts, these cases are missing:
dictGetOrDefault with a Map/JSON default value
Lookup of a missing key (default value materialization)
JSON attribute with explicitly typed paths, e.g. JSON(city String, age UInt32)
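The first two missing cases differ only in where the fallback value comes from. The sketch below is a toy model of that distinction, not ClickHouse code: `ToyMapDictionary`, `MapValue`, `get`, and `getOrDefault` are hypothetical names invented for illustration, and a `std::map` stands in for a `Map(String, String)` attribute.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>

// Toy stand-in for a Map(String, String) attribute value; not a ClickHouse type.
using MapValue = std::map<std::string, std::string>;

// Hypothetical miniature dictionary holding one Map attribute per key.
class ToyMapDictionary
{
public:
    explicit ToyMapDictionary(MapValue attribute_default)
        : attribute_default_(std::move(attribute_default))
    {
    }

    void insert(std::uint64_t key, MapValue value) { cells_[key] = std::move(value); }

    // Models dictGet: a missing key materializes the attribute's own default.
    MapValue get(std::uint64_t key) const { return getOrDefault(key, attribute_default_); }

    // Models dictGetOrDefault: a missing key materializes the caller-supplied default.
    MapValue getOrDefault(std::uint64_t key, const MapValue & fallback) const
    {
        auto it = cells_.find(key);
        return it == cells_.end() ? fallback : it->second;
    }

private:
    std::unordered_map<std::uint64_t, MapValue> cells_;
    MapValue attribute_default_;
};
```

A test for each touched layout would presumably exercise both paths: one lookup that hits a stored Map value, and one that misses and must build the default as a complete Map rather than a scalar.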
Co-authored-by: Pavel Kruglov <48961922+Avogar@users.noreply.github.com>
Co-authored-by: Pavel Kruglov <48961922+Avogar@users.noreply.github.com>
…upport_complex_types_for_dictionary
The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.
…upport_complex_types_for_dictionary
The test |
LLVM Coverage Report
Changed lines: 62.75% (320/510) | lost baseline coverage: 4 line(s) · Uncovered code |
ca31dea

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Add support for Map and JSON/Object types as dictionary attributes. Now dictionaries can store and retrieve complex types including Map(String, String), Map(String, Array(String)), JSON, and Nullable(JSON) types in both FLAT and HASHED layouts.
Closes #86829
Detailed description
This PR adds support for complex types (Map and JSON/Object) as dictionary attributes, extending the existing dictionary functionality to handle nested and structured data types.
Supported dictionary layouts:
Documentation entry for user-facing changes
Documentation should include:
Motivation:
This PR addresses Issue #86829, which requested support for the JSON type in dictionary attributes. Previously, attempting to create a dictionary with JSON attributes resulted in
DB::Exception: Unknown type JSON for dictionary attribute. (UNKNOWN_TYPE).
Dictionaries are commonly used for lookups and data enrichment. Supporting Map and JSON types allows users to store and retrieve complex, nested data structures directly from dictionaries, enabling more flexible data modeling and reducing the need for multiple dictionary lookups or post-processing. This feature enables use cases such as:
dictGet('dict', 'json_field', key).nested.field)
Parameters:
Map(KeyType, ValueType) where KeyType is typically String and ValueType can be String, Array, or numeric types
JSON or Nullable(JSON) for JSON/Object data
dictGet, dictGetOrDefault, etc.) work seamlessly with Map and JSON types
Example use:
Backward compatibility:
This change is fully backward compatible. Existing dictionaries continue to work as before, and the new complex types are additive features.
Note
Medium Risk
Expands multiple dictionary storage implementations and attribute handling to support Map and Object/JSON, which touches core lookup and column materialization paths and could surface type/nullable edge cases or memory regressions.
Overview
Adds first-class support for complex dictionary attribute types Map and Object (including JSON/Nullable(JSON)) across major dictionary layouts (e.g. Flat/Hashed/HashedArray/RangeHashed/IPAddress/Polygon/Cache), including proper column creation/insertion paths and container storage selection.
Extends dictionary type plumbing by mapping Map/Object in TypeId/AttributeUnderlyingType, updating default-value extraction for ColumnMap/ColumnObject, and enabling StorageDictionary to supportsDynamicSubcolumns() so JSON subcolumn access works via dictionary-backed tables.
Adds explicit error handling for unsupported binary-encoded types and introduces new gtests plus a stateless SQL/reference test covering Map lookups, JSON field/subcolumn access, and Nullable(JSON) behavior.
Written by Cursor Bugbot for commit 7a2b61f. This will update automatically on new commits.
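As a rough illustration of why each layout needs its own insertion path for a new attribute type, the sketch below dispatches on a value variant, in the spirit of the `[&](Array & value) { fetched_column.insert(value); }` visitor in the diff hunk. It is a simplified model under stated assumptions, not the PR's code: `ToyValue`, `ToyArray`, `ToyMap`, and `ToyColumn` are hypothetical names, and the real ClickHouse types differ. It assumes C++17.

```cpp
#include <map>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Toy stand-ins; these are not ClickHouse's Field/Array/Map classes.
using ToyArray = std::vector<std::string>;
using ToyMap = std::map<std::string, std::string>;
using ToyValue = std::variant<long long, std::string, ToyArray, ToyMap>;

// Hypothetical fetched column that records which branch handled each value.
struct ToyColumn
{
    std::vector<std::string> kinds;

    void insert(const ToyValue & value)
    {
        std::visit(
            [&](const auto & v)
            {
                using T = std::decay_t<decltype(v)>;
                if constexpr (std::is_same_v<T, long long>)
                    kinds.push_back("Int64");
                else if constexpr (std::is_same_v<T, std::string>)
                    kinds.push_back("String");
                else if constexpr (std::is_same_v<T, ToyArray>)
                    kinds.push_back("Array");
                else
                {
                    // Fails to compile if a new alternative is added without a branch.
                    static_assert(std::is_same_v<T, ToyMap>, "unhandled attribute type");
                    kinds.push_back("Map");
                }
            },
            value);
    }
};
```

In this model, adding a fifth alternative to `ToyValue` without extending `insert` is a compile error, which is one way to keep every dispatch site in sync when a type such as Map or Object is introduced.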
Version info
26.4.1.895