Extend function tuple to return named tuple and add function tupleNames #54881
Interesting - I needed this feature just yesterday! One thought - maybe we should make the But this is only if named tuples support both the |
Yeah. It's reasonable to have |
|
It works: Let's anticipate the analyzer enablement, and avoid adding a separate |
|
Casting to named tuples has actually worked without the new analyzer for a long time; there is a test for that: https://github.com/ClickHouse/ClickHouse/blob/master/tests/queries/0_stateless/00547_named_tuples.sql. But it is really inconvenient to hardcode data types during casting. So |
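For illustration, a minimal sketch of the CAST approach this comment refers to. The element names `id` and `name` are made up for this example and are not taken from the linked test; the point is only that every name and type has to be written out by hand.

```sql
-- Building a named tuple via CAST: each element name and type is spelled out explicitly.
-- `id` and `name` are hypothetical element names chosen for this sketch.
SELECT
    CAST((1, 'Alice') AS Tuple(id UInt8, name String)) AS t,
    toTypeName(t) AS type_name,
    tupleElement(t, 'id') AS id_value,
    tupleElement(t, 'name') AS name_value;
```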
I'd prefer this way of implementation. |
It breaks INSERT SELECT query when trying to insert into predefined named tuples from constructed named tuple with different tuple column names. See 01533_multiple_nested, 02475_bson_each_row_format, etc. It breaks geometry type construction when using named tuple. See 01306_polygons_intersection. It breaks JOIN analysis because we use tuple types to compare left and right columns. See 02861_join_on_nullsafe_compare, 02911_join_on_nullsafe_optimization, etc. These are found by fast tests. There might be other issues. |
Returns a tuple by grouping input arguments.

For columns C1, C2, … with the types T1, T2, …, it returns a Tuple(C1 T1, C2 T2, …) type tuple containing these columns. There is no cost to execute the function.
If there are duplicate column names, subsequent names will be suffixed with `_N` starting from zero. For example, `tuple(C1, C1)` will return a Tuple(C1 T1, C1_0 T1).
> If there are duplicate column names, subsequent names will be suffixed with `_N` starting from zero. For example, `tuple(C1, C1)` will return a Tuple(C1 T1, C1_0 T1).
This doesn't sound very good. I mean such a name would look quite random if that element of the tuple is not just a column in some table. I think it may be better to give the ability to specify names explicitly:
tuple(1 AS x, 2 AS y)
It will be more ergonomic to allow named tuples from tuple(x, y) instead of tuple(x AS x, y AS y).
About identical column names - what about not assigning any names when there is ambiguity?
> It will be more ergonomic to allow named tuples from tuple(x, y) instead of tuple(x AS x, y AS y)
I agree, but it seems not very useful to generate names like plus(number, 1) for the second element of tuple tuple(number, number + 1).
And also for SELECT tuple(3, 2, 1) AS t, t.1 there is ambiguity: should t.1 mean 1 or 3?
> what about not assigning any names when there is ambiguity?
But we can't assign some names of a tuple and not assign the other names of the same tuple.
Maybe it makes sense to assign names like element_1, element_2, ... and so on for each element of a tuple which is not a literal name, or if that element is a duplicate of another element. There would still be ambiguity with SELECT tuple(number, number + 1 AS element_1) AS t, t.element_1 (because both number and number + 1 should be element_1), but this case is tricky enough to just throw an exception saying that we can't deduce the tuple names.
What if a column is named as 1?
tuple(`1`, `2`)
@amosbird, the tuple will be unnamed because 1 does not work as an unquoted identifier.
Numbers are dangerous as the names of tuple's elements because there could be confusion with accessing by index:
> SELECT CAST(('Hello', 'world') AS Tuple(`2` String, `1` String)) AS t, t.1, t.2, t.`1`, t.`2`
┌─t─────────────────┬─tupleElement(t, 1)─┬─tupleElement(t, 2)─┬─t.1───┬─t.2───┐
1. │ ('Hello','world') │ Hello │ world │ world │ Hello │
└───────────────────┴────────────────────┴────────────────────┴───────┴───────┘
So let's at least not make such named tuples implicitly.
I guess
tuple(a AS `42`, b)
will also be unnamed then.
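To see how a given build resolves any of the cases discussed in this thread, one option is to inspect the resulting type with `toTypeName`. This is only a sketch: it assumes a server that already recognizes the `enable_named_columns_in_function_tuple` setting named in this PR's commits, and no output is shown because the result depends on the server version and on that setting.

```sql
-- Each query prints the tuple type, which reveals whether element names were assigned.
-- Assumption: the server knows enable_named_columns_in_function_tuple;
-- servers that predate the setting will reject it as unknown.

-- Explicit, distinct aliases.
SELECT toTypeName(tuple(1 AS x, 2 AS y))
SETTINGS enable_named_columns_in_function_tuple = 1;

-- Second element is an expression rather than a plain column.
SELECT toTypeName(tuple(number, number + 1))
FROM numbers(1)
SETTINGS enable_named_columns_in_function_tuple = 1;

-- Duplicate element names.
SELECT toTypeName(tuple(number, number))
FROM numbers(1)
SETTINGS enable_named_columns_in_function_tuple = 1;

-- Numeric alias that is not a valid unquoted identifier.
SELECT toTypeName(tuple(1 AS `42`, 2 AS b))
SETTINGS enable_named_columns_in_function_tuple = 1;
```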
|
@amosbird We can make it under a setting, enable it by default, and put it under the |
Sure. I believe I've covered everything. Let's see if |
|
contrib/orc submodule is changed (fixed) intentionally. |
8360f1f to 012161b
01158_zookeeper_log_long #64790 |
|
OK. It is almost ready to merge, but why was the ORC submodule updated? Let's reset it. |
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240710)
* Fix UT due to ClickHouse/ClickHouse#54881
* style

Co-authored-by: kyligence-git <gluten@kyligence.io>
Co-authored-by: Chang Chen <baibaichen@gmail.com>
|
Seems like it breaks compatibility for case Before: Now: |
|
Yes. The |
|
Can you create a fix for it? |
|
Sure. Will do it now. And we also need to handle true/false. |
…ble_named_columns_in_function_tuple=1 (default value)
…when enable_named_columns_in_function_tuple=1 (default value)
…ction_tuple_tests Added some tests in relation with #54881
|
Hello. |
|
@Hubbitus it is in 24.7 |

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Extend function `tuple` to construct named tuples in query. Introduce function `tupleNames` to extract names from tuples.
Documentation entry for user-facing changes
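A short usage sketch of the two functions named in the changelog entry. The table name `tuple_demo` and all element names are hypothetical. Results are omitted because the naming behavior of `tuple` depends on `enable_named_columns_in_function_tuple` and on the server version (24.7 or later, per the comment above).

```sql
-- tupleNames on a column whose tuple type already carries element names.
-- `tuple_demo`, `user_id` and `session_id` are hypothetical names for this sketch.
CREATE TABLE tuple_demo (col Tuple(user_id UInt64, session_id UInt64)) ENGINE = Memory;
INSERT INTO tuple_demo VALUES ((1, 2));
SELECT tupleNames(col) FROM tuple_demo;

-- tuple() constructing a tuple from aliased expressions directly in a query;
-- toTypeName shows whether the aliases became element names on this server.
SELECT tuple(1 AS a, 'x' AS b) AS t, toTypeName(t), tupleNames(t)
SETTINGS enable_named_columns_in_function_tuple = 1;

DROP TABLE tuple_demo;
```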